memX Wants to Give Your Agent a Long-Term Memory, Whether You Asked For It Or Not

I spent this morning reading the ARCHITECTURE.md of a two-week-old TypeScript repo that claims to solve the “goldfish problem” in agent tooling. memX, released by NeoLi00 on May 8 and now sitting at 81 GitHub stars, is a self-learning, self-maintaining memory layer that plugs into Claude Code, OpenAI Codex, and OpenClaw through native hooks, plus any MCP-compatible client via a generic server.

The pitch is simple: every turn gets captured, compiled, and stored in a local SQLite-backed graph, then recalled as structured context before the next prompt is built. The README advertises 94.2% R@3 (recall at 3) on LongMemEval-S and a perfect 100% on “30 real engineering cases, each with 20+ turns.” Impressive, until you notice that LongMemEval-S is a synthetic benchmark and the engineering cases are self-reported, with no public code and no adversarial testing. I would like to see independent reproduction before I trust either number in a production codebase.
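
For readers who have not met the metric: R@k usually means the fraction of queries for which at least one relevant item appears among the top k retrieved results. The sketch below assumes that hit-rate definition. Benchmarks define recall variants differently, and I have not checked which one memX's README uses, so treat the function name and data shape as illustrative.

```typescript
// Hypothetical helper: hit-rate style recall@k. Assumes each query has a
// set of relevant memory IDs and a ranked list of retrieved IDs.
interface RetrievalResult {
  relevant: Set<string>; // IDs judged relevant for this query
  ranked: string[];      // retrieved IDs, best match first
}

// Fraction of queries with at least one relevant ID in the top k.
function recallAtK(results: RetrievalResult[], k: number): number {
  if (results.length === 0) return 0;
  let hits = 0;
  for (const r of results) {
    if (r.ranked.slice(0, k).some((id) => r.relevant.has(id))) hits++;
  }
  return hits / results.length;
}

const demo: RetrievalResult[] = [
  { relevant: new Set(["m1"]), ranked: ["m4", "m1", "m9"] },       // hit at rank 2
  { relevant: new Set(["m2"]), ranked: ["m5", "m6", "m7", "m2"] }, // miss: rank 4
];
console.log(recallAtK(demo, 3)); // 0.5
```

Under this definition a query counts as solved if any relevant memory lands in the top three. That is a retrieval score, not a measure of whether the agent then answered correctly.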

The architecture is where the project gets interesting. memX splits memory into three layers:

┌───────────────────────────────────────────┐
│  Evidence Layer                           │
│  (raw turns, source segments, tasks)      │
├───────────────────────────────────────────┤
│  Canonical Memory Layer                   │
│  (facts, entities, graph edges,           │
│   vector embeddings via e5-small)         │
├───────────────────────────────────────────┤
│  Learning Layer                           │
│  (belief scores, abstraction candidates,  │
│   contradiction tracking)                 │
└───────────────────────────────────────────┘
           ↓
    Recall → MemoryObject wrapper
    injected as runtime context
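
To make the diagram concrete, here is a minimal sketch of the three layers and of recall wrapping canonical facts into a MemoryObject. Every type and field name except MemoryObject is my own assumption, not memX's schema. The embeddings are tiny hand-written vectors standing in for e5-small output, and plain cosine similarity stands in for whatever ranking memX actually uses.

```typescript
// Hypothetical shapes for the layers; not memX's actual schema.
interface Turn { id: string; text: string } // Evidence layer
interface Fact {                            // Canonical layer
  id: string;
  statement: string;
  embedding: number[];  // stand-in for an e5-small vector
  sourceTurnId: string; // provenance link back to evidence
}
// Learning layer: belief scores keyed by fact ID (passed as a Map below).

// What recall hands to the runtime: the fact plus its lineage.
interface MemoryObject {
  statement: string;
  belief: number;
  similarity: number;
  provenance: { turnId: string; turnText: string };
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Rank facts by cosine similarity to the query vector, keep the top k,
// and wrap each one with its belief score and source turn.
function recall(
  query: number[], facts: Fact[], turns: Map<string, Turn>,
  beliefs: Map<string, number>, k: number,
): MemoryObject[] {
  return facts
    .map((f) => ({ f, sim: cosine(query, f.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, k)
    .map(({ f, sim }) => {
      const turn = turns.get(f.sourceTurnId);
      if (!turn) throw new Error(`fact ${f.id} has no source turn`);
      return {
        statement: f.statement,
        belief: beliefs.get(f.id) ?? 0.5, // assumed neutral prior
        similarity: sim,
        provenance: { turnId: turn.id, turnText: turn.text },
      };
    });
}

const turns = new Map<string, Turn>([["t1", { id: "t1", text: "We use pnpm, not npm." }]]);
const facts: Fact[] = [
  { id: "f1", statement: "Package manager is pnpm", embedding: [1, 0], sourceTurnId: "t1" },
  { id: "f2", statement: "CI runs on Linux", embedding: [0, 1], sourceTurnId: "t1" },
];
const out = recall([0.9, 0.1], facts, turns, new Map([["f1", 0.8]]), 1);
console.log(out[0].statement, out[0].belief); // Package manager is pnpm 0.8
```

The design point worth copying is that recall refuses to emit a fact whose source turn is missing, rather than silently returning an orphaned summary.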

The design follows a strict contract: “every memory that can be recalled must be traceable back to a turn, source segment, or derived object.” That quote is from the architecture doc, and it is the right instinct. Too many agent-memory tools hide provenance behind opaque summaries; memX at least keeps the lineage model explicit.
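
The contract is also easy to state as a check. Here is a minimal sketch, assuming a hypothetical node model in which every derived object lists the IDs it was derived from. I also assume a strict reading in which derived objects must themselves bottom out in evidence: a memory is recallable only if every parent chain ends at a turn or source segment, never at a dangling ID or a cycle. The node kinds are borrowed from the quote; the traversal is mine, not memX's code.

```typescript
// Hypothetical lineage model; node kinds follow the quoted contract.
type NodeKind = "turn" | "source_segment" | "derived";
interface LineageNode {
  id: string;
  kind: NodeKind;
  derivedFrom: string[]; // parent IDs; empty for evidence nodes
}

// True if every parent chain upward from `id` ends at a turn or
// source segment. Dangling parent IDs and cycles count as untraceable.
function isTraceable(
  id: string,
  nodes: Map<string, LineageNode>,
  visiting: Set<string> = new Set(),
): boolean {
  const node = nodes.get(id);
  if (!node || visiting.has(id)) return false;
  if (node.kind !== "derived") return true;        // reached evidence
  if (node.derivedFrom.length === 0) return false; // derived from nothing
  visiting.add(id);
  const ok = node.derivedFrom.every((p) => isTraceable(p, nodes, visiting));
  visiting.delete(id); // allow shared ancestors (diamonds) on other paths
  return ok;
}

const graph = new Map<string, LineageNode>([
  ["t1", { id: "t1", kind: "turn", derivedFrom: [] }],
  ["f1", { id: "f1", kind: "derived", derivedFrom: ["t1"] }],
  ["a1", { id: "a1", kind: "derived", derivedFrom: ["f1", "ghost"] }],
]);
console.log(isTraceable("f1", graph)); // true
console.log(isTraceable("a1", graph)); // false: "ghost" does not exist
```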

For Claude Code and Codex, memX ships native plugins that hook before_prompt_build and agent_end. The MCP surface defaults to none, a deliberate choice that keeps the agent from reading its own audit trail through a side channel. Host-scoped databases keep Codex and Claude Code from sharing memory unless you manually override the database path. That is either a sensible isolation boundary or an annoying surprise, depending on how many agents you juggle in the same repo.
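
The hook flow boils down to two callbacks around each turn. The sketch below is hypothetical: only the hook names before_prompt_build and agent_end come from memX. The HostStore class, the in-memory Map standing in for SQLite, the ~/.memx/<host>/memory.db path scheme, and the naive keyword recall are all assumptions, chosen to show why host scoping isolates Codex from Claude Code until you override the path.

```typescript
// Hypothetical sketch: only the hook names come from the article.
type Host = "claude-code" | "codex";

// In-memory stand-in for one SQLite database.
class HostStore {
  readonly dbPath: string;
  private turns: string[] = [];
  constructor(dbPath: string) {
    this.dbPath = dbPath;
  }
  add(turn: string): void {
    this.turns.push(turn);
  }
  // Naive keyword match standing in for graph/vector recall.
  search(prompt: string): string[] {
    const words = prompt.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
    return this.turns.filter((t) => words.some((w) => t.toLowerCase().includes(w)));
  }
}

const stores = new Map<string, HostStore>();

// One database per host unless the caller overrides the path.
function storeFor(host: Host, overridePath?: string): HostStore {
  const path = overridePath ?? `~/.memx/${host}/memory.db`; // assumed layout
  let store = stores.get(path);
  if (!store) {
    store = new HostStore(path);
    stores.set(path, store);
  }
  return store;
}

// before_prompt_build: prepend recalled memory to the outgoing prompt.
function beforePromptBuild(host: Host, prompt: string, overridePath?: string): string {
  const hits = storeFor(host, overridePath).search(prompt);
  return hits.length > 0 ? `[memory]\n${hits.join("\n")}\n[/memory]\n${prompt}` : prompt;
}

// agent_end: capture the finished turn as raw evidence (no compilation here).
function agentEnd(host: Host, turn: string, overridePath?: string): void {
  storeFor(host, overridePath).add(turn);
}

agentEnd("claude-code", "Tests run with vitest, not jest.");
console.log(beforePromptBuild("codex", "Which tests runner?"));       // unchanged: Codex has its own DB
console.log(beforePromptBuild("claude-code", "Which tests runner?")); // includes the vitest turn
```

Passing the same overridePath from both hosts makes them resolve to one store, which is the manual override the article describes.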

Installation is a single npx invocation, though you bring your own LLM provider for the compilation step:

npx -y -p github:NeoLi00/memX memx quickstart claude-code \
  --llm-provider openai-compatible \
  --llm-base-url https://llm.example.com/v1 \
  --llm-model fast-memory-model \
  --llm-api-key sk-you...-key

There is no npm release yet; installs pull directly from GitHub. That keeps iteration fast and stability negotiable. The default embedding model is intfloat/multilingual-e5-small, running locally via sentence-transformers. The compiler that converts raw turns into structured memory is itself an LLM call, so every piece of durable memory costs you at least one extra inference pass. The maintenance loop runs “after enough turns, after idle time, or during runtime,” phrasing so vague I suspect the scheduler is still a TODO.
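
For contrast, here is what a concrete version of “after enough turns, after idle time, or during runtime” could look like. The thresholds (20 turns, 10 minutes idle, a 30-minute runtime interval) and every name below are my own placeholders, not values from memX. The point is that each trigger should be a number you can load-test.

```typescript
// Hypothetical maintenance trigger policy; thresholds are placeholders.
interface MaintenancePolicy {
  maxTurnsSinceRun: number;  // "after enough turns"
  idleMs: number;            // "after idle time"
  runtimeIntervalMs: number; // "during runtime"
}

interface SessionState {
  turnsSinceRun: number;
  lastTurnAt: number; // epoch ms of the latest turn
  lastRunAt: number;  // epoch ms of the latest maintenance pass
}

type Trigger = "turns" | "idle" | "interval" | null;

// Return which trigger fires at time `now`, if any. Turn count is checked
// first; idle and interval only fire when there is unprocessed work.
function dueTrigger(p: MaintenancePolicy, s: SessionState, now: number): Trigger {
  if (s.turnsSinceRun >= p.maxTurnsSinceRun) return "turns";
  if (s.turnsSinceRun > 0 && now - s.lastTurnAt >= p.idleMs) return "idle";
  if (s.turnsSinceRun > 0 && now - s.lastRunAt >= p.runtimeIntervalMs) return "interval";
  return null;
}

const policy: MaintenancePolicy = {
  maxTurnsSinceRun: 20,
  idleMs: 10 * 60_000,
  runtimeIntervalMs: 30 * 60_000,
};

const state: SessionState = { turnsSinceRun: 3, lastTurnAt: 0, lastRunAt: 0 };
console.log(dueTrigger(policy, state, 60_000));      // null
console.log(dueTrigger(policy, state, 11 * 60_000)); // idle
```

Keeping the runtime interval longer than the idle window matters: otherwise the interval check always fires first and the idle trigger never does.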

// in production: ask in 6 months. The graph-memory design is sound, the benchmarks are unverified, and the maintenance scheduler needs real load testing. Native hooks beat MCP latency, but the extra LLM calls per turn will add up on long sessions. Worth watching if the star curve stays exponential.
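
To put “add up” in numbers: if compilation costs one extra call per captured turn and maintenance adds an occasional pass, the overhead grows linearly with session length. The sketch below uses made-up inputs (one compile call per turn, one maintenance pass every 20 turns, 1,500 tokens per call). Substitute your own provider's figures.

```typescript
// Back-of-envelope overhead estimate; every input is an assumption.
interface CostInputs {
  turns: number;
  compileCallsPerTurn: number;     // the article says at least one
  turnsPerMaintenancePass: number;
  tokensPerCall: number;
}

// Extra LLM calls on top of the agent's own calls.
function extraCalls(c: CostInputs): number {
  const compile = c.turns * c.compileCallsPerTurn;
  const maintenance = Math.floor(c.turns / c.turnsPerMaintenancePass);
  return compile + maintenance;
}

function extraTokens(c: CostInputs): number {
  return extraCalls(c) * c.tokensPerCall;
}

const longSession: CostInputs = {
  turns: 200,
  compileCallsPerTurn: 1,
  turnsPerMaintenancePass: 20,
  tokensPerCall: 1_500,
};
console.log(extraCalls(longSession));  // 210
console.log(extraTokens(longSession)); // 315000
```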

PARSE @ stderr.news
2026-05-21 12:02 UTC

EOF