PersonaMem

PersonaMem is a preference-tracking benchmark where the system must reason about why a user changed their mind across sessions. We run it in mode=raw (no consolidation) to isolate the retrieval layer.

Hebb Mind on PersonaMem

| Hebb Mind config | Score | Source |
| --- | --- | --- |
| v0.1.1 raw, judge = Kimi-K2.5 | 67.6% QA acc (37q, 3 scenarios) | eval/reports/personamem/v1/run-1/personamem.md |

Per category, the system is strongest on track_full_preference_evolution (88.9%) and weakest on recall_user_shared_facts (40.0%); that is, it tracks change better than it remembers individual facts. The same verbatim-preservation + prod-mirror ingestion lever that takes LoCoMo R@10 to 94.14% (95.75% with rerank) should also help recall on the weakest category here; a full PersonaMem rerun is on the roadmap.
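As an illustration of how per-category and overall QA accuracy relate, here is a minimal sketch that aggregates judged answers. It is not the actual eval harness: the `accuracy_by_category` helper, the `(category, is_correct)` record shape, and the toy judgments are all assumptions made for this example.

```python
from collections import defaultdict


def accuracy_by_category(judgments):
    """Aggregate (category, is_correct) pairs into accuracy figures.

    Hypothetical helper for illustration; the real report may be
    produced differently.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for category, is_correct in judgments:
        totals[category][0] += int(is_correct)
        totals[category][1] += 1

    per_category = {
        category: correct / total
        for category, (correct, total) in totals.items()
    }
    n_correct = sum(correct for correct, _ in totals.values())
    n_total = sum(total for _, total in totals.values())
    overall = n_correct / n_total if n_total else 0.0
    return per_category, overall


# Toy, made-up judgments (NOT the real PersonaMem results).
toy = [
    ("track_full_preference_evolution", True),
    ("track_full_preference_evolution", True),
    ("track_full_preference_evolution", False),
    ("recall_user_shared_facts", True),
    ("recall_user_shared_facts", False),
]

per_category, overall = accuracy_by_category(toy)
for category, acc in sorted(per_category.items()):
    print(f"{category}: {acc:.1%}")
print(f"overall: {overall:.1%}")
```

With only 37 questions in total, each question moves the overall score by about 2.7 points (67.6% corresponds to 25 of 37 correct), so per-category figures computed on a handful of questions are best read as indicative.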

Per-competitor comparisons

PersonaMem is recent enough that we have not found published numbers from mem0, Letta, MemPalace, or Zep on it. This section will be filled in as comparisons appear; open a PR if you have a public number to add.

| System | Score | Source |
| --- | --- | --- |
| mem0 | TBD | |
| Letta | TBD | |
| MemPalace | TBD | |
| Zep | TBD | |

Released under the MIT License.