Search
One endpoint per collection, plus a tenant-wide variant. Hybrid by default; filter as narrowly as you like.
Basic#
```python
hits = mem.search("what theme does Alex like?", collection="prefs",
                  limit=5, min_score=0.3, mode="hybrid")
```

```typescript
const hits = await mem.search("what theme does Alex like?",
  { collection: "prefs", limit: 5, minScore: 0.3, mode: "hybrid" });
```

```bash
curl -s https://rememberos.ai/v1/memory/collections/prefs/search \
  -H "Authorization: Bearer $LONGMEM_API_KEY" -H "Content-Type: application/json" \
  -d '{"query": "what theme does Alex like?", "limit": 5, "mode": "hybrid"}'
```
Modes#
| Mode | How | Use when |
|---|---|---|
| `hybrid` (default) | vector + full-text, fused with Reciprocal Rank Fusion (k=60) | almost always |
| `vector` | pure embedding similarity | paraphrase-heavy queries with no keyword overlap |
| `text` | Postgres full-text rank | exact terms, ids, names; no embedding cost |
The two arms are fused symmetrically: a strong keyword-only match — a rare token,
an error code, an id, an unusual name — surfaces in hybrid mode even when it
falls below min_score for vector similarity. min_score gates the
vector arm only; it never hides an exact lexical hit.
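As an illustration of the fusion described above, here is a minimal sketch of Reciprocal Rank Fusion with k=60, with `min_score` applied to the vector arm only. The function name `rrf_fuse`, the input shapes, and the sample ids and similarities are assumptions made for this example; the service's actual implementation may differ.

```python
from collections import defaultdict


def rrf_fuse(vector_hits, text_hits, k=60, min_score=0.3, limit=5):
    """Fuse two ranked lists with Reciprocal Rank Fusion.

    vector_hits: list of (memory_id, similarity), best first (assumed shape).
    text_hits:   list of memory_id, best first (assumed shape).
    """
    # min_score gates the vector arm only; the text arm is left untouched.
    kept = [mid for mid, sim in vector_hits if sim >= min_score]

    fused = defaultdict(float)
    for ranked in (kept, text_hits):
        for rank, mid in enumerate(ranked, start=1):
            # Each arm contributes 1 / (k + rank) for every id it returns.
            fused[mid] += 1.0 / (k + rank)

    ordered = sorted(fused, key=lambda mid: fused[mid], reverse=True)
    return ordered[:limit]


# Made-up sample data: "m3" is a weak vector match (0.12 < min_score)
# but the top keyword match.
vector_hits = [("m1", 0.82), ("m2", 0.41), ("m3", 0.12)]
text_hits = ["m3", "m1"]
fused_ids = rrf_fuse(vector_hits, text_hits)
print(fused_ids)  # ['m1', 'm3', 'm2']
```

In this sample, `m3` is dropped from the vector arm but still ranks second because the text arm returns it first.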
Filters#
- `min_score` (default 0.3): drops weak vector matches. It gates the vector arm only, not exact keyword hits.
- `category`: exact match on the memory's category.
- `metadata_filter`: JSONB containment, e.g. `{"source": "runbook"}`.
- `container_tag`: scopes the search to a sub-namespace (see Container Tags).
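To show what "containment" means for `metadata_filter`, the sketch below approximates Postgres JSONB containment in plain Python. It is a simplification for illustration only: the helper `contains`, the sample memories, and every metadata key other than `source` are hypothetical, and edge cases of the real operator are not covered.

```python
def contains(doc, flt):
    """Roughly: does `doc` contain `flt`, in the JSONB containment sense?"""
    if isinstance(flt, dict):
        # Every key in the filter must exist in the document with a contained value.
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], value) for key, value in flt.items()
        )
    if isinstance(flt, list):
        # Every filter element must be contained in some document element.
        return isinstance(doc, list) and all(
            any(contains(item, wanted) for item in doc) for wanted in flt
        )
    # Keep JSON booleans distinct from numbers (Python treats True == 1).
    if isinstance(doc, bool) != isinstance(flt, bool):
        return False
    return doc == flt


# Made-up sample memories.
memories = [
    {"id": "a", "metadata": {"source": "runbook", "team": "sre"}},
    {"id": "b", "metadata": {"source": "chat"}},
    {"id": "c", "metadata": {"source": "runbook", "tags": ["db", "oncall"]}},
]
matched_ids = [m["id"] for m in memories
               if contains(m["metadata"], {"source": "runbook"})]
print(matched_ids)  # ['a', 'c']
```

Extra keys on the memory never prevent a match; only the keys named in the filter are checked.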
Boosters: rewrite & rerank (optional)#
```json
{"query": "…", "rewrite": true, "rerank": true}
```
- `rewrite`: one LLM call expands and disambiguates the query before retrieval (synonyms, resolved references). Better recall on vague queries.
- `rerank`: retrieves 3× the limit, then one LLM call re-orders the candidates by meaning and trims them. Better precision on subtle queries.
Both are off by default. Each adds an LLM round-trip (~300–800 ms), billed under the usage purposes `rewrite` and `rerank`. They use your BYOK extraction endpoint if one is configured. If either call fails, the request degrades gracefully to plain search.
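The control flow of the two boosters can be sketched as follows: rewrite first, over-fetch 3× when reranking, and fall back to plain retrieval order if an LLM call fails. Everything here is hypothetical scaffolding: `boosted_search`, the three callables, and the sample corpus are assumptions. Whether the reranker sees the original or the rewritten query is not stated above, so this sketch passes the original.

```python
def boosted_search(query, limit, retrieve, rewrite_fn=None, rerank_fn=None):
    """retrieve(query, n) -> hits; rewrite_fn(query) -> str;
    rerank_fn(query, hits) -> reordered hits. All three are hypothetical."""
    effective_query = query
    if rewrite_fn is not None:
        try:
            effective_query = rewrite_fn(query)
        except Exception:
            effective_query = query  # degrade: search with the original query

    # Over-fetch 3x the limit only when a rerank pass will trim it back.
    fetch_n = limit * 3 if rerank_fn is not None else limit
    hits = retrieve(effective_query, fetch_n)

    if rerank_fn is not None:
        try:
            hits = rerank_fn(query, hits)
        except Exception:
            pass  # degrade: keep the plain retrieval order
    return hits[:limit]


# Made-up stand-ins for the retriever and the LLM reranker.
corpus = ["dark theme", "light theme", "solarized",
          "high contrast", "dark mode at night", "font size"]

def retrieve(query, n):
    return corpus[:n]

def rerank_by_length(query, hits):
    return sorted(hits, key=len)  # placeholder for an LLM ordering

def failing_rerank(query, hits):
    raise TimeoutError("LLM call timed out")

ok = boosted_search("theme", 2, retrieve, rerank_fn=rerank_by_length)
degraded = boosted_search("theme", 2, retrieve, rerank_fn=failing_rerank)
print(ok)        # ['solarized', 'font size']
print(degraded)  # ['dark theme', 'light theme']
```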
Across all collections#
```http
POST /v1/memory/search

{"query": "…", "container_tag": "user-alice"}
```
Each result carries a collection field so you know where it came from.
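One way to use the `collection` field on the client side is to group tenant-wide results by their origin. This is a small sketch under assumptions: results are treated as plain dicts, and the `content` and `score` keys and the sample values are invented for illustration; only `collection` comes from the description above.

```python
from collections import defaultdict


def group_by_collection(hits):
    """Bucket hits by their `collection` field, preserving result order."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["collection"]].append(hit)
    return dict(grouped)


# Made-up sample results.
sample_hits = [
    {"collection": "prefs", "content": "Alex prefers the dark theme", "score": 0.71},
    {"collection": "runbooks", "content": "Restart the worker pool", "score": 0.55},
    {"collection": "prefs", "content": "Alex uses a 14pt font", "score": 0.48},
]
by_collection = group_by_collection(sample_hits)
print(sorted(by_collection))        # ['prefs', 'runbooks']
print(len(by_collection["prefs"]))  # 2
```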
Current-truth semantics#
Search excludes superseded (is_latest = false) and expired memories — you
get what's true now, automatically.
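The rule amounts to a predicate applied to each memory before it can appear in results. The sketch below is an assumption-laden illustration: `is_latest` comes from the text above, while the `expires_at` field name, the dict shape, and the sample records are hypothetical.

```python
from datetime import datetime, timezone


def is_current(memory, now=None):
    """True if the memory is the latest version and has not expired."""
    now = now or datetime.now(timezone.utc)
    if not memory.get("is_latest", True):
        return False  # superseded by a newer memory
    expires_at = memory.get("expires_at")  # hypothetical field name
    return expires_at is None or expires_at > now


# Made-up sample records, evaluated at a fixed point in time.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
memories = [
    {"id": "old-theme", "is_latest": False, "expires_at": None},
    {"id": "new-theme", "is_latest": True, "expires_at": None},
    {"id": "temp-note", "is_latest": True,
     "expires_at": datetime(2024, 12, 1, tzinfo=timezone.utc)},
]
current_ids = [m["id"] for m in memories if is_current(m, now)]
print(current_ids)  # ['new-theme']
```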
The result cache#
Exact-repeat queries are served from Redis in ~1 ms. The cache is keyed by a
per-collection version that every write bumps, so results are never stale: any
add/update/delete in the collection invalidates instantly. The response carries
"cached": true when it was a cache hit.