How RememberOS Works
One Postgres, three layers: ingest anything, index it two ways (embeddings + a fact graph), retrieve only what's relevant — fast.
The pipeline
- Ingest. Text arrives via `add`/`remember`, files via the drop endpoints, rows via connectors (SharePoint, dlt, email). Files are stored in object storage; their text is extracted (Whisper for audio/video, parsers for PDF/Office, vision captions for images) and chunked.
- Index. Every memory is embedded into pgvector (by default with an on-box ONNX model, so no text leaves the server for embeddings) and indexed for full-text search. `remember` additionally runs graph extraction: an LLM turns prose into atomic typed facts and links them to what you already know.
- Retrieve. Search fuses vector similarity and full-text rank (Reciprocal Rank Fusion, k=60) and filters to the current truth: superseded and expired facts are excluded by default.
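The fusion step in Retrieve can be illustrated with a minimal sketch of Reciprocal Rank Fusion. It assumes each retriever returns an ordered list of memory ids; the function name `rrf_fuse` and the sample ids are hypothetical, and the actual query RememberOS runs may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked id lists with Reciprocal Rank Fusion.

    Each id scores 1 / (k + rank) per list it appears in (rank starts at 1);
    scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical results from the two indexes for the same query.
vector_hits = ["m3", "m1", "m7"]
fulltext_hits = ["m1", "m9", "m3"]

fused = rrf_fuse([vector_hits, fulltext_hits])
fused_ids = [memory_id for memory_id, _ in fused]
```

Because RRF uses only ranks, the vector distance and the full-text score never need to be normalised onto a common scale.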
What runs where
| Step | Where | Notes |
|---|---|---|
| Embeddings | on-box (ONNX) or OpenAI | configurable; on-box is the privacy default |
| Fact extraction | background worker | `remember` returns 202 instantly; the worker calls the LLM (platform key, your BYOK key, or a local model) |
| File processing | background worker | big drops never block a request |
| Search | Postgres + pgvector | one database — relational, vector, and full-text together |
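The "return 202, extract later" split in the table can be sketched as follows. This is an illustration only: an in-process queue and thread stand in for the real background worker, `extract_facts` is a placeholder for the LLM call, and every name here is hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
facts = {}  # memory_id -> extracted facts (stand-in for the database)


def extract_facts(text):
    # Placeholder for the LLM call: splits on periods purely for illustration.
    return [s.strip() for s in text.split(".") if s.strip()]


def remember(memory_id, text):
    """Accept the write and return at once; extraction runs in the worker."""
    jobs.put((memory_id, text))
    return 202


def worker():
    while True:
        item = jobs.get()
        if item is None:  # sentinel: shut the worker down
            jobs.task_done()
            break
        memory_id, text = item
        facts[memory_id] = extract_facts(text)
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = remember("m1", "Ada joined Acme. Ada leads the search team.")
jobs.join()     # demo only: wait until the queued job has been processed
jobs.put(None)  # stop the worker
thread.join()
```

The request path only enqueues, so its latency does not depend on how long the model takes.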
Speed: the result cache
Exact-repeat searches are served from a Redis cache keyed by a per-collection version. A write to the collection invalidates its cached results instantly, so the cache is never stale by more than one write. A repeated query costs ~1 ms instead of an embedding call plus a vector scan.
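One way to get this behaviour is to put the collection version into the cache key, so a write changes the key instead of deleting entries. The sketch below assumes that design; a plain dict stands in for Redis, and the class name, key layout, and hashing choice are all hypothetical.

```python
import hashlib


class VersionedCache:
    """Result cache whose keys embed a per-collection version counter."""

    def __init__(self):
        self.store = {}     # stand-in for Redis key/value storage
        self.versions = {}  # collection -> integer version

    def _key(self, collection, query):
        version = self.versions.get(collection, 0)
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        return f"search:{collection}:v{version}:{digest}"

    def get(self, collection, query):
        return self.store.get(self._key(collection, query))

    def put(self, collection, query, results):
        self.store[self._key(collection, query)] = results

    def on_write(self, collection):
        # Bumping the version makes every older key unreachable at once.
        self.versions[collection] = self.versions.get(collection, 0) + 1


cache = VersionedCache()
cache.put("notes", "who leads search?", ["m1"])
hit = cache.get("notes", "who leads search?")   # served from the cache

cache.on_write("notes")                          # any write to the collection
miss = cache.get("notes", "who leads search?")  # old entry no longer matches
```

In a real Redis deployment the orphaned keys would typically carry a TTL so they age out on their own.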
Design choice: no graph database, no separate vector store. The "graph" is a relational edges table; pgvector handles similarity. One boring, operable Postgres — that's a feature.
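A relational edges table with a "current truth" filter might look like the sketch below. The column names (`superseded_by`, `expires_at`) and sample rows are assumptions made for illustration, not the actual RememberOS schema, and SQLite stands in for Postgres so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE edges (
        id            INTEGER PRIMARY KEY,
        src           TEXT NOT NULL,
        relation      TEXT NOT NULL,
        dst           TEXT NOT NULL,
        superseded_by INTEGER,  -- id of the newer edge, if any
        expires_at    TEXT      -- ISO 8601 timestamp, NULL = never
    )
    """
)
conn.executemany(
    "INSERT INTO edges (id, src, relation, dst, superseded_by, expires_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ada", "works_at", "Acme", 2, None),        # superseded by edge 2
        (2, "Ada", "works_at", "Initech", None, None),  # current
        (3, "Ada", "on_leave", "true", None, "2000-01-01T00:00:00"),  # expired
    ],
)


def current_edges(conn, src, now):
    """Return edges for src that are neither superseded nor expired."""
    rows = conn.execute(
        """
        SELECT relation, dst FROM edges
        WHERE src = ?
          AND superseded_by IS NULL
          AND (expires_at IS NULL OR expires_at > ?)
        ORDER BY id
        """,
        (src, now),
    )
    return rows.fetchall()


current = current_edges(conn, "Ada", "2024-01-01T00:00:00")
```

Graph traversal then reduces to ordinary joins on `src` and `dst`, which is what keeps everything inside one database.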