How RememberOS Works

One Postgres, three layers: ingest anything, index it two ways (embeddings + a fact graph), retrieve only what's relevant — fast.

The pipeline

Ingest. Text arrives via `add`/`remember`, files via the drop endpoints, rows via connectors (SharePoint, dlt, email). Files are stored in object storage; their text is extracted (Whisper for audio/video, parsers for PDF/Office, vision captions for images) and chunked.
Index. Every memory is embedded into pgvector (by default with an on-box ONNX model, so no text leaves the server for embeddings) and indexed for full-text search. `remember` additionally runs graph extraction: an LLM turns prose into atomic typed facts and links them to what you already know.
Retrieve. Search fuses vector similarity and full-text rank (Reciprocal Rank Fusion, k=60) and filters to the current truth — superseded and expired facts are excluded by default.
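
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion with k=60 over two ranked lists. The function name `rrf_fuse` and the memory ids are hypothetical and only illustrate the scoring rule; this is not RememberOS's actual query code, which runs inside Postgres.

```python
from collections import defaultdict

RRF_K = 60  # the constant named in the text above


def rrf_fuse(rankings, k=RRF_K):
    """Fuse several ranked lists of ids (best first) into one ranked list.

    Each id scores 1 / (k + rank) per list it appears in; scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ids, ordered by vector similarity and by full-text rank.
vector_hits = ["m3", "m1", "m7"]
fulltext_hits = ["m1", "m9", "m3"]

fused = rrf_fuse([vector_hits, fulltext_hits])
print(fused)
```

Ids that appear in both lists accumulate two contributions, so `m1` and `m3` rank ahead of ids found by only one method.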

What runs where

Step	Where	Notes
Embeddings	on-box (ONNX) or OpenAI	configurable; on-box is the privacy default
Fact extraction	background worker	`remember` returns 202 instantly; the worker calls the LLM (platform key, your BYOK key, or a local model)
File processing	background worker	big drops never block a request
Search	Postgres + pgvector	one database — relational, vector, and full-text together
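
The accept-then-process pattern in the table can be sketched with a queue and a worker thread. Everything here is an assumption for illustration: the function names, the in-process queue, and the placeholder `extract_facts` (a real worker would call an LLM) are hypothetical, not RememberOS internals.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for the real job queue (assumption)
extracted = []         # stand-in for the facts the worker would persist


def extract_facts(text):
    # Placeholder for the LLM call: split prose into sentence-like pieces.
    return [part.strip() for part in text.split(".") if part.strip()]


def worker():
    # Runs in the background so requests never wait on extraction.
    while True:
        text = jobs.get()
        try:
            extracted.extend(extract_facts(text))
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def remember(text):
    """Enqueue the text and return immediately with HTTP 202 (Accepted)."""
    jobs.put(text)
    return 202


status = remember("Alice works at Globex. Globex is in Berlin.")
jobs.join()  # only for the demo: wait until the worker has finished
print(status, extracted)
```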

Speed: the result cache

Exact-repeat searches are served from a Redis cache keyed by a per-collection version. A write to the collection invalidates its cached results instantly, so the cache is never stale by more than one write. A repeated query costs ~1 ms instead of an embedding call plus a vector scan.
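
One way to picture version-keyed invalidation is the sketch below. It uses a plain dictionary as a stand-in for Redis, and the class name, key layout, and hashing choice are all assumptions made for illustration rather than the actual RememberOS scheme.

```python
import hashlib


class VersionedCache:
    """Result cache whose keys embed a per-collection version number."""

    def __init__(self):
        self._store = {}     # stand-in for Redis key/value storage
        self._versions = {}  # collection name -> current version

    def _key(self, collection, query):
        version = self._versions.get(collection, 0)
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        return f"search:{collection}:v{version}:{digest}"

    def get(self, collection, query):
        return self._store.get(self._key(collection, query))

    def put(self, collection, query, results):
        self._store[self._key(collection, query)] = results

    def bump(self, collection):
        # Called on every write: old keys are never looked up again.
        self._versions[collection] = self._versions.get(collection, 0) + 1


cache = VersionedCache()
cache.put("notes", "where does alice work", ["m1"])
hit = cache.get("notes", "where does alice work")   # served from cache
cache.bump("notes")                                 # a write to the collection
miss = cache.get("notes", "where does alice work")  # version changed, so no hit
```

Because the version is part of the key, a write does not need to find and delete cached entries; they simply become unreachable. In a real Redis deployment the orphaned keys would presumably age out through a TTL or eviction policy.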

Design choice: no graph database, no separate vector store. The "graph" is a relational edges table; pgvector handles similarity. One boring, operable Postgres — that's a feature.
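
As a rough illustration of a graph kept in a relational edges table, the sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres. The table layout, column names, and the `superseded` flag are hypothetical; the real schema is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Postgres (assumption)
conn.execute(
    """
    CREATE TABLE edges (
        id         INTEGER PRIMARY KEY,
        src        TEXT NOT NULL,
        relation   TEXT NOT NULL,
        dst        TEXT NOT NULL,
        superseded INTEGER NOT NULL DEFAULT 0  -- 1 = replaced by a newer fact
    )
    """
)
conn.executemany(
    "INSERT INTO edges (src, relation, dst, superseded) VALUES (?, ?, ?, ?)",
    [
        ("alice", "works_at", "acme", 1),     # old fact, superseded
        ("alice", "works_at", "globex", 0),   # current fact
        ("globex", "located_in", "berlin", 0),
    ],
)


def neighbors(node):
    """One hop: current (non-superseded) edges leaving a node."""
    rows = conn.execute(
        "SELECT relation, dst FROM edges "
        "WHERE src = ? AND superseded = 0 ORDER BY id",
        (node,),
    )
    return rows.fetchall()


def two_hop(node):
    """Two hops: a self-join on the edges table, current facts only."""
    rows = conn.execute(
        """
        SELECT a.dst, b.relation, b.dst
        FROM edges AS a
        JOIN edges AS b ON b.src = a.dst
        WHERE a.src = ? AND a.superseded = 0 AND b.superseded = 0
        ORDER BY b.id
        """,
        (node,),
    )
    return rows.fetchall()


print(neighbors("alice"))
print(two_hop("alice"))
```

Graph traversal becomes ordinary joins, which is why a single relational database can serve this workload without a dedicated graph engine.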
