Performance

A memory query does three things: embed the query, check the cache, and run hybrid search. RememberOS keeps all three fast and — importantly — keeps embeddings on-box, so a search never makes a network round-trip to an embedding API.

The query path#

Measured latency#

Per-stage p50/p95 measured live on the production box (a single 4 GB Hetzner instance — deliberately modest hardware), from GET /v1/memory/admin/metrics:

Stagep50p95What it is
Embed (query)~10 ms~17 mson-box ONNX MiniLM — no network
Cache get~0.2 ms~2 msRedis lookup
Search~2.4 ms~32 mspgvector + full-text, RRF-fused
These are real numbers from a live run, not a benchmark rig — your mileage varies with collection size, hardware, and whether you switch embeddings to OpenAI (which adds a network round trip but frees the box). The live values for your own deployment are always on the admin analytics page and the stages field of /admin/metrics.

Notes#