Content Types
Drop anything; RememberOS extracts the text, makes it searchable, and keeps the original downloadable.
| Type | What happens |
|---|---|
| Plain text / Markdown / CSV | indexed directly; long documents are chunked into multiple searchable memories |
| Audio (mp3, wav, m4a, ogg…) | transcribed with Whisper; the transcript becomes searchable memory, the file stays playable in the Vault |
| Video (mp4, mov, webm…) | audio track transcribed; original playable |
| PDF, docx, pptx, xlsx | text extracted per format, chunked, embedded |
| Images (png, jpg, webp…) | captioned with a vision model so they're semantically searchable; rendered inline in the Vault |
| JSON / structured rows | via the dlt destination or the API: one row → one memory, text column embedded, the rest queryable metadata; the Vault pretty-prints JSON bodies |
| Anything else | stored and downloadable, indexed by filename |
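The table above amounts to a dispatch on file type. As a rough illustration only, and not RememberOS's actual implementation, the routing could be sketched like this. The `PIPELINES` map and the `classify` function are hypothetical names, and the `.txt`, `.md`, `.csv`, and `.json` spellings are assumptions; the remaining extensions are the ones the table names.

```python
from pathlib import PurePath

# Hypothetical extension-to-pipeline map. Only extensions named in the table
# are listed; .txt, .md, .csv and .json are assumed spellings for the
# text and JSON rows. The table's "…" entries are deliberately not filled in.
PIPELINES = {
    "text": {".txt", ".md", ".csv"},                  # indexed directly, chunked
    "audio": {".mp3", ".wav", ".m4a", ".ogg"},        # transcribed
    "video": {".mp4", ".mov", ".webm"},               # audio track transcribed
    "document": {".pdf", ".docx", ".pptx", ".xlsx"},  # text extracted per format
    "image": {".png", ".jpg", ".webp"},               # captioned
    "json": {".json"},                                # pretty-printed in the Vault
}


def classify(filename: str) -> str:
    """Return the pipeline name for a filename, or "other" if unrecognised."""
    ext = PurePath(filename).suffix.lower()
    for pipeline, extensions in PIPELINES.items():
        if ext in extensions:
            return pipeline
    # "Anything else": stored and downloadable, indexed by filename only.
    return "other"
```

In this sketch, a file that matches nothing falls through to `"other"`, mirroring the last table row: nothing is rejected on type alone.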
Limits & behaviour
- 10 MB per file (oversize files are reported, not silently dropped).
- Originals live in object storage (yours, with BYO storage) and are served via short-lived presigned URLs.
- Big batches upload per file, directly to storage (presigned PUT), so one unreadable file fails only itself and is named in the result.
- Audio/video transcription and image captioning use the platform OpenAI key by default. These are the only content types whose bytes touch a third party; plain text and embeddings stay on-box.
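The size limit and the per-file failure behaviour can be mirrored client-side before uploading. The sketch below is a local pre-check only and does not call any RememberOS API. `FileResult`, `precheck`, and `MAX_BYTES` are hypothetical names, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
import os
from dataclasses import dataclass

# Assumption: "10 MB" is read as 10 MiB; the service's exact byte limit may differ.
MAX_BYTES = 10 * 1024 * 1024


@dataclass
class FileResult:
    path: str
    ok: bool
    reason: str = ""


def precheck(paths, max_bytes=MAX_BYTES):
    """Check each file independently so one bad file never hides the others."""
    results = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError as exc:
            # Missing or inaccessible file: report it by name and keep going.
            results.append(FileResult(path, False, f"unreadable: {exc.strerror}"))
            continue
        if size > max_bytes:
            # Oversize files are reported, not silently dropped.
            results.append(FileResult(path, False, f"oversize: {size} bytes"))
        else:
            results.append(FileResult(path, True))
    return results
```

Every input path gets exactly one entry in the result, so a caller can upload the files marked `ok` and surface the rest by name.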