How it works · End-to-end pipeline

From export to evidence report in four steps.

TextTimeline is a retrieval-first system. Indexing happens once, up front, in a few minutes. Searching is instant. The deliverable is a $99 PDF + CSV that an attorney can hand to a judge.

  1. Export your text messages

    Use any export tool you trust — iMazing or Decipher for iPhone, SMS Backup & Restore for Android, OurFamilyWizard or TalkingParents for co-parenting platforms, AT&T or Verizon portals for carrier exports. We accept XML, CSV, and PDF. Don't worry about cleaning the file — the parser handles MMS attachments, group threads, and weird carrier metadata.

  2. Upload or paste a cloud link

    Direct upload up to 500 MB, or paste a Google Drive / Dropbox / iCloud share link for larger exports. The raw file is fetched, processed, and deleted. We never persist the original file. What we keep is a derived, encrypted index, and that index is auto-deleted on a 90-day rolling window unless you opt into Matter Storage.

  3. Search like a litigator

    Hybrid retrieval runs two systems in parallel: a semantic retriever over Vertex AI text-embedding-005 vectors, and a lexical BM25 retriever over enriched tokens. Their results are fused with reciprocal rank fusion. You can search in plain language ("threats about taking the kids," "hidden bank accounts after the filing," "post-no-contact messages"), or pick a pre-built category (custody, financial, threat language, alienation, coercive control, substance abuse). Every match shows the message in context — five before, five after — so you can read the conversation, not just the hit.

  4. Build the evidence report

    Select up to 50 messages that prove your theory of the case. Generate a PDF + CSV report for $99. Every finding cites the source message ID, sender, recipient, and timestamp. The executive summary is LLM-synthesized but grounded: every claim ties to a specific cited message. Chain-of-custody fields (source filename, file checksum, processing log) are included so counsel has what is needed to authenticate the report under FRE 901 without tracking down provenance details separately.
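To make the chain-of-custody fields in step 4 concrete, here is a minimal sketch rather than the production code. It assumes the file checksum is a SHA-256 digest (the hash algorithm is not stated above), and the function and field names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object in chunks so large exports never load fully into memory."""
    digest = hashlib.sha256()  # assumption: SHA-256; the real checksum algorithm is not documented here
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def custody_record(filename: str, stream: BinaryIO, processing_log: List[str]) -> Dict[str, object]:
    """Bundle the three chain-of-custody fields named in step 4 (field names are illustrative)."""
    return {
        "source_filename": filename,
        "file_checksum": sha256_of_stream(stream),
        "processing_log": list(processing_log),
    }


# Usage with an in-memory stand-in for an uploaded export.
export = io.BytesIO(b"<smses count='0'></smses>")
record = custody_record("sms-backup.xml", export, ["parsed 0 messages"])
```

Because the checksum is computed over the raw bytes before parsing, anyone holding the original export can recompute it and compare it with the value printed in the report.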

What's actually under the hood

TextTimeline is built on a deliberately small stack. The backend is FastAPI on Cloud Run with FAISS and BM25 in-process (no external vector database — that's a cost and complexity tax for a 30k-message corpus). Embeddings are Vertex AI text-embedding-005. The query agent and synthesis pass run on Gemini 2.5 Flash, used only for query expansion (turning "threats about taking the kids" into exemplar messages) and for grounded synthesis on the messages you've already selected — never for full-corpus classification. State lives in Firestore; derived artifacts live encrypted in Cloud Storage with TTL auto-delete.
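For a sense of what "BM25 in-process" can look like at this corpus size, the sketch below scores tokenized messages with a plain-Python Okapi BM25. It is an illustration under stated assumptions, not the shipped retriever: the `k1` and `b` values are common defaults, the IDF uses the non-negative variant, and the tokenization is a hypothetical stand-in for the enriched tokens.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs or 1.0  # avoid dividing by zero on empty docs
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average messages are penalized via b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["child", "support", "late"], ["see", "you", "saturday"], ["child", "pickup"]]
scores = bm25_scores(["child", "support"], docs)
```

In this toy corpus the first message matches both query terms and scores highest, the third matches one term, and the second scores zero.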

The result is a system that's instant to use, cheap to operate, and easy to reason about under FRE 901. The architecture is documented in docs/DECISIONS/0004; the retrieval evaluation harness is open in the repo too — every category ships with a measured precision-at-K so we can tell you what "good" looks like before you trust the output.
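Precision-at-K is simple to compute, and a short sketch shows what the number means. The function below is an illustrative definition and is not taken from the evaluation harness; the message IDs and relevance labels are invented for the example.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved message IDs that a reviewer labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for message_id in top_k if message_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalizes a retriever that returns fewer than k results.
    return hits / k


ranked = ["m3", "m9", "m1", "m7", "m2"]   # hypothetical retriever output, best first
relevant = {"m3", "m1", "m4"}             # hypothetical reviewer labels
p_at_5 = precision_at_k(ranked, relevant, k=5)
```

Here two of the top five results are relevant, so `p_at_5` is 0.4.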

Frequently asked questions

What text-export formats do you accept?

Android SMS Backup & Restore XML, iMazing CSV/PDF, OurFamilyWizard PDF, TalkingParents CSV, and most carrier-portal CSV exports. If you have a format we don't recognize, send a sample to evan@texttimeline.com and we'll add a parser.

How long does indexing take?

Most matters of a few thousand messages finish in 1–3 minutes. A multi-year corpus of 30k+ messages takes 5–10 minutes. You don't have to wait at the screen: the search workspace becomes interactive when indexing finishes, and once email notifications are enabled we'll notify you.

What happens to the raw export?

The raw file is fetched, parsed, and deleted. We never persist the original export. The artifact we keep is a derived index: parsed messages, enrichment metadata, semantic embeddings, and a BM25 lexical index. Original message bodies are stored alongside the index because the report has to cite them — but the original file as you uploaded it is gone after processing.

What does 'hybrid search' actually mean?

Two retrievers run in parallel: a semantic retriever (FAISS over Vertex AI text-embedding-005 vectors) and a lexical retriever (BM25 over enriched tokens). Their result lists are fused with reciprocal rank fusion. That gets you both fuzzy/conceptual matches ('threats about taking the kids' even when no message uses that exact phrase) and crisp keyword hits ('child support' as a literal string).
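Reciprocal rank fusion fits in a few lines. The sketch below is a generic version of the technique, not the production code: each message earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the message IDs.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(result_lists: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(id) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, message_id in enumerate(results, start=1):
            scores[message_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda message_id: (-scores[message_id], message_id))


semantic = ["m12", "m07", "m31"]  # hypothetical semantic-retriever ranking
lexical = ["m07", "m44", "m12"]   # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([semantic, lexical])
# fused == ["m07", "m12", "m44", "m31"]
```

Messages that both retrievers rank well (`m07`, `m12`) rise to the top, while a message found by only one retriever still makes the list. Because RRF uses ranks rather than raw scores, the cosine similarities and BM25 scores never need to be put on a common scale.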

Can it find things you don't search for explicitly?

Yes. Each message is enriched at index time with sentiment, profanity, threat-language, financial-term, custody-term, alienation, coercive-control, substance-abuse, and legal-term flags. The category quick-picks run pre-built queries against those flags so you can surface 'all coercive-control messages' or 'all substance-abuse references' without writing a query yourself.
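As a rough picture of index-time flags and category quick-picks, the sketch below tags each message with the categories whose terms it contains, then filters on a flag. The term lists, flag names, and keyword-matching approach are all hypothetical and far simpler than a real enrichment pass.

```python
import re
from typing import Dict, List, Set

# Hypothetical term lists for illustration only; the real enrichment vocabulary is not shown here.
CATEGORY_TERMS: Dict[str, Set[str]] = {
    "custody": {"custody", "visitation", "pickup"},
    "financial": {"account", "money", "transfer"},
    "substance_abuse": {"drunk", "pills", "sober"},
}


def enrich(body: str) -> Set[str]:
    """Return the category flags whose terms appear in the message body."""
    tokens = set(re.findall(r"[a-z']+", body.lower()))
    return {category for category, terms in CATEGORY_TERMS.items() if tokens & terms}


def quick_pick(index: Dict[str, Set[str]], category: str) -> List[str]:
    """List the message IDs carrying a given flag, as a category quick-pick would."""
    return sorted(message_id for message_id, flags in index.items() if category in flags)


messages = {
    "m1": "I moved the money to a new account.",
    "m2": "You were drunk at pickup again.",
    "m3": "See you Saturday.",
}
index = {message_id: enrich(body) for message_id, body in messages.items()}
custody_hits = quick_pick(index, "custody")
```

Since the flags are computed once at index time, a quick-pick is a cheap filter over stored metadata rather than a fresh search.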

How accurate is the synthesis in the report?

The LLM synthesis pass runs only over the messages you've already selected for the report. It never reads the full corpus and never invents content. Every claim in the executive summary is grounded in cited message IDs. If we can't ground a claim, we don't make it.
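One way to enforce "if we can't ground a claim, we don't make it" is a post-synthesis check like the sketch below. It is an assumption about how such a guard could work, not a description of the actual pipeline, and the `Claim` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Claim:
    text: str
    cited_ids: Tuple[str, ...]  # message IDs the claim cites


def split_grounded(claims: Iterable[Claim], selected_ids: Set[str]) -> Tuple[List[Claim], List[Claim]]:
    """Keep claims that cite at least one message and cite only selected messages."""
    grounded: List[Claim] = []
    rejected: List[Claim] = []
    for claim in claims:
        cites_only_selected = bool(claim.cited_ids) and set(claim.cited_ids) <= selected_ids
        (grounded if cites_only_selected else rejected).append(claim)
    return grounded, rejected


selected = {"m1", "m2"}
claims = [
    Claim("Funds were moved after the filing.", ("m1",)),
    Claim("Pickup was missed twice.", ("m9",)),   # cites a message outside the selection
    Claim("The tone escalated over time.", ()),   # cites nothing
]
grounded, rejected = split_grounded(claims, selected)
```

Only the first claim survives. The second cites a message that was never selected and the third cites nothing, so both would be dropped from the summary.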

See it on a real corpus

The live demo runs against a fictitious 80-message custody dispute. It's the same pipeline that would run on your own export — just with a public, fully fabricated dataset.