This is a free tool for browsing publicly available records. It does not make accusations or draw conclusions about any individual. Appearance in these records does not imply guilt or wrongdoing. Many individuals appear simply as witnesses, attorneys, or incidental contacts.
The Full Picture
The DOJ identified approximately 6 million pages of potentially responsive records in the Epstein Task Force Archive. Of those:
~2.77 million pages text-extracted
OCR and text extraction performed by rhowardstone (full text corpus). This is what we index.
~730K pages are non-text content
Images, videos, and other media files released by the DOJ but not extractable as structured text.
~2.5 million pages unreleased
Still held by the DOJ with no announced release date. Nobody has access to these documents.
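As a quick sanity check, the rounded figures above can be added up to confirm they account for the ~6 million pages the DOJ identified. This is only an arithmetic sketch using the approximate numbers quoted on this page; the variable names are illustrative, not part of any dataset.

```python
# Approximate page counts quoted above (rounded, not exact totals).
TEXT_EXTRACTED = 2_770_000   # pages with extracted text (what the app indexes)
NON_TEXT = 730_000           # images, videos, and other media
UNRELEASED = 2_500_000       # still held by the DOJ

released = TEXT_EXTRACTED + NON_TEXT
total = released + UNRELEASED


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


breakdown = {
    "text_extracted": share(TEXT_EXTRACTED, total),
    "non_text": share(NON_TEXT, total),
    "unreleased": share(UNRELEASED, total),
}

print(f"released: {released:,} of {total:,} pages")
for name, pct in breakdown.items():
    print(f"{name}: {pct}%")
```

On these rounded figures, slightly under half of the identified pages are currently text-searchable.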
What We Have
Live data currently searchable and browsable in this app.
People & Entities
[Live] Named individuals, organizations, locations, properties, aircraft, and shell companies. Includes bios, categories, aliases, role descriptions, DOJ document mention counts, and relationship connections.
Documents
[Live] Metadata stubs for DOJ-released documents across Datasets 1-12. Each record includes EFTA number, dataset, page count, document type, associated persons, and a direct link to the DOJ PDF. Full text is available locally but not yet served in-app.
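The fields listed above could be modeled roughly as follows. This is a minimal sketch: the class name, field names, and the example values (including the EFTA identifier and URL) are placeholders chosen for illustration, not the app's actual schema or real records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentStub:
    """Hypothetical shape of one document metadata record."""
    efta_number: str                 # identifier assigned in the DOJ release
    dataset: int                     # DOJ dataset number, 1 through 12
    page_count: int
    document_type: str
    associated_persons: tuple[str, ...] = ()
    pdf_url: str = ""                # direct link to the DOJ-hosted PDF

    def __post_init__(self) -> None:
        # The page describes Datasets 1-12, so reject anything outside that range.
        if not 1 <= self.dataset <= 12:
            raise ValueError(f"dataset must be 1-12, got {self.dataset}")
        if self.page_count < 1:
            raise ValueError("page_count must be at least 1")


# Placeholder values only; none of these refer to a real document.
stub = DocumentStub(
    efta_number="EFTA-EXAMPLE-0001",
    dataset=3,
    page_count=12,
    document_type="letter",
    associated_persons=("Example Person",),
    pdf_url="https://example.invalid/placeholder.pdf",
)
```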
Emails
[Live] Email threads and individual messages from two sources: House Oversight Committee releases (parsed by notesbymuneeb) and the DOJ communications database (41,924 emails with body text joined from the full text corpus).
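Joining email metadata to body text from a separate corpus might look something like the sketch below. The table names, column names, and sample rows are assumptions made for illustration; the real communications and full-text databases may be laid out differently.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (email_id INTEGER PRIMARY KEY, doc_id TEXT, subject TEXT);
CREATE TABLE pages  (doc_id TEXT, page_no INTEGER, text TEXT);
INSERT INTO emails VALUES (1, 'DOC-A', 'Example subject');
INSERT INTO emails VALUES (2, 'DOC-B', 'No text available');
INSERT INTO pages  VALUES ('DOC-A', 2, 'second page');
INSERT INTO pages  VALUES ('DOC-A', 1, 'first page');
""")


def email_bodies(conn: sqlite3.Connection) -> dict[int, str]:
    """Return {email_id: body}, with pages concatenated in page order."""
    rows = conn.execute("""
        SELECT e.email_id, p.text
        FROM emails AS e
        JOIN pages AS p ON p.doc_id = e.doc_id
        ORDER BY e.email_id, p.page_no
    """)
    parts: dict[int, list[str]] = {}
    for email_id, text in rows:
        parts.setdefault(email_id, []).append(text)
    return {email_id: "\n".join(texts) for email_id, texts in parts.items()}


bodies = email_bodies(conn)
```

Because this uses an inner join, emails whose document has no extracted text (email 2 here) are simply absent from the result; a `LEFT JOIN` would keep them with an empty body.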
Flights
[Live] Flight records for Epstein-associated aircraft including the Boeing 727 ('Lolita Express') and Gulfstream jets. Includes routes, dates, tail numbers, pilots, and passenger manifests.
Black Book
[Live] Digitized version of Epstein's personal address book. Includes names, phone numbers (general/work/home/mobile), addresses, companies, email addresses, and page references to the original document.
Network Graph
[Live] Interactive force-directed graph built from typed relationships (traveled_with, employed_by, victim_of, paid_by, etc.) and NER co-occurrence edges (shared DOJ document appearances). Edges come from multiple sources and are deduplicated.
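One plausible way to deduplicate edges that arrive from several sources is sketched below. The dictionary keys (`source`, `target`, `type`, `origin`), the `co_occurrence` type label, and the choice to treat typed relationships as directed are all assumptions for illustration; this is not necessarily how the app merges its graph.

```python
def merge_edges(edges: list[dict]) -> list[dict]:
    """Collapse duplicate edges, remembering which origins reported each one."""
    merged: dict[tuple[str, str, str], dict] = {}
    for edge in edges:
        a, b = edge["source"], edge["target"]
        if edge["type"] == "co_occurrence":
            # Co-occurrence has no direction, so (A, B) and (B, A) are the same edge.
            a, b = sorted((a, b))
        key = (a, b, edge["type"])
        entry = merged.setdefault(
            key, {"source": a, "target": b, "type": edge["type"], "origins": set()}
        )
        entry["origins"].add(edge["origin"])
    return list(merged.values())


# Made-up sample edges between placeholder nodes "A" and "B".
sample = [
    {"source": "B", "target": "A", "type": "co_occurrence", "origin": "ner"},
    {"source": "A", "target": "B", "type": "co_occurrence", "origin": "ner_second_pass"},
    {"source": "A", "target": "B", "type": "traveled_with", "origin": "knowledge_graph"},
    {"source": "A", "target": "B", "type": "traveled_with", "origin": "api"},
    {"source": "B", "target": "A", "type": "employed_by", "origin": "knowledge_graph"},
]
merged = merge_edges(sample)
```

Keeping the set of origins on each merged edge preserves provenance, so a reader can still see which source asserted a given relationship.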
What We Don't Have (and Why)
Known data gaps, limitations, and features that are planned but not yet available.
Unreleased DOJ Pages (~2.5M pages)
[Not publicly available] ~2.5 million pages
The DOJ identified approximately 6 million pages of potentially responsive records. Only ~3.5 million pages have been released across 12 datasets. The remaining ~2.5 million pages have not been disclosed and no release timeline has been announced.
Image & Video Files (~730K files)
[Available externally] ~730,000 files
DOJ Datasets 1-12 contain approximately 730,000 image and video files (JPG, PNG, MP4, etc.) with no extractable text. These are available as raw files from the DOJ but cannot be meaningfully indexed as structured data. The Native Files Catalog lists 3,071 videos, 137 audio files, and 15 spreadsheets.
House Oversight Committee Documents (~53K pages)
[Available externally] ~53,000 pages
The House Oversight Committee released documents related to DOJ/BOP handling of Epstein's custody. These exist as raw PDFs but have not been OCR'd or structured into a searchable database by any known project.
DOJ Raw PDFs (218 GB)
[Available externally] 218 GB
The original DOJ release consists of 218 GB of raw PDF files across 12 dataset ZIP archives. We index the extracted text and metadata but do not host the raw PDFs. Users can download them directly from the DOJ.
Full-Text Search (In-App)
[Coming soon] 6.3 GB / 1.38M documents
We have downloaded the full text corpus locally (1,385,879 documents, 2,770,167 pages, 3.3 GB of extracted text) but have not yet built an in-app search interface for it. The data is ready; the feature is coming.
AI Chat with RAG
[Available externally] Full corpus
AI-powered chat using Bedrock Claude + Knowledge Bases. Searches across 2.5M+ court documents and 1.78M emails and returns answers with source citations. Available now at /chat.
Audio/Video Transcripts (In-App)
[Coming soon] 1,628 transcriptions
We have downloaded 1,628 audio and video transcriptions from the DOJ releases (190,000+ words) locally, but they are not yet served through the app. The structured data exists and will be added.
External Resources
Other places to explore Epstein-related documents online. These sites may contain data we don't currently support because it isn't structured, or because it covers different document sets.
DOJ Epstein Document Portal
Official U.S. Department of Justice page with all 12 dataset releases. Direct download of raw PDFs (218 GB total). The primary source for all released documents.
Google Pinpoint - Epstein Documents
Google's document analysis tool with OCR'd, searchable versions of the DOJ releases. Supports full-text search, entity extraction, and AI-assisted analysis of scanned documents.
jmail.world
Web interface for browsing the Epstein email corpus. Provides a searchable email client-style view of communications extracted from DOJ releases.
Epstein Exposed
Comprehensive database with structured person profiles, flight records, and relationship data. Provides API access to 1,500+ categorized individuals and 1,700+ flight records.
Epstein Investigation
Research database with entity profiles, relationship mapping, and typed connections between individuals, organizations, and locations.
Epstein's Black Book
Searchable, digitized version of Epstein's personal contact book with names, phone numbers, addresses, and company affiliations.
Sifter Labs - Epstein Document Viewer
AI-powered document analysis platform with entity recognition and document classification applied to the Epstein DOJ releases.
House Oversight Committee - Epstein
Documents released by the House Oversight Committee related to the DOJ and Bureau of Prisons handling of Jeffrey Epstein. Separate from the DOJ EFTA releases.
Internet Archive - Flight Logs
Archived copies of Epstein flight manifests, including the original handwritten logs entered into evidence during legal proceedings.
rhowardstone/Epstein-research-data
Open-source GitHub repository with structured SQLite databases: knowledge graph (524 entities, 2,096 relationships), full text corpus (1.38M docs), communications (41K emails), transcripts (1.6K), and OCR results (39K). Our primary structured data source.
notesbymuneeb/epstein-emails (HuggingFace)
HuggingFace dataset containing 5,082 parsed email threads with 16,447 individual messages extracted from House Oversight Committee email releases.
Data Source Attribution
Every data source that feeds into this explorer.
| Source | Feeds Into | Records | License |
|---|---|---|---|
| rhowardstone - knowledge_graph.db | People, Network Graph | 524 entities, 2,096 relationships | Open source (GitHub) |
| rhowardstone - persons_registry.json | People | 1,614 person records | Open source (GitHub) |
| rhowardstone - full_text_corpus.db | Documents, Emails (body text) | 1.38M documents / 2.77M pages | Open source (GitHub) |
| rhowardstone - communications.db | Emails | 41,924 parsed emails | Open source (GitHub) |
| rhowardstone - epstein_lite.db (LMSBAND NER) | People (NER names, co-occurrence edges) | 88K entities, 110K co-occurrences | Open source (GitHub) |
| notesbymuneeb/epstein-emails | Emails | 5,082 threads / 16,447 messages | Open dataset (HuggingFace) |
| DOJ EFTA - Datasets 1-12 | Documents (metadata + PDF links) | ~2.7M pages across 12 releases | Public domain (U.S. gov) |
| Epstein Exposed API | People (bios, categories), Flights | 1,516 persons, 1,708 flights | Public API |
| epsteininvestigation.org | People (roles, relationships) | 137 entities, 330 relationships | Public dataset |
| epsteinsblackbook.com | Black Book | 2,327 contacts | Public court records |
About This Project
Epstein Files Explorer is a free project built to make publicly released records more accessible and searchable. We aggregate, deduplicate, and structure data from multiple community research projects and government sources into a single browsable interface.