Name: Epstein Files Explorer Dataset
Creator: Nobler Works
License: https://www.usa.gov/government-copyright

This is a free tool for browsing publicly available records. It does not make accusations or draw conclusions about any individual. Appearance in these records does not imply guilt or wrongdoing. Many individuals appear simply as witnesses, attorneys, or incidental contacts.

The Full Picture

The DOJ identified approximately 6 million pages of potentially responsive records in the Epstein Task Force Archive. Of those:

~2.77 million pages text-extracted

OCR and text extraction performed by rhowardstone (full text corpus). This is what we index.

~730K pages are non-text content

Images, videos, and other media files released by the DOJ but not extractable as structured text.

~2.5 million pages unreleased

Still held by the DOJ with no announced release date. Nobody outside the DOJ has access to these documents.

2.77M text-extracted, 730K non-text, ~2.5M unreleased
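As a quick sanity check, the breakdown above can be expressed as simple arithmetic. This is a minimal sketch using only the rounded figures quoted on this page; the variable and function names are illustrative and not part of the app.

```python
# Approximate page counts as stated above (all figures are rounded estimates).
TOTAL_PAGES = 6_000_000
breakdown = {
    "text-extracted": 2_770_000,
    "non-text": 730_000,
    "unreleased": 2_500_000,
}


def shares(counts, total):
    """Return each category's share of the total as a percentage (1 decimal)."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Released pages are the text-extracted and non-text categories combined.
released = breakdown["text-extracted"] + breakdown["non-text"]
pct = shares(breakdown, TOTAL_PAGES)

print(f"released: {released:,} pages")
for name, value in pct.items():
    print(f"{name}: {value}%")
```

The three categories add up to the ~6 million total, and the released portion matches the ~3.5 million pages cited further down the page.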

What We Have

Live data currently searchable and browsable in this app.

People & Entities

Live

Named individuals, organizations, locations, properties, aircraft, and shell companies. Includes bios, categories, aliases, role descriptions, DOJ document mention counts, and relationship connections.

knowledge_graph.db, persons_registry.json, Epstein Exposed API, epsteininvestigation.org, LMSBAND NER (epstein_lite.db), DOJ communications email participants

Documents

Live

Metadata stubs for DOJ-released documents across Datasets 1-12. Each record includes EFTA number, dataset, page count, document type, associated persons, and a direct link to the DOJ PDF. Full text is available locally but not yet served in-app.

full_text_corpus.db, efta_dataset_mapping.json
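The document record described above could be modeled roughly as follows. This is a sketch only: the class name, field names, and sample values are assumptions made for illustration, and the actual schema used by the app or by the source files may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentStub:
    """Hypothetical shape of one document metadata record (names are assumed)."""
    efta_number: str      # EFTA identifier for the document
    dataset: int          # DOJ dataset number, 1 through 12
    page_count: int
    document_type: str
    pdf_url: str          # direct link to the DOJ-hosted PDF
    associated_persons: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Basic validation against the ranges described in the text.
        if not 1 <= self.dataset <= 12:
            raise ValueError(f"dataset must be 1-12, got {self.dataset}")
        if self.page_count < 1:
            raise ValueError("page_count must be at least 1")


# Placeholder values only; this is not a real record or a real URL.
stub = DocumentStub(
    efta_number="EFTA-PLACEHOLDER",
    dataset=3,
    page_count=2,
    document_type="email",
    pdf_url="https://example.invalid/placeholder.pdf",
    associated_persons=["Person A"],
)
print(stub)
```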

Emails

Live

Email threads and individual messages from two sources: House Oversight Committee releases (parsed by notesbymuneeb) and DOJ communications database (41,924 emails with body text joined from full text corpus).

epstein_email_threads.parquet (HuggingFace), communications.db (rhowardstone)
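Joining email metadata to body text from a separate text corpus might look something like the sketch below. The table and column names (`emails`, `pages`, `efta_number`, `text_content`) are hypothetical and the rows are placeholders; the real schemas of communications.db and full_text_corpus.db are not described on this page and may be quite different.

```python
import sqlite3

# In-memory database with assumed, simplified tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (email_id INTEGER PRIMARY KEY, efta_number TEXT, subject TEXT);
CREATE TABLE pages (efta_number TEXT, page_number INTEGER, text_content TEXT);
""")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?)",
    [(1, "DOC-A", "Sample subject"), (2, "DOC-B", "No text yet")],
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [("DOC-A", 2, "second page"), ("DOC-A", 1, "first page")],
)


def emails_with_body(db):
    """Attach page text to each email, concatenated in page order."""
    rows = db.execute("""
        SELECT e.email_id, e.subject, p.text_content
        FROM emails AS e
        LEFT JOIN pages AS p ON p.efta_number = e.efta_number
        ORDER BY e.email_id, p.page_number
    """)
    grouped = {}
    for email_id, subject, text in rows:
        entry = grouped.setdefault(email_id, {"subject": subject, "parts": []})
        if text is not None:  # LEFT JOIN yields NULL when no page text exists
            entry["parts"].append(text)
    return {
        eid: {"subject": e["subject"], "body": "\n".join(e["parts"])}
        for eid, e in grouped.items()
    }


joined = emails_with_body(conn)
print(joined)
```

A `LEFT JOIN` keeps emails that have no matching text, so they show up with an empty body instead of being dropped.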

Flights

Live

Flight records for Epstein-associated aircraft including the Boeing 727 ('Lolita Express') and Gulfstream jets. Includes routes, dates, tail numbers, pilots, and passenger manifests.

Epstein Exposed API, epstein_email_threads.parquet (flight logs)

Black Book

Live

Digitized version of Epstein's personal address book. Includes names, phone numbers (general/work/home/mobile), addresses, companies, email addresses, and page references to the original document.

epsteinsblackbook.com

Network Graph

Live

Interactive force-directed graph built from typed relationships (traveled_with, employed_by, victim_of, paid_by, etc.) and NER co-occurrence edges (shared DOJ document appearances). Edges come from multiple sources and are deduplicated.

knowledge_graph.db, epsteininvestigation.org, LMSBAND co-occurrence edges
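One way to combine typed relationships and co-occurrence edges from several sources into a deduplicated edge set is sketched below. This is not the app's actual implementation: the function name, tuple layouts, and source labels are assumptions, and the sample nodes are placeholders. For simplicity the sketch treats every edge as undirected, which discards the direction of relationships such as employed_by.

```python
from collections import defaultdict


def merge_edges(typed_edges, cooccurrence_edges):
    """Merge edges from multiple sources into one record per node pair.

    typed_edges: iterable of (node_a, node_b, relationship_type, source)
    cooccurrence_edges: iterable of (node_a, node_b, shared_doc_count, source)
    """
    merged = defaultdict(
        lambda: {"types": set(), "sources": set(), "shared_docs": 0}
    )
    for a, b, rel_type, source in typed_edges:
        key = tuple(sorted((a, b)))  # same key regardless of node order
        merged[key]["types"].add(rel_type)
        merged[key]["sources"].add(source)
    for a, b, shared_docs, source in cooccurrence_edges:
        key = tuple(sorted((a, b)))
        merged[key]["sources"].add(source)
        # Keep the largest reported count rather than summing across sources.
        merged[key]["shared_docs"] = max(merged[key]["shared_docs"], shared_docs)
    return dict(merged)


# Placeholder nodes and source labels, for illustration only.
typed = [
    ("Person A", "Person B", "traveled_with", "source_1"),
    ("Person B", "Person A", "traveled_with", "source_2"),
    ("Person A", "Org C", "employed_by", "source_1"),
]
cooccurrence = [("Person B", "Person A", 3, "ner")]

edges = merge_edges(typed, cooccurrence)
for pair, info in sorted(edges.items()):
    print(pair, sorted(info["types"]), sorted(info["sources"]), info["shared_docs"])
```

Sorting each node pair into a canonical key is what collapses duplicates reported by different sources into a single edge, while the `sources` set preserves where each edge came from.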

What We Don't Have (and Why)

Known data gaps, limitations, and features that are planned but not yet available.

Unreleased DOJ Pages (~2.5M pages)

Not publicly available

~2.5 million pages

The DOJ identified approximately 6 million pages of potentially responsive records. Only ~3.5 million pages have been released across 12 datasets. The remaining ~2.5 million pages have not been disclosed and no release timeline has been announced.

Image & Video Files (~730K files)

Available externally

~730,000 files

DOJ Datasets 1-12 contain approximately 730,000 image and video files (JPG, PNG, MP4, etc.) that are non-textual content. These are available as raw files from the DOJ but cannot be meaningfully indexed as structured data. The Native Files Catalog lists 3,071 videos, 137 audio files, and 15 spreadsheets.

DOJ Epstein Document Portal

House Oversight Committee Documents (~53K pages)

Available externally

~53,000 pages

The House Oversight Committee released documents related to DOJ/BOP handling of Epstein's custody. These exist as raw PDFs but have not been OCR'd or structured into a searchable database by any known project.

House Oversight Epstein Investigation

DOJ Raw PDFs (218 GB)

Available externally

218 GB

The original DOJ release consists of 218 GB of raw PDF files across 12 dataset ZIP archives. We index the extracted text and metadata but do not host the raw PDFs. Users can download them directly from the DOJ.

DOJ Dataset Downloads (justice.gov), Google Pinpoint (searchable scans)

Full-Text Search (In-App)

Coming soon

6.3 GB / 1.38M documents

We have the full text corpus downloaded locally (1,385,879 documents, 2,770,167 pages, 3.3 GB of extracted text) but have not yet built an in-app search interface for it. The data is ready - the feature is coming.

AI Chat with RAG

Available externally

Full corpus

AI-powered chat built on Claude via Amazon Bedrock, with Bedrock Knowledge Bases handling retrieval (RAG). Searches across 2.5M+ court documents and 1.78M emails and returns answers with source citations. Available now at /chat.

Try AI Chat

Audio/Video Transcripts (In-App)

Coming soon

1,628 transcriptions

We have 1,628 audio and video transcriptions from the DOJ releases (190,000+ words) downloaded locally but not yet served through the app. The structured data exists and will be added.

Not publicly available - nobody has it
Coming soon - data exists, feature pending
Available elsewhere - links provided

External Resources

Other places to explore Epstein-related documents online. These sites may contain data we don't currently support because it isn't structured, or because it covers different document sets.

DOJ Epstein Document Portal

Official U.S. Department of Justice page with all 12 dataset releases. Direct download of raw PDFs (218 GB total). The primary source for all released documents.

Google Pinpoint - Epstein Documents

Google's document analysis tool with OCR'd, searchable versions of the DOJ releases. Supports full-text search, entity extraction, and AI-assisted analysis of scanned documents.

jmail.world

Web interface for browsing the Epstein email corpus. Provides a searchable email client-style view of communications extracted from DOJ releases.

Epstein Exposed

Comprehensive database with structured person profiles, flight records, and relationship data. Provides API access to 1,500+ categorized individuals and 1,700+ flight records.

Epstein Investigation

Research database with entity profiles, relationship mapping, and typed connections between individuals, organizations, and locations.

Epstein's Black Book

Searchable, digitized version of Epstein's personal contact book with names, phone numbers, addresses, and company affiliations.

Sifter Labs - Epstein Document Viewer

AI-powered document analysis platform with entity recognition and document classification applied to the Epstein DOJ releases.

House Oversight Committee - Epstein

Documents released by the House Oversight Committee related to the DOJ and Bureau of Prisons handling of Jeffrey Epstein. Separate from the DOJ EFTA releases.

Internet Archive - Flight Logs

Archived copies of Epstein flight manifests, including the original handwritten logs entered into evidence during legal proceedings.

rhowardstone/Epstein-research-data

Open-source GitHub repository with structured SQLite databases: knowledge graph (524 entities, 2,096 relationships), full text corpus (1.38M docs), communications (41K emails), transcripts (1.6K), and OCR results (39K). Our primary structured data source.

notesbymuneeb/epstein-emails (HuggingFace)

HuggingFace dataset containing 5,082 parsed email threads with 16,447 individual messages extracted from House Oversight Committee email releases.

Data Source Attribution

Every data source that feeds into this explorer.

Source	Feeds Into	Records	License
rhowardstone - knowledge_graph.db	People, Network Graph	524 entities, 2,096 relationships	Open source (GitHub)
rhowardstone - persons_registry.json	People	1,614 person records	Open source (GitHub)
rhowardstone - full_text_corpus.db	Documents, Emails (body text)	1.38M documents / 2.77M pages	Open source (GitHub)
rhowardstone - communications.db	Emails	41,924 parsed emails	Open source (GitHub)
rhowardstone - epstein_lite.db (LMSBAND NER)	People (NER names, co-occurrence edges)	88K entities, 110K co-occurrences	Open source (GitHub)
notesbymuneeb/epstein-emails	Emails	5,082 threads / 16,447 messages	Open dataset (HuggingFace)
DOJ EFTA - Datasets 1-12	Documents (metadata + PDF links)	~2.7M pages across 12 releases	Public domain (U.S. gov)
Epstein Exposed API	People (bios, categories), Flights	1,516 persons, 1,708 flights	Public API
epsteininvestigation.org	People (roles, relationships)	137 entities, 330 relationships	Public dataset
epsteinsblackbook.com	Black Book	2,327 contacts	Public court records

About This Project

Epstein Files Explorer is a free project built to make publicly released records more accessible and searchable. We aggregate, deduplicate, and structure data from multiple community research projects and government sources into a single browsable interface.

Built by Nobler Works