Epstein Files Jan 30, 2026 Release - Archived from Justice.gov

xodoh74984@lemmy.world · edited 5 hours ago


Kindly_District9380@lemmy.world · edited 13 hours ago

Superb, I have 1-8, 11-12.

Only 10 remains to complete (downloading from Archive.org now).

Dataset 9 is the biggest. I ended up writing a parser to go through every page on justice.gov and make an index list.
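For anyone who wants to build a similar index, here is a minimal sketch of the link-extraction step only, not the parser described above. It assumes each listing page is plain HTML with ordinary `<a href>` links to the files; the `example.org` URL and file names are placeholders, and fetching pages, paging, and rate limiting are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FileLinkParser(HTMLParser):
    """Collect absolute URLs of <a> links whose path ends in a wanted extension."""

    def __init__(self, base_url, extensions=(".pdf",)):
        super().__init__()
        self.base_url = base_url
        self.extensions = tuple(extensions)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(self.extensions):
            self.urls.append(urljoin(self.base_url, href))


def extract_file_urls(html, base_url):
    """Return the file URLs found in one page, in order, without duplicates."""
    parser = FileLinkParser(base_url)
    parser.feed(html)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(parser.urls))


# Placeholder page content, not real markup from any site.
sample = """
<a href="/files/DOC-0001.pdf">one</a>
<a href="/files/DOC-0001.pdf">one again</a>
<a href="/about">about</a>
"""
urls = extract_file_urls(sample, "https://example.org/listing?page=0")
print(urls)
```

Dropping repeats at this stage keeps the index from growing when the same file is listed more than once.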

Current estimate of the file list:

~1,022,500 files (50 files/page × 20,450 pages)
My scraped index so far: 528,586 files / 634,573 URLs
Currently downloading individual files: 24,371 files (29GB)
Download rate: ~1 file/sec to avoid getting blocked, so ~12 days of continuous downloading for the full set
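
The ~12 day figure follows directly from the numbers above. A quick check, assuming a steady 1 file per second with no retries, pauses, or backoff:

```python
# Back-of-the-envelope check of the download estimate quoted above.
FILES_PER_PAGE = 50
INDEX_PAGES = 20_450
SECONDS_PER_FILE = 1.0  # assumed steady rate, no retries or backoff


def estimate_days(pages, per_page, seconds_per_file):
    """Return continuous download time in days for pages * per_page files."""
    total = pages * per_page
    return total * seconds_per_file / 86_400  # 86,400 seconds per day


total_files = FILES_PER_PAGE * INDEX_PAGES
days = estimate_days(INDEX_PAGES, FILES_PER_PAGE, SECONDS_PER_FILE)
print(f"{total_files:,} files -> {days:.1f} days")  # 1,022,500 files -> 11.8 days
```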

Your merged 45GB + 86GB torrents (~500K-700K files) would be a huge help. Happy to cross-reference with my scraped URL list to find any gaps.

UPDATE DATASET 9 Files List:

Progress:

Scraped 529,334 file URLs from justice.gov (pages 0-18333, ~89% of index)
Downloading individual files: 30K files / 41GB so far
Also grabbed the 86GB DataSet_9.tar.xz torrent (~500K files) - extracting now

Uploaded my URL index to Archive.org - 529K file URLs in JSON format if anyone wants to help download the remaining files.

link: https://archive.org/details/epstein-dataset9-index

The link is live and shows the 75.7MB JSON file available for download.

UPDATE Dataset Size Sanity Check:

Dataset Report
Generated: 2026-01-31T23:28:29.198691
Base Path: /mnt/epstein-doj-2026-01-30

Summary

Dataset	Files	Extracted	ZIP	Types
DataSet_1	6,326	2.48 GB	1.23 GB	.pdf, .opt, .dat
DataSet_1_incomplete	3,158	1.24 GB	N/A	.pdf, .opt, .dat
DataSet_2	577	631.66 MB	630.79 MB	.pdf, .dat, .opt
DataSet_3	69	598.51 MB	595.00 MB	.pdf, .dat, .opt
DataSet_4	154	358.43 MB	351.52 MB	.pdf, .opt, .dat
DataSet_5	122	61.60 MB	61.48 MB	.pdf, .dat, .opt
DataSet_6	15	53.02 MB	51.28 MB	.pdf, .opt, .dat
DataSet_7	19	98.29 MB	96.98 MB	.pdf, .dat, .opt
DataSet_8	11,042	10.68 GB	9.95 GB	.pdf, .mp4, .xlsx
DataSet_9_files	35,480	40.44 GB	45.63 GB	.pdf, .mp4, .m4a
DataSet_9_45GB_unique	28	84.18 MB	N/A	.pdf, .dat, .opt
DataSet_9_extracted	531,256	94.51 GB	N/A	.pdf
DataSet_9_45GB_extracted	201,357	47.45 GB	N/A	.pdf, .dat, .opt
DataSet_10_extracted	504,030	81.15 GB	78.64 GB	.pdf, .mp4, .mov
DataSet_11	14,045	1.17 GB	25.56 GB	.pdf
DataSet_12	154	119.89 MB	114.09 MB	.pdf, .dat, .opt
TOTAL	1,307,832	281.07 GB	162.87 GB

https://pastebin.com/zdHbsCwH

Here is a little script that can generate the report above, provided your directory is laid out something like this:

 # Minimum working example:
  my_directory/
  ├── DataSet_1/
  │   └── (any files)
  ├── DataSet_2/
  │   └── (any files)
  └── DataSet 2.zip  (optional - will be matched)
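
The actual script is the pastebin linked above. What follows is only a rough sketch of how a report like this could be produced, not that script. Its assumptions: sizes use binary units, a ZIP is matched to a folder by replacing spaces in its name with underscores (so `DataSet 2.zip` pairs with `DataSet_2`), and only the three most common extensions are listed. The function names are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def human_size(num_bytes):
    """Format a byte count using binary units (1 GB = 1024**3 bytes here)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.2f} {unit}"
        size /= 1024


def dataset_report(base):
    """Return one row per subdirectory: name, file count, bytes, zip bytes, types."""
    base = Path(base)
    # Assumption: "DataSet 2.zip" matches "DataSet_2" once spaces become underscores.
    zips = {p.stem.replace(" ", "_"): p.stat().st_size for p in base.glob("*.zip")}
    rows = []
    for folder in sorted(p for p in base.iterdir() if p.is_dir()):
        files = [f for f in folder.rglob("*") if f.is_file()]
        types = Counter(f.suffix.lower() for f in files if f.suffix)
        rows.append({
            "dataset": folder.name,
            "files": len(files),
            "extracted": sum(f.stat().st_size for f in files),
            "zip": zips.get(folder.name),  # None when no matching archive exists
            "types": [ext for ext, _ in types.most_common(3)],
        })
    return rows


def print_report(rows):
    """Print the rows as tab-separated lines, similar to the table above."""
    for r in rows:
        zip_size = human_size(r["zip"]) if r["zip"] is not None else "N/A"
        print(f'{r["dataset"]}\t{r["files"]:,}\t{human_size(r["extracted"])}'
              f'\t{zip_size}\t{", ".join(r["types"])}')
```

Calling `print_report(dataset_report("my_directory"))` on a layout like the one above would print one line per dataset folder.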

kongstrong@lemmy.world · edited 3 hours ago

Would still love to help from my PC, on dataset 9 specifically. Is there any way we can exchange progress so I don’t start downloading files you already have?

Edit: just started scraping from page 18330 (as you mentioned you ended around 18333), hoping I can fill in the remaining 4000-ish pages

Update 2 (17:15 UTC): just finished scraping up to the page 20500 limit you set in the code. There are 0 new files in the 18330-20500 range compared to the ones you already found. So unless I did something wrong, either your list is complete or the DOJ has been scrambling their shit (considering the large number of duplicate pages, I’m going with the second explanation).

Either way, I’m gonna extract the 48GB and 100GB torrent directories now and try to mark down which of the files already exist within those torrents, so we can make an (intermediate) list of which files are still missing from them
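
A minimal sketch of that cross-check, assuming a file can be identified by the last path segment of its URL and that names are compared case-insensitively. Both are assumptions: if the same name appears under different paths, a hash-based comparison would be safer. The function names and example URLs are placeholders.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def url_basename(url):
    """Return the decoded last path segment of a URL, lowercased."""
    return unquote(urlsplit(url).path.rsplit("/", 1)[-1]).lower()


def local_basenames(*roots):
    """Collect lowercased file names found anywhere under the given directories."""
    names = set()
    for root in roots:
        names.update(p.name.lower() for p in Path(root).rglob("*") if p.is_file())
    return names


def missing_urls(index_urls, have_names):
    """Return index URLs whose file name is not present locally, in index order."""
    return [u for u in index_urls if url_basename(u) not in have_names]


# Placeholder data: two indexed URLs, one of which is already on disk.
example_index = [
    "https://example.org/files/A%20B.pdf",
    "https://example.org/files/c.pdf",
]
example_missing = missing_urls(example_index, {"a b.pdf"})
print(example_missing)
```

In practice `have_names` would come from `local_basenames(...)` pointed at the extracted torrent directories, and the resulting list is the intermediate "still missing" list.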