Cake day: January 31st, 2026



  • Superb, I have 1-8, 11-12.

    Only 10 remains to complete (downloading from Archive.org now).

    Dataset 9 is the biggest. I ended up writing a parser to go through every page on justice.gov and make an index list.
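For anyone wanting to build a similar index, here is a rough sketch of the link-extraction step only. It is not the parser mentioned above: the class name, the function name, and the `.pdf` filter are assumptions for illustration, and fetching the pages themselves is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FileLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links ending in a wanted extension."""

    def __init__(self, base_url, extensions=(".pdf",)):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Case-insensitive extension match; links without href are skipped.
        if href and href.lower().endswith(self.extensions):
            self.links.append(urljoin(self.base_url, href))


def extract_file_links(html, base_url, extensions=(".pdf",)):
    """Return file links found in one index page, resolved against base_url."""
    parser = FileLinkParser(base_url, extensions)
    parser.feed(html)
    return parser.links
```

Calling `extract_file_links(page_html, page_url)` once per index page and appending the results to a list gives the kind of URL index described here.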

    Current estimate of the file list:

    • ~1,022,500 files (50 files/page × 20,450 pages)
    • My scraped index so far: 528,586 files / 634,573 URLs
    • Currently downloading individual files: 24,371 files (29GB)
    • Download rate: ~1 file/sec to avoid getting blocked, so ~12 days of continuous downloading for the full set
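The arithmetic behind the estimate, as a quick check. The numbers are the ones from the list above; the function name is just for illustration.

```python
FILES_PER_PAGE = 50
INDEX_PAGES = 20_450
SECONDS_PER_FILE = 1.0  # throttled rate of roughly 1 file/sec


def estimate(files_per_page, pages, seconds_per_file):
    """Return (total file count, days of continuous downloading)."""
    total_files = files_per_page * pages
    days = total_files * seconds_per_file / 86_400  # seconds in a day
    return total_files, days


total, days = estimate(FILES_PER_PAGE, INDEX_PAGES, SECONDS_PER_FILE)
print(f"{total:,} files, about {days:.1f} days")
# prints: 1,022,500 files, about 11.8 days
```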

    Your merged 45GB + 86GB torrents (~500K-700K files) would be a huge help. Happy to cross-reference with my scraped URL list to find any gaps.
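A minimal sketch of how the cross-reference could work, assuming files can be matched by filename alone (the last path segment of each URL against the names of files already on disk). The function names are hypothetical, and if the same filename appears under different paths this would need a stricter key.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def url_filename(url):
    """Last path segment of a URL, percent-decoded (e.g. 'a%20b.pdf' -> 'a b.pdf')."""
    return unquote(PurePosixPath(urlparse(url).path).name)


def find_missing(index_urls, local_root):
    """Return the URLs whose filename is not present anywhere under local_root."""
    have = {p.name for p in Path(local_root).rglob("*") if p.is_file()}
    return [u for u in index_urls if url_filename(u) not in have]
```

The returned list is what would still need to be downloaded individually after extracting the torrents.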


    UPDATE DATASET 9 Files List:

    Progress:

    • Scraped 529,334 file URLs from justice.gov (pages 0-18333, ~89% of index)
    • Downloading individual files: 30K files / 41GB so far
    • Also grabbed the 86GB DataSet_9.tar.xz torrent (~500K files) - extracting now

    Uploaded my URL index to Archive.org - 529K file URLs in JSON format if anyone wants to help download the remaining files.

    link: https://archive.org/details/epstein-dataset9-index

    The link is live and shows the 75.7MB JSON file available for download.


    UPDATE Dataset Size Sanity Check:

    Dataset Report
    Generated: 2026-01-31T23:28:29.198691
    Base Path: /mnt/epstein-doj-2026-01-30

    Summary

    Dataset                   Files      Extracted size  ZIP size    File types
    DataSet_1                 6,326      2.48 GB         1.23 GB     .pdf, .opt, .dat
    DataSet_1_incomplete      3,158      1.24 GB         N/A         .pdf, .opt, .dat
    DataSet_2                 577        631.66 MB       630.79 MB   .pdf, .dat, .opt
    DataSet_3                 69         598.51 MB       595.00 MB   .pdf, .dat, .opt
    DataSet_4                 154        358.43 MB       351.52 MB   .pdf, .opt, .dat
    DataSet_5                 122        61.60 MB        61.48 MB    .pdf, .dat, .opt
    DataSet_6                 15         53.02 MB        51.28 MB    .pdf, .opt, .dat
    DataSet_7                 19         98.29 MB        96.98 MB    .pdf, .dat, .opt
    DataSet_8                 11,042     10.68 GB        9.95 GB     .pdf, .mp4, .xlsx
    DataSet_9_files           35,480     40.44 GB        45.63 GB    .pdf, .mp4, .m4a
    DataSet_9_45GB_unique     28         84.18 MB        N/A         .pdf, .dat, .opt
    DataSet_9_extracted       531,256    94.51 GB        N/A         .pdf
    DataSet_9_45GB_extracted  201,357    47.45 GB        N/A         .pdf, .dat, .opt
    DataSet_10_extracted      504,030    81.15 GB        78.64 GB    .pdf, .mp4, .mov
    DataSet_11                14,045     1.17 GB         25.56 GB    .pdf
    DataSet_12                154        119.89 MB       114.09 MB   .pdf, .dat, .opt
    TOTAL                     1,307,832  281.07 GB       162.87 GB

    https://pastebin.com/zdHbsCwH

    Here is a little script that generates the above report, provided your directory is laid out something like this:

     # Minimum working example:
      my_directory/
      ├── DataSet_1/
      │   └── (any files)
      ├── DataSet_2/
      │   └── (any files)
      └── DataSet 2.zip  (optional - will be matched)
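
For reference, a minimal sketch of the same idea. This is not the pastebin script, just an illustration under a few assumptions: every subdirectory of the base path counts as a dataset, a ZIP is matched to a directory when the names are equal after ignoring case, spaces and underscores, and the top three extensions by file count are reported. All function names are made up.

```python
from collections import Counter
from pathlib import Path


def human_size(num_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.50 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TB"


def normalise(name):
    """Key for matching 'DataSet 2.zip' to 'DataSet_2/' (assumed rule)."""
    return name.lower().replace("_", "").replace(" ", "")


def summarise(base):
    """Return one dict per dataset directory found directly under base."""
    base = Path(base)
    zips = {normalise(z.stem): z for z in base.glob("*.zip")}
    rows = []
    # Plain name sort, so DataSet_10 comes before DataSet_2.
    for d in sorted(p for p in base.iterdir() if p.is_dir()):
        files = [f for f in d.rglob("*") if f.is_file()]
        exts = Counter(f.suffix.lower() for f in files if f.suffix)
        zip_path = zips.get(normalise(d.name))
        rows.append({
            "dataset": d.name,
            "files": len(files),
            "extracted": sum(f.stat().st_size for f in files),
            "zip": zip_path.stat().st_size if zip_path else None,
            "types": [ext for ext, _ in exts.most_common(3)],
        })
    return rows


def print_report(rows):
    """Print one line per dataset: name, file count, sizes, top extensions."""
    for r in rows:
        zip_size = human_size(r["zip"]) if r["zip"] is not None else "N/A"
        print(r["dataset"], f"{r['files']:,}", human_size(r["extracted"]),
              zip_size, ", ".join(r["types"]))
```

Usage would be something like `print_report(summarise("my_directory"))`.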