Epstein Files Jan 30, 2026

Data hoarders on Reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work, with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (incomplete): contains only 49 GB of the 180 GB set. Multiple reports of the download being cut off by the DOJ server at offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 6ae129b76fddbba0776d4a5430e71494245b04c4

/u/susadmin’s More Complete Data Set 9 (96.25 GB)
De-duplicated merge of the 45.63 GB and 86.74 GB versions

An unverified version remains incomplete at ~101 GB.
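For anyone retrying the original DOJ download, here is a minimal sketch of resuming from the reported cutoff offset with an HTTP Range request. It assumes the DOJ server honors Range requests, which is unverified; the URL and file name are placeholders.

    # Hypothetical sketch only: resume a partial Data Set 9 download from where the
    # server cut it off. Assumes the server honors HTTP Range requests (unverified);
    # the URL and file name below are placeholders, not the real links.
    import os
    import requests

    URL = "https://www.justice.gov/PLACEHOLDER/DataSet_9_archive"  # placeholder
    OUT = "DataSet_9_download.partial"

    offset = os.path.getsize(OUT) if os.path.exists(OUT) else 0    # e.g. 48995762176
    headers = {"Range": f"bytes={offset}-"} if offset else {}

    with requests.get(URL, headers=headers, stream=True, timeout=60) as r:
        r.raise_for_status()
        # 206 Partial Content means the range was honored; 200 means a full restart.
        mode = "ab" if r.status_code == 206 else "wb"
        with open(OUT, mode) as f:
            for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                f.write(chunk)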


Epstein Files Data Set 10 (78.64 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA256: 7D6935B1C63FF2F6BCABDD024EBC2A770F90C43B0D57B646FA7CBD4C0ABCF846
MD5: B8A72424AE812FD21D225195812B2502


Epstein Files Data Set 11 (25.55 GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2
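To check a download against the hashes published above, a small verification sketch (the archive file names are assumptions; substitute whatever your files are actually called):

    # Sketch: verify downloaded archives against the published hashes.
    # Archive file names here are placeholders.
    import hashlib

    EXPECTED = {
        "DataSet_11_archive": ("sha1", "574950c0f86765e897268834ac6ef38b370cad2a"),
        "DataSet_12_archive": ("sha256", "b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2"),
    }

    def file_hash(path: str, algo: str, chunk: int = 1 << 20) -> str:
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            while data := f.read(chunk):
                h.update(data)
        return h.hexdigest()

    for path, (algo, expected) in EXPECTED.items():
        ok = file_hash(path, algo) == expected.lower()
        print(f"{path}: {'OK' if ok else 'MISMATCH'} ({algo})")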


This list will be edited as more data becomes available, particularly with regard to Data Set 9.

  • susadmin@lemmy.world · 5 hours ago

    I’m in the process of downloading both dataset 9 torrents (45.63 GB + 86.74 GB). I will then compare the filenames in both versions (the 45.63 GB version alone has 201,358 files), note any duplicates, and merge all unique files into one folder. I’ll upload that as a torrent once it’s done, so we can get closer to a complete dataset 9 as one file.

    • Edit 31Jan2026 816pm EST - Making progress. I finished downloading both dataset 9s (45.6 GB and 86.74 GB). The 45.6 GB set is 200,000 files and the 86 GB set is 500,000 files. I have a .csv of the filenames and sizes of all files in the 45.6 GB version. I’m creating the same .csv for the 86 GB version now.

    • Edit 31Jan2026 845pm EST -

      • dataset 9 (45.63 GB) = 201357 files
      • dataset 9 (86.74 GB) = 531257 files

      I did an exact filename combined with an exact file size comparison between the two dataset9 versions. I also did an exact filename combined with a fuzzy file size comparison (tolerance of +/- 1KB) between the two dataset9 versions. There were:

      • 201330 exact matches
      • 201330 fuzzy matches (+/- 1KB)

      Meaning there are 201330 duplicate files between the two dataset9 versions.

      These matches were written to a duplicates file. Then, from each dataset9 version, all files/sizes matching the file and size listed in the duplicates file will be moved to a subfolder. Then I’ll merge both parent folders into one enormous folder containing all unique files and a folder of duplicates. Finally, compress it, make a torrent, and upload it.
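      Roughly, the name + size comparison looks like this (a minimal sketch; the folder names are placeholders, not the actual torrent layouts):

        # Sketch of the exact filename + exact file size comparison between the two
        # dataset 9 versions. Folder names are placeholders.
        import csv
        import os

        V1 = "dataset9_45GB"   # extracted 45.63 GB torrent (placeholder path)
        V2 = "dataset9_86GB"   # extracted 86.74 GB torrent (placeholder path)

        def index(root):
            """Map (filename, size in bytes) -> relative path for every file under root."""
            out = {}
            for dirpath, _, files in os.walk(root):
                for name in files:
                    full = os.path.join(dirpath, name)
                    out[(name, os.path.getsize(full))] = os.path.relpath(full, root)
            return out

        idx1, idx2 = index(V1), index(V2)
        dupes = sorted(set(idx1) & set(idx2))  # same name and same byte size in both versions

        with open("duplicates.csv", "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(["filename", "size_bytes", "path_in_45GB", "path_in_86GB"])
            for name, size in dupes:
                w.writerow([name, size, idx1[(name, size)], idx2[(name, size)]])

        print(f"{len(dupes)} name+size matches between the two versions")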


    • Edit 31Jan2026 945pm EST -

      Still moving duplicates into subfolders.


    • Edit 31Jan2026 1027pm EST -

      Going off of xodoh74984’s comment (https://lemmy.world/post/42440468/21884588), I’m increasing the rigor of my check of whether the files that share a filename and size between both versions of dataset9 are in fact duplicates. Like rsync --checksum, I’ll verify bit-for-bit that the files are the same by calculating their MD5 hashes. This will take a while, but it’s the best way.
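      A sketch of that checksum pass, reusing the duplicates list from the name + size comparison (paths and the CSV layout are placeholders):

        # For every name+size match, hash both copies and only count them as duplicates
        # when the MD5s agree. Paths are placeholders.
        import csv
        import hashlib
        import os

        V1, V2 = "dataset9_45GB", "dataset9_86GB"  # placeholder extraction folders

        def md5sum(path, chunk=1 << 20):
            h = hashlib.md5()
            with open(path, "rb") as f:
                while data := f.read(chunk):
                    h.update(data)
            return h.hexdigest()

        confirmed, mismatched = 0, 0
        with open("duplicates.csv", newline="") as f:
            for row in csv.DictReader(f):
                a = os.path.join(V1, row["path_in_45GB"])
                b = os.path.join(V2, row["path_in_86GB"])
                if md5sum(a) == md5sum(b):
                    confirmed += 1
                else:
                    mismatched += 1  # same name and size, different content -- keep both copies

        print(f"{confirmed} confirmed duplicates, {mismatched} name/size collisions with different content")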


    • Edit 01Feb2026 1227am EST -

      Checksum comparison complete. 73 files found that have the same file name and size but different content. Total number of duplicate files = 201257. Merging both dataset versions now, while keeping one subfolder of the duplicates, so nothing is deleted.


    • Edit 01Feb2026 1258am EST -

      Creating the .tar.zst file now. 531285 total files, which includes all unique files between dataset9 (45.6GB) and dataset9 (86.7GB), as well as a subfolder containing the files that were found in both dataset9 versions.


    • Edit 01Feb2026 215am EST -

      I was using way too high a compression level for no reason (zstd --ultra -22). Restarted the .tar.zst file creation (with zstd -12) and it’s going 100x faster now. Should be finished within the hour.
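      For reference, a minimal sketch of the .tar.zst creation at zstd level 12, using the Python zstandard bindings rather than the CLI (folder and output names are placeholders):

        # Stream a directory into a .tar.zst at a moderate compression level.
        import tarfile
        import zstandard as zstd  # pip install zstandard

        SRC = "dataset9_merged"             # placeholder: merged output folder
        OUT = "DataSet_9_merged.tar.zst"    # placeholder output name

        cctx = zstd.ZstdCompressor(level=12, threads=-1)  # level 12, all CPU cores
        with open(OUT, "wb") as raw, cctx.stream_writer(raw) as compressed:
            with tarfile.open(fileobj=compressed, mode="w|") as tar:
                tar.add(SRC, arcname=SRC)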


    • Edit 01Feb2026 311am EST -

      .tar.zst file creation is taking a very long time. I’m going to let it run overnight - will check back in a few hours. I’m tired, boss.


    • EDIT 01Feb2026 831am EST -

    COMPLETE!

    And then I doxxed myself in the torrent. One moment please while I fix that…


    Final magnet link is HERE. GO GO GOOOOOO

    I’m seeding @ 55 MB/s. I’m also trying to get into the new r/EpsteinPublicDatasets subreddit to share the torrent there.

    • helpingidiot@lemmy.world · 11 hours ago

      Have a good night. I’ll be waiting to download it, seed it, make hard copies, and redistribute it.

      Please check back in with us.

    • Kindly_District9380@lemmy.world · 11 hours ago

      Superb, I have 1-8 and 11-12.

      Only 10 remains to complete (downloading it from Archive.org now).

      Dataset 9 is the biggest. I ended up writing a parser to go through every page on justice.gov and make an index list.

      Current estimate of the file list:

      • ~1,022,500 files (50 files/page × 20,450 pages)
      • My scraped index so far: 528,586 files / 634,573 URLs
      • Currently downloading individual files: 24,371 files (29GB)
      • Download rate ~1 file/sec to avoid getting blocked = ~12 days continuous for full set
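      For anyone who wants to run a similar scrape, a rough sketch of the approach (the listing URL pattern, page count, and link-extraction rule below are placeholders, not the actual parser):

        # Paginated index scrape with polite rate limiting (~1 request/sec).
        # The listing URL pattern and the href regex are assumptions about the site layout.
        import json
        import re
        import time

        import requests

        BASE = "https://www.justice.gov/PLACEHOLDER-LISTING?page={page}"  # placeholder pattern
        PAGES = 20450                                                     # ~50 files per page
        DELAY = 1.0                                                       # seconds between requests

        file_urls = set()
        for page in range(PAGES):
            resp = requests.get(BASE.format(page=page), timeout=30)
            if resp.status_code != 200:
                time.sleep(DELAY)
                continue  # skip for now rather than hammering the server
            # naive extraction: anything that looks like a document link
            file_urls.update(re.findall(r'href="([^"]+\.(?:pdf|mp4|m4a|xlsx))"', resp.text))
            time.sleep(DELAY)

        with open("dataset9_index.json", "w") as f:
            json.dump(sorted(file_urls), f, indent=2)
        print(f"{len(file_urls)} file URLs indexed")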

      Your merged 45GB + 86GB torrents (~500K-700K files) would be a huge help. Happy to cross-reference with my scraped URL list to find any gaps.


      UPDATE DATASET 9 Files List:

      Progress:

      • Scraped 529,334 file URLs from justice.gov (pages 0-18333, ~89% of the index)
      • Downloading individual files: 30K files / 41 GB so far
      • Also grabbed the 86 GB DataSet_9.tar.xz torrent (~500K files) - extracting now

      Uploaded my URL index to Archive.org - 529K file URLs in JSON format if anyone wants to help download the remaining files.

      link: https://archive.org/details/epstein-dataset9-index

      The link is live and shows the 75.7 MB JSON file available for download.


      UPDATE Dataset Size Sanity Check:

      Dataset Report Generated: 2026-01-31T23:28:29.198691
      Base Path: /mnt/epstein-doj-2026-01-30

      Summary

      Dataset                  | Files     | Extracted | ZIP       | Types
      DataSet_1                | 6,326     | 2.48 GB   | 1.23 GB   | .pdf, .opt, .dat
      DataSet_1_incomplete     | 3,158     | 1.24 GB   | N/A       | .pdf, .opt, .dat
      DataSet_2                | 577       | 631.66 MB | 630.79 MB | .pdf, .dat, .opt
      DataSet_3                | 69        | 598.51 MB | 595.00 MB | .pdf, .dat, .opt
      DataSet_4                | 154       | 358.43 MB | 351.52 MB | .pdf, .opt, .dat
      DataSet_5                | 122       | 61.60 MB  | 61.48 MB  | .pdf, .dat, .opt
      DataSet_6                | 15        | 53.02 MB  | 51.28 MB  | .pdf, .opt, .dat
      DataSet_7                | 19        | 98.29 MB  | 96.98 MB  | .pdf, .dat, .opt
      DataSet_8                | 11,042    | 10.68 GB  | 9.95 GB   | .pdf, .mp4, .xlsx
      DataSet_9_files          | 35,480    | 40.44 GB  | 45.63 GB  | .pdf, .mp4, .m4a
      DataSet_9_45GB_unique    | 28        | 84.18 MB  | N/A       | .pdf, .dat, .opt
      DataSet_9_extracted      | 531,256   | 94.51 GB  | N/A       | .pdf
      DataSet_9_45GB_extracted | 201,357   | 47.45 GB  | N/A       | .pdf, .dat, .opt
      DataSet_10_extracted     | 504,030   | 81.15 GB  | 78.64 GB  | .pdf, .mp4, .mov
      DataSet_11               | 14,045    | 1.17 GB   | 25.56 GB  | .pdf
      DataSet_12               | 154       | 119.89 MB | 114.09 MB | .pdf, .dat, .opt
      TOTAL                    | 1,307,832 | 281.07 GB | 162.87 GB |

      https://pastebin.com/zdHbsCwH

      Here is a little script that can generate the above report, if your directory is laid out something like this:

       # Minimum working example:
        my_directory/
        ├── DataSet_1/
        │   └── (any files)
        ├── DataSet_2/
        │   └── (any files)
        └── DataSet 2.zip  (optional - will be matched)
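      Not the pastebin script itself, just a minimal sketch of a report over that layout (the zip-name matching and column logic are assumptions):

        # Per-dataset file count, extracted size, size of a matching "DataSet N.zip" if
        # present, and the most common file extensions.
        import os
        from collections import Counter

        ROOT = "my_directory"  # e.g. /mnt/epstein-doj-2026-01-30

        def human(n):
            for unit in ("B", "KB", "MB", "GB", "TB"):
                if n < 1024 or unit == "TB":
                    return f"{n:.2f} {unit}"
                n /= 1024

        print(f"{'Dataset':<28}{'Files':>10}  {'Extracted':>12}  {'ZIP':>12}  Types")
        for entry in sorted(os.listdir(ROOT)):
            path = os.path.join(ROOT, entry)
            if not os.path.isdir(path):
                continue
            count, size, exts = 0, 0, Counter()
            for dirpath, _, files in os.walk(path):
                for name in files:
                    count += 1
                    size += os.path.getsize(os.path.join(dirpath, name))
                    exts[os.path.splitext(name)[1].lower()] += 1
            zip_path = os.path.join(ROOT, entry.replace("_", " ") + ".zip")  # "DataSet 2.zip" style
            zip_size = human(os.path.getsize(zip_path)) if os.path.exists(zip_path) else "N/A"
            top = ", ".join(e for e, _ in exts.most_common(3))
            print(f"{entry:<28}{count:>10,}  {human(size):>12}  {zip_size:>12}  {top}")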
      
      • kongstrong@lemmy.world · 2 hours ago

        I’d still love to help from my PC on dataset 9 specifically. Is there any way we can exchange progress, so I won’t start downloading files you have already downloaded?

        E: just started scraping from page 18330 (as you mentioned you ended around 18333), hoping I can fill in the remaining 4000-ish pages.

        Update 2 (1715UTC): just finished scraping up until the page 20500 limit you set in the code. There are 0 new files in the range between 18330-20500 compared to the ones you already found. So unless I did something wrong, either your list is complete or the DOJ has been scrambling their shit (considering the large number of duplicate pages, I’m going with the second explanation).

        Either way, I’m gonna extract the 48 GB and 100 GB torrent directories now and try to mark down which files already exist within those torrents, so we can make an (intermediate) list of which files are still missing from them.
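        A rough sketch of that cross-check, matching on filenames only (the index and folder names below are placeholders):

          # Which indexed URLs are not yet present in the extracted torrent folders?
          # Matching on basename only is an assumption and will over-count as "present"
          # if different URLs share the same filename.
          import json
          import os
          from urllib.parse import urlparse

          INDEX = "dataset9_index.json"                      # placeholder: scraped URL list
          TORRENT_DIRS = ["dataset9_45GB", "dataset9_86GB"]  # placeholder: extracted torrents

          have = set()
          for root in TORRENT_DIRS:
              for _, _, files in os.walk(root):
                  have.update(files)

          with open(INDEX) as f:
              urls = json.load(f)

          missing = [u for u in urls if os.path.basename(urlparse(u).path) not in have]
          with open("dataset9_missing_urls.txt", "w") as f:
              f.write("\n".join(missing))
          print(f"{len(missing)} of {len(urls)} indexed files not found in the torrents")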

    • thetrekkersparky@startrek.website · 18 hours ago

      I’m downloading 8-11 and seeding 1-7 and 12 now. I’ve tried checking up on Reddit, but every other time I check in, the post is nuked or something. My home server never goes down and I’m outside the USA. I’m working on the 100 GB+ #9 right now and I’ll seed whatever you can get up here too.

    • epstein_files_guy@lemmy.world · 18 hours ago

      Looking forward to your torrent, will seed.

      I have several incomplete sets of files from dataset 9 that I downloaded with a scraped set of URLs - should I try to get them to you to compare as well?

      • susadmin@lemmy.world · 18 hours ago

        Yes! I’m not sure of the best way to do that - upload them to MEGA and message me a download link?

        • epstein_files_guy@lemmy.world · 18 hours ago

          Maybe archive.org? That way they can be torrented if others want to attempt their own merging techniques. Either way, it will be a long upload; my speed is not especially good. I’m still churning through one set of URLs that is 1.2M lines; most are failing, but I have 65k from that batch so far.

            • epstein_files_guy@lemmy.world · 1 hour ago

              I’ll get the first set (42k files, 31 GB) uploading as soon as I get it zipped up. It’s the one least likely to have any new files in it, since I started at the beginning like others did, but it’s worth a shot.

              edit 01FEB2026 1208AM EST - 6.4/30 GB uploaded to archive.org

              edit 01FEB2026 0430AM EST - 13/30 GB uploaded to archive.org; the scrape using a different URL set, going backwards, is currently at 75.4k files

              edit 01FEB2026 1233PM EST - had an internet outage overnight and lost all progress on the archive.org upload; currently back to 11/30 GB. The scrape using a previous URL set seems to be getting very few new files now, sitting at 77.9k at the moment.

    • xodoh74984@lemmy.world (OP) · 11 hours ago

      When merging versions of Data Set 9, is there any risk of loss with simply using rsync --checksum to dump all files into one directory?

      • susadmin@lemmy.world · 16 hours ago

        rsync --checksum is better than my file name + file size comparison, since you are calculating the hash of each file and comparing it to the hashes of all other files. For example, if there is a file called data1.pdf with size 1024 bytes in dataset9-v1, and another file called data1.pdf with size 1024 bytes in dataset9-v2, but their content is different, my method will still treat them as identical files.

        I’m going to modify my script to calculate and compare the hashes of all files that I previously determined to be duplicates. If the hashes of the duplicates in dataset9 (45GB torrent) match the hashes of the duplicates in dataset9 (86GB torrent), then they are in fact duplicates between the two datasets.

        • xodoh74984@lemmy.world (OP) · 10 hours ago

          Amazing, thank you. That was my thought: check hashes while merging the files, to keep any copies that might have been modified by the DOJ and to discard duplicates even if they have different metadata, e.g. timestamps.
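          A rough sketch of a merge along those lines (directory names are placeholders; files whose content differs are kept under a renamed path rather than overwritten):

            # Copy everything from both versions into one tree. Skip a file only when an
            # identical-content copy (same MD5) is already there; keep both copies when
            # the content differs. Directory names are placeholders.
            import hashlib
            import os
            import shutil

            SOURCES = ["dataset9_45GB", "dataset9_86GB"]
            DEST = "dataset9_merged"

            def md5sum(path, chunk=1 << 20):
                h = hashlib.md5()
                with open(path, "rb") as f:
                    while data := f.read(chunk):
                        h.update(data)
                return h.hexdigest()

            for src in SOURCES:
                for dirpath, _, files in os.walk(src):
                    for name in files:
                        src_file = os.path.join(dirpath, name)
                        rel = os.path.relpath(src_file, src)
                        dst_file = os.path.join(DEST, rel)
                        os.makedirs(os.path.dirname(dst_file), exist_ok=True)
                        if os.path.exists(dst_file):
                            if md5sum(dst_file) == md5sum(src_file):
                                continue  # true duplicate: identical content, safe to skip
                            root, ext = os.path.splitext(dst_file)
                            dst_file = f"{root}.from_{os.path.basename(src)}{ext}"  # keep both variants
                        shutil.copy2(src_file, dst_file)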

    • ModernSimian@lemmy.world · 14 hours ago

      Be prepared to wait a while… I don’t know why this person chose xz; it is so slow. I’ve just been trying to get the tarball out for an hour.