  • ADD MY DISCORD FOR MORE DISCUSSION: redbarinternet

    Dataset 9 is cooked:

    Anyone got Discord? I have been scraping the website, collecting all the links, and there aren't even 1 million links available. I have a full dashboard for it:

    There's nowhere close to 3.5 million files, even in the full link collection, and I have already scraped every possible link.

    Current streak is how many pages in a row on dataset 9 were scraped before finding a new page. My threshold for stopping is set at 4,000 duplicate pages in a row.

    Why?

    Yes, this means what you think. Each streak represents at least 5 separate times where there were more than 400 duplicate pages before new data was found. These are unique instances too, so at one point you would have gone through 436 pages before a new one, then another time 816 pages before a new one, and so on.
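    For anyone who wants to reproduce this, here is a minimal sketch of the duplicate-streak stopping heuristic described above. `fetch_page` and `is_new` are hypothetical placeholders, not my actual scraper code; the 4,000 threshold is the one mentioned above.

    ```python
    # Minimal sketch of the duplicate-streak stopping heuristic (assumed
    # structure; fetch_page() and is_new() are hypothetical placeholders
    # for the real scraper and the link database that tracks duplicates).

    STOP_THRESHOLD = 4000  # stop after this many duplicate pages in a row

    def scrape(fetch_page, is_new):
        page = 1
        streak = 0  # current run of consecutive duplicate pages
        while streak < STOP_THRESHOLD:
            links = fetch_page(page)
            if any(is_new(link) for link in links):
                streak = 0  # new data found, streak resets
            else:
                streak += 1
            page += 1
        return page  # last page checked before giving up
    ```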

    Total counts, based on the links available at the time my database tracked them: this indicates we have the potential to download ~900k files out of the current document range, which should run from 1 to 2,731,783.
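    As a quick back-of-the-envelope check of those numbers (assuming the documents are numbered sequentially from 1 to 2,731,783, which is how I read the range):

    ```python
    # Coverage math from the numbers above, assuming sequential document IDs.
    expected = 2_731_783   # documents the ID range says should exist
    available = 900_000    # approximate files actually downloadable (~900k)

    missing = expected - available
    print(f"missing: {missing:,} ({missing / expected:.1%} of the range)")
    # -> missing: 1,831,783 (67.1% of the range)
    ```

    In other words, roughly two thirds of the document range appears to have no downloadable file behind it.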