Nest expanded the data by adding an even broader scrape of links shared on Reddit as well as a scrape of English-language Wikipedia and a mysterious dataset called Books2, details of which OpenAI has never disclosed, but which two people with knowledge of the dataset told me contained published books ripped from Library Genesis, an online shadow repository of torrented books and scholarly articles. In 2023, the Authors Guild and seventeen authors, including George R. R. Martin and Jodi Picoult, would sue OpenAI and Microsoft alleging mass copyright infringement. OpenAI would respond in March
...more

