About 60 percent of the text used to train GPT-3, for instance, came from a dataset called Common Crawl. This is a free, massive, regularly updated archive of raw web page data and extracted text gathered from billions of web pages, which researchers widely use as a source of training data.