Kindle Notes & Highlights
by Parmy Olson
Read between January 26 - February 13, 2025
About 60 percent of the text that was used to train GPT-3, for instance, came from a dataset called Common Crawl. This is a free, massive, and regularly updated database that researchers use to collect raw web page data and text from billions of web pages.