AI companies are searching for more training data (one estimate suggests that high-quality data, such as online books and academic articles, will be exhausted by 2026) and continue to use lower-quality data as well. There is also active research into whether AI models can be pretrained on content they generate themselves.