AI Scraping

I recently came across this article, but I must warn you that it is a cryptic read.
https://www.axios.com/2025/06/19/ai-s...
This article examines the issue of AI scraping, which occurs when an automated system downloads all relevant information from a website. AI developers use this data to train their models and generate content. There are multiple issues with this practice, including copyright infringement and server slowdowns. But first, let’s rewind the clock.
Way back in 2022, AI (or, more precisely, machine learning) was not yet a familiar concept to everyday users or websites. What existed was search engines. Companies like Google sent automated programs, called crawlers, all over the internet to collect data that their users might be interested in. Thus, if you searched for “spinach recipes”, Google would sort through its database of collected information to produce a list of sites that had “spinach recipes.”
Other entities also automatically gather data, including database companies, governments, criminals, hackers, and bulk data collectors.
This torrent of automated visits created problems for website owners who did not want their data copied or had slow servers. So, they placed a small file called robots.txt on their websites, one that ordinary visitors never see, that tells search engines: “Please do not automatically take my data.” Legitimate companies, such as Google, respected this, but unscrupulous entities did not.
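For the curious, the “please do not take my data” file is usually a plain text file named robots.txt, and a polite crawler reads it before fetching anything. Here is a minimal sketch of that check using Python’s standard library. The bot names, the example.com addresses, and the rules themselves are made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary recipe site (an assumption for
# illustration): one made-up AI bot is banned entirely, and everyone else
# is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow this crawler to fetch this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/recipes/spinach"),
        ("FriendlySearchBot", "https://example.com/recipes/spinach"),
        ("FriendlySearchBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Notice that nothing here forces a crawler to run this check. The file is a request, not a lock, which is exactly why the unscrupulous can ignore it.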
The next line of defense is the CAPTCHA. I am sure you’ve visited a website where a box appears asking if you are human. Sometimes these are a puzzle, such as reading text with lines through it or identifying which pictures feature motorcycles. Why always motorcycles?
Today, most automated data capturing is done by AI companies. Some go to great lengths to sidestep every possible attempt to prevent their systems from scooping up every scrap of data. There are even dedicated companies that collect data for sale to smaller AI companies.
The above article discusses Cloudflare’s efforts to prevent automated systems from taking content from the websites it serves. Why is this important? Let’s say I am a big spinach fan. Love the stuff! So, I spend hours creating recipes, collecting them, comparing the results, and taking pictures of my delicious creations. Then, I post my hard-earned info to a popular recipe site.
After an AI scrape, all that knowledge is suddenly merged into an AI model, enabling it to become an expert in spinach cooking. Meaning that there is no need for a human to look at popular cooking websites. This is something every spinach-cooking expert wishes to avoid.
What does this have to do with me? Well, I am a (very minor) content creator. Yes, the humble words coming out of my bonkers mind have enriched this world a minuscule amount. Yay? And I would prefer that AI not take credit.
Do I spend my evenings worrying about this? After all, the things I care about are selling and protecting my books. So no, I am confident that no AI company would spend $2.99 to download one of my books because it contains little value. Yet, I do have concerns about my articles because they contain content I cherish. Let me explain.
I recently wrote an article discussing micro paragraphs, which are a new trend in sentence and paragraph writing style. A few people read that article, and as a community, we would call it a pea-sized bump in the infinite knowledge highway.
What did my article contain? From a high level, I clearly explained an observation, cited examples, and made a solid conclusion. During my research to create the article, I discovered that I had gained new writing technique insights, which translates to new knowledge for our planet. This type of content is what AI companies desperately desire, and this humble article is far more valuable than all four of my published books.
Why? My article was well-stated, on point, and incredibly relevant to AI training, making it valuable to both readers and those using AI. I guess that makes sense, but what about my other posts that were far less relevant?
Let’s examine my first article, “Why I Write.” At its core, it is an opinion piece. Spoiler alert! Many people write for various reasons, and mine are no exception. So, an AI data scrape would find zero value in my words. Right? No, even that article has great AI value.
Let’s say somebody asks ChatGPT to “list reasons why an author would write books.” Then, a processor in some dark room would search its vast database for relevant topics (including my article) and compute an answer. Although there are thousands of sources on “why authors write,” my article still holds great value. This is because it was singularly on point, not too long, and readily available. Compare that to, let’s say, an entire book by an author who spent 20 chapters explaining in detail why they chose to write.
Of course, I am powerless to prevent the thousands of robust systems from collecting every article I have written. My problem is that I wish this were not the case. Why? I put a lot of effort into these articles with a not-so-hidden attempt to promote my books. (And this is budget therapy, but that is another topic.)
I want to yell to the AI companies: “Do your own work! Stop stealing mine!” Yet, you might point out, I have not copyrighted the very words you are reading. Meaning that anyone is free to read them, print them, email them, or consider them their own. If a person did any of that, I probably would not care. But when AI uses my thoughts? I do care.
The linked article above is a call to arms to prevent AI from claiming the best of humanity as its own. But if you have read this far, you might have learned something. AI is doing the same thing: learning. That’s fair. Right?
It is, but no. It feels like somebody is cheating. I cannot instantly become an expert on spinach, yet AI can. Oh well, it seems like I cannot do anything about it. Well, that is not true because I have an ace up my sleeve. You.
Why did you read this far? It was your tenacity. You were curious and, with some luck, I satisfied your interest. Meaning you may have learned something and perhaps had a touch of enjoyment. The ace up my sleeve is that to AI, this hand-crafted article was one of billions of files. Meaning that the few people who read this will have gained something special, and no AI model will ever appreciate what that was.

You’re the best -Bill
July 23, 2025
Published on July 23, 2025 19:00 Tags: ai, authors-rights