Like It or Not, Publishers Are Licensing Books for AI Training—And Using AI Themselves

Image: an AI-generated illustration of book publishing executives in the meeting room of an urban skyscraper, enjoying large handfuls of cash they’ve received from licensing content to AI companies. Generated using the prompt “book publishing executives receiving lots of money from a technology company.” Apart from their love of oversized cash, notice their whiteness.

The following article condenses material that I’ve been writing about for the last 18 months in my paid newsletter, The Hot Sheet.

The train has left the station; the ship has sailed. Pick your preferred metaphor.

This week, the Copyright Clearance Center (CCC) announced that publishers and other rights holders can now include AI training rights in the licensing arrangements it manages. In other words, the CCC is giving AI companies a one-stop shop for all their model training needs.

What is the CCC? It is a for-profit company that manages collective copyright licensing for corporate and academic publishers. Generally, its mission is to help publishers earn money off copyright and expand copyright protections for rights holders.

In the CCC’s announcement, the president of the Association of American Publishers says, “Voluntary licensing solutions are a win-win for everybody in the value chain, including AI developers who want to do the right thing. I am grateful to organizations like CCC, as they are helping the next generation marketplace to evolve robustly and in forward-thinking fashion.”

A handful of book publishers have already struck deals with the AI companies directly.

Wiley, a major academic publisher that’s also known for the Dummies series, announced two deals in June, to the tune of $44 million. Many major media organizations, like News Corp and The Atlantic, have also struck deals. (Here is a running list.)

I think it’s fair to say that, before long, every major publisher will be earning money through AI training, whether through the CCC, through another collective licensing agency, or, if they’re big or desirable enough (as Wiley is), directly with the tech companies.

How do writers protect themselves?

I’m asked this question a lot, and often I say things like “Join the Authors Guild,” since the Guild is deeply involved in the issue of author compensation and advocates for writers’ rights.

But increasingly, I’m also pushing back on the question: What do you need protecting from? While the AI companies will always carry the original sin of training on copyrighted work without permission or licensing, they’re now going through appropriate channels to obtain training material. Yes, there are lawsuits underway (by the New York Times and the Authors Guild, among others) that have to play out and may settle out of court. But even if the rights holders win in the end, the models will not shut down. The AI companies will not go out of business. Instead, remedies will be found for rights holders and business will continue as usual.

Recently, Mary Rasenberger at the Authors Guild told Publishers Marketplace (sub required) that the Guild sees AI licensing as a good source of income for writers down the road and that it has been talking to publishers for months about who owns AI training rights and how to split the revenue. She said, “I am completely optimistic there will be joint agreements between publishers and authors on this. It is not the hardest problem in the world.” Fortunately, she says, publishers so far agree they need permission from authors to license books for AI.

Theoretically, authors could object and withhold their material from training, but that would be turning down free money. The average author’s concerns about AI training or ingestion often betray a misunderstanding of what today’s large language models are intended to do. They are not databases where you retrieve information. They are not machines that intend to steal, plagiarize, or regurgitate. (If and when they do, the developers consider that a flaw to be worked out.) Benedict Evans has expressed this eloquently: “OpenAI hasn’t ‘pirated’ your book or your story in the sense that we normally use that word, and it isn’t handing it out for free. Indeed, it doesn’t need that one novel in particular at all. In Tim O’Reilly’s great phrase, data isn’t oil; data is sand. It’s only valuable in the aggregate of billions, and your novel or song or article is just one grain of dust in the Great Pyramid.”

That said, authors might certainly object to the AI companies themselves: how they’re run, the ethics of the people behind them, the future implications of AI use, and so on. They may avoid involvement for those reasons. But refusing to engage with the technology at all may end up penalizing you more than it penalizes them, not because there’s going to be some incredible revolution (I don’t buy into most of the hype surrounding AI), but because you’ll end up working harder or spending more money than everyone else who is using these tools. The technology is destined to be integrated into daily life, for better and for worse.

Authors and publishers are using AI to write and publish—today.

And it plays a role, at all stages of the writing and publishing process, in ways that many professionals would find acceptable and ethical. While it may be unethical for someone to use AI to generate 5,000 spammy reviews, in other cases people prefer AI content, as when it’s used to improve summaries of scientific articles.

Publishers are beginning to differentiate between two types of AI use in the writing and publishing process. During a Book Industry Study Group panel on AI use, Gregory M. Britton, editorial director at Johns Hopkins University Press, discussed the two. One is content creation, which publishers have legal concerns about; the other is content management, or editorial tools, which JHU encourages. “I think it would be foolish for an author to submit a manuscript without running spell check on it before they turn it in,” Britton said, and he sees AI editing tools as analogous.

One of JHU’s authors, José Antonio Bowen, used AI to find all the places where he may have been repetitive in the manuscript, and he also used AI successfully to help him with fact-checking and citations. He disclosed all of this use to his editors. Some may be surprised that AI can find factual errors in a manuscript, given the problematic results it can generate, but much depends on the tool, the user, and the prompt. Which brings us to the next important point.

Authors are responsible for the quality and correctness of their work, whether they use AI or not. Even if the use of AI in content creation blurs the lines of intellectual property and originality, accountability stays with the author. That means you can’t blame the AI for getting something wrong; you remain responsible for vetting whatever it produces.

Even those who question the ethicality of generative AI believe that writers and students today should (or must) learn to use it. “What faculty and teachers call cheating, business calls progress,” Bowen said during the panel. “If you say you can’t use a tool or refuse to use it, your colleagues who use the tool will complete their work faster and better.” In other words, AI is raising the average. However, Bowen said, “AI is better than 80 percent of humans at a lot of things, but it’s not better than the experts. … The best writers, the best experts are better than AI.”

AI is fueling translated works.

Machine translation has been around for a long time, but advances in generative AI are leading to a renaissance in book translation. Once again, a Book Industry Study Group panel examined how AI is being used right now to translate and to assist human translators; panelists included Robert Casten Carlberg, CEO and co-founder of Nuanxed, a translation agency.

Because AI-assisted translation is dramatically cheaper and faster, it has the potential to grow the market for translations and lead to new jobs in the management of translations. Founded in 2021, Nuanxed works mostly on translating commercial fiction between European languages, using a hybrid process that includes AI tools before, during, and after translation. The firm passes savings on to publishers while still paying human translators a good market rate. Carlberg said, “Most publishers we start working with are very skeptical to the way we are working but realize once they’ve tried it, the quality is good, and the readers really like it.” The authors like it too, he added.

Carlberg’s firm is growing fast, and he’s hearing from more translators who want to work with Nuanxed. He says their big value add is that they pass every translation through the appropriate “cultural lens” and make sure the work is coherent throughout.

Yes, there are still problems and valid fears.

Some writers fear that AI use will pollute the market (as it’s doing now) and lead to various types of AI fraud, the kind of thing that happened to me. Some form of this fraud has existed for as long as Amazon KDP or digital publishing has existed, only it’s more prevalent now and easier to execute with AI tools. I sometimes get upset about the pollution as well, and what it might mean for writers and publishers over the long term. But I’m hoping we’ll also develop methods of filtering out the garbage, just as we have in the past.

The other concern is that AI-generated work will be less creative and interesting in the long run, since AI tends to generate what’s average or what’s already dominant in the culture. For example, a recent study showed that AI can boost creativity individually while lowering it collectively. (A friend of mine who reads a lot of heavily AI-assisted or AI-generated genre fiction said she’s read five novels recently, all featuring a main character named “Jaxon.”) That’s what AI does: revert to the mean, to what’s most predictable. I expect more progress and more tools that modify these predictable outcomes when they’re not what the user wants.

I’ll close with the words of The Atlantic’s CEO Nicholas Thompson:

AI is this rainstorm, or it’s this hurricane, and it’s coming towards our industry, right? It’s tempting to just go out and be like, “Oh my God, there’s a hurricane that’s coming,” and I’m angry about that. But what you really want to do is, it’s a rainstorm, you want to put on a raincoat and put on an umbrella. If you’re a farmer, you want to figure out what new crops to plant. You want to prepare and deal with it.

And so my job is to try to separate the fear of what might happen and work as hard as I can for the best possible outcome, knowing that because I have done a deal with an AI company, people will be angry because AI could be a very bad thing, and so there’s this association. But regardless, I have to try to do what is best for The Atlantic and for the industry.

If you enjoyed this article, check out my paid newsletter, The Hot Sheet, which always has the latest news and developments related to AI and book publishing.
