Earlier this month, an author’s career went down in flames. Hachette, one of what are often called the “Big Five” major publishing houses, announced that it would not be going ahead with the U.S. publication of a horror novel called Shy Girl by Mia Ballard this April, and would be pulping the existing copies that had already been published in the U.K. late last year. The reason? Artificial intelligence.
Rumblings gained traction on platforms like Reddit and Goodreads in January (and even earlier), as readers who’d gotten hold of the book in the U.K. said that Shy Girl bore the hallmarks of ChatGPT’s style of writing. One nearly three-hour-long video by the books-focused YouTuber Frankie’s Shelf dissected the novel in its entirety, pointing out things like endless repetition of the word sharp, excessive “rule of three” constructions, and various other supposed tells of A.I.-generated writing. Eventually, these accusations made their way to the New York Times, which verified claims from Max Spero, the founder of the A.I. detection program Pangram, that large parts of Shy Girl appeared to show patterns characteristic of A.I.-generated writing. In response, Hachette said it was pulling the novel, marking the first known instance of a major publisher walking back a book over accusations of A.I. use. (As for Ballard herself, she has not said much, barring a vague statement to the Times that echoed a now-deleted comment she left on the Frankie’s Shelf YouTube video, claiming that someone she knows had edited the book and may have introduced A.I. that way.)
In the past couple of weeks since this bombshell rocked the publishing industry, there has been a lot of noise—much of it angry—about Shy Girl, the threat of the machines invading what we read, and what A.I.-generated writing even looks like. I have now read Shy Girl after getting my hands on a copy through somewhat covert means, and can confirm that, whoever had a hand in writing it, human or otherwise, it is not a very good novel. I wanted to see if I, like whoever played a role in Hachette’s publication of this title, would have been fooled. But I also wanted to try to understand how something like this could have happened in the first place. How did a book make it into the market that, in hindsight, seems so obviously to have been written, at least in part, by generative A.I.? Many in the book world have been sending up flares warning that something like this was likely to happen at some point. What made them so sure? I spoke to various people across the publishing industry to find out.
The author and critic Emily C. Hughes read Shy Girl a few months ago and thought it was fine. Not life-changing, but OK. When the A.I. accusations came out, she was forced to reckon with the fact that she hadn’t noticed anything amiss on a first reading. Many have asked why an editor at Hachette didn’t spot that there was something off about the book. This likely had something to do with the fact that the book was self-published first. “When a publisher picks up a previously self-published book to bring it to a wider audience, the amount of editing that happens between the self-published version and the traditionally published version is minimal,” Hughes said. “It’s maybe a copy edit. And the logic there is, Well, people already like this, so we don’t have to mess with it too much.”
But it runs deeper than that, and people I spoke to said they expect A.I. usage to creep into printed books across the industry. Books in which ideas or plot, rather than the quality of the writing itself, are the key selling points will probably be affected the most, said one editor who used to work at a Big Five publisher and wished to remain anonymous. “Editors are being very clear that it’s not acceptable, but it’s hard to police without potentially offending someone who just doesn’t write well.”
The problem is not just that large language models are now widely available to anybody who wants to use them to clean up their writing, nor just that, because this technology is still so new, publishing houses are scrambling to refine their stance on it, although those are certainly both true. The biggest issue is that editors often aren’t picking up on A.I. usage because they simply don’t have time to edit books as closely as they would like to.
“With the constriction of the publishing industry over the past 20 years or so, fewer employees are asked to do more work and fill more roles. So there’s a lot more expectation placed on editors, and they have the same hours in the day that they did,” said Hughes. The majority of editors do their editing work during the evenings and on weekends, industry insiders told me, and have to devote most of their working life to acquiring books, working on the campaigns to sell them, and managing author relationships. “I would love to see editing become a core part of the job again,” said the anonymous editor. “If you did a poll that asked editors whether they’ve been told they spend too much time editing, I reckon you’d get a 100 percent response rate.”
“Quality control has collapsed,” said another editor. John Baker, a literary agent at Bell Lomax Moreton, said that editors are seeing their jobs transform more and more into project management, and that while publishing houses make good profits, “the last thing they spend it on is more staff. Everyone working well beyond capacity is the norm.”
Presuming that it is, unfortunately, unlikely that publishing houses will suddenly make an about-turn and hire more staff to free up editors for the actual work of editing, how do we move forward from here? What’s going to stop another Shy Girl from happening? A.I. detection software, while improving, remains fallible, often falsely labeling text produced by non-native English speakers as A.I.-generated. And the very nature of LLMs is that they are constantly learning. What may currently seem like easy ways to spot A.I. language—em dashes and the like, wink, wink—will soon be scrubbed out as the models learn that those em dashes are giving them away.
As I read Shy Girl, I did spot what I would think of as A.I. tells: overuse of parallelism, endless repetition of the same bits of vocabulary, awkward and abundant similes. I could quote any number of individual passages that rang alarm bells, such as this one:
My breath evens out, the outline of possibility taking shape. A job. The thought lingers, solid and improbable. A flicker of hope.
When the call ends, I sit for a moment, the phone warm in my hand, the silence loud. That fragile swell of hope trying to push through the cracks. But the feeling doesn’t last. The buzz of a notification snaps me back, its vibration rippling through the table like an electric jolt.
Nathan’s name flashes on the screen.
My chest tightens, anticipation blooming sharp and fast, and I unlock the phone with a clumsiness that betrays me.
But plenty of human writers, including non-native speakers, writers with autism, and people who make those stylistic choices simply because they want to, write that way, too. After all, LLMs write the way they do because they learned from us. What ultimately convinced me that A.I. had had a hand in the text was a feeling: the sense of an absence, of no person behind the words. Obviously, knowing the context, I was primed to look for it, but I felt it persistently while reading Shy Girl. The novel was haunted by a consistent flatness: one emotional note struck throughout, and a hollow one at that.
I spoke to one of the A.I. sleuths who first posted on Reddit about the potential use of A.I. in Shy Girl. Let’s call this woman Dora. She was a freelance literary editor for 12 years, until a couple of years ago, when she decided to get out of the game because of the deluge of ChatGPT-generated material she was having to read. “I found that, I think understandably, soul-destroying a little bit, because I was in publishing because I love books and I love art and I love humans,” she said. She got a copy of Shy Girl from the library because she’s a big horror fan, and the concept—a woman kept as a pet by an abusive man finds herself exhibiting more and more animal qualities—sounded compelling. From the very first sentence, she got the A.I. ick that had become so familiar to her: “I kept reading, and there was no phrase, word choice, grammar, structure, syntax, or punctuation mark, in my opinion, that ChatGPT would not have done.” But, again, more so than any individual detail, it was about a feeling she got when reading the novel. “I think it all comes down to pattern recognition,” she told me. “I mean, humans are good at it.”
Dylan Garity, a freelance editor based in Brooklyn who has edited more than 500 books, agreed that it’s not about individual tells: “It’s 100 things stacked on top of each other. And it’s less even about taking it down into those individual 100 things, and more about an overall feeling that you get upon reading it. That sounds kind of vague, and not back-up-able, but it is most of what we have to go on.”
That instinctive feeling is exactly what A.I. companies will inevitably train their LLMs not to trigger. As for the rest of us, though, in the absence of reliable software or some foolproof means of backing up our suspicions when that feeling is triggered, we’re going to have to rely on trust. The Authors Guild, the largest professional organization for writers in the U.S., has launched a program called Human Authored, through which authors can register their book to use a certification mark on its cover affirming that no A.I., beyond minimal spelling and grammar checks, was used in its production. This will be an honor-based system: Authors can say they don’t use A.I., and we, as readers, will be tasked with believing them.
This is what’s so damaging about this Shy Girl debacle: not that one book that looks like it had A.I.-generated content in it made its way into a small number of unwitting readers’ hands, but that a high-profile scandal like this erodes trust in the whole publishing process. “A big concern I have is that publishing is blinkered about how much public goodwill there is towards it as an industry and how a few robust scandals could completely undermine it,” another anonymous editor told me.
Hughes, the author and critic, is most worried about the trust element, too. Now “there’s this desire not to get taken in and not to get fooled on the part of the audience” when they’re reading, she said. Reading is an act “that has brought so many people so much joy over so long, and to have that doubt introduced into the process is just really depressing. I say this as a reader, as an author, and as a critic,” she lamented.
For my own part, I’m an A.I. hater. If we could dig a hole in the desert, dump all the servers in there, and never speak of it again, I’d be right there with my shovel. And I’ll hold my hands up: I, too, have been fooled more than once by A.I.-generated videos I’ve seen on Instagram showing, for instance, cats doing outlandish things. As a result, when I’m scrolling there, I now do so with a new and unpleasant skepticism cutting through what small enjoyments are to be found in mindlessly looking at social media. Is this real, or am I being duped? I can’t help but think. I don’t want to have to do this when I’m reading, too. Trust, once broken, is notoriously hard to regain.