More Research Shows, Again, that AI Detection is "Not Reliable." It's Not Right. Again.
Plus, Inside Higher Ed promotes another cheating company. Plus, class notes and reader survey results.
Issue 326
Subscribe below to join 4,210 other smart people who get “The Cheat Sheet.” New Issues every Tuesday and Thursday.
If you enjoy “The Cheat Sheet,” please consider joining the 16 amazing people who are chipping in a few bucks via Patreon. Or joining the 45 outstanding citizens who are now paid subscribers. Paid subscriptions start at $8 a month. Thank you!
Research: AI Detection “Not Reliable.” Again. But Wrong. Again.
Stop me if you’ve heard this one before.
Another research paper is making the social media outrage rounds, claiming that AI “detectors are not reliable in practical scenarios.”
The paper’s premise is that detectors are not reliable because, with enough effort, they can be beaten. Meanwhile, the authors’ own baseline results show detection accuracy at — I kid you not — 99%.
I think we need to discuss what unreliable actually means.
In any case, this paper, published this past February, is by Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, and Soheil Feizi, all from the University of Maryland.
The paper’s overarching theme is:
In this paper, we show that these detectors are not reliable in practical scenarios. In particular, we develop a recursive paraphrasing attack to apply on AI text, which can break a whole range of detectors, including the ones using the watermarking schemes as well as neural network-based detectors, zero-shot classifiers, and retrieval-based detectors. Our experiments include passages around 300 tokens in length, showing the sensitivity of the detectors even in the case of relatively long passages.
As with papers before it, most people are going to read the first line about “not reliable” and stop reading. Though, in reality, I think we can probably agree that most people will only read what someone else wrote about it on social media. Those people probably only read the first line.
But from this initial summary alone, we already have two issues.
First, this research team subjected AI detection to “recursive paraphrasing attacks,” meaning they took AI-generated text, had a different AI paraphrase it, and repeated that over and over again. Second, based on what I could find, 300 tokens is about 225 words. That’s too short to be adequately or accurately evaluated by most AI detection systems. In fact, Turnitin won’t even run an AI scan if the selected text is fewer than 300 words:
File must have at least 300 words of prose text in a long-form writing format
No surprise, in other words, that if you take short sections of text and spin them in paraphrase engines over and over and over again, the fingerprints wash away and AI detectors become disoriented. I am not sure how much of a “practical scenario” that is, but that’s what we have.
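For the code-minded, here is a rough sketch of what that attack loop looks like. To be clear, this is my own illustration, not the team’s code; the paraphrase and detector_score functions are placeholders for whatever paraphraser and detector a study happens to use.

```python
# A minimal sketch (mine, not the authors' code) of a "recursive
# paraphrasing attack": AI-generated text is fed through a paraphrasing
# model again and again, and the detector is re-checked after every pass.
# `paraphrase` and `detector_score` are stand-ins for whatever paraphraser
# and detector a study actually uses.

def recursive_paraphrase_attack(ai_text, paraphrase, detector_score,
                                rounds=5, threshold=0.5):
    """Paraphrase `ai_text` repeatedly and report the detector score each round."""
    text = ai_text
    for i in range(1, rounds + 1):
        text = paraphrase(text)          # round i of paraphrasing
        score = detector_score(text)     # detector's "how AI-like is this?" score
        flagged = "flagged" if score >= threshold else "passes as human"
        print(f"round {i}: score = {score:.2f} ({flagged})")
    return text
```

The point of the loop is the point of the paper: each pass scrubs away a little more of the statistical signal the detector relies on.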
To which, I say again — proving that AI detection systems can be beaten does not mean they do not work. Two entirely different goal posts. The example I usually give to make this point is door locks. Anyone can kick in a door, whether it’s locked or not. No one ever cites this reality to make the case that door locks are “not reliable in practical scenarios.” We still lock our doors. And we should.
But let me try another. AI detectors are like smoke detectors. They alert users to conditions that may require further inquiry and potential action. Papers such as this one wrap smoke detectors in wet blankets or encase them in ice, then check whether they can still detect heat or smoke. When their sensitivity suffers, the paper claims the detector does not work — that it is unreliable.
We treat no other detection technology this way. Car alarms, CAT scans, x-rays, airport metal detectors, collision sensors in cars, security cameras, fraud warnings on our credit cards — nothing.
No sane person says that because airport metal detectors cannot detect a carbon polymer gun, they do not work. We judge detection and warning systems on how well they detect what they were meant to detect under normal, highly typical conditions. Will an airport metal detector reliably alert a human about what may be a real gun or other weapon? If yes, it works. No one says, “well, bank robbers could wear masks, so let’s unplug the security cameras.”
This is not complicated.
Before going further, let me share that this paper is complicated, at least for me. I am not entirely sure what AI text the team assessed — some was watermarked, some was not — or which detectors the team evaluated. I think one was ChatGPT’s old, now discontinued, detector, which the paper identifies as the “RoBERTa-Large-Detector from OpenAI.” The paraphrase engine appears to be called “Parrot,” which I am not familiar with.
I mention this because very little of that is likely in a real-world academic setting. Parrot, maybe. I don’t know. But the team did not appear to evaluate any of the three most common detectors in scholastic settings, for example. Meaning that the scenario of a student starting with watermarked text and running it through Parrot over and over again, while a teacher checks the work with OpenAI’s detector, which was never popular and is now defunct, is unlikely to the point of absurdity — despite this paper setting such a condition as a “practical scenario.”
Nonetheless, this paper found:
our recursive paraphrasing attack on watermarked texts, even over relatively long passages of 300 tokens in length, can drop the detection rate (true positive rate at 1% false positive rate or TPR@1%FPR) from 99.3% to 9.7%
Yes — before the team decided to “attack” the detectors, they were 99.3% accurate at detecting watermarked AI text with a 1% false positive rate. Ninety-nine. To one. But they are unreliable.
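For anyone unfamiliar with the TPR@1%FPR jargon, here is a toy illustration of what it measures, using made-up detector scores rather than anything from the paper. You set the flagging threshold so that only 1% of human-written text gets flagged (the 1% false positive rate), then ask how much of the AI text still gets caught at that threshold.

```python
# Toy illustration of TPR@1%FPR (synthetic scores, not data from the paper).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical detector scores: higher means "more likely AI."
human_scores = rng.normal(loc=0.2, scale=0.1, size=10_000)  # human-written texts
ai_scores = rng.normal(loc=0.8, scale=0.1, size=10_000)     # AI-generated texts

# Pick the threshold so only 1% of human texts score above it (1% FPR).
threshold = np.quantile(human_scores, 0.99)

# TPR@1%FPR: the share of AI texts the detector still catches at that threshold.
tpr_at_1pct_fpr = (ai_scores > threshold).mean()
print(f"threshold = {threshold:.3f}, TPR@1%FPR = {tpr_at_1pct_fpr:.1%}")
```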
Then there is this, also with watermarked text:
after a single round of paraphrasing (pp1) with a watermarked [text] as the target LLM, TPR@1%FPR of watermark detector degrades from 99.8% to 80.7% and 54.6%
Before paraphrasing, watermarked text was detected 99.8% of the time. And remember, OpenAI, the maker of ChatGPT, can watermark their text. In fact, they promised they would. They have decided not to.
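For context on what “watermarked text” means here, below is a toy sketch of one published watermarking scheme, the “green list” approach from the research literature. It is not OpenAI’s method, which has never been made public, and the function names are my own. The generator quietly steers the model toward a pseudorandom “green” subset of words, and the detector simply counts how many green words show up — exactly the kind of statistical fingerprint that heavy paraphrasing washes out.

```python
# Toy sketch of detecting a "green list" text watermark (a scheme from the
# research literature; not OpenAI's actual, unpublished method).
import hashlib
import math

def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the previous token."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return (h[0] / 255) < gamma

def watermark_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """How far the green-token count sits above what unwatermarked text would produce."""
    hits = sum(is_green(a, b, gamma) for a, b in zip(tokens, tokens[1:]))
    n = len(tokens) - 1  # number of token pairs checked
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

A high z-score means far more green words than chance would explain, so the text was almost certainly generated with the watermark. Swap enough words via paraphrasing and the count drifts back toward chance.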
In another section of the paper is this:
with multiple queries to the detector, an adversary can paraphrase more efficiently to bring down TPR@1%FPR of the [detector] from 100% to 80%.
That test does not appear to use watermarked text. And — one hundred percent. And it can be brought down to just 80%.
And:
this detector detects almost all of the AI outputs even after a round of paraphrasing. However, the detection accuracy drops below ∼ 60% after five rounds of recursive paraphrasing
Even after five rounds of paraphrasing, detection was still better than 50/50. And it was nearly perfect even after initial paraphrasing.
I give up. Even after seeing numbers such as 99.3%, 99.8%, and 100%, all anyone wants to talk about is how AI detection does not work.
Paraphrasing, as Grammarly or QuillBot or others can do, can evade AI detection. We know this. It’s why these companies and others sell this service — to duck detection.
The paper continues:
deploying vulnerable detectors may not be the right solution to tackle this issue since it can cause its own damages, such as falsely accusing a human of plagiarism.
Vulnerable detectors — sure. Fine. They — like any other detection regime — can be beaten. It is possible that an x-ray does not show the broken bone. But I want to highlight the completely inaccurate idea that these detectors may be “falsely accusing a human of plagiarism.” Ugh.
Once again — for those who may need the repetition — AI detectors are airport baggage screeners. They flag things that need human inspection and inquiry. They do not make accusations. Not ever.
Finally, let me underscore a passage from this paper that would be far better suited to making the social media rounds, though it never will. The paper says:
Security methods need not be foolproof.
True.
We do not expect security to be 100% impenetrable. Good security thwarts the obvious and easy attacks and makes it hard and complicated to do bad things. It raises the possibility of detection and, ideally, consequences. That any security provision can be fooled does not mean it is bad security. In detecting AI, as with most other security interventions, the only alternative is not doing it — allowing the easy and obvious attacks. Tell me how not locking your doors is the better solution, please.
I also want to note briefly that the authors:
acknowledge the use of OpenAI’s ChatGPT to improve clarity and readability
While I am glad they disclosed it, using a chatbot to help write your research paper does not give me confidence. Maybe I am wrong, but I feel as though five professors at a public research flagship ought to be able to write a clear paper on their own.
Inside Higher Ed Promotes Cheating Company. Again.
At what point does a pattern become a choice? Or a reflection of operational philosophy?
I ask because this morning’s e-mail from Inside Higher Ed (IHE) had another ad from a cheating enabler — Grammarly, which does full-scale AI text creation, including citations, and conveniently offers an AI detector for students to “check” their work:
Grammarly’s AI detection shows users what part of their text, if any, appears to have been AI-generated, and we provide guidance on interpreting the results. This percentage may not answer “why” text has been flagged. However, it allows the writer to appropriately attribute sources, rewrite content, and mitigate the risk of being incorrectly accused of AI plagiarism.
Sure. If you happen to need to “rewrite content,” Grammarly has a free paraphrasing tool.
But this is not about Grammarly. Here’s the ad in IHE:
As I have pointed out again and again, IHE has a compromised relationship with cheating providers. I do not mean just taking their money to provide them credibility and promote their services; I also mean publishing news coverage that reliably downplays threats to academic integrity and sides with those who obliterate it for profit. I mean, IHE does it so often that it’s become cliché — see Issue 102, Issue 159, Issue 59, Issue 49, Issue 274, Issue 316, Issue 311, Issue 195, or Issue 167. And that’s not half of the examples.
And so, predictably, here we are again. IHE is promoting a service that students are using to shortcut academic work and short circuit learning.
The link promoted by IHE goes to “The Faculty Guide to Getting Started With AI.” Because, you know, AI is star-spangled awesome. And if you’re a teacher who’s gullible enough to let Grammarly guide you into welcoming their AI in your classroom, well, it’s pretty hard to turn around and penalize students for using Grammarly AI to cheat.
The guide:
offers 20 practical AI activities and 9 lesson plans that are easy to use and adaptable to various subjects. Whether you're just starting to explore AI or looking to expand its use in the classroom, this guide will help transform AI from a transactional tool into a powerful catalyst for new thinking and creativity.
Grammarly is writing lesson plans and learning activities now. Fantastic.
I also need to mention that Grammarly says this guide was:
Created in collaboration between The University of Texas at Austin
That is the same University of Texas at Austin that has advised faculty not to use AI detection software (see Issue 282). The school is now collaborating with known AI cheating tool Grammarly. I swear I have no idea what’s going on at Texas, but I know enough to know it is not good.
I can also speculate as to why Grammarly is just thrilled to be in a triumvirate with the likes of Inside Higher Ed and the University of Texas. I continue to be bewildered and disappointed by the other two.
Class Notes, Reader Poll Results, Etc.
In a recent Issue of The Cheat Sheet, I asked readers to let me know what value they received from our efforts by answering two poll questions.
If you are interested, here are the results:
So, news and research. I am not surprised, but good to know. Thank you for sharing your views and thoughts.
I want to mention that we’re up to 45 paid subscribers, including one new corporate/institution subscriber — which is very flattering and amazing. And since I’m obviously never going to be asked to write for Inside Higher Ed, it’s important. Thank you.
Finally, just passing along something I found funny, here’s a post on LinkedIn from Emily Nordmann, in Scotland. She says:
Hear me out, you're allowed to use AI and cheat as much you like but after graduating you're only allowed to use doctors, dentists, lawyers etc who used AI as much as you whilst studying.
Sounds fair. Absolutely love it.
Correct me if I'm misunderstanding, but: most critics of AI detectors seem to feel this way because they are worried about false positives, that is, about students who did their own work being falsely accused of cheating by their professors. But this article seems to say the opposite: that detectors are unreliable because they can too easily be manipulated to give a false negative. For me, this is not really a convincing reason not to use them.