Research Team Says Their AI Detector is 97% Accurate
Plus, LeBlanc leaves and no one mentions the cheating. Plus, oh, Trevor. Plus, Utah is the state most likely to cheat? Probably not.
Issue 262
If you enjoy “The Cheat Sheet,” please consider joining the 15 amazing people who are chipping in a few bucks via Patreon. Or joining the 28 outstanding citizens who are now paid subscribers. Thank you!
Research: Our AI Detector is 97% Accurate
In Issue 259, I mentioned that I’d get to the recent research on AI detection from the University of Darmstadt in Germany. The paper is by Kavita Kumari, Alessandro Pegoraro, Hossein Fereidooni, and Ahmad-Reza Sadeghi — all from Darmstadt.
In Issue 259, we noted that the paper says that:
The most effective online tool we examined demonstrated a success rate of less than 50%
As I also noted at the time, the phrase “we examined” was doing some pretty heavy lifting since the research team did not examine any of the best or most popular systems — a recurring and very annoying research fault. People keep testing bad systems and reporting that all systems don’t work.
And that this 50% number was only for detecting AI, not overall accuracy.
But most importantly — and I’ll bet any amount of money on this — this study will be quoted as saying that AI similarity detectors do not work. It’s an easy bet because it already has been used that way. When, in fact, the purpose of the report was to see if the team could build a detection system that works. They named it DEMASQ and report:
Our evaluation demonstrates that DEMASQ achieves high accuracy in identifying content generated by ChatGPT.
Imagine that. An AI detector with “high accuracy.”
Their solution involves something with vibrations and phonic energy. I have no idea what they did. Something about drumheads and the Doppler effect. And there’s math. Based on what I can understand, it seems unworkable. Anyway, they say that:
it achieves an high accuracy of 97% on a representative benchmark dataset that contains diverse prompts from both ChatGPT and humans
Back to the tests, the team says they used a pre-existing data set of GPT and human texts and:
expanded the dataset by incorporating responses from popular social networking platforms (such as Reddit and Wikipedia Q&A)
I have no insight into their existing dataset, but you can probably guess how I feel about using Reddit and Wikipedia. Not good. I feel not good about it. Moreover, here are the systems the research team tested:
Bleumink et al., ZeroGPT, OpenAI Classifier, GPTZero, Hugging Face, Guo et al., Perplexity (PPL), Writefull GPT, Copyleaks, Cotton et al., Khalil et al., Mitrovic et al., Content at Scale, Originality.ai, Writer AI Detector, Draft and Goal, Gao et al., Liu et al. (Detector 1 - Task 1), Liu et al. (Detector 2 - Task 2), and Liu et al. (Detector 3 - Task 3)
Foremost, there are some big names missing — Turnitin, CheckforAI, and Crossplag (Inspera). In other research, those three systems were 96%, 96%, and 92% accurate overall, respectively, all three with zero false positives (see Issue 250). And two of those — Turnitin and Inspera — are probably the most-used checkers anywhere.
Of the 20 systems this team tested, I have even heard of only eight of them. And most of those have been repeatedly shown to be terrible. In other words, this ain’t the A-team being put through its paces.
But even these flawed systems were not total failures. Of the 16 systems for which there are scores, 14 were 90% accurate or better at identifying human-written content, which is where you want these systems to be the most accurate. To the extent that you trust the data samples, the tested systems struggled to reliably pick up AI. And if you take an average of success — the rate of accuracy with both human and AI material — all but one system scored above 50%. Hugging Face scored a 36.8%, which proves my point. Freaking Hugging Face? Come on. What are we doing here?
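To be concrete about that “average of success”: it is just the mean of two rates, a system’s accuracy on human-written samples and its accuracy on AI-generated samples. A minimal sketch of the arithmetic, with placeholder numbers rather than any tested system’s actual scores:

    # "Average of success": mean of a detector's accuracy on human-written
    # text and its accuracy on AI-generated text.
    # Placeholder numbers for illustration, not figures from the paper.
    human_accuracy = 0.92  # share of human samples correctly labeled human
    ai_accuracy = 0.41     # share of AI samples correctly labeled AI
    average_success = (human_accuracy + ai_accuracy) / 2
    print(f"average success: {average_success:.1%}")  # average success: 66.5%

Note that a system which aces human text but roughly coin-flips on AI still clears 50% on this combined measure, which is part of why a score like Hugging Face’s 36.8% is so striking.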
Anyway, the authors of this paper may have a unique way of detecting AI text. They say they do. I cannot judge. But what I can judge is that this paper does not substantiate that AI similarity detectors don’t work because, once again, the good ones were not tested. And even the bad ones were not total failures, which is really the best I can say about them.
Paul LeBlanc to Leave SNHU
Paul LeBlanc has announced plans to leave Southern New Hampshire University.
And while I have thoughts about the kind of institution he created and what it’s done to higher education, this news is here because LeBlanc is on the Board of Directors at Chegg.
That’s right, the man who ran a large, online, inexpensive university was making hundreds of thousands of dollars selling cheating. It remains manifestly incomprehensible to me how this was not a red-letter scandal. It’s like the owner of a jewelry store serving as a Director at a company that makes fake gemstones: you’d have to question what he was selling.
But the real reason I’m sharing LeBlanc’s departure is that none of the news outlets that covered his announcement mentioned his paid affiliation with Chegg. Not one. Not Inside Higher Ed — which is not really a surprise, honestly. Not even the Chronicle of Higher Ed. No one, as far as I can tell.
Are Folks in Utah the Most Likely to Cheat?
Probably not.
I ask because a few weeks ago I received an e-mail press pitch from a company called Nootroedge, which said it:
analyzed the Google search volume behind ‘AI essay writer’ and scaled the data to each state's population size - identifying where students are most willing to use AI to cheat with their studies.
So, obviously, using Google to search for something does not mean you’re willing to use it. And there are a billion reasons that someone may look up “AI essay writer” that do not involve academic cheating.
At the same time, if you kind of assume that those reasons are more or less evenly spread across the country, a high search volume is interesting. Informative? Not really. But interesting.
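For what it’s worth, the scaling itself is trivial arithmetic: raw search volume divided by state population, times 100,000. A quick sketch of the presumable calculation; the search count and population below are my own assumptions, chosen only to land near the pitch’s Utah figure, not Nootroedge’s actual data:

    # Scale raw search volume to population: searches per 100,000 residents.
    # Both inputs are assumptions for illustration, not Nootroedge's data.
    def per_100k(searches: float, population: int) -> float:
        return searches / population * 100_000

    utah = per_100k(searches=600, population=3_300_000)
    print(f"Utah: {utah:.1f} per 100k")  # Utah: 18.2 per 100k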
From their pitch:
Shockingly, the data revealed Utah students are most likely to reach for AI assistance when it comes to writing tricky essays, with the highest average search volume across the past 12 months of 18.1 per 100k.
It turns out Utah students are by far more likely to use AI to cheat compared with the other states, boasting an average search volume that notably surpasses that of the second-ranking state.
Massachusetts secured the second position, boasting an average search volume of 13.3 per 100k. This suggests a noteworthy dependence on AI among students, although it still falls considerably lower than Utah's searches.
New York ranked as the third most likely state to cheat with a search volume of 13.2 per 100k, coming in just behind Massachusetts’s score of 13.3. Ranking in fourth place, Rhode Island had an average search volume of 13.1 per 100k.
In Texas, the study revealed an average search volume of 13.0 per 100k, placing it in fifth place. California secured sixth place with a search volume of 12.5 per 100k, indicating both states could benefit from some additional measures to discourage the use of AI for cheating.
Connecticut ranked seventh with an average search volume of 12.2 per 100k. In eighth place, New Jersey had an average search volume of 12.1 per 100k. Florida, ranking ninth, and Georgia, securing the tenth spot, exhibited average search volumes of 12.1 and 11.9, respectively.
Again, this in no way shows a likelihood to cheat with AI.
On the lower end of the search spectrum, the pitch says:
The study revealed South Dakota is the most honest state, with an impressively low search volume for ‘AI essay writer’ of just 5.6 per 100k. Alaska followed in second-best place with a search volume of just 6.4 per 100k.
Other states that seek less AI support include Delaware, with 6.4 searches per 100k, Montana (6.7 searches per 100k), and Wyoming, which also had just 6.7 searches.
Not searching for AI essay writing does not mean honest. But I’ve made that point. And I do think, looking at this list, that the common thread among South Dakota, Alaska, Delaware, Montana, and Wyoming may not be “honesty.”
Anyway, I’m sharing it because you may find it worth a glance in the same way I always look at those stupid maps showing what people in different states put on their hotdogs or what the best-selling soda pop is.
Oh, Trevor.
On Monday, Trevor Milton was sentenced to four years in prison for fraud — lying about Nikola, the electric truck company he founded, resulting in massive investor losses. Like, $660 million.
It’s in The Cheat Sheet because a New York Times article before his sentencing had this gem:
Mr. Milton also lied about his personal history, prosecutors said. He had said that he dropped out of college to pursue his entrepreneurial dreams even though he was expelled for paying someone to do his academic work.
Oh.
Wikipedia says Milton was bounced from Utah Valley University, a public school in Orem, Utah. First, massive credit to Utah Valley for standing behind its integrity and actually expelling someone. That’s incredibly, incredibly rare. And, if past is prologue, well — past was prologue. Liars gonna lie and cheaters gonna cheat. Anyway, good for them. Looks as though the school may have been just a bit wiser than those investors.
Also, I am now reconsidering that thing about cheating in Utah.
Class Notes
Scheduling — due to impending holidays, there will not be an Issue of The Cheat Sheet on Tuesday, December 26. Sorry to derail your planned family holiday reading. I will do one on or about December 28, though it may be the annual Best/Worst issue. Not sure yet.
Self-promotion — you may not be aware that I also publish EdTech Chronicle, which tends to be a bit less jaded and more general than what I write here. And, over at ETC, we’ve started an awards program — the Best in Education Awards. The nomination process is easy and the deadline is in January. Unlike other awards, this one is focused on people and their work instead of products. If you know anyone who may be interested, I thank you for sharing it with them.