Inside Higher Ed Reporting on AI Detection Falls Short
Plus, UAE bans Chegg-style business practices. Plus, a must-read from a high school paper. Plus, mountain racing and Lindsey Buckingham. You're welcome.
Issue 274
If you enjoy “The Cheat Sheet,” please consider joining the 17 amazing people who are chipping in a few bucks via Patreon. Or joining the 32 outstanding citizens who are now paid subscribers. Thank you!
IHE: AI Detection Shows “Mixed Results”
Inside Higher Ed, as you likely know by now, has a compromised relationship with cheating: it has taken money to promote cheating companies, misrepresented facts about misconduct, and the like (see Issue 49 or Issue 53). If you choose, you may believe that taking the money and getting the facts wrong are unrelated.
With that history, IHE published a piece this week about the performance of AI detection systems. It’s not that the piece is wrong. It’s just that it’s not really accurate either. And one of the problems with promoting cheating companies is that when an article such as this one comes out, its misleading and inaccurate parts raise a question: honest misfire on a complicated topic, or cover for the illicit conduct of advertisers?
Either way is not good. But the piece exists and so I’ll touch a few points on it.
The headline and subhead are:
Professors Cautious of Tools to Detect AI-Generated Writing
Mixed performance by AI-detector tools leaves academics with no clear answers.
“Mixed performance” is fair, in my view. If you spread every detector out on a table, you’d see that some perform quite well while others do quite poorly. Overall, then, “mixed” is fair.
The problem is that to reach that assessment, you’d need to weigh them all equally, and they are absolutely not all equal. The detection systems built for academic settings, the ones 90% of teachers and schools use, are very good. Some of the rest, as I’ve written dozens of times, are complete garbage. Some that have made their way into studies were built in labs and used in real life by no one at all. Not ever. Lumping every detector together in the same view is distortive at best, intentional misrepresentation at worst.
Let me try an example.
For decades, mountain car racing was very competitive and very popular. It was demanding. The corners tight, the inclines and declines steep. The consequences of error, profound.
Car makers, most notably German makers, made cars to race these courses. They had low centers of gravity; their engines were in the back of the car. The fancy ones had multiple transmissions so drivers could downshift inner wheels in tight corners at high speeds. These cars were, and still are, amazing. And if you wanted to race up and down an Austrian mountain and win, or even survive, you needed one.
In this example, what IHE and others are doing is taking every car on the road mountain racing — the family van, the cargo truck, a three-wheel delivery cart, the top-heavy SUV, all of them. Then they’re reporting that cars show “mixed results” on mountain courses.
It’s that bad.
Simplified: many of the AI detection tools they are counting in the results were not in any way designed for the tasks they’re being tested on. Worse, many, such as OpenAI’s now-shelved system and GPTZero, were just bad at nearly everything — the Corvair of AI detectors.
If you’re younger than 55, I am sorry but you may need to Google that. And here you thought “The Cheat Sheet” was not an educational read.
Anyway, what IHE and others insist on doing — counting ill-fitting or simply bad systems as part of an average — is exhausting, and it distorts the facts of important conversations.
IHE reports that:
Montclair announced in November—a year after the launch of ChatGPT—that academics should not use the AI-detector feature in a tool from Turnitin. That followed similar moves from institutions including Vanderbilt University, the University of Texas at Austin and Northwestern University.
I have not verified this for Montclair specifically. Or Texas. However, it is true that some schools have decided that they cannot trust their faculty to use AI detection correctly and prefer instead to have less information about student conduct and performance. It remains a bewildering choice.
Probably more importantly, IHE asks:
A big question driving these decisions is: Do AI-detection tools even work?
The answer is yes, they do. Despite what people believe and insist is true, there is zero evidence to the contrary (see Issue 250 or Issue 241). And I will say it again, if anyone has evidence that AI detection systems do not work, please send it.
Continuing, IHE quotes Holly Hassel, Director of the Composition Program at Michigan Technological University:
“You imagine it as a tool that could be beneficial while recognizing it’s flawed and may penalize some students.”
Yes, beneficial. And it’s a detection technology, so it’s going to have flaws.
But for what feels like the millionth time, AI detection tools do not penalize anyone. They provide information to educators who make their own informed decisions. People decide. I trust teachers to take information, weigh it appropriately, and make sound decisions regarding the work of their charges. They are not going to be perfect. But they never have been, which has nothing to do with AI or AI detection.
Back to IHE’s central question about whether these systems work or not. They report:
In June last year, an international team of academics found a dozen AI-detection tools were “neither accurate nor reliable.”
That same month, a team of University of Maryland students found the tools would flag work not produced by AI or could be entirely circumvented by paraphrasing AI-generated text. Their research found “these detectors are not reliable in practical scenarios.”
We covered that June study in Issue 250. And they actually tested 14 AI detectors, not a dozen. But still, it’s true that when the research team tested more than a dozen systems as equals, the average rate of success was not good. That’s the mountain racing problem again: it’s simply dishonest to lump all AI detection systems together and draw one conclusion.
And a reminder, as reported in Issue 250, that study actually found that:
three of the 14 tested detectors were more than 96% accurate overall — getting the human work perfect and missing just one of 18 AI works. Another was 92% accurate overall, getting all the human work right but missing two of 18 AI submissions.
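A quick back-of-the-envelope check on those figures, since 17 of 18 on the AI samples alone is only about 94.4%: the overall rate also counts the human samples, which the top detectors got perfect. The study’s human sample count isn’t quoted here, so call it H (my notation, not the study’s):

$$\frac{H + 17}{H + 18} > 0.96 \iff H + 17 > 0.96H + 17.28 \iff H > 7$$

So any test set with at least eight human samples, all classified correctly, puts a detector that misses one of 18 AI works above 96% overall, which squares with the numbers quoted above.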
So, to repeat — the technology that was built to do this, the technology that most people actually use, does it well. IHE didn’t mention that. Only that detectors overall were not accurate.
And it is true that AI detectors, like every form of integrity check, can be circumvented. People who are motivated to cheat will spend the effort and money to do it. As I have also said before, people break into houses. No one credibly suggests that door locks don’t work because they can be bypassed.
Seriously, put the quote above in that context. Here, let me. Because locks can be picked or removed by a locksmith, “Door locks are not reliable in practical scenarios.” That sounds nutty to me. Seriously, lock your doors. If what you have is valuable, if you want to protect it, do.
There are more issues in this story that probably should be addressed. But even my patience and steely reserve have limits. So, I am going to stop.
An Important Read from Menlo-Atherton High School
Menlo-Atherton High School (CA) ran a short piece in its student paper recently that has some things worth highlighting. Which I will do now.
[fun fact: Menlo-Atherton is where Stevie Nicks met Lindsey Buckingham. See, we’re educational.]
The article is about whether teachers can see if students cheat on Canvas, the popular learning management system that can also deliver tests and other assessments. The short answer is yes, if they want to. But that’s where this piece gets really interesting. And important.
And though it’s only February, the article has several candidates for the 2024 academic integrity quote of the year.
For one, the article quotes a student who is familiar with the monitoring systems:
“I only have Canvas tests in two of my classes, and in one of them pretty much everyone Googles the answers, but [we] are care-free because the quizzes have nearly no impact on our grades to begin with,” they explained. “On the other hand, it’s impossible in my other class where the teacher will grant an immediate zero if we even have another tab open on the browser.”
Print this. Pin it.
This student says “everyone” cheats when the assessments have “nearly no impact on our grades.” I love this because it’s been a source of angst for me for a long time that many educators think they can reduce cheating by reducing the stakes of assessments. I’ve maintained that the inverse is true, that by lowering the stakes you’re lowering the risk and actually incentivizing misconduct.
I also love that it’s followed immediately by the alternative. In classes where “the teacher will grant an immediate zero” — where that is expected — cheating is “impossible.”
Seems pretty clear to me. Students get this. Educators seem confounded by it.
Then there’s this, from a teacher at the school:
“However, ultimately in AP classes, the big assessments are what ultimately define your grade. So, if you cheat on a reading quiz, the teacher is likely going to decide that it isn’t worth their time and energy to call you out.”
Holy Vienna Sausage. From a teacher — that it “isn’t worth their time and energy to call you out.” For cheating. I’m genuinely horrified.
But again we see this dichotomy of high stakes and low stakes. Low-stakes assessments carry the message of low value and, in this case, zero enforcement effort. It seems both teacher and student agree that low-stakes assessments simply are not worth not cheating on. Still, it’s a good thing I am typing, because I am speechless.
But wait.
The piece also has this quote:
“On one of my finals freshman year, I was in a class that I felt was taught poorly, and almost everybody I knew in that class cheated on the final,” said another student. “The teacher didn’t think to check the Canvas history either, so to my knowledge, no one was ever caught.”
This isn’t about low stakes versus significant stakes, but it’s a reminder of the very frequent rationalization for misconduct — that the class was “taught poorly.” In other words, it’s the teacher’s fault. Or the school.
As we’ve seen in compelling research, being able to rationalize a choice to cheat is nearly essential in executing it (see Issue 64). So, this matters.
There’s also truth to this, unfortunately. When students infer that educators don’t care about the course, or that they don’t care about academic integrity, they won’t either. Whether that’s rationalization or setting standards, it’s real. Some of the responsibility for these situations has to move up the chain to Deans or Department Chairs or, in this case, maybe the Principal. Cheating will be common and undetected if teachers continue to teach without a mindset of care for the content and attention to misconduct — which are deeply related.
It’s not too often you get three bombshell quotes and core realities in one article. Amazing.
UAE Stiffens Penalties for Chegg-Style Cheating
The UAE has added new penalties for exam cheating in school, according to this news report.
The article says there’s a new fine, but it does not say how much. Even so, it does seem clear that this law applies to non-students, those who facilitate cheating:
As per a federal law, any individual who is not a student and who commits any of the three acts that are prohibited before, during, or after the examinations is subject to the punishment. These include the following: printing, publishing, promoting, transmitting, or leaking information connected to questions, answers, or the substance of the test in any way; modifying answers or the grades that are granted; and impersonating a student in order to allow them to sit the examination in his or her place.
For students, the three punishable acts are cheating, destroying exam papers, and verbal assault on exam staff.
Still, the law seems to prohibit most of what cheating companies such as Chegg and Course Hero do — printing, publishing, promoting, or transmitting information connected to questions, answers, or the substance of a test. In any way. Seriously: “information connected to.”
Good for the UAE.
That clicking sound you’re hearing in the background is Course Hero’s lawyers looking up what the fines are.
And, not being a lawyer, I wonder whether the “promoting” part may reach companies like Google or Facebook/Meta that advertise cheating products and services.
And it’s a good time to point out that none of that is prohibited in the United States. You can print, publish, and promote answers to tests as much as you want. You can even make millions doing it. Many do.