Issue 215
Learning Moments from UC Davis
There were two articles in recent weeks, in two high-profile outlets, about AI detection and academic misconduct at the University of California, Davis.
One ran in USA Today in April; the other appeared a few days ago in Rolling Stone. Both are broadly about the same thing - AI detection software flagging a student paper and the student(s) saying the software was wrong. Though it should not be called that, this is known as a “false positive” indication.
The twin articles are a great jumping-off point to review a few things about AI-created material, AI detection, and the academic integrity process.
The Examples
The bones of the USA Today article are that a student’s take-home midterm was flagged by a professor as likely generated by AI. According to the reporting, the teacher was suspicious of an answer that, “(bore) little resemblance to the questions” and checked the text with GPTZero. After receiving a score indicative of AI creation, the teacher assigned a failing grade and referred the student for an academic misconduct case. The student denied using AI assistance. The paper says:
He eventually was cleared of the accusation.
During the reviews of his case, the student shared document revision data from Google Docs and, according to the reporting, also presented:
a slew of research on the fallibility of GPTZero and other AI detection tools
The narrative continues that, a month after the referral for misconduct, it was nullified and:
an associate director with the University's Office of Student Support and Judicial Affairs wrote to [the student]: "After talking with you, talking with your instructor, and doing my own research into indicators of AI-generated text, I believe you most likely wrote the text you submitted for your midterm. In fact, we have no reliable evidence to the contrary."
The Rolling Stone story follows a similar arc, though with a different student - also at UC Davis. It carries the headline:
She Was Falsely Accused of Cheating With AI — And She Won’t Be the Last
This student’s work was flagged by Turnitin’s detector, though the story does not mention whether there was some indication beforehand that there might be issues with it, as was the case in the previous example.
Like the USA Today story, though, this second student said she did not use AI and showed revision history from a Google Doc. Two weeks after the case was initiated, it was dismissed and closed.
Both students complained of the stress of the inquiry.
The Analysis
To start, it’s important to contextualize that, like every other school on the planet, UC Davis has a cheating problem (see Issue 98). Though neither article mentioned it, or the increase in academic misconduct generally, that problem probably does mean that teachers and administrators at UC Davis were - and are - on alert for cheating. And they should be.
USA Today
Since it was first, I’ll begin with the USA Today example and point out that GPTZero is pretty awful. I’ve said this before and often (see Issue 191 or Issue 189 or Issue 187 or especially Issue 180). By my rough scorekeeping, it’s the second-worst AI classifier on the market. Only ChatGPT’s own detection tool is worse. If GPTZero gave me an assessment score, I’d get a second opinion.
I mean, they don’t outright say it, but it seems that UC Davis knows this, since a spokesperson is quoted in the USA Today piece as saying:
We had a number of professors who submitted reports based on the output of GPTZero. As we learned of the fallibility of these tools, we shared information with instructors
Pretty blunt, if you ask me.
I’ll also note quickly that the USA Today article mentions that GPTZero is getting out of the detection game:
[The founder] said GPTZero is pivoting from its former artificial intelligence detection model, and its next version will not be detecting AI "but highlighting what's most human."
I confess I have no idea what that means. But it did make me laugh.
It is worth noting as well that the USA Today example arose from a take-home exam. We’re just never going to learn this lesson, it seems.
Anyway, when everything is balanced out, the example from the USA Today narrative feels nearly perfect. It shows exactly how the system is supposed to work - a teacher had a funny feeling and checked. Getting confirmation - albeit from a flimsy source - they acted. The teacher did not rely on the score alone or even get a score until provoked. I would have preferred the teacher talk to the student before making a final call, but still, that’s pretty good. Two points of evidence feels as though it should be enough to act.
Further, the school reviewed the case, considered evidence, heard the student, and kicked it. Far from a failure, I’d say this is how we want these things to go - teacher first, technology second, thorough review and disposition third.
Further, from the USA Today story:
The university is advising professors to use "a variety of tools, along with our own analysis of the student’s work, to reach a preponderance of evidence rather than relying on a single tool"
That’s great. And that’s exactly what happened in this case. Again, to the letter of the law.
Consider further that this first example was resolved, start to finish, in a month. That’s lightspeed for academic hearings. Good for UC Davis. So, aside from the student’s stress during that process, which I do not wish to minimize, nothing happened. The universe is as it was.
In other words, USA Today could have just as faithfully reported: UC Davis integrity system works.
A final note on the USA Today story - I’m not sure whether the professor gave their account during the review process. Honestly, I’m not a fan of making teachers testify at these things, for many reasons. But if she did and if she felt that the submitted work was inauthentic, that should have carried significant weight. At least it would have with me, GPTZero aside. And I do hope that in dismissing cases such as this one, the University is not disincentivizing detailed review and reporting by faculty.
Now Rolling Stone
Jumping in, the story says this student:
received an email from her professor indicating that a portion of it had been flagged by Turnitin as AI-written.
That’s not what Turnitin’s software - or any AI-detection software, for that matter - does. That includes GPTZero. Despite their name, these systems do not flag material “as AI-written.”
Instead, they look for words or word patterns that bear the hallmarks of computer-generated text. They look for very average writing, very predictable word choices and sentence structures - the kinds of things a computer would pick when using an algorithm. When these review systems flag text, they are not declaring it was generated by AI; they are saying it is statistically like something AI could have generated. And because that’s what they do, they flag text that has been run through paraphrasers or grammar-changing software or, on occasion, text simply written by a very average, very predictable writer.
That’s why all the companies that make these products say they should not be used on their own to initiate misconduct accusations or cases. Declaring something as produced by AI is simply not what they do. So, Rolling Stone goofed there.
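For the technically curious, here is a toy sketch of the statistical idea at work. To be clear, this is not Turnitin’s or GPTZero’s actual method - those are proprietary, and real detectors use large language models rather than a toy bigram table - it is just a minimal illustration of scoring text by how predictable each word is, which is also why very average, very predictable human writing can get flagged.

```python
# A minimal, illustrative sketch of the statistical idea behind
# AI-text detectors. NOT Turnitin's or GPTZero's actual method
# (those are proprietary); real systems use large language models.
from collections import Counter, defaultdict

def train_bigram_model(corpus_texts):
    """Count which word tends to follow which, from a reference corpus."""
    follows = defaultdict(Counter)
    for text in corpus_texts:
        words = text.lower().split()
        for prev, curr in zip(words, words[1:]):
            follows[prev][curr] += 1
    return follows

def predictability_score(model, text):
    """Fraction of words that were the model's single most likely next
    word. A high score means very 'average,' predictable writing - the
    kind detectors flag, whether a machine or a person produced it."""
    words = text.lower().split()
    hits, total = 0, 0
    for prev, curr in zip(words, words[1:]):
        if prev in model:
            total += 1
            most_likely, _ = model[prev].most_common(1)[0]
            if curr == most_likely:
                hits += 1
    return hits / total if total else 0.0

model = train_bigram_model([
    "the study shows that the results are significant",
    "the results show that the study is significant",
])
print(predictability_score(model, "the study shows that the results are significant"))
```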
As mentioned, their piece is silent on whether this example was initiated by a professor’s suspicion or by the Turnitin flag alone. One approach is absolutely correct. The other is not.
I’ll add that most of the Rolling Stone article is complaining about the academic integrity process at Davis - how the student said it was stressful, how the “very long” e-mail was short on details, how she felt intimidated and “in the dark,” and how having to read and understand the school’s academic integrity process was distracting from school work. I particularly liked this bit, from the student:
it’s just not fun to have to figure out the school’s complicated academic integrity policies while doing classes.
She’s a senior. She’s applying to law schools.
Moreover, according to the article, the process in this case took “two weeks.” Which, again, is very, very speedy. I do not think we can ask for more efficiency from any school. And, again, her case was dismissed.
Without knowing more about how this incident started or how the review went down, it’s hard to know whether this is an example of a process working or not. Given what we learned from USA Today, it’s at least possible this is also the product of successful detection, analysis, and outcome.
Overall
The reason I dragged you through all that is to make these final, hopefully few points regarding the new normal in AI-squared - the era of academic integrity and artificial intelligence.
Students who use Google Docs to write assignments will have an advantage in proving authenticity, should they be called upon to do so. Schools should immediately begin suggesting their use or, better yet, require students to write in LMS systems that can track time on page, revision history, and keystroke data - all of which can be very helpful in parsing originality. One way or the other.
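To make that concrete, here is a hypothetical sketch of how such revision metadata could be screened. The Revision structure, its field names, and the one-big-paste heuristic are my illustrative assumptions, not any actual LMS API or Google Docs feature - the point is only that many small edits over time look different from one giant paste.

```python
# Hypothetical sketch: screening revision metadata for paste-like
# events. Field names and thresholds are illustrative assumptions,
# not any real LMS's API. A heuristic for human review, not proof.
from dataclasses import dataclass

@dataclass
class Revision:
    timestamp: float   # seconds since the writing session began
    chars_added: int   # net characters added in this revision

def looks_incrementally_written(revisions, paste_threshold=2000):
    """Return True when the text arrived as many small edits
    (consistent with drafting) rather than one giant revision
    (consistent with pasting in finished text)."""
    if not revisions:
        return False
    total = sum(max(r.chars_added, 0) for r in revisions)
    largest = max(r.chars_added for r in revisions)
    # If a single revision supplies nearly all of the text, treat it
    # as a paste-like event worth a closer human look.
    return largest < paste_threshold and largest < 0.8 * total

history = [Revision(0, 120), Revision(300, 450), Revision(900, 600),
           Revision(1500, 380), Revision(2100, 510)]
print(looks_incrementally_written(history))  # True: many small edits
```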
There is no such thing as a “false positive” in AI detection because the systems don’t detect AI specifically. What we mean, incorrectly, is that a detection technology flagged text that was not generated by AI or not linked to cheating. I don’t think we’re ever going to fix that glitch in the language but it’s important to note that when a detector flags text, it’s working correctly. It just may not be flagging text for the reason you think, which is exactly why deeper inquiry is necessary.
We should all exercise caution in using the term “false positive,” not only because it’s technologically untrue but because the counter source of information - students - is unreliable. Not everyone admits to misconduct. People lie when caught. They rationalize and obfuscate. They challenge the evidence. If, for example, a teacher suspects AI usage and an AI detector flags the text as well, but the student denies it, that can’t possibly be considered a “false positive.” At best, it’s counter-indicated and unclear. At best.
Further, even if an integrity case is eventually dismissed due to countervailing evidence, it does not mean anything was “false.” A dismissed inquiry does not mean no misconduct took place. Many times, inconclusive situations should be resolved in favor of inaction. I don’t see anything wrong with that. That said, using these correct and lenient dismissal outcomes to sow doubt about the integrity of the process itself feels misguided and dangerous.
Which brings us to the necessity that AI detectors should never, ever be used on their own to initiate or decide a case of academic misconduct. At best, a detector score is evidence, not conclusion. I often liken it to a smoke detector: it’s a pretty good indicator of a problem, but that problem may or may not be an actual fire. Human decision-making is essential. In one example, the USA Today case, that was done right. We don’t know about the other.
Personally, I think both of these examples were handled well and correctly by UC Davis. With evidence that the questioned work may have been original and authentic, the cases were dropped - and quickly. As such, I’m certain neither one was worthy of national press attention. But here we are.
International Quick Bites
Since that first piece was so long, I’m closing with International Quick Bites today.
China has issued an official warning against cheating ahead of its national college entrance exams. The warnings include police action and cite the case of an exam cheat from 2020 who was sentenced to three years in prison. It also advises parents not to trust tutors or preparation companies that promise results. The national exams began Wednesday and will be taken by a record 12.9 million students. Similar coverage can be found here.
In Uganda, the state Business and Technical Examinations Board has increased penalties for exam cheating. Going forward, a finding of cheating will invalidate academic results not just for the exam but for the entire semester. Those who are caught cheating three times will incur a lifetime ban on taking exams.
In India, five students were arrested for sneaking phones into an exam and using a messaging app to share answers.
In Morocco, four students have been arrested for possession of equipment that was intended to be used in cheating on exams including, “64 electronic cards, 30 micro batteries, a set of wireless earphones, wires and electrical equipment, a laptop computer, as well as an amount of money suspected of being the proceeds of this criminal activity.”