Bloomberg's Latest Article on AI and Detection is Exhausting
Plus, Brown returns to on-paper, in-class assessments. Plus, just another example of cheating service marketing.
Issue 318
Subscribe below to join 4,156 other smart people who get “The Cheat Sheet.” New Issues every Tuesday and Thursday.
If you enjoy “The Cheat Sheet,” please consider joining the 16 amazing people who are chipping in a few bucks via Patreon. Or joining the 42 outstanding citizens who are now paid subscribers. Paid subscriptions start at $8 a month. Thank you!
Bloomberg Writes The Same Article Countless Others Have Written — Just as Sensational, Just as Misleading
I am exhausted.
It’s too much to ask you to care about my state of affairs, but I am simply tired of fighting this fight, calling out sloppy and misleading journalism about academic integrity over and over and over again. I’m not sure what my sin was, but like Sisyphus, I feel I’ve pushed this rock up this hill a thousand times already, only to be rewarded with having to do it again. And again.
Motivating this is an article from Bloomberg, the latest pre-digested article on generative AI and tools to detect it in academic settings. We’ve read this a dozen times already. Honestly, I preferred to ignore it. But a few people sent me the article and some people have been sharing it on social media, declaring that using AI detectors was immoral — yes, immoral.
I will try — try — to not spend too much time on this because, to underscore the point, nothing I am going to say is new. The boulder has been up this hill already.
Let me begin on immoral and disagree. To me, immoral is not doing everything you can do to stop cheating. Because we know that a lack of effort to stop academic fraud actually incentivizes academic fraud, not using AI detection tools as part of a cheating prevention playbook is indefensible. Allowing honest students to be mugged, giving grades and degrees for unearned work, watching the rotting away of academic credentials, putting unqualified and dangerous work into public spaces — that’s immoral. Some people are so scared of having an awkward, honest conversation and so worried that their students won’t like them, that they simply close their eyes to injustice.
Ask me how I really feel.
The Headline, Etc.
Breathing — on to Bloomberg. And we can start with their headline:
AI Detectors Falsely Accuse Students of Cheating—With Big Consequences
I’ve lost count of how many times I have said this but AI detectors do not accuse anyone of anything. They are airport metal detectors. If they go off, it is not an accusation of terrorism. If it was — where is the Bloomberg reporting that airport security measures falsely accuse millions of travelers a year? Why have they not broken the case that car alarms falsely accuse millions of people of auto theft? They’d win a Pulitzer, I am sure, for their in-depth reporting on the false accusations of fire alarms.
And, on “Big Consequences,” Bloomberg starts — as all pearl-clutching reporting must today — with a human story. This one about Moira Olmsted, a parent who saved her money to go back to college. The story says she was accused of submitting work that “was likely generated by AI.”
The big consequence — a warning:
“she received a strict warning: If her work was flagged again, the teacher would treat it the same way they would with plagiarism.”
I mean, the student did have to speak with people to defend her work. And, I am sure it is stressful to be suspected of misconduct and have to engage in a review. But, when it all comes out in the wash, we have a warning. Not a grade reduction, not suspension, not expulsion — a warning.
Here I will also say, yet again, that when someone accused of cheating says they did not cheat, it does not mean they did not cheat. I am not saying this student cheated and lied about it. I am saying students cheat and lie about it. But when it comes to media coverage, their word is always uncritically accepted as fact, case closed — they never consider the possibility that a cheater may be lying.
Not done, Bloomberg quotes another student, Ken Sahib, who it says is multilingual and was given a zero on a writing assignment because of suspected AI use. The article does not say if his grade was reversed, only that:
Sahib says he ultimately passed the class, but the incident fractured his relationship with his professor. “After that we barely spoke,” he says.
We may assume that if the accusation of misconduct had cost Sahib a position in graduate school, or a scholarship, or long administrative wrangling, he would have said so, and I have no doubt Bloomberg would have reported it.
The “big consequences” here amount to a fractured relationship. Tragic, but I am not sure I’d categorize it as consequential. Which also overlooks that this entire dynamic of an accusation and a fractured relationship could have easily happened in 1980 or 1950. It does not necessarily have anything whatsoever to do with AI.
The Good Are Good
I will give credit to Bloomberg here, for this reporting:
The best AI writing detectors are highly accurate, but they’re not foolproof. Businessweek tested two of the leading services—GPTZero and Copyleaks
Well, half credit.
Yes, good detectors are highly accurate. But if we just accepted “highly accurate,” then we would not have a story. And nothing is foolproof.
But as for testing GPTZero and Copyleaks as “two of the leading services,” that’s just a joke. Yes, they have high visibility. But as has been shown many, many times, GPTZero is highly unreliable (see Issue 250 for just one example). Copyleaks, meanwhile, is a business partner with Chegg and other cheating providers (see Issue 208). But to Bloomberg — all good. GPTZero and Copyleaks will stand in for all AI detection systems.
Later, Bloomberg quotes:
The AI detection service QuillBot
You have to be kidding me. Sometimes, I confess, the boulder just flattens me.
Bad, Bad Data — Bad, Bad Test
But, I can hear you perhaps saying, Bloomberg tested them. And they did, on:
a random sample of 500 college application essays submitted to Texas A&M University in the summer of 2022, shortly before the release of ChatGPT, effectively guaranteeing they weren’t AI-generated. The essays were obtained through a public records request, meaning they weren’t part of the datasets on which AI tools are trained.
Cool, cool.
But before I explain why that’s still a real problem, let me share that Bloomberg found:
the services falsely flagged 1% to 2% of the essays as likely written by AI, in some cases claiming to have near 100% certainty.
One to two percent — or nearly exactly what the best AI detection companies say their likely error rates are. Imagine that.
But to show the glaring problem with the data set Bloomberg used, let me quote Bloomberg, from later in this same article:
Bloomberg found using Grammarly to “improve” an essay or “make it sound academic” will turn work that passed as 100% human-written to 100% AI-written. Grammarly’s spell checker and grammar suggestions, however, have only a marginal impact on making documents appear more AI-written.
Grammarly, QuillBot and a host of other AI-powered rewriting and paraphrasing services existed well before the summer of 2022, when Bloomberg pulled their data. QuillBot launched in 2017. In other words, in 2022, before ChatGPT, it was very possible to use any one of a number of services to rewrite or improve an essay. Essays run through those services will trip modern AI detectors because — wait for it — the text was at least partly composed by AI.
So, when Bloomberg finds three 2022 essays from a set of 500 that are “flagged as AI,” it is at least possible that the AI detector is not wrong. And even if the detection is absolutely wrong, that’s a 0.6 percent error rate, by the way. Point six percent.
Bloomberg found nine other essays in the pile of 500 that it says were “part AI and part human” — a 1.8% error rate, assuming those are actually errors. Bloomberg does not say at what threshold the line was drawn — 10% likely AI? 25%? Whatever it was, Bloomberg counted them as wrong.
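Just to make the arithmetic behind those percentages explicit, here is a minimal sketch using only the counts reported in the piece (3 fully flagged essays, 9 “part AI” essays, 500 total); nothing else here comes from Bloomberg’s data, and treating every flag as an error is an assumption the article itself does not justify.

```python
# Back-of-the-envelope rates from the counts cited above.
# The counts (3, 9, 500) come from the article; everything else is illustrative.

total_essays = 500
fully_flagged = 3       # essays flagged as likely AI-written
partly_flagged = 9      # essays scored as "part AI and part human"

fully_flagged_rate = fully_flagged / total_essays    # 0.006
partly_flagged_rate = partly_flagged / total_essays  # 0.018

print(f"Fully flagged: {fully_flagged_rate:.1%}")    # 0.6%
print(f"Partly flagged: {partly_flagged_rate:.1%}")  # 1.8%
```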
Interestingly, Bloomberg also reports in this same article that:
AI detection startups have attracted about $28 million in funding since 2019, according to the investment data firm PitchBook, with most of those deals coming after ChatGPT’s release.
Most of these deals for AI detection services were after ChatGPT, but not all. As early as 2019, AI was being used to assist or modify writing and companies were building technologies to try and find it. Once again, AI use in writing was available and used well before the 2022 sample that Bloomberg acquired — and that’s according to their own writing in this very story.
I cannot know whether the essays tested by Bloomberg were corrupted with paraphrasing tools and therefore not “original” work. I can say it’s at least possible that some were. One to two percent corruption? I’d buy that. But if that’s true, these are not false flags or errors at all.
About their test, Bloomberg says:
Even such a small error rate can quickly add up, given the vast number of student assignments each year, with potentially devastating consequences for students who are falsely flagged. As with more traditional cheating and plagiarism accusations, students using AI to do their homework are having to redo assignments and facing failing grades and probation.
Wait, what? Bloomberg says that students using AI to do their homework are having to do it again or face failing grades and probation? What a scandal.
Come on.
Other Bits
The article says:
classrooms remain plagued by anxiety and paranoia over the possibility of false accusations, according to interviews with a dozen students and 11 teachers across the US. Undergraduates now pursue a wide range of time-consuming efforts to defend the integrity of their work, a process they say diminishes the learning experience. Some also fear using commonplace AI writing assistance services and grammar checkers that are specifically marketed to students, citing concerns they will set off AI detectors.
Yes, because the real fear is the very unlikely chance of a false accusation — which is different from, and considerably less likely than, the 2% chance of an incorrect flag from an AI detector. No fear at all, not even a mention, of the 40% of students who admit to using AI on their academic work. Bloomberg is really, really fixated on how flammable the drapery is on the Titanic.
And by “efforts to defend the integrity of their work” Bloomberg means that students frequently take their work to available AI detection systems to pre-check it before they submit it. This is not erasing fingerprints so you don’t get caught — of course not. It’s defending your integrity. Never in my life have I thought to take my bank records to a forensic accountant just to be sure no one can falsely accuse me of fraud. I’m honest, but, you know, just to be safe.
Consider:
Most of the schools that work with Copyleaks now give students access to the service, Yamin says, “so they can authenticate themselves” and see their own AI scores.
Authenticate yourself. Sure.
Again, I did not forge any checks, but I’m asking the bank to review every one because — you know — I need to authenticate myself. Honest, I’m buying a case of luminol and checking my kitchen floor for traces of blood just to be sure no one can falsely accuse me of murder.
Continuing, a different incident:
prompted Grammarly to develop a detection tool for students that identifies whether text was typed, pasted from a different source or written by an AI model. “It’s almost like your insurance policy,” Maxwell says.
Sure. Luminol. As insurance.
This cannot be a serious thing.
Moreover, I keep pointing out that Copyleaks and other “detectors” are heavily, heavily invested in helping students bypass detection, not helping schools and teachers find it so they can address it. That’s literally the only reason any AI detector would allow students to check their work for AI flags before they turn it in — not to encourage better scholarship, but to erase fingerprints. I have no idea why we are all pretending this is not what is happening.
More:
Nathan Mendoza, a junior studying chemical engineering at the University of California at San Diego, uses GPTZero to prescreen his work. He says the majority of the time it takes him to complete an assignment is now spent tweaking wordings so he isn’t falsely flagged—in ways he thinks make the writing sound worse. Other students have expedited that process by turning to a batch of so-called AI humanizer services that can automatically rewrite submissions to get past AI detectors.
People are using AI-powered “humanizer services” to “get past AI detectors” as a way to “expedite [the] process” of not being “falsely flagged.” I did the work, but just to be safe, as “insurance,” I’m going to check it in an AI detector, then ask an AI service to rewrite it so it sounds human and won’t get flagged.
Are we actually hearing what we’re saying? Are the people who write this actually reading it? Are the people reading it actually understanding it?
No one — and I mean no one — is using an AI “humanizing tool” on work they actually wrote because, and hold on with me — they. are. human.
To quote Zoolander, I feel like I am taking crazy pills.
Bloomberg, of course, cites the fact that “humanizer services” can defuse AI detection as a reason to say that AI detection is flawed or pointless — which is one of my favorite nonsense arguments. If you buy that, please leave your home doors unlocked tonight.
QuillBot, which Bloomberg quotes as a “detection service,” is actually a paraphrasing service aimed at bypassing detection, which is why they let you check your work first — to be sure you need their services. Grammarly: same.
The story also cites that bonkers study from Stanford again, like it’s worthy (see Issue 216).
After All That
Two things are certain. One, I suck at not getting dragged into these things and being brief. Though I did skip several points I could have made. Two, I am a sucker for continuing to push this rock up this hill. Sure as can be, it will roll back down again.
Brown University Goes Back to Pencil and Paper
According to coverage in the independent student paper, professors at the Ivy League’s Brown University have returned to old-school, in-person, pencil and paper exams. The reason — AI-driven cheating.
I’m not clear on whether Brown is a capital-H Honor Code school or not, the kind that does not allow professors to use anti-cheating tools or even be present during an exam. Either way, I like the clarity of its academic code here:
A student who obtains credit for work, words, or ideas that are not the products of his or her own effort is dishonest and in violation of Brown’s Academic Code.
And:
A student’s name on any exercise (e.g., a theme, report, notebook, performance, computer program, course paper, quiz, or examination) is regarded as assurance that the exercise is the result of the student’s own thoughts and study, stated in his or her own words, and produced without assistance, except as quotation marks, references, and footnotes acknowledge the use of printed sources or other outside help.
I usually suggest that an integrity policy mention specific companies such as Chegg and ChatGPT, to minimize wiggle room. But the Brown policy seems more than clear enough.
But that’s all an aside. The news is — fed up with chatbot cheating, professors at the Ivy are pulling exams and other assessments back into the classroom. There are several quotes and portions of the article to share. It starts:
After an era of take-home exams, primarily due to COVID-19, in-person exams are returning to campus. For some professors, suspected cheating and AI use is behind the shift.
And there is this, from a Brown professor:
“I grew tired of dealing with suspected academic dishonesty (and) students collaborating or straight-up having AI generate their solutions,” wrote Applied Math Lecturer Amalia Culiuc PhD’16, an instructor for APMA 1650, in an email to The Herald. “There was always some plausible deniability: friend groups all had the exact same answer because, according to them, they had studied together.”
The professor and the context continue:
“It’s very hard to explain — hence why it’s so hard to prove — but you can really tell when a text doesn’t quite sound human-generated,” she wrote. “I think students literally copied and pasted the entire exam into ChatGPT and had it output answers.”
She added she had even seen the phrase “as an AI language model” in her students’ work, indicating they did not do any proofreading.
Culiuc added she often had to turn a blind eye to blatantly obvious cheating, due to the “lack of admissible evidence” to prove cheating had occurred.
Yes, students are lazily turning in AI-generated content. And, no, they rarely meet any kind of consequence — not just in these classes, and not just at Brown.
The reality of academic integrity practice is that very, very few cases of even “blatantly obvious cheating” proceed to formal action. Some estimates put the share as low as one percent.
The reasons vary.
Some professors prefer to handle matters themselves, more directly and efficiently. In these situations, a re-test or a different kind of assessment may be in order. Or a grade penalty. Or both. Professors also simply prefer not to be police, and do not consider enforcing integrity and potentially sanctioning students to be part of their job. “I want to teach the students who want to learn,” is what we often hear on this one.

Then there are teachers who loathe the extra work and emotional tax that formal charges always require, or simply think that formal proceedings are too complicated to be worth it. Others rightly worry that any enforcement along integrity lines will destroy their post-course student reviews, which it will. Accordingly, they worry that bad reviews will impact their continued employment, which they may. For some teachers, it’s all of the above — the process and the cost of initiating formal complaints, even for “blatantly obvious cheating,” are just not worth it.
The point is, even when students are “caught” cheating, most often, nothing really happens. These quotes from professor Culiuc are a good reminder.
A quick note on this quote too:
“Interestingly, my student evaluations included quite a few comments about how take-home exams had made them less likely to engage with the material and how some had observed their classmates cheat but didn’t feel like they could say anything,” she wrote.
Here, neither students nor the teacher are moving ahead with formal accusations, even when cheating is fairly obvious. Juxtapose that reality with this, from Brown’s integrity policy:
It is also incumbent on those who know or suspect that someone else has violated Brown’s academic code to report their knowledge or suspicions to the appropriate University authorities.
My point is that, unless a school acts on its honor code and integrity policy, it is useless. You may as well not have one.
Back to the news, and this:
ANTH 0300: “Culture and Health,” a class that has essay-based midterms, also switched to in-person blue book exams after a few semesters of online exams. Associate Professor of Anthropology Katherine Mason cited ChatGPT as a major reason for this switch.
“If an exam is given online, the temptation to cheat using ChatGPT would be really high,” Mason wrote in an email to The Herald. “Good old paper solves this problem and that’s why I made the change I did.”
Agree. Online exams, especially unsupervised, unproctored exams, are an invitation to cheat. No argument.
I’ll close, as the article does, with this quote from professor Culiuc, who said she does not think AI is bad, but:
Still, “when you read your 5th AI generated proof in a row, warn the student not to do it again and then they apologize with an AI generated email, you start wondering what the point of your job even is,” she said. “It goes without saying that that student’s email did not, in fact, find me well.”
Medium, And Advertising Cheating
If you’re not familiar with Medium, it’s a self-publishing platform. I used to post stuff there from time to time, years ago.
But I quit and closed my account because I’d noticed that Medium’s deliberate lack of oversight and editors had become a problem — a growing weight of dangerous, inaccurate, and misleading garbage was on Medium, and I did not want to be part of it. Today, I was reminded why, with this post, headlined:
Help with English Assignment and Pay to Do My Assignment: Your Guide to Academic Success
It is not even subtle about directly advertising cheating services.
The pitch is familiar — doing work is hard, take the easy way. It’s also a good reminder that cheating companies are aggressive, agile marketers, constantly putting their wares in front of potential customers, often uncontested.
This example is a good one as well because the “author,” named Kevin Obroy, has an author page filled with little more than articles about how to access cheating services. A new one pops up every few days. Given that most of these articles start with the exact same phrase, or with a word or two changed, it’s doubtful that Kevin exists. It is also certain that Medium does not care.
In addition to sharing this as an example of the boldness and persistence of cheating providers, it speaks to why legal action is important. In several countries and some US states, advertisements such as these are illegal, and some liability for them may attach to the publisher, who clearly does not have appropriate content policies, or does not enforce them. Or even know what is on their site, under their masthead.
Maybe that’s a different problem. But it is a problem.