New Orleans Police Department Pauses Promotions Amid Allegations of Exam Cheating
Plus, a proctoring company fires at the wrong target. Plus, Penn seeks integrity officer. Plus, a class note.
Issue 327
Subscribe below to join 4,210 other smart people who get “The Cheat Sheet.” New Issues every Tuesday and Thursday.
If you enjoy “The Cheat Sheet,” please consider joining the 16 amazing people who are chipping in a few bucks via Patreon. Or joining the 45 outstanding citizens who are now paid subscribers. Paid subscriptions start at $8 a month or $80 a year. Thank you!
New Orleans Police Pause Promotions Amid Cheating Inquiry
Local TV news coverage in New Orleans reports that the city’s police department has paused all promotions while an inquiry into promotion exam cheating is ongoing.
The pause is not entirely about exam misconduct; there are allegations of bias and favoritism wrapped up in this too. But at least part of the investigation does involve “anonymous allegations of cheating.”
A spokesperson for the City told the TV station:
The whole process was set up in a way to virtually eliminate any potential for cheating
Before going further, that is just not possible. But, the spokesperson says the cheating allegations “center around a promotions exam provided by a contractor.” He says further that the exam provider has a reputation for doing these tests and that they won’t let a single jurisdiction impact that reputation.
I mean, sure. But that in no way means the exam was not cheated. Professional promotion exams provided by outside contractors are cheated all the time, often with great ease. And very sadly, police exams in particular are cheated — see Issue 225, Issue 145, Issue 300, or Issue 94.
I have no sense as to whether there was cheating on these exams this time. Maybe not. But I do know that, if it did happen, it would not be the first time.
The exam provider is identified as Industrial/Organizational Solutions.
Seeing this story, I am reminded of one of my favorite lines from the 80s movie The Big Easy: “New Orleans is a marvelous environment for coincidence.” I’m just saying.
A Proctoring Company Aims at the Wrong Target
A month or so ago, an executive at a proctoring company wrote an article in Ed Tech Digest on AI, integrity, and assessment.
I have met the author a few times and found him informed, measured and pleasant — all the good stuff. I respect him. But I am not a fan of some of the things in his article because I think they miss the mark on a greater purpose.
He is right when he sets out the challenge — that students are using generative AI tools to counterfeit their academic work, and that this is a serious problem for schools. One hundred percent. He suggests to schools that:
Developing a dedicated committee [on AI use] may be beneficial as institutions create and implement new policies and guidelines for using AI tools, develop training and resources for students, faculty, and staff on academic integrity, and encourage the responsible use of AI in education.
Why not? Good idea.
But the author and article leave me behind at the section about AI detection titled:
Study Results Paint a Grim Picture
This, in particular:
While some existing detection tools show promise, they all struggle to identify AI-generated writing accurately.
And:
Looking at a growing level of research, there are strong concerns about these tools’ inabilities to detect AI.
My issue is that this is not true.
Good AI detection systems are really, really good at spotting academic work created by AI. And while there may be “strong concerns” about such things, those concerns are not grounded in the research. The collected research, absent any exception that I have seen, is quite direct — AI detection software is good at detecting text created by AI.
The Ed Tech Digest (ETD) article cites and summarizes a study that:
tested the largest commercial plagiarism and AI detection tool against ChatGPT-generated text. It was found that when text is unaltered, the detection tool effectively detects it as AI-generated. However, when Quillbot paraphrased it, the score dropped to 31% and 0% after two rephrases.
I left the link in. And, indeed, that casual study found that Turnitin’s AI detection system:
showed that the five essays copied verbatim from ChatGPT were successfully detected as AI-generated by Turnitin’s AI detector
And:
The results for the first five essays indicated that Turnitin was very accurate in detecting AI content, even across diverse essay topics. However, accuracy declined when the text was interleaved with human text (essay #6).
And that:
Importantly, no human text was incorrectly categorised as AI-generated, so there were no false positives
And:
Turnitin was 100% accurate in detecting an essay submitted that was entirely generated by Gemini.
Even with prompt engineering asking ChatGPT to write more “like a human” in an effort to fool Turnitin:
The output from ChatGPT was subsequently run through Turnitin, which detected 91% of the text as AI-generated. A total of five additional trials were conducted, with Turnitin’s AI detection scoring no less than 90% for each trial, indicating 90% accuracy in detecting AI-generated content since each essay was 100% AI-generated.
To recap: Turnitin’s detector “successfully detected” all the essays that were 100% AI-generated by both ChatGPT and Gemini, and it was “very accurate in detecting AI content” with “no false positives.”
Yet, we’re writing about “strong concerns about these tools’ inabilities to detect AI” and that “they all struggle to identify AI-generated writing accurately.”
Just not so.
When the test used Quillbot to paraphrase text spit out by ChatGPT, Turnitin missed:
The findings indicated that Turnitin was unable to detect the essay as AI-generated after it had been paraphrased by QuillBot, resulting in an AI detection score of 0%
First, Quillbot is a cheating engine, designed and sold to deceive AI detection systems by washing away evidence of cheating. Second, we know that “washing” or “spinning” text in this way does degrade the ability of detection systems to spot AI text. Third, this specific outcome may be an artifact of time and place. For whatever reason, catching Quillbot-paraphrased text was not in the Turnitin arsenal when their detection software was released, but it is now. At the time of the test, the score was a zero; I am not sure it would be today.
But moreover, as I write all the time, the ability for detection to be bypassed does not mean it does not work — these are different measurements.
This is not in any way to denigrate these research tests. They are quite innovative, and I will add them to the list to address separately. But I do not think it credible to cite these tests as evidence that AI detectors do not work. I mean, the paper itself literally says, “Turnitin was very accurate in detecting AI content.”
Another “experiment” cited in the Ed Tech Digest piece was supposed to show that Turnitin cannot detect text that has been paraphrased by AI, but the citation is a dead link — and a problematic one.
The link goes to something at Medium, which is itself a problem since Medium has no editorial oversight and people who post there are paid for clicks. Linking to anything there is highly questionable. But whatever was there was removed, and the page now says:
This account is under investigation or was found in violation of the Medium Rules.
Yikes.
Not sure what you have to do in order to be under investigation at Medium, but it’s not good. That’s not necessarily the fault of our original author in Ed Tech Digest. But I do think we can safely conclude that whatever was there to make the case against AI detection, it was not credible.
The piece also cites the study we covered in Issue 250, which many people have incorrectly used to show that AI detection does not work. In fact, using my words from Issue 250, that paper found:
Four of the 14 systems (CheckforAi, Winston AI, GPT-2 Output, and Turnitin) were 94% accurate with the AI work, missing just one of 18 test samples each. Another two detectors (Compilatio and CrossPlag) were 89% accurate with AI text.
Three of the four that were 94% accurate with AI were also 100% accurate with human-written work. So was one of the systems that was 89% accurate with the AI.
In other words, three of the 14 tested detectors were more than 96% accurate overall — getting the human work perfect and missing just one of 18 AI works. Another was 92% accurate overall, getting all the human work right but missing two of 18 AI submissions.
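Since rolling two per-class results into one overall figure can be tricky to follow, here is a minimal sketch of the arithmetic in Python. The 17-of-18 and 16-of-18 AI numbers come straight from the passage above; the study’s human-text sample count is not quoted here, so the `human_total` value below is a hypothetical placeholder (seven happens to reproduce the 96% and 92% overall figures), not a number from the paper.

```python
def overall_accuracy(ai_correct, ai_total, human_correct, human_total):
    """Roll per-class detection results into one overall accuracy figure."""
    return (ai_correct + human_correct) / (ai_total + human_total)

# From the passage above: 17 of 18 AI samples caught (94%), or 16 of 18 (89%),
# with every human-written sample classified correctly (no false positives).
# human_total is a placeholder, not a figure quoted from the study.
human_total = 7

print(f"{overall_accuracy(17, 18, human_total, human_total):.0%}")  # -> 96%
print(f"{overall_accuracy(16, 18, human_total, human_total):.0%}")  # -> 92%
```

The point of the sketch is simply that the overall number blends the two classes, so the per-class results — no false positives, one or two missed AI samples — are the more informative statistics.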
I’m pretty darn sure our Ed Tech Digest author and proctoring provider know this.
Perhaps worse, the article cites a fourth study which:
fed 50 ChatGPT-generated essays into two text-matching software systems from the largest and most well-known plagiarism tool. The results of the submitted essays “demonstrated a remarkable level of originality stirring up alarms of the reliability of plagiarism check software used by academia.”
AI chatbots are improving at writing, and more effective prompts help them generate more human-like content. In the examples above, AI detection tools from the biggest companies to the free options were tested against various content types, including long-form essays and short-form assignments across different subjects and domains. No matter the size or content type, they all struggled to detect AI.
Based on that text alone from the ETD article, did you spot the error? Pretty impressive if you did.
The problem is that this study checked ChatGPT text against a plagiarism checker, not against an AI detector. The study was published in January 2023, before Turnitin made their AI detector available. In other words, the GPT text checked in that paper was not even checked for AI because AI detection did not really exist at that point. In fact, the paper itself says:
Turnitin has already raised concerns and are working on updating their plagiarism engine to detect cheating using chatbots such as ChatGPT
Exactly. This research took place in the roughly six-month window between the debut of ChatGPT and the launch of competent tools to detect its output.
To summarize, of the four citations used in the ETD piece, the first showed very strong detection of AI text, the second was removed for some reason, the third also showed very strong detection of AI text, and the fourth did not even look for AI text. As such, I don’t know what to say about using those citations to tell the public:
In the examples above, AI detection tools from the biggest companies to the free options were tested against various content types, including long-form essays and short-form assignments across different subjects and domains. No matter the size or content type, they all struggled to detect AI.
I guess all I can say is that it is not true.
Conveniently, the ETD piece continues:
Given the ineffectiveness of AI detection tools, academic institutions must consider alternative methods to curb AI usage and protect integrity.
And:
Institutions can also proctor written assignments like an online exam. This helps to block AI usage and removes access or help from phones. Proctoring can be very flexible, allowing access to specific approved sites, such as case studies, research articles, etc., while blocking everything else.
The text above linked to the company for which the author works.
And look, I get the business case for advocating the service you sell, even if it’s at the marginal expense of a competing solution. What I do not get is spreading misinformation that continues to undermine the menu of anti-cheating interventions.
Yes, proctoring works. So does AI detection. I am deeply agnostic as to which solution(s) a teacher or institution uses.
But let’s please, please not give people more reasons to do nothing, especially when those reasons are incorrect. Doing so undercuts what I see as a broader and more important mission than selling product — helping us collectively address the surge in academic misconduct.
Penn Seeks Integrity Officer
That’s not what the school calls it. They call the position “Executive Director, Center for Community Standards and Accountability.” But it’s the school’s integrity officer, and the job is open.
The post at the University of Pennsylvania says:
The Executive Director (ED) of the Center for Community Standards and Accountability (CSA) oversees Penn's student disciplinary system and Restorative Practices @ Penn program, as well as provides leadership and direction for efforts to educate students about academic integrity and conduct issues. This position supervises the staff of the Case Management Team and the Restorative Practices Team (10 positions), coordinates and offers workshops and other educational programs to faculty, students and campus leaders, and advises senior leadership about major disciplinary policies and issues.
And so on.
Ivy League. Dream Job. Someone has to go make sure those degrees are actually worth what so many people believe they are.
Class Note:
Over at Forbes, I wrote about the study showing that more than 90% of AI writing goes undetected by teachers (see Issue 325). Just passing it along.