336: Student Says He Was Expelled Over False Accusation of AI Use
Plus, research team watermarks answers to nab 28 students cheating on online, take-home exam.
Issue 336
Subscribe below to join 4,278 (+2) other smart people who get “The Cheat Sheet.” New Issues every Tuesday and Thursday.
If you enjoy “The Cheat Sheet,” please consider joining the 16 amazing people who are chipping in a few bucks via Patreon. Or joining the 45 outstanding citizens who are now paid subscribers. Paid subscriptions start at $8 a month or $80 a year, and corporate or institutional subscriptions are $240 for a year. Thank you!
Student Booted from PhD Program Over AI Use
This one is going to take a hot minute to dissect. Minnesota Public Radio (MPR) has the story.
The plot contours are easy. A PhD student at the University of Minnesota was accused of using AI on a required pre-dissertation exam and removed from the program. He denies that allegation and has sued the school — and one of his professors — for due process violations and defamation respectively.
Starting the case.
The coverage reports that:
all four faculty graders of his exam expressed “significant concerns” that it was not written in his voice. They noted answers that seemed irrelevant or involved subjects not covered in coursework. Two instructors then generated their own responses in ChatGPT to compare against his and submitted those as evidence against Yang. At the resulting disciplinary hearing, Yang says those professors also shared results from AI detection software.
Personally, when I see four members of the faculty unanimously agree that the work was not authentically his, I am out. I trust teachers.
I know what a serious thing it is to accuse someone of cheating; I know teachers do not take such things lightly. When four go on the record to say so, I’m convinced. Barring some personal grievance or prejudice, which could happen, it is hard for me to believe that all four subject-matter experts were just wrong here. Also, if there was bias or petty politics at play, it probably would have shown up before the student’s third year, not just before starting his dissertation.
Moreover, at least as far as the coverage is concerned, the student does not allege bias or program politics. His complaint is based on due process and inaccuracy of the underlying accusation.
Let me also say quickly that asking ChatGPT for answers you plan to compare to suspicious work may be interesting, but it’s far from convincing — in my opinion. ChatGPT makes stuff up. I’m not saying that answer comparison is a waste, I just would not build a case on it. Here, the university didn’t. It may have added to the case, but it was not the case. Adding also that the similarities between the faculty-created answers and the student’s — both are included in the article — are more compelling than I expected.
Then you add detection software, which the article later shares showed high likelihood of AI text, and the case is pretty tight. Four professors, similar answers, AI detection flags — feels like a heavy case.
Denied it.
The article continues that Yang, the student:
denies using AI for this exam and says the professors have a flawed approach to determining whether AI was used. He said methods used to detect AI are known to be unreliable and biased, particularly against people whose first language isn’t English. Yang grew up speaking Southern Min, a Chinese dialect.
Although it’s not specified, it is likely that Yang is referring to the research from Stanford that has been — or at least ought to be — entirely discredited (see Issue 216 and Issue 251). For the love of research integrity, the paper has invented citations — sources that go to papers or news coverage that are not at all related to what the paper says they are.
Does anyone actually read those things?
Back to Minnesota, Yang says that as a result of the findings against him and being removed from the program, he lost his American study visa. Yang called it “a death penalty.”
With friends like these.
Also interesting is that, according to the coverage:
His academic advisor Bryan Dowd spoke in Yang’s defense at the November hearing, telling panelists that expulsion, effectively a deportation, was “an odd punishment for something that is as difficult to establish as a correspondence between ChatGPT and a student’s answer.”
That would be a fair point except that the next paragraph is:
Dowd is a professor in health policy and management with over 40 years of teaching at the U of M. He told MPR News he lets students in his courses use generative AI because, in his opinion, it’s impossible to prevent or detect AI use. Dowd himself has never used ChatGPT, but he relies on Microsoft Word’s auto-correction and search engines like Google Scholar and finds those comparable.
That’s ridiculous. I’m sorry, it is. The dude who lets students use AI because he thinks AI is “impossible to prevent or detect,” the guy who has never used ChatGPT himself, and who thinks that Google Scholar and auto-correction are “comparable” to AI — that’s the person speaking up for the guy who says he did not use AI. Wow.
That guy says:
“I think he’s quite an excellent student. He’s certainly, I think, one of the best-read students I’ve ever encountered”
Time out. Is it not at least possible that professor Dowd thinks student Yang is an excellent student because Yang was using AI all along, and our professor doesn’t care to ascertain the difference? Also, mind you, as far as we can learn from this news story, Dowd does not even say Yang is innocent. He says the punishment is “odd,” that the case is hard to establish, and that Yang was a good student who did not need to use AI. Although, again, I’m not sure how good professor Dowd would know.
As further evidence of Yang’s scholastic ability, Dowd also points out that Yang has a paper under consideration at a top academic journal.
You know what I am going to say.
To me, that entire Dowd diversion is mostly funny.
More evidence.
Back on track, we get even more detail, such as that the exam in question was:
an eight-hour preliminary exam that Yang took online. Instructions he shared show the exam was open-book, meaning test takers could use notes, papers and textbooks, but AI was explicitly prohibited.
Exam graders argued the AI use was obvious enough. Yang disagrees.
Weeks after the exam, associate professor Ezra Golberstein submitted a complaint to the U of M saying the four faculty reviewers agreed that Yang’s exam was not in his voice and recommending he be dismissed from the program. Yang had been in at least one class with all of them, so they compared his responses against two other writing samples.
So, the exam expressly banned AI. And we learn that, as part of the determination of the professors, they compared his exam answers with past writing.
I say all the time, there is no substitute for knowing your students. If the initial four faculty who flagged Yang’s work had him in classes and compared suspicious work to past work, what more can we want? It does not get much better than that.
Then there’s even more evidence:
Yang also objects to professors using AI detection software to make their case at the November hearing.
He shared the U of M’s presentation showing findings from running his writing through GPTZero, which purports to determine the percentage of writing done by AI. The software was highly confident a human wrote Yang’s writing sample from two years ago. It was uncertain about his exam responses from August, assigning 89 percent probability of AI having generated his answer to one question and 19 percent probability for another.
“Imagine the AI detector can claim that their accuracy rate is 99%. What does it mean?” asked Yang, who argued that the error rate could unfairly tarnish a student who didn’t use AI to do the work.
First, GPTZero is junk. It’s reliably among the worst available detection systems. Even so, 89% is a high number. And most importantly, the case against Yang is not built on AI detection software alone, as no case should ever be. It’s confirmation, not conviction. Also, Yang, who the paper says already has one PhD, knows exactly what an accuracy rate of 99% means. Be serious.
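For anyone who genuinely wants to know what a claimed 99% accuracy rate means in practice, here is a quick back-of-the-envelope sketch. The cohort size and cheating rate below are my own illustrative assumptions, not figures from the article; the point is simply why a detector score works as corroboration rather than as the whole case.

# Hypothetical base-rate arithmetic, not data from the MPR story.
cohort = 1000        # students taking an exam
cheat_rate = 0.10    # assumed share who actually used AI
sensitivity = 0.99   # detector flags 99% of real AI use
specificity = 0.99   # detector clears 99% of honest work

cheaters = cohort * cheat_rate
honest = cohort - cheaters

true_flags = cheaters * sensitivity        # 99 correctly flagged
false_flags = honest * (1 - specificity)   # 9 honest students wrongly flagged

precision = true_flags / (true_flags + false_flags)
print(f"Share of flagged students who actually cheated: {precision:.0%}")  # about 92%

Even at 99% on both error rates, roughly one flagged student in twelve would be innocent under those assumptions. Which is exactly why the detector result here is one piece of corroboration among several, not the case itself.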
A pattern.
Then we get this, buried in the news coverage:
Yang suggests the U of M may have had an unjust motive to kick him out. When prompted, he shared documentation of at least three other instances of accusations raised by others against him that did not result in disciplinary action but that he thinks may have factored in his expulsion.
He does not include this concern in his lawsuits. These allegations are also not explicitly listed as factors in the complaint against him, nor letters explaining the decision to expel Yang or rejecting his appeal. But one incident was mentioned at his hearing: in October 2023, Yang had been suspected of using AI on a homework assignment for a graduate-level course.
In a written statement shared with panelists, associate professor Susan Mason said Yang had turned in an assignment where he wrote “re write it, make it more casual, like a foreign student write but no ai.” She recorded the Zoom meeting where she said Yang denied using AI and told her he uses ChatGPT to check his English.
She asked if he had a problem with people believing his writing was too formal and said he responded that he meant his answer was too long and he wanted ChatGPT to shorten it. “I did not find this explanation convincing,” she wrote.
I’m sorry — what now?
Yang says he was accused of using AI in academic work in “at least three other instances.” For which he was, of course, not disciplined. In one of those cases, Yang literally turned in a paper with this:
“re write it, make it more casual, like a foreign student write but no ai.”
He said he used ChatGPT to check his English and asked ChatGPT to shorten his writing. But he did not use AI. How does that work?
For that one where he left in the prompts to ChatGPT:
the Office of Community Standards sent Yang a letter warning that the case was dropped but it may be taken into consideration on any future violations.
Yang was warned, in writing.
If you’re still here, we have four professors who agree that Yang’s exam likely used AI, in violation of exam rules. All four had Yang in classes previously and compared his exam work to his past writing samples. His exam answers had similarities with ChatGPT output. An AI detector said, in at least one place, his exam was 89% likely to be generated with AI. Yang was accused of using AI in academic work at least three other times, by a fifth professor, including one case in which it appears he may have left in his instructions to the AI bot.
On the other hand, he did say he did not do it.
Findings, review.
Further:
But the range of evidence was sufficient for the U of M. In the final ruling, the panel — comprised of several professors and graduate students from other departments — said they trusted the professors’ ability to identify AI-generated papers.
Several professors and students agreed with the accusations. Yang appealed and the school upheld the decision. Yang was gone. The appeal officer wrote:
“PhD research is, by definition, exploring new ideas and often involves development of new methods. There are many opportunities for an individual to falsify data and/or analysis of data. Consequently, the academy has no tolerance for academic dishonesty in PhD programs or among faculty. A finding of dishonesty not only casts doubt on the veracity of everything that the individual has done or will do in the future, it also causes the broader community to distrust the discipline as a whole.”
Slow clap.
And slow clap for the University of Minnesota. The process is hard. Doing the review, examining the evidence, making an accusation — they are all hard. Sticking by it is hard too.
Seriously, integrity is not a statement. It is action. Integrity is making the hard choice.
MPR, spare me.
Minnesota Public Radio is a credible news organization. Which makes it difficult to understand why they chose — as so many news outlets do — to not interview one single expert on academic integrity for a story about academic integrity. It’s downright baffling.
Worse, MPR, for no specific reason whatsoever, decides to take prolonged shots at AI detection systems such as:
Computer science researchers say detection software can have significant margins of error in finding instances of AI-generated text. OpenAI, the company behind ChatGPT, shut down its own detection tool last year citing a “low rate of accuracy.” Reports suggest AI detectors have misclassified work by non-native English writers, neurodivergent students and people who use tools like Grammarly or Microsoft Editor to improve their writing.
“As an educator, one has to also think about the anxiety that students might develop,” said Manjeet Rege, a University of St. Thomas professor who has studied machine learning for more than two decades.
We covered the OpenAI deception — and it was deception — in Issue 241, and in other issues. We covered the non-native English thing. And the neurodivergent thing. And the Grammarly thing. All of which MPR wraps up in the passive and deflecting “reports suggest.” No analysis. No skepticism.
That’s just bad journalism.
And, of course — anxiety. Rege, who please note has studied machine learning and not academic integrity, is predictable, but not credible here. He says, for example:
it’s important to find the balance between academic integrity and embracing AI innovation. But rather than relying on AI detection software, he advocates for evaluating students by designing assignments hard for AI to complete — like personal reflections, project-based learnings, oral presentations — or integrating AI into the instructions.
Absolute joke.
I am not sorry — if you use the word “balance” in conjunction with the word “integrity,” you should not be teaching. Especially if what you’re weighing against lying and fraud is the value of embracing innovation. And if you needed further evidence for his absurdity, we get the “personal reflections and project-based learnings” buffoonery (see Issue 323). But, again, the error here is MPR quoting a professor of machine learning about course design and integrity.
MPR also quotes a student who says:
she and many other students live in fear of AI detection software.
“AI and its lack of dependability for detection of itself could be the difference between a degree and going home,” she said.
Nope. Please, please tell me I don’t need to go through all the reasons that’s absurd. Find me one single case in which an AI detector alone sent a student home. One.
Two final bits.
The MPR story shares:
In the 2023-24 school year, the University of Minnesota found 188 students responsible of scholastic dishonesty because of AI use, reflecting about half of all confirmed cases of dishonesty on the Twin Cities campus.
Just noteworthy. Also, it is interesting that 188 were “responsible.” Considering how rare it is to be caught, and for formal processes to be initiated and upheld, 188 feels like a real number. Again, good for U of M.
The MPR article wraps up that Yang:
found his life in disarray. He said he would lose access to datasets essential for his dissertation and other projects he was working on with his U of M account, and was forced to leave research responsibilities to others at short notice. He fears how this will impact his academic career
Stating the obvious, like the University of Minnesota, I could not bring myself to trust Yang’s data. And I do actually hope that being kicked out of a university for cheating would impact his academic career.
And finally:
“Probably I should think to do something, selling potatoes on the streets or something else,” he said.
Dude has a PhD in economics from Utah State University. Selling potatoes on the streets. Come on.
Research: Watermarking Answers in Online, Take-Home Exams Catches Cheating
Sent in by a friend of The Cheat Sheet, research from this past summer shows that watermarking answers to online, take-home tests caught student cheating.
The paper is by Christopher Cui, Jui-Tse Hung, Pranav Sharma, Saurabh Chatterjee, and Thad Starner, all from Georgia Institute of Technology (Georgia Tech).
Before we get into it, there’s something great going on at Georgia Tech. Few schools are as reliably present in the conversations and realities of academic integrity as Georgia Tech. The school is, and has been, a real leader on a topic that’s essential to all academia. I do not get why, but it’s true. And I am here for it.
Anyway, the paper deployed a unique approach to catching misconduct during online, take-home exams. It is important because not only did the intervention uncover cheating that probably would have gone undetected otherwise, it also uncovered that students intentionally attempted to hide their misconduct from traditional detection methods.
The paper starts with this sentence:
Cheating detection in large classes with online, take-home exams is an extremely difficult problem
To which: no kidding. I tease because I love — and I do love this paper. I’ve said for a long time that online, take-home exams are cheated so often and so comprehensively as to be effectively useless.
One of the things I love is this, from the authors:
Cheating, which can manifest as plagiarism, collusion, or unauthorized collaboration, not only violates academic integrity but also devalues the efforts of honest peers, undermines the credibility of educational institutions, and hinders learning. Detecting cheating in take-home exams allows instructors to take proactive measures to ensure a fair assessment environment and fosters a culture of honesty and ethical behavior among students.
Love.
Please note that the paper says that detecting cheating fosters a culture of honesty and ethical behavior. I agree entirely and emphatically.
They also say:
With reputations at stake, it is important for institutions to accredit the quality of the educational program, and it is crucial for institutions to build trustworthy assignments
Preach.
One more bit of preamble before we get to the watermark idea and its findings: the paper also says, matter-of-factly:
Online and take-home exams are far more vulnerable to students cheating.
Say it again, louder.
One of the issues, the team correctly points out, is that other efforts to detect cheating in online exams are vulnerable to manipulation. Statistical analysis of patterns in correct and incorrect answers can flag collusion, but it is easy to circumvent with a little effort. From the paper:
A problem with most methods of detecting cheating in online, take-home exams is that they are vulnerable to obfuscation by students actively trying to hide their cheating.
With all that, on to the watermarking. The team constructed a complex question for the online, take-home exam:
The answer to the question requires a complex string, which students could easily mistype. To aid students in creating a string that was parsable by our automatic grading scripts, we directed students towards an external website dedicated to helping students explore the problem, build the answer, and format this answer to ensure ease of grading. This website would generate an answer string from the selected actions that we asked students to do, and then copy-paste into the question answer form. Unknown to the students, the website would also embed an invisible Unicode watermark into the answer string. The watermarking process ensured that each visit to the website resulted in the creation of a new, unique watermark. Consequently, it was virtually impossible for two students to receive identical watermarks without some form of unauthorized collaboration
Tricky.
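For the curious, here is roughly what that kind of embedding can look like. This is a minimal sketch of my own, not the Georgia Tech team’s code, and it assumes a simple scheme where zero-width Unicode characters append a random per-visit ID to the answer string the helper website produces.

import secrets

# Two zero-width characters act as an invisible binary alphabet.
# (Hypothetical encoding; the paper does not publish its exact scheme.)
ZERO = "\u200b"  # ZERO WIDTH SPACE      -> bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER -> bit 1

def embed_watermark(answer: str, bits: int = 64) -> str:
    """Append a fresh, invisible watermark to an answer string."""
    token = secrets.randbits(bits)
    payload = "".join(ONE if (token >> i) & 1 else ZERO for i in range(bits))
    return answer + payload

The watermarked string displays exactly like the plain answer, but each call carries a unique hidden ID, so two submissions sharing the same ID imply the text was copied rather than built independently.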
And effective:
In Spring 2023, 28 students were found to have watermark contamination and subsequently pursued for plagiarism. We perform a statistical analysis on the answers these students submitted to determine how two students who are cheating together change their answers to try to avoid detection.
In these classes, 28 is a small percentage. But a significant finding is that, with the exception of just one pair of colluding students, these cases of misconduct would have gone undetected. Equally important is that the watermarking technique provided irrefutable evidence of misconduct — not 99% probable, not 85% likely. A lock.
It’s important to note as well that 28 was not the number who cheated, it was the number caught using this particular method.
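And here is the flip side, a sketch of how grading scripts might flag contamination afterward. Again, this is my own illustration under the same assumed zero-width encoding, not the authors’ actual tooling.

from collections import defaultdict

ZERO, ONE = "\u200b", "\u200c"  # same hypothetical zero-width alphabet as above

def extract_watermark(text: str) -> str:
    """Pull the hidden bit string back out of a submitted answer."""
    return "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))

def find_contamination(submissions: dict) -> list:
    """Group student IDs by the watermark found in their answers.

    submissions maps student ID -> raw answer string. Because each visit to
    the helper site mints a fresh watermark, any group larger than one means
    the answer string was shared, even if the visible text was edited afterward.
    """
    by_mark = defaultdict(list)
    for student, text in submissions.items():
        mark = extract_watermark(text)
        if mark:  # answers typed without the helper site carry no watermark
            by_mark[mark].append(student)
    return [students for students in by_mark.values() if len(students) > 1]

Visible edits to a copied answer do not help, unless the copier also notices and strips characters they cannot see, which is what makes this kind of evidence so clean.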
Finally, as mentioned, the test and results are important because they show conclusively that students engaging in misconduct will also reliably act to avoid being caught. Maybe this is not surprising. But knowing it, being able to prove it, undercuts the idea of confusion or anxiety or pressure as motivations or excuses for cheating. When you are wiping your fingerprints from the gun, you know what you’re doing is wrong, however you got there. It happens far more often than professors like to admit.
In this test:
We see that of the students caught cheating through our watermarks, only one pair submitted the exact same answers.
One pair out of 14 — the others altered their answers, so their exams did not match precisely. Students are not stupid.
The authors say that their results are:
demonstrating the need for more advanced methods of catching cheating in online, take-home exams.
Hard to argue. Although this intervention, while effective, is impractical for most disciplines and most teachers:
a watermarked question requires both the building of an external website specifically for an intended question, as well as a question complex enough that students would be motivated to utilize the accompanying website for the question. Both of these require a significant investment of time from an instructor and may not be feasible in an understaffed classroom.
As true as that is, I don’t think that was the point. The point I took away was that traditional methods of catching collusion in online take-home exams are missing at least some of it. In part, because students know how to fool those existing systems.
Finally, I’m sharing this because it’s very funny. From one of the cases of cheating that the teaching team found and submitted:
Without the watermarked question, it would have been virtually impossible to convict or even identify these students were collaborating. Notably, despite the watermark, this case was mistakenly dropped due to anti-plagiarism staff misunderstanding the nature of the watermarked answers. This result shows the need for presenting evidence in a manner that is easily understood and intuitive to an outside observer.
Come on — tell me you did not laugh.
It’s great work. And, in my view, an important reminder that the cheating we see is only the cheating we’re in position to see.