Research: ChatGPT Use and Plagiarism Show Significant Correlation
Plus, University of Waterloo chooses ignorance.
Issue 361
Research: ChatGPT and Plagiarism Show “Significant and Positive Correlation,” Weak Causality
Strong research out of Spain finds that the link between using ChatGPT and academic fraud, i.e., plagiarism, is significant, though it finds little to no evidence of causality.
The paper is by Héctor Galindo-Domínguez, Lucía Campo, Nahia Delgado, and Martín Sainz de la Maza, all four from the University of the Basque Country, though they appear to be from different campuses or centers. The research is a survey of 507 university students, heavily concentrated in education majors. It aims to establish connections between use of ChatGPT and academic misconduct by plagiarism, and to explore factors that may contribute to AI-related misconduct.
The paper's review of eight theoretical frameworks for academic misconduct is alone worth your time.
The team found that students who are inclined to plagiarize their work frequently use ChatGPT — and vice versa. No surprise. The team also found that using ChatGPT did not cause students to plagiarize, not in any significant way.
From the paper:
the frequency of using ChatGPT for academic purposes does not significantly predict plagiarism levels.
For the record, the team does not say causality is non-existent, just that it is not significant and “weak.” Further:
It could be observed that although the use of ChatGPT for academic purposes and plagiarism levels correlated, previous analyses indicated that this independent variable was a poor predictor of plagiarism levels, suggesting weak causality.
Assuming the survey responses are accurate, which should always be questioned in surveys about academic cheating, especially when current students are being surveyed by their own school, these results are both easy to understand and a bit curious.
ChatGPT is a tool. It’s a tool that’s remarkably easy to use for academic misconduct. But it’s just a tool and students disposed to use shortcuts will use the available tools.
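Here is a minimal sketch of that idea in statistical terms, using entirely synthetic data and my own variable names (not the paper's instrument): a hidden disposition drives both tool use and misconduct, so the two correlate even though one does not cause the other.

```python
# Toy illustration of "significant correlation, weak causality."
# ChatGPT use and plagiarism both track a hidden disposition (a synthetic
# "shortcut-seeking" trait), so they correlate; but in a regression that
# also includes the trait, the tool's coefficient collapses toward zero.
# Synthetic data and naming are mine, not the study's.
import numpy as np

rng = np.random.default_rng(1)
n = 507  # matches the study's sample size, for flavor

shortcut_seeking = rng.normal(size=n)                 # hidden disposition
chatgpt_use = shortcut_seeking + rng.normal(size=n)   # tool use follows the trait
plagiarism = shortcut_seeking + rng.normal(size=n)    # so does misconduct

print(f"corr(ChatGPT use, plagiarism): {np.corrcoef(chatgpt_use, plagiarism)[0, 1]:.2f}")

# Regress plagiarism on both the tool and the trait (OLS with intercept).
X = np.column_stack([np.ones(n), chatgpt_use, shortcut_seeking])
beta, *_ = np.linalg.lstsq(X, plagiarism, rcond=None)
print(f"coef on ChatGPT use: {beta[1]:.2f}, coef on disposition: {beta[2]:.2f}")
```

With this setup you get roughly a 0.5 correlation but a near-zero coefficient on tool use once the disposition is in the model. Correlation, without the tool doing the driving.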
Based on the massive move of shortcut seekers away from essay mills and services such as Chegg and over to ChatGPT, we can infer that how students were cheating was changing. It was never clear that the existence of ChatGPT drove more people to cheat. Although I personally think it has, simply by making cheating more casual, easier, more ubiquitous, and more accepted — causing people to make dumb arguments about “what actually is cheating now?” and other crazy stuff.
For the record, this paper does not examine whether plagiarism has increased, although the authors imply it may have. They simply say, essentially, that because a student uses ChatGPT does not mean they are cheating. To which I say, of course not. From the paper:
These results confirm that a higher frequency of using ChatGPT for academic purposes does not necessarily lead to a higher likelihood of plagiarism, as ChatGPT can be used academically for responsible purposes.
Makes sense and I agree.
At the same time, I want to highlight that there is significant overlap between using ChatGPT and cheating. Without getting into whether one is driving the other, the Venn diagram is pretty tight. From the paper:
there is a statistically significant and positive correlation between the frequency of using ChatGPT for academic purposes and plagiarism
Using ChatGPT does not mean someone is cheating. But cheaters do use ChatGPT — like, often. This strong correlation explains almost everything ChatGPT/OpenAI does as a company.
But it’s the sentence after the one quoted above that should merit the most attention, in my view. It says that the “weak causality” between ChatGPT use and plagiarism:
might suggest that factors other than the use of ChatGPT for academic purposes could be the actual drivers of increased plagiarism levels.
And here’s a really fun, really important kind of quiz, based on this research. I hope you’ll play along.
In their survey, the team asked students about seven factors that literature and cognitive frameworks indicate may exacerbate academic misconduct. They were:
academic performance
workload
time management
lack of motivation
hyper-competitiveness
culture of cheating, and
ignorance of the consequences of plagiarism
Of those seven, the team identified two with significant correlation to GPT-related cheating. Care to guess which two?
I knew the answers before I laid this out as a quiz, but I think I would have picked correctly. And I am going to spoil it here if you have not locked in your answers.
Even though we seem to hear educators and others frame cheating around workload pressure, time management, and, to some degree, ignorance, the team found that lack of motivation and culture of cheating were the two conditions related to using ChatGPT for plagiarism. Students did not care, and others were doing it.
From the paper:
The factors that truly influence the increase or decrease in plagiarism levels are other personal variables associated with students, such as cheating culture or amotivation.
The team defines these as follows.
Cheating culture:
to what extent university students are immersed in an environment where dishonest behaviors such as plagiarism have been carried out by people in their surroundings
Amotivation:
the complete lack of motivation toward the academic career
I may quibble and say the lack of motivation probably centers closer on a particular academic task, or maybe a course, than on an entire academic career. But the point is taken. Even if you’re in school just to get the degree, to check the box, and nothing more, this applies.
On this finding, students have been telling us this for a long, long time. My favorite recent example is in Issue 322, where a professor relays what a student told them:
One student was not embarrassed to tell me at a dinner in front of 10 other students, ‘Of course I’m going to cheat’ in some gen-ed, or what we call distribution requirements, class. ‘You know, some pain-in-the-butt thing that I don’t see as particularly relevant.’
The other — the culture of cheating — has been tagged as a significant factor for a long time. My only note is that I think it’s not just, as the authors of this study put it, “an environment where dishonest behaviors such as plagiarism have been carried out.” It’s that these dishonest behaviors are working — those who cheat are getting good marks and not getting caught. In my view, it’s not so much the doing, it’s the prospering as a result that drives a cheating culture.
I also, for whatever it’s worth, think these are related. It’s hard to really care or be motivated to do work that no one else seems to care about — especially teachers or their institutions. If teachers don’t care enough to invest in stopping cheating, students aren’t going to care enough to do the work.
But whatever. Workload, pressure, and time management remain the popular explanations. Here again is this quote from the most recent Issue, Issue 360:
Jason McCormick, interim director of The Writing Center and visiting faculty member in the English department [at the University of Southern Mississippi], highlighted the pressures that may lead students to cheat.
“I think that students often cheat because there is a huge amount of pressure on students,” McCormick said. “I think that most students don’t go into classes planning on cheating. I think that the majority of students that I’ve had who do some sort of academic dishonesty do so because they find themselves in a position of being overworked, overtaxed, and not knowing how to deal with that pressure and feeling like they have no other options.”
Two final points from the paper, then I’ll stop.
One:
The rest of student-centered variables (academic achievement, Academic load, Time Management, Hypercompetitiveness, and unawareness of plagiarism consequences) were excluded from the multilevel regression as they were not statistically significant predictors of [plagiarism with AI].
Two:
The combination of the frequency of using ChatGPT for academic purposes, cheating culture, and amotivation explains 28.0% (R² = .280) of the variance in plagiarism levels, with cheating culture being the main predictor.
In other words, although using ChatGPT does not, in and of itself, strongly predict misconduct, the combination of using ChatGPT, seeing other people cheat, and not caring about the work is a potent formula for pinpointing cheating, with cheating culture being the most potent ingredient.
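For anyone who wants to see the arithmetic, here is a minimal sketch of that kind of model, again on synthetic data with weights I chose myself, not the paper's:

```python
# Sketch of the three-predictor model the authors describe: R^2 as the
# share of variance in plagiarism explained by ChatGPT use, cheating
# culture, and amotivation. Data is synthetic and the weights are mine,
# chosen so cheating culture dominates and R^2 lands near the paper's .280.
import numpy as np

rng = np.random.default_rng(0)
n = 507  # the study's sample size

chatgpt_use = rng.normal(size=n)
cheating_culture = rng.normal(size=n)
amotivation = rng.normal(size=n)

plagiarism = (0.20 * chatgpt_use + 0.50 * cheating_culture
              + 0.30 * amotivation + rng.normal(size=n))

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), chatgpt_use, cheating_culture, amotivation])
beta, *_ = np.linalg.lstsq(X, plagiarism, rcond=None)
residuals = plagiarism - X @ beta
r_squared = 1 - residuals.var() / plagiarism.var()
print(f"R^2 = {r_squared:.3f}")  # roughly .28 with these weights
```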
I’ll wrap this up by sharing another quote from the literature review of this Spanish paper. Regarding one citation, the authors share:
recent studies reveal that half of university students express favorable attitudes toward plagiarism when using artificial intelligence-based systems
And:
On this matter, it is thought that this favorable attitude may stem from the fact that, although university students are to some extent aware of the risks associated with plagiarism when using such tools, a significant percentage believes that their use is acceptable for learning purposes.
I have not looked it up, but here is the citation for that:
Khalaf, M. A. (2024). Does attitude towards plagiarism predict aigiarism using ChatGPT? AI and Ethics, 1–12.
If that’s true, that half of university students have a favorable attitude toward plagiarism when using AI, I just — I’ve got nothing.
University of Waterloo Closes its Eyes, Holds its Breath
Speaking of cheating culture being a leading driver of using AI to commit plagiarism, the University of Waterloo (CAN) is just fine with that.
In a stunning announcement, the school is turning off its AI detection.
At the highest level, the school has chosen ignorance over information. They have proactively decided they no longer want to be aware of what their students are doing, the work they are submitting to meet degree requirements. I can’t say earn the degree, because at Waterloo, no one will have any idea whether that’s true.
The school says:
the decision was made to discontinue the AI detection tool in Turnitin. This function will no longer be available to University of Waterloo users as of September 2025.
And:
Given the expense of the tool in U.S. dollars, unreliability, and bias, it was determined the costs associated with Turnitin’s AI detection feature outweigh the benefits.
They put that in italics. Not sure why.
To start, I want to give the University credit because, by citing expense and costs, I think they unintentionally said why they’re really turning off Turnitin. They don’t want to pay for it. The school wants integrity, I am sure. Just not enough to pay for it when closing your eyes and covering your ears is free.
Of course, the school can’t really say they’re turning off assessment security to save money. So they say this instead:
Research has shown that AI detection tools are unreliable (Perkins et al., 2024; Weber-Wulff et al., 2023; Sadasivan et al., 2023). AI detection tools have also been found to be biased toward students whose first language is not English (Leong & Zhang, 2025; Rafiq, Qurat-ul-Ain & Afzal, 2025).
I left the links in, but this is beyond embarrassing.
Language Bias
I’ll start with the language bias part and with the phrase “AI detection tools have also been found to be biased.” Really? All of them? Says who?
But they cited research. So, I looked at it.
The Leong & Zhang study is one of the oddest things I’ve ever read. I’m not sure they tested the right tools, to be honest. They say, for example, that Turnitin’s accuracy at detecting AI was 40%. That cannot be right, and I’d really like to know how they got that, but I can’t. That’s because they provide no information at all on what kinds of test papers they used. Here is all they say about what they tested:
Directly Copied Text: Unmodified segments of online articles
Paraphrased Text: Manually rephrased content.
AI-Generated Text: Content generated by models like ChatGPT, designed to evade traditional detection
How long were these sections? How do we know if the “online articles” were written by AI? Where are they from? How much paraphrasing? What was the AI prompt? What do you mean by “designed to evade traditional detection?” And generated by models like ChatGPT? Or actually with ChatGPT? The paper has none of this information. Zip.
But Mr. Cheat Sheet, Waterloo cited this paper as related to AI detection being “biased toward students whose first language is not English.” Fair. So, what did this paper find about language bias? Wait for it. Wait.
Nothing.
The paper is not about language bias at all. They never tested it. At least not in the paper cited. Here is everything this paper says about non-native English speakers:
AI algorithms may inadvertently exhibit bias in detection, especially if they rely on datasets that lack diversity in language patterns and academic style (Leong, 2025b). This can result in disproportionately high false positives for students who write in non-standard academic English or come from diverse linguistic backgrounds. Addressing bias is crucial to ensuring that AI detection tools do not inadvertently disadvantage certain groups of students.
That’s it. No test, no data, no foundation at all. Just like, you know, they may do this. I left in the citation reference, but there’s no link. And that’s a shame because this is the citation:
Leong, WY, 2025b, ‘Beyond Exams: Alternative Evaluation Techniques to Measure Engineering Competency’, 2025 IEEE 8th Eurasian Conference on Educational Innovation (IEEE ECEI 2025), 6-8 Feb 2025, Bali, Indonesia.
For the life of me, I cannot find this paper. The Google box is stumped. If anyone has it, I’d love to chase this another step because it’s kind of hard for me to believe that good research about language bias in AI detection is in a paper on alternative evaluation techniques for engineering. Maybe it is. But again, we can’t see it.
What’s really troubling is that the University of Waterloo used this paper to substantiate language bias in AI detection based, I am assuming, on that single passage.
Honestly, that’s fun. Let me try. Here’s a passage from the same paper:
The integration of AI into plagiarism detection has transformed the ways academic institutions maintain integrity, offering more precise and efficient means of identifying academic dishonesty.
Hmm — AI detection offers institutions a more precise and efficient means of identifying academic dishonesty. Just not at the University of Waterloo. Because, even though it’s in the paper they cited, they don’t want that.
But what about that second citation on language bias, the one from Rafiq et al.? I’m glad you asked. That paper is a collection of:
semi-structured interviews and focus group discussions with educators, academic integrity officers, and postgraduate students.
The “data” is the boiled-down views of 35 people in Pakistan who have:
experience with AI detection tools such as Turnitin AI Detection, GPTZero, and ZeroGPT.
Feels airtight.
At least this paper mentions bias against non-native English speakers, though most of that is in the literature review. The paper itself has zero findings on whether AI detection systems do in fact have a negative bias against those who may write in English as a non-native language. The 35 people they interviewed said they think there is such bias. But no evidence is presented.
And when the paper does mention language bias in AI detection it’s like this:
educators have expressed concerns about the potential biases in AI detection algorithms, as certain linguistic patterns or non-native English writing styles may be misclassified as AI-generated (Deans et al., 2024).
Concerns that potential biases may result in misclassification. Strong stuff.
There is also this:
Recent studies suggest that AI detection models are more likely to flag content written by non-native English speakers, even when their work is original (Zapata-Rivera et al., 2024). This linguistic bias places international students at a higher risk of false accusations, reinforcing concerns about fairness and equity in AI-assisted assessments (Aleynikova & Yarotskaya, 2024).
Maybe those studies have evidence of this bias. I didn’t go look. I just don’t have time to tumble further down this rabbit hole. I have an actual job. The point is that this study — the one cited by the University of Waterloo — does not. Neither cited work does. And definitely not enough to support the claim that:
AI detection tools have also been found to be biased toward students whose first language is not English
Seriously, I remind you that this is a public statement from a real university, absolutely failing in basic research. Two citations, neither one offering any evidence at all on the point the school claims.
OK, I am a nerd’s nerd. I found that Zapata-Rivera paper. Found — there was a link. I clicked it. The initial paper cited by the University says that this Zapata-Rivera paper has the studies suggesting that the work of non-native English speakers is more likely to be flagged by AI detection. But this Zapata-Rivera paper is an:
Editorial: Generative AI in education
It links to 12 other papers. Only one of those has “bias” in the summary and it does not address bias related to language, from what I can tell. So, I’m at least three citations deep and still nowhere on finding evidence to support the Waterloo claim.
Unreliable
Waterloo says AI detection is unreliable. Research has shown, they say.
The good news is that I know these three cited papers; we’ve covered them in The Cheat Sheet already.
But before we do a little review, let’s frame the question. It is not whether, as the University says, “AI detection tools are unreliable.” Some are. As I have written many times, some are complete junk. But the University of Waterloo did not turn off detection tools in general; it said it is turning off Turnitin. So, the proper question really should be: is Turnitin reliable? I mean, what difference does it make if GPTZero or Copyleaks is unreliable if the school doesn’t use it in the first place?
Perkins
Anyway, the first cited research offered up is the Perkins study from 2024. I named it the worst piece of academic integrity research for the year and covered it in some detail in Issue 288.
One major issue with the Perkins paper is that its test subject papers are not the kind of academic work a university — say, the University of Waterloo, for example — would receive. They tested a cover letter to apply for an internship, a middle-school level writing assignment, a professional blog post, and a magazine article.
But then, Perkins and his colleagues altered these papers specifically to avoid AI detection. They had AI add spelling errors and odd sentence structures. They asked it to add language a non-native speaker may use, and so on. In one example, tested papers had “more than 20” intentional spelling errors in a 300-word paper.
The altered, tested papers were so bad that even Perkins conceded:
some of the samples generated after applying adversarial techniques for testing may not accurately represent the quality of work that students would submit in a real-world setting. Although these samples evaded detection by software tools, they are likely to evoke suspicion from human markers because of their poor quality, strange phrasing choices, and excessive errors.
They may not accurately represent the work students may submit in a real-world setting.
As I wrote in Issue 288:
But they tested them anyway — papers that students would not conceivably submit in an educational setting — and concluded that the detectors had a problem. That’s solid work.
But it’s good enough for the University of Waterloo.
Before we move away from this Perkins paper, you may find it interesting that Perkins was an author of a different paper on AI detection, which we covered in Issue 253. I won’t drag you through it, but I will share that, like the other paper, Perkins used AI to create work specifically designed to avoid AI detection. As I wrote in Issue 253:
To detect AI, the team used Turnitin’s system. And, drumroll please — Turnitin correctly identified 91% of the AI-generated papers, despite the efforts to avoid it.
This is directly from the other Perkins paper:
Turnitin’s ability to detect 91% of the generated submissions containing AI-generated content, despite the deployment of prompt engineering techniques by the research team to evade detection, is promising. As it is likely that any detection will raise suspicion of markers assessing the paper for potential academic misconduct violations, this shows that Turnitin AI detection may be a valuable tool in supporting academic integrity.
The dude Waterloo cited to claim that AI detection was unreliable literally wrote, “Turnitin AI detection may be a valuable tool in supporting academic integrity.” Guess they missed that one too. Pretty convenient.
Weber-Wulff
The next citation is a paper by Weber-Wulff and others, covered in Issue 250. In this research, the team tested 54 papers — a mix of AI, human-written, and AI then edited — on 14 different detection systems.
On the human-written text, ten of the 14 systems, including Turnitin, were perfect. Zero misfires.
As I wrote in Issue 250:
Four of the 14 systems (CheckforAi, Winston AI, GPT-2 Output, and Turnitin) were 94% accurate with the AI work, missing just one of 18 test samples each. Another two detectors (Compilatio and CrossPlag) were 89% accurate with AI text.
Three of the four that were 94% accurate with AI were also 100% accurate with human-written work. So was one of the systems that was 89% accurate with the AI.
In other words, three of the 14 tested detectors were more than 96% accurate overall — getting the human work perfect and missing just one of 18 AI works.
So, in the paper that the University of Waterloo cited to say AI detection was unreliable, Turnitin — the system they are turning off — was 96% accurate. It was perfect on human text and 94% accurate at spotting the AI. With the text that was altered to evade detection included, the Weber-Wulff test found Turnitin to be 74% accurate across all test papers, with zero false positives.
I cannot stress this enough, the paper that the University of Waterloo cited to say AI detection was unreliable shows the system they are turning off to be quite reliable.
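For the skeptical, here is the back-of-the-envelope arithmetic, assuming the category sizes the quoted figures imply (18 samples each of human, AI, and edited-AI text; the edited-text hit count is inferred, not reported here):

```python
# Back-of-the-envelope check on the Weber-Wulff figures quoted above.
# Assumes 18 samples per category (human, AI, AI-then-edited), which is
# how the 54-paper design reads here; Turnitin's per-category hit counts
# are taken, or inferred, from the percentages in the text.
human_correct, human_total = 18, 18   # perfect on human writing, zero misfires
ai_correct, ai_total = 17, 18         # "missing just one of 18" AI samples

print(f"AI detection rate: {ai_correct / ai_total:.0%}")  # 94%
combined = (human_correct + ai_correct) / (human_total + ai_total)
print(f"human + AI accuracy: {combined:.1%}")             # 97.2%, i.e., over 96%

# The 74% figure across all 54 papers implies about 40 of 54 correct,
# which would mean roughly 5 of 18 edited-AI samples were still caught.
overall_correct = round(0.74 * 54)
edited_correct = overall_correct - human_correct - ai_correct
print(f"implied hits on edited AI text: {edited_correct} of 18")
```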
Sadasivan
And on to the paper from Sadasivan and others, covered in Issue 326.
It’s a complicated paper in which researchers took text created by AI and had different paraphrase engines rewrite it over and over again to see at what point persistent paraphrasing could escape AI detection. That’s cool, but not very practical.
Two things you can take away from this paper though — one, the research team did not even test Turnitin. They tested AI detectors that, as far as I can tell, are not in use anywhere outside of computer labs. And two, before setting AI text on a paraphrase loop, detection rates were in the 99% to 100% range. I wrote this in Issue 326:
Even after five rounds of paraphrasing, detection was still better than 50/50. And it was nearly perfect even after initial paraphrasing.
I give up. Even after seeing numbers such as 99.3%, 99.8%, and 100%, all anyone wants to talk about is how AI detection does not work.
Bottom line is that this paper shows that computer scientists can more or less break some AI detectors if they hammer them enough. Again, that’s cool.
But maybe, if you’re the University of Waterloo and you’re trying to defend unplugging Turnitin — maybe cite a study that tested Turnitin. Maybe also don’t cite a study showing a 100% accuracy rate for other AI detectors. Just a suggestion.
Conclusion
The University of Waterloo has decided not to invest in securing its academic work, making it unable to verify the skills and competencies of its graduates. It is, as I said earlier, self-inflicted ignorance. And, based on the papers the University cited, it is completely without research support. Which is why I think the school really just wanted to save the money.
As a result, cheating with AI will go up at Waterloo because students will know they won’t be caught anymore. And integrity cases will go down — another nice benefit for the school. But teachers will still use AI detectors; they’ll just be all over the map, using the junk in some classes, better ones in other classes, none at all in still others. Fairness and consistency will be gone.
But whatever. They’re saving money. And telling themselves they did the right thing, I am sure. I will never understand it.
Has it occurred to anyone to run the Waterloo announcement through originality.ai? Because I see red flags 👀