335: Research: ChatGPT Passes Grad School Course Undetected, Earns High Marks
Plus, more on that survey used by The Guardian. Plus, a pretty stunning letter from Chegg's CEO.
Issue 335
Subscribe below to join 4,273 (-3) other smart people who get “The Cheat Sheet.” New Issues every Tuesday and Thursday.
If you enjoy “The Cheat Sheet,” please consider joining the 16 amazing people who are chipping in a few bucks via Patreon. Or joining the 45 outstanding citizens who are now paid subscribers. Paid subscriptions start at $8 a month or $80 a year, and corporate or institutional subscriptions are $240 for a year. Thank you!
Research: ChatGPT Passes Grad School Course Undetected and With High Marks
New research from scholars in South Carolina shows that ChatGPT, registered as a “student” in an online master’s-level health administration course, not only earned high grades, but its output and participation in course activities went undetected.
The paper is by Kenneth R. Deans Jr., Jami Jones, Jillian B. Harvey, and Daniel Brinton. Deans is with Health Sciences South Carolina, while the other authors are with Medical University of South Carolina, a public medical college.
The test and findings are pretty straightforward, but they are nonetheless due to be misquoted and misapplied in at least one regard. Either way, I also think there are two or three significant findings.
Without informing instructors or classmates, the research team enrolled a fictitious student in the online graduate course and used output from ChatGPT (GPT-4) to complete all coursework, including quizzes, essays, discussion posts, and even “participation in live seminars.”
It’s an important caveat that, although the test/study did not add any human thinking to GPT responses, the team did submit all GPT output to Grammarly for grammar errors — “grammatical refinement” — and to adhere to format requirements. They also used Grammarly to check GPT output for plagiarism — none was found, probably because generative AI does not plagiarize. Finally, the team said that they used Google Scholar to verify all citations, to eliminate infamous GPT hallucinations. So, what we have is not naked ChatGPT, but refined and buffed-up ChatGPT.
With that, the team found that ChatGPT performed exceptionally well in the course and its output was entirely undetected. From the paper:
AI’s final grade is 99.36, which is higher than both the cohort’s mean (97.70) and median (98.53). This places AI’s performance near the top of the class.
I have a note on this in a moment.
But also, as noted, none of ChatGPT’s text was spotted or flagged as potentially non-authentic by “in-place” detection regimes. This is what I fear will be misunderstood or, more likely, taken out of context. That’s because the research team makes clear that the “in-place” AI detection the professor, program, or school was relying on may have been nothing at all. The authors mean only that, whatever the school was or was not doing, it did not work.
From the paper:
During the study, no specific AI-detection platforms were disclosed or verified to be in use by the institution. As a result, the detection of AI-generated coursework relied on whatever mechanisms were inherently in place within the academic setting, including the possibility of human instructors identifying anomalies or the institution’s use of AI-detection software.
So, it’s not that an AI detector failed to catch the fingerprints of ChatGPT, it’s that in all likelihood no AI detector was used — “no specific AI-detection platforms were disclosed or verified to be in use by the institution.”
It’s also clear that the authors see the lack of detection as a flaw and directly call for more robust detection and security:
Equally significant is the absence of detection; the AI’s contributions were not identified as synthetic by human educators or any in-place algorithmic detection platforms, signaling the need for enhanced integrity protocols.
And, the lack of detection:
also highlights the need for improved detection frameworks to safeguard academic standards
And:
Our study’s undetectable contributions of AI indicate that current educational and in-place detection methods need revising, which calls for proactive measures to address the emergent challenges posed by AI in educational contexts, such as developing more sophisticated AI-detection algorithms and establishing clear guidelines for AI use in academia.
Let me pull out, for the sake of emphasis: “the need for enhanced integrity protocols,” and “need for improved detection frameworks,” and “proactive measures … such as developing more sophisticated AI-detection algorithms.”
All we know for sure is that, in this test, human detection of AI did not work, which should not surprise (see Issue 325). Though I will bet a donut that someone will cite this paper as a failure of AI detection software, if it has not happened already.
Bottom line here is that if you’re not looking for AI content, not looking for cheating, you won’t find any. And, by the way, the AI — with a little help — is good enough to not just pass your class, but to ace it. So, if you care about the validity of your grades, your course, your program, your degree, or your school, you may want to do something. Just a suggestion.
I have one final thing on this — to revisit this from the paper:
AI’s final grade is 99.36, which is higher than both the cohort’s mean (97.70) and median (98.53). This places AI’s performance near the top of the class.
So, we have an online class, described by the paper authors as being “at a well-known university,” a graduate-level class, where the mean grade is 97.7 and the median score is 98.5. For those who have forgotten their statistics, the average grade in this online course was nearly 98%. That means that unless the class had a thousand students, no one failed. No one even got a C. And with a median score of 98.5%, half the class scored 98.5% or better.
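For anyone who wants to see the mean/median point concretely, here is a minimal sketch using invented grades (the paper does not publish individual scores; these numbers are hypothetical, chosen only so the class mean and median match the 97.7 and 98.5 figures the paper reports):

```python
from statistics import mean, median

# Hypothetical 20-student section. These grades are invented for
# illustration; they are NOT the actual grades from the study.
grades = [88.0, 94.4, 96.0, 96.5, 97.0, 97.5, 98.0, 98.2, 98.4, 98.4,
          98.6, 98.7, 98.8, 99.0, 99.0, 99.2, 99.4, 99.5, 99.6, 99.8]

class_mean = round(mean(grades), 2)     # 97.7, matching the paper
class_median = round(median(grades), 2) # 98.5, matching the paper

# By definition of the median, at least half the class sits at or
# above it -- here, 10 of 20 students score 98.5 or better.
at_or_above = sum(g >= class_median for g in grades)
assert at_or_above >= len(grades) / 2
```

Note that the mean can be dragged down by one low outlier while the median barely moves, which is why the 98.5 median is the more damning number: no matter how the grades are distributed, half the class is at or above it.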
This has more red flags than a May Day parade.
And I suggest, respectfully, that academic fraud with AI may not be the only problem here. The first problem is online classes in general. The second is lack of integrity and security. If more than half the class is scoring a 98.5% — or better! — the class is either paint-by-numbers easy or it’s entirely compromised. Or both.
If there is an online class anywhere, on any subject, and the responsible parties are not actively deterring, detecting, and adjudicating academic misconduct, I also suggest that the results convey nothing whatsoever. An AI chatbot just “sat” in your class for an entire semester and got an A. No one noticed.
Whether anyone cares enough to change anything, we will see.
For the record, I seriously doubt it.
The average grade was a 98%. In a graduate program. I just can’t.
A Note on That Survey The Guardian Used
In the last Issue of The Cheat Sheet, we looked at the downright terrible article The Guardian decided to release into the universe.
In it was:
More than half of students now use generative AI to help with their assessments, according to a survey by the Higher Education Policy Institute, and about 5% of students admit using it to cheat.
I pointed out that five percent admitting to cheating is crazy far away from the percentage of students who are actually cheating with AI. I also noted that many other surveys put the “admit to cheating” figure several times higher.
After I wrote about The Guardian and that survey, a reader wrote to me to be sure I knew that HEPI, the Higher Education Policy Institute, which sponsored the survey with the curious results, was partly sponsored by Chegg. Imagine that — Chegg paying money to a group that does a survey showing a very, very low number of students cheating.
Inconceivable.
Our source said Chegg was a member of HEPI’s “Partnership Program,” which costs about $20,000 a year. Curiously, Chegg is not listed as a HEPI partner now. But if you go back to last summer, sure enough, there’s Chegg listed as a partner. Here’s a screen image from the HEPI partnership list from June 2024:
And, if you buzz over to Chegg’s newsroom, in November 2024, barely eight weeks ago, Chegg announced an absurdly conceived “Academic Integrity in GenAI Principles” document, which is like Purdue Pharma releasing a white paper on ethical narcotics marketing. Anyway, in the announcement, Chegg says:
These principles were crafted with insights from Chegg’s Academic Advisory Board and 15 higher education leaders from the United Kingdom, convened by the Higher Education Policy Institute (HEPI).
So, I would not take anything HEPI says on academic integrity seriously. Good thing I already didn’t. Seeing them issue survey results so clearly wrong was enough for me. Finding out Chegg had their hands in it is no surprise whatsoever.
Thanks to our informed and active reader for pointing that out.
Chegg’s CEO Letter
Speaking of Chegg, the company’s new CEO — poor lad — sent a public letter a few days ago. It’s largely unremarkable except for one bit, which is really quite breathtaking.
Among other marketing nonsense, the CEO says a new product is coming soon:
We’re planning to launch a solution comparison tool so you can see Chegg’s high quality solutions right next to GPT, Gemini, and others. We know you are looking for more than the solution, and this comparison tool will help you synthesize and learn more easily.
Sorry, they’re actually “planning to launch.” Not sure what that means. But OK.
And that does not sound like a bad thing, being able to compare one source with another — though we can all have a laugh about Chegg telling its users that they “know you are looking for more than a solution.” Sure. But I’ll bite. If that’s true, Chegg, if you “know” that your customers are not just paying to cheat and get answers, show us. Tell us how you know that. Go on.
But that is not my point.
My point is about this “comparison tool” that Chegg is thinking about maybe thinking about doing something on later. Unfortunately for Chegg, their own letter gave their game away just a few sentences before.
The CEO says:
You have told us that you are spending too much time cross-checking and verifying answers from AI platforms
Ah, so there it is.
In other words, this new Chegg comparison thing is not really for comparing sources, it’s for verifying the junk ChatGPT is telling you while you’re asking it for answers. Nice.
Maybe it’s obvious, but if Chegg wants to help a student cross-check and verify answers from AI platforms, students are getting answers from AI platforms.
And Chegg, it seems, is all good with that — so long as you pay them to verify it.
Also, I ask rhetorically, why would a student need to “cross-check and verify?” Oh, right — because they don’t know the answer themselves. But no worries, Chegg is here to tell you if your AI chatbot answer is right or not.
And, rhetorically again, why would that be helpful? There’s the grade, obviously. If you turn in an answer that the American Civil War happened in 2024, which is when the movie of the same name came out, you will probably fail. But you may also face questions about using AI to copy-and-paste your answer. So, good thing Chegg is on the case. Not to help a student learn anything whatsoever, but to make sure the answers you cheat with are easy to verify.
After all, the real pain point for students is “spending too much time” fact-checking the answers they are getting from AI. Yes, a genuine problem that Chegg is all too happy to solve, for a fee. Now, students don’t have to spend time thinking about the questions, the AI answers, or even considering whether or not they are right.
I’d say it is unbelievable, but it’s a Cheggucation.
"AI’s final grade is 99.36, which is higher than both the cohort’s mean (97.70) and median (98.53). "
In the US, graduate student loans have no cap, whereas undergraduate loans have a maximum somewhere near 55K. I am not sure this study highlights the potential for student cheating using ChatGPT or, instead, the potential for massive abuse of graduate school financial aid. These passing students can borrow toward their housing (mortgage included), food, car, and all kinds of tangible property, do not seem to be doing challenging work, and seem to have zero chance of not moving on to the next course. As it is healthcare related, they may even qualify for loan forgiveness. I taught for quite some time... the grades and pass rate of this course should merit a thorough investigation outside of the use of ChatGPT. Maybe DOGE needs their first assignment?
The Health Admin course study echoes the results of “A real-world test of artificial intelligence infiltration of a university examinations system: A ‘Turing Test’ case study,” performed at the University of Reading (UK); my takeaway also was that IF writing is used, more than human detection is needed to level the playing field for students (though in that case the mean/median weren’t A+ already). https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354