347: Florida Professor Watermarks AI Text for Easy Detection
Plus, a Quick Quote. Plus, Times Higher Ed writes on assessment validity and cheating.
Issue 347
Subscribe below to join 4,423 (+9) other smart people who get “The Cheat Sheet.” New Issues every Tuesday and Thursday.
The Cheat Sheet is free, although patronage through paid subscriptions is what makes this newsletter possible.
A University of Florida Professor Developed a Watermark for AI-Created Text
Yuheng Bu, Ph.D., a professor at the University of Florida, announced that he has developed a watermark for AI-created text, making such text easy to spot.
Disclosure, I am a Gator. Go Gators.
But first, the article and announcement. Then, a comment. Or two.
The coverage correctly frames AI as a cheating enabler, quoting Dr. Bu:
“If I'm a student and I'm writing my homework with ChatGPT, I don't want my professor to detect that”
Amazing how obvious that is.
Further:
Using UF’s supercomputer HiPerGator, Bu and his team are working on an invisible watermark method for Large Language Models designed to reliably detect AI-generated content – even altered or paraphrased – while maintaining writing quality.
I also love that this article quotes the research we covered in Issue 325. From the news article:
To address this, Peter Scarfe, Ph.D., and other researchers from the University of Reading in the United Kingdom tested AI detection levels in classrooms last year. They created fake student profiles and wrote their assignments using basic AI-generated platforms.
“Overall, AI submissions verged on being undetectable, with 94% not being detected,” the study noted. “Our 6% detection rate likely overestimates our ability to detect real-world use of AI to cheat on exams.”
Yup.
Continuing:
Bu’s work enhances the system’s strength against common text modifications in daily use, such as synonym replacement and paraphrasing, which often render AI detection tools ineffective. Even if a user completely rewrites the watermarked text, as long as the semantics remain unchanged, the watermark remains detectable with high probability. And a watermark key is applied by the platform itself.
“The entity that applies the watermark also holds the key required for detection. If text is watermarked by ChatGPT, OpenAI would possess the corresponding key needed to verify the watermark,” Bu said. “End users seeking to verify a watermark must obtain the key from the watermarking entity. Our approach employs a private key mechanism, meaning only the key holder can detect and validate the watermark.”
That Bu’s watermarks are resistant to paraphrasing and rewriting is encouraging, because savvy students try to erase the evidence of misconduct. That detection requires a unique key is also interesting, and perhaps a good step toward commercialization and usability.
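To make the “private key” idea concrete, here is a minimal sketch in Python of one well-known family of keyed watermark detectors (the “green list” approach from the academic literature). To be clear, this is not Professor Bu’s actual method, and everything in it, including the key and the half-and-half token split, is an illustrative assumption. The idea: the generating platform uses a secret key to pseudorandomly designate a set of favored tokens and nudges its output toward them; later, only someone holding that key can test whether a text is statistically over-enriched in those tokens.

```python
# Illustrative sketch only; NOT Professor Bu's method.
import hashlib
import math

SECRET_KEY = b"held-only-by-the-watermarking-entity"  # hypothetical key

def is_green(token: str, key: bytes = SECRET_KEY) -> bool:
    """Deterministically assign roughly half of all tokens to a keyed 'green list'."""
    digest = hashlib.sha256(key + token.lower().encode()).digest()
    return digest[0] % 2 == 0

def detect(text: str, key: bytes = SECRET_KEY) -> float:
    """Return a z-score: how far the green-token fraction exceeds chance (0.5)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    greens = sum(is_green(t, key) for t in tokens)
    n = len(tokens)
    # Under the null (unwatermarked text), greens ~ Binomial(n, 0.5).
    return (greens - 0.5 * n) / math.sqrt(n * 0.25)
```

In a real system the bias is applied inside the model’s sampling step, and detection operates on model tokens rather than whitespace-split words. The point is only that detection is a keyed statistical test, which is why the key holder, and no one else, can verify the mark.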
Watermarking is not a new idea. When ChatGPT first rolled out, OpenAI promised to watermark its text and subsequently developed a tool to do just that. But the company has refused to release it, saying it would hurt its business (see Issue 308). Which means that OpenAI/ChatGPT thinks it will make more money by continuing to be deceptive, allowing and enabling its users to deceive others.
Which leads me to my question about what Professor Bu has developed: whether his watermarking will depend on the AI companies to implement it. If it does, it may be powerful but permanently mothballed. There’s no way most AI-generation companies want their text discoverable. And even if they do cooperate, a student who knows the AI text in their LMS is watermarked will simply bypass it and access the chatbot directly, or use a different provider.
I e-mailed Dr. Bu to ask about this. He responded, but whether his method requires the cooperation of AI text-generation companies, or operates only within the LMS, remains unclear to me.
I’m not being intentionally dismissive. I think all efforts to mitigate the cheating potency of AI are helpful, and I am on board with watermarking. If AI is being used correctly, ethically, I see no reason anyone would object.
Good for Professor Bu and his colleagues.
Times Higher Ed on Assessments
Times Higher Ed, which continues to pace the universe in coverage of academic integrity, had a story a few weeks ago about assessment design, validity, and cheating.
It’s not bad, necessarily. But I definitely find it unhelpful.
My main gripe is that the entire essence of the story downplays, minimizes, or de-prioritizes integrity, casting it as a secondary concern. The headline is literally:
AI: cheating matters but redrawing assessment ‘matters most’
If you follow the words “cheating matters” with the word “but,” I am out.
The subhead is:
Universities should prioritise ensuring that assessments are ‘assessing what we mean to assess’ rather than letting conversations be dominated by discussions around cheating
Prioritize x, rather than “discussions around cheating.”
I’m out.
Let me try it this way. I don’t care how valid your assessment design is; if students are cheating, it is invalid. If some runners in your race are using steroids while others are in bare feet, you are not measuring the ability to run fast, no matter how accurate your timing gauge is.
Let’s try a second bite at this apple. To be valid, every assessment must have basic integrity. Integrity is not a nice-to-have.
The article pivots on Professor Phillip Dawson, at Deakin University in Australia. I know Dawson a little and consider him thoughtful and in the right camp, by and large. But then he says things such as this, as quoted in the article:
“validity matters more than cheating”
Again, if you have cheating, you do not have validity.
From the article, quoting Dawson:
Speaking at the conference of the UK’s Quality Assurance Agency, he said: “Cheating and all that matters. But assessing what we mean to assess is the thing that matters the most. That’s really what validity is…We need to address it, but cheating is not necessarily the most useful frame.”
I agree that it’s important to assess what you mean to assess. No question.
But to bludgeon you with repetition, if you have a very focused, very valid assessment and 40% of the assessed are cheating, you have probably measured nothing whatsoever. At least not comparatively. And because you are presumably letting cheaters get grades, it becomes impossible to say what getting an “A” actually represents. Learning? We cannot be sure.
Tainted is the word we usually use for marks achieved under suspicion. If you put “cheating and all that” second, your marks are tainted.
Further, Dawson dips into the surrender-to-AI camp, saying, as quoted in the article:
“Discursive changes are not the way to go. You can’t address this problem of AI purely through talk. You need action. You need structural changes to assessment, [and not just a] traffic light system that tells students, ‘This is an orange task so you can use AI to edit but not to write.’
“We have no way of stopping people from using AI, if we aren’t in some way supervising them; we need to accept that. We can’t pretend some sort of guidance to students is going to be effective at securing assessments. Because if you aren’t supervising, you can’t be sure how AI was or wasn’t used.”
And I agree: talk alone about AI use is useless. Teachers should, of course, tell students what the expectations are, what will be assessed, and where the lines are. I don’t think Dawson, or anyone, is suggesting that students need no guidance whatsoever on classroom policies and expected learning goals. But on the point I think Dawson is making, that policy alone is pointless, I agree.
We do need action, as he says.
I also agree that supervision is essential. If assessments are not supervised, especially online assessments, AI use — and misuse — should be assumed. No matter what the policy is.
I will add, though, that some technologies are quite promising at limiting AI use in unsupervised assessment settings. Then again, you could argue that such tools are themselves a form of supervision, depending on how you define it.
My takeaway from that: supervise assessments. This seems like a decidedly pro-integrity position.
And for the record, I am not against redesigning or reconsidering assessment design, strategy, or objectives. I do think, though, that reconsidering any assessment is a waste of time if you’re leaving your front door wide open; it’s like an interior designer meticulously redecorating a house that is currently on fire.
The article wraps with another quote from Dawson:
“The times of assessing what people know are gone.”
I thought I was a cynic.
And I hope that’s wrong. If Dawson is correct, the value proposition of organized, credential-based education is also gone.
To me, the answer is simple. The execution is hard, and the investment is high. But if supervision of assessments is required, supervise the assessments. If you need a tool to secure exams, get that tool. The outcome is worth the effort.
Quick Quote
In Issue 345, we looked at the survey from HEPI, which showed that 88% of higher education students in the UK admit to using AI on their academic assignments.
Not noted at the time was that a few news outlets covered the survey, including this piece in The Guardian. I’m not going over the coverage, as it’s banal and presumptive. Instead, I’m sharing only this quote from the survey coverage:
One student told researchers: “I enjoy working with AI as it makes life easier when doing assignments; however, I do get scared I’ll get caught.”
Caught is an interesting word choice — almost like they know they’re doing something wrong.
Almost.
A Funny, From ICAI
As mentioned previously, I was honored to be at the annual ICAI conference this past week. And I highly recommend the event(s) to any and all interested. It will be in Denver next year.
After my remarks, I apologized for being a downer — for not having much hope to deliver. Later, commenting on me being a giver of gloom, a friend/colleague said The Cheat Sheet was “the new DEI — depressing, entertaining, and informative.”
Not untrue. Love it.