US May Require AI Text to Be Watermarked
Plus, new information on that study showing AI detection to be biased against non-native English writers. Plus, Chegg continues to shrink.
Issue 251
New Rules Include Order to Explore Required Watermarks in AI Text
Earlier this week, the White House issued an order regarding AI technology. Part of that order, according to the coverage at CNN, includes:
The order aims to prevent AI-related fraud by directing the Commerce Department to develop guidance for watermarking AI-generated content.
That’s really good news for academic integrity.
Not because I think it will happen, or because watermarking will solve the issues related to academic misconduct, but because it means someone in the policy perches of government recognizes the potential to misuse and abuse AI for fraud.
It will be interesting to see if, when, and how the major AI producers lobby the Commerce Department to not require watermarks, or to make no rule at all. As I have mentioned before, generative AI companies could have watermarked their text already. They have chosen not to. And OpenAI, the biggest and most visible of the bunch, has actively opposed or undermined efforts to detect its work (see Issue 241).
Experts have told me already, though, that watermarking won’t fix the fraud challenge because the technology that can easily find watermark signatures will also be able to modify them, essentially erasing them from the text. Even so, I’d favor making those who want to use AI to deceive people at least take an extra step to do it. No reason not to have an extra lock on the door, is my view.
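For the curious, here is a toy sketch of how the kind of statistical watermark researchers have proposed could be detected. To be clear, this is my own illustration, not any vendor’s actual scheme: the hashing trick, the 50/50 vocabulary split, and the function names are all assumptions made for the example. The idea is that a watermarking generator quietly prefers “green” words, and a detector just counts them.

```python
import hashlib
import math

def green_set(prev_word: str, vocab: list[str]) -> set[str]:
    """Pseudorandomly assign about half the vocabulary to a 'green list',
    seeded by the previous word (a toy version of published schemes)."""
    green = set()
    for word in vocab:
        digest = hashlib.sha256((prev_word + ":" + word).encode()).digest()
        if digest[0] % 2 == 0:  # roughly 50% of words are green for any context
            green.add(word)
    return green

def detect(text: str, vocab: list[str]) -> float:
    """Return a z-score: how far the share of green tokens exceeds
    the ~50% expected in unwatermarked text."""
    words = text.lower().split()
    hits, total = 0, 0
    for prev, word in zip(words, words[1:]):
        if word in vocab:
            total += 1
            if word in green_set(prev, vocab):
                hits += 1
    if total == 0:
        return 0.0
    expected = 0.5 * total          # binomial mean under "no watermark"
    variance = 0.25 * total         # binomial variance under "no watermark"
    return (hits - expected) / math.sqrt(variance)
```

The same arithmetic is the weakness the experts describe: any tool that can compute this score can also swap words until the score sinks back to noise, erasing the watermark.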
Further, if we get watermarking, it will blunt the argument that a flaw in current AI detection regimes is that they have no backup, no proof of their conclusions.
In any case, I am for it. And I am glad the powers that be are working on something and openly discussing the problems.
A New Note on That AI, Non-Native Language Study
There is a little new information on the study that keeps going around about how AI detectors are biased against non-native English writers. You may remember from the last Issue that Debora Weber-Wulff cited it, that I’d written about why it’s flawed (see Issue 216), or that the press has uncritically repeated it.
Anyway, Turnitin responded.
In a post on their website, the company says they ran their own tests and found no statistically significant bias against text written by non-native English writers. Maybe that’s not a surprise. The details of their test are in their post.
But what I found most interesting about Turnitin’s response is that not only did the research fail to test Turnitin’s system at all (as I pointed out in Issue 216), but, in Turnitin’s words, the study:
is grounded on a small collection of works of just 91 Test of English as a Foreign Language (TOEFL) practice essays, all of which are less than 150 words long.
Wait, what?
I knew about the TOEFL part and highlighted some very questionable sourcing. But I did not know they were so short. That’s key because most decent detection systems don’t work well with so little information. In fact, Turnitin’s systems won’t even calculate a finding on texts that brief. They say:
Turnitin's AI writing detection was not included in this evaluation, perhaps in part because we won't make predictions about documents that are so short.
So, yeah. In addition to the other obvious errors, including what appear to be made-up citations and the failure to test the biggest, most popular system in the world, the texts they tested are simply unreadable by many systems.
But people keep sharing it and citing it as though it were gospel.
I just don’t know what to say.
Chegg Continues to Shrink
Chegg, the NYSE-listed cheating provider, reported its earnings this week. And they were not good. Again.
Chegg stock, which once traded at more than $113 a share, now sells for less than $7.50.
Conventional wisdom is that Chegg is being devoured by generative AI options that help students cheat for free instead of paying Chegg. And though that may be true, Chegg’s slide pre-dated the release of ChatGPT and similar tools.
From the company’s earnings press release announcing its Q3 results:
Total net revenue was $157.9 million, down 4% year over year
Total subscription revenue, down 4% year over year
4.4 million subscribers, down 8% year over year
A net loss of $18.3 million
Even though its customer base and share price are shrinking before our eyes, focus on the fact that Chegg collected $157 million last quarter and had more than four million subscribers. Cheating with Chegg may be in decline, but it’s anything but gone.
In his prepared remarks, CEO Dan Rosensweig says:
91% of students report when they use Chegg they get better grades
I have no doubt they do.
I also think Rosensweig may have slipped here when, in describing what Chegg plans to offer soon, he says:
This means Chegg can provide answers from our proprietary database
Provide answers. You don’t say?
Actually, he did say.
Another little snippet I noticed is that, in addition to saying Chegg was going to make practice tests and flashcards, Rosensweig said:
We also plan to let students connect to each other and share content.
Yikes. If you wanted a perfect plan to run headlong into copyright and other IP problems, that’s it. Ask Napster. But I guess desperate times …
Finally, Chegg says they will do better next quarter, which is this quarter, projecting revenue between $185 million and $187 million. That’s $185 million in 90 days. Say whatever you like about cheating, but don’t say it does not pay. If you sell it, it pays pretty well.