Artificial Intelligence's second-language English problem
A study has shown AI consistently misclassifies non-native English writing. It could make the current media bias against second-language English speakers even worse.
I’ll start by putting my hands up and admitting I’m no expert on Artificial Intelligence. My knowledge extends to spending a couple of hours asking ChatGPT all sorts of random questions, until it linked Sophia Loren to fascism, just to later admit that it was wrong and that one of the world’s biggest film stars had never, in fact, held fascist views. It felt like a victory of sorts but I suspect AI would have been more useful if I’d asked it fewer leading questions.
Most social gatherings I’ve attended recently have had what feels like a compulsory half-hour conversation about AI. It usually goes from ‘will any of us have jobs soon’ to ‘it’s the best thing that’s ever happened’ finally ending up with ‘OMG why did no one take the plot of The Terminator seriously?’
The formidable Lara Lewington, one of my best friends and tech journo supremo, has been doing a lot of great work on AI (check some of it out here) so she’s become an oracle on this topic for our group of friends. She’s generally reassuring about the whole thing and because of her I won’t start sounding like one of those doomsayers who think all change will lead to the apocalypse.
However, I do believe that the issue of language - specifically the global dominance of English - isn’t being assessed enough in the advance of AI.
So I was intrigued to see new research around the specific topic of non-native English speakers and AI.
Stanford University has released a study which shows that AI detectors are biased against non-native English writers.
What are AI detectors? They’re programmes that are used to ‘detect’ whether something was written by AI or by an actual person. This is becoming increasingly important to determine whether job applications, university essays or articles are written using AI or not, so as to make sure that students don’t cheat and that job applicants are submitting their own work.
But the Stanford University research has found that these detectors frequently mis-categorise the work written by second-language English writers as being written by AI.
Scientists checked 91 English essays written by non-native English speakers with 7 popular GPT detectors. It turns out that more than half of those essays were mistakenly identified by the detectors as being written by AI. One particular detector was wrong in a staggering 98% of cases.
Meanwhile, essays written by native speakers were overwhelmingly correctly categorised as having been created by a human.
Apparently the reason for this is text perplexity: a way of measuring how predictable the next word in a sentence is. The more predictable the wording, the lower the perplexity and the more likely the text is to be classed as AI-generated.
Programmes like ChatGPT tend to produce low-perplexity text. But second-language English speakers are also more likely to use simpler words and sentences, which lowers perplexity too. Hence the confusion, where some detectors tend to categorise the writing of second-language English speakers as AI-generated.
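For readers who want to see the idea in numbers, here is a rough sketch of how perplexity can be calculated and used as a crude detector. It is an illustration only, not how the detectors in the Stanford study are actually built: the per-word probabilities, the function names and the threshold of 2.0 are all made up for the example. In practice the probabilities would come from a language model.

```python
import math


def perplexity(token_probs):
    """Perplexity of a text, given the probability a language model
    assigned to each word (token) that actually appeared."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token, then exponentiate.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def looks_ai_generated(token_probs, threshold=2.0):
    """Crude, hypothetical detector: flag text whose perplexity falls
    below an arbitrary threshold (2.0 is an assumption, not a standard)."""
    return perplexity(token_probs) < threshold


# Invented probabilities for illustration only.
predictable_text = [0.9, 0.8, 0.85, 0.9]   # simple, expected word choices
surprising_text = [0.2, 0.05, 0.1, 0.3]    # unusual, less expected word choices

print(perplexity(predictable_text))          # low perplexity
print(perplexity(surprising_text))           # higher perplexity
print(looks_ai_generated(predictable_text))  # flagged by the crude rule
print(looks_ai_generated(surprising_text))   # not flagged
```

The sketch shows the weakness: the rule only sees how predictable the words are, so a human who writes in simple, predictable English falls on the same side of the threshold as a chatbot.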
This could obviously lead to discrimination in job applications and essay grading for non-native speakers.
At least the Stanford study has highlighted the issue, and should make the mistakes easier to identify and avoid. But the bias of AI against non-native English speakers may not always be so clear.
I saw something else in the past few days relating to language and AI which I found very disturbing.
Take a look at the video I’ve added below. It’s from my former Al Jazeera colleague Melissa Chan who is now, among other things, working at Deutsche Welle’s English language news channel. She uploaded this video to her LinkedIn account. I found it both amazing and horrifying in equal measure. See what you think.
Melissa is a very talented journalist but by her own admission her Spanish is not of live broadcast quality. And yet here she is, sounding (as far as I can tell) flawless, both in delivery and content. As she mentions in her LinkedIn post, she used AI to change her voice, inflections and all, from English to Spanish.
Now, this is, in many ways, absolutely amazing. The barriers around national broadcasting could come down, making every news bulletin anywhere linguistically accessible around the world, without having to go through a stilted human instantaneous translation. This could be the start of a truly international way of providing the news.
This technology is potentially game-changing for a continent like Europe which, unlike the Middle East, the United States and most of South America, doesn’t have a common language. Imagine what Euronews, the nearest thing Europe has to a continent-wide news channel, could do with this.
But this technology could also worsen the very situation it could be helping.
The diversity of language options on offer could also, ironically, be a step backwards for true diversity of thought.
In this clip Melissa may sound like a native Spanish broadcaster, but the original language of the broadcast is English, which means the news script would have been written by a fluent English speaker - most likely a native English speaker, with the cultural background that comes with that (I addressed this issue specifically in a previous newsletter ‘There’s no such thing as International News’). And it would have been written with an English-speaking audience in mind. As it happens, in this particular case the broadcaster was the English-language channel of Deutsche Welle, which is German, but seeing as English is the global language, and certainly the language of international journalism, I think it’s safe to assume that this technology would overwhelmingly be used by the Anglophone media, already the world’s dominant voice.
Regular readers of this newsletter will know I strongly believe that language is never just language: Language is culture.
So while AI can translate news clips into any language and make them sound native, this technology could actually end up cementing the anglophone narrative internationally, because the content would be written and conceived in English, using anglophone priorities, frames of reference and cultural sensitivities. The fact it would sound like another language would mask its origin, therefore making the Anglosphere’s dominance of the global narrative more subtle and even stronger.
AI has a wealth of opportunities that we haven’t even scratched the surface of. Harnessing those opportunities could prove life changing, but we must never ignore the risks that this new technology could also bring, including worsening the disparity of narrative which is already embedded in so-called international media.
The recent AI hype is utter nonsense. And it won't be the last time. This laughable uninformed hype and airhead, unscientific fear will be cropping up every other year from now on. The way to avoid the nonsense is the usual recommendation: don't listen to motor-mouth Americans with annoying nasal accents and worthless rubbish they are trying to sell.
It is true that lots of journalists will be thrown out of work (are there any non-avatars left anyway!?) because "AI" is good at doing what the majority have always done: churn out torrents of inaccurate verbiage. I don't think I really care that much: thick, gullible, unintelligent people want to imbibe crap and will continue to do so. It's a golden age for them. Nothing can change that.
"AI" is a hype-in-an-acronym. There is no "intelligence" of any kind there: all of it, without any exception, in 2023, is "SP", Statistical Processing. This is not unimpressive (Google Translate, for example), and it can be expected that in a few years there will be some autonomous vehicles, able to handle certain situations. It will be at least 30 years before you can legally use one to go to a country pub and come back having drunk alcohol. "Intelligence" will never result from SP. Artificial Intelligence will result only from the IT field of Artificial Life and "emergent systems". It's decades and decades away.
I don't really see this specific concern about newsreaders who for some unaccountable reason want to be "anchors" in a country where they are not a native speaker. Being an anchor is showbiz. Many are called, few are chosen. Much more importantly, to my way of thinking, there are hundreds and hundreds of minority languages which are under huge threat from English, on all fronts. English is the language of the Internet, the language of the EU, the "common" lingua franca of the most populous country on Earth (India) and of the most powerful one. This pressure to learn and speak in English can only get worse, and then much worse.
Barbara, your article has made me think, as all your articles do, even though I don't always agree with everything you write. So thank you, for yet another thought-provoking and eloquent piece of writing.
Before I read your article, my thoughts about AI detectors were as follows. They obviously have problems, but they're even newer technology than the AI bots themselves, and can therefore be expected to have some catching up to do. I think back to the mid 1990s, when the World Wide Web was jokingly referred to as the "World-wide wait", but people could see its potential, even if they were waiting 10 seconds or more for every web page they visited to appear. AI detectors are in their infancy; and at some point in the future, they'll be more accurate than they are now, and less biased against the writings of non-native speakers.
But having read and thought about your article, I now believe this view is wrong. There will be an ongoing "arms race" between the creators of the bots and the creators of the detectors. The bot creators want to make bots whose writing is harder and harder to distinguish from human writing. The detector creators want to stymie the efforts of the bot creators. But the detector creators will always be behind the 8-ball, and will never achieve the accuracy we'd like them to. And as you point out, they will likely focus their efforts on "native writing". That means they'll always produce a product which can identify the writing of a native speaker more accurately than it can identify the writing of a non-native speaker. I don't know the answer to this problem, but I'm glad you've highlighted it.
As for the second half of your article, I find less to agree with.
Let's replace the wonderful Ms Chan with a journalist from Palestine who speaks only Arabic and Hebrew. Perhaps that journalist wishes to tell the world about the oppression that her people are suffering at the hands of the occupying power. Until recently, only the Arabic and Hebrew speaking world had access to her content. But the same technology that produced the artificially Hispanophone Melissa allows our hypothetical (but certainly oh so real) Palestinian journalist a much wider audience. If she can translate not only her message, but also her delivery, mannerisms and so on to English, Mandarin, Spanish, Hindi, and a bunch of other popular languages and cultures; then maybe the world will sit up and take notice.
Or consider maybe a man from some African country, who wants to tell the world what's going on around him, but speaks only Swahili and a few words of broken French. I want to listen to that man. I want him to appear on my screen, animatedly gesticulating, explaining in crystal clear English about the political and economic situation in his country, and how it impacts the daily life of the common people. Isn't this what international journalism is supposed to be all about?
Suddenly, news scripts don't have to be written by fluent English speakers to be accessible to the Anglophone world - to sad, ignorant monoglots like me and so many others. Once the technology is sufficiently advanced, the narrative will literally be the thoughts of the original journalist, unencumbered by Anglophone prejudice or frames of reference. Of course, news from English speaking countries will still have English or American cultural nuances, no matter what language it's translated into; but so much "international news" happens elsewhere, and we shouldn't turn our backs on it.
I started watching Al-Jazeera English many years ago, for one reason only. Al-Jazeera English simply doesn't have the pro-Western bias of the news networks of New Zealand, Australia, the USA and the UK. It was so refreshing to be reminded that the world doesn't revolve around the USA, and important events actually happen in parts of the world where not everyone is Anglophone and pro-American. (Getting to see the inimitably lovely Ms Serra in action was a secondary reason to watch). Surely, the new technology can only make it easier for the rest of the world to remind me, and others like me, of its importance and even its very existence.