ChatGPT Speaks the Language of Making Profits, but Not in Challenging Dialects
Approximately 6,000-7,000 languages are actively spoken on Earth. While AI has incredible abilities, it can’t translate all of them. As the website restofworld.org reports, AI still isn’t perfect.
Even ChatGPT has problems with languages, and not all of them are obscure. For instance, Thai has tripped up the algorithm, and it has difficulty producing the same quality of translation in Bengali, Swahili and Urdu, though millions of people speak them. It comes down to how the AI was trained: primarily in English and on internet content. As writer Andrew Deck reports:
“When Rest of World tested ChatGPT’s ability to respond in underrepresented languages, we found problems reaching far beyond translation errors, including fabricated words, illogical answers and, in some cases, complete nonsense.”
Yes, much like having AI write a term paper or draft a lawyer’s brief, the complexity and lack of supporting data create problems in translation.
“If you ask ChatGPT in Tigrinya or Amharic the simplest and most frequently asked questions, it gives you gibberish, a mix of Tigrinya and Amharic, or even made-up words,” said Asmelash Teka Hadgu, co-founder and chief technology officer of Lesan, a startup developing machine translation products for Ethiopian languages. “Chatbots like ChatGPT are utterly broken or useless for these languages.”
A recent study by researchers at the University of Oregon made a similar finding, testing ChatGPT’s ability to complete several writing tasks in 37 different languages. In low-resource languages, the chatbot routinely underperformed in the tasks. The amount of training data for each language was not the only factor at play: The study found that the chatbot had particular difficulty with low-resource languages that were structurally different from English.
Currently, OpenAI does not include any language guidelines in its usage policy for ChatGPT.
Rest of World’s testing suggests that OpenAI is prioritizing the languages that bring in paying customers in Asian and African countries. If a language is remote and doesn’t have enough people speaking it, the algorithm may not speak it either.
All of Rest of World’s tests were conducted using GPT-3.5, the free and most-used version of ChatGPT released in November 2022. However, some early evidence shows that GPT-4, released this past March, is slightly more proficient in South Asian languages. The new model is being marketed in India to paying customers who speak Bengali, Urdu, Punjabi, Marathi, and Telugu. In Bengali, for example, ChatGPT still struggles with some grammar issues, but otherwise responds fluently to many simple prompts.
A blog post about CEO Sam Altman’s recent, much-publicized tour of 22 countries stated:
“We are also working toward better performance for languages other than English, considering not only lab benchmarks, but also how accurately and efficiently our models perform in the real-world deployment scenarios that matter most to our developers.”
OpenAI declined to share any specifics on these efforts. The examples of language failures provided in the article are good for a laugh.
Read more at restofworld.org