Conversational AI takes more than pat answers to complete tasks.

AI Language Benchmarks Studied

When we talk to a digital assistant, like Alexa on an Amazon Echo, we don’t always realize what it takes for these machines to become adept at interpreting the way we speak or the language we use.

Given the vast differences in speech, speech patterns and even basic language, it’s a tough task.

Researchers measure how well AI handles these human-to-assistant conversations with standardized datasets and tasks called benchmarks. The authors of a recent paper criticized some benchmarks for lacking “multi-turn” dialogue. Multiple studies have found that people using AI to accomplish concrete tasks respond best to back-and-forth dialogue, in which they can ask multiple questions or engage in conversation, rather than issuing a series of single, separate commands, as described in a story on VentureBeat.

“The promise of conversational AI is that, unlike virtually any other form of technology, all you have to do is talk. Natural language is the most natural and democratic form of communication. After all, humans are born capable of learning how to speak, but some never learn to read or use a graphical user interface. That’s why AI researchers from Element AI, Stanford University, and CIFAR recommend academic researchers take steps to create more useful forms of AI that speak with people to get things done, including the elimination of existing benchmarks.”

The story describes the paper, titled “Towards Ecologically Valid Research on Language User Interfaces,” which was published recently on the preprint repository arXiv. The paper promotes the creation of practical language models that can help people in their professional or personal lives, and it identifies common shortcomings in popular existing benchmarks like SQuAD, which does not focus on working with target users, and CLEVR, which uses synthetic language.

Examples of challenges for speech interfaces that academic researchers could pursue instead, the authors say, include AI assistants that can talk with citizens about government data, or benchmarks for popular games like Minecraft. Facebook AI Research released data and code last year to encourage the development of a Minecraft assistant. Talk about your vertical markets.

In other recent language model news, Microsoft researchers said this week that they created advanced NLP for health care professionals, and last month researchers developed a method for identifying bugs in cloud AI offerings from major companies like Amazon, Apple, and Google. It’s all in how we learn to use communication in our digital world.