Alibaba Group’s machine-learning technology is better at reading comprehension than humans, according to a well-known test built for the industry by Microsoft.
On June 20, the Alibaba model topped human scores when tested on the Microsoft Machine Reading Comprehension dataset, one of the artificial-intelligence world’s most challenging tests for reading comprehension.
Developed by scientists at DAMO Academy, Alibaba’s global research program, the model scored 0.54 in the MS Marco question-answering task, which evaluates a machine’s ability to use natural language – the way humans communicate – to answer real questions posed by humans. That topped the human score of 0.539, a benchmark provided by Microsoft.
To earn a winning score, machine-learning models must deliver answers to real queries posed to Microsoft’s search engine, Bing – such as “biggest cities in Illinois by population” and “how many carbohydrates in asparagus” – that best match the human answers in the dataset. Per its website, the MS Marco dataset contains more than three million web documents, 1,010,916 anonymized user queries and 182,669 real answers written by humans.
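To illustrate what "best match the human answers" can mean in practice, here is a minimal Python sketch of a word-overlap score between a machine answer and a human reference. The article does not name the metric used, so the choice of a longest-common-subsequence F-score (in the style of ROUGE-L) is an assumption, and the function names `lcs_length` and `overlap_score` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def overlap_score(candidate, reference):
    """F-score of LCS-based precision and recall (assumed metric, not confirmed by the article)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the machine answer found in the reference
    recall = lcs / len(ref)      # share of the reference covered by the machine answer
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means the machine answer reproduces the human answer word for word, and a score near 0 means the two share almost nothing. Averaging such a score over many queries yields a single number that can be compared with a human baseline.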
This particular skill is becoming ever more important for machines as intelligent technologies such as chatbots and smart speakers grow in popularity. Realistic questions and answers help train systems to “better deal with the nuances and complexities regular people actually ask, including questions that have no clear answer or multiple possible answers,” MS Marco’s developers said in a blog post. The reading-comprehension challenge has attracted universities, research institutes and the AI arms of companies across the world, such as Alibaba, Baidu, Samsung and Facebook.
Si Luo, who heads the Language Technology Lab at DAMO Academy, said using the framework of the lab’s self-developed “deep cascade learning model” was a key factor in beating the challenge.
The model streamlines the traditional question-answering process through multitasking: several components work in tandem to quickly eliminate irrelevant documents and paragraphs, then filter the remaining content to discard candidate answers that are not precise enough. Once that process is complete, a more sophisticated deep machine-learning model examines the remaining texts in more detail to find the best answer.
“The model progressively evolves from coarse to fine, moving from document- and paragraph-level ranking of candidate texts towards more precise machine reading-comprehension answer extraction,” said Si. “This expedites the time required to look up the relevant documents and phrases, while ensuring the accuracy of the answers.”
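The coarse-to-fine idea Si describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Alibaba's model: the cheap word-overlap scorers stand in for the ranking components, a Jaccard-similarity function stands in for the deep reading-comprehension model, and every name here (`cheap_score`, `fine_score`, `cascade_answer`, `keep_docs`, `keep_paragraphs`) is hypothetical.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cheap_score(query, text):
    """Coarse filter: fraction of query words that appear in the text."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


def fine_score(query, sentence):
    """Stand-in for the expensive reader model (assumption): Jaccard similarity."""
    q, s = tokens(query), tokens(sentence)
    return len(q & s) / len(q | s) if q | s else 0.0


def cascade_answer(query, documents, keep_docs=2, keep_paragraphs=3):
    """documents maps a document id to its list of paragraphs."""
    # Stage 1: document-level ranking discards most of the collection cheaply.
    ranked = sorted(
        documents,
        key=lambda d: cheap_score(query, " ".join(documents[d])),
        reverse=True,
    )
    survivors = ranked[:keep_docs]

    # Stage 2: paragraph-level ranking inside the surviving documents.
    paragraphs = [p for d in survivors for p in documents[d]]
    paragraphs.sort(key=lambda p: cheap_score(query, p), reverse=True)
    shortlist = paragraphs[:keep_paragraphs]

    # Stage 3: only the shortlist reaches the costly answer-extraction step.
    sentences = [s.strip() for p in shortlist for s in p.split(".") if s.strip()]
    if not sentences:
        return ""
    return max(sentences, key=lambda s: fine_score(query, s))


docs = {
    "illinois": [
        "Chicago is the biggest city in Illinois by population. Aurora is second.",
        "Illinois became a state in 1818.",
    ],
    "asparagus": [
        "Asparagus contains carbohydrates and fiber.",
        "Asparagus is a spring vegetable.",
    ],
    "weather": ["It rained all week."],
}
print(cascade_answer("biggest city in Illinois by population", docs))
```

The design point is that the expensive step runs only on what survives the cheap ones, which is how a cascade can cut lookup time without giving up precision in the final answer.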
This was not the first time that Alibaba’s software had outperformed humans in reading comprehension. Last year, the company scored higher than the human benchmark in the Stanford Question Answering Dataset (SQuAD) – also one of the most popular machine reading-comprehension challenges worldwide.
Alibaba said it had already applied the natural-language processing (NLP) technology it used to take on MS Marco and SQuAD to everyday operations. The technology powers Alime, the company’s customer-service chatbot, which served about 50 million daily active users on its e-commerce sites Taobao and Tmall during the 11.11 Global Shopping Festival, Alibaba’s largest sales day of the year. The Alime bot handled approximately 98% of requests made on the platforms that day. At its peak, it served over 83,000 users in a minute, Alibaba reported.
“NLP has been a core technology that underpins Alibaba’s business, which serves hundreds of millions of customers on our e-commerce platforms, including Taobao, Tmall and Lazada,” Si said.
“Moving forward, we plan to put NLP on our cloud-computing platform Alibaba Cloud, so more clients – especially businesses in retail, tourism and public services that involve QA tasks – could benefit from the technology,” Si said.
He added that the lab is also developing a new model, which combines NLP with speech AI and machine translation, to help users more freely communicate across different languages in real time.