Alibaba AI Model Tops Humans in Reading Comprehension

Score one for machines in the battle of man versus machine: an Alibaba deep-learning model this month topped humans for the first time in one of the world’s most challenging reading comprehension tests.

Alibaba’s Institute of Data Science and Technologies (iDST) said Monday that its deep neural network model scored 82.44 on the Stanford Question Answering Dataset (SQuAD) on Jan. 11, beating the human score of 82.304 for Exact Match, i.e., a measure of how often the exact answer to a question is provided. SQuAD is a large-scale reading comprehension dataset composed of more than 100,000 question-answer pairs based on more than 500 Wikipedia articles.
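For readers curious how an Exact Match score of this kind is typically computed, here is a minimal sketch. It assumes the normalization convention commonly used for SQuAD evaluation (lowercasing, then stripping punctuation, English articles and extra whitespace); the function names and the sample questions are hypothetical and are not taken from Alibaba's model or from the official leaderboard code.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the prediction equals any reference answer after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def exact_match_score(predictions: dict[str, str],
                      references: dict[str, list[str]]) -> float:
    """Percentage of questions whose predicted answer is an exact match."""
    hits = sum(
        exact_match(predictions.get(qid, ""), refs)
        for qid, refs in references.items()
    )
    return 100.0 * hits / len(references)


# Hypothetical toy data: two questions, each with one or more reference answers.
sample_references = {
    "q1": ["gravity"],
    "q2": ["condensation", "water vapor condensing"],
}
sample_predictions = {
    "q1": "Gravity.",          # matches after normalization
    "q2": "the water cycle",   # matches no reference
}

print(exact_match_score(sample_predictions, sample_references))  # prints 50.0
```

Under this scheme a score such as 82.44 means that roughly 82 of every 100 test questions received an answer identical, after normalization, to one of the reference answers.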

“It is our great honor to witness the milestone where machines surpass humans in reading comprehension,” said Luo Si, iDST’s chief scientist for Natural Language Processing. “We are thrilled to see NLP research has achieved significant progress over the year. We look forward to sharing our model-building methodology with the wider community and exporting the technology to our clients in the near future.”

Teams competing in the challenge need to build machine-learning models that can provide answers to the questions in the dataset, such as “what causes rain?” The Alibaba model’s accuracy was tied to its ability to read from paragraphs to sentences to words, locating the precise phrases that contain potential answers. That model, which leverages the Hierarchical Attention Network, is viewed as having strong commercial value. Alibaba has used the underlying technology in its 11.11 Global Shopping Festival for several years, with machines answering large numbers of inbound customer inquiries.

Other potential customer-service uses include tutorials for museum visitors and online responses to inquiries from some medical patients.

SQuAD is perceived as the world’s top machine reading-comprehension test and attracts companies, universities and institutes, ranging from Google, Facebook, IBM and Microsoft to Carnegie Mellon University, Stanford University and the Allen Institute for Artificial Intelligence.

While its SQuAD performance is a milestone, it’s just one of the proof points the iDST’s Natural Language Processing team has delivered recently. Other successes include the best scores and prizes in the ACM CIKM Cup, which focuses on personalized e-commerce search; in Chinese Grammar Error Diagnosis; and in English named-entity classification tasks at the Text Analysis Conference, a series of workshops organized by the U.S. National Institute of Standards and Technology.

The iDST is Alibaba’s primary research arm focusing on artificial intelligence. It concentrates heavily on natural language processing and on solving problems that lead to real-world applications.