Alibaba’s DAMO Academy Unveils LLMs Designed For Southeast Asia

  • The LLMs are optimized to process Southeast Asian languages and can reflect cultural nuances
  • Addresses rising demand for localized LLMs, as opposed to models trained predominantly on English and other Latin-script datasets


Alibaba Group’s research institute DAMO Academy unveiled on Monday two large language models designed to reflect Southeast Asia’s diverse linguistic and cultural landscape.

DAMO Academy released a model called SeaLLM and a conversationally fine-tuned version called SeaLLM-chat.

The models each come in two sizes, 7 billion and 13 billion parameters, and can process local languages including Vietnamese, Indonesian, Thai, Malay, Khmer, Lao, Tagalog and Burmese. Both can perform tasks that better align with local customs, style and legal stipulations.

The launch comes amid rising demand from Southeast Asian countries for more locally relevant LLMs. Singapore, for example, has created a $52 million AI initiative to develop the Lion City’s research and engineering capabilities in multimodal LLMs.

Alibaba said the launches were designed to create more inclusive and regionally relevant LLMs that reflect the cultural nuances of Southeast Asia. Most LLMs originate from Western countries and are trained on datasets dominated by English and other Latin-script languages.

“This innovation is set to hasten the democratization of AI, empowering communities historically underrepresented in the digital realm,” said Bing Lidong, Director of the Language Technology Lab at Alibaba’s DAMO Academy.

DAMO Academy has open-sourced the models on Hugging Face, making them freely available for research and commercial use.
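Because the weights are public, researchers and companies can load them with Hugging Face’s transformers library. The sketch below is illustrative only: the repository ID "SeaLLMs/SeaLLM-7b-chat" and the exact checkpoint layout are assumptions, so the actual model names on the publisher’s Hugging Face page should be checked first.

    # Minimal sketch of loading a SeaLLM chat checkpoint from Hugging Face.
    # NOTE: the repo ID below is a hypothetical placeholder, not confirmed by the article.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "SeaLLMs/SeaLLM-7b-chat"  # hypothetical repository ID

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # A Vietnamese prompt: "Hello, can you introduce yourself?"
    prompt = "Xin chào, bạn có thể giới thiệu về bản thân không?"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))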

Bridging the Linguistic Divide

Trained on a diverse set of Southeast Asian languages, SeaLLM can interpret and process non-Latin-script text up to nine times longer than models like ChatGPT can, and it can execute more complex tasks. It outperforms most open-source LLMs at understanding a wide spectrum of subjects, from chemistry and physics to economics, in the region’s languages.

The model outperforms existing models in machine translation between English and low-resource languages, meaning those with limited data for training conversational AI systems, such as Lao and Khmer. It also performs on par with state-of-the-art models in most high-resource languages, for which abundant training data exists, such as Vietnamese and Indonesian.

Through pre-training enhancements and culturally tailored fine-tuning, the AI assistant powered by SeaLLM-chat can comprehend, respect and accurately reflect the cultural context of the languages in the region, including social norms, linguistic preferences and legal considerations.

“This initiative has the potential to unlock new opportunities for millions who speak languages beyond English and Chinese. Alibaba’s efforts in championing inclusive technology have now reached a milestone with SeaLLM’s launch,” said Luu Anh Tuan, Assistant Professor in the School of Computer Science and Engineering (SCSE) at Nanyang Technological University, Alibaba’s long-term partner in multi-language AI studies.

The culturally attuned LLMs can also help companies doing business in Southeast Asian markets build their own chatbot assistants.
