SingaKids Pic2Speak: Multilingual AI Tutor – Uplifting Singapore’s Bilingual Edge

Lead PI:

Dr. Nancy F. Chen
Institute for Infocomm Research (I2R), A*STAR


Institute for Infocomm Research (I2R), A*STAR

  • Dr. Dong Ming Hui
  • Dr. Tan Hui Li

National Institute of Education (NIE)

  • Dr. Wong Lung Hsiang
  • Dr. Sun He
  • Dr. Khor Ean Teng, Karen
  • Dr. Goh Hock Huan
  • Seetha Lakshmi
  • Suryani Bte Atan

Nanyang Technological University (NTU)

  • Assoc. Prof. Zhang Hanwang
  • Assoc. Prof. Chng Eng Siong

National University of Singapore (NUS)

  • Assoc. Prof. Kan Min Yen

Tech Leads

  • Dr. Zhang Huayun, Institute for Infocomm Research (I2R), A*STAR
  • Liu Zhengyuan, Institute for Infocomm Research (I2R), A*STAR

Host Institution: Institute for Infocomm Research (I2R), A*STAR

Over the past two decades, the use of mother tongue languages such as Mandarin, Malay and Tamil in Singapore has been declining. This decline erodes Singapore’s multilingual edge. It weakens our competitive advantage in business and political opportunities, and it also affects social bonds and our connection to our cultural heritage.

To complement current efforts by educators, the team draws on AI technology to develop a multimodal, multilingual AI tutor that guides children to verbally describe a given picture in their mother tongue language (Mandarin, Malay or Tamil).

The learning companion will be powered by a range of AI techniques, such as multimodal comprehension, engagement evaluation, intervention design, child speech recognition, neural machine translation, visual question answering, pedagogically-anchored dialogue management and controllable neural language generation.

Several of these techniques, developed by A*STAR’s I2R, address specific challenges:

  1. Because younger children have limited attention spans, sustaining their interest is key to increasing their motivation to learn. The team addresses this challenge through its research on engagement modelling, which detects engagement from the speech signal and so raises fewer privacy concerns than other modalities such as video. Disentangling the neural features into dimensions of pronunciation, rhythm, intonation and semantics will make these AI models more interpretable and explainable, enabling more detailed and personalised feedback.
  2. While deep learning has fuelled major developments such as large language models, these models are known to hallucinate, generating factually incorrect text. This is a problem in high-stakes applications such as education, and the team tackles it through its research in controllable neural generation.

The team’s education scientists from NIE will conduct research to position the EdTech solutions developed and to validate that they are useful to students and teachers. Their in-depth understanding of school workflows will help ensure seamless integration during deployment.

This project is an interdisciplinary endeavour to integrate pedagogy, machine learning and linguistics in AI technology.

The AI tutor benefits both students and teachers.

While there are four official languages in Singapore, the majority of students speak English at home, since many parents are not fluent in mother tongue languages. Our applied and translational research can help enhance multilingual exposure at home to complement classroom learning.

Teachers can also gain data points from outside the classroom, making it easier to track each student’s individual progress.

These AI developments can potentially enrich the local ecosystem with a range of technology options, which can be transformed into commercialised products and services in the future.