An Improved Machine Learning Algorithm and Model to Generate Better Quality Search Results

Project Reference :


Institution :

Singapore Management University (SMU)

Principal Investigator :

Professor Paulin Tay Straughan

Technology Readiness :

4 (Technology validated in lab)

Technology Categories :

AI - Machine Learning - Information Retrieval

Background/Problem Statement

Current chatbots rely on a static FAQ database to answer questions. If a user asks a question that is not in the FAQ database, the chatbot cannot provide an answer.


An improved learning-to-rank machine learning (ML) algorithm is used to train an ML model that helps generate new, better-quality answers to a question. The improved algorithm raised the normalized discounted cumulative gain (NDCG) from a baseline of 0.2134 to 0.3521. The trained ML model achieved an accuracy of 65%.
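NDCG measures how well a ranking places the most relevant answers near the top, normalised so that a perfect ordering scores 1.0. The listing does not state which gain function or cutoff k was used, so the sketch below is only an illustration: it assumes the common exponential-gain form, (2^rel - 1) / log2(rank + 1), and uses hypothetical graded relevance labels. For reference, the reported change from 0.2134 to 0.3521 is a relative improvement of about 65%.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (ranks start at 1).

    Assumption: exponential gain (2^rel - 1) with a log2(rank + 1) discount.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the system's ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels (0 = irrelevant, 2 = highly relevant) for five
# candidate answers, listed in the order a ranker returned them.
labels = [0, 2, 1, 0, 2]
print(round(ndcg_at_k(labels, k=3), 4))

# Relative improvement implied by the reported baseline and improved NDCG.
relative_gain = (0.3521 - 0.2134) / 0.2134  # about 0.65
print(f"{relative_gain:.1%}")
```

Averaging `ndcg_at_k` over a set of test questions gives a single score comparable across ranking models, which is how before/after figures like those above are usually reported.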

Specifically, two tools were created. A question classification tool automatically classifies questions into relevant categories. An answer extraction tool generates candidate answers to a given question using the Google search engine (through Google’s search API): it scrapes the text from each returned URL, splits the text into sections, ranks the sections by the frequency of keywords from the processed user query, and returns the top three URLs with the most relevant answers. Besides returning the URLs, the solution also extracts parts of the returned webpages as answers to the question.
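The answer extraction steps described above (process the query, split each scraped page into sections, rank sections by keyword frequency, keep the top three URLs) can be sketched roughly as follows. This is a simplified stand-in, not the project's implementation. Several parts are assumptions: the stopword list, treating each block-level HTML element as one section, scoring by raw keyword counts, and all function names (`html_to_sections`, `rank_urls`, and so on), which are hypothetical. The web search itself and page downloading are not shown. Candidate URLs would come from a search API such as Google's Custom Search JSON API, and `pages` is assumed to already hold each URL's HTML. The question classification tool is also omitted.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stopword list used to "process" the user query.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "do", "does",
             "i", "my", "to", "of", "in", "for", "can"}
# Assumption: each block-level HTML element is treated as one section.
BLOCK_TAGS = {"p", "div", "li", "br", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class SectionExtractor(HTMLParser):
    """Splits a page's visible text into sections at block-level tags."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: list[str] = []
        self._buf: list[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def _flush(self) -> None:
        text = " ".join(self._buf).strip()
        if text:
            self.sections.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip and data.strip():
            self._buf.append(data.strip())

    def close(self) -> None:
        super().close()
        self._flush()  # keep any trailing text after the last block tag


def html_to_sections(html: str) -> list[str]:
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    return parser.sections


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def query_keywords(query: str) -> set[str]:
    """Lower-case the query and drop stopwords."""
    return {t for t in tokenize(query) if t not in STOPWORDS}


def keyword_score(section: str, keywords: set[str]) -> int:
    """Total number of occurrences of query keywords in the section."""
    counts = Counter(tokenize(section))
    return sum(counts[k] for k in keywords)


def rank_urls(query: str, pages: dict[str, str], top_n: int = 3):
    """Return up to top_n (url, best_section) pairs, highest score first."""
    keywords = query_keywords(query)
    scored = []
    for url, html in pages.items():
        sections = html_to_sections(html)
        if not sections:
            continue
        best = max(sections, key=lambda s: keyword_score(s, keywords))
        scored.append((keyword_score(best, keywords), url, best))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable for ties
    return [(url, sec) for score, url, sec in scored if score > 0][:top_n]


# Usage with hypothetical, already-downloaded pages.
pages = {
    "https://example.com/a": "<p>Renew your passport online.</p><p>Passport renewal takes two weeks.</p>",
    "https://example.com/b": "<p>Weather today is sunny.</p>",
    "https://example.com/c": "<div>How to renew a passport: bring your old passport.</div><script>var passport = 1;</script>",
    "https://example.com/d": "<p>Passport photos guide.</p>",
}
for url, answer in rank_urls("How do I renew my passport?", pages):
    print(url, "->", answer)
```

Returning the best-scoring section alongside each URL mirrors the listing's point that the solution returns extracted text as well as links. Raw counts favour long sections. A production ranker, such as the learning-to-rank model described earlier, would replace `keyword_score`.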


The solution improves existing chatbots by enabling them to answer questions beyond those in their FAQ database.

Potential Application(s)

  • The solution can be incorporated into existing chatbot systems to provide possible answers to questions not found in the FAQ database. 
  • The solution could also be used in other applications, such as customer service and healthcare, that need relevant answers to questions: it locates the sentences or paragraphs of a webpage that are relevant to a question and returns these extracted text segments.
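The two applications above can be illustrated together: answer from the FAQ when the question is there, and otherwise fall back to extracting relevant sentences from a webpage. The sketch below is hypothetical throughout. The FAQ lookup is an exact match after simple normalisation (real chatbots often use fuzzier matching), sentence relevance is plain keyword overlap standing in for the ML ranking described earlier, and `page_text` is assumed to be text already scraped from a search result. The names `answer`, `best_sentences` and `web_fallback` are illustrative only.

```python
import re
from typing import Callable

# Hypothetical stopword list for picking keywords out of the question.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "when", "how",
             "do", "does", "i", "my", "your", "to", "of"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def normalise(question: str) -> str:
    """Canonical form used as the FAQ lookup key."""
    return " ".join(tokenize(question))


def best_sentences(question: str, text: str, n: int = 2) -> list[str]:
    """Return up to n sentences sharing the most keywords with the question,
    kept in their original order."""
    keywords = {t for t in tokenize(question) if t not in STOPWORDS}
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scored = [(sum(t in keywords for t in tokenize(s)), i, s)
              for i, s in enumerate(sentences)]
    top = sorted((x for x in scored if x[0] > 0), key=lambda x: (-x[0], x[1]))[:n]
    return [s for _, _, s in sorted(top, key=lambda x: x[1])]


def answer(question: str, faq: dict[str, str],
           fallback: Callable[[str], list[str]]) -> list[str]:
    """Answer from the FAQ when possible, otherwise use the fallback extractor."""
    hit = faq.get(normalise(question))
    return [hit] if hit is not None else fallback(question)


faq = {normalise("What are your opening hours?"): "We are open 9am to 6pm."}
page_text = ("Our clinic offers flu vaccines. Vaccines are given on weekdays. "
             "Parking is free.")


def web_fallback(question: str) -> list[str]:
    # Stand-in for the search-and-extract pipeline; page_text is assumed pre-scraped.
    return best_sentences(question, page_text)


print(answer("what are your OPENING hours", faq, web_fallback))
print(answer("When are flu vaccines given?", faq, web_fallback))
```

Keeping the fallback as a plain callable means an existing chatbot's FAQ path is untouched; the web-based extractor only runs when the FAQ has no match.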

We welcome interest from industry in collaboration, co-development or customisation of the technology into a new product or service. If you have any enquiries or are keen to collaborate, please contact us.