Automatic Text Mining of Medical Reports using AI

Project Reference :


Institution :

Nanyang Technological University (NTU)

Principal Investigator :

Professor Sun Aixin

Technology Readiness :

4 (Technology validated in lab)

Technology Categories :

AI - Deep Learning

Background/Problem Statement

The data mining tools market is expected to grow from USD 664.6 million in 2020 to USD 1,577.0 million by 2028, a CAGR of 11.4% over the forecast period. Healthcare organisations are turning increasingly to data mining to make use of the growing number of data sources produced by the industry’s move toward digitalisation.

One hurdle to digitalisation and to the adoption of data mining in healthcare is that medical data is spread across different sources governed by different states, hospitals, and administrative departments. These medical databases may not be technically compatible, which makes data sharing difficult. Moreover, the current process relies on staff manually analysing and extracting meaningful data from non-electronic and electronic sources and feeding it into the medical registries. This process is labour-intensive, time-consuming, and prone to human error.


An AI-based solution has been developed to automate the process of data analysis and extraction of identified data variables from unstructured data in electronic medical records. The extracted data variables are then fed automatically to the medical registry.

The solution includes: (i) a generic phrase matching engine that supports fuzzy (non-exact) text search in medical records and can be used for efficient dataset exploration, and (ii) a collection of extraction models, including rule-based, machine learning, and deep learning algorithms, that help automate data collection on medical conditions for use in research and audits.
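As a rough illustration of what fuzzy phrase matching means in this setting, the sketch below slides a word window over a note and scores each window against a query phrase using Python's standard `difflib.SequenceMatcher`. This is not the project's engine: the function name `fuzzy_find`, the 0.85 threshold, and the sample note are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def fuzzy_find(phrase, text, threshold=0.85):
    """Return (word_index, matched_text, score) for every window of `text`
    whose character-level similarity to `phrase` reaches `threshold`.

    Hypothetical helper for illustration; the threshold is an assumed value.
    """
    tokens = re.findall(r"\w+", text.lower())
    target_tokens = re.findall(r"\w+", phrase.lower())
    target = " ".join(target_tokens)
    width = len(target_tokens)
    hits = []
    # Compare the query with each run of `width` consecutive words.
    for i in range(len(tokens) - width + 1):
        window = " ".join(tokens[i:i + width])
        score = SequenceMatcher(None, target, window).ratio()
        if score >= threshold:
            hits.append((i, window, round(score, 3)))
    return hits


# Made-up note containing a misspelling of "infarction".
note = "Patient has a history of myocardial infraction and diabetes mellitus."
matches = fuzzy_find("myocardial infarction", note)
for index, text, score in matches:
    print(index, text, score)
```

An exact string search would miss the misspelled phrase in this note, whereas the similarity threshold lets it through. A production engine would likely need indexing to stay efficient over large record sets, which this sketch does not attempt.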

The technology uses NLP algorithms to interpret and extract the identified data variables, achieving >95% accuracy on test and validation sets for 8 identified data variables.
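To make the idea of a rule-based extraction model and its per-variable accuracy concrete, here is a minimal sketch using Python's `re` module. The data variable (left ventricular ejection fraction), the pattern, the helper names, and the three sample reports are all hypothetical; they are not the project's variables, rules, or data, and the score computed here says nothing about the reported results.

```python
import re

# Assumed rule for one hypothetical data variable: ejection fraction in percent.
EF_PATTERN = re.compile(
    r"\b(?:LVEF|ejection fraction|EF)\b[^0-9]{0,20}(\d{1,3})\s*%",
    re.IGNORECASE,
)


def extract_ef(report):
    """Return the first ejection fraction value found, or None if absent."""
    match = EF_PATTERN.search(report)
    return int(match.group(1)) if match else None


def accuracy(predictions, gold):
    """Fraction of records where the extracted value equals the labelled one."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Made-up reports with manually assigned labels, for illustration only.
reports = [
    "Echo: LVEF 45%. No pericardial effusion.",
    "Ejection fraction is estimated at 60 %.",
    "Chest X-ray unremarkable.",
]
gold = [45, 60, None]

preds = [extract_ef(r) for r in reports]
print(preds, accuracy(preds, gold))
```

The same `accuracy` helper could be applied to the output of a machine learning or deep learning extractor, so that all model types are compared against the same labelled set.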


  • The solution efficiently streamlines the healthcare data mining process and reduces the long-term cost and manpower effort involved in maintaining diverse medical databases
  • Medical data from various sources can be integrated, providing a seamless interface for accurate end-to-end data extraction and data feed from one data source to another
  • The solution has the potential to reduce the cost of treatment and medical errors, and enables the use of data for predictive analytics

Potential Application(s)

Data mining of medical databases for research, analysis, and business intelligence

We welcome interest from the industry for collaboration, co-development, or customisation of the technology into a new product or service. If you have any enquiries or are keen to collaborate, please contact us.