Auto-Tagging of Patent Applications to Facilitate Discovery of Relevant Prior Art

Project Reference :


Institution :

Nanyang Technological University (NTU)

Principal Investigator :

Professor Chen Lihui

Technology Readiness :

4 (Technology validated in lab)

Technology Categories :

Infocomm - Natural language processing

Background/Problem Statement

Natural language processing (NLP) is one of the fastest-growing branches of artificial intelligence (AI). It applies linguistics and computer science to make human language understandable to machines. The NLP market is expected to grow from USD 11.6 billion in 2020 to USD 35.1 billion by 2026, at a compound annual growth rate (CAGR) of 20.3% over the forecast period.

Time is a critical factor in prior art searches. Traditionally, patent examiners search manually by keyword, class, or citation to compile an exhaustive list of prior art relevant to the invention proposed in a given patent application. As the volume of patent data increases, finding relevant information and analysing prior art manually becomes time-consuming, especially when quality and accuracy must be maintained.

From an AI perspective, the key linguistic and semantic challenges are legal wording, long sentences, acronyms, and the technical nature of patent applications. Existing AI algorithms on their own cannot support every aspect of the prior art search process.


Two AI-based techniques have been developed that combine deep learning, natural language processing and semantic technologies to analyse patent applications, automate classification, retrieve the closest prior art, visualise and rank the retrieved prior art, and tag the prior art to the patent application in question.

  1. Label-based Attention for Hierarchical Multi-label Text Classification Neural Network (LA-HCN) is a hierarchical multi-label text classification (HMTC) model whose label-based attention is designed to hierarchically extract important information from the text based on the labels at different hierarchy levels. LA-HCN can learn disjoint and non-important features for each hierarchical level while sharing hierarchical information across levels and preserving the hierarchical label-based information. LA-HCN outperforms other state-of-the-art HMTC algorithms across four benchmark datasets. Visualisation of the learned label-based attention shows that LA-HCN compares favourably with other learned attentions such as HARNN and HAN, and that it can extract meaningful information corresponding to the different labels.
  2. Contrastive Learning of Semantic Sentence Embedding (CLOSSE) is a new training paradigm designed for fine-tuning a pretrained contextualised language model (PCLM) to learn high-quality sentence-level representations as well as capture the semantic properties of the sentences.

    CLOSSE unifies three learning methodologies (label-based supervised learning, self-supervised learning and unsupervised learning) so that sentence-level semantics, linguistic properties and word-level information are considered concurrently.

    CLOSSE improves sentence representation quality and has outperformed several state-of-the-art models on supervised downstream tasks and linguistic probing tasks.
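
As a rough illustration of the contrastive idea behind sentence-embedding training, the sketch below computes a generic InfoNCE-style loss for one anchor embedding. It is not the CLOSSE objective, whose details are not given here. The function names, the temperature value and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for a single anchor embedding.

    The loss is small when the anchor is more similar to the positive
    (e.g. a sentence with the same label) than to every negative.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


# Hypothetical toy embeddings (assumed values, not real model outputs).
anchor = [1.0, 0.0]
similar = [0.9, 0.1]
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = info_nce_loss(anchor, similar, unrelated)
loss_mismatched = info_nce_loss(anchor, unrelated[0], [similar, unrelated[1]])
print(loss_matched < loss_mismatched)
```

Minimising a loss of this kind pulls embeddings of semantically related sentences together and pushes unrelated ones apart, which is the property a similarity-based prior art search relies on.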


  • Organisations can incorporate the technology into existing patent search processes and tools to automate prior art search and simplify the herculean task of patent search with greater precision, efficiency and accuracy. An automated patent search tool gives research professionals an efficient and easy way to access relevant prior art, providing them with a technical edge. With this technology, IP professionals can shift their focus to more strategic tasks and make quick use of the data around them in a more structured and intelligent manner.
  • Automated prior art search can be a helpful tool for quickly validating concepts at the ideation stage and for including prior art in invention disclosures to reduce the turnaround time for patent prosecution.
  • Inventions that are unique and revolutionary often pose a problem for subject matter experts (SMEs) searching for relevant prior art. In such cases, the invention may have no direct prior art, but AI-based patent search tools can derive similarity from the latest technologies in order to identify potential existing prior art.
  • Besides automating patent classification and prior art search, this technology can benefit organisations in areas such as pre-processing unstructured data for analysis, automated document matching and classification, discovery and insight, and sentiment analysis.

Potential Application(s)

This technology can potentially automate patent search and increase the quality and precision of search results. Importantly, it reduces manual work as well as the time and cost of patent searches. Organisations and patent examiners can do away with the tedious process of sifting through large volumes of retrieved prior art. The invention therefore makes it possible to provide a generic automated patent or prior art search tool for different use cases. For instance, such a tool can form part of the prior art search process in patent filing and patent examination. It can also create value for organisations that focus on high-value patents, strategic patent prosecution, and patent portfolio optimisation and management.

In addition, this technology can be applied in the following areas:

  • Automation in searching large databases
  • Text classification
  • Information extraction
  • Question answering retrieval
  • Query-to-document match
  • Plagiarism check
  • Summarization
  • Any NLP task related to content-based similarity comparison

Hence, it can be useful in a broad range of industries, such as healthcare, insurance, finance, legal and retail, to name a few.
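
Several of the tasks listed above, such as query-to-document match, reduce to ranking documents by content-based similarity to a query. The sketch below shows that ranking step only. It uses a toy bag-of-words vector as a stand-in for a learned sentence embedding, so the `embed` function, the example texts and all other names are assumptions rather than part of the technology described here.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned sentence embedding."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical example texts (assumed, not taken from any patent database).
documents = [
    "a neural network for text classification",
    "a bicycle brake with hydraulic actuator",
    "hierarchical classification of patent text",
]
ranked = rank_documents("patent text classification", documents)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

Replacing `embed` with a trained sentence encoder would leave the ranking logic unchanged, which is why the same approach carries over to tasks such as plagiarism checking or question answering retrieval.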

We welcome interest from industry in collaboration, co-development or customisation of the technology into a new product or service. If you have any enquiries or are keen to collaborate, please contact us.