AI Singapore regularly conducts AI Clinics for organisations keen to get started on their own AI journey. These are sessions led by our AI Advisory team and the goal is to acquaint decision makers with knowledge of where AI might serve their business or organisational needs. In this article, I provide a flavour of what is typically shared in an AI Clinic. The target industry for this particular clinic is that of sales and marketing in the retail space.
For as long as there has been commerce, sellers have been keeping tabs on the buyers of their goods and services. That means data on customers, goods/services, and transactions. What separates the good sellers from the poor ones is often the mastery of this data. This is where AI promises to deliver tangible value.
Think about the relationship a retailer has over time with a particular customer, i.e. the customer life cycle. There are many angles one can take with regard to this relationship. This is illustrated in the diagram below.
For each of these angles, there already exist mature AI tools to help. We will look at two of them – customer segmentation and recommender systems.
Do you know your customers? Even though every customer is unique, there are also broad categories to which every customer can be said to belong for marketing purposes. Typically, from the mass of transaction records, a customer profile table can be derived. This describes in each line the transaction history of a particular customer. The columns contain data on different features of the customer’s transactions. The more technically inclined reader can imagine this table as representing the customers as points in a multi-dimensional space. The task is to identify groupings of points and then target them with separate marketing campaigns. All this can be easily done with a machine learning model.
Let’s look at a more concrete example to make things clear.
Suppose you are a pet shop owner and, after experimenting with different features, the machine learning model segments your customers into five groups with the following summary values.
For the first segment, you notice that the customers return higher values than the others, so you call them the “big spenders”. For the second segment, the average frequency of purchase is almost twice that of the next highest group, so you label them “frequent buyers”. You continue in this way for the remaining segments. With these segments identified, you can proceed to craft appropriate marketing campaigns to engage your customers.
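To make the "points in a multi-dimensional space" picture concrete, here is a minimal sketch of segmentation using plain k-means. The two features (average spend per visit, visits per month), the six customers and the starting centroids are all hypothetical, and k-means is only one of several clustering methods a real project might use. In practice the features would also be rescaled first so that no single column dominates the distance.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    def nearest(p):
        return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Keep the previous centroid if a cluster ends up empty
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return [nearest(p) for p in points], centroids

# Hypothetical customer profile table: (average spend per visit, visits per month)
customers = [(120.0, 1.0), (110.0, 1.5), (130.0, 1.2),
             (20.0, 8.0), (25.0, 9.0), (18.0, 7.5)]

# Start from two of the customers as initial centroids (an arbitrary choice)
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)     # segment index for each customer
print(centroids)  # summary values (mean of each feature) per segment
```

The centroids play the role of the summary values per segment: one segment has high spend and low frequency, the other low spend and high frequency.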
Do you know what your customers want? Of course, you do know what they have bought before. But do you know with confidence what they would likely buy if you were to gently recommend to them? This is what sales promoters do. Machine learning can do this on a massive scale. Take Amazon, the behemoth of retailers. As much as 35% of its sales are generated from its recommendation system.
Just as in customer segmentation, it all starts from transaction records. There is a lot of math involved in processing the records, which lies outside the scope of this article, but it suffices to say that a recommender engine identifies people who have made similar purchases and then proceeds to recommend novel items. As the diagram below illustrates, if John and Jane have both bought dog food before and John has also bought a dog toy but Jane has not, the system will recommend the dog toy to her. The system is able to do this even though it has no semantic understanding of dog food and dog toy.
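The John and Jane example can be sketched in a few lines. This is a deliberately simple co-purchase rule, not the math used by production recommenders; the function name `recommend`, the scoring by overlap count and the third customer are assumptions made for illustration.

```python
from collections import Counter

def recommend(target, purchases, top_n=3):
    """Suggest items bought by customers whose purchases overlap with the target's."""
    owned = purchases[target]
    scores = Counter()
    for other, items in purchases.items():
        if other == target:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue  # no shared purchases, so this customer tells us nothing
        for item in items - owned:
            scores[item] += overlap  # weight novel items by how similar the buyer is
    return [item for item, _ in scores.most_common(top_n)]

purchases = {
    "John": {"dog food", "dog toy"},
    "Jane": {"dog food"},
    "Ali": {"cat food", "cat litter"},  # hypothetical customer with no overlap
}
suggestions = recommend("Jane", purchases)
print(suggestions)
```

Note that the code only compares sets of item labels. It never needs to know what "dog food" or "dog toy" mean, which is the point made above.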
What I have just shared is a small sampling of the content that goes into an AI Clinic. Keen to know more? You can contact the AI Advisory team at email@example.com for further information.
Just over a year ago, AI Singapore played host to a small delegation from the University Technology Center (UTC) at Chulalongkorn University. It was a casual getting-to-know-each-other chat over drinks in town, which was still possible before the appearance of COVID-19. The UTC side was led by Assistant Professor Natawut Nupairoj, its Acting Director, while AI Singapore was represented by our Deputy Director Koo Sengmeng. Within a few weeks of this meeting of six, UTC would have its soft launch in Thailand.
UTC was established to address the so-called “valley of death” – the gap between translating university research and IP to working applications with commercial value. This is not a unique challenge, as Singapore had already addressed it with its 100 Experiments (100E) programme. However, it was the arguably more difficult related challenge of nurturing AI talent which Prof Natawut had primarily in mind when making the visit to Singapore. Talent is essential to power any such translation. Coincidentally, in the preceding month (Oct 2019), AI Singapore’s AI Apprenticeship Programme had made the news as one of the country winners in the “Talent Accelerator” category in the year’s IDC Digital Transformation Awards (DX Awards). Getting to this stage had been no walk in the park, as it involved (and still involves) a lot of experimentation and adaptation in what was at that time essentially uncharted territory. Sengmeng was happy to share our experiences with our Thai counterparts in order to get them quickly up to speed in their own endeavour.
After that good initial meeting, as activities in UTC ramped up, correspondence between UTC and AI Singapore continued unabated, notwithstanding the disruptions that COVID-19 wreaked upon the world. In recognition of AI Singapore’s contributions, Sengmeng was appointed a member of the UTC International Advisory Panel, where he continues to provide recommendations on its direction and targets, as well as explore areas of AI collaboration between Singapore and Thailand. Among other things, he also gave a virtual talk at the Summer Seminar in the Bachelor of Arts and Science in Integrated Innovation (BAScii) programme at Chulalongkorn University.
UTC has since launched their own AI Academy Training and Apprenticeship Programme. It consists of an initial two months of intensive assignment-based self-learning, followed by twelve months of paid apprenticeship (up to THB50k or SGD2.2k) in a real-world AI project in either the healthcare or industry domain. If you are familiar with AI Singapore’s AI Apprenticeship Programme, which recently produced the fifth batch of graduates, the similarity in form and content cannot be missed. The first fruits of UTC’s programme will be harvested in September next year. In a way, we at AI Singapore feel encouraged that our model of talent development has been adopted to a large degree by our neighbour. As Sengmeng reflects upon working with UTC in the past twelve months:
I am glad that UTC has found tremendous value in our talent programmes. In fact, we’re looking forward to working with more international partners in months to come so that all of us will benefit from our shared experiences. Together, we nurture existing AI talents and groom future ones, not just for our countries but for all of humanity.
– Koo Sengmeng, Deputy Director of AI Innovation, AI Singapore
In a previous post of this series, we touched upon the basics of Federated Learning and its benefits. We also mentioned that AI Singapore is working on building a system to support Federated Learning.
In this post, we will take a closer look at the system that AI Singapore is building. The system is named Synergos. This is a Greek word, from which the English word “Synergy” was derived. It means “to work together” or “to cooperate”, which is the very gist of the vision that Federated Learning promises. We will first talk about Synergos’ architecture and its various key components. After that, we will zoom in on one of the main components, i.e. Federation.
Key Components of Synergos
Synergos is essentially a distributed system, in which different parties work together to train a machine learning model without exposing the data of each individual party. The diagram below shows a single-party view of Synergos’ key components.
We will start at the bottom of the diagram and work our way up.
The core of Synergos is its Federation component. This is where the coordination among different parties to train a global model (without exposing data) happens. The Federation component defines the application-level protocol over WebSocket to form a Federated Grid. A Federated Grid is a star-architecture network formed by different parties, who exchange messages among themselves to coordinate the model training and inference. We will take a closer look at Federation later.
Compute & Storage acts as an interface to different compute and storage backends. As a start, Synergos currently assumes that the data is managed by a file system and the compute load is handled by a single CPU node. Support for other storage services and compute frameworks is in the roadmap.
In Synergos, as is typical in machine learning, multiple experiments are run to train multiple models, and one of them is eventually chosen as the model to be deployed into production. Different experiments are usually configured with different training datasets, model types, and/or hyperparameters. Model Lifecycle Management is responsible for tracking the running of multiple experiments to record and compare results. It also serves as a model registry to manage the lifecycle of a federated learning model, including model versioning and stage transitions.
As mentioned earlier, a Federated Grid is where the federated training really happens. In Synergos, this is not a persistent setup. It is typically destroyed when an experiment is finished. To run multiple experiments, Orchestration starts multiple Federated Grids and configures them with different sets of data and hyperparameters. The running of experiments is then tracked by Model Lifecycle Management. When all experiments are completed and a model is elected to transit to the production stage, Model Serving makes sure the model is up and running and is able to receive requests from the users, including those who did not contribute data and join the Federated Grid to train the model.
Contribution Calculation and Reward are closely related. One of the main value propositions of Federated Learning is that it enables collaborative model training without the individual parties exposing their training data. But this is a double-edged sword. It also opens the door to “free-riders”, i.e. participants who try to benefit unilaterally by deliberately injecting dummy data into the training process. A contribution and reward mechanism could help to identify potential free-riders so that the collective benefit of all the participants could be optimised. Contribution Calculation is responsible for evaluating the value of each party’s data, and Reward calculates how much gain a party could receive from the data it has contributed.
In Synergos, although different parties do not expose data to one another, they still need to “register” their data to the data catalog system (external to Synergos). This is accessible by all parties, so that they could identify what data are made available by other parties. Meta-data Management acts as the interface to the data catalog system, which exposes a number of APIs for actions like add/modify/delete data, registration and search. Experiments and model artefacts are also registered to the data catalog system.
Finally, the Dashboard provides a one-stop view of all the information generated by the different components, including experiments and their corresponding configurations (e.g. data used and hyperparameters) and performance. It could also be used to complete some administrative tasks, e.g. start/stop of an experiment, changing of models’ stage (e.g. election of a model to be deployed into production), etc.
Now that we have an overview of the key components inside Synergos and how they interact with one another, let us take a closer look at Federation.
Zoomed-in View of Federation
Federation is developed on top of PySyft, a Python library for secure and private Deep Learning developed by OpenMined, an open-source community actively promoting the adoption of privacy-preserving AI technologies.
The best way to avoid data privacy violations is to not work with the raw data itself. Instead, we need to find a masked representation of the dataset, one that ensures an individual’s anonymity, but not at the cost of reduced algorithmic coverage. This is because in federated learning, we are not interested in the patterns found in individual sub-samples of reality. What we want is to derive aggregate trends that are generalizable to all parties in the system.
The main vehicle used by PySyft to make data “private” is its PointerTensor. As its name implies, it creates an abstracted reference pointing to remote datasets. And this reference can be used by a third party to execute computations on the data without actually “seeing” the data.
Suppose jake is a Worker in PySyft. When we send a tensor to jake, we get back a pointer to that tensor, which we will call x. All subsequent operations are executed through this pointer, which holds information about the data residing on another machine. In other words, x is a PointerTensor, and it can be used to execute commands remotely on jake’s data. A useful analogy is a remote control: we can use it to turn a TV on or off without physically touching the TV.
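The remote-control idea can be illustrated without PySyft itself. The following is a toy stand-in written in plain Python; `ToyWorker` and `ToyPointer` are hypothetical classes invented for this sketch and are not PySyft's API. Everything runs in one process, so this shows only the shape of the interaction, not any real privacy guarantee.

```python
class ToyWorker:
    """Stand-in for a remote party that holds data in its own private store."""
    def __init__(self, name):
        self.name = name
        self._store = {}
        self._next_id = 0

    def receive(self, value):
        # Store the value locally and hand back only a pointer to it
        obj_id = self._next_id
        self._next_id += 1
        self._store[obj_id] = value
        return ToyPointer(self, obj_id)

    def execute(self, op, *ids):
        # Run the operation where the data lives; the result also stays here
        return self.receive(op(*(self._store[i] for i in ids)))


class ToyPointer:
    """Knows where the data lives and its id, but does not hold the data."""
    def __init__(self, worker, obj_id):
        self.worker = worker
        self.obj_id = obj_id

    def __add__(self, other):
        add = lambda a, b: [x + y for x, y in zip(a, b)]
        return self.worker.execute(add, self.obj_id, other.obj_id)

    def get(self):
        # An explicit request to bring the value back to the caller
        return self.worker._store.pop(self.obj_id)


jake = ToyWorker("jake")
x = jake.receive([1, 2, 3])   # "send" data to jake, keep only a pointer
y = x + x                     # computed on jake's side; y is another pointer
result = y.get()              # only now does a value travel back
print(type(y).__name__, result)
```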
The PointerTensor is a powerful tool for making data “private”. Nevertheless, it sits at such a low level of abstraction that developers must write their own coordination code before it becomes operationally usable. This is where Synergos’ Federation component comes in to help.
The Federation component defines the application-level protocol over WebSocket to form Federated Grids. In Federation, parties who agree to work together form a Federated Grid. A Federated Grid is a star-architecture network, in which different parties exchange messages among themselves to complete the model training and inference. These messages are exchanged via the WebSocket protocol. The Federation component also exposes a number of REST APIs, which can be used to send commands to the different entities within the Federated Grid, e.g. to start the training or to destroy the various Workers (explained in the next paragraph) when federated training completes.
Workers and TTP
There are two main roles in a Federated Grid. The first is the Worker: each party that contributes data instantiates a Worker. Workers do not expose their data to one another; they communicate only with the TTP, or Trusted Third Party, which sits at the centre of the star architecture and is solely responsible for coordinating the federated learning. The TTP contributes no data, but it holds the “remote control” to the Workers’ data.
Project, Experiment, Run
Before we proceed further, let’s understand some naming conventions used in the Federation component. First is the concept of a Project. A project defines the common goal that multiple parties are working together to achieve. Under a project, there are multiple experiments, each corresponding to one particular type of model to be trained, e.g. logistic regression, neural network, etc. Under each experiment, there are multiple runs, each using a different set of hyperparameters.
Let’s use an example to better understand the relationship among different concepts. Assuming that multiple banks decide to work together to build an anti-money laundering model, this would define a project. Under this project, logistic regression is one type of model to be built. So an experiment will be defined to train a logistic regression model. Assuming we are using regularized logistic regression, multiple runs would then be defined with different values of the hyperparameter 𝛌.
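The anti-money laundering example can be written down as a small data structure. This is only an illustrative sketch of the Project, Experiment and Run hierarchy; the class and field names and the three 𝛌 values are assumptions, not Synergos' actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Run:
    run_id: str
    hyperparameters: Dict[str, float]  # one set of hyperparameters per run

@dataclass
class Experiment:
    experiment_id: str
    model_type: str                    # one model type per experiment
    runs: List[Run] = field(default_factory=list)

@dataclass
class Project:
    project_id: str
    goal: str                          # the common goal shared by all parties
    experiments: List[Experiment] = field(default_factory=list)

project = Project("aml", "Build an anti-money laundering model across banks")
logreg = Experiment("logreg", model_type="regularized logistic regression")
for i, lam in enumerate([0.01, 0.1, 1.0]):  # hypothetical values of lambda
    logreg.runs.append(Run(f"run_{i}", {"lambda": lam}))
project.experiments.append(logreg)

# One Federated Grid is set up per run
grids_needed = sum(len(e.runs) for e in project.experiments)
print(grids_needed)
```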
A Federated Grid is set up for each run, which has three phases – Registration, Training, and Evaluation. Let’s visit them one by one.
The Registration phase is for all the parties to register the necessary information. The TTP, being the coordinator of a project, will define the project. It will also define the experiment and run, setting the model type and its corresponding hyperparameters. If a party is interested in working with other parties, its worker will register its participation in the project defined by the TTP. The party also needs to supply its connection information. After a worker has been registered into a project, it is able to declare data tags corresponding to the datasets that it would like to contribute within the project’s context. All this information is stored and managed by the Meta-data Management component.
In the Training phase, the Federated Grid defined in the Registration phase needs to be up and running before the federated training takes place. There are a few things happening to bring the Federated Grid up.
First, individual Workers are initialized. Each of them instantiates a PySyft WebsocketServerWorker (WSSW). The connection information supplied by the Workers in the registration phase is used by the TTP to poll their data headers for feature alignment. Feature alignment is a step that makes sure different parties have the same number of features after one-hot encoding is applied to the categorical features, without revealing the Workers’ data. We will have another post to talk about the need for feature alignment. Stay tuned!
The TTP then conducts the feature alignment. The resulting alignments are forwarded to the Workers, which use them to generate aligned datasets across all Workers. Each aligned dataset is loaded into its Worker’s WSSW when the WSSW is instantiated. The WSSW also opens the Worker’s specified ports to listen for incoming WebSocket connections from the TTP.
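As a rough illustration of what alignment on headers could look like, the sketch below takes only the column names each Worker reports after one-hot encoding, builds a shared column list, and lets each Worker pad the columns it lacks with zeros. This is an assumed, simplified scheme; the function names and sample headers are hypothetical, and Synergos' actual alignment procedure may differ.

```python
def align_headers(worker_headers):
    """TTP side: merge the reported column names into one shared, ordered header."""
    return sorted(set().union(*worker_headers.values()))

def apply_alignment(rows, header, aligned_header):
    """Worker side: re-express local rows under the shared header, filling gaps with 0."""
    index = {name: i for i, name in enumerate(header)}
    return [
        [row[index[col]] if col in index else 0 for col in aligned_header]
        for row in rows
    ]

# Only headers reach the TTP; the rows themselves stay with each Worker
headers = {
    "worker_a": ["age", "pet_cat", "pet_dog"],
    "worker_b": ["age", "pet_dog", "pet_fish"],
}
aligned = align_headers(headers)

# worker_b applies the alignment to its own local rows
rows_b = apply_alignment([[34, 1, 0], [51, 0, 1]], headers["worker_b"], aligned)
print(aligned)
print(rows_b)
```

After this step every Worker's dataset has the same number of columns in the same order, which is what a shared global model needs.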
Subsequently, for each Worker, the TTP instantiates a PySyft WebsocketClientWorker (WSCW), which completes the TTP’s WebSocket handshake with the Worker. Once the handshake is established, the TTP’s WSCW can be used to control the behaviour of the Worker’s WSSW without seeing the Worker’s data. With this, a Federated Grid is established.
Now the federated training starts. The global model architecture is fetched from the experiment definition. Likewise, the registered hyperparameters are fetched from the run definition. Pointers to training data are obtained by searching for all datasets tagged for training (i.e. with the “train” tag). During training, the TTP uses its WSCWs, which are connected to the different Workers’ WSSWs, to coordinate the training, i.e. exchanging losses and gradients between the TTP and the Workers to update the global model’s weights with FedAvg or FedProx.
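For readers unfamiliar with FedAvg, its aggregation step is a weighted average of the locally trained parameters, with each Worker weighted by its number of training samples. The sketch below shows only that step on flat lists of numbers; it is not Synergos' implementation, and the weights and sample counts are made up.

```python
def fed_avg(worker_weights, sample_counts):
    """Average each parameter across workers, weighted by local sample count."""
    total = sum(sample_counts)
    n_params = len(worker_weights[0])
    return [
        sum(w[j] * n for w, n in zip(worker_weights, sample_counts)) / total
        for j in range(n_params)
    ]

# Hypothetical parameters from two Workers after one round of local training
local_weights = [[1.0, 2.0], [3.0, 4.0]]
global_weights = fed_avg(local_weights, sample_counts=[100, 300])
print(global_weights)
```

The Worker with three times as much data pulls the global weights three times as strongly towards its own values.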
Once training is done, the final global and local models are exported. The Federated Grid will also be dismantled. This is done by first destroying all WSCWs, closing all active WebSocket connections. The TTP then uses the connection information provided by the Workers once more to send termination commands over to the Workers via the REST API, which destroys their respective WSSWs and reclaims resources. Now the Federated Grid is dismantled, and a run completes.
In the evaluation phase, the Federated Grid defined in the registration phase is recreated with the necessary information stored in the Meta-data Management component. Instead of searching for training datasets, datasets with the “evaluate” tag are sourced from the Federated Grid. The global model is switched to evaluation mode (i.e. no weight updates happen) and is used to obtain inference values across all retrieved data pointers. Once inference values corresponding to all Workers are obtained, they are stored locally at each Worker. Subsequently, performance metrics are computed locally at each Worker and sent back to the TTP for aggregation and logging purposes.
After all this has been completed, the Federated Grid is dismantled again with the same mechanism as described in the training phase.
We hope that by now you have a good understanding of the various key components of Synergos and how the Federation component works. We are currently running an invitation-only preview of Synergos to get early feedback on its development. If you are interested, please send us an email at firstname.lastname@example.org with a description of the use case you have in mind.
In the subsequent articles in this series, we will continue to see how Synergos handles some key technical challenges in Federated Learning, e.g. non-IID data. We will also present some use cases developed with Synergos. Stay tuned.
Why do we need an automated engine for Question Answering?
Back when AI Singapore (AISG) consisted of a team of five and first started our AI Apprenticeship Programme, we had one shared email address that any interested candidate could email questions to. We received questions about coursework, about stipends, about projects, about eligibility and pretty much anything else you could think of. Whoever saw the email first would jump in and write a reply. It was like running around with a hat putting out fires as they flamed up.
Over time, we noticed people asking similar sets of questions, and we managed to collate a good set of FAQs to which to point people when they came to us. Still, handling enquiries was manual and time-consuming.
Our problem is not unique. Any organization, whether in research, banking or government, has to deal with questions from customers. Customers might need clarification on almost anything, from questions that other people have asked before (FAQs) to clauses in contracts. It can be a tedious process for customer service representatives if they need to retrieve such information repeatedly.
Introducing Golden Retriever
This is why we built the open-source Golden Retriever, an automated information retrieval engine for human language queries. Golden Retriever is part of the set of pre-built solutions offered by AI Makerspace, solutions that make it easy for teams to integrate AI into their services. Our intention is to provide Golden Retriever as an open-source tool that users across multiple industries can fine-tune for their own use cases. This will be beneficial for users who want to tap into confidential documents and hope to use the tool internally.
Golden Retriever primarily uses Google’s Universal Sentence Encoder for Question and Answering (Google USE-QA) to power itself, but it is also compatible with other publicly available models such as BERT and ALBERT. Our initial experiments on academic datasets like the SQuAD QA dataset and Insurance QA dataset showed promising results. The model evaluation metric is accuracy@k (Acc@k), where k is the number of clauses our model returns for a given query. For a single query, a score of 1 indicates that the k returned clauses contain a correct answer to the query, and a score of 0 indicates that none of them does.
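As a minimal sketch of how Acc@k can be computed over a set of queries, assuming each query has exactly one correct clause and the model returns clauses ranked best first (the clause ids below are hypothetical):

```python
def accuracy_at_k(ranked_results, correct_answers, k):
    """Fraction of queries whose correct clause appears among the top k returned."""
    hits = sum(
        1 for ranked, correct in zip(ranked_results, correct_answers)
        if correct in ranked[:k]
    )
    return hits / len(correct_answers)

# Two queries, each with the model's ranked clause ids and the true answer
ranked_results = [["c1", "c2", "c3"], ["c4", "c5", "c6"]]
correct_answers = ["c2", "c6"]

acc_at_2 = accuracy_at_k(ranked_results, correct_answers, k=2)
acc_at_3 = accuracy_at_k(ranked_results, correct_answers, k=3)
print(acc_at_2, acc_at_3)
```

Averaging the per-query 0 or 1 scores in this way gives a single number between 0 and 1 for the whole dataset.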
If you have a customized dataset, you can fine-tune Golden Retriever on that dataset for better results.
We previously wrote about Golden Retriever which you can view here. However, the latest release of Golden Retriever incorporates significant changes to make it even easier to put into production.
How Golden Retriever works
We use the following open-source tools to power the core of Golden Retriever:
Elasticsearch: a distributed RESTful search and analytics engine used to store the incoming queries and potential responses for your application
Minio: an object storage system used to store the fine-tuned weights of your model and other artefacts.
Streamlit: an easy-to-use package to set up a frontend page for users to send in their queries
FastAPI: a web framework for building APIs with Python 3.6+
DVC: DVC runs on top of any Git repository and allows users to set up reproducible Machine Learning pipelines
Google’s Universal Sentence Encoder: a model pre-trained by Google and used within Golden Retriever to encode text into high dimensional vectors for semantic similarity tasks.
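To show how encoded vectors turn into retrieved answers, here is a small sketch that ranks candidate clauses by cosine similarity to a query vector. The three-dimensional vectors are invented stand-ins for real encoder outputs, and cosine similarity is an assumed scoring function for illustration; it does not call the Universal Sentence Encoder or reproduce Golden Retriever's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k(query_vec, clause_vecs, k):
    """Return the names of the k clauses most similar to the query."""
    ranked = sorted(
        clause_vecs,
        key=lambda name: cosine(query_vec, clause_vecs[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" for three FAQ clauses and one incoming query
clauses = {
    "stipend": [0.9, 0.1, 0.0],
    "eligibility": [0.1, 0.9, 0.1],
    "coursework": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

best = top_k(query, clauses, k=2)
print(best)
```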
While fine-tuning the Universal Sentence Encoder model, we tried to leverage training data (question and answer pairs) from different domains. We observed that the model’s performance did not improve when we fine-tuned on these widely differing datasets simultaneously. When fine-tuning for your own use case, it may therefore be worth maintaining separate sets of model weights for different use cases rather than seeking a single model that generalizes across domains.
Benefits of Golden Retriever
The backend services needed to make Golden Retriever production-ready are packaged by Docker Compose. This means the application is platform agnostic. As long as you have Docker on your machine, calling docker-compose up will create the services needed to run Golden Retriever. No piecemeal installations are needed.
Each component is well supported by the Python community. They have good documentation and Getting Started pages for reference.
These design choices mean Golden Retriever is transparent, easy to install and easily customizable. For more narrative walkthroughs of Golden Retriever or example use cases, check out our GitHub repo, our information page and our demo.
We strive to continuously improve the functionalities of Golden Retriever and welcome contributions from the community. Do drop us an email if you have any suggestions or questions.