
Deployment in GitLab

GitLab is a platform that we use within AI Singapore. It supports version control, issue boards, wiki pages and deployment pipelines. In this article, we will talk about how we use GitLab to deploy applications.

What is Deployment?

Deployment is the process of making a software application or platform accessible to its users. It can be done manually or automatically, depending on the tools used. It is usually the last stage of the process: we have completed coding and testing on our local machines, and we want to be able to run our application on a server so that our users can access it. We can also deploy test environments to allow for extensive testing of a new system.

What is CI/CD?

This refers to Continuous Integration and Continuous Delivery. With CI/CD, we can automate code checks, tests, builds and deployments whenever changes are checked into a GitLab repository.

(Image source: GitLab homepage)

A GitLab Job is triggered when changes are pushed to a GitLab Repository. Depending on how the CI/CD file .gitlab-ci.yml is set up, the GitLab job will build, check and deploy the application. We can set up the file to only deploy from specific git branches (e.g. production or staging).

In most of our in-house applications, the deployment stage of a CI/CD pipeline is similar and reusable. 

Here are some of the benefits of automatic deployment and CI/CD:

  • Developers can focus on the code rather than on the deployment.
  • It is easier for developers to get quick feedback on any small changes they have made, which makes debugging easier.
  • By using CI/CD as part of the development process, we can test out any code and config changes in a testing environment, before rolling this out to the production server. This is especially useful when testing out new features.
  • It is a repeatable and trackable process, with rollback functionality (when turned on).
  • It allows developers to continuously test any config changes. This removes human error from manual deployments, such as missing files or environment variables. The environment variables are also stored in GitLab CI/CD Settings.

The Tools

We use GitLab, Docker, Helm and Skaffold to deploy our applications and services to our Kubernetes clusters.

  • GitLab CI/CD is integrated into our GitLab projects, which contain our code and the environment variables required to deploy our applications.
  • Docker is a containerization tool which allows a developer to package code and dependencies in order to run an application in multiple environments.
  • Skaffold, developed by Google, allows you to build a pipeline to build, push and deploy your applications using configurations defined in a YAML file.
  • Helm is an open source project used to manage packages on Kubernetes.

We place the aforementioned .gitlab-ci.yml file in the root directory of the repository. This file defines the stages of our deployment (e.g. Code Checking, Build, Deploy). Within each stage, we can define a list of steps to execute. Refer to the screenshot below.

  • The first stage (code_check) installs the packages, does code linting and builds a docker image.
  • The second stage (deploy_staging) will login to the docker registry and run the skaffold pipeline to deploy the application to our Kubernetes cluster.

In this case, we are deploying an application to our staging environment; we can define multiple environments within .gitlab-ci.yml (e.g. staging and production).
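As an illustration, the two stages described above might be defined in .gitlab-ci.yml roughly as follows. This is a minimal sketch: the commands, image name and branch name are placeholders, not our actual configuration.

```yaml
stages:
  - code_check
  - deploy_staging

code_check:
  stage: code_check
  script:
    - pip install -r requirements.txt                 # install the packages
    - pylint app/                                     # code linting
    - docker build -t your-registry/your-app .        # build a docker image

deploy_staging:
  stage: deploy_staging
  script:
    # credentials come from GitLab CI/CD variables, not from the repository
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - skaffold run --filename=skaffold.yaml --namespace=your_namespace
  only:
    - staging                                         # deploy only from this branch
```

The `only` keyword is what restricts deployment to specific git branches, as mentioned earlier.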

In the deploy_staging stage, we call the skaffold command:

skaffold run --filename=skaffold.yaml --namespace=your_namespace

Running the command above will execute the pipeline, to build the Docker Image and deploy to Kubernetes (using Helm) with the variables read from GitLab according to what is defined in skaffold.yaml.

An example skaffold.yaml file is shown below.

apiVersion: skaffold/v1
kind: Config
metadata:
  name: application-name
build:
  artifacts:
    - image: your-registry/your-app
      docker:
        dockerfile: Dockerfile
deploy:
  helm:
    flags:
      upgrade:
        - --install
    releases:
      - name: yourAppName
        chartPath: yourApp
        namespace: your_namespace
        setValueTemplates:
          YOUR_VAR: "{{.VARIABLE_NAME}}"

As can be seen, the deployment step uses Helm. In this case, we are deploying a custom application yourApp using Helm charts. The Helm charts contain the configuration details, such as environment variables and the parameters used to spin up the Kubernetes pods (e.g. RAM size, namespace).

A Helm chart consists of a collection of files that define your deployment. We will not go into further detail in this article; that would probably take up another blog post.

Running GitLab CI/CD Pipelines

Once the GitLab CI/CD file is set up correctly, we can run the pipeline through the GitLab interface or by pushing code to the repository.

Here is a screenshot of the GitLab CI/CD Jobs. We can see that there are 2 steps here – code_check and deploy.

We can also look at the detailed logs by clicking on Job IDs.

That was a quick review of the deployment tools and steps within our team and how we use the tools to automate some parts of the process. I hope you have learnt a little bit about the motivation for this work and gained some insight into how we deploy our applications here at AISG.


A Collaboration To Build Graph Database Capabilities in AI and Machine Learning Applications

TigerGraph and AI Singapore (AISG) have announced the signing of a Memorandum of Understanding (MOU) to promote Machine Learning (ML) and Artificial Intelligence (AI) industry enablement activities through AISG’s programmes in Singapore. TigerGraph will offer the world’s first distributed native graph database training and certification to AISG’s engineers and apprentices and jointly develop industry best practices for the deployment of AI and ML models in the cloud and at the edge for Singapore use cases.

The partnership will see TigerGraph supporting AISG’s AI Innovation programmes such as 100 Experiments (100E), AI Apprenticeship Programme (AIAP)®, Makerspace AI Bricks and AI Engineering Hub (AIEH) through projects, workshops, seminars, certifications, and joint research and development activities.

As a small country with many digital natives and a national AI programme supported by both the public and private sectors, we already punch above our weight in the technological and economic race. At AISG, we are relentless about nurturing innovation and research breakthroughs that will give birth to bold ideas and applications of AI to solve societal or business challenges. With TigerGraph on board, we are excited by the limitless possibilities of graph database in support of AI and ML applications. For example, new analytics innovation or new graph algorithms may drive superior outcomes to solve AI problem statements presented by the industries in our flagship programme, 100E.

Laurence Liew, Director of AI Innovation

TigerGraph is a transformative native parallel graph database that is uniquely suited to AI and ML applications. TigerGraph's database structure of nodes and edges creates links that can be connected and traversed. AI and ML depend on such links to uncover patterns in data and create insights for businesses. TigerGraph's deep link analytics enables it to process terabytes of data and traverse millions of connections in a fraction of a second, making it an ideal solution for critical applications such as fraud detection, customer360, IoT, AI and ML.

Serene Keng, managing director of channel and alliances for Asia Pacific and Japan, TigerGraph said, “It is now a business imperative to use AI and ML to drive deep insights to transform operations and improve efficiency and the bottom-line. We are proud to join forces with AISG to help companies realise the full potential of graph database for AI and ML application through identification and co-development of industry verticals, sharing of best practices and use cases to accelerate adoption of these critical technologies. TigerGraph is also committed to enabling a steady pipeline of AI professionals with graph database analytics capabilities by supporting the research and development needs of the apprentices in the AIAP®.”

AISG and TigerGraph will explore the setting up of a Graph AI Center of Excellence to conduct proof-of-concept and customer projects under the 100E programme.

Improving Singapore’s AI Literacy With AI for Everyone

For many people, their first point of contact with AI Singapore is through AI for Everyone. I had a chance to talk to Sengmeng, who leads the talent programme, about why getting the nation to be AI literate is so important.

Below is a transcript of the conversation [*].

Hi Sengmeng, good to have you.

Thank you, Basil. Thanks for inviting me.

In previous episodes, I have talked to a wide range of people both within and without AI Singapore, including our lead engineers, AI apprentices and industry advisory members – who are the ones who reach out to industry people to bring them onto the AI journey – as well as sponsors on our 100E programme who come to us with their problem statements.
But, to the general public, in terms of sheer numbers, I think most people are acquainted with AI Singapore through programmes like AI for Everyone and AI for Industry.
Recently, we even forged a partnership with the Civil Service College to adopt AI for Everyone to develop AI literacy for all our public service officers. And we know how big the civil service is, so things are certainly scaling up.
Let’s pull back the curtain a little and bring us behind the scene, the evolution and thinking behind these initiatives.

Thanks, Basil, for setting the stage. Yes, there has certainly been a lot of evolution from when we first started AI Singapore to how the talent programme came into place. Originally, when AI Singapore was formed in July of 2017, the role of AI Innovation was very simple – to do a hundred industry projects. What we realised was that it wasn't so straightforward to go to the industry and invite them to come to us with problem statements for us to do AI projects. So, a lot of work needed to be done in demystifying AI for those companies, and especially in introducing AI foundational knowledge to our working professionals, so that they understand that what we are trying to do with them is to enable them to achieve their business objectives through artificial intelligence, not to replace them. So having programmes, and having a way to establish an AI foundational baseline, is very important.

So you’ve talked about the AI foundational baseline. Why is it important to have that?

In many strategies that we see from organisations and even governments that talk about AI strategy, it's always about grooming AI engineers, machine learning engineers, scientists and even researchers. In AI Singapore, the view that we hold is that, while these talents are very important when it comes to promoting the adoption of AI solutions and of AI development within the country, equally important is the development of knowledge among the AI users – in this case, the consumers.

So, it’s not just about the tech people. They may be in the spotlight most of the time, but AI literacy is something that is broader across society, right?

That’s a very good point, Basil – AI literacy. Not everybody needs to learn how to code, and not everybody needs to learn how to program and train a machine learning model. Being able to understand the basic terminology of AI or machine learning, and being able to understand at a high level how machine learning models are trained – these are very good and, in fact, very essential pieces of foundational knowledge for consumers, because AI is pervading all aspects of business life. So, even if you are a person who doesn’t code and doesn’t train machine learning models, you are going to be a user and you will also be consuming AI solutions. So, having AI literacy will actually help you to be a better judge of whether an AI solution is beneficial to you, and of how you are going to use it to improve your business, your career and also your life.

So, pretty much in line with our mission here at AI Singapore, where we are tasked with building up the AI ecosystem in Singapore. We have to look at things in a holistic manner and AI literacy is something that we would want to cultivate across the whole ecosystem. Looking into the future, how do you think AI literacy affects future use and development of AI?

As AI becomes more mainstream in businesses, and even at the public sector and government level, AI is increasingly being used for data-driven decision making and even the recommendation of policies. Another aspect of AI literacy which is becoming more important is AI ethics. As we have more people who are literate in AI, they become better evaluators of the AI technology being used in their everyday lives and in their work life, and they are in a better position to evaluate and detect whether the AI solutions that affect them are responsible or irresponsible. So, in the future, when we look at the development of AI, I do foresee an environment where AI consumers and the public are also natural proponents who voice out what kind of AI systems they would like to see implemented and deployed, and we may move towards a development where future AI systems carry an AI trust mark, so that when businesses get this trust mark, they are assuring the public that the AI system has been designed fairly, transparently and with ethical design in mind.

So we have certainly moved beyond the pure technicals of getting AI models to be accurate and expanded into things like making them responsible and auditable. I think this is a natural development. So, after running the programmes for a while now, what are the challenges encountered in grooming AI literacy?

We saw two challenges in grooming AI literacy. First, the general public tends to fear AI being used to replace their jobs or their livelihoods, and second, those who wish to use AI see it as a magic bullet that will solve anything they need AI to do. So, when we groom AI literacy, it's important for us to emphasise what modern AI can and cannot do, and what its limitations are. Besides that, a lot of public information on AI does not go in depth into terms like machine learning, supervised learning and unsupervised learning – the terms are used too freely, and that leads to a lot of misconceptions. Also, for many people who are very keen to look at AI or understand it, there are too many learning resources out there, so most of them don't know where to start. So when we developed AI for Everyone, it was a way for people who are very new to it, and for people who don't have a lot of time to go through a lot of diverse material, to have a very simple way of entering the world of understanding AI – the first step in their AI literacy.

We have certainly covered a fair amount of ground on this journey already. What more can we expect in the future?

Our talent team continues to look at ways in which we can improve our AI literacy programmes. Besides refreshing our existing programmes, such as AI for Everyone and AI for Industry, with updated content to keep pace with the development of AI, we also continue our efforts to bring literacy to all generations. For example, our current AI for Kids is meant for children aged ten to twelve years old – in Singapore, that will be Primary 5 to Primary 6. In December, we are going to roll out the AI for Kids Illustrated Edition, where we have been working with partners to produce an e-book suitable for young kids from Primary 1 to Primary 3. This e-book will be in a format that is very easy for them to comprehend and a very fun way for them to understand AI, so that they too have an early start in AI literacy.

Globally, there has been a lot of interest from overseas partners and even governments to try to learn from us how we have been grooming AI literacy, and we have entered into a couple of formal partnerships where we will share AI for Everyone with different governments and different grassroots organisations all over the world, so that they can use it as a template to also build AI foundational knowledge for their citizens.

Many exciting plans in the pipeline. To end for today, I invite you to round things up with some final words.

Thank you, Basil. I think, in closing, I would like to encourage the listeners to view AI as a tool, just like PowerPoint or Excel are tools, and like the software you have been using – tools that actually help you to do your work better and more efficiently, and generally also enrich your life. So, be open to understanding AI, be open to understanding what it can and cannot do, and be open to understanding how it can benefit you. Thank you.

Thanks, Sengmeng.

[*] This conversation was transcribed using Speech Lab. The transcript has been edited for length and clarity.

Robustness Testing of AI Systems

Standard model evaluation processes involve measuring the accuracy (or other relevant metrics) on a hold-out test set. However, the performance on these test sets does not always reflect the ability of the model to perform in the real world. This is because a fundamental assumption when deploying AI models is that all future data is of a similar distribution to what the model was trained on. In practice, however, it is very common to encounter data that is statistically different from the train set, which can potentially cause AI systems to become brittle and fail.

An AI model will always be exposed to a variety of new inputs after deployment, because the testing data is limited (i.e. a finite subset of all available data). Therefore, the concept of robustness testing is to assess the behaviour of the model on such new inputs and identify its limitations before deployment. One way to achieve this is by curating additional data from other sources to test the model more comprehensively. However, that can be quite difficult in practice. An alternative approach is to introduce mutations into the test data, with the aim of systematically mutating the data towards new and realistic inputs, similar to what the AI system will encounter in the real world. This forms the basis of many robustness testing techniques.

The research community has developed many different approaches[1] for robustness testing, which can be broadly categorised into white-box[2] and black-box testing[3]. White-box testing requires knowledge of the way the system is designed and implemented, whereas black-box testing only requires the system’s outputs in response to a certain input. These different testing techniques provide different insights about the models.

Evaluating the robustness of a Computer Vision (CV) deep learning model with NTU DeepHunter

One white-box robustness testing tool that we are exploring comes from AI Singapore’s collaborator, the NTU Cybersecurity Lab (CSL). We will briefly introduce the tool before sharing our insights from using it with a computer vision use case.

In traditional software testing, fuzzing is used to detect anomalies by randomly generating or modifying inputs and feeding them to the system[4]. A complementary concept is test coverage, which measures how much of the program has been tested and is used to quantify the rigour of the test. The goal is to maximise test coverage and uncover as many bugs as possible.

Analogously, fuzz testing can also be applied to machine learning systems. The NTU CSL group under Prof Liu Yang developed DeepHunter[5], a fuzzing framework for identifying defects (cases where the model does not behave as expected) in deep learning models. DeepHunter aims to increase the overall test coverage by applying adaptive heuristics based on run-time feedback from the model. We will attempt to give a brief overview of the tool in the next few paragraphs.

A key component of the fuzzing framework is the mechanism by which new inputs to the system are generated: metamorphic mutations. Metamorphic mutations are transformations in the input that are expected to yield unchanged or certain expected changes in the predictive output[7]. These transformed inputs are known as mutants. For example, some mutations for CV tasks can be varying the brightness of the picture or performing a horizontal flip. For NLP tasks, it can be contracting words or changing words to their synonyms. The mutation strategies should be specified by the user depending on their use case and requirements.
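To make the idea concrete, a metamorphic mutation and its defect check for an image classifier can be sketched as below. This is an illustrative sketch: the brightness shift is just one possible mutation, and the `predict` function here is a deliberately trivial stand-in for a real model.

```python
import numpy as np

def mutate_brightness(image, delta=20):
    """Shift pixel brightness by delta, clipping to the valid [0, 255] range."""
    return np.clip(image.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def is_defect(model_predict, image, label):
    """A mutant exposes a defect if the prediction changes even though
    the semantic content of the image has not (a metamorphic relation)."""
    mutant = mutate_brightness(image)
    return model_predict(mutant) != label

# A toy "model" that thresholds on mean brightness; real models are far more
# complex, but can fail under similarly small perturbations.
image = np.full((8, 8), 100, dtype=np.uint8)
predict = lambda img: int(img.mean() > 110)
print(is_defect(predict, image, predict(image)))  # True: the prediction flips under a small brightness shift
```

Each mutation strategy encodes one invariance the model is expected to satisfy; collecting the mutants that violate it yields the defects discussed below.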

Another component is the coverage criteria. These criteria are computed for each mutant to determine whether it contributes to a coverage increase. There are various definitions of coverage for deep learning models[8], based on the behaviour of the neurons in a neural network. For example, Neuron Coverage (NC) measures the proportion of neurons activated above a predefined threshold (the major functional range), while Neuron Boundary Coverage (NBC) measures the corner-case regions. Regardless of the specific criteria used, the general idea is that tests with higher coverage are expected to capture more diverse behaviours of the model and allow more defects to be identified, i.e. the test data is perceived to be new to the model. For more details on the assessment of the coverage criteria, please refer to the literature.
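In a much simplified form, Neuron Coverage can be sketched as below. This assumes the per-neuron activations have already been collected into an array; real implementations hook into the network's layers to record them.

```python
import numpy as np

def neuron_coverage(activations, threshold=0.5):
    """Fraction of neurons activated above the threshold for at least
    one input in the batch.

    activations: array of shape (num_inputs, num_neurons)."""
    activated = (activations > threshold).any(axis=0)  # per neuron: ever activated?
    return activated.sum() / activations.shape[1]

# Two inputs, four neurons: only neurons 0 and 2 ever fire above the threshold
acts = np.array([[0.9, 0.1, 0.2, 0.0],
                 [0.3, 0.4, 0.8, 0.1]])
print(neuron_coverage(acts))  # 0.5 -> half the neurons were exercised by this test set
```

A mutant that pushes previously inactive neurons above the threshold raises this number, which is the signal the fuzzer uses to keep exploring from that mutant.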

Figure 1. The overall workflow of DeepHunter. Image is adapted from [6].

The overall workflow of DeepHunter is illustrated in Figure 1. It starts with an initial set of ‘seeds’ (inputs to the model) which are added to a seed queue for mutation. The core of DeepHunter is a fuzzing loop which combines a seed selection strategy (heuristics to select the next seed for mutation) with the metamorphic mutation, coverage criteria, and runtime model prediction. The seed selection strategy is chosen such that mutants which increase the coverage, or which the model fails to predict correctly, are added back to the queue for further mutation. The test cases which the model failed to predict correctly are collected for analysis, e.g. checking whether the mutant is realistic. This coverage-guided fuzzing technique was demonstrated[5] to be more effective than random testing at identifying a greater number of defects in the model. For more details on the methodology, please refer to the literature.
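The fuzzing loop described above can be sketched as follows. The seed selection, mutation and coverage functions here are simple stand-ins for DeepHunter's actual heuristics, so this shows the shape of the loop rather than the tool itself.

```python
from collections import deque

def fuzz(seeds, mutate, predict, label_of, coverage_gain, max_iters=1000):
    """Simplified coverage-guided fuzzing loop in the spirit of DeepHunter.

    Mutants that the model misclassifies are collected as failures; those,
    and mutants that increase coverage, are queued for further mutation."""
    queue = deque(seeds)
    failures = []
    for _ in range(max_iters):
        if not queue:
            break
        seed = queue.popleft()          # stand-in for the seed selection strategy
        mutant = mutate(seed)
        if predict(mutant) != label_of(seed):
            failures.append(mutant)     # defect: prediction changed under mutation
            queue.append(mutant)
        elif coverage_gain(mutant):
            queue.append(mutant)        # new behaviour observed: keep exploring
    return failures
```

A trivial run with numeric "inputs" shows the mechanics: with `mutate = lambda x: x + 1` and a model that flips its output once the input reaches 3, `fuzz([0], ...)` keeps mutating the seed until the misclassifying mutants 3, 4, 5, … are surfaced.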

Figure 2. Illustration of the deep learning model inference pipeline for activity classification.

One of the first users of the tool in AI Singapore is the CV Hub team, for their activity classification use case. A typical CV use case consists of a pre-trained object detection or pose estimation model, combined with use-case specific heuristics or models downstream. The CV Hub team was interested in learning about the robustness of the deep learning model that they developed for activity classification. As illustrated in Figure 2, the model takes in key point coordinates of a human pose, from a pre-trained pose estimation model upstream, and classifies it into an activity.

Figure 3. Example renders of pose key points (input to model) before and after mutation. Data is from the JHMDB dataset.

To identify suitable mutation strategies for testing the model, we conducted a discovery session with the CV Hub team to understand the requirements of the use case. We identified a number of possible mutation strategies, and one of them is to mirror key points by flipping the image horizontally, as shown in Figure 3. This mutation strategy is provided to the tool, which uses it as part of its fuzzing process to generate mutants.
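The mirroring mutation can be sketched as below. Key points are assumed to be absolute (x, y) pixel coordinates; in a full implementation, left and right joint indices would also need to be swapped so the mirrored pose remains anatomically consistent.

```python
import numpy as np

def mirror_keypoints(keypoints, image_width):
    """Mirror pose key points about the vertical centre line of the image.

    keypoints: array of shape (num_points, 2) holding (x, y) coordinates.
    The activity label is expected to be invariant under this mutation."""
    mirrored = keypoints.copy()
    mirrored[:, 0] = image_width - 1 - mirrored[:, 0]  # reflect x; y unchanged
    return mirrored

pose = np.array([[10.0, 40.0], [50.0, 80.0]])
print(mirror_keypoints(pose, image_width=100))
# x coordinates become 89 and 49; y coordinates are untouched
```

Because a mirrored pose depicts the same activity, any change in the classifier's output on the mirrored key points counts as a defect.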

We ran the robustness testing process on the original model and the results are shown in Table 1. The coverage-guided fuzzing identified a large number of defects, which implies that the model was not robust to the mutations.

Model       Accuracy on test set    Number of fuzzer iterations    Number of defects
Original    65.3%                   5000                           2193
Retrained   64.3%                   5000                           94
Table 1. Test set accuracy and results of coverage-guided fuzzing for each of the models.

After analysing the results of the coverage-guided fuzzing, a strategy was developed to improve the robustness of the model by retraining it with augmented data. The results of the robustness testing on the retrained model are also shown in Table 1. The smaller number of defects identified implies that the retrained model is more robust to the mutations. (Note: In this article, we have demonstrated robustness testing using just one mutation strategy. Additional mutation strategies should be used to obtain a more complete picture of the model’s robustness.)
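The retraining strategy amounts to augmenting the training set with the same mutation before fitting, which can be sketched as below. The actual training code is not shown; the sign-flip mutation here is purely illustrative.

```python
import numpy as np

def augment_with_mutation(X, y, mutate):
    """Append a mutated copy of every training sample, keeping its label,
    so the model can learn to be invariant to the mutation."""
    X_mut = np.stack([mutate(x) for x in X])
    return np.concatenate([X, X_mut]), np.concatenate([y, y])

# Toy example with a sign-flip mutation on 1-D features
X = np.array([[1.0], [2.0]])
y = np.array([0, 1])
X_aug, y_aug = augment_with_mutation(X, y, lambda x: -x)
print(X_aug.ravel().tolist(), y_aug.tolist())  # [1.0, 2.0, -1.0, -2.0] [0, 1, 0, 1]
```

For the activity classification use case, the mutation passed in would be the key-point mirroring used during fuzzing, so the retrained model sees mirrored poses with their original labels.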

To compare the robustness testing to a standard model evaluation, we have also included the test set accuracy for each of the models in Table 1. Analysing model performance by this metric alone would have led us to infer that both models’ performance was roughly the same. However, the results from the robustness testing revealed that the models perform very differently when subjected to mutations. Therefore, through this testing process, we are more confident that the retrained model will likely be able to handle new and unseen inputs when deployed.

In summary, we have demonstrated how robustness testing can give us additional insights about a model’s performance beyond the standard evaluation, as well as actionable insights for improvement. This gives us more confidence when using the model. In the next article, we will continue our exploration into other robustness testing tools by exploring a different testing tool, Microsoft Checklist, and its application to an NLP use case.


[1] J. Zhang, M. Harman, L. Ma and Y. Liu, “Machine Learning Testing: Survey, Landscapes and Horizons,” in IEEE Transactions on Software Engineering (Early Access), doi: 10.1109/TSE.2019.2962027.

[2] Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2019. DeepXplore: automated whitebox testing of deep learning systems. Commun. ACM 62, 11 (November 2019), 137–145. DOI:

[3] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical Black-Box Attacks against Machine Learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security (ASIA CCS ’17). Association for Computing Machinery, New York, NY, USA, 506–519. DOI:

[4] E. T. Barr, M. Harman, P. McMinn, M. Shahbaz and S. Yoo, “The Oracle Problem in Software Testing: A Survey,” in IEEE Transactions on Software Engineering, vol. 41, no. 5, pp. 507-525, 1 May 2015, doi: 10.1109/TSE.2014.2372785.

[5] Xiaofei Xie, Lei Ma, Felix Juefei-Xu, Minhui Xue, Hongxu Chen, Yang Liu, Jianjun Zhao, Bo Li, Jianxiong Yin, and Simon See. 2019. DeepHunter: a coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2019). Association for Computing Machinery, New York, NY, USA, 146–157. DOI:

[6] X. Xie, H. Chen, Y. Li, L. Ma, Y. Liu and J. Zhao, “Coverage-Guided Fuzzing for Feedforward Neural Networks,” 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019, pp. 1162-1165, doi: 10.1109/ASE.2019.00127.

[7] Chen, T.Y., Cheung, S.C., & Yiu, S. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv, abs/2002.12543.

[8] Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, Yang Liu, Jianjun Zhao, and Yadong Wang. 2018. DeepGauge: multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 2018). Association for Computing Machinery, New York, NY, USA, 120–131. DOI:
