
PeekingDuck V1.2 – A Major Update!

The PeekingDuck team is excited to share our latest v1.2 release! Our model zoo has been expanded with new object tracking, crowd counting, object detection, and pose estimation model nodes. We have also revamped our documentation, making information easier to find and adding tutorials to guide users. Useful nodes for tasks such as aggregating statistics or image augmentation have also been introduced. Lastly, to accentuate the value of using PeekingDuck, we have created a tagline for it: low-code, flexible, extensible.

PeekingDuck: Low-code, Flexible, Extensible

New Models

PeekingDuck currently offers different object detection and pose estimation model nodes. In this release, we are introducing two new categories of models to PeekingDuck – object tracking and crowd counting. 

What is object tracking? In object detection, the model is unable to tell that a person detected in a video frame is the same person in the next video frame. Object tracking solves this limitation by assigning unique IDs to each initial detection, and tracking these detections as they move around in subsequent video frames. This is demonstrated in the GIF below, where the IDs above each person are consistent over time – hence each object is being “tracked”. Our model.jde and model.fairmot nodes are able to track people, allowing the total number of unique passers-by to be counted over time. This helps reduce dependency on manual counting and can be applied to areas such as retail analytics, queue management, or occupancy monitoring.

People tracking

As mentioned earlier, low-code is one of PeekingDuck’s strengths – this is achieved by PeekingDuck’s modular system of nodes. Different use cases can be tackled simply by specifying the required nodes in a config file, within just a few lines! The scenario shown in the above GIF was the result of using the config file below, where input.visual was used to read from the source video, model.jde performed object tracking, dabble.statistics counted the total number of people, and the other nodes drew and output the results on the screen. More details about how to use PeekingDuck and its nodes can be found in the links at the end of this article.

Config file for counting people over time
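The config described above might look something like the following sketch. The node list follows the article's description (input.visual, model.jde, dabble.statistics, plus drawing and output nodes); the source filename and parameter values are illustrative, and exact option names may differ from the released node configs:

```yaml
# pipeline_config.yml: a sketch of the people-counting pipeline.
# Parameter names and values below are indicative only.
nodes:
  - input.visual:
      source: people_walking.mp4   # hypothetical source video
  - model.jde                      # object tracking: assigns an ID per person
  - dabble.statistics:
      maximum: obj_attrs["ids"]    # running count of unique track IDs
  - draw.bbox
  - draw.tag                       # draws the tracking ID above each person
  - draw.legend
  - output.screen
```

Swapping a node in this list (for example, model.fairmot in place of model.jde) changes the pipeline's behaviour without any code changes, which is what the "low-code" tagline refers to.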

We have also created a flexible dabble.tracking node that uses heuristics for tracking. It is not limited to tracking people – when paired with our object detection nodes, it can track one of 80 different types of objects including vehicles or animals. When applied to vehicles in the GIF below, it can count the total number of unique vehicles passing by in a period of time, aiding transportation planning by identifying periods of peak traffic.

Vehicle tracking
Vehicle tracking config file
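A vehicle-tracking pipeline composes the heuristics-based tracker with an object detection node, along the lines of the sketch below. The class names and parameter keys are illustrative assumptions, not the exact released config:

```yaml
# Sketch: pairing an object detector with the heuristics-based tracker.
# Class names and option keys are indicative only.
nodes:
  - input.visual:
      source: traffic.mp4                  # hypothetical source video
  - model.yolo:
      detect: ["car", "truck", "bus"]      # restrict detection to vehicles
  - dabble.tracking                        # assigns IDs to the detections
  - dabble.statistics:
      maximum: obj_attrs["ids"]            # total unique vehicles seen
  - draw.bbox
  - draw.tag
  - draw.legend
  - output.screen
```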

Crowd counting is the other new category of models in this release, specifically targeted at counting large numbers of people, in the hundreds or thousands. Object detection models perform poorly in such scenarios due to occlusion, where people block one another. Our model.csrnet node uses a different, density-based approach that circumvents this problem, giving a much better estimate of crowd size.

Crowd counting
Crowd counting config file

Lastly, we have bolstered the ranks of our object detection and pose estimation models with improved, more recent models. The anchor-free model.yolox object detector has almost twice the accuracy of our existing model.yolo when comparing the “tiny” model variants, with little compromise on FPS. As for pose estimation, our model.movenet offers significant performance improvements over the existing model.posenet, and makes more accurate predictions for exercise use cases such as yoga.

Improved Documentation

Over the last few months, we have been conducting user interviews with a number of PeekingDuck users (thank you for your support!). Using the feedback gathered, we have revamped our Read the Docs page: information is now much easier to find, users are given better guidance, and the website has been improved aesthetically. Tutorials are now available, from the basics in “Hello Computer Vision”, to intermediate recipes in “Duck Confit”, to the advanced “Peaking Duck”, with examples such as interfacing with an SQLite database or aggregating statistics for analytics.

Screenshot of Revamped Documentation

A glossary page has been added for easy reference to PeekingDuck built-in data types, as well as an FAQ and troubleshooting section. Additionally, detailed installation instructions are now provided for PeekingDuck on Apple Silicon (M1) Macs running macOS Big Sur and Monterey.

Additional Notable Features

Here is a high-level list of other notable features:

  • Support for Python 3.9 has been added
  • A sixth category of node (augment) has been created for image augmentation. New nodes augment.brightness and augment.contrast are included in this release, and more will be added in the future
  • A new input.visual node replaces both input.live and input.recorded, as a single node can read from an image, video file, CCTV feed, or webcam live feed
  • A new dabble.statistics node can calculate the cumulative average, maximum, and minimum of a single target variable of interest over time, to be used for analytics
  • The existing draw.legend and draw.tag nodes have been refactored to allow greater flexibility in drawing different data types

Find Out More

To start using PeekingDuck and find out more about our updated features, check out our documentation.

You are also welcome to join discussions and reach out to our team on our Community page.

Robustness testing pipeline for NLP with Microsoft CheckList

In order to streamline robustness testing into the AI engineering process, the SecureAI team has made it a priority to integrate robustness testing tools into AI Singapore’s MLOps pipelines. In this article, we will share our experience integrating Microsoft CheckList as a black-box robustness testing tool into our CI/CD processes, along with the various technologies shown in Figure 1. We will use an example from the SG-NLP project developed by AI Singapore’s NLP Hub for demonstration. Unfortunately, the example is currently unavailable on the SG-NLP site (as of writing) and will be added to the suite of models in a future release.


Figure 1. Tool Stack

What is CheckList?

CheckList is an evaluation methodology and tool developed by Microsoft, for comprehensive behavioural testing of NLP models. NLP models are typically very large and complex, with the same backbone being adapted for a diverse range of downstream tasks. A single statistic may not be able to provide useful insights to understand and improve the model. 

CheckList guides users in designing tests targeted toward specific language capabilities, which better reflects the complexities of language tasks. A suite of capability tests gives the user a more comprehensive understanding of model performance compared to a single statistic.

CheckList introduces different test types, which assess relative changes in the model’s predictions in response to changes in input, rather than simply comparing the predictions to the ground truth as done in standard functional tests.

The test types included in CheckList are:

  • Minimum Functionality Test (MFT): Similar to unit tests in software engineering, composed of simple examples that verify a specific behaviour.
  • Invariance (INV): Applies label-preserving perturbations to inputs, and expects the model prediction to remain the same.
  • Directional Expectation (DIR): Applies perturbations to inputs, and expects the model to behave in a specified way.

The tests are black-box in nature, as they only require the model’s output in response to certain inputs. This allows them to be applied to any model, regardless of implementation, unlike white-box testing techniques, which require knowledge about a model’s implementation and may only be applicable to certain types of models.
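The invariance test type above can be illustrated with a minimal pure-Python sketch. This is not the CheckList API itself (the library provides richer versions in checklist.test_types and checklist.perturb); the toy model, perturbation, and failure-rate calculation below are illustrative assumptions:

```python
# Minimal illustration of an invariance (INV) test: apply a
# label-preserving perturbation and flag any prediction that changes.

def add_typo(text):
    """Introduce a simple typo by swapping the last two characters."""
    if len(text) < 2:
        return text
    return text[:-2] + text[-1] + text[-2]

def inv_test(model, inputs, perturb):
    """Run the model on original and perturbed inputs; a changed
    prediction under a label-preserving perturbation is a failure.
    Returns the failure rate over the inputs."""
    failures = [x for x in inputs if model(x) != model(perturb(x))]
    return len(failures) / len(inputs)

# Toy "model": classifies a sentence as a question if it ends with '?'.
model = lambda s: s.rstrip().endswith("?")
rate = inv_test(model, ["Is he my brother?", "He is my brother."], add_typo)
# The typo moves the '?' on the first sentence, so the prediction flips:
# one failure out of two inputs.
```

Because the test only needs the model's outputs, any prediction function can be dropped in for `model`, which is what makes the approach black-box.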

Designing tests with the CheckList methodology

The use case we have selected is the ‘Learning to Identify Follow-Up Questions’ (LIF) task. As illustrated in Figure 2, given a passage as context, a series of question-answer pairs as conversation history, and a candidate question, the model identifies whether the candidate is a valid follow-up question.


Figure 2. LIF task

As part of the CheckList process, we sat down with the team from NLP Hub to ideate and design suitable tests for the model. The predefined list of capabilities provided by CheckList served as useful prompts during this process.

The following are some examples of the tests that we designed for the demonstration:




  • Robustness (typo): Introducing typos into the text, e.g. brother → brohter
  • Robustness (contractions): Expanding and contracting contractions, e.g. They’re → They are; They are → They’re
  • Taxonomy (synonyms): Replacing words with their synonyms, e.g. see → envision

When a test is run, the data is perturbed to generate test cases, which are passed to the model for predictions. When the model does not behave as expected on a test case, it is considered a failed case. The failure rates on each test may indicate potential areas of improvement for the model.

Implementing tests in CheckList

Most of the basic perturbations are readily available as functions in CheckList; however, they are designed to be applied directly to strings. Due to the additional complexity of the model input (a single input is represented as a JSON object with predefined keys for the various sub-components, as shown in Figure 2), we had to implement adapters for the built-in perturbation functions in order to use them on our data.
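Such an adapter can be sketched as a small higher-order function that lifts a string-level perturbation to selected fields of a dict-shaped example. The key names below ("context", "candidate") are illustrative, not the actual LIF input schema:

```python
# Sketch of an adapter that applies a string-level perturbation (such as
# the CheckList built-ins) to chosen fields of a JSON-style example.

def adapt(perturb_fn, keys):
    """Return a function that applies perturb_fn to the given fields
    of a dict example, leaving all other fields untouched."""
    def wrapped(example):
        out = dict(example)
        for key in keys:
            out[key] = perturb_fn(out[key])
        return out
    return wrapped

# Usage with a trivial stand-in perturbation (upper-casing):
perturb_candidate = adapt(str.upper, ["candidate"])
example = {"context": "A passage.", "candidate": "is this a follow-up?"}
perturbed = perturb_candidate(example)
# Only the "candidate" field is perturbed; "context" is unchanged.
```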

To carry out tests with the perturbed data, CheckList requires a function that can return predictions and confidences from the model for a given set of data. As the model being tested was deployed on the cloud, this was easily accomplished by implementing a function that interacts with the model via its REST API.
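A prediction function of this kind might look like the sketch below, using only the standard library. The request payload shape and response field names ("instances", "predictions", "label", "score") are assumptions for illustration, not the actual SG-NLP API:

```python
import json
import urllib.request

def query_model(api_url, samples):
    """POST a batch of samples to the deployed model's REST endpoint
    and return (predictions, confidences) as CheckList expects."""
    payload = json.dumps({"instances": samples}).encode("utf-8")
    request = urllib.request.Request(
        api_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return parse_response(body)

def parse_response(body):
    """Split the JSON response into parallel lists of labels and scores."""
    predictions = [r["label"] for r in body["predictions"]]
    confidences = [r["score"] for r in body["predictions"]]
    return predictions, confidences
```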

Data versioning and DVC pipelines

We explored DVC and used it as part of our workflow with two objectives in mind. The first is for data versioning. The second is to make use of DVC pipelines, which reduces unnecessary runtime with intermediate artifact caching. The robustness testing pipeline can be organized into stages, with some stages depending on artifacts from the previous ones, as shown in Figure 3. When the pipeline is executed with DVC, it will only run the stages where the dependencies (data or code) have changed, and reuse intermediate artifacts from the previous run. This allows us to update individual tests or add new ones without the need to re-run the entire pipeline every time.
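A staged pipeline of this shape is declared in a dvc.yaml file along the lines of the sketch below. The stage names, scripts, and paths are illustrative, loosely mirroring Figure 3, not our actual pipeline definition:

```yaml
# dvc.yaml (sketch): stage names and paths are illustrative.
stages:
  generate_cases:
    cmd: python generate_cases.py
    deps: [generate_cases.py, data/raw]
    outs: [artifacts/test_cases]
  run_tests:
    cmd: python run_tests.py
    deps: [run_tests.py, artifacts/test_cases]
    outs: [artifacts/results]
  report:
    cmd: python report.py
    deps: [report.py, artifacts/results]
    outs: [artifacts/report.md]
```

Running `dvc repro` re-executes only the stages whose `deps` have changed, so editing `report.py` regenerates the report without re-running the tests.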


Figure 3. Stages and artifacts of robustness pipeline in DVC

Integration of robustness testing in Git workflow

We implemented robustness testing as a job in our GitLab CI pipeline, as shown in Figure 4. The CheckList tests are run as a DVC pipeline, configured with a remote storage. CML is used to post the results as a comment to the commit associated with the pipeline.
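As a rough sketch, the CI job combines `dvc repro` with a CML comment step, along these lines. The job name, image, and paths are illustrative, and the CML command name varies between versions:

```yaml
# .gitlab-ci.yml (fragment, illustrative)
robustness_test:
  image: dvcorg/cml-py3        # image with DVC and CML preinstalled
  script:
    - dvc pull                 # fetch cached artifacts from remote storage
    - dvc repro                # run only the stages whose deps changed
    - cat artifacts/report.md >> report.md
    - cml-send-comment report.md   # post results as a comment on the commit
```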


Figure 4. Integration of robustness testing in CI pipeline

An example of the comment in a GitLab merge request is shown in Figure 5. This allows the user to easily view the results within their development platform, and decide if the model performance is sufficient for deployment or use the insights from the evaluation to improve the model in an informed manner.


Figure 5. Report posted as GitLab comment in a merge request

Interactive result analysis with Voila

CheckList comes with an interactive Jupyter widget to facilitate analysis of the perturbed text and test results. In order to integrate it into the CI/CD pipeline, we turned it into a standalone application with Voila, as shown in Figure 6. We configured the CI/CD pipeline to deploy a containerized Voila application on a Kubernetes cluster, using the results of each run. The user can access the container for each run at a unique URL, and look through the detailed results of each test and examples of the test cases, as shown in Figure 7. The user may analyze these and use the insights to make informed decisions on using or improving the model.

Figure 6. Visual summary Jupyter widget deployed as Voila app
Figure 7. Test details and an example of a failed test case


All in all, CheckList provides us with a systematic process for designing a comprehensive suite of tests for NLP models. The black-box nature of the tests and flexibility of the tooling allows it to be applied across a diverse range of tasks, possibly even beyond the realm of NLP applications.

By combining CheckList with various technologies, we have demonstrated how robustness testing can be integrated into the ML development process in a reproducible and convenient manner.

Moving forward, the SecureAI team aims to continue the progress in this area and contribute to the development of more secure and trustworthy AI systems. Stay tuned!




New storybook by AI Singapore to help lower primary school children learn about AI

AI Singapore (AISG) today announced the launch of a storybook, Daisy and her AI Friends, which introduces basic artificial intelligence (AI) concepts like computer vision and machine learning to children aged seven to eight in a fun and easy-to-understand way.

Launched by Minister of State for Education and Manpower, Ms Gan Siow Huang, the storybook marks another milestone of the AI for Kids (AI4K)® programme, which previously groomed teachers and parent volunteers to conduct AI enrichment classes for upper primary school children.

Koo Sengmeng, Senior Deputy Director of AI Innovation at AI Singapore said, “AI is one of the most transformative technologies for the past decade and Singapore recognises it as an important enabler of our Smart Nation vision. It is never too early to introduce and demystify AI to our younger generation. This storybook expands AISG’s generational AI capabilities development efforts to groom national AI literacy for all, and we hope the book will kickstart the learning journey for the child and parents alike.” 


Creating a book that makes AI relatable

The 40-page storybook follows the story of Daisy as she finds her way around the school campus on her first day of class. Along the way, she enlists the help of new friends who embody basic AI concepts such as computer vision, natural language processing and machine learning. Daisy eventually locates her classroom, thanks to the abilities of her friends.


Work on the book started in April 2021 with a storybook character design competition for primary school children. It received over 230 submissions and 10 finalists’ designs were chosen. These designs were then brought to life in the storybook by a local student from the Nanyang Academy of Fine Arts to accompany the storyline developed by AISG.


Making AI literacy accessible to everyone

Members of the public can borrow Daisy and her AI Friends in hard copy at the public libraries or access the digital copy via National Library Board and AISG’s LearnAI website.

To commemorate the launch, Meta has sponsored the first print run of 10,000 copies, which will be distributed to low-income families by AISG to ensure equitable opportunities for everyone in Singapore to learn about AI. These families are beneficiaries of the Ministry of Social and Family Development’s Community Link programme, as well as TOUCH Community Services and Singapore Hokkien Huay Kuan, amongst others. AISG looks forward to working with more like-minded grassroots and community partners to widen the distribution of the storybook to more beneficiaries.

Plans are also underway for a roadshow featuring the storybook’s concept art and storyboard at selected public libraries from 1 April 2022. More details will be available via AISG’s social media channels at a later date.

More information on the (AI4K)® programme can be found here.
