The Platforms Engineering Group supports AI Innovation and other Pillars in AI Singapore and empowers our users to solve challenging problems through infrastructure, platforms and engineering.

The team comprises InfraOps, DataOps, MLOps and SecureAI teams that build internal software platforms to enable and empower our users to create meaningful and robust solutions.

AI model training and inference consume a lot of electricity. With the push towards more sustainable ways to train and operate AI models, the team is also exploring more efficient methodologies for hosting them.

“Innovate, build and operate internal software platforms based on modern infra, tooling and best practices for our users and empower them to solve challenging problems.”

The team also provides training and mentorship for the AIAP apprentices and works closely with the 100E and Bricks teams to help AI engineers scale their AI models with best practices in MLOps, CI/CD pipelines and AI robustness.

Our teams include folks with extensive experience in High-Performance Computing, Big Data/Internet of Things, Infrastructure, Ops, Software Engineering, Data Engineering and Machine Learning.

Platforms Engineering Teams


1. InfraOps

The InfraOps team manages, operates and secures the on-prem and cloud infrastructure and internal software platforms that enable AI Singapore engineering teams.

2. DataOps

The DataOps team looks after the data infrastructure and data processing pipelines. They help onboard, secure and decommission critical datasets from our collaborators and project sponsors throughout the lifecycle of the projects.
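
As a flavour of the lifecycle controls involved, below is a minimal, hypothetical sketch of dataset onboarding in Python: it records a SHA-256 checksum in a simple registry so a dataset's integrity can be verified throughout the project and before decommissioning. The file names and registry format are illustrative assumptions, not AI Singapore's actual DataOps tooling.

    import hashlib
    import json
    import pathlib


    def sha256_of(path: pathlib.Path, chunk_size: int = 1 << 20) -> str:
        """Stream the file in chunks and return its SHA-256 hex digest."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def register_dataset(path: str, registry: str = "dataset_registry.json") -> None:
        """Append the dataset's name and checksum to a JSON registry (hypothetical format)."""
        p = pathlib.Path(path)
        entry = {"name": p.name, "sha256": sha256_of(p)}
        reg_path = pathlib.Path(registry)
        records = json.loads(reg_path.read_text()) if reg_path.exists() else []
        records.append(entry)
        reg_path.write_text(json.dumps(records, indent=2))

Re-computing the checksum later in the project confirms the dataset has not been altered since it was onboarded.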


3. MLOps

The MLOps team works closely with the 100E engineering teams to ensure that good software practices and tooling are adopted and to scale the ML training and deployment pipelines.
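
To make this concrete, here is a minimal sketch of the kind of CI smoke test a training pipeline might run on every commit. It assumes scikit-learn and a hypothetical accuracy gate of 0.9; it is illustrative only, not the actual 100E or MLOps pipeline.

    """Minimal CI smoke test for a model training step (illustrative only)."""
    import json

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split


    def train_and_evaluate(seed: int = 42) -> float:
        """Train a small model and return its held-out accuracy."""
        X, y = load_iris(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=seed, stratify=y
        )
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)
        return accuracy_score(y_test, model.predict(X_test))


    def test_training_meets_accuracy_gate():
        """CI gate: fail the pipeline if accuracy regresses below a threshold."""
        accuracy = train_and_evaluate()
        # The 0.9 threshold is a hypothetical quality gate for this toy example.
        assert accuracy >= 0.9, f"Accuracy regressed: {accuracy:.3f}"


    if __name__ == "__main__":
        print(json.dumps({"accuracy": float(train_and_evaluate())}))

Run under pytest in a CI/CD pipeline, a test like this fails the build when a change degrades the model below the agreed threshold.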

4. SecureAI

Our unique SecureAI team is dedicated to developing processes and tooling to support the building of secure and trustworthy AI solutions. They help ensure that robustness testing and coverage are addressed in the ML model training process for our AI engineering teams.
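
As an illustration of what robustness testing can look like, the sketch below measures how often a classifier's predictions change under small input perturbations. It assumes scikit-learn and NumPy; the noise model and stability score are hypothetical simplifications, not SecureAI's actual methodology.

    """Illustrative robustness check: prediction stability under small perturbations."""
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression


    def perturbation_stability(model, X, epsilon=0.1, trials=20, seed=0):
        """Fraction of predictions unchanged under uniform noise of magnitude `epsilon`
        (a simple proxy for robustness coverage)."""
        rng = np.random.default_rng(seed)
        baseline = model.predict(X)
        stable = 0.0
        for _ in range(trials):
            noise = rng.uniform(-epsilon, epsilon, size=X.shape)
            stable += np.mean(model.predict(X + noise) == baseline)
        return stable / trials


    if __name__ == "__main__":
        X, y = load_iris(return_X_y=True)
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        print(f"stability@eps=0.1: {perturbation_stability(clf, X):.3f}")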


Our Infrastructure


On-Premise and Cloud Hybrid High-Performance Clusters for AI/ML workloads

  • CPU: over 7,000 x86 and POWER CPUs serving infra, data and ML workloads.
  • Storage: 0.5 PB of storage providing object store, NFS/PFS and file I/O services.
  • Cloud: we leverage Google Cloud and Azure for the latest in cloud technology and infrastructure, including AI accelerators (A100 GPUs, Cloud TPUs, other xPUs and FPGAs).
  • GPU: 32 NVIDIA V100 GPUs and 6 FPGAs for accelerated ML training and inference workloads.
  • Network: 10G Ethernet and 100G InfiniBand networks providing infra and cluster networking.

NUS-NSCC Innovation 4.0 Data Centre

AI Singapore collaborates with the National Supercomputing Centre (NSCC) and hosts Singapore’s largest AI supercomputer (as of 2022). We believe large-scale AI models require the proven techniques from the High-Performance Computing (HPC) world, such as parallel computing, robust scheduling for workloads and checkpointing mechanisms for long-running AI runs.
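
For example, a long-running training job can periodically persist its state so that a pre-empted or failed run resumes from the last saved step rather than restarting from scratch. The sketch below is a minimal, framework-agnostic illustration in Python; the checkpoint path, interval and stand-in training loop are assumptions, not AISG's actual scheduler integration.

    """Minimal checkpoint/resume sketch for a long-running job (illustrative only)."""
    import os
    import pickle

    CHECKPOINT_PATH = "checkpoint.pkl"  # hypothetical path


    def load_state():
        """Resume from the last checkpoint if one exists, otherwise start fresh."""
        if os.path.exists(CHECKPOINT_PATH):
            with open(CHECKPOINT_PATH, "rb") as f:
                return pickle.load(f)
        return {"step": 0, "running_loss": 0.0}


    def save_state(state):
        """Write the checkpoint to a temp file, then atomically replace the old one."""
        tmp = CHECKPOINT_PATH + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, CHECKPOINT_PATH)


    def run(total_steps=1_000):
        state = load_state()
        for step in range(state["step"], total_steps):
            state["running_loss"] += 1.0 / (step + 1)  # stand-in for real training work
            state["step"] = step + 1
            if state["step"] % 100 == 0:
                save_state(state)  # checkpoint every 100 steps
        save_state(state)


    if __name__ == "__main__":
        run()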

Singapore’s 1st Tropical Supercomputing Data Centre
  • 300 sqm double-height, CRAC-less (aircon-less) compute area with warm-water cooling and dry coolers for heat dissipation.
  • Power-saving and environmentally sustainable, with an estimated PUE of less than 1.18.
  • Awarded the Singapore Building and Construction Authority (BCA) Green Mark Platinum Award (2021).
  • Leverages Singapore-designed cooling racks via Cool Hall Rear-Door Heat Exchange with the patented KoolLogix Thermosiphon gas system.
  • Utilizes a digital twin design with IoT sensors, intelligent monitoring and AI-assisted operations.
Divided into two areas (Hall A and Hall B):
  • Hall A hosts the NSCC supercomputer (HPE Cray EX).
  • Hall B hosts conventional servers, storage and equipment; AISG hosts its on-premise cluster within this hall.

Enabling AI at Scale

Keen to learn how to build real-world AI models and deploy them at scale? Contact us to find out how we can help you through our AI Advisory services or the 100E.