
AI Summer School

The first AI Summer School will bring together leading local and international AI experts to educate and foster collaboration among the next generation of researchers. We believe that the time is right for a Summer School because there is a critical mass of new technical materials available and high interest within the Data Science and AI communities, including industry researchers.

In addition, the AI Summer School will facilitate exchange and sharing between local and international students as well as researchers in view of upcoming collaboration and funding opportunities.

The AI Summer School will help students, academic researchers and industry practitioners appreciate the exciting possibilities of applying AI in real-world domains to improve our quality of life, and raise awareness of the challenges and issues surrounding data innovation.

Click here for details.

Converting a Simple Deep Learning Model from PyTorch to TensorFlow



TensorFlow and PyTorch are two of the more popular frameworks out there for deep learning. Some prefer TensorFlow for its deployment support, while others prefer PyTorch for the flexibility it offers in building and training models, without the difficulties often faced in using TensorFlow. The downside of using PyTorch has been that a model built and trained with it could not be deployed into production. (Update in Dec 2019: it is claimed that later versions of PyTorch have better support for deployment, but I believe that is something else to be explored.) To address the issue of deploying models built using PyTorch, one solution is to use ONNX (Open Neural Network Exchange).

As explained in ONNX’s About page, ONNX is like a bridge that links the various deep learning frameworks together. To this end, the ONNX tool enables conversion of models from one framework to another. Up to the time of this writing, ONNX is limited to simpler model structures, but there may be further additions later on. This article will illustrate how a simple deep learning model can be converted from PyTorch to TensorFlow.

Installing the necessary packages

To start off, we need to install PyTorch, TensorFlow, ONNX, and ONNX-TF (the package for converting ONNX models to TensorFlow). If you are using virtualenv on Linux, you could run the commands below (replace tensorflow with tensorflow-gpu if you have NVIDIA CUDA installed). Do note that as of Dec 2019, ONNX does not work with TensorFlow 2.0 yet, so please take note of the version of TensorFlow that you install.

source <your virtual environment>/bin/activate
pip install tensorflow==1.15.0

# For PyTorch, choose one of the following (refer to for further details)
pip install torch torchvision # if using CUDA 10.1
pip install torch==1.3.1+cu92 torchvision==0.4.2+cu92 -f # if using CUDA 9.2
pip install torch==1.3.1+cpu torchvision==0.4.2+cpu -f # if using CPU only

pip install onnx

# For onnx-tensorflow, you may want to refer to the installation guide here:
git clone
cd onnx-tensorflow
pip install -e .

If using Conda, you may want to run the following commands instead:

conda activate <your virtual environment>
conda install -c pytorch pytorch

pip install tensorflow==1.15.0

pip install onnx

# For onnx-tensorflow, you may want to refer to the installation guide here:
git clone
cd onnx-tensorflow
pip install -e .

I find that installing TensorFlow, ONNX, and ONNX-TF using pip ensures that the packages are compatible with one another. It is fine, however, to install the packages in other ways, as long as they work properly on your machine.

To test that the packages have been installed correctly, you can run the following commands:

import tensorflow as tf
import torch
import onnx
from onnx_tf.backend import prepare

If you do not see any error messages, it means that the packages are installed correctly, and we are good to go.

In this example, I used Jupyter Notebook, but the conversion can also be done in a .py file. To install Jupyter Notebook, you can run one of the following commands:

# Installing Jupyter Notebook via pip
pip install notebook

# Installing Jupyter Notebook via Conda
conda install notebook

Building, training, and evaluating the example model

The next thing to do is to obtain a PyTorch model to use for the conversion. In this example, I generated some simulated data and used it for training and evaluating a simple Multilayer Perceptron (MLP) model. The following snippet shows how the installed packages are imported, and how I generated and prepared the data.

import numpy as np

import os
import time
import torch
import torch.nn as nn
import torch.optim as optim
from import Dataset, DataLoader
import onnx
from onnx_tf.backend import prepare
import tensorflow as tf

# Generate simulated data
train_size = 8000
test_size = 2000

input_size = 20
hidden_sizes = [50, 50]
output_size = 1
num_classes = 2

X_train = np.random.randn(train_size, input_size).astype(np.float32)
X_test = np.random.randn(test_size, input_size).astype(np.float32)
y_train = np.random.randint(num_classes, size=train_size)
y_test = np.random.randint(num_classes, size=test_size)
print('Shape of X_train:', X_train.shape)
print('Shape of X_test:', X_test.shape)
print('Shape of y_train:', y_train.shape)
print('Shape of y_test:', y_test.shape)

# Define Dataset subclass to facilitate batch training
class SimpleDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y
    def __len__(self):
        return len(self.X)
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Create DataLoaders for training and test set, for batch training and evaluation
train_loader = DataLoader(dataset=SimpleDataset(X_train, y_train), batch_size=8, shuffle=True)
test_loader = DataLoader(dataset=SimpleDataset(X_test, y_test), batch_size=8, shuffle=False)

I then created a class for the simple MLP model and defined the layers such that we can specify any number and size of hidden layers. I also defined a binary cross entropy loss and Adam optimizer to be used for the computation of loss and weight updates during training. The following snippet shows this process.

# Build model
class SimpleModel(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size):
        super(SimpleModel, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.fcs = []  # List of fully connected layers
        in_size = input_size
        for i, next_size in enumerate(hidden_sizes):
            fc = nn.Linear(in_features=in_size, out_features=next_size)
            in_size = next_size
            self.__setattr__('fc{}'.format(i), fc)  # register each fully connected layer as an attribute
            self.fcs.append(fc)  # keep the layer in the list so forward() can iterate over it
        self.last_fc = nn.Linear(in_features=in_size, out_features=output_size)
    def forward(self, x):
        for i, fc in enumerate(self.fcs):
            x = fc(x)
            x = nn.ReLU()(x)
        out = self.last_fc(x)
        return nn.Sigmoid()(out)
# Set device to be used
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Device used:', device)
model_pytorch = SimpleModel(input_size=input_size, hidden_sizes=hidden_sizes, output_size=output_size)
model_pytorch =

# Set loss and optimizer
# Set binary cross entropy loss since 2 classes only
criterion = nn.BCELoss()
optimizer = optim.Adam(model_pytorch.parameters(), lr=1e-3)

After building the model and defining the loss and optimizer, I trained the model for 20 epochs using the generated training set, then used the test set for evaluation. The test loss and accuracy of the model were not good, but that does not really matter here, as the main purpose is to show how to convert a PyTorch model to TensorFlow. The snippet below shows the training and evaluation process.

num_epochs = 20

# Train model
time_start = time.time()

for epoch in range(num_epochs):
    train_loss_total = 0
    for data, target in train_loader:
        data, target =, target.float().to(device)
        optimizer.zero_grad()
        output = model_pytorch(data)
        train_loss = criterion(output, target)
        train_loss.backward()
        optimizer.step()
        train_loss_total += train_loss.item() * data.size(0)
    print('Epoch {} completed. Train loss is {:.3f}'.format(epoch + 1, train_loss_total / train_size))
print('Time taken to complete {} epochs: {:.2f} minutes'.format(num_epochs, (time.time() - time_start) / 60))

# Evaluate model

test_loss_total = 0
total_num_corrects = 0
threshold = 0.5
time_start = time.time()

model_pytorch.eval()
with torch.no_grad():
    for data, target in test_loader:
        data, target =, target.float().to(device)
        output = model_pytorch(data)
        test_loss = criterion(output, target)
        test_loss_total += test_loss.item() * data.size(0)
        pred = (output >= threshold).view_as(target)  # to make pred have same shape as target
        num_correct = torch.sum(pred == target.byte()).item()
        total_num_corrects += num_correct

print('Evaluation completed. Test loss is {:.3f}'.format(test_loss_total / test_size))
print('Test accuracy is {:.3f}'.format(total_num_corrects / test_size))
print('Time taken to complete evaluation: {:.2f} minutes'.format((time.time() - time_start) / 60))

After training and evaluating the model, we would need to save the model, as below:

if not os.path.exists('./models/'):
    os.mkdir('./models/'), './models/model_simple.pt')

Converting the model to TensorFlow

Now, we need to convert the .pt file to a .onnx file using the torch.onnx.export function. There are two things we need to take note of here: 1) we need to pass a dummy input through the PyTorch model first before exporting, and 2) the dummy input needs to have the shape (1, dimension(s) of a single input). For example, if a single input is an image array with the shape (number of channels, height, width), then the dummy input needs to have the shape (1, number of channels, height, width). The dummy input is needed as an input placeholder for the resulting TensorFlow model. The following snippet shows the process of exporting the PyTorch model in the ONNX format. I included the input and output names as arguments as well, to make inference in TensorFlow easier.
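As a quick illustration of the shape rule for the dummy input, here is a NumPy sketch with illustrative dimensions (a 3-channel 32x32 image):

```python
import numpy as np

# Illustrative only: a single input with shape (channels, height, width)
single = np.zeros((3, 32, 32), dtype=np.float32)

# Prepend a batch dimension of 1 to get the dummy-input shape (1, 3, 32, 32)
dummy = single[None, ...]   # equivalent to single.reshape(1, 3, 32, 32)
print(dummy.shape)
```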

model_pytorch = SimpleModel(input_size=input_size, hidden_sizes=hidden_sizes, output_size=output_size)
model_pytorch.load_state_dict(torch.load('./models/model_simple.pt'))

# Single pass of dummy variable required
dummy_input = torch.from_numpy(X_test[0].reshape(1, -1)).float().to(device)
dummy_output = model_pytorch(dummy_input)

# Export to ONNX format
torch.onnx.export(model_pytorch, dummy_input, './models/model_simple.onnx', input_names=['input'], output_names=['output'])

After getting the .onnx file, we would need to use the prepare() function in ONNX-TF’s backend module to convert the model from ONNX to TensorFlow.

# Load ONNX model and convert to TensorFlow format
model_onnx = onnx.load('./models/model_simple.onnx')

tf_rep = prepare(model_onnx)

# Print out tensors and placeholders in model (helpful during inference in TensorFlow)

# Export model as .pb file
tf_rep.export_graph('./models/model_simple.pb')

If you have specified the input and output names in the torch.onnx.export function, you should see the keys ‘input’ and ‘output’ along with their corresponding values, as shown in the snippet below. The names ‘input:0’ and ‘Sigmoid:0’ will be used during inference in TensorFlow.

{'fc0.bias': <tf.Tensor 'Const:0' shape=(50,) dtype=float32>, 'fc0.weight': <tf.Tensor 'Const_1:0' shape=(50, 20) dtype=float32>, 'fc1.bias': <tf.Tensor 'Const_2:0' shape=(50,) dtype=float32>, 'fc1.weight': <tf.Tensor 'Const_3:0' shape=(50, 50) dtype=float32>, 'last_fc.bias': <tf.Tensor 'Const_4:0' shape=(1,) dtype=float32>, 'last_fc.weight': <tf.Tensor 'Const_5:0' shape=(1, 50) dtype=float32>, 'input': <tf.Tensor 'input:0' shape=(1, 20) dtype=float32>, '7': <tf.Tensor 'add:0' shape=(1, 50) dtype=float32>, '8': <tf.Tensor 'Relu:0' shape=(1, 50) dtype=float32>, '9': <tf.Tensor 'add_1:0' shape=(1, 50) dtype=float32>, '10': <tf.Tensor 'Relu_1:0' shape=(1, 50) dtype=float32>, '11': <tf.Tensor 'add_2:0' shape=(1, 1) dtype=float32>, 'output': <tf.Tensor 'Sigmoid:0' shape=(1, 1) dtype=float32>}

Doing inference in TensorFlow

Here comes the fun part, which is to see if the resultant TensorFlow model can do inference as intended. Loading a TensorFlow model from a .pb file can be done by defining the following function.

def load_pb(path_to_pb):
    with tf.gfile.GFile(path_to_pb, 'rb') as f:
        graph_def = tf.GraphDef()
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph

With the function to load the model defined, we need to start a TensorFlow graph session, specify the placeholders for the input and output, and feed an input into the session.

tf_graph = load_pb('./models/model_simple.pb')
sess = tf.Session(graph=tf_graph)

output_tensor = tf_graph.get_tensor_by_name('Sigmoid:0')
input_tensor = tf_graph.get_tensor_by_name('input:0')

output =, feed_dict={input_tensor: dummy_input})

If all goes well, the result of print(output) should match that of print(dummy_output) in the earlier step.
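To make that comparison concrete, one way is to compare the two arrays with a tolerance, since floating-point results from different frameworks rarely match bit-for-bit. In this sketch the array values are placeholders standing in for dummy_output.detach().numpy() (PyTorch) and the array returned by (TensorFlow):

```python
import numpy as np

# Placeholder values standing in for the PyTorch and TensorFlow outputs
pytorch_out = np.array([[0.5172]], dtype=np.float32)
tensorflow_out = np.array([[0.5172]], dtype=np.float32)

# Compare with a tolerance rather than exact equality
match = np.allclose(pytorch_out, tensorflow_out, rtol=1e-4, atol=1e-6)
print('Outputs match:', match)
```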


Converting models through ONNX can be pretty straightforward, provided that your model is not too complicated. The steps in this example will work for deep learning models with a single input and output. For models with multiple inputs and/or outputs, conversion via ONNX would be more challenging, so an example for such models will have to wait for another article, unless newer versions of ONNX can handle them.

The Jupyter notebook containing all the codes can be found here.

This article was originally published here in Towards Data Science.

Text-based Graph Convolutional Network — Bible Book Classification

A semi-supervised graph-based approach for text classification and inference

The most beautiful graph you have ever seen, courtesy of

In this article, I will walk you through the details of the text-based Graph Convolutional Network (GCN) and its implementation using PyTorch and standard libraries. The text-based GCN model is an interesting and novel state-of-the-art semi-supervised learning approach proposed recently (expanding on the earlier GCN idea by Kipf et al. on non-textual data), which is able to infer the labels of unknown textual data very accurately, given related labeled textual data.

At the highest level, it does so by embedding the entire corpus into a single graph, with documents (some labelled, some unlabelled) and words as nodes, and with each document-word and word-word edge carrying some predetermined weight based on their relationship with each other (e.g. TF-IDF). A GCN is then trained on this graph using the document nodes that have known labels, and the trained GCN model is then used to infer the labels of the unlabelled documents.

We implement text-based GCN here using the Holy Bible as the corpus, chosen because it is one of the most read books in the world and contains a rich structure of text. The Holy Bible (Protestant) consists of 66 Books (Genesis, Exodus, etc.) and 1189 Chapters. The semi-supervised task here is to train a language model that is able to correctly classify the Book that some unlabelled Chapters belong to, given the known labels of other Chapters. (Since we actually do know the exact labels of all Chapters, we will intentionally mask the labels of some 10-20 % of the Chapters, which will be used as the test set during model inference to measure the model accuracy.)

Structure of the Holy Bible (Protestant)

To solve this task, the language model needs to be able to distinguish between the contexts associated with the various Books (e.g. the Book of Genesis talks more about Adam & Eve, while the Book of Ecclesiastes talks about the life of King Solomon). The good results obtained by the text-GCN model, as we shall see below, show that the graph structure is able to capture such context relatively well, where the document (Chapter)-word edges encode the context within Chapters, while the word-word edges encode the relative context between Chapters.

The Bible text used here (BBE version) is obtained courtesy of

Implementation follows the paper Graph Convolutional Networks for Text Classification by Yao et al.

The source codes for the implementation can be found in my GitHub repository.

Representing the Corpus

Corpus represented as a graph. Red lines represent document-word edges weighted by TF-IDF, black lines represent word-word edges weighted by PMI.

Following the paper, in order to allow GCN to capture the Chapter contexts, we build a graph with nodes and edges that represent the relationships between Chapters and words. The nodes will consist of all 1189 Chapters (documents) plus the whole vocabulary (words), with weighted document-word and word-word edges between them. Their weights A_ij are given by:

A_ij = PMI(i, j)     if i, j are words and PMI(i, j) > 0
       TF-IDF_ij     if i is a document and j is a word
       1             if i = j
       0             otherwise

where PMI is the Point-wise Mutual Information between pairs of co-occurring words over a sliding window #W that we fix to be 10 words long. #W(i) is the number of sliding windows in the corpus that contain word i, #W(i,j) is the number of sliding windows that contain both words i and j, and #W is the total number of sliding windows in the corpus. TF-IDF is the usual term frequency-inverse document frequency of the word in the document. Intuitively, a high positive PMI between a pair of words means that they have a high semantic correlation; conversely, we do not build edges between words with negative PMI. Overall, TF-IDF-weighted document-word edges capture within-document context, while PMI-weighted word-word edges (which can span across documents) capture across-document context.
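To make the definition concrete, here is a small pure-Python sketch of the sliding-window PMI computation (the tokenized corpus and window size are illustrative; the actual implementation used for this article appears later):

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(tokens, window=10):
    """PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ), with probabilities
    estimated from sliding-window counts as defined in the text."""
    windows = [tokens[k:k + window] for k in range(max(1, len(tokens) - window + 1))]
    n_windows = len(windows)
    word_count = Counter()   # #W(i): windows containing word i
    pair_count = Counter()   # #W(i, j): windows containing both i and j
    for w in windows:
        uniq = sorted(set(w))
        word_count.update(uniq)
        pair_count.update(combinations(uniq, 2))
    pmi = {}
    for (i, j), n_ij in pair_count.items():
        p_ij = n_ij / n_windows
        p_i = word_count[i] / n_windows
        p_j = word_count[j] / n_windows
        pmi[(i, j)] = math.log(p_ij / (p_i * p_j))
    return pmi

scores = pmi_scores("a b a b c d a b".split(), window=3)
```

Pairs with negative PMI simply do not get an edge in the graph.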

In comparison, for non-graph-based models, such across-document context information is not easily provided as input features, and the model would have to learn it by itself "from scratch" based on the labels. Since GCN is given this additional information on the relationships between documents, which is definitely relevant in NLP tasks, one would expect GCN to perform better.

1. Calculating TF-IDF

### Tfidf
def dummy_fun(doc):  # pass-through tokenizer/preprocessor, since the texts are already tokenized
    return doc
vectorizer = TfidfVectorizer(input="content", max_features=None, tokenizer=dummy_fun, preprocessor=dummy_fun)["c"])
df_tfidf = vectorizer.transform(df_data["c"])
df_tfidf = df_tfidf.toarray()
vocab = vectorizer.get_feature_names()
vocab = np.array(vocab)
df_tfidf = pd.DataFrame(df_tfidf,columns=vocab)

Calculating TF-IDF is relatively straightforward: we simply use sklearn's TfidfVectorizer module on our 1189 document texts and store the result in a dataframe. This will be used for the document-word weights when we create the graph later.
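For readers who want the arithmetic spelled out, here is a textbook TF-IDF sketch in pure Python. Note that sklearn's TfidfVectorizer uses a smoothed IDF and L2-normalizes each row, so its numbers will differ from this simplified version:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Textbook TF-IDF: weight(t, d) = tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    df = Counter()                      # df(t): number of documents containing t
    for d in docs:
        df.update(set(d))
    weights = []
    for d in docs:
        tf = Counter(d)                 # tf(t, d): raw count of t in d
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [["in", "the", "beginning"], ["in", "my", "end"]]
w = tf_idf(docs)
print(w[0])   # 'in' occurs in every document, so its weight is 0
```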

2. Calculating Point-wise Mutual Information between words

### PMI between words
window = 10 # sliding window size to calculate point-wise mutual information between words
names = vocab
occurrences = OrderedDict((name, OrderedDict((name, 0) for name in names)) for name in names)
# Find the co-occurrences:
no_windows = 0; print("calculating co-occurrences")
for l in df_data["c"]:
    for i in range(len(l)-window):
        no_windows += 1
        d = l[i:(i+window)]; dum = []
        for x in range(len(d)):
            for item in d[:x] + d[(x+1):]:
                if item not in dum:
                    occurrences[d[x]][item] += 1; dum.append(item)
df_occurences = pd.DataFrame(occurrences, columns=occurrences.keys())
df_occurences = (df_occurences + df_occurences.transpose())/2 ## symmetrize it as window size on both sides may not be same
del occurrences
### convert to PMI
p_i = df_occurences.sum(axis=0)/no_windows
p_ij = df_occurences/no_windows
del df_occurences
for col in p_ij.columns:
    p_ij[col] = p_ij[col]/p_i[col]
for row in p_ij.index:
    p_ij.loc[row,:] = p_ij.loc[row,:]/p_i[row]
p_ij = p_ij + 1E-9
for col in p_ij.columns:
    p_ij[col] = p_ij[col].apply(lambda x: math.log(x))

Calculating PMI between words is trickier. First, we need to find the co-occurrences between words i and j within a sliding window of 10 words, stored as a square matrix in a dataframe whose rows and columns represent the vocabulary. From this, we can then calculate the PMI using the definition given earlier. The annotated code for the calculation is shown above.

3. Build the graph

Now that we have all the weights for the edges, we are ready to build the graph G, using the networkx module. It is worth pointing out that most of the heavy-lifting computation for this whole project is spent on building the word-word edges, as we need to iterate over all possible pairwise word combinations for a vocabulary of about 6500 words; in fact, a full 2 days were spent computing this. The code snippet for the computation is shown below.

def word_word_edges(p_ij):
    dum = []; word_word = []; counter = 0
    cols = list(p_ij.columns); cols = [str(w) for w in cols]
    for w1 in cols:
        for w2 in cols:
            if (counter % 300000) == 0:
                print("Current Count: %d; %s %s" % (counter, w1, w2))
            if (w1 != w2) and ((w1,w2) not in dum) and (p_ij.loc[w1,w2] > 0):
                word_word.append((w1,w2,{"weight":p_ij.loc[w1,w2]})); dum.append((w2,w1))
            counter += 1
    return word_word
### Build graph
G = nx.Graph()
G.add_nodes_from(df_tfidf.index) ## document nodes
G.add_nodes_from(vocab) ## word nodes
### build edges between document-word pairs
document_word = [(doc, w, {"weight": df_tfidf.loc[doc, w]}) for doc in df_tfidf.index for w in df_tfidf.columns]
### build edges between word-word pairs
word_word = word_word_edges(p_ij)
### add the weighted edges to the graph
G.add_edges_from(document_word)
G.add_edges_from(word_word)

Graph Convolutional Network

In convolutional neural networks for image-related tasks, we have convolution layers or filters (with learnable weights) that "pass over" a bunch of pixels to generate feature maps that are learned through training. Now imagine that this bunch of pixels corresponds to graph nodes: in a GCN, we similarly have a bunch of filters with learnable weights W that "pass over" the graph nodes.

However, there is a big problem: graph nodes do not really have a clear notion of physical space and distance as pixels have (we can’t really say that a node is to the right or left of another). As such, in order to meaningfully convolve nodes with our filter W, we have to first find feature representations for each node that best captures the graph structure. For the advanced readers, the authors solved this problem by projecting both the filter weights W and feature space X for each node into the Fourier space of the graph, so that convolution becomes just a point-wise multiplication of nodes with features. For a deep dive into the derivation, the original paper by Kipf et al. is a good starting point. Otherwise, the readers can just make do with this intuitive explanation and proceed on.

We are going to use a two-layer GCN here (features are convolved twice), as according to the paper it gives the best results. The convolved output feature tensor Z after the two-layer GCN is given by:

Z = Â · ReLU(Â X W_0) · W_1,    where Â = D^(-1/2) A D^(-1/2)

Here, A is the adjacency matrix of graph G (with diagonal elements set to 1 to represent the self-connection of each node) and D is the degree matrix of G. W_0 and W_1 are the learnable filter weights for the first and second GCN layers respectively, which are to be trained. X is the input feature matrix, which we take to be a diagonal square matrix (of ones) of the same dimension as the number of nodes; this simply means that the input is a one-hot encoding of each of the graph nodes. The final output is then fed into a softmax layer with a cross entropy loss function, for classification over the 66 labels corresponding to the 66 Books.
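The two-layer propagation can be sketched in NumPy with illustrative dimensions (self-loops are added via A + I, matching the statement that diagonal elements of A are 1):

```python
import numpy as np

def normalize_adjacency(A):
    """A_hat = D^(-1/2) (A + I) D^(-1/2): add self-loops, then normalize
    symmetrically by the degree matrix D."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN: Z = A_hat . ReLU(A_hat . X . W0) . W1 (logits)."""
    H = np.maximum(A_hat @ X @ W0, 0)   # first graph convolution + ReLU
    return A_hat @ H @ W1               # second graph convolution

rng = np.random.default_rng(0)
n_nodes, n_hidden, n_classes = 5, 4, 3
A = (rng.random((n_nodes, n_nodes)) > 0.5).astype(float)
A = np.maximum(A, A.T)        # symmetric adjacency for an undirected graph
X = np.eye(n_nodes)           # one-hot input features, as described above
W0 = rng.standard_normal((n_nodes, n_hidden))
W1 = rng.standard_normal((n_hidden, n_classes))
Z = gcn_forward(normalize_adjacency(A), X, W0, W1)
print(Z.shape)
```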

The implementation of the two-layer GCN architecture in PyTorch is given below.

### GCN architecture, with Xavier's initialization of W_0 (self.weight) and W_1 (self.weight2) as well as the biases.
import torch
import torch.nn as nn
import torch.nn.functional as F

class gcn(nn.Module):
    def __init__(self, X_size, A_hat, bias=True): # X_size = num features
        super(gcn, self).__init__()
        self.A_hat = torch.tensor(A_hat, requires_grad=False).float()
        self.weight = nn.parameter.Parameter(torch.FloatTensor(X_size, 330))
        var = 2./(self.weight.size(1)+self.weight.size(0)), var)
        self.weight2 = nn.parameter.Parameter(torch.FloatTensor(330, 130))
        var2 = 2./(self.weight2.size(1)+self.weight2.size(0)), var2)
        if bias:
            self.bias = nn.parameter.Parameter(torch.FloatTensor(330)), var)
            self.bias2 = nn.parameter.Parameter(torch.FloatTensor(130)), var2)
        else:
            self.register_parameter("bias", None)
            self.register_parameter("bias2", None)
        self.fc1 = nn.Linear(130, 66)

    def forward(self, X): ### 2-layer GCN architecture
        X =, self.weight)
        if self.bias is not None:
            X = (X + self.bias)
        X = F.relu(, X))
        X =, self.weight2)
        if self.bias2 is not None:
            X = (X + self.bias2)
        X = F.relu(, X))
        return self.fc1(X)

Training Phase

Class label distribution

Out of a total of 1189 Chapters, we will mask the labels of 111 of them (about 10 %) during training. As the class label distribution over the 1189 Chapters is quite skewed (above figure), we will not mask the labels of any Chapter whose class has a total count of less than 4, to ensure that the GCN can learn representations from all 66 unique class labels.
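The masking scheme just described can be sketched as follows (random stand-in labels are used here for illustration; the real labels come from the 66 Books):

```python
import numpy as np

rng = np.random.default_rng(42)
labels = rng.integers(0, 66, size=1189)   # stand-in for the Chapter -> Book labels

# Only chapters whose class has at least 4 members are eligible for masking
counts = np.bincount(labels, minlength=66)
maskable = np.flatnonzero(counts[labels] >= 4)

# Hide 111 of the eligible chapters (about 10%) as the inference/test set
masked_idx = rng.choice(maskable, size=111, replace=False)
test_mask = np.zeros(len(labels), dtype=bool)
test_mask[masked_idx] = True
print(test_mask.sum())
```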

We train the GCN model to minimize the cross entropy losses of the unmasked labels. After training the GCN for 7000 epochs, we will then use the model to infer the Book labels of the 111 masked Chapters and analyze the results.


Loss vs Epoch

From the Loss vs Epoch graph above, we see that training proceeds pretty well and starts to saturate at around 2000 epochs.

Accuracy of training nodes (trained nodes) and inference accuracy of masked nodes (untrained nodes) with epoch.

As training proceeds, the training accuracy and the inference accuracy (of the masked nodes) are seen to increase together, until about 2000 epochs, where the inference accuracy starts to saturate at around 50 %. Considering that we have 66 classes, a model predicting by pure chance would have a baseline accuracy of about 1.5 %, so a 50 % inference accuracy already seems pretty good. This tells us that the GCN model, after being properly trained on labelled Chapters, is able to correctly infer the Book that a given unlabelled Chapter belongs to about 50 % of the time.

Misclassified Chapters

The GCN model is able to capture the within-document and between-document contexts pretty well, but what about the misclassified Chapters? Does it mean that the GCN model failed on those? Let's look at a few of them to find out.

  • Book: Matthew
    Chapter 27: “Now when it was morning, all the chief priests and those in authority took thought together with the purpose of putting Jesus to death. And they put cords on Him and took Him away, and gave Him up to Pilate, the ruler. Then Judas, who was false to Him, seeing that He was to be put to death, in his regret took back the thirty bits of silver to the chief priests and those in authority, saying, I have done wrong in giving into your hands an upright man. But they said, what is that to us? It is your business. and he put down the silver in the temple and went out, and put himself to death by hanging. And the chief priests took the silver and said, it is not right to put it in the temple store for it is the price of blood. And they made a decision to get with the silver the potter’s field, as a place for the dead of other countries. For this cause that field was named…He has come back from the dead: and the last error will be worse than the first. Pilate said to them, you have watchmen; go and make it as safe as you are able. So they went, and made safe the place where His body was, putting a stamp on the stone, and the watchmen were with them.
    Predicted as: Luke

In this case, Chapter 27 from the book of Matthew has been wrongly classified to be from the book of Luke. From above, we see that this Chapter is about Jesus being put to death by the chief priests and dying for our sins, as well as Judas’s guilt after his betrayal of Jesus. Now, these events are also mentioned in Luke! (as well as in Mark and John) This is most likely why the model classified it as Luke, as they share similar context.

  • Book: Isaiah
    Chapter 12: “And in that day you will say I will give praise to you, O Lord; For though You were angry with me, Your wrath is turned away, and I am comforted. See, God is my salvation; I will have faith in the Lord, without fear: For the Lord is my strength and song; and He has become my salvation. So with joy will you get water out of the springs of salvation. And in that day you will say, give praise to the Lord, let His name be honored, give word of His doings among the peoples, say that His name is lifted up. Make a song to the Lord; for He has done noble things: give news of them through all the earth. Let your voice be sounding in a cry of joy, O daughter of Zion, for great is the Holy One of Israel among you.”
    Predicted as Psalms

Here, Chapter 12 from the Book of Isaiah is wrongly inferred to be from the Book of Psalms. It is clear from this passage that the narrator in Isaiah Chapter 12 talks about giving and singing praises to God, who is his comforter and source of salvation. This context of praising God and looking to Him for comfort is exactly the whole theme of the Book of Psalms, where David pens down his praises and prayers to God throughout his successes, trials and tribulations! Hence, it is no wonder that the model would classify it as Psalms, as they share similar context.


The text-based Graph Convolutional Network is indeed a powerful model especially for semi-supervised learning, as it is able to strongly capture the textual context between and across words and documents, and infer the unknown given the known.

The applications of GCNs are actually quite versatile and far-reaching, and this article has only provided a glimpse of what they can do. In general, beyond the task presented here, GCN can be used whenever one wants to combine the power of graph representations with deep learning. To provide a few interesting examples for further reading, GCN has been used in combination with Recurrent Neural Networks (RNNs)/Long Short-Term Memory (LSTMs) for dynamic network/node/edge predictions. It has also been successfully applied to dynamic pose estimation of the human skeleton, by modelling human joints as graph nodes and the relationships between and within human body structures and time frames as graph edges.

Thanks for reading, and I hope that this article has helped to explain the inner workings of the text-based GCN.


  1. Thomas N. Kipf, Max Welling, Semi-Supervised Classification with Graph Convolutional Networks (2016)
  2. Liang Yao, Chengsheng Mao, Yuan Luo, Graph Convolutional Networks for Text Classification (2018)

This article was first published in

Simple Neural Network Model Using TensorFlow Eager Execution


Eager Execution is a nifty approach in TensorFlow (TF) to build deep learning models from scratch. It allows you to build prototype models without the hassles that come with the graphical approach that TF uses conventionally.

For example, with Eager Execution, there is no need to start a graph session in order to perform tensor computations. This means faster debugging, as you could check each line of computation on-the-fly without needing to wrap the computation in a graph session.

As a disclaimer, however, using Eager Execution requires some knowledge of the matrix algebra concepts used in deep learning, particularly how forward passes are done in a neural network. If you are looking for something more high-level and ready to use, I would advise using the Keras API in TF or PyTorch instead.

This article will provide an example of how Eager Execution can be used, by describing the procedure to build, train and evaluate a simple Multilayer Perceptron.

Architecture and Notations

The neural network built in this example consists of an input layer, one hidden layer, and an output layer. The input layer contains 3 nodes, the hidden layer 20 nodes, and the output layer has 1 node. The output value is continuous (i.e. the neural network performs regression).

The values of the input, hidden and output layers, as well as the weights between the layers, can be expressed as matrices. The biases to the hidden and output layers can be expressed as vectors (a special case of matrices with one row or column). The image below shows the dimensions for each of the matrices and vectors.

Notations and dimensions for matrices and vectors

Beginning Eager Execution

After importing the dependencies needed for this example (mainly NumPy and TF), you would need to enable Eager Execution if you are not using TF 2.0. The code snippet below shows how Eager Execution can be enabled.

import numpy as np
import time
import tensorflow as tf
import tensorflow.contrib.eager as tfe

# Enable Eager Execution (must be done before any other TF operations)
tf.enable_eager_execution()

# Method to check if Eager Execution is enabled
print(tf.executing_eagerly())

Preparing the Data for Training and Evaluation

The next step is to randomly generate some data for use in training and evaluation (for illustration purposes of course), by using NumPy’s random module. With this approach, I created two separate sets of data, one for training and the other for evaluation.

Each set of data contained 1 input array and 1 output array. The input array was in the shape (number of observations, number of features), while the output array was in the shape (number of observations, number of output values per observation). The number of features corresponds to the number of nodes in the input layer, while the number of output values per observation corresponds to the number of nodes in the output layer.

After generating the data, I split the test data into batches, for more efficient evaluation. The train data will also be split into batches, but done during the training process itself.
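As a minimal sketch of what splitting into batches means, here is the same idea in plain NumPy (the data and batch size here are made up for illustration):

```python
import numpy as np

X = np.arange(10).reshape(10, 1)  # 10 observations, 1 feature
batch_size = 4
# Slice the data into consecutive chunks of at most batch_size rows
batches = [X[i:i + batch_size] for i in range(0, len(X), batch_size)]
print([b.shape[0] for b in batches])  # [4, 4, 2] — the last batch holds the remainder
```

This is essentially what `.batch(4)` does on a ``, except that TF also handles shuffling and iteration for you.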

The snippet below shows how I prepared the data.

# Define size of input, hidden and output layers
size_input = 3
size_hidden = 20
size_output = 1

X_train = np.random.randn(800, size_input)
X_test = np.random.randn(200, size_input)
y_train = np.random.randn(800)
y_test = np.random.randn(200)

# Split test dataset into batches (training dataset will be randomly split into batches during each epoch of model training)
test_ds =, y_test)).batch(4)

Building the Model

What I did here was to create a Python class that stores the code responsible for weight and bias initialization, forward pass, backpropagation and updates to weights and biases.

The weights and biases were initialized by sampling random values from a standard normal distribution. Random initialization of weights is typically preferred over initializing the weights with the value 0 or 1, in order to reduce the chance of getting issues such as vanishing gradients.

The forward pass can be described by the following equations. relu() represents the Rectified Linear Unit function, which transforms the linear combination of inputs, weights and biases in a non-linear way. There is no transform function in the equation for the output Y, as a continuous output value is expected. As a side note, a non-linear transform function such as sigmoid or softmax would be needed in the second equation if the output were expected to be categorical.

Matrix algebra for the forward pass
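The same algebra can be sketched in plain NumPy, using the layer sizes defined earlier and a hypothetical batch of 4 observations:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(4, 3)        # inputs: (batch, size_input)
W_xh = rng.randn(3, 20)    # input-to-hidden weights
b_h = rng.randn(1, 20)     # hidden biases, broadcast over the batch
W_hy = rng.randn(20, 1)    # hidden-to-output weights
b_y = rng.randn(1, 1)      # output bias

H = np.maximum(X @ W_xh + b_h, 0.0)  # hidden layer: relu(X W_xh + b_h)
Y = H @ W_hy + b_y                   # output layer: no activation (regression)
print(Y.shape)  # (4, 1)
```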

The computation of loss, and the backpropagation and updates of weights and biases, are taken care of with a few lines of code (in the loss() and backward() methods of the model class respectively).
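As a toy illustration of what computing a loss and applying a gradient-descent update actually does, consider fitting a single weight with MSE loss in plain NumPy (all values here are made up; the true relationship is y = 2x):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y_true = np.array([2.0, 4.0, 6.0])
w = 0.0     # initial weight
lr = 0.1    # learning rate
for _ in range(50):
    y_pred = w * x                             # forward pass
    grad = np.mean(2 * (y_pred - y_true) * x)  # d(MSE)/dw
    w -= lr * grad                             # gradient-descent update
print(round(w, 3))  # converges toward 2.0
```

In the model class below, tf.GradientTape() and the optimizer automate exactly this compute-gradient-then-update loop, but over all four weight and bias tensors at once.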

The rather long snippet below shows how the model building process can be implemented in a class. The additional compute_output() method is a wrapper over the forward pass algorithm, which lets the user select the hardware device (CPU or GPU) for model training and evaluation.

# Define class to build model
class Model(object):
  def __init__(self, size_input, size_hidden, size_output, device=None):
    """
    size_input: int, size of input layer
    size_hidden: int, size of hidden layer
    size_output: int, size of output layer
    device: str or None, either 'cpu' or 'gpu' or None. If None, the device to be used will be decided automatically during Eager Execution
    """
    self.size_input, self.size_hidden, self.size_output, self.device =\
    size_input, size_hidden, size_output, device
    # Initialize weights between input layer and hidden layer
    self.W_xh = tfe.Variable(tf.random_normal([self.size_input, self.size_hidden]))
    # Initialize weights between hidden layer and output layer
    self.W_hy = tfe.Variable(tf.random_normal([self.size_hidden, self.size_output]))
    # Initialize biases for hidden layer
    self.b_h = tfe.Variable(tf.random_normal([1, self.size_hidden]))
    # Initialize biases for output layer
    self.b_y = tfe.Variable(tf.random_normal([1, self.size_output]))
    # Define variables to be updated during backpropagation
    self.variables = [self.W_xh, self.W_hy, self.b_h, self.b_y]

  def forward(self, X):
    """
    Method to do forward pass
    X: Tensor, inputs
    """
    if self.device is not None:
      with tf.device('gpu:0' if self.device=='gpu' else 'cpu'):
        self.y = self.compute_output(X)
    else:
      # Leave choice of device to default
      self.y = self.compute_output(X)
    return self.y

  def loss(self, y_pred, y_true):
    """
    Method to compute loss
    y_pred - Tensor of shape (batch_size, size_output)
    y_true - Tensor of shape (batch_size, size_output)
    """
    # Cast y_true to float32 and reshape to match y_pred
    y_true_tf = tf.cast(tf.reshape(y_true, (-1, self.size_output)), dtype=tf.float32)
    # Cast y_pred to float32
    y_pred_tf = tf.cast(y_pred, dtype=tf.float32)
    return tf.losses.mean_squared_error(y_true_tf, y_pred_tf)

  def backward(self, X_train, y_train):
    """
    Method to do backpropagation of loss and update weights and biases
    """
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-4)
    with tf.GradientTape() as tape:
      predicted = self.forward(X_train)
      current_loss = self.loss(predicted, y_train)
    grads = tape.gradient(current_loss, self.variables)
    optimizer.apply_gradients(zip(grads, self.variables))
#     print('Loss: {:.3f}'.format(self.loss(self.forward(X_train), y_train)))

  def compute_output(self, X):
    """
    Custom method to obtain output tensor during forward pass
    """
    # Cast X to float32
    X_tf = tf.cast(X, dtype=tf.float32)
    # Compute values in hidden layer
    a = tf.matmul(X_tf, self.W_xh) + self.b_h
    l_h = tf.nn.relu(a)
    # Compute output
    output = tf.matmul(l_h, self.W_hy) + self.b_y
    return output

You may have noticed the function tf.cast() used in the class. The reason is that an error is triggered because the from_tensor_slices() method from the earlier snippet returns tensors in the tf.float64 data format, while the matrix operations (e.g. tf.matmul()) can only handle tensors in the tf.float32 data format. I have not tried Eager Execution on TF 2.0, so I am not sure if this issue has been addressed in that version. What I do know is that this data format issue definitely occurs in the version of TF that I used for this example (i.e. 1.13.1), so it is something to take note of when using Eager Execution on older versions of TF.
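A quick NumPy check illustrates the mismatch (the array here is hypothetical; astype() plays the role of tf.cast()):

```python
import numpy as np

# NumPy generates float64 by default, which is what from_tensor_slices() picks up
X = np.random.randn(4, 3)
print(X.dtype)              # float64
# Downcast to float32, the dtype expected by the TF matrix ops in this example
X32 = X.astype(np.float32)
print(X32.dtype)            # float32
```

An alternative to casting inside the model would be to generate the data as float32 in the first place, e.g. with np.random.randn(...).astype(np.float32).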

Training the Model

After preparing the data and building the model, the next step is to train the model. Model training is pretty simple, with only a few lines of code needed. The basic idea here is to repeat the following for each batch of data in every epoch: feed the input tensor through the model to get the prediction tensor, compute the loss, backpropagate the loss, and update the weights and biases. During every epoch, the training data will be split randomly into different batches, to increase the computational efficiency of model training and help the model generalize better. The following snippet illustrates how training can be done with Eager Execution.

# Set number of epochs (the value here is illustrative)
NUM_EPOCHS = 20
# Initialize model, letting device be selected by default during Eager Execution
model_default = Model(size_input, size_hidden, size_output)

time_start = time.time()
for epoch in range(NUM_EPOCHS):
  loss_total = tfe.Variable(0, dtype=tf.float32)
  train_ds =, y_train)).shuffle(10, seed=epoch).batch(4)
  for inputs, outputs in train_ds:
    preds = model_default.forward(inputs)
    loss_total = loss_total + model_default.loss(preds, outputs)
    model_default.backward(inputs, outputs)
  print('Epoch {} - Average MSE: {:.4f}'.format(epoch + 1, loss_total.numpy() / X_train.shape[0]))
time_taken = time.time() - time_start

print('\nTotal time taken for training (seconds): {:.2f}'.format(time_taken))

Evaluating the Model

The final step is to evaluate the model using the test set. The code to do this is similar to that for training, but without the backpropagation and updates of weights and biases.

test_loss_total = tfe.Variable(0, dtype=tf.float32)
for inputs, outputs in test_ds:
  preds = model_default.forward(inputs)
  test_loss_total = test_loss_total + model_default.loss(preds, outputs)
print('Average Test MSE: {:.4f}'.format(test_loss_total.numpy() / X_test.shape[0]))


While Eager Execution is pretty straightforward to use, I would like to emphasize again that it is a low-level approach. I would advise against using Eager Execution, unless: 1) you are doing work that requires you to build a deep learning model from scratch (e.g. research/academic work on deep learning models), 2) you are trying to understand the mathematical stuff that is going on behind deep learning, or 3) you just like to build things from scratch.

Having said that though, I think Eager Execution is a pretty good approach for understanding a bit better what actually happens when we do deep learning, without having to juggle complicated graphs or the other confusing stuff that comes with the conventional TF approach.

The Google Colab notebook that I created for this example can be found here.

This article was originally published here in Towards Data Science.

Let’s Hear It from Our AIAP Batch 1 Graduates

The AI Apprenticeship Programme idea was conceived to solve a major issue AI Singapore faced – lack of local engineering talent trained in AI to work on the 100 Experiments (100E) programme. 100E is AI Singapore’s flagship programme to solve industries’ AI problem statements and help them build their own skilled teams.

“When we first thought of the AI Apprenticeship Programme back in July 2017, we were not sure if we could pull it off and get the support we needed as the AI Apprenticeship Programme was not part of the original AI Singapore programme approved. As much as we tried to hire, it was not easy to find Singaporean engineers and developers trained in AI. It also did not help that we were not able to match the salaries of Google, Facebook, Alibaba and the big boys.

So I asked my team if we could train Singaporeans who are keen in AI and are already learning AI on their own but perhaps did not have an opportunity to work on real world AI problems.

Lo and behold…. we found these rough stones and polished them into gems. This was the genesis of the AIAP.” – Laurence Liew, Director of AI Industry Innovation, AISG.

The Pioneer Batch's Toolkit

First-Hand Experience

Thanks to the 100E project offered by the programme, our apprentices were assigned different projects that allowed them to gain valuable hands-on experience at tackling real-world problems that they might face in their future professional lives. According to one of our graduates Lee Cheng Kai, the “fastest way to learn something is to do it.” And that is what the programme delivered. It gave the apprentices the opportunity to “work on real life machine-learning problems and challenges,” which allowed them to learn all about an end-to-end life cycle of a machine learning project. “…practical experience from the apprenticeship accelerated my learning of various AI concepts and frameworks.” concurred Tai Kai Yu, Systems Specialist, Data Analytics/AI, ST Engineering.

Exposure to Substantial Educational Resources

“The programme exposed me to numerous resources and helped me learn all things from AI to DevOps,” said Christopher Leong, a member of our pioneer batch who recently joined the R&D team at Virtuos Games.

Echoing this statement is Leong’s batch mate, Raimi Karim who’s now an AI Engineer at AI Singapore. “I learnt how to look for resources that are suitable for me. This allowed me to pick up Python and JavaScript as well as deep learning and reinforcement learning.”

Peer-to-Peer Learning

Calling on AI talents from various walks of life and different stages of their careers allowed for a diverse and dynamic learning environment. Apprentices shared their expertise and experiences that allowed them to expand each other’s knowledge. According to Hoo Chan Kai, one of our Batch 1 graduates who recently joined DBS as a Data Scientist, the programme allowed him to connect “with like-minded individuals and mentors with a specific domain expertise.” 

“…the other apprentices have made my learning easier through discussions and sharing their domain expertise. They are also a fun bunch of people with amazing ideas on AI.” Raimi added.

Knowledgeable Mentors


Giving the apprentices free rein over their AI projects while providing sound guidance and advice, our mentors created a hands-on learning experience for our pioneer batch.

According to Leo Tay, “I was able to share freely with mentors on thoughts and issues about the projects and how to tackle them.”

Over the nine-month programme, the apprentices, who came to AI from diverse backgrounds, learnt much more about AI, especially through the 100E projects they were assigned to. Almost all of them went through at least one end-to-end cycle of an AI project – from problem formulation and modelling to solution engineering and MVP deployment. We also saw the bond they built amongst themselves – this, we believe, is the most important takeaway from the programme. It was also a fulfilling experience for the AISG mentors, who in turn learnt much from the apprentices through the various interactions and discussions.

12 out of our 13 graduates went on to AI-related jobs immediately after the programme; 1 will further her studies in research at a local university

Advice from the Pioneer Batch

“Be humble and keep an inquisitive and open mind since there are always new skills and knowledge to acquire. Do not shy away from seeking additional inputs or asking your peers for opinions.”

 Hoo Chan Kai

“I had no computer science background but I have been trying to get my feet wet with programming and statistics in the last two to three years. In the first three months, we were given time to learn about some basic skills required in AI. Now, I have a better appreciation of AI and the technologies involved, as well as a foundation and platform to go more in-depth.” 

Eunice Soh

What's next for AIAP?

We will continue to grow our own timber!!!

1 batch done

2 batches ongoing…and

6 more batches to go!

Find out how you can be a part of the AIAP....

Sign up from 20 May – 9 June, 2019 (Batch #4 Application)


HPC to Deep Learning from an Asia Perspective

Big data, data science, machine learning, and now deep learning are all the rage and have tons of hype, for better—and in some ways, for worse. Advancements in AI such as language understanding, self-driving cars, automated claims, legal text processing, and even automated medical diagnostics are already here or will be here soon.

In Asia, several countries have made significant advancements and investments into AI, leveraging their historical work in HPC.

China now owns the top three positions in the Top500 with Sunway TaihuLight, Tianhe-2, and Tianhe, and while Tianhe-2 and Tianhe were designed for HPC style workloads, TaihuLight is expected to run deep learning frameworks very efficiently. In addition, Baidu of China probably has one of the largest AI teams in this part of the world, and it would not be surprising to learn that these large Internet companies are working closely with the likes of TaihuLight and the Tianhe team to develop their own AI supercomputers.

Japan is no stranger to AI and robotics, and has been leading the way in consumer-style AI systems for a long time. Remember that Fuzzy Logic washing machine? Japan’s car industry is probably one of the largest investors into AI technology in Japan today, with multiple self-driving projects within Japan and globally.

RIKEN is deploying the country’s largest “Deep learning system” based on 24 NVIDIA DGX-1 and 32 Fujitsu servers this year. Tokyo Tech and the National Institute of Advanced Industrial Science and Technology (AIST) have also announced their joint “Open Innovation Laboratory” (OIL), which will have the innovative TSUBAME3.0 AI supercomputer this year and an upcoming massive AI supercomputer named “ABCI” in 2018.

South Korea announced a whopping US $863M investment into AI in 2016 after AlphaGo’s defeat of grandmaster Lee Sedol, and this is an additional investment on top of existing investments made since early 2013 (Exobrain and Deep view projects). It will establish a new high profile, public/private research center with participation from several Korean conglomerates, including Samsung, LG, telecom giant KT, SK Telecom, Hyundai Motor, and Internet portal Naver.

Closer to home, Singapore has recently announced a modest US $110M (SGD $150M) national effort over five years to build its capabilities in Artificial Intelligence called AI@SG. Funded by the National Research Foundation of Singapore and hosted by the National University of Singapore, this is a multi-agency effort comprising government ministries, institutes of higher learning, and industry to tackle specific industry problems in Singapore. Besides a grand challenge problem (to be identified by end of the year), a major focus is on partnering with local industry to drive the adoption of AI technology to significantly improve productivity and competitiveness.

In particular, an effort called 100E — for 100 industry “experiments” over five years — will work closely with industry partners to help solve their problems using AI and HPC with the combined efforts of all the government agencies and institutes of higher learning and research centers. As is typical of Singapore style, three big bets for AI have been identified in Finance, Healthcare, and Smart City projects. The compute backbone of AISG is expected to ride on new AI HPC systems and also leverage various HPC systems existing in Singapore, including the newly established National Supercomputing Centre.

AI being powered on HPC-style clusters is not an accident. It has been and always was a workload that HPC folks have been running — it’s just that it was not sexy to be associated with AI back then. Now, we can all come out of the closet.

About the Author

Laurence Liew is currently the Director for AI Industry Innovation in AI Singapore.

Prior to AI Singapore, Laurence led Revolution Analytics’ Asia business and R&D until its acquisition by Microsoft in 2015. Laurence and his team were core contributors to the open source San Diego Supercomputer Center’s Rocks Cluster distribution for HPC systems from 2002-2006, and contributed the integration of SGE, PVFS, Lustre, Intel, and PGI compilers into Rocks.
