
Illustrated: 10 CNN Architectures

A compiled visualisation of the common convolutional neural networks

(This article was first posted on Towards Data Science.)

12 min read. TL;DR — jump to the illustrations below.

How have you been keeping up with the different convolutional neural networks (CNNs)? In recent years, we have witnessed the birth of numerous CNNs. These networks have gotten so deep that it has become extremely difficult to visualise the entire model. We stop keeping track of them and treat them as blackbox models.

Fine, maybe you don’t. But if you’re guilty too then hey, you’ve come to the right place! This article is a visualisation of 10 common CNN architectures, hand-picked by yours truly. These illustrations provide a more compact view of the entire model, without having to scroll down a couple of times just to see the softmax layer. Apart from these images, I’ve also sprinkled some notes on how they ‘evolved’ over time — from 5 to 50 convolutional layers, from plain convolutional layers to modules, from 2–3 towers to 32 towers, from 7⨉7 to 5⨉5— but more on these later.

By ‘common’, I am referring to those models whose pre-trained weights are usually shared by deep learning libraries (such as TensorFlow, Keras and PyTorch) for users to use, and models that are usually taught in classes. Some of these models have shown success in competitions like the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

The 10 architectures that will be discussed and the year their papers were published.
Pre-trained weights which are available in Keras for 6 of the architectures that we will talk about. Adapted from a table in the Keras documentation.

The motivation for writing this is that there aren’t many blogs and articles out there with these compact visualisations (if you do know of any, please share them with me). So I decided to write one for our reference. For this purpose, I have read the papers and the code (mostly from TensorFlow and Keras) to come up with these vizzes.

Here I’d like to add that the plethora of CNN architectures we see in the wild is a result of many things — improved computer hardware, the ImageNet competition, solving specific tasks, new ideas and so on. Christian Szegedy, a researcher at Google, once mentioned that

“[m]ost of this progress is not just the result of more powerful hardware, larger datasets and bigger models, but mainly a consequence of new ideas, algorithms and improved network architectures.” (Szegedy et al, 2014)

Now let’s get on with these beasts and observe how network architectures improve over time!

A note on the visualisations
Note that I have excluded information like the number of convolutional filters, padding, stride, dropouts, and the flatten operation in the illustrations.

Contents (ordered by year of publication)

  1. LeNet-5
  2. AlexNet
  3. VGG-16
  4. Inception-v1
  5. Inception-v3
  6. ResNet-50
  7. Xception
  8. Inception-v4
  9. Inception-ResNets
  10. ResNeXt-50


1. LeNet-5 (1998)

Fig. 1: LeNet-5 architecture, based on their paper

LeNet-5 is one of the simplest architectures. It has 2 convolutional and 3 fully-connected layers (hence “5” — it is very common for the names of neural networks to be derived from the number of convolutional and fully-connected layers that they have). The average-pooling layer as we know it now was called a sub-sampling layer and it had trainable weights (which isn’t current practice in designing CNNs). This architecture has about 60,000 parameters.
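As a quick sanity check on the layer arithmetic, the spatial sizes in LeNet-5 can be traced with the standard convolution/pooling output-size formulas (a minimal sketch; the 32×32 input and 5×5 filters are from the paper):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output spatial size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output spatial size of a 2×2 (sub-sampling) pooling layer."""
    return (size - kernel) // stride + 1

size = 32                  # LeNet-5 input: 32×32 greyscale image
size = conv_out(size, 5)   # C1: 5×5 conv        -> 28×28
size = pool_out(size)      # S2: 2×2 sub-sampling -> 14×14
size = conv_out(size, 5)   # C3: 5×5 conv        -> 10×10
size = pool_out(size)      # S4: 2×2 sub-sampling -> 5×5
print(size)                # 5: the 5×5 maps feed the fully-connected layers
```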

⭐️What’s novel?

This architecture has become the standard ‘template’: stacking convolutions and pooling layers, and ending the network with one or more fully-connected layers.


2. AlexNet (2012)

Fig. 2: AlexNet architecture, based on their paper.

With 60M parameters, AlexNet has 8 layers — 5 convolutional and 3 fully-connected. AlexNet essentially stacked a few more layers onto LeNet-5. At the time of publication, the authors pointed out that their architecture was “one of the largest convolutional neural networks to date on the subsets of ImageNet.”

⭐️What’s novel?

1. They were among the first to use Rectified Linear Units (ReLUs) as activation functions in a deep CNN.

2. Overlapping pooling in CNNs.
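The ReLU activation that AlexNet popularised is just max(0, x) applied element-wise, which is cheap to compute and avoids the saturation of tanh/sigmoid. A one-line NumPy sketch:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: passes positives through, zeroes out negatives."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]
```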


3. VGG-16 (2014)

Fig. 3: VGG-16 architecture, based on their paper.

By now you would’ve already noticed that CNNs were starting to get deeper and deeper. This is because the most straightforward way of improving performance of deep neural networks is by increasing their size (Szegedy et al.). The folks at the Visual Geometry Group (VGG) designed VGG-16, which has 13 convolutional and 3 fully-connected layers, carrying over the ReLU tradition from AlexNet. Again, this network just stacks more layers onto AlexNet. It consists of 138M parameters and takes up about 500MB of storage space 😱. They also designed a deeper variant, VGG-19.
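The 138M figure can be verified by summing weights and biases layer by layer — a sketch using the standard VGG-16 configuration (13 conv layers of 3×3 filters, then three fully-connected layers):

```python
# Channel widths of VGG-16's 13 conv layers (all 3×3 filters)
conv_channels = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

params = 0
in_ch = 3  # RGB input
for out_ch in conv_channels:
    params += 3 * 3 * in_ch * out_ch + out_ch  # weights + biases
    in_ch = out_ch

# Fully-connected layers: flattened 7×7×512 -> 4096 -> 4096 -> 1000 classes
fc_sizes = [7 * 7 * 512, 4096, 4096, 1000]
for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
    params += n_in * n_out + n_out

print(params)  # 138,357,544 ≈ 138M
```

Note that roughly 100M of those parameters sit in the first fully-connected layer alone, which is why later architectures replaced the flatten-plus-dense head with global average pooling.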

⭐️What’s novel?

  1. As mentioned in their abstract, the contribution from this paper is the designing of deeper networks (roughly twice as deep as AlexNet).


4. Inception-v1 (2014)

Fig. 4: Inception-v1 architecture. This CNN has two auxiliary networks (which are discarded at inference time). Architecture is based on Figure 3 in the paper.

This 22-layer architecture with 5M parameters is called Inception-v1. Here, the Network In Network (see Appendix) approach is heavily used, as mentioned in the paper. This is done by means of ‘Inception modules’. The design of the architecture of an Inception module is a product of research on approximating sparse structures (read the paper for more!). The design reflects these ideas:

  1. Having parallel towers of convolutions with different filters, followed by concatenation, captures different features at 1×1, 3×3 and 5×5, thereby ‘clustering’ them. This idea is motivated by Arora et al. in the paper Provable bounds for learning some deep representations, suggesting a layer-by-layer construction in which one should analyse the correlation statistics of the last layer and cluster them into groups of units with high correlation.
  2. 1×1 convolutions are used for dimensionality reduction to remove computational bottlenecks.
  3. 1×1 convolutions add nonlinearity within a convolution (based on the Network In Network paper).
  4. The authors also introduced two auxiliary classifiers to encourage discrimination in the lower stages of the classifier, to increase the gradient signal that gets propagated back, and to provide additional regularisation. The auxiliary networks (the branches that are connected to the auxiliary classifier) are discarded at inference time.
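To see why the 1×1 ‘bottleneck’ convolutions in idea 2 matter, compare the multiply counts of a 5×5 convolution applied directly versus after a 1×1 reduction. The feature-map size and channel counts below are illustrative, not taken from the paper:

```python
def conv_mults(h, w, k, c_in, c_out):
    """Multiplications for a k×k convolution on an h×w×c_in input ('same' padding)."""
    return h * w * k * k * c_in * c_out

# Illustrative: 28×28 feature map, 192 input channels, 32 output channels
direct = conv_mults(28, 28, 5, 192, 32)

# With a 1×1 reduction to 16 channels before the 5×5 convolution
reduced = conv_mults(28, 28, 1, 192, 16) + conv_mults(28, 28, 5, 16, 32)

print(direct, reduced, round(direct / reduced, 1))  # ~10× fewer multiplications
```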

It is worth noting that “[t]he main hallmark of this architecture is the improved utilisation of the computing resources inside the network.”

The names of the modules (Stem and Inception) were not used for this version of Inception until its later versions i.e. Inception-v4 and Inception-ResNets. I have added them here for easy comparison.

⭐️What’s novel?

  1. Building networks using dense modules/blocks. Instead of stacking convolutional layers, we stack modules or blocks, within which are convolutional layers. Hence the name Inception (with reference to the 2010 sci-fi movie Inception starring Leonardo DiCaprio).


  • Paper: Going Deeper with Convolutions
  • Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. Google, University of Michigan, University of North Carolina
  • Published in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

5. Inception-v3 (2015)

Fig. 5: Inception-v3 architecture. This CNN has an auxiliary network (which is discarded at inference time). *Note: All convolutional layers are followed by batch norm and ReLU activation. Architecture is based on their GitHub code.

Inception-v3 is a successor to Inception-v1, with 24M parameters. Wait, where’s Inception-v2? Don’t worry about it — it’s an earlier prototype of v3, hence very similar to v3 but not commonly used. When the authors came out with Inception-v2, they ran many experiments on it and recorded some successful tweaks. Inception-v3 is the network that incorporates these tweaks (tweaks to the optimiser, the loss function, and adding batch normalisation to the auxiliary layers in the auxiliary network).

The motivation for Inception-v2 and Inception-v3 was to avoid representational bottlenecks (drastically reducing the input dimensions of the next layer, which loses information) and to make computation more efficient by using factorisation methods.

The names of the modules (Stem, Inception-A, Inception-B etc.) were not used for this version of Inception until its later versions i.e. Inception-v4 and Inception-ResNets. I have added them here for easy comparison.

⭐️What’s novel?

  1. Among the first designers to use batch normalisation (not reflected in the above diagram for simplicity).

✨What’s improved from previous version, Inception-v1?

  1. Factorising n×n convolutions into asymmetric convolutions: a 1×n convolution followed by an n×1 convolution
  2. Factorising each 5×5 convolution into two 3×3 convolutions
  3. Replacing 7×7 convolutions with a series of 3×3 convolutions
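These factorisations pay off directly in weight counts: per input–output channel pair, two stacked 3×3 convolutions cost 18 weights versus 25 for one 5×5 (with the same receptive field), and a 1×n plus n×1 pair costs 2n versus n² for a full n×n filter. A quick check:

```python
# Weights per input-output channel pair for each factorisation (biases ignored)
five_by_five = 5 * 5         # 25
two_threes   = 2 * (3 * 3)   # 18: two stacked 3×3 convs, same receptive field

seven_by_seven = 7 * 7       # 49
three_threes   = 3 * (3 * 3) # 27: three stacked 3×3 convs

n = 7
asymmetric = 1 * n + n * 1   # 14: a 1×7 conv followed by a 7×1 conv
full       = n * n           # 49

print(two_threes / five_by_five)  # 0.72 -> 28% fewer weights
print(asymmetric / full)          # ~0.29 -> over 3× fewer weights
```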


6. ResNet-50 (2015)

Fig. 6: ResNet-50 architecture, based on the GitHub code from keras-team.


In the past few CNNs, we have seen nothing but an increasing number of layers, with better performance as a result. But “with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly.” The folks from Microsoft Research addressed this problem with ResNet — using skip connections (a.k.a. shortcut connections, residuals) while building deeper models.

ResNet is one of the early adopters of batch normalisation (the batch norm paper authored by Ioffe and Szegedy was submitted to ICML in 2015). Shown above is ResNet-50, with 26M parameters.

The basic building blocks for ResNets are the conv and identity blocks. Because they look alike, ResNet-50 is often summarised as a stack of these two block types (don’t quote me for this!).
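The core idea of a skip connection is simply to add the block’s input back to its output before the final activation, so the block only has to learn a residual. A toy NumPy sketch — the ‘weight layers’ here are stand-ins, not ResNet’s actual convolutions:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, weight_layers):
    """y = ReLU(F(x) + x): the block learns a residual F(x) on top of identity x."""
    out = x
    for f in weight_layers:
        out = f(out)
    return relu(out + x)  # skip connection: add the input back

# Stand-in 'layers'; in ResNet these would be conv + batch norm
layers = [lambda t: 0.5 * t, lambda t: t - 1.0]

x = np.array([2.0, 4.0])
print(residual_block(x, layers))  # ReLU((0.5x - 1) + x) = [2. 5.]
```

If the weight layers learn to output zero, the block reduces to the identity, which is what makes very deep stacks trainable.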

⭐️ What’s novel?

  1. Popularised skip connections (they weren’t the first to use skip connections).
  2. Designing even deeper CNNs (up to 152 layers) without compromising the model’s generalisation power
  3. Among the first to use batch normalisation.


7. Xception (2016)

Fig. 7: Xception architecture, based on the GitHub code from keras-team. Depthwise separable convolutions are denoted by ‘conv sep.’

Xception is an adaptation of Inception, where the Inception modules have been replaced with depthwise separable convolutions. It also has roughly the same number of parameters as Inception-v1 (23M).

Xception takes the Inception hypothesis to an eXtreme (hence the name). What’s the Inception hypothesis again? Thank goodness this was explicitly and concisely mentioned in this paper (thanks François!).

  • Firstly, cross-channel (or cross-feature map) correlations are captured by 1×1 convolutions.
  • Subsequently, spatial correlations within each channel are captured via the regular 3×3 or 5×5 convolutions.

Taking this idea to the extreme means performing a 1×1 convolution across channels, then a separate 3×3 convolution on each output channel. This is almost identical to replacing each Inception module with a depthwise separable convolution.
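The saving from a depthwise separable convolution is easy to quantify: a standard k×k convolution needs k²·C_in·C_out weights, while a depthwise k×k (one filter per channel) plus a 1×1 pointwise convolution needs only k²·C_in + C_in·C_out. With illustrative channel counts (not from the paper):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a regular k×k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in  # one k×k filter per input channel
    pointwise = c_in * c_out  # 1×1 conv mixing channels
    return depthwise + pointwise

# Illustrative sizes
k, c_in, c_out = 3, 128, 256
std = standard_conv_params(k, c_in, c_out)   # 294,912
sep = separable_conv_params(k, c_in, c_out)  # 33,920
print(std, sep, round(std / sep, 1))         # roughly 8.7× fewer weights
```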

⭐️What’s novel?

  1. Introduced CNN based entirely on depthwise separable convolution layers.


8. Inception-v4 (2016)

Fig. 8: Inception-v4 architecture. This CNN has an auxiliary network (which is discarded at inference time). *Note: All convolutional layers are followed by batch norm and ReLU activation. Architecture is based on their GitHub code.

The folks from Google strike again with the 43M-parameter Inception-v4. Again, this is an improvement over Inception-v3. The main difference is the Stem group and some minor changes in the Inception-C module. The authors also “made uniform choices for the Inception blocks for each grid size.” They also mentioned that having “residual connections leads to dramatically improved training speed.”

All in all, the authors noted that Inception-v4 works better primarily because of its increased model size.

✨What’s improved from the previous version, Inception-v3?

  1. Change in Stem module.
  2. Adding more Inception modules.
  3. Uniform choices of Inception-v3 modules, meaning using the same number of filters for every module.


9. Inception-ResNet-V2 (2016)

Fig. 9: Inception-ResNet-V2 architecture. *Note: All convolutional layers are followed by batch norm and ReLU activation. Architecture is based on their GitHub code.

In the same paper as Inception-v4, the same authors also introduced Inception-ResNets — a family of Inception-ResNet-v1 and Inception-ResNet-v2. The latter member of the family has 56M parameters.

✨What’s improved from the previous version, Inception-v3?

  1. Converting Inception modules to Residual Inception blocks.
  2. Adding more Inception modules.
  3. Adding a new type of Inception module (Inception-A) after the Stem module.


10. ResNeXt-50 (2017)

Fig. 10: ResNeXt architecture, based on their paper.

If you’re thinking about ResNets, yes, they are related. ResNeXt-50 has 25M parameters (ResNet-50 has 25.5M). What’s different about ResNeXts is the addition of parallel towers/branches/paths within each module, as seen above, indicated by ‘total 32 towers.’

⭐️ What’s novel?

  1. Scaling up the number of parallel towers (“cardinality”) within a module (well I mean this has already been explored by the Inception network…)
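The trick is that raising cardinality keeps the parameter budget roughly constant: a ResNet bottleneck (256→64→64→256) and a ResNeXt block with 32 towers of width 4 (256→4→4→256 each) have almost the same number of weights. A sketch (biases and batch norm ignored):

```python
def bottleneck_params(c_in, width, c_out):
    """Weights in a 1×1 reduce -> 3×3 -> 1×1 expand bottleneck."""
    return c_in * width + 3 * 3 * width * width + width * c_out

# ResNet-50 style bottleneck: one tower of width 64
resnet = bottleneck_params(256, 64, 256)

# ResNeXt 32×4d: 32 parallel towers of width 4, outputs summed
resnext = 32 * bottleneck_params(256, 4, 256)

print(resnet, resnext)  # 69,632 vs 70,144 -- nearly identical budgets
```

So the experiments in the paper compare cardinality against width/depth at equal cost, rather than simply adding capacity.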


Appendix: Network In Network (2014)

Recall that in a convolution, an output value is a linear combination of the filter weights and the values in the current sliding window. The authors proposed that instead of this linear combination, we use a mini neural network with 1 hidden layer. This is what they coined Mlpconv. So what we’re dealing with here is a (simple, 1-hidden-layer) network in a (convolutional neural) network.

This idea of Mlpconv is equivalent to stacked 1×1 convolutions, and became a main feature of the Inception architectures.

⭐️What’s novel?

  1. MLP convolutional layers, 1×1 convolutions
  2. Global average pooling (taking average of each feature map, and feeding the resulting vector into the softmax layer)
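Global average pooling replaces the flatten-plus-dense head: each feature map is collapsed to its mean, giving one value per channel to feed the softmax. A NumPy sketch:

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Collapse an (H, W, C) tensor to a length-C vector of per-channel means."""
    return feature_maps.mean(axis=(0, 1))

# Toy example: a 2×2 spatial grid with 3 channels
fmap = np.arange(12, dtype=float).reshape(2, 2, 3)
print(global_average_pooling(fmap))  # [4.5 5.5 6.5]
```

Beyond cutting parameters, this head has no weights at all, which also acts as a structural regulariser.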


  • Paper: Network In Network
  • Authors: Min Lin, Qiang Chen, Shuicheng Yan. National University of Singapore
  • arXiv preprint, 2013

Resources for neural network visualisation

Here are some resources for you to visualise your neural network:

Similar articles

CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more ….

A Simple Guide to the Versions of the Inception Network


I have used the above-mentioned papers that produced the architectures for reference. On top of these, here are some others I used for this article:

Implementation of deep learning models from the Keras team

Lecture Notes on Convolutional Neural Network Architectures: from LeNet to ResNet

Review: NIN — Network In Network (Image Classification)

Is there any error that you noticed in the visualisation? Is there anything else that you think I should’ve included? Drop me a comment below!

Special thanks to Wei Qi, Ren Jie, Fu Nan, Shirlene and Derek for reviewing this article.

Follow me on Twitter @remykarem or LinkedIn. You may also reach out to me via email, or feel free to visit my website.

AIAP Batch 2 Graduates!

The second batch of apprentices from the AI Apprenticeship Programme (AIAP) has recently graduated! After 9 months of extensive research, exploration, discussions and hacking, they are now ready to call themselves “AI engineers”. What has the journey been like for them? Let us hear some of their stories.

Delon Leonard

Delon first got to know about AIAP from the AI Singapore website. He had always been interested in the use of AI to solve practical problems since his university days and prior to AIAP, he had worked in a research department in a statutory board. To make the selection into the programme, Delon practised Kaggle challenges and coding interview questions and this had helped him develop his skills in computational thinking needed for the role.

Delon Leonard

The apprenticeship programme not only exposed Delon to a variety of practical problems which can be solved using AI, it also taught him a wide range of deep learning and statistical modeling techniques. Throughout the programme, he was greatly appreciative of his project mate, mentor and project manager’s support and guidance.

After having successfully worked with a real-world problem during the apprenticeship programme, Delon is set to further hone his skills in AI as he joins another statutory board to work on AI-related projects.

Tong Jieqi

Jieqi is an NUS computer science graduate and had worked as a software engineer for three-and-a-half years at a tech start-up prior to AIAP. The use of AI to solve different problems in recent years – from the automation of processes too mundane and repetitive for humans to perform to the rise of personalised services targeting individual consumers – piqued her interest in this domain. She saw the potential for AI to be applied in just about any industry and that motivated her to apply for the programme. Although Python was not her primary language as a software engineer, she had no problem picking it up and became proficient enough to make it through the selection.

Tong Jieqi

During the first 6 weeks of the programme, the apprentices went through a curated series of self-study lessons which Jieqi found useful as a guide to further explore the AI landscape on her own. Having the opportunity to work on an actual project thereafter allowed her to put into practice what she had learnt. It was also interesting for her to work with people from different backgrounds, many of whom were willing to share their experiences and knowledge. She also found the opportunities to share about her own learning journey with various VIP groups who came to visit as well as at a special AI for Everyone (AI4E) International Women’s Day session very fulfilling.

While the work had been challenging, Jieqi believes that having a positive attitude helped her successfully complete the apprenticeship programme. She looks forward to re-joining the tech start-up she left but with a different job scope where she will have more opportunities to contribute within the company in the data science domain, in addition to software engineering.

Shirlene Liew

A mechanical engineer by training, Shirlene did science policy and industry development in a government agency after graduation. Subsequent further studies in healthcare systems and operations brought her into contact with data analytics. Along the way, she became proficient in R and also picked up a bit of Python. With the birth of her daughter at the end of her master’s programme, she took 2 years off to be a full-time mother. When she was ready to re-enter the workforce, she chanced upon the apprenticeship programme from the AI Singapore website and decided to apply for it.

Shirlene Liew

Looking to deepen her technical skills to complement her policy-making and project management experience, Shirlene had the opportunity to work for a health-tech client, helping to develop models for fraud detection in healthcare images. Although there were times when dense equations and concepts appeared too daunting, her personal interest in healthcare and the supportive learning environment in the apprenticeship programme were instrumental in making the journey a positive experience.

Shirlene believes that to grow in a field, it is important to be part of a community. Even as the apprenticeship ends, she will be keeping in touch with her peers. She will be joining a data security AI start-up as a product manager where she will get to combine her technical and project management skills.

Soh Wee Tee

With a PhD in physics and working as a researcher in magnetism and spintronics, Wee Tee first became interested in AI in early 2018. Before long, he was reading up extensively on machine learning techniques as well as working on Kaggle-type projects a few months prior to the AIAP selection.

Soh Wee Tee

A plus for Wee Tee about the apprenticeship programme were the very open, friendly and collaborative discussions that took place every day. While his main project involved anomaly detection, that did not prevent him from venturing into other areas like natural language processing and reinforcement learning. A particularly proud moment was being able to implement the AlphaZero algorithm, the chess-playing AI, from scratch.

Wee Tee is of the opinion that it is important to stay in touch with rapid developments in AI by reading important papers, while also maintaining clarity about one’s interests in order not to lose focus. Working hands-on on projects that one finds interesting and learning along the way is the best way to acquire concepts and skills.

Grateful for having been given the chance to transition smoothly into the AI industry, Wee Tee will be working in the field of natural language processing on projects to better the lives of Singaporeans.

Every journey has a beginning and an end. Although they came from different backgrounds and were at different stages in life, the common interest in AI and the belief in the positive impact they can make with this technology brought them together. Whether it was huddling together to debug a piece of code, presenting their findings or just chilling out over a beer after work, everyone has something to remember about the 9-month apprenticeship programme. 

Let us wish them success in their new endeavours!

Expedia Group and AISG Ink Collaboration Under 100E Programme to Develop AI Solution to Transform the Online Search Experience for Asian Travellers

Expedia Group today announced a collaboration with AI Singapore (AISG) under its flagship 100 Experiments (100E) programme to develop an artificial intelligence (AI) solution to transform the online search experience for Asian travellers. As the first online travel platform to collaborate with AISG for 100E, Expedia Group will provide a team of experienced engineers, data scientists and marketers to work with the AISG’s project lead, project managers and AI apprentices to enhance travel search query understanding and improve the accuracy of search query resolution in Asian languages.  

With English as the dominant language online, used by 25% of all Internet users[1], today’s search engines are extremely efficient at understanding travel search queries and providing query resolutions in the English language. However, when dealing with travel search queries conducted in Asian languages such as Japanese, Korean, simplified Chinese and traditional Chinese, the performance of the search engines declines significantly and the accuracy of query resolution dips. For a start, the Expedia Group and AI Singapore project team will leverage natural language processing and machine learning to develop an AI-based model to enhance search query understanding and resolution in the Japanese language, before extending the model to other Asian languages to enhance online search efficiency.

When completed, the AI solution will enable Expedia Group to deepen its understanding of travel search query patterns and nuances in Asian languages, and equip the travel platform with the ability to serve the needs of Asian travellers better by improving the accuracy and efficiency of search query resolution.

“All across Asia Pacific, online travel spending is growing at a very rapid pace. By 2020, the Asia Pacific region will account for more than 40% of global online travel sales, surpassing North America  and Western Europe[2]. With language processing research focused predominantly in English, Expedia Group sees a critical need to look at leveraging technologies such as machine learning and natural language processing to enhance our AI solution to improve the efficiency of search query resolution for Asian languages,” said Kevin Ng, Senior Director, Product and Technology, Expedia Group. “Through the collaboration with AI Singapore, this will not only enable us to gather the right data, technology and talents to build a viable AI solution that can enhance search query understanding, it also enables us to transform the online search experience for Asian travellers to better serve our consumers in the long haul.”

“Under AISG’s 100E collaboration with Expedia Group, machine learning is applied to better understand search queries for Asian languages in the growing online travel market in Asia.  More importantly, while we collaborate and co-create the machine learning solution with Expedia Group, we are also using the opportunity to train Singaporean AI engineers via the AI Apprenticeship Programme,” said Laurence Liew, Director, AI Industry Innovation, AI Singapore.

“Expedia Group is the world’s travel platform. With more than 7,000 data architects, artificial intelligence experts and engineering specialists at Expedia Group, this collaboration with AISG has the potential to unlock new and industry-leading AI solutions to improve online travel, bringing the world within reach for those shopping for travel online in Asian languages. We are excited to collaborate with AISG and look forward to working with some of Singapore’s world-class tech talents,” said Mark Okerstrom, President and Chief Executive Officer, Expedia Group.

AISG’s flagship 100E programme matches companies keen to use AI to address their problem statements with local researchers interested in tackling those problems. The 100E programme has received strong industry interest to date. Since its launch in 2017, more than 300 companies have contacted AISG with interest in participating in the 100E programme; 40 projects have been approved and started, with some projects completed and deployed after nine months of research and development.

[1] Source: Statista most common languages used on the internet as of April 2019:

[2] Source: Statista Distribution of Digital Travel Sales Worldwide from 2014 to 2020, by region:

Proteona and AI Singapore Partner to Improve Cell Therapies and IO Treatments through Single Cell Analysis of Tumors and CAR T Cells

Proteona Pte. Ltd. has announced its participation in AI Singapore’s 100 Experiments (100E) programme to develop AI tools for single cell multi-omics data analysis. The project is being conducted in collaboration with Prof Wong Limsoon, Kwan Im Thong Hood Cho Temple Chair Professor from the National University of Singapore (NUS) School of Computing, a leading expert in bioinformatics and computational biology. Together with Proteona bioinformaticians and data scientists, the team aims to solve key challenges in single cell data analysis using artificial intelligence tools.

A key obstacle of single cell data analysis is combining datasets from different sources such as different patient samples and obtaining robust cell clustering and cell-type annotation. Single cell analysis often leads to the discovery of novel cell populations with features that had not been previously observed. Clinical samples, such as tumor biopsies, are known to be very heterogeneous, making cell type identification very challenging. Moreover, single-cell analysis is prone to noise and batch-effects that make comparisons across experiments difficult.

As a result of these challenges, cell clustering and cell annotation usually requires extensive manual intervention. This is time consuming, requires specialized knowledge and expertise, and is prone to human error and bias.

“Batch effects are prevalent in -omics data. This is particularly pronounced in single-cell measurements. Profiles from one batch are not directly compatible with that from another batch.” says Prof Limsoon Wong, NUS School of Computing. “The AI-driven components here will facilitate a more convenient and explicit identification of the specific protein complexes and biological circuits relevant to cell-types and states.”

With this collaboration, the team will further develop their robust computational workflows for knowledge-driven analysis, with an AI-based system trained using Proteona’s in-house annotated datasets. Proteona’s ESCAPE™ RNA-Seq technology and services simultaneously measures both proteomic and transcriptional expression at single-cell resolution. The developed AI-analysis will leverage this unique modality to enable deeper insights into single-cell biology.

“An immediate outcome of this collaboration will be a tool to improve the quality of results presented to our customers. It will save them time in annotating known cell types and correcting for batch effects. This platform is also used internally as a way for building our database of cell types and cell states which is then used for better annotating our customer’s data. We will also use these tools for our internal programs in biomarker discovery and diagnostic development,” says Dr Andreas Schmidt, CEO of Proteona.

“We see the merging of biotechnology and data-driven IT as one of the biggest value drivers in the health industry. With Proteona’s single cell proteogenomic data platform the company is in a unique position to impact health decisions for therapy development and the clinic,” explains Chou Fang Soong, General Partner Pix Vine Capital, one of Proteona’s investors.

With founders Prof Gene Yeo of UCSD, Prof Jonathan Scolnick of NUS and Deputy Director of the Molecular Engineering Laboratory, A*STAR, Dr Shawn Hoon, Proteona has strong roots in cutting edge academic discoveries around the world. The Proteona – AI Singapore consortium actively seeks additional partners from the cell therapy and hematology-oncology communities to contribute to their international single cell analysis initiative.
