
Retrieving the Right Information About COVID-19 With Golden Retriever

COVID-19, which has spread from Wuhan, China, to the rest of the world in recent weeks, has caused much public concern. During this outbreak, much information and misinformation has been shared online and offline. It is sometimes difficult to tell them apart, at least without exercising some due diligence by checking reputable websites.

To help combat misinformation, AI Singapore (AISG) developed an information retrieval application for questions about COVID-19. It is built with AISG's Golden Retriever tool and draws on information from the Ministry of Education and Ministry of Health FAQ websites (updated as of 10 Feb).

Golden Retriever uses an AI model called a Transformer that is able to understand words in context, as well as to account for synonyms. A traditional approach, like a keyword search, on the other hand, will only be able to locate exact phrases in the document. Instead of hunting for your answer by searching using different keywords, Golden Retriever is able to return a ranked list of answers that best fit a query phrase. This requires a knowledge base that has been divided into individual clauses. Each clause is a discrete piece of text like a paragraph, or an answer to a FAQ. The good thing is that no model training is necessary as it is pre-trained, so it is already able to match questions to the appropriate clause out-of-the-box.
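To make the contrast with keyword search concrete, here is a minimal sketch of how a ranked list could be produced once the query and every clause have been turned into vectors. It is not Golden Retriever's actual code: the three-number vectors are hand-made placeholders standing in for what a Transformer encoder would output, and `cosineSimilarity`, `rankClauses` and the clause labels are hypothetical names chosen for illustration.

```javascript
// Toy sketch only: vectors are hand-made placeholders, not model output.

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every clause against the query and sort the best match first.
function rankClauses(queryVector, clauses) {
  return clauses
    .map(function (clause) {
      return { label: clause.label, score: cosineSimilarity(queryVector, clause.vector) };
    })
    .sort(function (x, y) { return y.score - x.score; });
}

// Hypothetical knowledge base: one entry per clause.
const knowledgeBase = [
  { label: 'clause about school arrangements', vector: [0.1, 0.9, 0.2] },
  { label: 'clause about hand hygiene', vector: [0.9, 0.1, 0.0] },
  { label: 'clause about seeking medical attention', vector: [0.2, 0.1, 0.9] }
];

// Placeholder vector for a query about keeping hands clean.
const queryVector = [0.8, 0.2, 0.1];

rankClauses(queryVector, knowledgeBase).forEach(function (hit) {
  console.log(hit.score.toFixed(3) + '  ' + hit.label);
});
```

Because the ranking depends on vector similarity rather than shared words, a query and a clause can match even when they use different vocabulary, provided the encoder places them close together.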

To use Golden Retriever, head here. Make sure the Knowledge Base selected is “COVID-19”. Type your question into the query box and hit “Fetch”. It is as simple as that!

As with all machine learning tools, the answers are not guaranteed to be accurate every time, and should not be used for diagnostic purposes or to make important decisions. For any clarification on the content delivered, please consult the relevant authority.

Golden Retriever is a Brick (a pre-built solution) from AI Makerspace, a platform offered by AISG to help SMEs and start-ups accelerate the adoption of AI in Singapore.


Getting Daily COVID-19 Situation Reports from WHO

Since 21 January, 2020 (two days before the Wuhan lockdown), the World Health Organization (WHO) has been publishing a daily situation report on the global status of the COVID-19 outbreak. It collects information provided by national authorities each day as of 10 am Central European Time (CET) and publishes the aggregated report by the end of the work day in Geneva. This corresponds to the early hours of the next calendar day in Singapore.
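The time-zone arithmetic behind that last sentence can be checked with a small sketch. It assumes fixed offsets of UTC+1 for CET and UTC+8 for Singapore, and the 18:00 publication time is only an illustrative guess at "end of the work day"; `cetToSingapore` is a hypothetical helper name.

```javascript
// Fixed offsets assumed: CET = UTC+1, Singapore = UTC+8,
// so Singapore is 7 hours ahead of CET.
const HOURS_AHEAD_OF_CET = 8 - 1;

// Convert an hour of the day in CET to Singapore time.
// nextDay is true when the result rolls over into the following calendar day.
function cetToSingapore(hourCet) {
  const shifted = hourCet + HOURS_AHEAD_OF_CET;
  return { hour: shifted % 24, nextDay: shifted >= 24 };
}

console.log(cetToSingapore(10)); // data cut-off at 10 am CET
console.log(cetToSingapore(18)); // illustrative publication time, not a WHO-stated figure
```

Under these assumptions, the 10 am cut-off falls at 5 pm in Singapore on the same day, while a report published at 6 pm in Geneva appears at 1 am the next day in Singapore.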

Among the information available is a list of figures of confirmed COVID-19 cases by country or territory. This is arguably the most important section of the report for decision makers and anyone wishing to keep abreast of the constantly evolving situation.

Part of the report from 15 February, 2020

Getting these figures involves opening the main WHO COVID-19 situation report webpage, clicking on the report link with the latest date and then scrolling through the downloaded report to find them. Did you know that these are well-defined actions which can be executed by the RPA (robotic process automation) tool TagUI? The TagUI team in AI Singapore (AISG) did just that and also added an extra step to email the captured information to a predefined distribution list. Now, mundane actions like opening the browser, copying and pasting specific sections of a report and writing the email can be off-loaded to the computer. The user only needs a single click to start the process.

The video below shows the tool in action.

If you are interested in the workflow code, you can view it below.

// This automation workflow grabs WHO daily situation report (PDF file),
// extracts a summary of the latest statistics, and emails the summary.
// It uses AI Singapore's TagUI cutting edge version, available here -
// https://github.com/kelaberetiv/TagUI#set-up

// This workflow is designed for macOS as keyboard shortcuts are
// specific to macOS. Also, for WHO daily report landing page,
// Chrome browser is set to 150% zoom to ensure correct matching
// using OCR and computer vision to interact with the website.

// For the Gmail portion, the browser zoom is set to 125% for better
// visibility during recording of the video. Computer vision and OCR
// are used in this workflow as Gmail now blocks browser automation,
// thus Gmail interactions cannot be done using web element identifiers.

// The automation can be run using - tagui who_report.tag -nobrowser
// Or scheduled using crontab scheduler to repeat daily automatically

js begin
// define list of countries for accurate extraction from PDF table
// primary list from countries in the report
country_list = [
    'Singapore',
    'Japan',
    'Republic of Korea',
    'Malaysia',
    'Australia',
    'Viet Nam',
    'Philippines',
    'Cambodia',
    'Thailand',
    'India',
    'Nepal',
    'Sri Lanka',
    'United States of America',
    'Canada',
    'Germany',
    'France',
    'The United Kingdom',
    'Italy',
    'Russian Federation',
    'Spain',
    'Belgium',
    'Finland',
    'Sweden',
    'United Arab Emirates',
    'Iran (Islamic Republic of)',
    'Egypt',
    'International conveyance'
]

country_list_backup = ["Afghanistan", "Albania", "Algeria", "American Samoa", "Andorra", "Angola", "Anguilla", "Antarctica", "Antigua And Barbuda", "Argentina", "Armenia", "Aruba", "Austria", "Azerbaijan", "Bahamas", "Bahrain", "Bangladesh", "Barbados", "Belarus", "Belize", "Benin", "Bermuda", "Bhutan", "Bolivia", "Bonaire, Sint Eustatius And Saba", "Bosnia And Herzegovina", "Botswana", "Bouvet Island", "Brazil", "British Indian Ocean Territory", "Brunei Darussalam", "Bulgaria", "Burkina Faso", "Burundi", "Cameroon", "Cape Verde", "Cayman Islands", "Central African Republic", "Chad", "Chile", "Christmas Island", "Cocos Islands", "Colombia", "Comoros", "Congo", "Democratic Republic Of The Congo", "Cook Islands", "Costa Rica", "Croatia", "Cuba", "Cyprus", "Czech Republic", "Denmark", "Djibouti", "Dominica", "Dominican Republic", "Ecuador", "El Salvador", "Equatorial Guinea", "Eritrea", "Estonia", "Ethiopia", "Falkland Islands", "Faroe Islands", "Fiji", "French Guiana", "French Polynesia", "French Southern Territories", "Gabon", "Gambia", "Georgia", "Ghana", "Gibraltar", "Greece", "Greenland", "Grenada", "Guadeloupe", "Guam", "Guatemala", "Guernsey", "Guinea", "Guinea-bissau", "Guyana", "Haiti", "Heard Island And Mcdonald Islands", "Holy See", "Honduras", "Hong Kong", "Hungary", "Iceland", "Indonesia", "Iraq", "Ireland", "Isle Of Man", "Israel", "Jamaica", "Jersey", "Jordan", "Kazakhstan", "Kenya", "Kiribati", "North Korea", "Kuwait", "Kyrgyzstan", "Lao People's Democratic Republic", "Latvia", "Lebanon", "Lesotho", "Liberia", "Libya", "Liechtenstein", "Lithuania", "Luxembourg", "Macao", "Macedonia", "Madagascar", "Malawi", "Maldives", "Mali", "Malta", "Marshall Islands", "Martinique", "Mauritania", "Mauritius", "Mayotte", "Mexico", "Micronesia, Federated States Of", "Moldova", "Monaco", "Mongolia", "Montenegro", "Montserrat", "Morocco", "Mozambique", "Myanmar", "Namibia", "Nauru", "Netherlands", "New Caledonia", "New Zealand", "Nicaragua", "Niger", "Nigeria", "Niue", 
"Norfolk Island", "Northern Mariana Islands", "Norway", "Oman", "Pakistan", "Palau", "Palestine, State Of", "Panama", "Papua New Guinea", "Paraguay", "Peru", "Pitcairn", "Poland", "Portugal", "Puerto Rico", "Qatar", "Reunion", "Romania", "Rwanda", "Saint Barthelemy", "Saint Helena, Ascension And Tristan Da Cunha", "Saint Kitts And Nevis", "Saint Lucia", "Saint Martin", "Saint Pierre And Miquelon", "Saint Vincent And The Grenadines", "Samoa", "San Marino", "Sao Tome And Principe", "Saudi Arabia", "Senegal", "Serbia", "Seychelles", "Sierra Leone", "Sint Maarten", "Slovakia", "Slovenia", "Solomon Islands", "Somalia", "South Africa", "South Georgia And The South Sandwich Islands", "South Sudan", "Sudan", "Suriname", "Svalbard And Jan Mayen", "Swaziland", "Switzerland", "Syrian Arab Republic", "Taiwan", "Tajikistan", "Tanzania", "Timor-leste", "Togo", "Tokelau", "Tonga", "Trinidad And Tobago", "Tunisia", "Turkey", "Turkmenistan", "Turks And Caicos Islands", "Tuvalu", "Uganda", "Ukraine", "United States Minor Outlying Islands", "Uruguay", "Uzbekistan", "Vanuatu", "Venezuela, Bolivarian Republic Of", "Virgin Islands, British", "Virgin Islands", "Wallis And Futuna", "Western Sahara", "Yemen", "Zambia", "Zimbabwe"];

// append secondary list from ISO definitions as a backup
country_list = country_list.concat(country_list_backup)
js finish

// launch user normal Chrome browser (to allow Gmail login)
// to capture statistics from the latest situation report
js clipboard('https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/')
keyboard [cmd][space]
keyboard chrome[enter]

// wait for a few seconds to ensure Chrome is ready to launch new tab
// user might have an open session, can't wait for Google image snapshot
wait

// click on Google taskbar icon to ensure browser is in focus
click google.png

// before pasting report URL from clipboard to URL textbox
keyboard [cmd]t
keyboard [cmd]l
keyboard [cmd]v
keyboard [enter]

// click on situation report link with running report number
// for first run, set report_count.txt to latest count minus 1
// clicking of the link is done using OCR and computer vision
load report_count.txt to report_count
report_count = (parseInt(report_count.trim(), 10) + 1).toString()
dump `report_count` to report_count.txt
click Situation report - `report_count` using ocr

// click to ensure report PDF has been loaded
// before carrying out the next steps below
click who_logo.png

// copy report URL to clipboard for later use
keyboard [cmd]l
keyboard [cmd]c
report_url = clipboard()

// click to set PDF web control to be in focus
click who_logo.png

// use shortcut keys to copy and store PDF text
keyboard [cmd]a
keyboard [cmd]c
pdf_result = clipboard()

// define header text to be sent in the email distribution
statistics_summary = 'Below are the latest statistics from WHO daily report -' + '\n' + report_url + '\n'
statistics_details = ''

// special handling for China as it is no longer in the table
country_result = get_china_stats()
echo `country_result.name` - `country_result.stats`
statistics_details += country_result.name + ' - ' + country_result.stats + '\n'

// segment the section containing country statistics
country_start_marker = 'Western Pacific Region'; country_finish_marker = '*Case classifications'
pdf_result = pdf_result.substring(pdf_result.indexOf(country_start_marker), pdf_result.indexOf(country_finish_marker))

// remove region description for cleaner processing later
pdf_result = pdf_result.replace('Western Pacific Region', '')
pdf_result = pdf_result.replace('South-East Asia Region', '')
pdf_result = pdf_result.replace('Region of the Americas', '')
pdf_result = pdf_result.replace('European Region', '')
pdf_result = pdf_result.replace('Eastern Mediterranean Region', '')

// clean up countries with names displayed on multiple lines
pdf_result = pdf_result.replace('United States of\nAmerica', 'United States of America')
pdf_result = pdf_result.replace('International\nconveyance', 'International conveyance')

// clean up cruise ship Diamond Princess exception for stats
pdf_result = pdf_result.replace('(Diamond Princess)', '').replace('(Diamond\nPrincess)', '')

// remove line breaks to remove inconsistencies in the table
pdf_result = pdf_result.replace(/\n/g, '')

// scans through PDF text to extract country and statistics
for country_count from 1 to 999
{
    country_result = next_country_stats()
    echo `country_result.name` - `country_result.stats`
    statistics_details += country_result.name + ' - ' + country_result.stats + '\n'
    if country_result.name equals to 'International conveyance'
        break
}

// sort list of countries by alphabetical order
statistics_details = statistics_details.split('\n')
statistics_details = statistics_details.sort().join('\n')
statistics_summary += statistics_details

// save a copy of the email content for reference
dump `statistics_summary` to statistics.txt

// close Chrome tab for report pdf
keyboard [cmd]w

// close Chrome tab for main report page if still there
// some days report pdf launches in new tab, some don't
if present('report_tab.png')
    keyboard [cmd]w

// open Gmail to send statistics summary to mailing list
js clipboard('https://mail.google.com/mail/u/1/#inbox')
keyboard [cmd]t
keyboard [cmd]l
keyboard [cmd]v
keyboard [enter]

// click new email icon and wait for a few seconds,
// before using keyboard combinations to paste text.
// text can also be typed character by character,
// which is much slower as there's a lot of text.
click new_email.png
wait 2.5 seconds

// paste email distribution list to To: field
js clipboard('distribution_list@aisingapore.org')
keyboard [cmd]v
keyboard [tab]

// paste email subject to Subject: field
js clipboard('WHO COVID-19 Daily Report')
keyboard [cmd]v
keyboard [tab]

// paste email content to email body
js clipboard(statistics_summary)
keyboard [cmd]v

// click on send button to send email
click send_email.png
wait 10 seconds

// close Gmail browser tab
keyboard [cmd]w

The complete set of files is also available here as a single download for your reference. Feel free to adapt them to your needs.
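The listing calls `get_china_stats()` and `next_country_stats()`, whose definitions are not shown in it. As a rough idea of what the extraction step involves, here is a hypothetical sketch in plain JavaScript. It is not the workflow's actual helper: it assumes the table text has already been flattened into one string, locates each known country name, and treats the text up to the next country name as that country's statistics. The function name `extractCountryStats` and the sample numbers are made up for illustration and are not WHO figures.

```javascript
// Hypothetical sketch, not the workflow's own next_country_stats().
// text:      flattened table text, e.g. 'Singapore 10 1 0Japan 20 2 0'
// countries: list of country names to look for
function extractCountryStats(text, countries) {
  // Record every occurrence of every country name.
  const hits = [];
  countries.forEach(function (name) {
    let pos = text.indexOf(name);
    while (pos !== -1) {
      hits.push({ name: name, start: pos, end: pos + name.length });
      pos = text.indexOf(name, pos + 1);
    }
  });

  // Drop a hit that sits inside a longer one (e.g. 'Niger' inside 'Nigeria').
  const kept = hits.filter(function (hit) {
    return !hits.some(function (other) {
      return other !== hit &&
        other.start <= hit.start &&
        other.end >= hit.end &&
        (other.end - other.start) > (hit.end - hit.start);
    });
  });

  // Order by position, then slice out the text between consecutive names.
  kept.sort(function (a, b) { return a.start - b.start; });
  return kept.map(function (hit, i) {
    const stop = i + 1 < kept.length ? kept[i + 1].start : text.length;
    return { name: hit.name, stats: text.slice(hit.end, stop).trim() };
  });
}

// Made-up sample row data, only to show the shape of the output.
const sampleText = 'Singapore 10 1 0Japan 20 2 0Republic of Korea 30 3 0';
const sampleCountries = ['Japan', 'Republic of Korea', 'Singapore'];

extractCountryStats(sampleText, sampleCountries).forEach(function (row) {
  console.log(row.name + ' - ' + row.stats);
});
```

Using country names as delimiters is one way to cope with a table whose line breaks have been stripped, which is presumably why the workflow defines such a long country list up front.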

TagUI is a Brick (a pre-built solution) from AI Makerspace, a platform offered by AISG to help SMEs and start-ups accelerate the adoption of AI in Singapore.


Tips for Presenting Data Insights

In my many years of experience in the Data Science industry, I’ve noticed that there are two main groups of data scientists:

  1. Those who focus on building data products.
  2. Those who focus on using machine learning to provide actionable insights to stakeholders.

If you belong to the second group, this article is written exactly for you.

The main objective of any insights presentation is to get stakeholders to adopt the insights presented so as to solve business challenges. Many people ask, “We are helping these folks, so why does the responsibility to convince lie with the data scientist?” Well, Data Science is a new function in most organizations, and most of them have no idea how it really works. Stakeholders have long been making decisions based on gut feel and, let’s face it, that is much easier. Moreover, a data scientist needs to constantly build credibility and convince stakeholders that data can provide value to their work. Without stakeholder support, the data scientist will have difficulty providing value to the organization.

Here are a few tips for your upcoming presentation.

Tip 1: Talk in their “Language”

As a simple analogy, imagine that I drop you off in a foreign country where people do not speak English, or understand very little of it. There will be a communication problem, right? When you prepare your data insights presentation, speak in the language of your audience (mostly business users). If you are doing marketing analytics, translate the insights into suitable marketing terms or show how they can inform marketing strategies. For instance, let us compare these two statements:

“Based on our model, a one standard deviation increase in TV advertising spend translates to a revenue increase of 2.78 standard deviations.”

VS

“Based on our model, every dollar we put into TV advertising translates to a revenue increase of 2 dollars.”

It is obvious that the second statement can be more readily understood by many. No doubt, it means more work for the data scientist to do the conversion, but it helps a lot in building stronger communication with stakeholders. Remember to look through your presentation and see how you can “speak their language”. Every time you spot a piece of technical jargon, it is an opportunity to translate it into something more meaningful for the stakeholders.
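The conversion itself is usually a one-line calculation. As a sketch, assuming the model is a linear regression fitted on standardised variables, a standardised coefficient can be turned into a dollars-per-dollar figure by multiplying it by the ratio of the two standard deviations. The standard deviations below are hypothetical values picked so that the result lines up with the two statements above; `toRawCoefficient` is an illustrative name.

```javascript
// Convert a standardised regression coefficient into original units:
// rawCoefficient = standardisedCoefficient * sd(outcome) / sd(predictor)
function toRawCoefficient(standardisedCoefficient, sdPredictor, sdOutcome) {
  return standardisedCoefficient * sdOutcome / sdPredictor;
}

// Hypothetical standard deviations in dollars (not taken from any real dataset).
const sdTvSpend = 139000;
const sdRevenue = 100000;

const dollarsPerDollar = toRawCoefficient(2.78, sdTvSpend, sdRevenue);
console.log('Revenue per TV advertising dollar: ' + dollarsPerDollar.toFixed(2));
```

With these assumed spreads, 2.78 standard deviations of revenue per standard deviation of spend works out to roughly 2 dollars of revenue per advertising dollar.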

Tip 2: Show ME the MONEY!

For a data scientist to add value to the organization, remember this:

“SHOW ME THE MONEY!”

It is pure capitalism and economics. If the data scientist cannot add value to the business by helping it earn more money (more than the salary, please!), why would the business owner pay the data scientist? It would be a loss-making move. So to justify the salary drawn (be it high or low), the data scientist has to show the business where revenue opportunities can be gained and costs can be reduced.

Having said that, the next challenge after showcasing the data insights is to convince the stakeholders to use them in their strategy planning and execution. How to convince them? Well, show them the money! After the data insights presentation, you want to show them the potential revenue to be gained or the potential cost savings. Once you are able to provide some tangibles in the conclusion of your presentation, stakeholders have a better idea of what they are missing out on and can be convinced to undertake initiatives that tap into the insights generated.

A word of warning though: you have to manage the stakeholders’ expectations. The potential revenue gain or cost savings should never be plucked from the sky but derived through simulations or calculations, which brings me to the next tip.

Tip 3: Convince with Data

During the presentation, there are times when you have to make certain assumptions for your calculations (mentioned above) or analyses. In that case, always ask yourself if the assumptions can be supported by data. For instance:

“Taking our insights into consideration, we believe we can earn $400K by spending $100K on ad space in this list of websites. This $400K was derived by taking the ad response rate from the previous year, which was about 4%, and multiplying it by the expenditure of our customers in Group A last year.”

You will notice that the numbers in this statement were not made up but were supported by past data. In that sense, it will not be easy to dispute them since they reflect the past. Of course, this rests on the huge assumption that the future will be like the past. Having said that, is there a better way to get the numbers? If you have one, I would love to hear more. 🙂
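The arithmetic behind such a statement is simple enough to lay out explicitly, which also makes the assumptions easy to challenge. In the sketch below, the 4% response rate and the $100K ad spend come from the example; the $10 million Group A expenditure is not stated in the example but is the figure implied by a $400K estimate at a 4% rate, so treat it as an assumption. The function names are illustrative.

```javascript
// Expected revenue = past response rate * past expenditure of the target group.
function estimateRevenue(responseRate, groupExpenditure) {
  return responseRate * groupExpenditure;
}

// Revenue earned per dollar of ad spend.
function returnPerDollar(revenue, adSpend) {
  return revenue / adSpend;
}

const responseRate = 0.04;          // last year's ad response rate (from the example)
const groupAExpenditure = 10000000; // assumed: implied by $400K / 4%, not stated in the example
const adSpend = 100000;             // planned ad spend (from the example)

const revenue = estimateRevenue(responseRate, groupAExpenditure);
console.log('Estimated revenue: $' + Math.round(revenue));
console.log('Return per ad dollar: ' + returnPerDollar(revenue, adSpend).toFixed(1));
```

Writing the inputs out this way lets a stakeholder question a specific number, such as whether last year's response rate still holds, instead of the whole estimate.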

Conclusion

Those were a few tips for presenting your data insights to stakeholders. Depending on what you present, not all tips may be applicable but keep them in mind so that you, the data scientist, will continue to add value to your employer.

Have fun in your Data Science learning journey! 🙂

COVID-19 : Visualising the Outbreak

Since cases of COVID-19 infection started surfacing worldwide in the second half of January, data on the outbreak has turned into a torrent as the virus relentlessly widens its reach. Effective visualisation is often a way for the human mind to grasp the essential features of the underlying data. Let us see a few ways the outbreak has been represented on the Internet.

We start with a map showing the geospatial distribution of confirmed cases in Singapore. You can access it here. Every confirmed case is represented as a coloured circle, red for the most recent ones, orange for the rest. Click on a circle and you will receive more information on the stricken individual. It will also show places visited by the person as white circles, linked by blue lines from the original circle. The creator of this visualisation is reportedly a 32-year-old Singaporean with the Twitter handle Ottokyu. According to him, he updates the map whenever the Ministry of Health announces new information via Twitter.

Another visualisation of the COVID-19 situation in Singapore can be found on the Facebook page of a local user. Here is an example of a chart he has been updating over the past few days. It is not known what his information sources are, but an interesting and salient feature is that he chooses to represent the people affected as clusters. At the time of writing, the contagion is still relatively contained and it makes sense to accentuate clusters of people closely related to one another.

Turning our attention worldwide, we look at an animation of the COVID-19 outbreak at a global level. It is maintained by HealthMap, which according to its website is a team of researchers, epidemiologists and software developers who utilise online informal sources for disease outbreak monitoring and real-time surveillance. Click on the “animation spread” button and you will be presented with a day-by-day account of the officially confirmed cases in different parts of the world.

In situations like the present one, where spatial, relational and temporal features of an evolving situation have to be communicated effectively, visualisations are often the best means of doing so.

Note: These visualisations have, presumably, been done with the best of intentions. Check with the relevant health authorities (e.g. Ministry of Health in Singapore, WHO) for the most accurate information.


The Salamander ML Book Gets an Update!

Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow : Concepts, Tools, and Techniques to Build Intelligent Systems

This book is here to help you get your job done.

So wrote Aurélien Géron, the author of this well-received book, now into its second edition. That is probably an understatement. Not only do you get things done, you also gain a good grasp of the ins and outs of machine learning (ML). When the first edition came out in early 2017, it instantly became a favourite of mine. Since then, there have been several updates to it – a testimony to both the pace at which this field moves as well as to the diligence of the author.

In the final quarter of 2019, the time was finally ripe for a new edition, coinciding with the official release of TensorFlow 2.0. If you have been following developments in TensorFlow, you will know that tf.keras was introduced as an implementation of the popular high-level Keras API. “Keras” has now been added to the title and several chapters were re-written or added. The print version of the book is now a whopping 856 pages! Despite that, the book never feels bloated or unwieldy. You can find the complete details of the changes here.

A good book provides a coherent structure, with chapters which can be read independently, yet contains cross-references to relate the focus topic of the moment with other topics explained elsewhere in the book. On this point the author has done an admirable job.

Table of Contents

This new edition will continue to appeal to a broad section of ML practitioners. For a better understanding of the concepts behind models and techniques, it is a good book to start with. For more advanced folks, it also contains enough information for repeated referencing, whether for a deeper understanding of support vector machines or an evaluation of the performance of different deep learning optimisers (Adam? Nadam?). I especially appreciate the entirely new chapter (Chapter 9) on unsupervised learning, which fills a glaring gap in the previous edition.

Just as in the first edition, code examples are available on GitHub. They now come ready to run on Google Colab (hurray!). Accompanying the code snippets are often descriptions in simple English so that nobody has to get lost.

Code followed by simple English!

With the introduction of Keras as the high-level API, knowledge of it alone is sufficient for most use cases in deep learning (95% according to the author). However, in specific cases, it becomes necessary to write lower level TensorFlow code. The author identifies such cases (e.g. custom loss function, custom layer) and goes through them in detail.

It is my personal belief that ML practitioners should get into the habit of reading quality academic papers, especially those describing the algorithms behind the libraries they use for modeling. The book makes useful references to such papers where applicable, whether for more traditional ML techniques (e.g. mini-batch K-means) or neural-based ones (e.g. SentencePiece), to aid the discussions. What makes a good practitioner is often the intimate knowledge of the inner workings of models which mediocre ones simply treat as black boxes. With such pointers to relevant external content, the usefulness of the book expands far beyond its pages.

In mid-2019, Google released TensorFlow Extended (TFX) as a framework to provide production-grade support for ML pipelines. This is not covered in the book, which is totally understandable as the industry progresses at breakneck speed. However, the author is certainly in touch with the developments, and interested folks can peruse the slide deck he used to deliver a training session at TensorFlow World 2019 in Santa Clara here. Perhaps we can look forward to chapter 19 being augmented or a totally new chapter 20 in future. In the nearer term, look out for this upcoming title dedicated to ML deployment which the author is currently reviewing.

To close off, I highly recommend this book to folks new on the ML journey (who will gain much practical knowledge) as well as to experts (who will benefit from the thoughtful organisation of the subject matter with additional material for further reading).

Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow : Concepts, Tools, and Techniques to Build Intelligent Systems is also a recommended book for machine learning in the AI Apprenticeship Field Guide.

What It Takes To Be an AI Apprentice

Today, we welcome the fifth batch of apprentices into the AI Apprenticeship Programme (AIAP)™. Even as they embark on their journeys, getting used to the office they will call home for the next 9 months like many before them, AI Singapore has already started the application process for the sixth iteration of the award-winning programme. What does it take to be accepted into the programme? After running it for two years, we have a pretty good idea of the kind of candidates who do well.

To start off, the programme is one of deep-skilling rather than re-skilling. This means you come into the programme already equipped with some of the relevant skills, ready to work with us to build further capabilities.

We look for hidden gems and help polish them a bit. And then let them shine.

– Laurence Liew, director of Industry Innovation, AI Singapore

As AI engineering is a relatively new specialisation, many have come into it from different backgrounds and disciplines. Despite that, there is a common core of knowledge which we consider essential and which the motivated individual can acquire through self-study before starting AIAP™. We have set out this core knowledge in the AIAP Field Guide, now available in book form.


The guide lays out clearly the fields of knowledge the AIAP™ aspirant should be conversant with, as well as suggested resources for acquiring that knowledge within a 12-month period. Programme hopefuls are free to replace the learning platforms with their own preferences, especially if they already have partial knowledge in some areas (e.g. Python coding, statistics). Make no mistake: becoming an AI engineer is no walk in the park. You have to be comfortable with both theory and practice.

Of course, having technical knowledge is not the only important aspect. There are many “softer” skills which help you do well. These include being able to articulate problems, working with different viewpoints as a team and having the curiosity to keep up with developments in the field. Take a page from those who have gone before.

We hope that this field guide goes a long way towards accelerating the learning journey should you decide to apply for the apprenticeship programme!

The AIAP™ is the first TechSkills Accelerator Company-Led Training (TeSA-CLT) initiative in AI. This is a collaboration between AI Singapore and IMDA to develop a pipeline of AI professionals for the industry.
