
Data mining with Rattle and R

If you are keen to use R but need a quick start to solve some immediate data-mining problems, you can use Rattle.

From the Rattle website:

Rattle is a popular GUI for data mining using R. It presents statistical and visual summaries of data, transforms data so that it can be readily modelled, builds both unsupervised and supervised machine learning models from the data, presents the performance of models graphically, and scores new datasets for deployment into production. A key feature is that all of your interactions through the graphical user interface are captured as an R script that can be readily executed in R independently of the Rattle interface. Use it as a tool to learn and develop your skills in R, then build your initial models in Rattle and tune them in R, which provides considerably more powerful options.

Deep Learning – A Practitioner’s Approach

From this book, you will learn:

  • Dive into machine learning concepts in general, as well as deep learning in particular
  • Understand how deep networks evolved from neural network fundamentals
  • Explore the major deep network architectures, including Convolutional and Recurrent neural networks
  • Learn how to map specific deep networks to the right problem
  • Walk through the fundamentals of tuning general neural networks and specific deep network architectures
  • Use vectorization techniques for different data types with DataVec, DL4J’s workflow tool
  • Learn how to use DL4J natively on Spark and Hadoop

Fake News: National Security in the Post-Truth Era

A timely, important and relevant piece of work by colleagues over at the S. Rajaratnam School of International Studies (RSIS) at NTU.
Summary of paper
Fake news is not a new issue but it poses a greater challenge now. The velocity of information has increased drastically with messages now spreading internationally within seconds online. Readers are overwhelmed by the flood of information, but older markers of veracity have not kept up, nor has there been a commensurate growth in the ability to counter false or fake news. These developments have given an opportunity to those seeking to destabilize a state or to push their perspectives to the fore.
This report discusses fake news with regard to the ways that it may manifest, how its dissemination is enabled through social media and search engines, how people are cognitively predisposed to imbibing it, and what are the various responses internationally that have been implemented or are being considered to counter it.
This report finds that efforts to counter fake news must comprise both legislative and non-legislative approaches as each has its own challenges. First, the approaches must factor in an understanding of how technology enables fake news to spread and how people are predisposed to believing it. Second, it would be helpful to make a distinction between the different categories of falsehoods that are being propagated using fake news as the medium. Third, efforts should go hand in hand with ongoing programmes at shoring up social resilience and national consensus. Fourth, efforts need to move beyond bland rebuttal and statements, as these may be counter-productive. Fifth, counter-narratives that challenge fake news must be released expeditiously as fake news is able to spread en masse at great speed due to technology. In sum, collaboration across the whole of society, including good public-private partnership, is necessary in order to unravel fake news and ensure better synergy of efforts in countering it.
Read the full report here.

Employing in-home sensor technology to explore elderly’s social needs

Singapore is ageing rapidly. In recent years, many initiatives have been launched to provide care for the ageing society, one of which is community-based care services to facilitate ageing-in-place.
A successful community eldercare model may require the synergy of various stakeholders, ranging from caregivers, healthcare providers and technology providers to policy makers, for the care management of older people living in the community.
In this extended abstract, we discuss how a sensor-based elderly monitoring system could enable community caregivers to identify elderly in need. We apply AI methods to data gathered through (i) non-obtrusive in-home sensors, (ii) subjective surveys and (iii) attendance in social activities to identify elderly who are in need of interventions to improve their social and emotional well-being.
The findings of this study will provide useful recommendations for value-added elder care planning.
Authors:
Mingrui Huang, Cheryl Koh, Nadee Goonawardene, Hwee-Pink Tan from SMU-TCS iCity Lab, Singapore Management University
Justina Teo, Lions Befrienders Service Association (Singapore)

Read the paper: Employing in-home sensor technology to explore elderly’s social needs: implications on personalising community elder care

Dispatches from Jupyter Con

Jupyter Notebook is the tool of choice for many data workers. Data visualisation experts use it to build dashboards, data scientists use it to test algorithms, and computational scientists use it to study the stars in the sky and the genes in the human body. We use Jupyter a lot in our day-to-day work as well. Usually, we use it on a laptop as a quick and easy way to build prototypes and explore data. Once we have a good idea of what we’re working with, we can then write scripts to automate our analysis or train our model on a more powerful virtual machine. But we are only one type of engineer using Jupyter in only one of many ways. Jupyter Con this August gave us the chance to widen our knowledge by meeting other users and gathering more ideas from them.

As a former animal biologist, I observed roughly three breeds of people at Jupyter Con. There were the data workers who use Notebooks to visualize and analyze data; there were the engineers who set up Jupyter for data workers, sometimes building on powerful big data frameworks and serving hundreds of people; and there were the educators who use Jupyter to help cultivate data literacy not only in scientists, but in literature majors and high schoolers as well.

The data workers came from many fields, which really showed how different industries are being enhanced by a data-centric approach. There was Mark Hansen from the Columbia Journalism School, who talked about using Notebooks to investigate fake profiles on Twitter and the behaviour of the bots behind these accounts. He and his students eventually published the analyses as the longform article The Follower Factory in The New York Times, and the piece is a pioneering work on how statistics and data can be mixed with traditional investigative journalism to inform and to tell a great story. While Mark talked about journalism and data, Michelle Ufford from Netflix looked at how data fuels, well, almost everything at her company, from business decisions to their legendary movie recommendation system. My favourite part of her talk was when she showed the company’s organisational chart – there were engineers for algorithms, visualisations, business analytics and compute infrastructure, just to name a few. All of them use data as their raw material, and Jupyter is their data tool.

The infrastructure engineers talked a lot about how Jupyter was not only a stand-alone web-browser application, but also a powerful extension to their existing compute infrastructure. There was CERN, whose SWAN analysis service uses Jupyter as a user-friendly interface to their existing big-data processing systems. Netflix improved their job scheduling system with parameterized Jupyter Notebooks. These teams took what was good about Notebooks – the interactivity and ease of use – to improve, not replace, what they had already built.
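
One open-source library for this kind of parameterized-Notebook execution is papermill; below is a minimal sketch of how a scheduler might run a notebook with injected parameters. The notebook paths and parameter names are purely illustrative, not taken from the talks.

    import papermill as pm

    # Execute a template notebook that contains a cell tagged "parameters",
    # writing a run-specific output notebook for each scheduled execution.
    pm.execute_notebook(
        "train_model.ipynb",                # template notebook (illustrative name)
        "runs/train_model_2018-08.ipynb",   # executed copy for this run
        parameters={"region": "sg", "learning_rate": 0.01},
    )

Each scheduled run then leaves behind a fully executed notebook, which doubles as a log of exactly what parameters and code produced the results.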

The educators were inspiring. One highlight was a talk from the UC Berkeley Data Sciences division, where a small team worked with student volunteers to craft Notebooks that applied machine learning to all sorts of undergraduate fields of study. There were Notebooks to analyze text data for English Literature classes; there were Notebooks that used data to teach fundamental theories in Economics. Despite only being in action for a short time, the team has managed to reach many students and many faculty, giving members of the university a taste of what data means for their field.

One last thing

Throughout the event, what was remarkable was how these three breeds didn’t move in silos. An educator was just as comfortable talking about hosting Jupyter Notebooks on the cloud as she was talking about working with Economics professors to use Jupyter to teach a class. An infrastructure engineer was clear about the multiple ways data analysts use Jupyter to build dashboards and visualize data. The atmosphere was interdisciplinary and open and the fast exchange of ideas was invigorating. If software and data are equal parts technical tooling and community, that feeling of community is one more thing to learn and emulate from the conference.

How people keep learning – the role of intrinsic and extrinsic motivators and when behavioural economists need to come in

AI Singapore has a strong focus on learning and growth. We want our internal engineers and apprentices to develop and improve their craft. This means we don’t only think about how to build machine learning systems. We also spend time thinking about how to engineer productive learning environments for our people. The following are a few thoughts about a key feature of any learning environment – sustained motivation.

Daphne Koller is a Stanford professor who founded Coursera, a platform that offers Massive Open Online Courses (MOOCs). In a lecture she gave at Carnegie Mellon University, she joked that many people in the audience must have started a course on Coursera. Whether or not they had finished the course, however, was another matter. In response, the audience laughed sheepishly.

Daphne’s contemporary Peter Norvig, who co-taught the massive online AI course that gave rise to the similar MOOC platform Udacity, has also shared how he experimented with different course delivery styles meant to slow down how fast students were dropping out of courses. They sent email reminders to students. They built features to facilitate more peer-to-peer interactions so that students felt a sense of community. These measures did appear to work, and indeed Daphne and Peter’s experiences show that any discussion around learning is incomplete without a discussion around how to engineer and sustain motivation.

In my own experience, motivation falls along a spectrum ranging from intrinsic to extrinsic. At the intrinsic end, internal emotions like curiosity or an appetite for learning can be enough to drive someone to start learning something new. Learning might look like playing with Raspberry Pis over a weekend or signing up for an online course with friends. At the opposite end, external motivators, like a problem at work or a company mandate, are what push a person to hone a new skill. These people may then petition their boss to send them for a training course or to a conference. In these two instances, the motivators are strong, and so learning is largely effective. In fact, learning is sometimes doubly effective if there is a tangible outcome at the end, for example switching careers, making new friends, or reaching a new work milestone.

The question is what to do about the masses of people who fall in the middle of these two extremes. These are people who might hear rumblings like “the jobs landscape is changing” and “upskilling is important in a modern world”. Yet, because there isn’t a concrete push or pull factor, they fall through the cracks. They may sign up for a Coursera account, attend a few courses, then drop out. They may even complete the courses and videos, but miss out on a crucial next step – applying what they have learnt to a real-world problem.

It is in this middle space that behavioural economists come in with their tools: “nudges” and default options that push people towards desired behaviour, or competitions and points systems to keep people motivated. These measures work, and although it’s tempting to think that people who rely on these are more “weak-willed”, the fact is that even the strongly motivated sometimes also need these interventions to keep them on track.

I would suggest, though, that a strong intrinsic or extrinsic motivator needs to exist first. Get that right, and the need for many of the behavioural checkpoints, like assignment deadlines and automated email reminders, falls away.

Polyaxon – Reproducible data science experiments

I have been looking at the emerging tooling around reproducible data science recently and am glad to have discovered Polyaxon. While it is still in the early stages of development, being able to run and scale the experiments for a data science project without having to worry too much about the infrastructure and other logistics behind it would certainly be a productivity boost.

Keeping a close watch on their development at: https://github.com/polyaxon/polyaxon/

Automated Feature Engineering

Automating repetitive tasks is one of the hallmarks of software engineering. One of the main tasks in data science is extracting good, relevant features from your data. The marriage of the two is automated feature engineering, and Featuretools is exactly that: an open source project developed by a commercial company. Certainly worth giving it a try to see how well it works for your projects! https://github.com/Featuretools/featuretools
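
As a minimal sketch of what this looks like in practice, the snippet below runs Featuretools’ Deep Feature Synthesis on the mock customer dataset bundled with the library. The primitive choices are arbitrary, and older releases use target_entity instead of target_dataframe_name.

    import featuretools as ft

    # Load the small demo EntitySet (customers, sessions, transactions tables).
    es = ft.demo.load_mock_customer(return_entityset=True)

    # Deep Feature Synthesis stacks aggregation and transform primitives to
    # generate candidate features for each customer automatically.
    feature_matrix, feature_defs = ft.dfs(
        entityset=es,
        target_dataframe_name="customers",
        agg_primitives=["mean", "sum", "count"],
        trans_primitives=["month", "weekday"],
        max_depth=2,
    )

    print(feature_matrix.head())  # one row of engineered features per customer

The engineered feature matrix can then be fed straight into whatever model you were planning to train, with the generated feature definitions serving as documentation of how each column was derived.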

Full stack AI

A good read from IBM on how a full-stack, end-to-end AI hardware infrastructure can be put together. It is built with scalability in mind and provides comprehensive coverage of the various needs within an enterprise, from flexibility and security to availability. While the recommendations certainly gear towards IBM offerings, it should not be difficult to swap out some of the components.

https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=87016787USEN
