The road to artificial intelligence: A case of data over theory (my notes)

Below is a summary of insights from the story published in New Scientist entitled "The road to artificial intelligence: A case of data over theory".

Dartmouth College in Hanover, New Hampshire

  • In 1956, a team gathered at Dartmouth to create a new field called AI, spawning subfields in machine translation, computer vision, text understanding, speech recognition, control of robots and machine learning.
  • They took a top-down approach – a reasoned, logical approach where you first create a "mathematical model" of how we might process speech, text or images, and then implement that model in the form of a computer program – AND they expected that their work would further the understanding of our own human intelligence.
  • The Dartmouth team made two assumptions about AI:
    • that mathematical models and theories could simulate human intelligence,
      AND
    • that building them would help us understand our own intelligence.
  • Both assumptions were WRONG.

Data beats theory!

  • By the mid-2000s, success came in the form of a small set of statistical learning algorithms plus large amounts of data. The intelligence turned out to be more in the data than in the algorithm – and the field ditched the assumption that AI would help us understand our own intelligence.
  • A machine learns when it changes its behaviour based on experience, i.e. data. Contrary to the assumptions of 60 years ago, we don't need to precisely describe a feature of intelligence for a machine to simulate it.
  • For example, take email spam: every time you drag a message into the "spam" folder of your Gmail account, you are teaching the machine to "classify" spam; every time you search for a bunny rabbit and click "bunny rabbit" in image results, you are teaching the machine what a bunny rabbit looks like (a classifier sketch follows this list). Data beats theory!
  • For the field of AI this has been a humbling and important lesson: simple statistical tricks, combined with vast amounts of data, have delivered the kind of behaviour that had eluded its best theoreticians for decades.
  • Thanks to machine learning and the availability of vast data sets, AI has finally been able to produce usable vision, speech, translation and question-answering systems. Integrated into larger systems, those can power products and services ranging from Siri and Amazon's recommendations to the Google self-driving car.
  • A key thing about data is that it is found "in the wild" – generated as a byproduct of various activities, some as mundane as sharing a tweet or adding a smiley under a blog post.
  • Humans (engineers and entrepreneurs) have also invented a variety of ways to elicit and collect additional data, such as asking users to accept a cookie, tag friends in images or rate a product. Data became "the new oil".
  • Every time you access the internet to read the news, do a search, buy something, play a game, or check your email, bank balance or social media feed, you interact with this infrastructure.
  • This creates a data-driven network effect: data-driven AI both feeds on this infrastructure and powers it.
  • Risk: contrary to popular belief, the main risks are not existential threats to our species, but rather a possible erosion of our privacy and autonomy as data (public and private) is leveraged.
  • Winters of AI discontent – the two major winters occurred in the early 1970s and late 1980s
  • AI today has a strong – and increasingly diversified – commercial revenue stream.
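
To make the spam example above concrete, here is a minimal sketch of learning a classifier from labelled messages with scikit-learn. The emails and labels are invented for illustration; Gmail's real pipeline is far more sophisticated.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: in Gmail, the "spam"/"ham" labels come from
# users dragging messages into the spam folder.
emails = ["win a free prize now", "meeting moved to 3pm",
          "cheap pills online", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

# Turn each email into word counts, then fit a naive Bayes classifier on them.
clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(emails, labels)

# The model now classifies messages it has never seen before.
print(clf.predict(["free pills now"]))  # expected: ['spam']
```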

  • Artificial intelligence (AI) includes:
    1. Natural language processing,
    2. Image recognition and classification
    3. Machine learning (ML) – a subset of AI; deep learning (artificial neural networks – more below) is in turn a subset of ML
  • In 1950, Alan Turing published a groundbreaking paper called "Computing Machinery and Intelligence", in which he posed the question of whether machines can think.
  • He proposed the famous Turing test, which says, essentially, that a computer can be said to be intelligent if a human judge can't tell whether they are interacting with a human or a machine.
  • The term "artificial intelligence" was coined in 1956 by John McCarthy, who organised an academic conference at Dartmouth dedicated to the topic, exploring the conjecture that every aspect of learning, or any other feature of intelligence, can in principle be so precisely described that a machine can be made to simulate it.
  • The phrase "machine learning" also dates back to the middle of the last century. In 1959, Arthur Samuel (one of the attendees of the Dartmouth conference) defined machine learning as "the ability to learn without being explicitly programmed."
  • Samuel went on to create a computer checkers application – one of the first programs that could learn from its own mistakes and improve its performance over time.
  • Like AI research, machine learning fell out of vogue for a long time, but it became popular again when the concept of data mining began to take off around the 1990s.
  • Data mining uses algorithms to look for patterns in a given set of information.
  • Machine learning goes one step further – the program changes its behaviour based on what it learns.
  • Years went by with "AI winters" caused by the lack of big data sets and computing power.
  • Then IBM's Watson winning the game show Jeopardy! and Google's AI beating human champions at the game of Go returned artificial intelligence to the forefront of public consciousness.
  • Now machine learning is used for prediction and classification:
    • Natural language processing – IBM Watson is a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data.
    • Image recognition – e.g. face recognition at Facebook with DeepFace – https://research.facebook.com/publications/deepface-closing-the-gap-to-human-level-performance-in-face-verification/
    • Recommender systems – Amazon highlights products you might want to purchase, Netflix suggests movies you might want to watch, and Facebook curates news feeds. HIVERY also uses recommender systems to help our customers have the right products in the right distribution channels at the right time and place.
    • Predictive analytics – HIVERY works in fraud detection, pricing strategy and new-product distribution placement strategy.
  • Deep learning – often implemented as an artificial neural network, or neural net – is a system designed to process information in ways similar to how biological brains work.
  • Deep learning uses a certain set of machine learning algorithms that run in multiple layers (see the sketch below). It is made possible, in part, by systems that use GPUs to process a whole lot of data at once.

Source: http://www.datamation.com/data-center/artificial-intelligence-vs.-machine-learning-whats-the-difference.html
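
To illustrate the "multiple layers" idea from the deep learning bullet above, here is a tiny forward pass through a two-layer neural network in NumPy. The weights are random and untrained; this is only a sketch of how data flows through stacked layers, not a working deep learning system.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.random(4)  # an input with 4 features

# Layer 1 maps 4 inputs to 8 hidden units; layer 2 maps 8 hidden units to 2 outputs.
W1, b1 = rng.random((8, 4)), rng.random(8)
W2, b2 = rng.random((2, 8)), rng.random(2)

h = np.maximum(0.0, W1 @ x + b1)  # hidden layer with a ReLU activation
y = W2 @ h + b2                   # output layer
print(y)

# Training would adjust W1, b1, W2, b2 from data (e.g. by gradient descent);
# GPUs make this fast across many layers and vast amounts of data.
```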

We are currently working with a client in the FMCG sector who is trying to unpack better customer segments, or "clusters", in order to engage and market to them better.

The problem we are trying to solve is: "How might we find customers that allow for better ROI on marketing initiatives?"

It's an interesting project, and most companies currently face this challenge. Imagine, however, the ability to segment customers in a totally new way, discovering common needs or characteristics not humanly possible or conceivable. What if we allowed artificial intelligence to apply its own way to "segment" or "cluster"?

At HIVERY, we apply artificial intelligence to business problems. With this FMCG client, we set ourselves the challenge of unpacking new customer segments using our proprietary machine learning framework, and running A/B market experiments to test the effectiveness of these new "machine-conceived" segments.

Below is a high-level outline of our approach. At the end, we were able to create a custom application that allows the discovery of new segments, and makes communicating and measuring those new segments possible, using our proprietary machine learning framework.

Our approach, step by step:

  1. Gather the data and apply our AI framework (an unsupervised learning algorithm).
  2. Once new clusters are identified, work with the client (i.e. domain expertise) to unpack and refine what they mean.
  3. Communicate the new segments to stakeholders (i.e. marketing teams) for buy-in.
  4. Test and measure marketing campaigns using an A/B testing method.
  5. Fine-tune marketing actions and deploy more widely.
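
For the test-and-measure step, the core statistic can be as simple as comparing conversion rates between a campaign targeted at a machine-conceived segment and a control group. Below is a self-contained sketch of a two-proportion z-test; all the counts are hypothetical.

```python
from math import erf, sqrt

def two_proportion_ztest(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical experiment: new segment vs control, 1,000 customers each.
z, p = two_proportion_ztest(conversions_a=120, n_a=1000, conversions_b=90, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # here p < 0.05, suggesting a real lift
```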

So how do we do this? Well, let's talk about supervised and unsupervised learning algorithms. Supervised learning is when the dataset you feed your algorithm comes with "pre-defined tags", so the classification algorithm requires training data. Once the "machine learning training" is completed (i.e. a classification model is created), the model is used to classify new datasets and to help identify common needs or characteristics (based on the pre-defined tags). This is how most customer segmentation is done: you tell the machine "this is a female, age 40-50", and every time it recognises a "female, age 40-50", it groups them together. A small sketch of this workflow follows.
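
Here is a hedged sketch of that supervised workflow, using scikit-learn. The customers, features and segment tags are all made up; the point is only that the labels are supplied up front by humans.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row is a customer: [age, annual_spend]; each tag is pre-defined by humans.
X_train = [[45, 1200], [42, 950], [48, 1100], [23, 300], [19, 150], [21, 250]]
y_train = ["female_40_50", "female_40_50", "female_40_50",
           "student", "student", "student"]

# "Machine learning training": fit a classification model on the tagged data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The trained model assigns new, unseen customers to the known segments.
print(model.predict([[44, 1000], [20, 200]]))  # expected: ['female_40_50' 'student']
```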

With unsupervised learning algorithms, there are no "pre-defined tags", so no "machine learning training" is done at all. Here we allow the machine to identify patterns on its own. The way MACHINES SEE DATA is COMPLETELY different to the way HUMANS SEE DATA. This is why at HIVERY we say Data Has A Better Idea™.

Here is a dataset compared with the same data clustered by an unsupervised learning algorithm.

Original vs clustered dataset using unsupervised learning algorithm

In our example of "males vs females", unsupervised learning algorithms might cluster based on specific characteristics seen in the data itself (beyond how humans segment). Instead of "gender" and "age", the grouping might reflect some other variable like "likely to commit fraud", "strongly likely to purchase an up-sell" or "will re-purchase within the next 4 days".

New market segments based on an unsupervised learning algorithm approach

I like Saimadhu Polamuri's explanation of the difference between supervised and unsupervised learning algorithms: he talks about a basket filled with fresh fruit.

The task is to arrange the same type of fruits in one place, assuming the fruits are apple, pomegranate, banana, cherry and grape only.

You already know from your previously learnt knowledge (you were "trained" in the past) how to recognise the shape of each and every fruit, so it is easy to arrange the same type of fruits in one place. As the fruits come in, you recognise each one and arrange, or "cluster", the same types together, forming different segments. This type of learning is called supervised learning.

For unsupervised learning, suppose you are an alien from another world, handed the same basket full of the same fruits. Like before, your task is to arrange them in one place, but this time you don't know anything about "fruits" – you are an alien, after all! You are seeing these fruits for the first time, so how will you arrange them? You might decide to pick some physical characteristic of each fruit. Suppose you take colour. Then you will arrange them based on colour, and the groups go something like this:

  • Red colour group: apple, pomegranate & cherry
  • Green colour group: banana & grapes

Now you take another physical characteristic, like size, so the groups become:

  • Red colour AND big one: pomegranate, apple
  • Red colour AND small one: cherry
  • Green colour AND small one: grapes
  • Green colour AND big one: banana

Here you haven't learnt anything about "fruits" beforehand – there is no training data and no response variable. This type of learning is known as unsupervised learning. The sketch below runs this exact exercise with a clustering algorithm.
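
Here is the alien's exercise as code: a k-means clustering sketch over two made-up features, redness (0 = green, 1 = red) and relative size. No fruit names are given to the algorithm; it groups by the features alone.

```python
from sklearn.cluster import KMeans

fruits = ["apple", "pomegranate", "cherry", "banana", "grape"]
features = [  # [redness, size] -- invented values for illustration
    [1.0, 0.7],   # apple: red, big
    [0.9, 0.8],   # pomegranate: red, big
    [0.9, 0.1],   # cherry: red, small
    [0.1, 0.9],   # banana: green, big
    [0.1, 0.1],   # grape: green, small
]

# Ask for four clusters, mirroring the colour-AND-size grouping above.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
for fruit, cluster in zip(fruits, labels):
    print(f"{fruit} -> cluster {cluster}")
```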

Summary:

Supervised learning: discover patterns in the data that relate data attributes to a target (class) attribute. These patterns are then used to predict the values of the target attribute in future data instances.

Unsupervised learning: The data have no target attribute.

Both are types of machine learning techniques.

According to IBM, 90% of the world's data was created in just the last two years – 90%! Fuelled by the internet generation, each and every one of us is constantly producing and releasing data, from what we share ourselves to companies capturing customer information and sales transactions. These volumes of data make up what has been designated "Big Data", and these massive data sets are piling up year on year. The problem is: how do we leverage this data to make better decisions?

There are three (3) simple stages one needs to work through: Define the Problem, Solve the Problem, and Communicate Actionable Results Clearly.

1. DEFINE THE PROBLEM

The first stage is DEFINE THE PROBLEM. Speak with any startup founder or design thinker and they would say: fall in love with the problem, not the solution. In fact, Albert Einstein is said to have remarked, "If I had only one hour to save the world, I would spend fifty-five minutes defining the problem, and only five minutes finding the solution." Coming up with solutions is the easier part; defining and solving the right problem can be challenging. It is no different with big data projects and trying to leverage them to make better, more insightful decisions.

Framing the problem is about defining the business question you want analytics to answer and identifying the decision you will make as a result. It's a pretty important step: if you don't frame the right problem, no amount of data or analysis in the world is going to get you the answers you are looking for.

Defining the problem splits into two parts: framing the problem (what you are actually solving) and reviewing previous findings (what worked or didn't work) to help you refine the problem.

Framing the problem involves asking yourself "Why is that a problem?". Toyota famously created the "five whys" technique, which is about understanding the root cause of the problem. Design companies like IDEO use phrases starting with "How might we..." to help frame the problem.

Reviewing previous findings involves finding out what worked in the past and why things didn't work before. This also helps refine the problem.

2. SOLVE THE PROBLEM

The second stage is SOLVE THE PROBLEM. This is often thought of as the primary stage. It is where you start collecting the right variables (i.e. data fields), collecting sample data to test and play with, and doing some basic analysis to test assumptions quickly. This is similar to the Cross Industry Standard Process for Data Mining, commonly known by its acronym CRISP-DM.

CRISP-DM is a data mining process model that describes the approaches data mining experts commonly use to tackle problems.

[Figure: CRISP-DM process diagram]

At HIVERY we use a similar, simplified version called DEP – Discovery, Experiment/Pilot and Deployment.

[Figure: HIVERY's DEP process]

3. COMMUNICATE ACTIONABLE RESULTS CLEARLY

The third and final stage is COMMUNICATE ACTIONABLE RESULTS CLEARLY. If you want anything to happen as a result of stages 1 and 2, you have to communicate your results effectively. If decision makers do not understand the analysis or what the results mean, they won't be comfortable making a decision based on them. In our "communication-challenged" world, communicating sophisticated analytical results effectively and simply makes a world of difference.

A good data visualisation book on this topic is Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic.

Data needs to be engaging, informative and compelling. Humans often use stories to communicate effectively and to create memorable knowledge transfer.