AI and Data Science for Dummies — Chat with Classmates


These are my replies to questions about AI and its business practice, debated among ~200 of my fellow classmates from IIT Bombay. They are edited slightly to preserve privacy, to remove specific references and to improve the narrative. This is the first part of a series of posts. The second part explores 'Why Doesn't AI Work?', the third 'AI Hacks That Do Work', and the last 'Why and How to Get Started with AI'.

Data Science

At the simplest level, data science is exactly that: the scientific analysis of data. In the fourth grade, when we all learned how to construct rudimentary graphs, we had already become data scientists.

You would think that I am exaggerating to make a point. Well, look up Microsoft's corporate strategy and its focus on a product called PowerBI; they are making a major push on it as a way to cement Windows-based systems in organisations. Then check out the demos they have for PowerBI; there are plenty on YouTube. These demos talk about dashboarding and how the incredibly powerful software can visualise your deepest, darkest data to generate outstanding plots. And then tell me a fourth grader couldn't construct those dashboards.

Of course, there is a lot more to PowerBI than producing bar graphs, but the point is that even at that simplest level data science can be quite effective. Add the mean and standard deviation to it, and you have covered practically everything in the field of business analytics. The proliferation of sensors and other embedded devices (the IoT) has contributed significantly to the explosion in data size in recent years. But as a data analyst, your most difficult intellectual challenge will be learning how to clean these varied data types, not how to process them.
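To make that concrete, here is a minimal pandas sketch of the idea. The file name, column name and cleaning steps are all hypothetical assumptions about a messy sensor log, not a recipe:

```python
import pandas as pd

# Hypothetical sensor log; the file and column names are made up for illustration.
df = pd.read_csv("sensor_readings.csv")

# The hard part in practice: cleaning messy, mixed-type data.
df["temperature"] = pd.to_numeric(df["temperature"], errors="coerce")  # junk strings become NaN
df = df.dropna(subset=["temperature"])                                 # drop unusable rows
df = df.drop_duplicates()                                              # sensors often repeat readings

# The "analytics" part: often little more than a mean and a standard deviation.
print(df["temperature"].mean(), df["temperature"].std())
```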

Artificial Intelligence

There is a small segment of the data science field that focuses on leveraging data to develop better programs. Here is the intuition behind it. 'Do X' programs are the simplest type. They are the backbone of the programming industry and are tremendously powerful.

Slightly more intelligent programs use 'if A then do X, else do Y' logic. Almost all programming in the last century, and even most programming now, is as straightforward as that, so I won't bother explaining it. Expert systems and rule engines are really just elaborate chains of if-then expressions.
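As a toy illustration, here is what such a hand-written rule engine amounts to; the loan scenario and every threshold below are invented purely for this sketch:

```python
# A toy "expert system": nothing more than a chain of hand-written if-then rules.
# The scenario and the thresholds are invented purely for illustration.
def review_loan(income, credit_score, existing_debt):
    if credit_score < 600:
        return "reject"
    if income > 3 * existing_debt:
        return "approve"
    if credit_score > 750:
        return "approve"
    return "manual review"

print(review_loan(income=50000, credit_score=700, existing_debt=10000))  # approve
```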

The original challenge that sparked the development of AI was whether a computer could learn to infer the presence of that condition A, and thereby generate such rules on its own. The expression 'if A then do X, else do Y' can be rewritten as 'c = Cx if A, else c = Cy', where Cx and Cy are the two possible actions. Suddenly this looks like a classification problem: given a set of pre-labelled data points, can we find a model, A, which classifies a new data point as Cx or Cy (or as one of many classes, in the generalised case)?

If we're successful, we won't have to worry about writing the if-else statements at all. The only thing left to do is get our hands on the labelled data set (also known as training data), feed it into the machine, and head for the hills. Algebra and statistics have given us a wide variety of classification methods: Naive Bayes, logistic regression, decision trees, and many others.
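Here is a minimal sketch of that workflow using scikit-learn; the numbers, the features and the choice of a decision tree are all assumptions made purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical pre-labelled (training) data: each row is a data point,
# each label is the action class (0 = Cx, 1 = Cy).
X_train = [[25, 0], [47, 1], [33, 0], [61, 1], [29, 0], [55, 1]]
y_train = [0, 1, 0, 1, 0, 1]

# "Feed it into the machine": the model learns the condition A instead of us coding it.
model = DecisionTreeClassifier().fit(X_train, y_train)

# A new, unlabelled data point is then classified as Cx or Cy automatically.
print(model.predict([[40, 1]]))
```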

Congratulations! If you have ever tried to fit a line to a set of data, you have already created an AI.
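That really is all it takes. A tiny sketch with made-up numbers, using NumPy's polyfit:

```python
import numpy as np

# Fitting a straight line y = m*x + c to a handful of made-up points.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m, c = np.polyfit(x, y, deg=1)   # learn the two parameters from the data
print(m, c)                      # roughly 2 and 0
print(m * 6.0 + c)               # "predict" y for an unseen x
```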

What’s the big deal here?

Then why all the fuss? Three things. One, this is a significant problem by itself. You would be surprised how many artificially intelligent systems employ little more than basic probability. One common machine learning approach used to achieve greater sophistication is called Random Forest. It involves generating decision trees from several samples of the data (hence the forest), and then taking the mode (or the average) of the decisions made by the individual trees. Statistically speaking, there is nothing special here. Yet this now powers virtually every facet of human existence. Whichever direction you go in, you'll probably find an intelligent machine like this guiding the way.
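A minimal sketch of that idea, with invented data and only a handful of trees for readability; a real implementation would simply use a library's RandomForestClassifier:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy labelled data; features and classes are made up.
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Grow many trees, each on a different bootstrap sample of the data: the "forest".
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))      # sample rows with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# For a new point, take the mode (majority vote) of the trees' decisions.
new_point = np.array([[0.5, 0.2, -1.0]])
votes = [tree.predict(new_point)[0] for tree in trees]
print(max(set(votes), key=votes.count))  # the forest's answer
```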

Neural Networks

Second, somebody came up with the neural network. Each node in such a network can be thought of as a weighted sum: every input is given some weight, and the weighted total is calculated. Simple.

Let me make this concrete. One of my friends, John (not his real name), in his senior year of college was trying to win over Jane (not her real name), a fellow volunteer at an organisation called Magic Bus. Magic Bus works with disadvantaged kids by hosting camps and other activities. John's decision rule for attending an event was straightforward: if Jane was going to be there, he would go no matter what. Otherwise, if the event was a party rather than a hike or a camp, and the weather was fine, John would attend.

A data scientist could have modelled John's behaviour over the course of the year with three binary variables: a = whether Jane was going to attend, b = whether the event was a party, and c = whether it was going to rain. One could easily write an equation, say p = w1·a + w2·b + w3·c, and then apply a threshold to p to determine whether John would attend the event. That single node is the one neuron everybody in the field of data science appears to be obsessed with. With the right parameters, it could properly predict John's actions.
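A sketch of that single neuron, with weights and a threshold I have invented to match the story (Jane's attendance dominates, a party helps, rain hurts):

```python
# One neuron: a weighted sum of John's three binary inputs plus a threshold.
# The weights and threshold are made up to fit the story.
def john_attends(a_jane_going, b_is_party, c_will_rain,
                 w1=5.0, w2=2.0, w3=-1.5, threshold=1.0):
    p = w1 * a_jane_going + w2 * b_is_party + w3 * c_will_rain
    return p > threshold

print(john_attends(1, 0, 1))   # Jane is going, it will rain -> True
print(john_attends(0, 0, 0))   # nothing going for it -> False
print(john_attends(0, 1, 0))   # a party in fine weather -> True
```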

Now suppose the weather report and the nature of the event also played a role in Jane's own decision. Then there are two inputs, a hidden layer containing a node for her decision (plus two nodes to pass the original inputs through), and a final decision node for John. Add a second output node for whether or not John wears his new jeans, and it immediately takes on the appearance of a neural network.
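Here is a minimal NumPy sketch of that picture, with all weights invented for illustration: the inputs are b (party?) and c (rain?), the hidden layer holds a node for Jane's decision plus two pass-through nodes, and the output node is John's decision. A second output for the jeans could be added the same way.

```python
import numpy as np

# A tiny two-layer network matching the story. All weights are invented.

def step(x):
    return (x > 0).astype(float)

def forward(b_is_party, c_will_rain):
    x = np.array([b_is_party, c_will_rain], dtype=float)

    # Hidden layer: [Jane's decision, pass-through b, pass-through c]
    W_hidden = np.array([[ 2.0, -2.0],   # Jane likes parties, dislikes rain
                         [ 1.0,  0.0],   # copies b through
                         [ 0.0,  1.0]])  # copies c through
    h = step(W_hidden @ x - np.array([0.5, 0.5, 0.5]))

    # Output layer: John's decision, weighted heavily on Jane's.
    w_out = np.array([5.0, 2.0, -1.5])
    return step(np.array([w_out @ h - 1.0]))[0]

print(forward(1, 0))   # a party, no rain -> Jane goes -> John goes (1.0)
print(forward(0, 1))   # a rainy hike -> 0.0
```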