Machine Learning – Baby Steps

Machine learning (ML) is the “FUTURE”. I have been reading about it for quite some time now and I am pretty convinced by the statement. We are all talking about BigData, predictive analytics, etc. but really, when a system dude like me tries to foray into the field of ML, everything seems so overwhelming. The discussion starts with having millions of records (if you are lucky). Otherwise, it is 4TB of unstructured data as a start. Your systems brain tries to grasp the big picture and gets lost in trying to figure out the details. But well, after reading around, grappling, experimenting and reading a bit more, I think that learning ML is doable for us system dudes. You do not have to be a math genius (well, it helps if you are). But I will start with this blog of mine documenting the baby steps needed. I will use it as my reference and you can use it as yours if you find it useful.

In general, before we dive into the ‘code’, we need to understand how ML works. It is actually very simple from a 10000 feet view.

We have a blob of input data, we run it through some gears and out comes the prediction :D. But seriously, we have many different data streams with either structured or unstructured data, we create usable data i.e. refine the blob to only use data which we think is usable for our predictions and remove the unnecessary noise. Beware that these steps are iterative. So you keep going back to the same step again and again until you find something the most optimal model. The most optimal model definition is based on the measurement criteria setup before you try to solve whatever problem you are trying to solve using ML. The refined data is then fed into a model and the model output is measured against sample data. Of course, you now have supervised and unsupervised learning and associated algorithms and models. But 90% of ML is supervised learning i.e. the refined data has the answers your model is trying to predict. You make the model learn from part of the sample refined data and then execute the model on the rest of the data to understand if the model predictions were as expected. You simply rinse and repeat the process as many times as it takes. There are many frameworks (FW) and tools that we can use for ML as listed on KNuggets. But we will use something very simple to start with that helps us understand ML.

We will use Google’s TensorFlow (TF). I bet you have heard about it and it is probably the one that made you more interested in ML (at least that was for me). There are quite some sites which show how TF is superior to other FWs & tools so I won’t go into that. Okay now, after those basics, let’s start with our first baby step. We need to install/get tensor flow (TF) on our system for using it. But being the nice Docker enthusiast that we are, we will use it. I work primarily on Linux but AFAIK, Docker also works on Windows :). Download & run your local copy of TF using Docker by typing in the below:

docker run -it -p 8888:8888 tensorflow/tensorflow

Once everything is running, the terminal will show a link (with a token) which you can open in your browser. The first page just shows some list of sample files and stuff. And you don’t really know how to navigate around. No worries, we still do a hello world in tensor flow (and we are using python to do it). Click on new Python 2 (notebook). And a cell would be shown. I will try to add some screenshots later but it is pretty simple. Just follow the instructions ;). Now for the fun part, paste in the following (or type it in if you prefer) into the text box beside the In[]. The code is self-explanatory so I won’t bother.

# Import tensorflow
import tensorflow as tf

# Open up a tf session
sess = tf.Session();

# Print our hello World!
hello = tf.constant("Hello World from TensorFlow!");
print(sess.run(hello));

Now click on Cell & Run Cells in the page menu. You should be able to see the Hello World text. Yippee, this is your first baby step in TF. You can also do additions and all the normal python stuff that you can do. Just use the tf.constant() & sess.run(). Do some more experiements and be comfortable.

Next we will start doing something really cool. Make sure you save your notebook. ENJOY!