To Docker or to Vagrant

To Docker or to Vagrant: for dev-ops folks and smart people (like me!) who want to automate stuff, this eternal struggle is a big life-and-death question! No seriously, there are just 2 camps of people. One camp is the hardcore Docker supporters who want Docker everywhere, without realizing that Docker is not a full virtualization solution but is built on the Linux kernel’s namespaces and cgroups functionality. So what does that mean? Well, for starters, the container shares the host’s kernel, so anything that requires mount operations inside the Docker container needs special privileged access, and a privileged container has the possibility to change your actual host filesystem (e.g. filesystems mounted inside the container can end up indirectly mounted on your host as well, and the rest is history). It is not just mount operations: simple things like socket access or locale changes with locale-gen have, in my experience, not worked inside Docker containers without passing special privilege flags when running the container.

Vagrant is a straightforward configuration layer on top of a virtual machine provider, VirtualBox (VB) in my case, which is a full virtualization solution. So one basically installs an OS and does everything inside that virtual machine itself. Also, Vagrant box files are easier to share (since a box is a single file, unlike the layered approach in Docker). In any case, doing privileged operations in VB/Vagrant is more reassuring than doing the same on Docker. So even though I am a Docker supporter, I also like Vagrant because of the full virtualization support and the near impossibility of messing up your host system.

Anyways, that’s that. So to Docker or to Vagrant is a personal choice that depends on the use case one is trying to implement as well as the need for distribution (private or public). Cheers, and send in your comments.

Update Anaconda Navigator

Many people using Python nowadays use the Anaconda distribution. And if you aren’t, my recommendation is to use it. Why, you may ask? Because with Python come a lot of packages, each package comes with a whole load of dependencies, and it is difficult and time-consuming to resolve those dependencies manually. Also, Anaconda allows one to create multiple environments (basically isolated containers for packages). So let’s say you need package1 for some type of work and package2 for some other type of work, but package2 inherently depends on a different version of package1. If you put both in the same environment, you will have conflicts; creating separate isolated environments solves that.

Anyways, just use it. It’s simple and allows you to spend more time focusing on the business logic. Now then, once you have installed Anaconda and launch the Anaconda Navigator application, it will show you an upgrade dialog box saying that a new version is available. If you have installed it somewhere in “Program Files”, the auto-upgrade mechanism will not work. So instead of being frustrated or trying to install it somewhere else, go through the following steps.

1. Open the command prompt using the “Run as Administrator” option from the Windows Start Menu.
2. Go to the location where conda.exe is installed. In my case, it was “C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\Scripts” (yes, I got Anaconda installed with Visual Studio Community Edition :).
3. Execute the following command:
conda.exe update --prefix "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64" anaconda-navigator

Now your anaconda-navigator will be upgraded and something similar to the following text will be seen.

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64:

The following packages will be UPDATED:

anaconda: 5.0.0-py36hea9b2fc_0 --> custom-py36h363777c_0
anaconda-navigator: 1.6.8-py36h4b7dd57_0 --> 1.6.12-py36hdad2993_0
conda: 4.3.27-py36hcbae3bd_0 --> 4.4.7-py36_0
pycosat: 0.6.2-py36hf17546d_1 --> 0.6.3-py36h413d8a4_0

Proceed ([y]/n)? y

anaconda-custo 100% |###############################| Time: 0:00:00 131.17 kB/s
pycosat-0.6.3- 100% |###############################| Time: 0:00:00 662.27 kB/s
conda-4.4.7-py 100% |###############################| Time: 0:00:01 904.57 kB/s
anaconda-navig 100% |###############################| Time: 0:00:02 1.96 MB/s
DEBUG menuinst_win32:__init__(185): Menu: name: 'Anaconda${PY_VER} ${PLATFORM}', prefix: 'C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64', env_name: 'None', mode: 'None', used_mode: 'system'
DEBUG menuinst_win32:__init__(185): Menu: name: 'Anaconda${PY_VER} ${PLATFORM}', prefix: 'C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64', env_name: 'None', mode: 'None', used_mode: 'system'

Don’t worry about the DEBUG prints; they just say that I ran the command outside of a named environment (env_name: 'None'). Now voila, your Anaconda Navigator is updated. As a bonus, you might also see a list of upgradeable packages in the UI. The UI, unfortunately, will not allow you to carry out the upgrades. Instead, just run the following in an environment window (a terminal window opened via the environment from anaconda-navigator):

conda update --all

If you run the command outside of an environment, the base/root environment will be updated. You can also pass the --name argument to the command to target a specific environment by name. Running it from inside an activated environment will only update the packages for that environment. To update an environment from an environment file, you can use conda env update. Check the conda documentation for details.
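If you would rather script these variants than type them each time, here is a minimal Python sketch. The helper names conda_update_command and run_conda_update are made up for illustration and the environment name ml is hypothetical; --all, --name, --prefix and --yes are regular conda update options.

```python
import subprocess


def conda_update_command(env_name=None, prefix=None, packages=None, conda="conda"):
    """Build the argument list for a `conda update` call (hypothetical helper)."""
    if env_name and prefix:
        raise ValueError("pass either env_name or prefix, not both")
    cmd = [conda, "update", "--yes"]      # --yes skips the "Proceed ([y]/n)?" prompt
    if env_name:
        cmd += ["--name", env_name]       # target an environment by name
    if prefix:
        cmd += ["--prefix", prefix]       # target an environment by install path
    # No explicit package list means "update everything in that environment".
    cmd += list(packages) if packages else ["--all"]
    return cmd


def run_conda_update(**kwargs):
    """Run the update; raises CalledProcessError if conda exits with an error."""
    return subprocess.run(conda_update_command(**kwargs), check=True)


# Only prints the command; call run_conda_update(env_name="ml") to execute it.
print(" ".join(conda_update_command(env_name="ml")))
```

With no arguments the helper builds the plain update of whatever environment is active, and passing packages=["anaconda-navigator"] together with prefix gives the shape of the command used in step 3.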

So long people, ENJOY learning!

Remote control your Linux Workstation (Efficiently)

There are many different solutions for remote controlling your Linux workstation, but most of them are not effective. For example, one can always ssh into the workstation and forward X. Well, this kind of works but has several drawbacks, the major one being that all the processes you started in your SSH session will be killed when you lose the session. I call nohup, screen, etc. bypass methods because they essentially give you a workaround instead of actually solving the problem. And if you are like me and keep logging in and out and starting/stopping scripts, I am pretty sure you will forget to use them, just like me, and log in after a good night’s sleep to find out that your compile script got terminated because you forgot to screen it! Also, SSH with X forwarding needs an X server on the machine you connect from, which can be a PIA to set up if you are on Windows. Yes, yes, we can all use MobaXterm but hey, believe me, an X server is heavy in terms of resource usage on Windows. Also, automatic file changes are not detected with that solution.

Next in the queue of nonviable solutions is x2go. I haven’t been able to use it properly because the X server on my Windows machine keeps crashing whenever its resolution differs from the one on the workstation. So that is a big failure. Then the minor ones on the list are VNC, RDP, XRDP, etc. All have their drawbacks. VNC is excruciatingly slow when I use it. Also, the ability to dynamically change screen resolutions is tossed out of the window. Yes, I know about the geometry setups in .vncstartup but hey, I am in no mood to write down all the resolutions that I want to use in that file! It is NOT EFFICIENT. RDP and XRDP rarely if ever work for me and are again affected by the slowness.

So the last remaining option is to use NX from NoMachine. Both the NXServer and the NXClient are actually free for personal use. One needs to install the server on Linux (pretty simple with the deb file provided) and then use the NXClient on Windows, and it works like a charm. Of course, the whole thing again becomes mighty slow under the following conditions:
– Changing the resolution of the remote display to something other than the resolution of the connected monitor
– Using a virtual display instead of the actual physical one

Now, there are multiple reasons to use virtual displays, the simplest being that I can turn off my monitor and still use NX. And the person sitting at the desk beside my Linux workstation does not see a ghost operating it. And yes, there is an option to turn off the physical display when someone connects, but as soon as you log off, the physical screen becomes active again in its full form. And if you forget to lock the computer before disconnecting your NX client, the person beside you gets access to your screen and can start doing weird stuff on it. Believe me, I have a monkey friend who used to do that and one time, he simply shut down the machine to annoy me and get me to commute the whole 25 km (he wanted to have lunch with me!). Besides that, as I wrote earlier, I want to keep changing the resolution as I work. When I am editing stuff, I need a lower resolution with bigger fonts; when it is compiling/running scripts, I can go with the default 2K resolution.

Virtual displays are a great way to do this. So now you will be tempted to evaluate the terminal server from NX, but don’t. Use LTSP (Linux Terminal Server Project), which works amazingly well with the NX protocol. Just

sudo apt-get install ltsp-server-standalone

and you are good to go. Now you will have a virtual desktop that you can connect to (in fact, as many as you want) without having to cough up the precious greens or settle for something low grade. You can always ssh and have NX running simultaneously. The best part of NX is that you can pick up from where you left off. Yes, now that is what I call efficient.

ENJOY remoting into your Linux workstation from Windows while still being happy!

TensorFlow mnist_deep.py OOM error when running on GPU!

I know you are interested in #MachineLearning and your first instinct is to use #TensorFlow (of course, since it is backed by #Google), where you will probably find a lot of support for your queries. The best part is that the available #Docker container helps you experiment, and it simply runs out of the box. Look at my article on getting started with TensorFlow. Soon you will get bored by the amazingly slow execution speed on the CPU and will be thirsty to run it on your GPU because:

1. A GPU is faster at running many parallel threads
2. A GPU is faster at matrix/multi-dimensional operations
3. Your GPU is simply idling when you are running model training/inference on your CPU
4. Your coffee only lasts for 2 minutes, in which a GPU can finish processing your model, whereas with a CPU you will need at least 10 coffees (not good for health)
5. Last but not least, go and get yourself an NVIDIA GPU if you haven’t already, to save yourself time and achieve efficiency

In any case, experts and hobbyists alike train their models on GPUs. And boy, do they have powerful GPUs (Quadro P6000 (:drooling)). But hobbyists/poor engineers like me do not have access to those professional grade GPUs. Instead, we need to get by with low-end GeForce graphics cards. Nothing wrong with those though. They play amazing games. But there are some problems when running ML samples like mnist_deep (a DNN-based sample application) that allocate a huge amount of memory in one go. The problem with executing this on a GPU with less than 4GB of GDDR5 is that the application simply fails with an Out Of Memory (OOM) exception.

There is, though, a way around this: use a batch size (i.e. feed the data and allocate memory in small chunks). The TensorFlow team talks about it in the documentation but does not have a proper example that we can follow. So after some search and find operations, I was able to get hold of the code. Below is the modified mnist_deep.py that should run even on GPUs with far less memory. One can further reduce the batch size if the problem continues.

Lines #166 and #167 were the original lines, which tried to create the feed dictionary with the whole data set in memory, causing the OOMs. Lines #168 to #179 are the replacement code, which divides the data into batches of 50. Nothing strange there: we are simply taking a smaller number of images at a time and calculating the accuracy. Hope this makes it easier for you to get to the solution so you don’t have to run around like me. ENJOY!
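To make the idea concrete, here is a minimal, framework-free Python sketch of batched accuracy evaluation. It is not the modified script itself: batched_accuracy and toy_batch_accuracy are names made up for this illustration, and the toy classifier only stands in for the real model. In mnist_deep.py the callback would, I assume, be a thin wrapper around accuracy.eval(feed_dict=...) for one batch.

```python
def batched_accuracy(images, labels, batch_accuracy, batch_size=50):
    """Accuracy over the whole data set, evaluated batch_size examples at a time.

    batch_accuracy(batch_images, batch_labels) must return the fraction of
    correct predictions for that batch (a hypothetical callback; with
    TensorFlow it would wrap the accuracy op for one batch).
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    if len(images) == 0:
        raise ValueError("cannot compute accuracy of an empty data set")

    correct = 0.0
    for start in range(0, len(images), batch_size):
        batch_images = images[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        # Weight each batch by its size so a short final batch
        # does not skew the overall average.
        correct += batch_accuracy(batch_images, batch_labels) * len(batch_images)
    return correct / len(images)


def toy_batch_accuracy(batch_images, batch_labels):
    """Stand-in model: 'predicts' image % 10 as the digit."""
    hits = sum(1 for image, label in zip(batch_images, batch_labels)
               if image % 10 == label)
    return hits / len(batch_images)


# 120 fake "images"; the labels of the last 30 are deliberately wrong.
images = list(range(120))
labels = [i % 10 if i < 90 else (i + 1) % 10 for i in images]
print("test accuracy %g" % batched_accuracy(images, labels, toy_batch_accuracy))
```

Only one batch is ever handed to the model at a time, which is what keeps the memory footprint bounded; lowering batch_size trades a few more iterations for less memory per step.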

Automate Your Twitter with a BOT!

We all have Twitter accounts and we all tweet stuff. But sometimes, we all suffer from information overload. And we all need to get more followers, right? So one of the things we can do is automate Twitter posts using a tweet bot. I am using nodejs as my framework (simply because it works pretty well for webapps). You can search the web to understand how to install nodejs and npm. Use the latest version and it should work fine. I am also using a Twitter API client for nodejs known as Twit. Twit is a very simple library that wraps the Twitter APIs in easy to use function calls. The code is pretty straightforward, as can be seen below.

But first, go to https://dev.twitter.com/ and register your app. You will get a consumer key and a consumer secret. Authorize the app to use your Twitter account, which will give you an additional access token and access token secret. Create an environment file and store all four in there. If you know JavaScript, the code is very easy to follow. We first create a Twit client and give it the keys and secrets, which allow it to authenticate as our account and post messages on our behalf. Go through the API docs for Twit, they are very extensive.
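For illustration, here is a minimal sketch of reading those four values from the environment and failing early when one is missing. It is written in Python rather than nodejs (in Node you would read the same values from process.env), the variable names are hypothetical, and load_twitter_credentials is a helper made up for this example, not part of Twit.

```python
import os

# Hypothetical variable names; use whatever names your environment file defines.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(environ=None):
    """Return the four credentials as a dict, or raise if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_KEYS}


# Demo with a fake environment, so no real secrets are needed to try it out.
fake_environment = {key: "dummy-value" for key in REQUIRED_KEYS}
credentials = load_twitter_credentials(fake_environment)
print(sorted(credentials))
```

Keeping the secrets out of the source file means you can share the bot code freely and set the real values only on the machine or hosting service that runs it.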

But if you are just plain lazy, copy the code I have written, modify the environment variables and get going. You can add more code as needed. Be sure to share it though. Now, you can either deploy it locally on your machine or use a cloud hosting service that supports nodejs. My provider does not support nodejs hosting and I am in no mood to change them (YET). I don’t want it hosted locally either, since I am not the guy who keeps his machine running all the time (though it seems a good idea now), and I am not the one to spend bucks on another hosting provider. So I have 3 options.

1. https://glitch.com/
2. https://www.heroku.com/
3. https://www.openshift.com/

Of course, there are more, but you can use any of these to host your app for free. The only limitation is that your app will be killed if it is not accessed within 30 minutes. So you simply need to log in to your computer and keep pinging the hosted URL :D. Naah.. I am kidding. We can set up a cron job using one of the many free cron job sites like:

1. https://cron-job.org/en/
2. https://uptimerobot.com/
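All these services do is request your app’s URL on a schedule. If you want to see what such a keep-alive check boils down to, here is a minimal Python sketch using only the standard library. The ping helper and the URL are made up for illustration, and the opener parameter exists only so the function can be tried without network access.

```python
import urllib.request


def ping(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


# Hypothetical app URL; schedule a call like ping(APP_URL) more often than
# every 30 minutes (for example from cron) to keep the free app awake.
APP_URL = "https://my-twitter-bot.example.com/"
```

A hosted cron service does the same thing for you without needing a machine of your own that stays on.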

For my Twitter bot, I am using heroku and cron-job.org. By the way, have a look at my bot at https://twitter.com/perfmetrics_bot. Besides retweeting, you can also use the bot to greet your new followers, say goodbye to people who unfollow you, gather some statistics, etc. I will write a different article on that one (once I update my bot). Also, be sure to follow my proper Twitter account https://twitter.com/wolverine2k and show your support.

ENJOY!