TensorFlow mnist_deep.py OOM error when running on GPU!

I know you are interested in #MachineLearning, and your first instinct is to use #TensorFlow (of course, since it is backed by #Google), so you will probably find plenty of support for your queries. The best part is that the available #Docker container helps you experiment and simply runs out of the box. Have a look at my article on getting started with TensorFlow. Soon you will get bored by the painfully slow execution speeds on the CPU and will be itching to run it on your GPU, because:

1. GPU is faster at running multiple parallel threads
2. GPU is faster at doing matrix/multi-dimensional operations
3. Your GPU is simply idling when you are running model training/inference on your CPU
4. Your coffee lasts only two minutes, which is all the GPU needs to finish processing your model, whereas with the CPU you will need at least 10 coffees (not good for your health)
5. Last but not least, go and get yourself an NVIDIA GPU if you haven’t already; it will save you time and make you far more efficient

In any case, experts and hobbyists alike train their models on GPUs. And boy, do they have powerful GPUs (Quadro P6000 (:drooling)). But hobbyists/poor engineers like me do not have access to those professional-grade GPUs. Instead, we have to get by with low-end GeForce graphics cards. Nothing wrong with those, though. They play amazing games. But there are problems when running ML samples like mnist_deep (a DNN-based sample application) that allocate a huge amount of memory in one go. Run it on a GPU with less than 4 GB of GDDR5 and the application simply dies with an Out Of Memory (OOM) exception.

There is a way around this, though: use a batch size for evaluation (i.e. feed the data in small chunks instead of all at once). The TensorFlow team mentions it in the documentation but does not give a proper example that we can follow. So after some search-and-find operations, I was able to get hold of the code. Below is the modified mnist_deep.py that runs on all GPUs irrespective of how much memory they have. You can reduce the batch size further if the problem continues.
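
If you only want the relevant change, here it is in isolation. This is a minimal sketch rather than my full listing, and it assumes the names from the stock TF 1.x mnist_deep.py: the placeholders x, y_ and keep_prob, the accuracy tensor, and the mnist dataset object returned by input_data.read_data_sets. It is meant to run inside the existing `with tf.Session() as sess:` block, in place of the single-shot accuracy print.

```python
import numpy as np

# Evaluate the test set in chunks of 50 images instead of all 10,000 at once,
# so the GPU only ever holds the activations for one small batch.
batch_size = 50
num_batches = mnist.test.num_examples // batch_size  # 10,000 / 50 = 200 batches

batch_accuracies = []
for _ in range(num_batches):
    images, labels = mnist.test.next_batch(batch_size)
    batch_accuracies.append(accuracy.eval(feed_dict={
        x: images, y_: labels, keep_prob: 1.0}))

# All batches have the same size, so a plain mean equals the overall accuracy.
print('test accuracy %g' % np.mean(batch_accuracies))
```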

Lines #166 and #167 were the original lines; they built the feed dictionary with the whole test set at once, forcing the GPU to evaluate all 10,000 images in a single pass and causing the OOM. Lines #168 to #179 are the replacement code that divides the data into batches of 50. Nothing strange there: we simply take a smaller number of images at a time, compute the accuracy for each batch, and average the results. I hope this makes it easier for you to get to the solution without having to run around like I did. ENJOY!
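
P.S. In case you do not have the original file handy, the evaluation step I replaced looks roughly like this (quoted from memory of the TF 1.x tutorial, so treat it as a sketch rather than an exact copy of lines #166–#167):

```python
# Original single-shot evaluation: all 10,000 test images go through the
# network in one call, so the GPU must hold the activations for every image
# at once -- this is what triggers the OOM on cards with little memory.
print('test accuracy %g' % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
```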