Configuring Theano For High Performance Deep Learning

23rd May 2015
The topic of this post is perhaps a bit mundane, but after spending a considerable amount of time getting this right, I decided to put together a step-by-step guide so I would remember how to do it next time. And since I already went to the trouble, I might as well share it and hopefully save others a few headaches.
Theano, if you're not aware, is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It has a number of really interesting features, particularly its transparent use of a GPU for massively parallel computation and the ability to dynamically compile expressions into optimized C code. Theano provides developers with a general-purpose computing framework for very fast tensor operations and is widely used in the Python machine learning community, particularly for deep learning research. If you'd like to learn more, there's detailed documentation available here.
While Theano is very powerful, it's unfortunately not that easy to set up and configure properly. The website does provide instructions, but they're often inconsistent or out of date. There are also a number of blog posts and Stack Overflow questions floating around that discuss various elements of the process, but nothing that completely covered what I needed. So that's what this post will address.
To be clear, my goal was relatively simple - install and configure Theano with an efficient BLAS implementation for CPU operations, along with the ability to leverage a GPU when desired, on a clean install of Ubuntu 15.04 so I could run deep learning experiments. Although I'm running Ubuntu 15, this guide should also work on 14.xx. As a bonus, I'll also cover setting up a fresh install of Anaconda and downloading/installing a new but promising deep learning library called Keras.
First, open a terminal and run the following commands to make sure your OS is up-to-date.
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential
sudo apt-get autoremove
Next, install and set up Git. We need this to download and build OpenBLAS ourselves.
sudo apt-get install git
git config --global user.name <name>
git config --global user.email <email>
Next we need a Fortran compiler (again, required to build OpenBLAS).
sudo apt-get install gfortran
Now we're going to retrieve and build OpenBLAS. It's possible to skip this part and just download a pre-built BLAS package, but you'll get much better performance by compiling it yourself, and it requires relatively little effort. First create a Git folder (I added it to my home directory), then run these commands:
cd git
git clone https://github.com/xianyi/OpenBLAS
cd OpenBLAS
make FC=gfortran
sudo make PREFIX=/usr/local install
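As a rough sanity check that an optimized BLAS actually matters, you can time a large matrix multiply from numpy. Note that numpy from Anaconda links its own BLAS (often MKL) independently of the OpenBLAS that Theano will pick up through its configuration, so this is only an illustration of the kind of speedup an optimized BLAS provides:

```python
import time
import numpy

# Show which BLAS numpy itself was linked against (informational only)
numpy.__config__.show()

# A 2000x2000 single-precision matrix multiply should finish in well
# under a second with an optimized BLAS, versus several seconds without
n = 2000
a = numpy.random.rand(n, n).astype('float32')
t0 = time.time()
b = a.dot(a)
print('Matrix multiply took %.2f seconds' % (time.time() - t0))
```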
Next let's set up the required GPU tooling. If you're using a relatively modern NVIDIA card, your best bet is to use CUDA. It's possible to configure Theano using OpenCL as well, but I haven't tried this so I won't cover it here. There are a number of ways to install CUDA, and many guides advise you to download the binaries from NVIDIA's website and run some commands to install it, but actually there's a package available that installs everything you need. Run the following commands:
sudo apt-get install nvidia-current
sudo apt-get install nvidia-cuda-toolkit
The first line installs NVIDIA's graphics card drivers, and the second installs the CUDA tools. Note that the "nvidia-current" package may (despite what the name suggests) install an older version of the drivers than is actually available. Don't fall into the trap of thinking you have to use the legacy drivers with CUDA. You don't! I ended up installing newer drivers from the system menu and everything still worked fine.
You should restart your system after installing new drivers to make sure everything gets loaded properly. To verify that CUDA is installed, run one of the driver/toolkit utilities at the terminal and check that the output mentions your graphics card model.
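For example (assuming the packages above put `nvidia-smi` and `nvcc` on your path):

```shell
# Lists the driver version and the model of each detected GPU
nvidia-smi

# Confirms the CUDA compiler is installed and reports its version
nvcc --version
```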
Next we'll get Anaconda set up, which will install Theano along with all of its dependencies. Download the binaries here and save them somewhere locally, then navigate to the folder where you saved the file and run the installer script with bash.
(note that the file name will change when a new version is published, so use the actual name of the file you downloaded)
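The command looks like this, with `<version>` standing in for whichever release you downloaded:

```shell
bash Anaconda-<version>-Linux-x86_64.sh
```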
To make sure your Anaconda install is up-to-date and all of Theano's dependencies are there, run a few statements at the terminal using the "conda" package manager:
conda update conda
conda update anaconda
conda install pydot
conda update theano
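To confirm that Theano is importable from your Anaconda environment, a one-liner like this should print the installed version:

```shell
python -c "import theano; print(theano.__version__)"
```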
We're now in the home stretch. All that's left are some configuration items to tell Theano where your BLAS/CUDA libraries are and how to use them. First, create a file called ".theanorc" in your home directory and add the following contents to it:
[global]
device = gpu
floatX = float32

[blas]
ldflags = -L/usr/local/lib -lopenblas

[nvcc]
fastmath = True

[cuda]
root = /usr/lib/nvidia-cuda-toolkit
(switch the "device" flag to cpu when you want to use that instead)
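You can also override the config file for a single run via the `THEANO_FLAGS` environment variable, which Theano reads in addition to ".theanorc" — handy for quick CPU/GPU comparisons:

```shell
# Force CPU for this invocation only, leaving .theanorc untouched
THEANO_FLAGS=device=cpu python theano_test.py
```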
Finally, we need to add the CUDA path to an environment variable that Theano also looks for. Run the following statement:
export LD_LIBRARY_PATH="/usr/lib/nvidia-cuda-toolkit"
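Note that `export` only affects the current shell session. To make the setting permanent, you can append the same line to your shell startup file (assuming bash):

```shell
echo 'export LD_LIBRARY_PATH="/usr/lib/nvidia-cuda-toolkit"' >> ~/.bashrc
```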
That's it! You're now set up to use Theano. In order to verify that everything is working, add the following code to "theano_test.py" in your home directory:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'
Then run the script with Python from the terminal and verify that it completes using the correct device (based on your configuration earlier).
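Assuming the script was saved as "theano_test.py" in your current directory:

```shell
python theano_test.py
```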
NOTE: At this point I ran into an issue that had me banging my head against the wall for a few hours. When I tried to run the above test, Theano could not access my GPU even though CUDA was configured correctly. Although the exact underlying cause is still unclear to me, it turns out that running any CUDA process with root access first will "initialize" things such that any future processes will run successfully without root access. This needs to be done once after each restart and then you're good to go. One possible solution to this is to run the above script with root access:
sudo python theano_test.py
However, this presented two new problems for me:
1) My Anaconda python distro wasn't properly linked while running with root, so it was using the base Linux install and couldn't find any of the libraries
2) The process creates a temporary folder for Theano's compiled code that was then inaccessible without root access, causing future attempts to use Theano to fail
I resolved both of these issues by creating a simple bash script that I run once each time I reboot the machine. Create a file called "init.sh" in your home directory and add the following lines of code to it:
/home/<user>/anaconda/bin/python theano_test.py
cd .theano
rm -rf compiledir_Linux-3.19--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.9-64/
(make sure you replace "<user>" with your actual username)
The folder with the compiled code may have a different name, but it appears to be the same each time Theano runs, so just check to see what yours looks like and put it in the script. Then run this each time you reboot:
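If you'd rather not hard-code the platform-specific folder name, a wildcard works just as well, since every compile directory starts with the same prefix:

```shell
# Removes all of Theano's compiled-code folders in one shot
rm -rf ~/.theano/compiledir_*
```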
sudo bash init.sh
Now you should be good to go. The last step is adding a library to build deep learning nets with. I recommend trying Keras. It's still a bit immature but already has a great feature set, and the API is the best I've seen for this kind of stuff. To get Keras installed, run the following commands from your git folder:
git clone https://github.com/fchollet/keras
cd keras
python setup.py install
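To check that everything is wired up, a tiny network in the Keras API of the time looked roughly like the sketch below. This is only a sketch: the API has changed substantially in later releases, and `X_train`/`y_train` are assumed to be numpy arrays you supply yourself.

```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD

# A small fully-connected net: 20 inputs -> 64 hidden units -> 1 output
model = Sequential()
model.add(Dense(20, 64, init='uniform'))  # old-style Dense(input_dim, output_dim)
model.add(Activation('tanh'))
model.add(Dense(64, 1, init='uniform'))
model.add(Activation('sigmoid'))

# Compiling triggers Theano's graph optimization and code generation
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01))

# Then train on your own data, e.g.:
# model.fit(X_train, y_train, nb_epoch=20, batch_size=16)
```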
That's all there is to it. You're now ready to start building ultra-fast deep learning nets. Enjoy!