Installing the NVIDIA CUDA toolkit – Linux (Ubuntu 13.10)


CUDA is one of NVIDIA’s greatest gifts to the everyday programmer who dreams of a parallelised world. With most computers and laptops now shipping with at least one dedicated graphics processor, the prospect of parallel tools for all is becoming a reality. Soon enough everyone, no matter who you are, will be able to multi-task.

I write this as I recently had to upgrade my laptop. I wanted to reinstate my NVIDIA GeForce 640M graphics card with the latest drivers (331.49) and the CUDA software (v5.5). This, it turns out, is not so trivial, as CUDA 5.5 will not currently work with the latest GCC compilers (GCC 4.6+), so you can’t use C++11 :(… However, I came across the two following posts, which provide some novel solutions to this problem, and now everything works great! I hope they may help someone else out there. Read them both first, just so you are aware of what needs to be done. The first is simply the process to install the drivers; the second gives more detail on how to make CUDA work with recent GCC versions – it is a bit of a hack, but it does the job.

How-to-manually-install-latest-nvidia-drivers-on-ubuntu

Incompatibility with GCC 4.7+

You can check the installation went OK by running a simple example and/or typing the following command.

nvidia-smi

I will get around to posting more about CUDA, in particular the use of Thrust, which is an STL-like interface enabling you to write neat, familiar-looking C++ parallel code for a large class of parallel applications. It is seriously cool!
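To give a flavour of what I mean, here is a tiny Thrust sketch of my own (not taken from any of the links in this post); it assumes you have the CUDA toolkit installed and compile the file with nvcc. It simply squares and sums a vector on the GPU using STL-style algorithms:

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

int main() {
    // Fill a vector on the GPU with 0, 1, 2, ..., N-1
    thrust::device_vector<float> x( 1 << 10 );
    thrust::sequence( x.begin(), x.end() );

    // Square every element in parallel on the device
    thrust::device_vector<float> y( x.size() );
    thrust::transform( x.begin(), x.end(), y.begin(), thrust::square<float>() );

    // Parallel reduction back down to a single number on the host
    float total = thrust::reduce( y.begin(), y.end(), 0.0f, thrust::plus<float>() );
    std::cout << "Sum of squares: " << total << std::endl;
    return 0;
}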

For a neat intro to parallel programming see these talks, they know their stuff:-

https://www.udacity.com/course/viewer#!/c-cs344/l-55120467/m-65830481

C++ neural networks applied to stock markets


I have quite an interest in random processes and exactly how we can go about looking for patterns in seemingly random phenomena. One interesting topic I wanted to bring up today is the use of neural networks to predict future stock market trends. What I will present is some free C++ code you can install and try for yourselves, whereby you can employ your own neural networks (NN) on real market data using some wrappers to ROOT (http://root.cern.ch/drupal/)! The strategy is based on buying a certain stock; in this case it is actually trading with a benchmark index known as the SNP500 (http://us.spindices.com/indices/equity/sp-500).

The code is an extension to a wonderful package called Hudson, written by Alberto Giannetti. He offers a GPL-licensed end-of-day back-testing framework using the C++ STL, Boost, TA-Lib and GNU GSL libraries (the original code base is here: http://code.google.com/p/hudson/), where you can load in various formats of CSV file (Yahoo, Google, etc.). This extension allows one to perform a multivariate analysis (MVA) which can decipher linear and non-linear trends between input variables of a given dataset and then test its performance.

Please bear in mind that this was written during my PhD studies, solely for fun one weekend. There are clear areas where the C++ can be improved; I just don’t have the time right now, so feel free to email me or leave a comment with ways to aid the project. My extension can be obtained from the following location:

# Download the repo
git clone https://mattreid9956@bitbucket.org/mattreid9956/hudson-mva.git
cd hudson-mva
./build.sh
cd example/MVA_SNP500
# open the run.sh script and try executing the lines one by one to see if they all work, let me know if you have trouble.

This week there is an accompanying PDF file where you can find the details of a study performed on SNP500 data obtained from Yahoo Finance.

TMVA-StockMarket.pdf

Anyway, check out the PDF file, and if you find it remotely interesting give me a shout. It’s not “complete”, but it is close enough, and I wanted to get it out there and see if anyone is interested in collaborating (I am trying to finish my PhD as well as several other things and don’t like leaving things on the back burner for too long, so here it is). Anyone up to date with particle physics techniques will immediately recognise the format; anyone who is not will hopefully gain an insight into how some of these techniques can be applied in a toy-model trading situation. A network such as this may not be valid indefinitely and its effectiveness would need continuous evaluation, since, as I say in the paper, a market is not static and there will be periods of non-normal activity (government intervention, for instance). What would be interesting is developing methods for deciding when to turn a trend-following algorithm on and when not to! Anyway, if you have any thoughts let me know!

Boost.Random and Boost.Accumulators – Part 2

[Image: pie chart of procrastination]

As the title picture suggests, you can use statistics in a variety of ways to get your point across. Ignoring this amusing abuse of statistics, I welcome you to part two of the posts aiming to provide some insight into using Boost.Accumulators. As promised in the previous post, Boost.Random and Boost.Accumulators – Part 1, I now want to show some examples of how to manipulate the Boost.Accumulator classes, such that we can make our own accumulator.

Boost.Accumulators

This post aims to provide some examples using the Boost.Accumulators library, which is a statistics-based tool-kit. I will also provide an example of how to make your own templated accumulator, the case in point being an Exponential Moving Average, or EMA for short. Again, this post only touches upon some of the methods that I have found useful so far; with some more light reading of the Boost.Accumulators page (Boost.Accumulators) you will find many other statistical solutions that cater for a vast array of problems. Do not expect to be told exactly what each function does – there is some slightly advanced implementation, some comments, and the rest is assumed knowledge, unless you post questions on here, which are welcome I might add. You can download the example code via the following:

# Download the repo
wget https://dl.dropboxusercontent.com/u/88131281/BoostAccumulator.tar.gz
tar xzvf BoostAccumulator.tar.gz
cd BoostAccumulator/
make

Why Boost.Accumulators?
This templated library provides a convenient interface to various statistical methods such as the mean, nth-order moments, skews and various other statistical tools which you can apply to your dataset. The simplest example proceeds by defining an accumulator that will calculate the mean of the set of numbers {1,2,3,4,5}; clearly this should return the value 3, since (1+2+3+4+5)/5 = 3. So let’s start:

// stl includes
#include <iostream>

// Accumulator includes
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>

int main(){
  boost::accumulators::accumulator_set< double, boost::accumulators::stats< boost::accumulators::tag::mean > > acc;
  // We use an operator method to add the variables to the accumulator.
  acc(1.0);
  acc(2.0);
  acc(3.0);
  acc(4.0);
  acc(5.0);

  std::cout << "Mean value of dataset is: " << boost::accumulators::mean( acc ) << std::endl;
  return 0;
}

This very basic example should print out the following

Mean value of dataset is: 3

The accumulator is a heavily templated tool that lets us choose what types of statistics we want to compute on some type. In order to define the statistics, you will note we specified another template argument inside the accumulator called boost::accumulators::stats. It is here that we said it was a mean we wished to calculate, which tells the accumulator which functions it can apply to our dataset. The most simplistic form follows:

boost::accumulators::accumulator_set< TYPE, boost::accumulators::stats< STAT1, STAT2, ... > > acc;

where TYPE is the dataset type (int, double, etc.) and the stats are the list of statistics we wish to compute from the sample; in our case STAT1 = boost::accumulators::tag::mean and that was all (clearly one could use a “using namespace” directive here to shorten the lengthy text, but I quite like knowing where things come from so I leave it in). From this you can do a whole bunch of things, from means to moments and variances, but check on-line to get a full list. You can even add weights to each entry in the dataset (as in the calculation of a harmonic mean or weighted average), so I show the additional weight argument below:

boost::accumulators::accumulator_set< TYPE, boost::accumulators::stats< STAT1, STAT2, ... >, WEIGHT_TYPE > acc;
acc( p1, boost::accumulators::weight = w_p1 );

Here WEIGHT_TYPE would again be an int, double, etc., and the second line shows how you would weight each point in your dataset, with w_p1 being the weight applied to p1. Hopefully that is all pretty self-explanatory. The more useful aspect, though, is the definition of your own type of accumulator, and that is what I want to present next: an example of how to create an exponential moving average, or EMA for short.
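Before we get to the EMA, here is a small concrete sketch of the weighted form in action (my own illustration, not part of the downloadable examples), computing a weighted mean:

// stl includes
#include <iostream>

// Accumulator includes
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>

int main(){
  // Sample type double, weighted mean requested, weight type double.
  boost::accumulators::accumulator_set< double, boost::accumulators::stats< boost::accumulators::tag::weighted_mean >, double > acc;

  // Each call supplies a sample and its weight via the named 'weight' parameter.
  acc( 1.0, boost::accumulators::weight = 1.0 );
  acc( 2.0, boost::accumulators::weight = 2.0 );
  acc( 3.0, boost::accumulators::weight = 3.0 );

  // (1*1 + 2*2 + 3*3) / (1 + 2 + 3) = 14/6, roughly 2.33
  std::cout << "Weighted mean of dataset is: " << boost::accumulators::weighted_mean( acc ) << std::endl;
  return 0;
}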

Building an EMA Boost.Accumulator

EMAs have various uses, mainly in smoothing out a volatile dataset so that you can find some underlying trend. The EMA is an extremely basic statistical tool and in some instances has advantages over a simple moving average, since it gives a larger weight to the most recent value and thus follows the data more closely. The mathematical formalism is simple and follows

 EMA_{t} = \alpha V_{t} + \left( 1 - \alpha \right) EMA_{t-1} where we find  \alpha = \frac{2}{N + 1}

N refers to the time period, V_{t} is the value at time t at which we evaluate the EMA, and EMA_{t-1} is the previous value of the EMA. Essentially, each time we update the EMA a re-weighting is applied that scales the value accordingly. The best way to see this in action is to run the example called all.cpp in the package that can be downloaded at the top of the page; it calculates the corresponding EMA for a given set of data. The main class for the EMA is implemented in EMA.cpp, contained in the include directory that comes with the package.
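To give a flavour of how a user-defined statistic plugs into the framework, here is a stripped-down sketch following the “defining a new accumulator” pattern from the Boost.Accumulators documentation. It is not the packaged EMA.cpp – the period N is hard-coded and there is no parameter handling – but it shows the three pieces you need: an impl accumulator, a tag, and an extractor.

#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>

namespace boost { namespace accumulators {

namespace impl {
  // The accumulator itself: holds the running EMA and updates it on each sample.
  template<typename Sample>
  struct ema_accumulator : accumulator_base {
    typedef Sample result_type;

    template<typename Args>
    ema_accumulator(Args const& args)
      : ema_( args[sample | Sample()] ), first_(true) {}

    template<typename Args>
    void operator ()(Args const& args) {
      static const double N     = 10.0;             // smoothing period, hard-coded for brevity
      static const double alpha = 2.0 / (N + 1.0);  // alpha = 2/(N+1)
      Sample v = args[sample];
      if( first_ ) { ema_ = v; first_ = false; }     // seed the EMA with the first value
      else         { ema_ = alpha * v + (1.0 - alpha) * ema_; }
    }

    result_type result(dont_care) const { return ema_; }

  private:
    Sample ema_;
    bool   first_;
  };
}

namespace tag {
  // The feature tag used inside stats<...>.
  struct ema : depends_on<> {
    typedef impl::ema_accumulator< mpl::_1 > impl;
  };
}

namespace extract {
  // The extractor, so we can write ema(acc).
  extractor< tag::ema > const ema = {};
}
using extract::ema;

}} // namespace boost::accumulators

int main() {
  boost::accumulators::accumulator_set< double,
      boost::accumulators::stats< boost::accumulators::tag::ema > > acc;
  for( int i = 1; i <= 20; ++i ) acc( double(i) );  // feed a simple ramp 1,2,...,20
  std::cout << "EMA after 20 points: " << boost::accumulators::ema( acc ) << std::endl;
  return 0;
}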

Python Google Trends API


EDIT: FYI – I have retired the repo at Bitbucket. Not to panic, though: dreyco676 has a version for Python 3 and it is working well as of 21/10/2014. Please see https://github.com/dreyco676/pytrends.

Hi once again; a slight detour, but I thought I should share this. I found the original code on-line and, with a few tweaks, managed to get it to do what I wanted, so here it is for anyone interested in this sort of thing. In a world where large datasets are becoming ever more available to the average Joe, Google are doing their bit by letting you see the historic rates at which search terms have occurred over a given time period, essentially allowing one to look back at what people have been thinking about. You can try this for yourself by following this link to the http://www.google.co.uk/trends homepage. As an example, let’s say I was curious to see how often people search for the term fruit; I would get the following display.

[Image: Google Trends results for the search term “fruit”]

(ARGH, since this blog is free I cannot embed the link!!) There are plenty of ways one can then analyse these data: sentiment indicators for the stock market, movie hits based on search results, when most babies are born, and other such seasonal traffic patterns, just as some examples.

Anyway, the code is a simple Python script and comes with a simple example, so check it out. It does require you to log in to your Google account so that it can cache the cookies; if you don’t have one you can always get each “.csv” file you need directly from the Google Trends website posted above. EDIT 20/01/2014: You will not be able to log in if you have Google’s additional security running; this is because you get a redirected session that waits for a password sent to you by mobile, so the script will never know what that is and is not written in a way to accept it as input. Don’t turn your security off to use this script, that’s just stupid – instead just make a different Gmail account, sorted!!

# Download the git repo.
git clone https://github.com/dreyco676/pytrends.git
cd pytrends
./example.py

EDIT: As mentioned above, please use the pytrends version on GitHub; I will not be supporting mine. There is an example called example.py – run this and you should download a search for “pizza”!

This simple example will go and grab the trend data for each term provided in the Python list and store each in a .csv file. You can also check out the repository directly at https://bitbucket.org/mattreid9956/google-trend-api. The formatting is such that you are returned the end-of-week date and the trend value over that week, which is supposed to make life easier should you run an analysis later. One could easily modify the script to get the desired formatting; note that the period over which you search will change the granularity of the time window. For instance, searches over 3 months will return daily results, whereas searches over a year will return results accumulated over each week. This is a little annoying and I don’t see why Google won’t allow daily results by default – maybe it’s time to ask them! Watch this space…

Boost.Random and Boost.Accumulators – Part 1


Bonjour! This is the first post of two which will discuss Boost’s random number generators and statistical accumulator functionality. They are both useful for Monte Carlo and statistical studies and so I will outline their use in turn in separate postings.

Boost.Random

To begin with we will take a look at an example of how to set up a random number generator for a given distribution. I will first point out which random distributions Boost comes pre-packed with, then provide an example using a uniform distribution. I will then post a link to a wrapper class that allows you to use 9 different random distributions. The official documentation for Boost.Random is always the best place to start! I hope to provide some insight into what I would deem the most common distributions, but you will come across far more with some light reading of the Boost.Random pages (Boost.Random), which, in my opinion, are some of the better documented in the Boost collection.

Why Boost.Random?

The basis for random number generation is a series of software-implemented algorithms that, when given a “seed” or initial state, go on to generate a sequence of random numbers. These kinds of algorithms are usually known as pseudo-random number generators (PRNGs). There are many use cases for pseudo-random number generators including, but not limited to, encryption, simulations, games and the modelling of random processes, as well as many other real-world applications. Of course there is no such thing as a completely random number-generating algorithm: if we know the initial conditions then we can work out the random number that would be produced (if we could be bothered to follow the calculation). If we take a large enough ensemble of these random events, however, they will display behaviour similar to that of the requested distribution. I want to introduce the following examples provided by the Boost.Random library:-

  1. Uniform distribution – A function with constant probability that is defined on the bounded region, f(x;a,b) with x \in (a,b). An example could be some form of combinatorial background, such that random combinations populate the probability space but have no peaking structures; exhibiting a flat (or uniform) distribution.
  2. Gaussian distribution – A function containing a mean and the spread about that mean, taking two arguments f(x;\mu,\sigma). This is one of the most common statistical distributions there are!
  3. Exponential distribution – Generally used to model waiting times between events, such as the waiting time between counts in a Poisson process. If the waiting time before a given event is unknown, it is often appropriate to think of it as a random variable having an exponential distribution.
  4. Gamma distribution – Based on the \Gamma(\alpha,\beta) function.
  5. Chi-Squared distribution – One of the most widely used distributions in hypothesis testing and in the construction of confidence intervals. It can provide a measure of the goodness of a fit to data points, for example.
  6. Cauchy distribution (to those particle physicists out there, known as the non-relativistic Breit-Wigner distribution) – distribution which has no mean, variance or higher moments defined and spans the range x \in (-\infty,\infty).
  7. Poisson distribution – A discrete process, depicts the probability for the number of events occurring within some time frame or spacial window, classic example is radioactivity (nuclear decay) but there are many other applications.
  8. Binomial distribution – This again is a discrete random process, governs the probability of getting k successes given n trials and probability p for each test. Classic example is a coin toss.
  9. Triangular distribution – A continuous distribution defined by a lower limit, a mode and an upper limit, whose density is shaped like a triangle.

For those interested, there are plenty of resources on-line, for instance see probability distributions.
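To make a couple of the entries above concrete before the walkthrough in the next section, here is a minimal sketch of my own (not part of the downloadable examples below) drawing from a Gaussian and a Poisson distribution with the classic Boost.Random interface:

#include <iostream>
#include "boost/random.hpp"

int main() {
  boost::mt19937 rng( 42u );  // Mersenne Twister engine; a fixed seed makes the output reproducible

  // Item 2: Gaussian with mean 0 and sigma 1
  boost::normal_distribution<double> gauss( 0.0, 1.0 );
  boost::variate_generator< boost::mt19937&, boost::normal_distribution<double> > drawGauss( rng, gauss );

  // Item 7: Poisson with mean 5
  boost::poisson_distribution<int> poisson( 5.0 );
  boost::variate_generator< boost::mt19937&, boost::poisson_distribution<int> > drawPoisson( rng, poisson );

  for( int i = 0; i < 5; ++i )
    std::cout << drawGauss() << "\t" << drawPoisson() << std::endl;
  return 0;
}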


Boost.Random the Code!

It all starts with the inclusion of the “boost/random.hpp” header file. This header pulls in all the necessary classes and constructs that we need for the random number distributions. For now I will outline a very simple example of how to set up a random number distribution, and from there I will introduce a simple class I made that provides those listed above in a configurable way. The following example, like I say, is a simple, explicit display of how to set up an integer uniform random number generator in the range (0,10).

#include "boost/random.hpp"
// Initialise Boost random numbers, uniform integers from min to max
 const int rangeMin = 0;
 const int rangeMax = 10;
 typedef boost::uniform_int<> NumberDistribution; // choose a distribution
 typedef boost::mt19937 RandomNumberGenerator;    // pick the random number generator method,
 typedef boost::variate_generator< RandomNumberGenerator&, NumberDistribution > Generator;  // link the generator to the distribution

 NumberDistribution distribution( rangeMin, rangeMax );
 RandomNumberGenerator generator;
 Generator numberGenerator(generator, distribution);
 generator.seed( seed ); // seed with some initial value

 int N(100);
 for (int i(0); i < N; ++i) {
  std::cout << numberGenerator() << std::endl; // each time the the operator()
 }

There are several typedefs in the example which just make for convenient coding, and each could easily be replaced by an alternative. For instance, the random number generator is the Mersenne Twister algorithm, boost::mt19937, and the distribution is a uniform variate between 0 and 10, boost::uniform_int<>. Feel free to browse generators or distributions respectively for an exhaustive list of other generators and distributions. The generator and distribution are then bound by the use of boost::variate_generator< RandomNumberGenerator&, NumberDistribution >. This means that each time the numberGenerator() method is called a new random value is produced, as the variate_generator advances the generator’s state on every call. So hopefully that gives a brief overview. What I outline next is a wrapper class I made to make these a little more accessible and easier to configure. You can download a header file called BoostRandom.hpp which configures all 9 of the distributions I outlined above. To get the code and some examples, run the following:

# Download and build the BoostRandom examples.
wget https://dl.dropboxusercontent.com/u/88131281/BoostRandom.tar.gz
tar xzvf BoostRandom.tar.gz
cd BoostRandom
make
./bin/gauss.exe
./bin/all.exe

The first example shows a Gaussian distribution using the configurable wrapper; the second shows how we can set up many distributions at the same time, generating various random numbers at once. Have fun with it, I hope it is useful. The only thing that must always be done is to set the parameters for the distribution you want to use – there are no defaults! Also, do not set the template type to an integer. Quite simply, there is a type trait in Boost.Random such that a uniform distribution on (0,1) always requires floating-point precision. This kind of makes sense, since an integer in this range would simply be 0 or 1, which would not require much effort. The Gaussian distribution (as well as others) uses such a uniform distribution as a building block, since you can use a Box-Muller transformation to generate other distributions from it, such as a Gaussian. Hence, only use this code with the types float, double and long double. Enjoy! A simple example is posted below so you can see how easy it is to configure the class.

// simple example of a Poisson draw using the BoostRandom wrapper
#include <iostream>

// BoostRandom header
#include "BoostRandom.hpp"

using namespace boost::distribution;
int main() {
    
    // BoostRandom is templated; you need to pass < NumberGenerator, precision type >, e.g. BoostRandom< boost::mt19937, double >.
    // This uses the Mersenne Twister algorithm and is the one I use most of the time, so there is a typedef for it: BoostRandomD.
    BoostRandomD random( Poisson, 0 );  // 0 could be std::time(0);
    random.setParams( Poisson_Mean, 5 );  
    std::cout << random() << std::endl;
    return 0;
}

The above code simply prints out one random number sampled from a Poisson distribution, but I think it highlights the ease of generation. There are a few useful functions in there, such as “simulate”, which fills STL containers immediately with your desired distribution and however many values you choose.


Example Distributions!

See the below links to view all the distributions available in the BoostRandom class.

uniform, triangular, poisson, gauss, gamma, exponential, chi2, cauchy, binomial

Boost.Python – executing C++ inside a Python environment

Python is a fantastic scripting language with a simple, easy-to-learn syntax and style. At LHCb we use Python as a configurable layer to control the desired functionality, which is usually implemented in C++. So why would you want two languages? Does that not make things more complicated? Well, C++ is inherently much faster to run, so when we have to make decisions that need to be calculated within a few milliseconds we utilise the speed of C++. However, if you have a bunch of classes that are already written and just need executing, it can be much easier to run a Python script to control and set up the job, as Python is a much simpler scripting language: what takes many lines of code in C++ can often be achieved in just a few lines of Python.

So, I wanted to show an example of a C++ class that can be used as an object in a Python session. The example follows on from the simple vector class (x,y,z) used in the tutorials, https://particlephysicsandcode.wordpress.com/learn-c-and-special-relativity. With this example we will see how Boost provides tools that convert your class into a fully fledged Python object, meaning an identical class can be used from either platform, Python or C++. To get the source, wget the following:-

wget https://dl.dropbox.com/u/88131281/boost_python_example.tar.gz
tar xzvf boost_python_example.tar.gz
cd boost_python_example
chmod +x build.sh
./build.sh

I assume you have cmake; if not, a simple “sudo apt-get install cmake” should do the trick. In the build directory created there will be a folder called Vector. The clever part is all done by the Boost.Python machinery; the piece we use in this example is the BOOST_PYTHON_MODULE(vectors) macro. Using this wrapper you can expose all the functionality that your class exhibits, and behind the scenes the C++ compiler builds it into an extension module that Python can import. Running the python script you should see the following:-

matt@matt-W250ENQ-W270ENQ:~/C++/boost_python_example$ ./build/Vectors/vectors.py 
vec  = (1, -3.56019, 0.570154)
vec2 = (4.44649, 3, 0.478267)
vec3 = (5.44649, -0.560186, 1.04842)
(1, 1, 1)
(2, 5, 7)
8.83176086633
(R, Phi, Theta) = (8.83176086633, 1.19028994968, 0.655744935261)

These are a series of examples just to show that the implementation is correct and gives the desired results: we can rotate about the axes, get the length of the vector, change to polar coordinates… everything we can do in the C++ class. For this vector example we define the following Python bindings for our class:-

//...
// Add the required python headers
#include <boost/python.hpp>
#include <boost/python/operators.hpp>

//...
// The main class definitions
// ...

// Add the python module
using namespace boost::python;
BOOST_PYTHON_MODULE(vectors)
{
    boost::python::class_<ThreeVector>("ThreeVector")
        .def( init<ThreeVector>() )
        .def( init<double, double, double>() )
        .def(self_ns::str(self_ns::self))
        .def( self + ThreeVector())
        .def( self - ThreeVector())
        .def( self * ThreeVector())
        .def( self / ThreeVector())
        .def("setXYZ", &ThreeVector::setXYZ)
        .add_property("X", &ThreeVector::getX, &ThreeVector::setX)
        .add_property("Y", &ThreeVector::getY, &ThreeVector::setY)
        .add_property("Z", &ThreeVector::getZ, &ThreeVector::setZ)
        .add_property("R", &ThreeVector::getR, &ThreeVector::setR)
        .add_property("Theta", &ThreeVector::getTheta, &ThreeVector::setTheta)
        .add_property("Phi", &ThreeVector::getPhi, &ThreeVector::setPhi)
        .def("length", &ThreeVector::length)
        .def("rotateX", &ThreeVector::rotateX)
        .def("rotateY", &ThreeVector::rotateY)
        .def("rotateZ", &ThreeVector::rotateZ);

    boost::python::def("arctan", arctan);
    boost::python::def("scalarProduct", scalarProduct);
};

Firstly we declare the class “ThreeVector”. Then we declare the constructors with the init wrapper, “.def( init<ThreeVector>() )” and “.def( init<double, double, double>() )”, followed by the streaming operator used for printing, “.def(self_ns::str(self_ns::self))”. To define arithmetic operators we use self along with the operator, as written above: “.def( self + ThreeVector())”. To expose getter and setter functions we use “.add_property(“Name”, &MyClass::getName, &MyClass::setName)”. Any remaining public member functions of the class can be added with “.def(“func”, &MyClass::func)”. That about sums it up for a class. If you have free functions that you want to expose to Python, use the def function as shown: “def(“arctan”, arctan);”. Remember the semi-colons at the end of each statement and to chain each exposed attribute with a leading dot (full stop).

Exposing a C++ class as a Python one is simple with Boost and can be useful for anyone wishing to expand the usability of their code so that Python programmers can use it, or simply to allow one to write very quick programs for testing. You can simply open a Python terminal, import the vectors module and you’re in business.