C++ neural networks applied to stock markets

stock market moves

I have quite an interest in random processes and in how we can go about looking for patterns in seemingly random phenomena. One interesting topic I wanted to bring up today is the use of neural networks (NN) to predict future stock market trends. What I will present is some free C++ code that you can install and try for yourselves, letting you employ your own neural networks on real market data using some wrappers to ROOT (http://root.cern.ch/drupal/)! The strategy is based on buying a certain stock; in this case it actually trades a benchmark index, the S&P 500 (http://us.spindices.com/indices/equity/sp-500), referred to as SNP500 in the code. The code is an extension to a wonderful package called Hudson, written by Alberto Giannetti, which offers a GPL-licensed end-of-day back-testing framework using the C++ STL, Boost, TA-Lib and GNU GSL libraries (the original code base is here: http://code.google.com/p/hudson/) and can load various formats of csv file (Yahoo, Google etc). This extension allows one to perform a multivariate analysis (MVA), which can pick out linear and non-linear relationships between the input variables of a given dataset, and then test its performance. Please bear in mind that this was written during my PhD studies, solely for fun one weekend. There are clear areas where the C++ can be improved; I just don’t have the time right now, so feel free to email me or leave a comment with ways to aid the project. My extension can be obtained from the following location:

# Download the repo
git clone https://mattreid9956@bitbucket.org/mattreid9956/hudson-mva.git
cd hudson-mva
./build.sh
cd example/MVA_SNP500
# open the run.sh script and try executing the lines one by one to see if they all work, let me know if you have trouble.

This week there is an accompanying pdf file where you can find the detail of a study employed on SNP500 data obtained from Yahoo Finance.

TMVA-StockMarket.pdf

Anyway, check out the pdf file, and if you find it remotely interesting give me a shout. It’s not “complete”, but it is close enough, and I wanted to get it out there to see if anyone is interested in collaborating (I am trying to finish my PhD as well as several other things, and I don’t like leaving things on the back burner for too long, so here it is). Anyone up to date with particle physics techniques will immediately recognise the format here; anyone who is not will hopefully gain an insight into how some of these techniques can be applied in a toy-model trading situation. A network such as this may not remain valid indefinitely. Its effectiveness would need continuous evaluation since, as I say in the paper, a market is not static and there will be periods of non-normal activity (government intervention, for instance). What would be interesting is developing methods for deciding when to turn on a trend-following algorithm and when not to! Anyway, if you have any thoughts let me know!

Boost.Random and Boost.Accumulators – Part 2

pie-chart-of-procrastination

As the title picture suggests, you can use statistics in a variety of ways to get your point across. Ignoring this amusing abuse of statistics, I welcome you to part two of the scripts aiming to provide some insight into using Boost.Accumulators. As promised in the previous blog, Boost.Random and Boost.Accumulators – Part 1, I wanted to now show some examples of how to manipulate the Boost.Accumulator classes, such that we can make our own accumulator.

Boost.Accumulators

This blog aims to provide some examples using the Boost.Accumulators library, which is a statistics-based tool-kit. I will also provide an example of how to make your own templated accumulator, the case in point being an Exponential Moving Average, or EMA for short. Again, this blog only touches upon some of the methods that I have found useful up to now; with some more light reading of the Boost.Accumulators page (Boost.Accumulators) you will find many other statistical solutions that cater for a vast array of problems. Do not expect to be told exactly what each function does. There will be some slightly advanced implementation with comments, and the rest is assumed knowledge unless you post on here with questions, which are welcome I might add. You can download the example code via the following

# Download the repo
wget https://dl.dropboxusercontent.com/u/88131281/BoostAccumulator.tar.gz
tar xzvf BoostAccumulator.tar.gz
cd BoostAccumulator/
make

Why Boost.Accumulators?
This templated library provides a convenient interface to various statistical quantities such as the mean, nth-order moments, skewness and others, which you can apply to your dataset. The simplest example defines an accumulator that calculates the mean of the set of numbers {1,2,3,4,5}; clearly this should return the value 3, since (1+2+3+4+5)/5=3. So let’s start:

// stl includes
#include <iostream>

// Accumulator includes
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>

int main(){
  boost::accumulators::accumulator_set< double, boost::accumulators::stats< boost::accumulators::tag::mean > > acc;
  // We use an operator method to add the variables to the accumulator.
  acc(1.0);
  acc(2.0);
  acc(3.0);
  acc(4.0);
  acc(5.0);

  std::cout << "Mean value of dataset is: " << boost::accumulators::mean( acc ) << std::endl;
  return 0;
}

This very basic example should print out the following

Mean value of dataset is: 3

The accumulator is a heavily templated tool that lets us choose which statistics we want to compute for samples of some type. To define the statistics you will note we passed a second template argument to the accumulator, boost::accumulators::stats. It is here we specified that we wish to calculate a mean, which tells the accumulator which computations to apply to our dataset. The simplest form is as follows

boost::accumulators::accumulator_set< TYPE, boost::accumulators::stats< STAT1, STAT2, ... > > acc;

where TYPE is the dataset type (int, double, etc.) and the stats are the list of statistics we wish to compute from the sample; in our case STAT1 = boost::accumulators::tag::mean and that was all. (Clearly one could use a “using namespace” directive here to shorten the lengthy text, but I quite like knowing where things come from so I leave the namespaces in.) From this you can do a whole bunch of things, from means to moments and variances, but check on-line to get a full list. You can even add a weight to each entry in the dataset (as needed for a weighted average), so the additional template argument is shown next for completeness.

boost::accumulators::accumulator_set< TYPE, boost::accumulators::stats< STAT1, STAT2, ... >, WEIGHT_TYPE > acc;
acc( p1, boost::accumulators::weight = w_p1 );

Here the WEIGHT_TYPE would again be an int, double, etc., and the second line shows how you would weight each point in your dataset, with w_p1 being the weight applied to p1. Hopefully all pretty self-explanatory. The more useful aspect is defining your own type of accumulator, and that is what I want to present now: an example of how to create an exponential moving average, or EMA for short.

Building an EMA Boost.Accumulator?

EMAs have various uses, mainly in smoothing out a volatile dataset so that you can find some underlying trend. The EMA is an extremely basic statistical tool and in some instances has advantages over a simple moving average, since it gives a larger weight to the most recent value and thus follows the data more closely. The mathematical formalism is simple and follows

 EMA_{t} = \alpha V_{t} + \left( 1 - \alpha \right) EMA_{t-1} where we find  \alpha = \frac{2}{N + 1}

N refers to the time period, V_{t} is the value at time t at which the EMA is evaluated, and EMA_{t-1} is the previous value of the EMA. Essentially, each time we update the EMA, the new value is weighted by \alpha and the previous EMA by 1 - \alpha. The best way to see this in action is to run the example called all.cpp, in the examples that can be downloaded at the top of the page, which calculates the corresponding EMA for a given set of data. The main class for the EMA is called EMA.cpp and is contained in the include directory that comes with the package.