Boost.Python: Executing C++ inside a Python environment

Python is a fantastic scripting language with a simple, easy-to-learn syntax and style. At LHCb we use Python as a configuration layer to control the desired functionality, which is usually embedded in C++. So why would you want two languages? Doesn't that make things more complicated? Well, C++ is inherently much faster, so when we have to make decisions that need to be calculated within a few milliseconds we utilise the speed of C++. However, if you have a bunch of classes that are already written and just need executing, it can be much easier to control and set up the job from a Python script, as Python is a much simpler scripting language: what usually takes many lines of code in C++ can be achieved in just a few lines of Python. So I wanted to show an example of a C++ class that can be used as an object in a Python instance. The example follows on from the simple vector class (x,y,z) used in the tutorials, https://particlephysicsandcode.wordpress.com/learn-c-and-special-relativity. With this example we will see how Boost provides tools that convert your class into a fully fledged Python object, meaning an identical class can be used from either platform, Python or C++. To get the source, wget the following:-

wget https://dl.dropbox.com/u/88131281/boost_python_example.tar.gz
tar xzvf boost_python_example.tar.gz
cd boost_python_example
chmod +x build.sh
./build.sh

I assume you have cmake; if not, a simple “sudo apt-get install cmake” should do the trick. In the build directory created there will be a folder called Vectors. The clever part is all done by the Boost.Python machinery; the macro we use in this example is BOOST_PYTHON_MODULE(vectors). Inside this wrapper you declare all the functionality that your class exhibits, and behind the scenes the compiler turns this into a shared library that Python imports like any ordinary module. Running the python script you should see the following:-

matt@matt-W250ENQ-W270ENQ:~/C++/boost_python_example$ ./build/Vectors/vectors.py 
vec  = (1, -3.56019, 0.570154)
vec2 = (4.44649, 3, 0.478267)
vec3 = (5.44649, -0.560186, 1.04842)
(1, 1, 1)
(2, 5, 7)
8.83176086633
(R, Phi, Theta) = (8.83176086633, 1.19028994968, 0.655744935261)

These are a series of examples just to show that the implementation is correct and gives the desired results: we can rotate about the axes, get the length of the vector, change to polar coordinates… everything we can do in the C++ class. For this vector example we define the following Boost.Python bindings for our class:-

//...
// Add the required python headers
#include <boost/python.hpp>
#include <boost/python/operators.hpp>

//...
// The main class definitions
// ...

// Add the python module
using namespace boost::python;
BOOST_PYTHON_MODULE(vectors)
{
    boost::python::class_<ThreeVector>("ThreeVector")
        // constructors: copy and (x, y, z)
        .def( init<ThreeVector>() )
        .def( init<double, double, double>() )
        // string conversion, used when python prints the object
        .def(self_ns::str(self_ns::self))
        // arithmetic operators; the temporary is only there for type deduction
        .def( self + ThreeVector())
        .def( self - ThreeVector())
        .def( self * ThreeVector())
        .def( self / ThreeVector())
        .def("setXYZ", &ThreeVector::setXYZ)
        .add_property("X", &ThreeVector::getX, &ThreeVector::setX)
        .add_property("Y", &ThreeVector::getY, &ThreeVector::setY)
        .add_property("Z", &ThreeVector::getZ, &ThreeVector::setZ)
        .add_property("R", &ThreeVector::getR, &ThreeVector::setR)
        .add_property("Theta", &ThreeVector::getTheta, &ThreeVector::setTheta)
        .add_property("Phi", &ThreeVector::getPhi, &ThreeVector::setPhi)
        .def("length", &ThreeVector::length)
        .def("rotateX", &ThreeVector::rotateX)
        .def("rotateY", &ThreeVector::rotateY)
        .def("rotateZ", &ThreeVector::rotateZ);

    boost::python::def("arctan", arctan);
    boost::python::def("scalarProduct", scalarProduct);
};

Firstly we declare the class “ThreeVector”. Then we declare the constructors with the init wrapper, “.def( init<double, double, double>() )”, followed by the string conversion operator, “.def(self_ns::str(self_ns::self))”, which is what lets Python print the vector. To define arithmetic operators we use self along with the operator, as written above: “.def( self + ThreeVector() )”. To expose getter and setter functions as attributes we use “.add_property(“Name”, &MyClass::getName, &MyClass::setName)”. Any remaining public member functions of the class can be added with “.def(“func”, &MyClass::func)”. Anyway, that about sums it up for a class. If you have free functions that you want to expose to Python, use the def function as shown: “def(“arctan”, arctan);”. It is important to remember the semi-colon at the end of the module declaration and the full stop (the chaining dot) before each class attribute that you specify.

Exposing a C++ class as a Python one is simple with Boost, and it can be useful for anyone wishing to expand the usability of their code so that Python programmers can use it, or simply to allow you to write very quick programmes for testing. You can simply open a Python terminal, import the vectors module and you’re in business.
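
For instance, a minimal interactive sketch (assuming the compiled module, vectors.so, is importable, e.g. by running Python from the build/Vectors directory; the numbers are just illustrative):

# Quick interactive test of the compiled vectors module.
import vectors

v = vectors.ThreeVector(1.0, 2.0, 3.0)   # init<double, double, double>
w = vectors.ThreeVector(v)               # copy constructor, init<ThreeVector>

print(v)             # str binding, prints e.g. (1, 2, 3)
print(v + w)         # operator+ from .def( self + ThreeVector() )
print(v.length())    # plain member function
print(v.Phi)         # getter exposed through add_property
v.X = 4.0            # setter exposed through add_property
print(vectors.scalarProduct(v, w))       # free function bound with def(...)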

Installing Open MPI 1.6.5 (Ubuntu 12.04, 13.04, Fedora)

Open MPI (http://www.open-mpi.org) is one of the most liberating tools out there. In a world where time is of the essence and most computers these days have more than one core, why let those cores sit around idle doing nothing? Put them to good use!

It stands for Message Passing Interface (MPI) and allows you to do a whole manner of parallelised computing applications. For an in-depth idea of what it can do, visit the website, but a few key things that are useful to know are:

  1. Collective functions: the main example is the MPI Reduce function, which allows you to perform simple operations such as a summation using the family of processes to do so (see the sketch after this list).
  2. Point-to-point communication: the most common example I can think of is a head node splitting a dataset into smaller memory chunks based on the number of sub-processes available, where each then computes the same task in parallel. This pattern is usually called a master-to-slave process (also in the sketch below).
  3. Communicators: these connect the MPI processes into groups, called process groups. An example is the world communicator, which carries attributes such as the size (number of processes) and rank (ordered topology) of the group. Communicators also allow for the manipulation of process groups.
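
To make items 1 and 2 concrete, here is a minimal sketch of my own (the tag and token values are arbitrary illustrations); compile and run it with mpic++ and mpirun exactly as shown for the test program further down:

#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, numprocessors;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);           // this process' id
    MPI_Comm_size(MPI_COMM_WORLD, &numprocessors);  // total number of processes

    // 1. Collective: every rank contributes its rank number and
    //    MPI_SUM combines the contributions on rank 0.
    int local = rank, total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if ( rank == 0 )
        std::cout << "sum of ranks = " << total << "\n";

    // 2. Point-to-point: the master (rank 0) sends each slave a token,
    //    which the slave receives with a matching tag.
    const int tag = 42;
    if ( rank == 0 )
    {
        for (int dest = 1; dest < numprocessors; ++dest)
        {
            int token = 100 + dest;
            MPI_Send(&token, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
        }
    } else {
        int token = 0;
        MPI_Recv(&token, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "slave " << rank << " got token " << token << "\n";
    }

    MPI_Finalize();
    return 0;
}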

How to Install
Open MPI is relatively simple to install. I should point out that I have not tried this on a Fedora machine, but as long as you have the same libraries/dependencies it should be the same procedure. There are two methods; the first (and not my preferred method!) is to simply install whatever version the on-line repositories carry. To do this you can

sudo apt-get update
sudo apt-get install -y autotools-dev g++ build-essential openmpi1.5-bin openmpi1.5-doc libopenmpi1.5-dev
# remove any obsolete libraries
sudo apt-get autoremove

That should get you what you need; if you are on Fedora, a simple “sudo yum search openmpi” should bring up something similar: openmpi and openmpi-devel. The second method, and my preferred one since you get the most recent version (currently 1.6.5), is to take it directly from the website. The script below worked for me on Ubuntu 13.04, tested on 25/07/2013.

The script below will install Open MPI in your /usr/local area; this can be modified by changing the parameter installDIR in the script to the desired location. After installation the libraries are placed in $installDIR/lib/openmpi and you can begin playing with Open MPI. One thing to note is that I apply ldconfig to /usr/local/lib, which is a much better method than setting paths explicitly. To do this, your ld.so.conf.d directory needs to make the loader look in the /usr/local/xxx area if it doesn’t already. With Ubuntu you may already have this linked up, so check whether your machine has a file called “/etc/ld.so.conf.d/libc.conf” with a path explicitly showing “/usr/local/lib”; if it does you can ignore this step, else you can add the path using the following:

# note: a plain "sudo echo ... >> file" would fail, because the redirection is not run as root
echo "/usr/local/lib" | sudo tee -a /etc/ld.so.conf.d/local.conf
sudo ldconfig -v

I prefer this method as you do not have to keep adding things to your LD_LIBRARY_PATH, which is not really recommended; see http://www.xahlee.info/UnixResource_dir/_/ldpath.html for a couple of examples of the case against setting this path. I should have mentioned this in previous blogs too!

#!/bin/bash
# Matthew M Reid. install open mpi shell script

# install destination
installDIR="/usr/local"
# First get necessary dependencies
sudo apt-get update
sudo apt-get install -y gfortran autotools-dev g++ build-essential autoconf automake 
# remove any obsolete libraries
sudo apt-get autoremove

# Build using maximum number of physical cores
n=`cat /proc/cpuinfo | grep "cpu cores" | uniq | awk '{print $NF}'`

# grab the necessary files
wget http://www.open-mpi.org/software/ompi/v1.6/downloads/openmpi-1.6.5.tar.gz
tar xzvf openmpi-1.6.5.tar.gz
cd openmpi-1.6.5

echo "Beginning the installation..."
./configure --prefix="$installDIR"
make -j $n
# installing to /usr/local needs root
sudo make install
# with environment set do ldconfig
sudo ldconfig
echo
echo "...done."

Finally, to test the installation, here is a simple example that just prints out some information from each process.

#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int numprocessors, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocessors);     // total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);              // this process' id
    MPI_Get_processor_name(processor_name, &namelen);  // host we are running on

    if ( rank == 0 )
    {
        std::cout << "Processor name: " << processor_name << "\n";
        std::cout << "master (" << rank << "/" << numprocessors << ")\n";
    } else {
        std::cout << "slave  (" << rank << "/" << numprocessors << ")\n";
    }
    MPI_Finalize();
    return 0;
}

When compiling C++ with Open MPI you can use the compiler wrapper, mpic++, which makes compiling much easier. To execute the program you call mpirun, where you can specify the number of processes via the -np flag. The output below is from a run on my local machine; don’t worry about the order in which the lines are spat back at you.

matt@matt-W250ENQ-W270ENQ:$ mpic++ -W -Wall test.cpp -o test.o
matt@matt-W250ENQ-W270ENQ:$ mpirun -np 4 test.o
slave  (3/4)
Processor name: matt-W250ENQ-W270ENQ
master (0/4)
slave  (1/4)
slave  (2/4)

The next thing I would like to come on to is Boost.MPI. This is a very nice interface to the MPI framework that also underpins some well-developed parallel graph libraries. So the next blog will be about installing Boost, which alone has a vast amount to offer, followed by some examples of the Boost.MPI framework in action. The really clever part is the Boost.Serialization architecture, which allows you to send more complex data structures, such as user-defined classes, through the MPI framework, so you can make “almost” anything parallel.
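
As a small taste of what is to come, here is a sketch of sending a user-defined class between two processes (it assumes Boost.MPI and Boost.Serialization are already built and installed, which the next blog will cover; the Measurement type is just made up for illustration):

#include <iostream>
#include <string>
#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>

namespace mpi = boost::mpi;

// A user-defined type made sendable by giving it a serialize() method.
struct Measurement
{
    std::string name;
    double value;

    template <class Archive>
    void serialize(Archive &ar, const unsigned int /*version*/)
    {
        ar & name;
        ar & value;
    }
};

int main(int argc, char *argv[])
{
    mpi::environment env(argc, argv);  // takes the role of MPI_Init/MPI_Finalize
    mpi::communicator world;           // wraps MPI_COMM_WORLD

    if ( world.rank() == 0 )
    {
        Measurement m;
        m.name  = "length";
        m.value = 8.83176086633;
        world.send(1, 0, m);           // ship the whole object to rank 1, tag 0
    }
    else if ( world.rank() == 1 )
    {
        Measurement m;
        world.recv(0, 0, m);           // serialization is handled behind the scenes
        std::cout << m.name << " = " << m.value << "\n";
    }
    return 0;
}

Compile with mpic++, linking against boost_mpi and boost_serialization, and run with mpirun -np 2.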