Friday, 18 September 2015

Basic statistics using higher-order functions in C++

I do a lot of individual-based simulations, and often I want to calculate some population statistics 'on the fly'. I found that it can be helpful to use C/C++'s basic functional capabilities.

A higher-order function is a function that takes other functions as arguments (or returns a function as a result). In C/C++, functions can be passed to other functions, but the notation can be a bit awkward, as one needs to pass a reference to a function. If the function is a method of some class, then the notation gets even more involved. You can make your life easier by using a typedef.

The following code snippet shows the header file of a simple example. The goal is to calculate some statistics on a population of "Critters". These Critters have multiple "traits", and the traits are accessed by methods of the class Critter of signature "double Critter::trait() const". Suppose that we want to calculate the mean of the trait "happiness". It's trivial to write a function that does this, but then we might also get interested in the average "wealth". The function that calculates the average wealth is identical to the function that calculates average happiness, except that happiness is replaced by wealth. We can get rid of this redundancy by defining the typedef Trait as a method of Critter, and writing a function that takes the average of an arbitrary Trait.

// main.hpp
#include <list>
#include <iostream>
#include <cmath> // for sqrt and pow
// the population consists of "Critters"
class Critter {
public:
Critter(int ); // constructor takes "age"
// typical critter traits...
double happiness() const;
double wealth() const;
private:
int age;
};
/* Trait will be a const method of Critter
* that takes no argument, and returns a double.
*/
typedef double (Critter::*Trait)() const;
// higher-order functions take a Trait as an argument
double calcMean(const std::list<Critter> & , Trait );
double calcCorrelation(const std::list<Critter> & , Trait , Trait );
int main();
Let us now look at the source file. The most important things to notice are...
(1) whenever you pass a member "trait" (e.g. wealth) of Critter to a function, you should pass it as "&Critter::trait" (i.e. pass the reference).
(2) when you want to evaluate Trait x of Critter alice, you'll need to de-reference x, and call the resulting function: "(alice.*x)()"

// main.cpp
#include "main.hpp"
// Critters constructor
Critter::Critter(int age) : age(age) { /* empty */ }
// methods for Critter (Traits)
double Critter::happiness() const {
return 10.0/0.9*pow(0.9, age/10.0); // some arbitrary computation
}
double Critter::wealth() const {
return (age-50)*(age-50)*0.01 + age*0.1; // some arbitrary computation
}
// higher-order functions
double calcMean(const std::list<Critter> & population, Trait x) {
double Ex = 0.0;
int n = population.size();
std::list<Critter>::const_iterator it = population.begin();
for ( ; it != population.end(); ++it ) {
Ex += ((*it).*x)(); // x needs to be derefecenced before it can be called
}
Ex /= n; // assumes a non-empty population
return Ex;
}
double calcCorrelation(const std::list<Critter> & population, Trait x, Trait y) {
double Ex = 0.0; double Ey = 0.0;
double Exx = 0.0; double Eyy = 0.0;
double Exy = 0.0;
int n = population.size();
std::list<Critter>::const_iterator it = population.begin();
for ( ; it != population.end(); ++it ) {
Ex += ((*it).*x)(); Exx += ((*it).*x)() * ((*it).*x)();
Ey += ((*it).*y)(); Eyy += ((*it).*y)() * ((*it).*y)();
Exy += ((*it).*x)() * ((*it).*y)();
}
Ex /= n; Ey /= n; Exx /= n; Eyy /= n; Exy /= n;
double sdx = sqrt(Exx - Ex*Ex); double sdy = sqrt(Eyy - Ey*Ey);
double covxy = Exy - Ex*Ey;
double corrxy = covxy/(sdx*sdy);
return corrxy;
}
// main
int main() {
// make a population of Critters
int N = 100;
std::list<Critter> critters;
for ( int n = 0; n < N; ++n ) {
critters.push_back(Critter(n));
}
// calculate statistics
double Eh = calcMean(critters, &Critter::happiness);
double Ew = calcMean(critters, &Critter::wealth);
double corrhw = calcCorrelation(critters, &Critter::happiness, &Critter::wealth);
// and show them to the world
std::cout << "mean wealth = " << Ew << std::endl;
std::cout << "mean happiness = " << Eh << std::endl;
std::cout << "correlation happiness and wealth = " << corrhw << std::endl;
return 0;
}
If you want to play with this example, put the header in a file called "main.hpp", and the source in a file called "main.cpp", and compile main.cpp by typing "g++ main.cpp -o critters" in your terminal (I assume that you are using Linux and have the gcc compiler installed).