Sunday, 3 January 2021

Spin your own autodiff in C++

I recent years, I have made a lot of good use of automatic differentiation (AD, autodiff), for instance using Stan or pytorch. However, it always appeared a bit magical to me until I decided to just implement a simple forward-mode "autograd" method in C++ myself. This is a stripped-down version of my "exercise" with plenty of room for enhancement.

Dual numbers

To make forward-mode autodiff work, we replace all floating point numbers with so-called dual numbers \(z = x + y N\) where \(N\) is a nilpotent element, i.e., \(N^2 = 0\). The space of dual number \(\mathbb{D}\) is a real algebra with addition and multiplication defined as follows. Let \(z_1 = x_1 + y_1 N\) and \(z_2 = x_2 + y_2 N\), then \[ z_1 + z_2 = (x_1 + x_2) + (y_1 + y_2) N \] and \[ z_1 \cdot z_2 = x_1 x_2 + (x_1 y_2 + x_2 y_1) N \] Notice now this product encodes the product rule for derivatives. All elements \(z \in \mathbb{D} \) with \(z = x + yN\) and \(x \neq 0 \) have a multiplicative inverse \(z^{-1} = x^{-1} - yx^{-2} N \). In C++ we can define the dual numbers using a class and overload the arithmatic operators as follows. In the header file dual.hpp we declare a class Dual with members x and y. In the source file dual.cpp we define all methods using the rules for dual arithmetic described above.

Function evaluation

The magic of dual numbers comes from the way we can extend differentiable functions on \(\mathbb{R}\) to the dual numbers. For any differentiable function \(f : \mathbb{R} \rightarrow \mathbb{R} \), we can extend \(f\) to a function \(f : \mathbb{D} \rightarrow \mathbb{D} \) by defining \[ f(x + yN) = f(x) + y f'(x)N \] where \(f'\) is the derivative of \(f\). To see why this is a natural choise, we take the Taylor exansion of \(f\) around \(x\) and see that \[ f(x + yN) = f(x) + f'(x) yN + \tfrac12 f''(x) (yN)^2 + \dots \] However, as \(N^2 = 0\) all the terms of order \(y^2\) and higher vanish. This allows us to overload a number of standard mathematical functions such that they work on both double and Dual variables. In the files dualmath.hpp and dualmath.cpp we define some functions like \(\sin(x)\) and \(\cos(x)\) Notice that the functions declared in dualmath.hpp are declared for double values in the <cmath> header, which is included in the file dualmath.cpp.

Templated functions

Of course, in the functions defined in dualmath.cpp we had to "manually" calculate the derivative to evaluate the functions on \(\mathbb{D}\) (with the exception of the pow function). However, we went to the trouble to define the dual numbers in C++ in order to "automatically" calculate derivatives of functions. To see how this works, suppose that define a templated C++ function fun<T>(T x), then we can either use fun<double> or fun<Dual>. The derivative of fun can then be evaluated at a real number \(x\) using a dual number \(z = x + yN\) with non-zero \(y\). In the following example, we "automatically" compute the derivative of \(\tan(x)\). This file gives the following output
$ g++ ad_example1.cpp dual.cpp dualmath.cpp -o ad_example1
$ ./ad_example1
0 1 1
0.01 1.0001 1.0001
0.02 1.0004 1.0004
0.03 1.0009 1.0009
0.04 1.0016 1.0016
0.05 1.0025 1.0025
0.06 1.00361 1.00361
0.07 1.00492 1.00492
0.08 1.00643 1.00643
...
which we can use to make the following graph

To see why this works, we write \(z = x + 1N\) and mimick the calculation done by the computer: \[ \tan(z) = \frac{\sin(z)}{\cos(z)} = \frac{\sin(x) + \cos(x)N}{\cos(x) - \sin(x)N} \] which is defined in dualmath.cpp. Then using the rules for division and multiplication of dual numbers defined in dual.cpp, we get \[ \frac{\sin(x) + \cos(x)N}{\cos(x) - \sin(x)N} = (\sin(x) + \cos(x)N)(1/\cos(x) + \sin(x)/\cos^2(x)N) \] \[ = \frac{\sin(x)}{\cos(x)} + (1 + \sin^2(x)/\cos^2(x))N \] Which simplifies to \[ \tan(x) = \tan(x) + 1/\cos^2(x) N \] Hence, the Dual::y component of fun(z) is equal to \(\tan'(x)\).

Create derivatives and gradients with a functor

Modern C++ makes it easy to write functional programs, and we can use this to define a functor that returns the derivative of \(f\) when given a function \(f\). For this, we make use of the <functional> header. The signature of our functor has the following form
std::function<double(double)> derivative(std::function<Dual(Dual)> f);
meaning that the functor derivative takes a function \(f : \mathbb{D} \rightarrow \mathbb{D}\) as argument, and returns a function \(f' : \mathbb{R} \rightarrow \mathbb{R} \).
Things become much more useful in higher dimensions. In the example below, we also show how to automatically compute the gradient of a function \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) and (more efficiently) directional derivatives \(\nabla F(x) \cdot y \). The 1D derivative, gradient and directional derivatives are declared in the following header file The definitions of the functors makes use of C++ lambda expressions that capture the function given as argument A simple example of the functors defined in autograd.cpp is given here: The result of the second example is the following:
$ g++ ad_example2.cpp dual.cpp dualmath.cpp autograd.cpp -o ad_example2
$ ./ad_example2
norm of x = 1.58114
gradient of norm(x) = [0.316228, 0.948683]
gradient of norm(x) times u = 0.948683

More advanced methods

In addition to gradients, one can compute Jacobains of vector fields. Also, instead of dual numbers one can define so-called "hyper-dual" numbers to compute second-order derivatives and Hessians.