Dual numbers
To make forward-mode autodiff work, we replace all floating point numbers with so-called dual numbers \(z = x + y N\) where \(N\) is a nilpotent element, i.e., \(N^2 = 0\). The space of dual number \(\mathbb{D}\) is a real algebra with addition and multiplication defined as follows. Let \(z_1 = x_1 + y_1 N\) and \(z_2 = x_2 + y_2 N\), then \[ z_1 + z_2 = (x_1 + x_2) + (y_1 + y_2) N \] and \[ z_1 \cdot z_2 = x_1 x_2 + (x_1 y_2 + x_2 y_1) N \] Notice now this product encodes the product rule for derivatives. All elements \(z \in \mathbb{D} \) with \(z = x + yN\) and \(x \neq 0 \) have a multiplicative inverse \(z^{-1} = x^{-1} - yx^{-2} N \). In C++ we can define the dual numbers using aclass
and overload the arithmatic operators as follows.
In the header file dual.hpp
we declare a class Dual
with members x
and y
.
In the source file dual.cpp
we define all methods using the rules for dual arithmetic described above.
Function evaluation
The magic of dual numbers comes from the way we can extend differentiable functions on \(\mathbb{R}\) to the dual numbers. For any differentiable function \(f : \mathbb{R} \rightarrow \mathbb{R} \), we can extend \(f\) to a function \(f : \mathbb{D} \rightarrow \mathbb{D} \) by defining \[ f(x + yN) = f(x) + y f'(x)N \] where \(f'\) is the derivative of \(f\). To see why this is a natural choise, we take the Taylor exansion of \(f\) around \(x\) and see that \[ f(x + yN) = f(x) + f'(x) yN + \tfrac12 f''(x) (yN)^2 + \dots \] However, as \(N^2 = 0\) all the terms of order \(y^2\) and higher vanish. This allows us to overload a number of standard mathematical functions such that they work on bothdouble
and Dual
variables.
In the files dualmath.hpp
and dualmath.cpp
we define some functions like \(\sin(x)\) and \(\cos(x)\)
Notice that the functions declared in dualmath.hpp
are declared for double
values in the <cmath>
header,
which is included in the file dualmath.cpp
.
Templated functions
Of course, in the functions defined indualmath.cpp
we had to "manually" calculate the derivative to evaluate the functions on \(\mathbb{D}\) (with the exception of the pow
function).
However, we went to the trouble to define the dual numbers in C++ in order to "automatically" calculate derivatives of functions.
To see how this works, suppose that define a templated C++ function fun<T>(T x)
, then we can either use fun<double>
or
fun<Dual>
. The derivative of fun
can then be evaluated at a real number \(x\) using a dual number \(z = x + yN\) with non-zero \(y\).
In the following example, we "automatically" compute the derivative of \(\tan(x)\).
This file gives the following output
$ g++ ad_example1.cpp dual.cpp dualmath.cpp -o ad_example1 $ ./ad_example1 0 1 1 0.01 1.0001 1.0001 0.02 1.0004 1.0004 0.03 1.0009 1.0009 0.04 1.0016 1.0016 0.05 1.0025 1.0025 0.06 1.00361 1.00361 0.07 1.00492 1.00492 0.08 1.00643 1.00643 ...which we can use to make the following graph
To see why this works, we write \(z = x + 1N\) and mimick the calculation done by the computer: \[ \tan(z) = \frac{\sin(z)}{\cos(z)} = \frac{\sin(x) + \cos(x)N}{\cos(x) - \sin(x)N} \] which is defined in
dualmath.cpp
. Then using the rules for division and multiplication of dual numbers defined in dual.cpp
, we get
\[ \frac{\sin(x) + \cos(x)N}{\cos(x) - \sin(x)N} = (\sin(x) + \cos(x)N)(1/\cos(x) + \sin(x)/\cos^2(x)N) \]
\[ = \frac{\sin(x)}{\cos(x)} + (1 + \sin^2(x)/\cos^2(x))N \]
Which simplifies to
\[ \tan(x) = \tan(x) + 1/\cos^2(x) N \]
Hence, the Dual::y
component of fun(z)
is equal to \(\tan'(x)\).
Create derivatives and gradients with a functor
Modern C++ makes it easy to write functional programs, and we can use this to define a functor that returns the derivative of \(f\) when given a function \(f\). For this, we make use of the<functional>
header. The signature of our functor has the following form
std::function<double(double)> derivative(std::function<Dual(Dual)> f);meaning that the functor
derivative
takes a function \(f : \mathbb{D} \rightarrow \mathbb{D}\) as argument, and
returns a function \(f' : \mathbb{R} \rightarrow \mathbb{R} \).
Things become much more useful in higher dimensions. In the example below, we also show how to automatically compute the gradient of a function \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) and (more efficiently) directional derivatives \(\nabla F(x) \cdot y \). The 1D derivative, gradient and directional derivatives are declared in the following header file The definitions of the functors makes use of C++ lambda expressions that capture the function given as argument A simple example of the functors defined in
autograd.cpp
is given here:
The result of the second example is the following:
$ g++ ad_example2.cpp dual.cpp dualmath.cpp autograd.cpp -o ad_example2 $ ./ad_example2 norm of x = 1.58114 gradient of norm(x) = [0.316228, 0.948683] gradient of norm(x) times u = 0.948683