Dual numbers
To make forward-mode autodiff work, we replace all floating point numbers with so-called dual numbers \(z = x + y N\) where \(N\) is a nilpotent element, i.e., \(N^2 = 0\). The space of dual number \(\mathbb{D}\) is a real algebra with addition and multiplication defined as follows. Let \(z_1 = x_1 + y_1 N\) and \(z_2 = x_2 + y_2 N\), then \[ z_1 + z_2 = (x_1 + x_2) + (y_1 + y_2) N \] and \[ z_1 \cdot z_2 = x_1 x_2 + (x_1 y_2 + x_2 y_1) N \] Notice now this product encodes the product rule for derivatives. All elements \(z \in \mathbb{D} \) with \(z = x + yN\) and \(x \neq 0 \) have a multiplicative inverse \(z^{-1} = x^{-1} - yx^{-2} N \). In C++ we can define the dual numbers using aclass and overload the arithmatic operators as follows.
In the header file dual.hpp we declare a class Dual with members x and y.
In the source file dual.cpp we define all methods using the rules for dual arithmetic described above.
Function evaluation
The magic of dual numbers comes from the way we can extend differentiable functions on \(\mathbb{R}\) to the dual numbers. For any differentiable function \(f : \mathbb{R} \rightarrow \mathbb{R} \), we can extend \(f\) to a function \(f : \mathbb{D} \rightarrow \mathbb{D} \) by defining \[ f(x + yN) = f(x) + y f'(x)N \] where \(f'\) is the derivative of \(f\). To see why this is a natural choise, we take the Taylor exansion of \(f\) around \(x\) and see that \[ f(x + yN) = f(x) + f'(x) yN + \tfrac12 f''(x) (yN)^2 + \dots \] However, as \(N^2 = 0\) all the terms of order \(y^2\) and higher vanish. This allows us to overload a number of standard mathematical functions such that they work on bothdouble and Dual variables.
In the files dualmath.hpp and dualmath.cpp we define some functions like \(\sin(x)\) and \(\cos(x)\)
Notice that the functions declared in dualmath.hpp are declared for double values in the <cmath> header,
which is included in the file dualmath.cpp.
Templated functions
Of course, in the functions defined indualmath.cpp we had to "manually" calculate the derivative to evaluate the functions on \(\mathbb{D}\) (with the exception of the pow function).
However, we went to the trouble to define the dual numbers in C++ in order to "automatically" calculate derivatives of functions.
To see how this works, suppose that define a templated C++ function fun<T>(T x), then we can either use fun<double> or
fun<Dual>. The derivative of fun can then be evaluated at a real number \(x\) using a dual number \(z = x + yN\) with non-zero \(y\).
In the following example, we "automatically" compute the derivative of \(\tan(x)\).
This file gives the following output
$ g++ ad_example1.cpp dual.cpp dualmath.cpp -o ad_example1 $ ./ad_example1 0 1 1 0.01 1.0001 1.0001 0.02 1.0004 1.0004 0.03 1.0009 1.0009 0.04 1.0016 1.0016 0.05 1.0025 1.0025 0.06 1.00361 1.00361 0.07 1.00492 1.00492 0.08 1.00643 1.00643 ...which we can use to make the following graph
To see why this works, we write \(z = x + 1N\) and mimick the calculation done by the computer: \[ \tan(z) = \frac{\sin(z)}{\cos(z)} = \frac{\sin(x) + \cos(x)N}{\cos(x) - \sin(x)N} \] which is defined in
dualmath.cpp. Then using the rules for division and multiplication of dual numbers defined in dual.cpp, we get
\[ \frac{\sin(x) + \cos(x)N}{\cos(x) - \sin(x)N} = (\sin(x) + \cos(x)N)(1/\cos(x) + \sin(x)/\cos^2(x)N) \]
\[ = \frac{\sin(x)}{\cos(x)} + (1 + \sin^2(x)/\cos^2(x))N \]
Which simplifies to
\[ \tan(x) = \tan(x) + 1/\cos^2(x) N \]
Hence, the Dual::y component of fun(z) is equal to \(\tan'(x)\).
Create derivatives and gradients with a functor
Modern C++ makes it easy to write functional programs, and we can use this to define a functor that returns the derivative of \(f\) when given a function \(f\). For this, we make use of the<functional> header. The signature of our functor has the following form
std::function<double(double)> derivative(std::function<Dual(Dual)> f);meaning that the functor
derivative takes a function \(f : \mathbb{D} \rightarrow \mathbb{D}\) as argument, and
returns a function \(f' : \mathbb{R} \rightarrow \mathbb{R} \).
Things become much more useful in higher dimensions. In the example below, we also show how to automatically compute the gradient of a function \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) and (more efficiently) directional derivatives \(\nabla F(x) \cdot y \). The 1D derivative, gradient and directional derivatives are declared in the following header file The definitions of the functors makes use of C++ lambda expressions that capture the function given as argument A simple example of the functors defined in
autograd.cpp is given here:
The result of the second example is the following:
$ g++ ad_example2.cpp dual.cpp dualmath.cpp autograd.cpp -o ad_example2 $ ./ad_example2 norm of x = 1.58114 gradient of norm(x) = [0.316228, 0.948683] gradient of norm(x) times u = 0.948683

No comments:
Post a Comment