Contributing code to Stan — Christiaan van Dorp, 2023-12-09 (updated 2024-01-15)
<br /><br />
This post is about me trying to contribute some code to the <a href="https://mc-stan.org/">Stan</a> codebase.
My colleagues and I often use
the <a href="https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution">Dirichlet-Multinomial distribution</a>
(e.g. in this <a href="https://doi.org/10.1101/2023.10.16.562650">preprint</a>),
which is not a native distribution in the Stan language (v2.33, December 9, 2023).
It is not hard to define this as a user-defined distribution, but it still would be convenient to have this as a native function;
it helps with avoiding bugs, repetitive code or having to <code>#include</code> files.
This is going to be my first pull request, so mistakes <i>will</i> be made.
<br />
I based all this on tutorials from the <a href="http://mc-stan.org/math/index.html">Stan Math website</a> (specifically the sections on "adding new functions" and "adding new distributions"), and the <a href="https://mc-stan.org/stanc3/stanc/exposing_new_functions.html">stanc webpage</a>. The first link explains how to add functions to the Stan Math library, and the second explains how to make such functions available in the Stan language.
<br /><br />
<i>
Some updates about the submission process: <br /><br />
<b>2023-12-09:</b> After trying three times, I managed to pass all automated tests for the Stan Math library (see Steps 2 and 7 below). Now I'm waiting for code reviews from a Stan developer.
<br /><br />
<b>2023-12-23:</b> Last week a Stan developer reviewed my code, made improvements and requested some changes. I discuss the improvements below (Step 7), as they are really instructive. My PR was then approved and merged with the "develop" branch. I then submitted PRs for the documentation and the stan-to-c++ compiler. This went through without any issues. This means that the Dirichlet-multinomial distribution should be available in Stan version 2.34.
<br /><br />
<b>2024-01-15:</b> I posted this blog post on the <a href="https://discourse.mc-stan.org/t/contributing-code-to-stan/33746">Stan forum</a> and received some useful feedback. I've added some notes below.
</i>
<br /><br />
All code and documentation published in this blog post is licensed under the following licenses:
<ul>
<li>Code: BSD 3-clause (<a href="https://opensource.org/licenses/BSD-3-Clause">https://opensource.org/licenses/BSD-3-Clause</a>)</li>
<li>Documentation: CC-BY 4.0 (<a href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</a>)</li>
</ul>
<br /><br />
<h2>Step 1: Coding the distribution as a user-defined function in Stan</h2>
A good place to start is to write a working Stan model that we can use for testing and benchmarking. The Dirichlet-Multinomial distribution is a Multinomial distribution in which the probability vector (simplex) is sampled from a Dirichlet distribution. We can actually use this fact to generate random samples in Stan. Let's start there, because it is easy.
<br /><br />
<script src="https://gist.github.com/chvandorp/8c03f3abb4bcedc4597e4d97c7fe0622.js"></script>
To define a custom RNG in Stan, you write a function in the <code>functions</code> block with name <code>distribution_name_rng</code>, where <code>distribution_name</code> is the name of your distribution. In our case, that's <code>dirichlet_multinomial</code>. The RNG function typically uses native Stan RNGs to generate random samples, which are then transformed, or rejected until they satisfy some condition, or whatever you need. In this case, we first sample a probability vector from the Dirichlet distribution using <code>dirichlet_rng(alpha)</code>. Then, we use this probability vector as input for the Multinomial distribution to get an integer array with <code>multinomial_rng(dirichlet_rng(alpha), N)</code>. Here <code>N</code> is the "population size" parameter of the multinomial distribution (also referred to as the "number of trials").
Using some <a href="https://gist.github.com/chvandorp/d02082731fc24cec78ea3107ecc14c03">python code</a>,
we can use the Stan model to sample random Dirichlet-Multinomial deviates \(X\) and plot \(X / N\) on a \(K\)-simplex. Here we generate \(40\,000\) samples using \(N = 1000\) and \(K = 3\).
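The same two-stage scheme (a Dirichlet draw, then a multinomial draw) is easy to mimic outside Stan. Here is a minimal stdlib-only Python sketch (the function name and the Gamma-variate construction of the Dirichlet draw are my own illustration, not code from Stan or the gist):

```python
import random

def dirichlet_multinomial_rng(alpha, N, rng=random.Random(42)):
    # Draw p ~ Dirichlet(alpha): normalized independent Gamma(alpha_k, 1) variates.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    p = [g / total for g in gammas]
    # Draw x ~ Multinomial(p, N): N categorical draws from the simplex p.
    counts = [0] * len(alpha)
    for _ in range(N):
        u, k = rng.random(), 0
        while u > p[k] and k < len(p) - 1:
            u -= p[k]
            k += 1
        counts[k] += 1
    return counts

x = dirichlet_multinomial_rng([10.1, 3.5, 7.2], 1000)
print(x, sum(x))  # three non-negative counts that always sum to N = 1000
```

Normalizing independent \(\mathrm{Gamma}(\alpha_k, 1)\) variates is a standard way to construct a Dirichlet sample, and a multinomial draw is just \(N\) categorical draws from the resulting simplex.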
<br /><br />
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4XFHiklC70Mhyphenhyphen_zzyWQVsmUfHcweD3F1rigV9KM-tGEg_ckhk_zTTE0IvSobBdMYDltZ9H1t6IbyGwMaiNbTb6WadcTSh7gFEnLOWe9YuOsHXkDe8YYKSwSM53rr1mEuK2I7dVq7nP02pjIznTLMISfnmvVSgVc_wLsisGSjrvwNzRrFQ-zDOPIrEEw_L/s2070/dirmult_simplices.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="2060" data-original-width="2070" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4XFHiklC70Mhyphenhyphen_zzyWQVsmUfHcweD3F1rigV9KM-tGEg_ckhk_zTTE0IvSobBdMYDltZ9H1t6IbyGwMaiNbTb6WadcTSh7gFEnLOWe9YuOsHXkDe8YYKSwSM53rr1mEuK2I7dVq7nP02pjIznTLMISfnmvVSgVc_wLsisGSjrvwNzRrFQ-zDOPIrEEw_L/s400/dirmult_simplices.png" width="400" /></a></div>
<br /><br />
Next, we define a probability mass function (PMF) which, like every density or mass function in Stan, has to be defined on the log scale. The PMF of the Dirichlet-Multinomial distribution is
\[ p(x | \alpha) = \frac{N B(\alpha_0, N)}{\prod_{k : x_k > 0} x_k B(\alpha_k, x_k)}\]
where \(\alpha_0 = \sum_{k=1}^K \alpha_k \), \( N = \sum_{k=1}^K x_k \) and \(K\) is the number of categories.
The function \(B\) is the Beta function.
We can code this in Stan almost verbatim using the <code>lbeta</code> function, which computes \(\log \circ B\), and replacing products with sums.
<br /><br />
<script src="https://gist.github.com/chvandorp/56c03a426e99e5a840bcc3cdb498d4dd.js"></script>
This implementation contains a small bug, as it does not work for \(x_1 = x_2 = \cdots = x_K = 0\),
but we'll fix that later.
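Before porting anything to C++, it is useful to have an independent reference implementation to test against. Here is a stdlib-only Python version of the same formula (all names are mine; <code>lbeta</code> is built from <code>math.lgamma</code>), including a guard for the all-zero corner case just mentioned:

```python
from math import lgamma, log

def lbeta(a, b):
    # log of the Beta function: log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def dirichlet_multinomial_lpmf(x, alpha):
    # log p(x | alpha) = log N + log B(alpha0, N)
    #                    - sum_{k : x_k > 0} (log x_k + log B(alpha_k, x_k))
    N = sum(x)
    if N == 0:
        return 0.0  # p((0,...,0)') = 1; guards against log(0) below
    alpha0 = sum(alpha)
    lp = log(N) + lbeta(alpha0, N)
    for xk, ak in zip(x, alpha):
        if xk > 0:
            lp -= log(xk) + lbeta(ak, xk)
    return lp

print(dirichlet_multinomial_lpmf([1, 2, 3], [2.0, 3.0, 5.0]))  # ≈ -2.477938
```

A handy sanity check: for \(K = 2\) and \(\alpha = (1, 1)'\) the distribution reduces to a Beta-Binomial with a uniform prior, so every outcome has probability \(1/(N+1)\).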
<br />
To do a simple test, I colored randomly sampled points (using \(\alpha = (2.4, 4.0, 6.0)'\) and \(N = 1000\)) scaled to fit in a 3-simplex in three ways. First I used the
<code>scipy.stats</code> implementation of the Dirichlet-Multinomial PMF, then I used the <code>dirichlet_multinomial_lpmf</code>
function defined in the <code>functions</code> block (and exponentiated the values).
Finally, I used a Gaussian KDE (from <code>scipy.stats</code>) to compute an
empirical density of points in the sample (this is how I colored points in the simplex plot above). The three methods of coloring
give very similar results.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGLlcsmQH7eHDMZ2QhYf1d6_HJ_L8Mfl3qNOqmKOCE22FwgEAg_2mHj6__VG7pz6_K62P2V0cXljFuogFG3TRdg2niRr8EDhZ3UZYfcCFLboZM8sXawzlMwt6FhkxJdehsEVQN3ubGul9zrZQMMT5sHWz762-cbLBgOyO6eVKXahrSOxjeZ1UdPoFyQqE0/s2152/compare_pmfs.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="871" data-original-width="2152" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGLlcsmQH7eHDMZ2QhYf1d6_HJ_L8Mfl3qNOqmKOCE22FwgEAg_2mHj6__VG7pz6_K62P2V0cXljFuogFG3TRdg2niRr8EDhZ3UZYfcCFLboZM8sXawzlMwt6FhkxJdehsEVQN3ubGul9zrZQMMT5sHWz762-cbLBgOyO6eVKXahrSOxjeZ1UdPoFyQqE0/s600/compare_pmfs.png" width="400" /></a></div>
<h2>Step 2: Adding the distribution to the Stan math library</h2>
Stan is built on top of the Stan math library. All native functions are defined here. We first have to clone the Stan math library from github with
<code>git clone git@github.com:stan-dev/math.git</code>. Documentation for the Stan math library can be found <a href="https://mc-stan.org/math/">here</a>.
It shows you how to write programs that use the math library, giving you access to automatic differentiation, Stan functions, etc.
The distributions can be found in the folder <code>math/stan/math/prim/prob</code>. Here we want to add the files <code>dirichlet_multinomial_rng.hpp</code>
for the random number generator, <code>dirichlet_multinomial_lpmf.hpp</code> for the probability mass function and <code>dirichlet_multinomial_log.hpp</code>
which is only there for legacy reasons. In the same folder, we also find the Dirichlet distribution, the Multinomial distribution and the Beta-Binomial distribution.
If we somehow combine these three, we should get our Dirichlet-Multinomial distribution, so these examples are very useful.
Let's start with the RNG, which is easiest.
<br /><br />
<script src="https://gist.github.com/chvandorp/4fa927663638fde86059869d756fc22f.js"></script>
We'll talk a bit about the docstring later, so let's ignore that for now.
The actual logic of the code is really simple: we make use of the Dirichlet and Multinomial RNGs.
Therefore, we have to <code>#include</code> their header files.
The difficulty is understanding some C++ tricks, error checking, and working with the Eigen library.
The first observation is that Stan math functions are defined in the namespace <code>stan::math</code>.
Next, we see that the parameter <code>alpha</code> has a somewhat complicated type
<pre><code>
const Eigen::Matrix<double, Eigen::Dynamic, 1>& alpha
</code>
</pre>
This means that <code>alpha</code> is a \(K \times 1\) matrix (a vector) with <code>double</code> entries.
\(K\) is not determined at compile time. The variable <code>function</code> is used for generating error messages and
contains the name of the C++ function. The <code>alpha_ref</code> variable is the result of the <code>to_ref</code>
function call, which "evaluates" Eigen expressions. Eigen expressions can be "lazy" in the sense
that they are not evaluated immediately. In this case, <code>alpha</code> could be the result of some
complicated vector computation, and here we need the actual values of <code>alpha</code>. If <code>alpha</code> is already
evaluated, then <code>to_ref</code> does nothing.
<br />
Next, we validate the input. The variable <code>alpha</code> has to be positive, and not infinite.
There is also a <code>check_positive</code> test, which does allow positive infinity. I guess \(+\infty\) could make
sense if only one of the \(\alpha_k\) is infinite, but if multiple \(\alpha_k\) are infinite, the distribution is no
longer well defined. So it's better to avoid infinities.
The integer <code>N</code> has to be non-negative, and all integers are finite.
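The semantics of <code>check_positive_finite</code> are easy to mirror in a quick sketch (a hypothetical Python stand-in of my own, not Stan's implementation):

```python
import math

def check_positive_finite(function, name, values):
    # Mirrors the intent of stan::math::check_positive_finite:
    # every value must be strictly positive AND finite.
    for v in values:
        if not (v > 0 and math.isfinite(v)):
            raise ValueError(f"{function}: {name} is {v}, but must be positive finite!")

check_positive_finite("dirichlet_multinomial_rng", "prior size variable", [10.1, 3.5, 7.2])
try:
    check_positive_finite("dirichlet_multinomial_rng", "prior size variable", [10.1, math.inf])
except ValueError as e:
    print(e)  # inf would pass a plain positivity check, but not this one
```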
<br />
As mentioned, the logic is identical to the user-defined Stan function. There is one exception: the case where
\(N = 0\). If we look at the RNG for the Multinomial distribution (v4.7.0, September 5, 2023), we find that \(N\) has to be positive:
<pre><code>
check_positive(function, "number of trials variables", N);
</code>
</pre>
and using this RNG with \(N=0\) indeed results in an exception:
<pre><code>
Exception: multinomial_rng: number of trials variables is 0,
but must be positive!
</code>
</pre>
On the other hand, the <code>multinomial_lpmf</code> function works fine when \(N = \sum_{k=1}^K n_k = 0\), and returns \(0 = \log(1)\).
So I don't think the <code>check_positive</code> check is correct. I might file an issue on GitHub in the future.
For now, we have to make sure that the case \(N = 0\) is handled separately by our RNG: we just return the zero vector.
<br /><br />
Next, we are going to write the code for the probability mass function. The distribution is discrete, so we have to define a function that ends with <code>_lpmf</code>
instead of <code>_lpdf</code>.
If you read the example for "adding new distributions" on the <a href="http://mc-stan.org/math/index.html">Stan Math page</a>,
you'll find that a lot of effort is put in computing the gradients of the function, and also making sure the function is
properly vectorized. The multinomial example that I based my implementation on did not do any of these things. It relies on
automatic differentiation and is currently not vectorized. In fact, none of the multivariate discrete distributions are currently
vectorized. Therefore, I did not try to implement vectorization for the Dirichlet-Multinomial either.
In the non-vectorized case, it also does not make much sense to compute derivatives manually: the log-PMF is just a sum of log-Beta functions.
I'll leave vectorization and gradients for a future PR.
<br /><br />
<i>
Alas, as it turns out, the Stan team did not let me get away with this so easily. See section "Revisions" in "Step 7" below.
Thankfully, I got a lot of help. Also, the version of the code with analytic derivatives is somewhat faster than my lazy version shown here.
</i>
<br /><br />
<script src="https://gist.github.com/chvandorp/478d8c9c52eb3593ad8f6dd49bb8ad95.js"></script>
Again, the logic of the actual computations is pretty simple and very similar to our user-defined Stan function.
We also perform similar domain checks as in the RNG function. In addition, we check that <code>ns</code> and <code>alpha</code>
have compatible sizes. Then we define <code>lp</code>, the log-probability. The type of <code>lp</code> has to be the same as the
base type of <code>alpha</code>, and this can be achieved with <code>return_type_t</code>. If you had multiple
parameters where some have base type <code>double</code> and others Stan's autodiff type <code>var</code>, then passing all parameter types to
<code>return_type_t</code> would result in the appropriate return type <code>var</code>.
The template parameter <code>T_prior_size</code> is the type of the "prior size" (or intensity) parameter <code>alpha</code>.
There are some restrictions on what <code>alpha</code> is allowed to be, and the template parameter
<code>require_eigen_col_vector_t<T_prior_size>* = nullptr</code> ensures that <code>alpha</code> has the correct Eigen type.
I would like to understand this better, but that will have to wait.
<br />
Next let's look at the <code>propto</code> template parameter. This is used for dropping constant terms in the computation
of the log-PMF. In a Stan sampling statement, we don't have to compute any constant terms of the log-probability or
log-density, as constant terms do not matter in MCMC. As \(n\) is an integer vector and therefore always constant "data",
we don't have to compute <code>log(sum_ns)</code> or <code>log(ns[i])</code>. Not computing these logs can save time and energy.
In some cases, we do want to know the absolute log-PMF, for instance in the <code>generated quantities</code> block when we
want to use log-probabilities for model comparison (WAIC, LOO-IC, etc.).
It is also possible that <code>alpha</code> is data (meaning that the base type is <code>double</code>). In that case,
there is nothing to compute if we're only interested in the log-PMF up to constants. Hence, we return <code>0.0</code>.
To test whether "summands" depending on the "prior size" have to be included in the log-PMF when <code>propto == true</code>,
we can use <code>include_summand<propto, T_prior_size>::value</code>.
Next, we compute the terms of the log PMF that include <code>alpha</code>, and in the same loop, we compute the sum of the <code>ns</code>.
If <code>sum_ns</code> is zero (\(N = 0\)), then <code>lp</code> is still zero and we can simply return this value.
<br />
Then we compute \(\alpha_0 = \sum_{k=1}^K \alpha_k\) using Eigen's <code>sum</code> method. Note that the type
is <code>value_type_t<T_prior_size></code>, which is the base type of <code>alpha</code>.
The difference between <code>return_type_t</code> and <code>value_type_t</code> is that the former
takes multiple arguments and returns the most "general" base type that fits all of them. The arguments
of <code>return_type_t</code> don't have to be containers.
<br />
If <code>propto</code> is false, we still have to add the terms that only depend on <code>ns</code>.
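To see what <code>propto</code> buys us, here is a plain Python mock-up (the <code>propto</code> keyword argument and all names are my own stand-ins for the C++ template machinery). Dropping the terms that depend only on the constant counts shifts every log-PMF value by the same amount, which MCMC never notices:

```python
from math import lgamma, log

def lbeta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def dm_lpmf(ns, alpha, propto=False):
    # Dirichlet-multinomial log-PMF with a mock-up of Stan's propto mechanics:
    # when propto=True, terms that depend only on the integer data ns are dropped.
    N = sum(ns)
    if N == 0:
        return 0.0
    # terms involving alpha: kept whenever alpha is a parameter
    lp = lbeta(sum(alpha), N) - sum(lbeta(a, n) for a, n in zip(alpha, ns) if n > 0)
    if not propto:
        # constant terms that depend only on the counts ns
        lp += log(N) - sum(log(n) for n in ns if n > 0)
    return lp

ns = [1, 2, 3]
d1 = dm_lpmf(ns, [2.0, 3.0, 5.0]) - dm_lpmf(ns, [2.0, 3.0, 5.0], propto=True)
d2 = dm_lpmf(ns, [1.0, 1.0, 8.0]) - dm_lpmf(ns, [1.0, 1.0, 8.0], propto=True)
print(abs(d1 - d2) < 1e-12)  # True: the dropped terms are the same constant
```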
<br /><br />
To give the user access to the functions we just defined, we have to add <code>#include</code> directives in the file <code>prob.hpp</code> in folder <code>stan/math/prim</code>.
This file contains a list of include directives for all distributions in alphabetical order, so we just add our new files in the right place.
<br /><br />
<script src="https://gist.github.com/chvandorp/f6300626f0cd134f433af85334f1d3dc.js"></script>
Let's now write a C++ program that imports the Stan library to test our new function.
Instructions on how to use the Stan math library in your project can be found <a href="https://mc-stan.org/math/index.html">here</a>.
In this example, I will simply sample a random vector from the Dirichlet-Multinomial distribution
with fixed intensity. I then compute the log-probability of that sample.
I put some effort into generating "real" random numbers using <code>std::random_device</code>,
and I wrote a function to print <code>std::vector</code>. It helps to read a bit about the <a href="https://eigen.tuxfamily.org/index.php?title=Main_Page">Eigen library</a>,
which has some methods for formatting matrices and vectors.
<br /><br />
<script src="https://gist.github.com/chvandorp/679e61711699a5e55dca8e95bbf543f1.js"></script>
This program prints something like:
<pre><code>
random sample from dirichlet_multinomial(100, (10.1, 3.5, 7.2)) = (30, 8, 62)
log dirichlet_multinomial((30, 8, 62) | (10.1, 3.5, 7.2)) = -8.60311
</code>
</pre>
To compile a program that makes use of the Stan math library, Stan provides a useful tool called "standalone".
Compilation of the above code is done as follows
<pre><code>
make -f path_to_stan_math/make/standalone hello_dirmult
</code>
</pre>
Building the Stan math library can be done in a similar way
<pre><code>
make -j4 -f path_to_stan_math/make/standalone math-libs
</code>
</pre>
The <code>-j4</code> flag determines the number of CPU cores (or threads) used for the task. I have an old Core i5, so I'll use 4.
<br /><br />
If we look at the makefile in the Stan math directory, we see that there is a target called <code>doxygen</code>.
This is for building the documentation with <a href="https://www.doxygen.nl/index.html">doxygen</a> using docstrings in the code.
In our case, we have used some LaTeX equations (using delimiters <code>\f$ ... \f$</code> and <code>\f[ ... \f]</code>),
that are rendered by doxygen. Let's check that this is rendered correctly, and that we didn't make any TeX errors.
We can build the documentation with <code>make doxygen</code>. We then find the HTML pages in <code>doc/api/html</code>
and view them in a browser. Here's a screenshot
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgu9r6WMJ7uCN9Kn0wBS1jwHrMdHaYU2aYHY14yiskcy50kfJWJTa43gHiS6fzxkvgPmoLMxPmvvTwtQSuZL5T-eb3CSOTRxsJ0Kmek7FiRyBGWbMAilv2ao-Raztx2xlB95WQFq09MiXoFEgTEhSD9PjSKpGy2WxkRJTHuk9E3xMznt-5YaTZWEa0NvJCz/s846/screenshot_dirichlet_multinomial_lpmf.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="846" data-original-width="589" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgu9r6WMJ7uCN9Kn0wBS1jwHrMdHaYU2aYHY14yiskcy50kfJWJTa43gHiS6fzxkvgPmoLMxPmvvTwtQSuZL5T-eb3CSOTRxsJ0Kmek7FiRyBGWbMAilv2ao-Raztx2xlB95WQFq09MiXoFEgTEhSD9PjSKpGy2WxkRJTHuk9E3xMznt-5YaTZWEa0NvJCz/w446-h640/screenshot_dirichlet_multinomial_lpmf.png" width="500" /></a></div>
And here's one for the RNG documentation
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_QP0DWpoAavQMxXIhFPxfLCIFNRpscmpmq84iz0TyPuuGcfUgaiZkRWSCSREhJ0bnFEvm4N2v7GVAKfzYMSikcX0BkYEEdbVsZlSNUiprdd2Vk8cTYodGNHrjf7jyrHUZv1L0Q0PpMtwle7NBGR-C8HGwpk0uKS9qNY2JKSaZ3YCBQFStRGCe6xrc67No/s795/screenshot_dirichlet_multinomial_rng.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="795" data-original-width="650" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_QP0DWpoAavQMxXIhFPxfLCIFNRpscmpmq84iz0TyPuuGcfUgaiZkRWSCSREhJ0bnFEvm4N2v7GVAKfzYMSikcX0BkYEEdbVsZlSNUiprdd2Vk8cTYodGNHrjf7jyrHUZv1L0Q0PpMtwle7NBGR-C8HGwpk0uKS9qNY2JKSaZ3YCBQFStRGCe6xrc67No/w523-h640/screenshot_dirichlet_multinomial_rng.png" width="500" /></a></div>
Building the docs revealed several mistakes (and I left at least one in the screenshots above), so doing this was definitely helpful.
<br /><br />
<h2>Step 3: Writing unit tests</h2>
This step may sound a bit boring, but I found out that it is quite important. First of all, the Stan team will not accept your code without any unit tests. Second, testing your code helps: I managed to catch two errors in my (very small) piece of code, all because of writing tests. Stan makes use of the <a href="https://google.github.io/googletest/">GoogleTest</a> framework.
There are lots of examples of tests for distributions in the folder <code>math/test/unit/math/prim/prob/</code>.
I took <code>multinomial_test.cpp</code> as a template, which I modified.
One of the tests checks that an exception is thrown if one of the intensity parameters is infinite.
For the multinomial distribution, the function checks that the parameters form a simplex.
Hence, it also checks that the values are finite, as they are in the interval \((0,1)\).
In our case, the intensities only have to be positive, and so initially I changed <code>check_simplex</code> into <code>check_positive</code>.
However, infinity is also positive and therefore this check does not raise an error if one of the parameters is infinite.
Therefore I changed the check to <code>check_positive_finite</code> (provided in <code>math/stan/math/prim/err.hpp</code>) and this time my test passed.
The other error was produced by the case in which all counts are zero. Our definition of the PMF would lead to \(0 / 0\), which gives a NaN.
However, if the number of trials (or population size) is zero, then it makes sense to define the probability of the event \((0,0,\dots,0)'\) as \(1\),
since this is the only possible event. I fixed this by checking for this condition in <code>dirichlet_multinomial_lpmf</code>.
I will not show you all the unit tests I wrote/modified but just to give an idea, this is the first one where I compare the value of the Dirichlet-Multinomial PMF with a known value (calculated with <code>scipy.stats</code> in Python)
<br /><br />
<script src="https://gist.github.com/chvandorp/c4dd1ffcfb5276ba546e7579f7022e82.js"></script>
What's happening here is that <code>TEST</code> is a macro defined in <code>gtest.h</code>. The identifiers <code>ProbDistributionsDirichletMultinomial</code>
and <code>DirichletMultinomial</code> are used to name the test function. The test function's body is defined in curly brackets.
We then create a real vector \(\theta = (2.0, 3.0, 5.0)'\) and an integer vector \(n = (1,2,3)'\) and test if
\[
\log(p(n | \theta)) = -2.477938
\]
where \(p\) is the PMF of the Dirichlet-Multinomial distribution.
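Reference values like this one are easy to regenerate without scipy. Here is a stdlib-only cross-check (the function name is my own) using the equivalent gamma-function form of the log-PMF, \(\log \Gamma(N+1) + \log \Gamma(\alpha_0) - \log \Gamma(N+\alpha_0) + \sum_k \left[ \log \Gamma(n_k+\alpha_k) - \log \Gamma(n_k+1) - \log \Gamma(\alpha_k) \right]\):

```python
from math import lgamma

def dirichlet_multinomial_logpmf(ns, alpha):
    # Gamma-function form of the Dirichlet-multinomial log-PMF.
    N, alpha0 = sum(ns), sum(alpha)
    lp = lgamma(N + 1) + lgamma(alpha0) - lgamma(N + alpha0)
    for n, a in zip(ns, alpha):
        lp += lgamma(n + a) - lgamma(n + 1) - lgamma(a)
    return lp

print(dirichlet_multinomial_logpmf([1, 2, 3], [2.0, 3.0, 5.0]))  # ≈ -2.477938
```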
<br />
You can run all or a subset of tests with the <code>runTests.py</code> python script. For instance
<pre><code>
python3 runTests.py test/unit/math/prim/prob/dirichlet_multinomial_test
</code>
</pre>
runs only the new tests. Warning: running all tests takes a lot of time.
<br /><br />
<h2>Step 4: Adding the distribution to the Stan documentation</h2>
Before we expose our new distribution to the user (i.e. make it a native function in the Stan programming language), we have to write some documentation for the
Stan <a href="https://mc-stan.org/docs/functions-reference/index.html">functions reference</a>. We can clone the documentation with
<code>git clone git@github.com:stan-dev/docs.git</code>. The documentation is written in R markdown. This is the part that I added to
<code>multivariate_discrete_distributions.Rmd</code> in the folder <code>docs/src/functions-reference</code>
<br /><br />
<script src="https://gist.github.com/chvandorp/01fd59c287f6084668bdbbeac0193518.js"></script>
This is largely based on the multinomial example. Note that all functions have an HTML-commented function signature. The README.md says this about these comments:
<blockquote>
The Stan Functions Reference contains HTML comments which describe the function signature for all functions.
The script <code>extract_function_sigs.py</code> is used to scrape these signatures into a plain text file.
</blockquote>
<i>
This feature exists for Stan-specific syntax highlighting.
Syntax highlighters (see this <a href="https://mc-stan.org/users/interfaces/#stan-language-syntax-aware-editors">link</a> for a list)
have to know which variable names correspond to native functions so that they can be colored distinctly.
</i>
To build the documentation, use the python script <code>build.py</code> as follows
<pre><code>
python3 build.py 2 34
</code>
</pre>
The numbers 2 and 34 correspond to the major and minor version numbers. The R markdown file contains strings such as `r since("2.34")`
which correspond to the version number when a function became available. It was not clear to me what version I should use in my new documentation.
At the time of writing, the version was 2.33, and I reasoned that this new distribution would become available in the next version: 2.34.
<br />
The build script should work if you have RStudio installed, which I don't (I have rstudio-server, in case you're wondering how I survive). In that case, you'll have to install both pandoc and pandoc-citeproc with <code>sudo apt install</code> (on Ubuntu 22.04, that is). Make sure the R packages bookdown, dplyr, ggplot2 and kableExtra are installed with <code>install.packages()</code>. I also found that I needed to install the Ubuntu packages libfontconfig1-dev, libcurl4-openssl-dev and libxml2-dev with <code>sudo apt install</code>. Here's a screenshot of the new page in the functions reference PDF.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCiwG4KN5tUK5kgJyBAPJl-AM4SHr800RzgJeElOgVIUApL-hmh8CQYCTwpVsP2PknJ0hVkfbXO4Xn4BJ0NIcDhMbrqsn9O2XAmBOefuwaPGib6d41ZPZcRh5-tZ90wQeFjdS_j0arc1-4uMUvpKz7306YIv0WEoGXgLfs4IpmfupO8oDXYJurOTbgxex9/s1444/screenshot_funcref.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="955" data-original-width="1444" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCiwG4KN5tUK5kgJyBAPJl-AM4SHr800RzgJeElOgVIUApL-hmh8CQYCTwpVsP2PknJ0hVkfbXO4Xn4BJ0NIcDhMbrqsn9O2XAmBOefuwaPGib6d41ZPZcRh5-tZ90wQeFjdS_j0arc1-4uMUvpKz7306YIv0WEoGXgLfs4IpmfupO8oDXYJurOTbgxex9/w400-h265/screenshot_funcref.png" width="500" /></a></div>
<br /><br />
<h2>Step 5: Adding the distribution to the <code>stanc3</code> compiler</h2>
In terms of lines of code, this step is really simple. The stanc3 compiler is written in the <a href="https://ocaml.org/">OCaml language</a> which might be
a bit unfamiliar (it certainly is for me). Also the OCaml ecosystem has a bit of a steep learning curve.
We start by cloning stanc3 using <code>git clone git@github.com:stan-dev/stanc3.git</code>. The file that contains all native functions and distributions
can then be found in <code>stanc3/src/middle/Stan_math_signatures.ml</code>. We just have to add lines for <code>dirichlet_multinomial_rng</code>
and <code>dirichlet_multinomial</code>, and we can basically copy the code from the signatures of <code>multinomial_rng</code> and <code>multinomial</code>.
<br /><br />
<script src="https://gist.github.com/chvandorp/7e93be0e3ce0675c481adb81237ad627.js"></script>
Instructions on how to install all dependencies and stanc can be found <a href="https://mc-stan.org/stanc3/stanc/getting_started.html">here</a>;
the stanc3 repository contains a script that does basically everything for you. However, if you want to install OCaml yourself, check <a href="https://opam.ocaml.org/doc/Install.html">this page</a>. You will need the dune package. I started working on this a while ago, and when I picked up the work recently,
I found that I had to update my OCaml version. This can be done with <code>opam switch create [version number]</code> (with the right version number).
<br /><br />
Naturally, the stanc compiler has its own tests, which include a couple of Stan programs to test function signatures.
The Stan program <code>stanc3/test/integration/good/function_signatures/distributions/rngs.stan</code> contains function calls for all RNGs.
We can add our new RNG to this file by adding the line
<pre><code>
ns = dirichlet_multinomial_rng(alpha, 20);
</code>
</pre>
The variables <code>ns</code> and <code>alpha</code> are already defined or declared.
I then added a folder <code>stanc3/test/integration/good/function-signatures/distributions/multivariate/discrete/dirichlet_multinomial</code> and created a file <code>dirichlet_multinomial_lpmf.stan</code> in this folder.
I basically copied the content of
the test file for the Multinomial distribution (located at <code>stanc3/test/integration/good/function-signatures/distributions/multivariate/discrete/multinomial</code>)
and changed <code>multinomial</code> to <code>dirichlet_multinomial</code>. The fact that the Multinomial distribution takes simplices and the Dirichlet-Multinomial takes positive vectors does not seem to matter for these tests: it's just about function signatures. This is the test file that I added:
<br /><br />
<script src="https://gist.github.com/chvandorp/ea0c292c2165131c05246b1e6b853471.js"></script>
Finally, use <code>dune runtest</code> to check changes in the code, and then run <code>dune promote</code> to accept these changes. Otherwise, the integration tests (see Step 7 below) will fail. More instructions on exposing new functions can be found <a href="https://mc-stan.org/stanc3/stanc/exposing_new_functions.html">here</a>.
<br /><br />
<h2>Step 6: Compiling a model with <code>cmdstanpy</code></h2>
To test if everything works as expected, I will now write a Stan model that can be used to estimate intensity parameters of the
Dirichlet-Multinomial distribution, given some data that was sampled from this distribution with known parameters.
To make life easy (i.e. no messing around with cmdstan, json files etc.),
I want to do this in Python using the <code>cmdstanpy</code> interface. Let's first write our Stan model.
<br /><br />
<script src="https://gist.github.com/chvandorp/d0367d4a2c7d8a912c0f7a5f56d803e3.js"></script>
This is pretty straightforward. We have \(K\) categories, \(N\) observations \(X\) and for sampling pseudo-observations, we also pass a population size parameter \(M\)
to the model. In the parameters block, we define \(\alpha \in \mathbb{R}_+^K\), which is the intensity, and in the model block we give \(\alpha\) a weakly-informative
log-normal prior. We have to write the "sampling statements" for \(X\) in a for loop, as the Dirichlet-Multinomial distribution does not allow vectorization at this point.
In the generated quantities block, we sample "fake data" \(Y\) and compute the likelihood of the data \(X\), again in a for loop. This is not the most exciting model,
but it's good enough for this test.
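Before handing the model to cmdstanpy, we need a data dictionary with known ground truth. Here is a stdlib-only sketch (the names <code>K</code>, <code>N</code>, <code>M</code> and <code>X</code> follow the model description above; the seed and the true intensities are arbitrary choices of mine):

```python
import random

rng = random.Random(1)
K, N, M = 3, 50, 1000            # categories, observations, population size
alpha_true = [10.1, 3.5, 7.2]    # known intensities the model should recover

def dirmult_sample(alpha, size):
    # Dirichlet draw via normalized Gamma variates, then `size` categorical draws.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    p = [v / sum(g) for v in g]
    counts = [0] * len(alpha)
    for _ in range(size):
        u, k = rng.random(), 0
        while u > p[k] and k < len(p) - 1:
            u -= p[k]
            k += 1
        counts[k] += 1
    return counts

X = [dirmult_sample(alpha_true, M) for _ in range(N)]
data = {"K": K, "N": N, "M": M, "X": X}
# data can now be passed to cmdstanpy, e.g. CmdStanModel(...).sample(data=data)
```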
<br /><br />
<i>Important note: The "linking" solution presented in the following section works, but there is a better "official" solution, which I explain a little bit later.</i>
<br />
So far, we have only used the Stan Math library in C++ directly. In order to get cmdstan to work with (1) our extended Stan Math library, and (2) the updated stanc compiler, we have to link all these components together. First, we have to point <code>stan</code> to the right math library. If you clone stan from github
(with <code>git clone git@github.com:stan-dev/stan.git</code>) you will find a folder <code>stan/lib/stan_math</code>. This folder is empty and should contain the contents of the Stan Math library. It is possible to replace the folder <code>stan_math</code> with a link to our <code>math</code> folder. First <code>cd</code> to the <code>stan/lib</code>
folder and remove the empty <code>stan_math</code> folder. Then create a link with
<pre><code>
ln -s path/to/math stan_math
</code>
</pre>
Then clone <code>cmdstan</code> with <code>git clone git@github.com:stan-dev/cmdstan.git</code>. The cmdstan folder contains an empty <code>stan</code>
folder. Replace this with a link to the modified stan folder.
<br /><br />
This approach of creating links worked for me, but there is an easier way. It is possible to clone cmdstan from github "recursively", i.e. also
clone the content of the <code>stan</code> folder, and the content of the <code>lib/stan_math</code> folder at once.
Download and installation instructions for cmdstan can be found <a href="https://github.com/stan-dev/cmdstan/wiki/Getting-Started-with-CmdStan">here</a>.
As indicated in the "Installing from GitHub repo" section, we can clone the <code>stan</code> and <code>math</code> repositories with <code>make stan-update</code>
from the <code>cmdstan</code> folder. Or, instead of just cloning the cmdstan repository, you can clone recursively with <code>git clone https://github.com/stan-dev/cmdstan.git --recursive</code>. This cloning is quite slow, because you get not only the latest version, but also the entire history of Stan development. To speed things up, you could do a "shallow" clone with the additional flags <code>--depth 1 --shallow-submodules</code>. We can then build cmdstan with <code>make build</code>. This will also download a pre-built stanc compiler.
This is not quite what we want, though, because we have to link to our own version of the <code>math</code> library, and
our modified <code>stanc</code> compiler.
<br />
To tell cmdstan that it should use our modified <code>stanc</code> version, we have to create a file called <code>local</code> in the folder
<code>cmdstan/make</code>. In fact, there is a file <code>local.example</code> already present in this folder which contains some other (commented-out) options.
We can copy this file with <code>cp local.example local</code> and then open it in a text editor and add the line <code>STANC3=path/to/stanc3</code>.
I have <code>stanc3</code> and <code>cmdstan</code> in the same folder, so for me this line is <code>STANC3=../stanc3</code>.
<br /><br />
<i>
It turns out that a similar trick lets us use a custom <code>math</code> repository, so we don't have to create symbolic links. In the <code>cmdstan</code> makefile, you'll find these lines:
<pre>
-include make/local # user-defined variables
STAN ?= stan/
MATH ?= $(STAN)lib/stan_math/
</pre>
The <code>?=</code> assignment in makefiles only takes effect when the variable on the left-hand side hasn't already been defined. So if we define <code>MATH=path/to/math</code> in our <code>make/local</code> file (which is included in the main makefile), then we override the <code>MATH ?= stan/lib/stan_math/</code> assignment. We could do the same for the <code>STAN</code> variable.
</i>
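For concreteness, here is what a complete <code>make/local</code> could look like when <code>stanc3</code>, <code>math</code> and <code>cmdstan</code> are checked out side by side (these paths are just my layout; adjust them to yours):

```make
# cmdstan/make/local -- user-defined build variables
# paths are examples, assuming stanc3/math/cmdstan live in sibling folders
STANC3=../stanc3
MATH=../math/
```

Note the trailing slash on <code>MATH</code>, which matches the <code>MATH ?= $(STAN)lib/stan_math/</code> default in the main makefile.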
<br /><br />
To compile and run the above Stan model with <code>cmdstanpy</code>, we have to make sure <code>cmdstanpy</code> can find the right cmdstan installation.
This can simply be done using <code>cmdstanpy.set_cmdstan_path("path/to/cmdstan")</code> where "path/to/cmdstan" is the location of the modified cmdstan installation.
<br /><br />
<script src="https://gist.github.com/chvandorp/9ec98229ffc3ea2c0f8e92d804b56ca4.js"></script>
The Python script makes a plot of the ground-truth parameter vector <code>alpha</code> (black dots), together with gray violins representing the posterior distribution.
Judging from this plot, everything seems to be working.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjE3UofTkn20e-DBjmck3egRsTDdEBvYBtKx2Mg73yfk3fQ1uGnyfCzUtqygQE517ENVvX3e6eEaoecDrsl5TnrIPLlAUwBk3x-a3Q5pYHW540laGAcU6WpQUvzZc_I63y-tFa__DGquI5dbxnzGd3CxE-OhcFeyxnzwgvWlVeNms0e6z171Gj42NhNLWUW/s1662/intensity_estimates.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="1298" data-original-width="1662" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjE3UofTkn20e-DBjmck3egRsTDdEBvYBtKx2Mg73yfk3fQ1uGnyfCzUtqygQE517ENVvX3e6eEaoecDrsl5TnrIPLlAUwBk3x-a3Q5pYHW540laGAcU6WpQUvzZc_I63y-tFa__DGquI5dbxnzGd3CxE-OhcFeyxnzwgvWlVeNms0e6z171Gj42NhNLWUW/s320/intensity_estimates.png" width="320" /></a></div>
It also helps to look at the diagnostics (with <code>print(sam.diagnose())</code> in python). They look reassuring:
<pre><code>
Processing csv files: /tmp/tmp_vwn59j0/dirmultqq672ujg/dirmult-20231126104727_1.csv, ...
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
</code>
</pre>
<h2>Step 7: creating a pull request (PR) on GitHub</h2>
As Stan has multiple components, we have to submit multiple PRs. First, we have to submit our code to the Stan Math library.
Then we have to provide new documentation for the functions reference, and then we have to submit our modifications to stanc3.
<br /><br />
<h4> Stan Math PR </h4>
I will start with creating a PR for the math library. All PRs must correspond to an issue. So typically you first "request" a feature and then submit a PR to add it. In this case, there already exists an <a href="https://github.com/stan-dev/math/issues/54">issue</a> with a feature request for the Dirichlet-Multinomial distribution (from 2014), so we'll use that.
<br />
There's a lot of information to digest before you make a PR. A place to start is the "contributing" <a href="https://github.com/stan-dev/math/blob/e43fc08ca3345113ab437954600bc4e2bfd7ee54/.github/CONTRIBUTING.md">page</a>. A much more detailed (and essential) "Developer process overview" can be found <a href="https://github.com/stan-dev/stan/wiki/Developer-process-overview">here</a>, although these instructions are sometimes specific for the <code>stan</code> repository (not <code>math</code>). Then there is also this <a href="https://github.com/stan-dev/stan/wiki/Introduction-to-Stan-for-New-Developers">Introduction to Stan for New Developers</a>.
It is also a good idea to have a look at the <a href="https://github.com/stan-dev/math/blob/develop/.github/PULL_REQUEST_TEMPLATE.md">pull request template</a> (which <i>is</i> <code>math</code> specific).
<br /><br />
On GitHub, fork the Stan math repository. Then clone the fork to your local machine. Make sure you're on the develop branch with <code>git branch</code>, and create a new branch with <code>git checkout -b feature/issue-NN-short-description</code> (this creates and switches to the new branch) where NN is the issue number and "short-description" is... a short description of the feature.
In Steps 1-6, I had been working on just a clone of the math repository. So at this point I copied all new files and applied changes to the header file. The Stan team asks that you only commit well formatted code (more about this later), so be careful with what you commit to the feature branch.
When you're ready, add files with <code>git add path/to/file</code> and commit changes with <code>git commit -a -m "explain what changed"</code>. The <code>-a</code>
option makes sure that all tracked files are added before the commit. To push the new branch to the remote fork, use <code>git push -u origin feature/issue-NN-short-description</code>.
<br /><br />
There are some requirements not covered in Steps 1-6 above. To make sure your code passes Stan's automated tests, these tests should pass:
<pre><code>
make test-headers
make test-math-dependencies
runTests.py test/unit
make doxygen
make cpplint
</code>
</pre>
For me, cpplint failed as lines were longer than 80 characters, and some lines ended in whitespace.
In order to run cpplint, I had to install python2 (as Ubuntu 22.04 only has python3) with <code>sudo apt install python2</code>.
On the web, there are some pages explaining how to make python2 the default python version. I think that's
a really bad idea. However, you can tell make to use python2 with
<pre><code>
export PYTHON=python2
make cpplint
</code>
</pre>
The make rule <code>test-math-dependencies</code> did not work for me: this target just points to <code>./runChecks.py</code>,
which has a shebang line that my system could not resolve. So I just ran the script directly with
<pre><code>
python3 runChecks.py
</code>
</pre>
This ran without producing any output, which means no errors were found.
<br /><br />
My first PR failed very quickly. It turned out that passing cpplint was not sufficient
to ensure proper formatting. To make sure that your code is formatted correctly, you have to use <code>clang-format</code>.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBNShnj7xC1vuH7nTgdYon0p8j-aoKE72mNtRkcjfIrlLVIabe0aASfuvHzlbcPvg72Eqehqam7wBMR52gP97xU3zEWUwrOf6kYcYIkJ1kkVGrmhA2qnRo508shRBoROR7HIZ6mdKe_C85EGv1X-C8JKwnBnp5q6WtU5eQI7B9VbLuXuaMJJ6yYBb_DXir/s1368/clang-format-fail.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="200" data-original-width="1368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBNShnj7xC1vuH7nTgdYon0p8j-aoKE72mNtRkcjfIrlLVIabe0aASfuvHzlbcPvg72Eqehqam7wBMR52gP97xU3zEWUwrOf6kYcYIkJ1kkVGrmhA2qnRo508shRBoROR7HIZ6mdKe_C85EGv1X-C8JKwnBnp5q6WtU5eQI7B9VbLuXuaMJJ6yYBb_DXir/s600/clang-format-fail.png" width="400" /></a></div>
Installing clang-format in Ubuntu requires you to first add the PPA for LLVM. Instructions are found <a href="https://apt.llvm.org/">here</a>. On Ubuntu 22.04, you would need the line "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy main".
First get the signature with "wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -" (see the section "Install (stable branch)"). Then add the PPA with <code>sudo add-apt-repository</code> followed by the PPA string. It should also be possible to use the
GUI (software and updates). On the <a href="https://github.com/stan-dev/stan/wiki/Coding-Style-and-Idioms">coding style page</a> instructions are given to install <code>clang-format-5.0</code>. For me,
this did not work, but installing <code>clang-format</code> (without the 5.0) did.
In the <code>hooks</code> folder, a script is provided to install a pre-commit git "hook". This makes sure that clang-format is run on all cpp and hpp files that were changed.
<br /><br />
My second PR failed as well, this time at the "verify changes" stage.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgewReJ5wd3Xgjw1HQTQojVvejjkbG-Vf9gDlIjPSrR9RUwGmt5XDwjT4uDyKTYulpu8c68Vrua7Hz8Hpe5ZLI41sJ6O_iIbeAZ-ytmmuVoSA4QR3vgxDJGBLflg0hTzsfdceQ6QVsxRRXGjH3AzhSR-gf-hbdSSj72xJm5K2VyOsA1G4qkMjk_6e66axWx/s1368/verify-changes-fail.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="250" data-original-width="1368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgewReJ5wd3Xgjw1HQTQojVvejjkbG-Vf9gDlIjPSrR9RUwGmt5XDwjT4uDyKTYulpu8c68Vrua7Hz8Hpe5ZLI41sJ6O_iIbeAZ-ytmmuVoSA4QR3vgxDJGBLflg0hTzsfdceQ6QVsxRRXGjH3AzhSR-gf-hbdSSj72xJm5K2VyOsA1G4qkMjk_6e66axWx/s600/verify-changes-fail.png" width="400" /></a></div>
I got the following message:
<pre><code>
remote: Not Found
fatal: repository 'https://github.com/chvandorp/stan_math/math/' not found
error: pathspec 'forkedOrigin/feature/issue-54-dirichlet-multinomial'
did not match any file(s) known to git
script returned exit code 1
</code>
</pre>
This makes some sense, because my fork is called <code>stan_math</code> and not <code>stan_math/math</code>: I had renamed the repository from <code>math</code> to <code>stan_math</code> to have a somewhat more descriptive name.
The solution was simple: just change the name of the fork back on GitHub.
A name change automatically updates the PR; all you have to do is change the remote URL in your local <code>.git/config</code> file.
In the meantime, the Stan team has fixed this "bug" in their testing pipeline: it should now be possible to choose a different name for your fork.
<br /> <br />
After attempt 2, I also received a request to add an additional unit test. This test had to be added to <code>test/unit/math/mix/prob/</code> using the <code>expect_ad</code> function. I was able to use the Dirichlet distribution as an example. <code>expect_ad</code> compares the result of automatic differentiation with values computed using the finite-difference method. These tests also pass if an exception is thrown; in that case, it just checks that all AD types lead to the same exception. With these changes I tried submitting the PR again.
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilNV2TtgL_qzQFMoQMxcAIWRfZD1t4AxOZuPzpTMHuGEfNGRNb1JkJDlNFFYiSXozXJ35LdY291I4ilwZ2mUbPGARc74I397SkhoUKm6HgGljQ1fQp_iBdpdzrOTEiwUsuF_pKIwSmE_87ZsW-eHjXpB90QSCoHWB1iS5NCrXFWLbVfJK5PAmTkupRJ3pJ/s1368/Screenshot%202023-12-06%20at%2010-16-53%20jenkins%20_%20Stan_Math%20_%20PR-2979%20_%20%233.png" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="320" data-original-width="1368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilNV2TtgL_qzQFMoQMxcAIWRfZD1t4AxOZuPzpTMHuGEfNGRNb1JkJDlNFFYiSXozXJ35LdY291I4ilwZ2mUbPGARc74I397SkhoUKm6HgGljQ1fQp_iBdpdzrOTEiwUsuF_pKIwSmE_87ZsW-eHjXpB90QSCoHWB1iS5NCrXFWLbVfJK5PAmTkupRJ3pJ/s400/Screenshot%202023-12-06%20at%2010-16-53%20jenkins%20_%20Stan_Math%20_%20PR-2979%20_%20%233.png" width="400" /></a></div>
This time all checks passed.
<br /><br />
<h4>Revisions</h4>
My <code>dirichlet_multinomial_lpmf</code> function is written in an ugly imperative style, while the Eigen and Stan math libraries provide methods to vectorize a lot of these operations. A Stan developer re-wrote my function in a much nicer, more functional style, and also added analytic derivatives.
<br /><br />
<script src="https://gist.github.com/chvandorp/4570b1f517071d5315f9fd7f583baaa2.js"></script>
I definitely do not understand all details of this revised function, but let's go through it anyway.
The first line that's different defines a variable <code>alpha_val</code> as follows
<pre><code>
const auto& alpha_val = as_value_array_or_scalar(alpha_ref);
</code></pre>
As I understand it, the point of this is that if we want to compute gradients ourselves, then we want to temporarily "turn off" automatic differentiation. Hence, we have to use the "values" of Stan autodiff types instead of the full types in our computations.
This is what <code>as_value_array_or_scalar</code> does.
Next, we see that Stan math provides an overloaded function <code>sum</code> that can sum many kinds of containers, so there is no need for for-loops or <code>std::accumulate</code>.
The integer-valued <code>ns</code> are then cast to <code>double</code> values using Eigen's <code>cast<double></code> method (I'm not sure whether the template disambiguator is required here).
<br /> <br />
Next, we are going to compute the log-probability <code>lp</code>.
First, we add \(\log(N) - \sum_{k : x_k > 0} \log(x_k)\) to <code>lp</code>, but only if constant terms are requested (these terms do not depend on \(\alpha\)).
Notice that we can do elementwise comparison <code>ns_array > 0</code> on Eigen arrays, and use the <code>select</code> method to compute an elementwise ternary operation.
One important thing to realize here is that the computations are "lazy". This means that although the expression <code>log(ns_array)</code> occurs in the C++ code, the log of the
elements of <code>ns_array</code> that are zero is never evaluated.
Similarly, we add \(\log(B(\alpha_0, N)) - \sum_{k : x_k > 0} \log(B(\alpha_k, x_k))\) to <code>lp</code>.
<br /> <br />
Now we will compute the gradient. For this, we use an object of the class <code>partials_propagator</code>, created using the function <code>make_partials_propagator</code>.
This function takes a variable number of arguments, one for each of the parameters or random variables that might require a gradient. In this case, this only holds for
<code>alpha</code>, because <code>ns</code> is always constant.
To put the "manually calculated" gradient in the right "slot", we use the <code>partials<0></code> function, where the template argument <code>0</code> is used because we only had a single non-constant parameter <code>alpha</code>. The gradient of the distribution is given by
\[
\frac{\partial\log(p(x | \alpha))}{\partial\alpha_i} =
\frac{\partial\log(B(\alpha_0, N))}{\partial \alpha_i} - \sum_{k : x_k > 0} \frac{\partial \log(B(\alpha_k, x_k))}{\partial \alpha_i}
\]
Then we write \( \log(B(a, b)) = \log(\Gamma(a)) + \log(\Gamma(b)) - \log(\Gamma(a+b)) \) and consider two cases, \(x_i = 0\) (case I) and \(x_i > 0\) (case II).
In case I, we get
\[
\frac{\partial\log(p(x | \alpha))}{\partial\alpha_i} = \psi(\alpha_0) - \psi(\alpha_0 + N)
\]
where \(\psi(a) = \Gamma'(a) / \Gamma(a) \) denotes the digamma function.
In case II, we get
\[
\frac{\partial\log(p(x | \alpha))}{\partial\alpha_i} = \psi(\alpha_0) - \psi(\alpha_0 + N) - \psi(\alpha_i) + \psi(\alpha_i + x_i)
\]
Notice that we can use the latter expression for both cases, because \(\psi(\alpha_i) - \psi(\alpha_i + x_i) = 0\) when \(x_i = 0\). However,
evaluating the digamma function is not free. Therefore we use the <code>select</code> method.
The <code>build</code> method of the <code>partials_propagator</code> class then combines gradients and values in the right way and returns the appropriate value.
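To convince myself that the derivation above is correct, here is a small numerical check, sketched in Python rather than C++ for brevity (this is an illustration of the math, not the Stan Math implementation): the log-pmf is written with <code>lgamma</code>, the gradient uses the \(\psi\)-formula above (with \(\psi\) approximated by a central difference of <code>lgamma</code>), and the result is compared against a finite-difference gradient of the log-pmf.

```python
from math import lgamma, log

def lbeta(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def dirmult_lpmf(x, alpha):
    """Dirichlet-multinomial log-pmf, including constant terms (Python sketch)."""
    n, a0 = sum(x), sum(alpha)
    lp = log(n) - sum(log(xk) for xk in x if xk > 0)  # constant terms
    lp += lbeta(a0, n) - sum(lbeta(ak, xk) for ak, xk in zip(alpha, x) if xk > 0)
    return lp

def digamma(a, h=1e-5):
    # crude psi(a) via a central difference of lgamma; fine for this check
    return (lgamma(a + h) - lgamma(a - h)) / (2 * h)

def dirmult_grad(x, alpha):
    """Gradient formula: psi(a0) - psi(a0 + N) - psi(a_i) + psi(a_i + x_i)."""
    n, a0 = sum(x), sum(alpha)
    return [digamma(a0) - digamma(a0 + n) - digamma(ai) + digamma(ai + xi)
            for ai, xi in zip(alpha, x)]

# arbitrary example data, including a zero count
x, alpha = [2, 0, 3], [0.7, 1.3, 2.1]
h = 1e-6
num_grad = [(dirmult_lpmf(x, [a + h * (i == j) for j, a in enumerate(alpha)])
             - dirmult_lpmf(x, [a - h * (i == j) for j, a in enumerate(alpha)])) / (2 * h)
            for i in range(len(alpha))]
```

As a sanity check on the log-pmf itself: for \(K = 2\) and \(\alpha = (1, 1)\) the distribution is uniform over the \(N + 1\) possible outcomes, so <code>dirmult_lpmf([2, 3], [1.0, 1.0])</code> should equal \(\log(1/6)\).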
<br /><br />
<h4> The other PRs </h4>
The remaining PRs (documentation and stanc) are quite uneventful.
I did not create separate issues for the doc and stanc PRs, but referred to the accepted PR on stan-dev/math. For user-facing changes to stanc3, you must provide a PR that documents these changes. So first submit the PR for the doc, and then create the PR for stanc3.
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-54165362918206415902021-01-03T15:28:00.007-08:002021-01-03T15:28:57.312-08:00Spin your own autodiff in C++In recent years, I have made a lot of good use of automatic differentiation (AD, autodiff), for instance using <a href="https://mc-stan.org/">Stan</a>
or <a href="https://pytorch.org/">pytorch</a>.
However, it always appeared a bit magical to me until I decided to just implement a simple forward-mode "autograd" method in C++ myself.
This is a stripped-down version of my "exercise" with plenty of room for enhancement.
<h2>Dual numbers</h2>
To make forward-mode autodiff work, we replace all floating point numbers with so-called dual numbers \(z = x + y N\) where \(N\) is a nilpotent element, i.e., \(N^2 = 0\).
The space of dual numbers \(\mathbb{D}\) is a real algebra with addition and multiplication defined as follows. Let \(z_1 = x_1 + y_1 N\) and \(z_2 = x_2 + y_2 N\), then
\[ z_1 + z_2 = (x_1 + x_2) + (y_1 + y_2) N \]
and
\[ z_1 \cdot z_2 = x_1 x_2 + (x_1 y_2 + x_2 y_1) N \]
Notice how this product encodes the product rule for derivatives.
All elements \(z \in \mathbb{D} \) with \(z = x + yN\) and \(x \neq 0 \) have a multiplicative inverse \(z^{-1} = x^{-1} - yx^{-2} N \).
In C++ we can define the dual numbers using a <code>class</code> and overload the arithmetic operators as follows.
In the header file <code>dual.hpp</code> we declare a class <code>Dual</code> with members <code>x</code> and <code>y</code>.
<script src="https://gist.github.com/chvandorp/2dfbc68a66e3870ee544726d19e26587.js"></script>
In the source file <code>dual.cpp</code> we define all methods using the rules for dual arithmetic described above.
<script src="https://gist.github.com/chvandorp/3bdf22d225fa088d62bf13eb90be27d3.js"></script>
<h2>Function evaluation</h2>
The magic of dual numbers comes from the way we can extend differentiable functions on \(\mathbb{R}\) to the dual numbers. For any differentiable function \(f : \mathbb{R} \rightarrow \mathbb{R} \), we can extend \(f\) to a function \(f : \mathbb{D} \rightarrow \mathbb{D} \) by defining
\[ f(x + yN) = f(x) + y f'(x)N \]
where \(f'\) is the derivative of \(f\). To see why this is a natural choice, we take the Taylor expansion of \(f\) around \(x\) and see that
\[ f(x + yN) = f(x) + f'(x) yN + \tfrac12 f''(x) (yN)^2 + \dots \]
However, as \(N^2 = 0\) all the terms of order \(y^2\) and higher vanish.
This allows us to overload a number of standard mathematical functions such that they work on both <code>double</code> and <code>Dual</code> variables.
In the files <code>dualmath.hpp</code> and <code>dualmath.cpp</code> we define some functions like \(\sin(x)\) and \(\cos(x)\)
<script src="https://gist.github.com/chvandorp/9296bf5e61467cd994fb7e8b43a53ddf.js"></script>
Notice that the functions declared in <code>dualmath.hpp</code> are also defined for <code>double</code> values in the <code><cmath></code> header,
which is included in the file <code>dualmath.cpp</code>.
<script src="https://gist.github.com/chvandorp/5764041bf6a0dcf55f1664891d4768d4.js"></script>
<h2>Templated functions</h2>
Of course, in the functions defined in <code>dualmath.cpp</code> we had to "manually" calculate the derivative to evaluate the functions on \(\mathbb{D}\) (with the exception of the <code>pow</code> function).
However, we went to the trouble to define the dual numbers in C++ in order to "automatically" calculate derivatives of functions.
To see how this works, suppose that we define a templated C++ function <code>fun<T>(T x)</code>; then we can either use <code>fun<double></code> or
<code>fun<Dual></code>. The derivative of <code>fun</code> can then be evaluated at a real number \(x\) using a dual number \(z = x + yN\) with non-zero \(y\).
In the following example, we "automatically" compute the derivative of \(\tan(x)\).
<script src="https://gist.github.com/chvandorp/c80432b13ad72b21a27007f62662c647.js"></script>
This file gives the following output
<pre>
$ g++ ad_example1.cpp dual.cpp dualmath.cpp -o ad_example1
$ ./ad_example1
0 1 1
0.01 1.0001 1.0001
0.02 1.0004 1.0004
0.03 1.0009 1.0009
0.04 1.0016 1.0016
0.05 1.0025 1.0025
0.06 1.00361 1.00361
0.07 1.00492 1.00492
0.08 1.00643 1.00643
...
</pre>
which we can use to make the following graph <br/>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqP-c2hpproNRClz3iN1nXQ9gvxtmCLvEVTU2weAqJGHVAqCOm_mikDGRcrTAXF947b-gGdiWOWsRA5rNRFyZTbRaOVpbf0_M40uQrbcY19lxkSKPOwIEz_Hpi403yH6M7_ETSrXwReDoc/s1336/ad_example1.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="400" data-original-height="1112" data-original-width="1336" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqP-c2hpproNRClz3iN1nXQ9gvxtmCLvEVTU2weAqJGHVAqCOm_mikDGRcrTAXF947b-gGdiWOWsRA5rNRFyZTbRaOVpbf0_M40uQrbcY19lxkSKPOwIEz_Hpi403yH6M7_ETSrXwReDoc/s400/ad_example1.png"/></a></div>
<br/>
To see why this works, we write \(z = x + 1N\) and mimic the calculation done by the computer:
\[ \tan(z) = \frac{\sin(z)}{\cos(z)} = \frac{\sin(x) + \cos(x)N}{\cos(x) - \sin(x)N} \]
which is defined in <code>dualmath.cpp</code>. Then using the rules for division and multiplication of dual numbers defined in <code>dual.cpp</code>, we get
\[ \frac{\sin(x) + \cos(x)N}{\cos(x) - \sin(x)N} = (\sin(x) + \cos(x)N)(1/\cos(x) + \sin(x)/\cos^2(x)N) \]
\[ = \frac{\sin(x)}{\cos(x)} + (1 + \sin^2(x)/\cos^2(x))N \]
This simplifies to
\[ \tan(z) = \tan(x) + 1/\cos^2(x)\, N \]
Hence, the <code>Dual::y</code> component of <code>fun(z)</code> is equal to \(\tan'(x)\).
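The same computation is easy to reproduce in a few lines of Python (a sketch of the dual arithmetic above, for illustration only; it is not the C++ code from the gists):

```python
import math

class Dual:
    """Dual number x + y*N with N^2 = 0 (Python sketch)."""
    def __init__(self, x, y=0.0):
        self.x, self.y = x, y
    def __add__(self, other):
        return Dual(self.x + other.x, self.y + other.y)
    def __mul__(self, other):
        # product rule: (x1 + y1 N)(x2 + y2 N) = x1 x2 + (x1 y2 + x2 y1) N
        return Dual(self.x * other.x, self.x * other.y + self.y * other.x)
    def __truediv__(self, other):
        # multiply by the inverse: (x + y N)^-1 = x^-1 - y x^-2 N
        inv = Dual(1.0 / other.x, -other.y / other.x**2)
        return self * inv

def sin(z):
    # f(x + yN) = f(x) + y f'(x) N
    return Dual(math.sin(z.x), z.y * math.cos(z.x))

def cos(z):
    return Dual(math.cos(z.x), -z.y * math.sin(z.x))

def tan(z):
    # written only with overloaded operations, like the templated C++ function
    return sin(z) / cos(z)

z = tan(Dual(0.3, 1.0))  # evaluate at x = 0.3 with y = 1
# z.x equals tan(0.3), and z.y equals tan'(0.3) = 1 / cos(0.3)**2
```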
<h2>Create derivatives and gradients with a functor</h2>
Modern C++ makes it easy to write functional programs, and we can use this to define a functor that returns the
derivative of \(f\) when given a function \(f\).
For this, we make use of the <code><functional></code> header. The signature of our functor has the following form
<pre>
std::function<double(double)> derivative(std::function<Dual(Dual)> f);
</pre>
meaning that the functor <code>derivative</code> takes a function \(f : \mathbb{D} \rightarrow \mathbb{D}\) as argument, and
returns a function \(f' : \mathbb{R} \rightarrow \mathbb{R} \).
<br/>
Things become much more useful in higher dimensions. In the example below, we also show how to automatically compute the gradient of a function
\( F : \mathbb{R}^n \rightarrow \mathbb{R} \) and (more efficiently) directional derivatives \(\nabla F(x) \cdot y \).
The 1D derivative, gradient and directional derivatives are declared in the following header file
<script src="https://gist.github.com/chvandorp/5e634b75980d9476bad1638d22da028f.js"></script>
The definitions of the functors make use of C++ lambda expressions that capture the function given as an argument
<script src="https://gist.github.com/chvandorp/d567ab207afc528ae92eb591d27946d5.js"></script>
A simple example of the functors defined in <code>autograd.cpp</code> is given here:
<script src="https://gist.github.com/chvandorp/c41f249f496934761e64492d22531ddd.js"></script>
The result of the second example is the following:
<pre>
$ g++ ad_example2.cpp dual.cpp dualmath.cpp autograd.cpp -o ad_example2
$ ./ad_example2
norm of x = 1.58114
gradient of norm(x) = [0.316228, 0.948683]
gradient of norm(x) times u = 0.948683
</pre>
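The directional-derivative trick is worth spelling out: seeding the dual parts of the input vector with a direction \(u\) gives \(F(x + uN) = F(x) + (\nabla F(x) \cdot u) N\) in a single evaluation. A quick Python sketch of the same norm example (an illustration of the idea, not the C++ code above):

```python
import math

class Dual:
    """Minimal dual number x + y*N with N^2 = 0 (sketch)."""
    def __init__(self, x, y=0.0):
        self.x, self.y = x, y
    def __add__(self, other):
        return Dual(self.x + other.x, self.y + other.y)
    def __mul__(self, other):
        return Dual(self.x * other.x, self.x * other.y + self.y * other.x)

def sqrt(z):
    # f(x + yN) = f(x) + y f'(x) N with f = sqrt
    return Dual(math.sqrt(z.x), z.y / (2.0 * math.sqrt(z.x)))

def norm(v):
    # Euclidean norm, written only with overloaded operations
    s = Dual(0.0)
    for zi in v:
        s = s + zi * zi
    return sqrt(s)

x, u = [0.5, 1.5], [0.0, 1.0]
# one dual evaluation gives F(x) and the directional derivative at once
z = norm([Dual(xi, ui) for xi, ui in zip(x, u)])
# z.x is the norm (1.58114...) and z.y the directional derivative (0.948683...)
```

The dual part of the result reproduces the "gradient of norm(x) times u" value from the program output above.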
<h2>More advanced methods</h2>
In addition to gradients, one can compute Jacobians of vector fields. Also, instead of dual numbers one can define so-called "hyper-dual" numbers to compute second-order derivatives and Hessians.
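To sketch the hyper-dual idea (an illustration only, not part of the code above): a hyper-dual number has two nilpotent parts \(\epsilon_1, \epsilon_2\) with \(\epsilon_1^2 = \epsilon_2^2 = 0\) but \(\epsilon_1 \epsilon_2 \neq 0\), and seeding both dual parts with 1 makes the \(\epsilon_1\epsilon_2\) coefficient of \(f(z)\) equal to \(f''(x)\):

```python
class HyperDual:
    """x + y1*e1 + y2*e2 + y12*e1*e2 with e1^2 = e2^2 = 0 (sketch)."""
    def __init__(self, x, y1=0.0, y2=0.0, y12=0.0):
        self.x, self.y1, self.y2, self.y12 = x, y1, y2, y12
    def __add__(self, o):
        return HyperDual(self.x + o.x, self.y1 + o.y1,
                         self.y2 + o.y2, self.y12 + o.y12)
    def __mul__(self, o):
        # expand the product and drop all terms containing e1^2 or e2^2
        return HyperDual(self.x * o.x,
                         self.x * o.y1 + self.y1 * o.x,
                         self.x * o.y2 + self.y2 * o.x,
                         self.x * o.y12 + self.y12 * o.x
                         + self.y1 * o.y2 + self.y2 * o.y1)

def cube(z):  # example function f(x) = x^3, using only overloaded operations
    return z * z * z

z = cube(HyperDual(2.0, 1.0, 1.0))
# for f(x) = x^3 at x = 2: z.y1 = f'(2) = 12 and z.y12 = f''(2) = 12
```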
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-55573995647279045432020-06-21T14:40:00.004-07:002020-06-21T14:45:23.233-07:00A C++ worker pool using thread and functionalTo celebrate the fifth anniversary of the TBZ533 blog, I decided to revisit the
<a
href="https://tbz533.blogspot.com/2015/06/a-simple-class-wrapper-for-parallelism.html"
>first post</a
>. This was about a class that could be inherited to provide worker pool
functionality, implemented with the pthread library and function pointers. Using
a bit of modern C++, the implementation of a simple worker pool becomes much
more elegant. <br /><br />
In the header file <code>workerpool.hpp</code>, we declare the WorkerPool class.
The argument of the constructor is the number of workers. If the number of
workers is zero, all tasks are executed serially (which can be convenient for
debugging). A <code>WorkerPool</code> cannot be copied, so we disable the copy
constructor and the copy-assignment operator. A <code>Task</code> is defined as a
function that takes no arguments and returns nothing. A <code>Task</code> can be
passed to the worker pool using the <code>push_back()</code> method (whose name
is inspired by <code>std::list::push_back()</code>). The final public method is
the <code>sync()</code> method that waits for all tasks to be completed.
<br /><br />
The private methods and members of <code>WorkerPool</code> are a vector of
worker threads, a list of (unfinished) tasks, the worker routine, and some
synchronization objects. The boolean <code>abort_flag</code> is used to get the
workers to exit their worker routines. The <code>count</code> integer represents
the number of unfinished tasks. Tasks are guarded by the
<code>task_mut</code> mutex, and <code>count</code> by the
<code>count_mut</code> mutex. The <code>task_cv</code> condition variable is
used to signal that a new <code>Task</code> is added to the list of tasks, while
the <code>count_cv</code> is used to signal that another task has been
completed.
<br /><br />
<script src="https://gist.github.com/chvandorp/1e5c0a5f1300cf8902206eacb297c0db.js"></script>
<br /><br />
The source file contains the definitions of the methods of
<code>WorkerPool</code>. The constructor initiates a number of worker threads
and starts their <code>worker_routine()</code>. The destructor makes sure that
all threads exit their worker routine and joins the threads. This means that the
<code>WorkerPool</code> uses the RAII technique. The
<code>push_back()</code> method evaluates the <code>Task</code> object if there
are no workers, and otherwise adds the task to the <code>tasks</code> list,
after increasing the <code>count</code>. We signal that a task has been added in
case a worker is sleeping. The <code>sync()</code> method only returns when
<code>count == 0</code>. Finally, the most important part of the code is the
<code>worker_routine()</code>. When the list <code>tasks</code> is empty, a
worker waits until a signal is given that a new task is added, or the
<code>abort_flag</code> is set. In the first case, the task is executed; in the
second case, the worker exits the <code>worker_routine()</code>.
<br /><br />
<script src="https://gist.github.com/chvandorp/d53282c26c85de025a45d3aebd8125ee.js"></script>
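The same design — a task list, a pending-task count, two condition variables sharing the task state, and an abort flag — can also be sketched in Python for illustration (this mirrors the structure described above, not the C++ code itself; <code>close()</code> stands in for the RAII destructor):

```python
import threading

class WorkerPool:
    """Python sketch of the worker-pool design described above."""
    def __init__(self, num_workers):
        self.tasks, self.count = [], 0
        self.abort_flag = False
        self.mut = threading.Lock()                    # guards tasks, count, abort_flag
        self.task_cv = threading.Condition(self.mut)   # signals "a task was added"
        self.count_cv = threading.Condition(self.mut)  # signals "a task was finished"
        self.workers = [threading.Thread(target=self._worker_routine)
                        for _ in range(num_workers)]
        for w in self.workers:
            w.start()

    def push_back(self, task):
        if not self.workers:        # no workers: execute serially
            task()
            return
        with self.task_cv:
            self.count += 1
            self.tasks.append(task)
            self.task_cv.notify()   # wake a sleeping worker

    def sync(self):
        with self.count_cv:
            while self.count > 0:   # only return when all tasks are done
                self.count_cv.wait()

    def _worker_routine(self):
        while True:
            with self.task_cv:
                while not self.tasks and not self.abort_flag:
                    self.task_cv.wait()
                if self.abort_flag and not self.tasks:
                    return
                task = self.tasks.pop(0)
            task()                  # run the task outside the lock
            with self.count_cv:
                self.count -= 1
                self.count_cv.notify_all()

    def close(self):                # stand-in for the C++ RAII destructor
        with self.task_cv:
            self.abort_flag = True
            self.task_cv.notify_all()
        for w in self.workers:
            w.join()
```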
<br /><br />
<h2>Example: drawing the Mandelbrot set in parallel</h2>
<div class="separator" style="clear: both; text-align: center;">
<a
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8bcUkN-S5p5M6HOB1XCBOC68rXrimlHeUqK2KufEit5IDsK0FvqkZly-VRvSKuonAXUuZUdyZW4uyCAMzKwhiTsU91tJ9r3tAJs85xmsHgPWDdW_GEo_vnxvW5bdNX0voikwrDHyRto3-/s1200/mandelbrot.png"
imageanchor="1"
style="margin-left: 1em; margin-right: 1em;"
><img
border="0"
data-original-height="1200"
data-original-width="1200"
height="320"
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8bcUkN-S5p5M6HOB1XCBOC68rXrimlHeUqK2KufEit5IDsK0FvqkZly-VRvSKuonAXUuZUdyZW4uyCAMzKwhiTsU91tJ9r3tAJs85xmsHgPWDdW_GEo_vnxvW5bdNX0voikwrDHyRto3-/s320/mandelbrot.png"
/></a>
</div>
<br /><br />
To give a nice example of how the <code>WorkerPool</code> should be used, I
modified some code from
<a
href="https://medium.com/farouk-ounanes-home-on-the-internet/mandelbrot-set-in-c-from-scratch-c7ad6a1bf2d9"
>Farouk Ounane
</a>
for drawing the Mandelbrot set in C++. Each <code>Task</code> determines if a
point in the complex plane is part of the Mandelbrot set or not. We use a
closure (lambda expression) to define these tasks, and pass them to the
WorkerPool. The result is shown above. To compile the C++ program with
<code>gcc</code>, put the files <code>workerpool.cpp</code>,
<code>workerpool.hpp</code> and <code>mandelbrot.cpp</code> in the same
directory, and execute from the terminal
<pre>
g++ workerpool.cpp mandelbrot.cpp -o mandelbrot -pthread -std=c++11
</pre>
<br />
<script src="https://gist.github.com/chvandorp/a6569ebbae75c0861018ebf4a7f1a5b4.js"></script>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-64422056893053785092020-03-01T14:20:00.001-08:002020-03-01T14:20:42.713-08:00My Matplotlib Cheatsheet (#2)<a href="https://tbz533.blogspot.com/2019/11/my-matplotlib-cheatsheet-1.html">Previously</a>,
I wrote about some <code>matplotlib</code> methods I often use.
In this post I will show two more.
<br/>
<br/>
<h2>Scatter plots with color for density</h2>
Scatter plots with many points can become very uninformative.
A very easy solution is to color the points according to their local density.
This can be done with the <code>gaussian_kde</code> from <code>scipy.stats</code>.
Suppose that we want to make a scatter plot from <code>xs</code> and <code>ys</code>
in subplot <code>ax</code>, then we just have to add two lines to color the points:
<pre class="brush: py">
xy = numpy.vstack([xs, ys])
z = scipy.stats.gaussian_kde(xy)(xy)
ax.scatter(xs, ys, s=5, c=z)
</pre>
Here's an example:
<br/>
<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5hRpy-835heT_uIEV9FpCx2TDU5OopMo5lMXzA6Po935GWmBC5hN_ShDuM8vfoasYYkU2-4uJY1Csmd3SaK-h8oFtVdI6-WEeg5BP7vhzCppXaWmUrL9i6KckjJphFDEKQhqJm0awk8RS/s1600/scatter-with-density.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5hRpy-835heT_uIEV9FpCx2TDU5OopMo5lMXzA6Po935GWmBC5hN_ShDuM8vfoasYYkU2-4uJY1Csmd3SaK-h8oFtVdI6-WEeg5BP7vhzCppXaWmUrL9i6KckjJphFDEKQhqJm0awk8RS/s400/scatter-with-density.png" width="400" height="197" data-original-width="1600" data-original-height="788" /></a></div>
<br/>
<br/>
This is the code I used to make the figure, making use of <code>inset_axes</code> for a color bar:
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/49db587d4fbd79056a4e0b01fb260eb7.js"></script>
<br/>
<br/>
<h2>Recycling bins for multiple histograms</h2>
When you want to plot multiple histograms in one plot, each call to <code>hist</code> chooses its own bin widths by default,
which does not look good and makes the histograms hard to compare. I recently found a simple method to fix this.
The second value returned by the <code>hist</code> function contains the edges of the bins used for the histogram.
These edges can be passed directly to the next call to <code>hist</code> via the <code>bins</code> keyword argument.
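As a minimal, self-contained sketch of the trick (the normal samples here are made up for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
xs = rng.normal(0.0, 1.0, 1000)
ys = rng.normal(0.5, 1.5, 300)

fig, ax = plt.subplots()
## let hist choose the bin edges for the first sample...
_, bins, _ = ax.hist(xs, bins=25, alpha=0.6, label="xs")
## ...and recycle those edges for the second sample
ax.hist(ys, bins=bins, alpha=0.6, label="ys")
ax.legend()
```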
The following figure shows the result:
<br/>
<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiR76BV-J_qBIYB8khRogmRoYQcdNz1TmYGqTyf0NgDM6SH59rjXwqdqI5GFpqOvS4YVhvRJSIp0ZUgDhxmOQim_8jXCYzoDYCOR9-kwgxsAR0CNsT9flGhnG-4GNb9dIaZY3GjB5xB0yV_/s1600/recycling-bins.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiR76BV-J_qBIYB8khRogmRoYQcdNz1TmYGqTyf0NgDM6SH59rjXwqdqI5GFpqOvS4YVhvRJSIp0ZUgDhxmOQim_8jXCYzoDYCOR9-kwgxsAR0CNsT9flGhnG-4GNb9dIaZY3GjB5xB0yV_/s400/recycling-bins.png" width="400" height="160" data-original-width="1600" data-original-height="640" /></a></div>
<br/>
<br/>
I used the following code to plot this figure:
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/58c89a8d73aa5cde4c4965fcb92382ff.js"></script>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-47471617164321136032020-01-11T17:47:00.001-08:002020-01-20T08:46:22.979-08:00Generalized Profiling with StanThe probabilistic programming language <a href="https://mc-stan.org">Stan</a> has built-in support for ODE models
through the higher-order functions <code>integrate_ode_rk45</code> and <code>integrate_ode_bdf</code>.
Here, I will demonstrate an alternative way to implement ODE models in Stan
that has some interesting benefits. This method is called <i>generalized profiling</i>
and has been developed by <a href="https://doi.org/10.1111/j.1467-9868.2007.00610.x">
Ramsay et al. in 2007</a>.
<br/><br/>
In a <a href="http://tbz533.blogspot.com/2019/11/using-constant-stan-functions-for.html">previous blog post</a>,
I used the Lotka-Volterra predator-prey model as an example for an ODE model in Stan.
The model is given by the 2-dimensional system of ODEs
$$
\frac{dx}{dt} = a x - b xy\,,\quad \frac{dy}{dt} = c b xy - dy
$$
with initial conditions \(x(0) = x_0\) and \(y(0) = y_0\).
Here, \(x\) and \(y\) represent the prey and predator "concentrations", respectively.
If we choose a system size \(K\), we can generate some data
\(X_i \sim \mathrm{Poisson}(K x(t_i))\) and \(Y_i \sim \mathrm{Poisson}(K y(t_i))\) at times \(0\leq t_i \leq T\).
However, when we implement this model in Stan with <code>integrate_ode_rk45</code>,
and try to estimate the parameters \(a, b, c, d, x_0, y_0\) without giving any initial guesses for these parameters,
Stan will have a really hard time finding any good values.
The reason for this is that for non-linear systems of ODEs the posterior density can be quite "rugged",
i.e. it can contain many local optima that even Stan's NUTS sampler has a hard time navigating.
The generalized profiling method solves this problem by modifying the
model in such a way that the posterior density becomes more smooth.
This works roughly as follows.
<br/><br/>
Suppose that the model is defined by the system of ODEs
$$
\dot{u}(t) = f(t, u(t), \theta)\,,\quad u(0) = u_0
$$
Fitting the solution of a system of ODEs through data can be hard,
so why not replace the solution of the ODEs with something that can be fit through the data easily:
for instance a <a href="http://mathworld.wolfram.com/B-Spline.html">B-spline</a>. Obviously, this leads to the problem of over-fitting:
when the number of knots equals the number of observations, the spline will fit the data perfectly.
Moreover, we are not learning much about the parameters of the original model. We can solve both problems by adding a penalty.
At each time point \(t\), the spline \(u\) has a total derivative \(\dot{u}(t)\),
while according to the model, the total derivative should be equal to the vector field
\(f(t, u(t), \theta)\). We can enforce that the spline resembles trajectories of the model by defining a penalty
$$
\lambda \mathcal{S}[u] = \frac{\lambda}{2} \int_0^T \|\dot{u}(t) - f(t, u(t), \theta) \|^2 dt
$$
which equals \(0\) exactly when \(u\) solves the system of ODEs.
The log-likelihood of the data, given parameters \(\theta\) and spline \(u\) is then given by
$$
\sum_{i=1}^n \log p(U_i | u(t_i), \theta) - \lambda \mathcal{S}[u]\,,
$$
where \(p\) defines the measurement model
(in our predator-prey model, \(p\) is the probability mass function of the Poisson distribution).
When \(\lambda\) is close to \(0\), the spline is allowed to deviate from a true model trajectory,
and fitting is easy. On the other hand, when \(\lambda\) is very large,
the spline starts to resemble a true solution to the ODEs, and we will estimate \(\theta\) more precisely.
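To make the penalty concrete, here is a small Python sketch (for a hypothetical 1-dimensional model \(\dot{u} = -u\), not the predator-prey system; the function name <code>penalty</code> is mine) that evaluates \(\mathcal{S}[u]\) with numerical quadrature. The penalty vanishes exactly for a true solution and is positive otherwise:

```python
import numpy as np
from scipy.integrate import quad

def penalty(u, du, f, lam, T):
    """lambda/2 times the integral over [0, T] of (du/dt - f(t, u))^2"""
    integrand = lambda t: (du(t) - f(t, u(t)))**2
    val, _ = quad(integrand, 0.0, T)
    return 0.5 * lam * val

## hypothetical 1-d model: du/dt = -u
f = lambda t, u: -u
## the exact solution u(t) = exp(-t) incurs zero penalty...
S_exact = penalty(lambda t: np.exp(-t), lambda t: -np.exp(-t), f, 100.0, 2.0)
## ...while a crude linear "spline" u(t) = 1 - t/2 is penalized
S_linear = penalty(lambda t: 1.0 - t/2, lambda t: -0.5, f, 100.0, 2.0)
```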
<br/><br/>
<h2> Stan implementation </h2>
In order to implement generalized profiling in Stan, we have to solve two problems.
First, we have to construct a B-spline \(u\) in Stan, and compute the derivative \(\dot{u}\) of such a B-spline.
Second, we have to numerically integrate the penalty \(\mathcal{S}[u]\).
For the B-spline, I use an
<a href="https://mc-stan.org/users/documentation/case-studies/splines_in_stan.html">example from Milad Kharratzadeh</a>,
that uses a recursive Stan function to build a spline basis matrix in the <code>generated quantities</code> block.
Milad gives a concise introduction to B-splines, which I will not repeat, but we do need the derivative of the spline.
<br/><br/>
Fortunately, we can define the derivative of a spline in terms of other B-splines.
Define the spline basis of order \(k\) recursively
$$
B_{i,k}(t) = \omega_{i,k} B_{i,k-1}(t) + (1-\omega_{i+1,k})B_{i+1,k-1}(t)
$$
with coefficients \(\omega_{i, k} = (t - \tau_i) / (\tau_{i+k-1} - \tau_i)\)
and knots \(\tau_1 \leq \tau_2 \leq \dots \leq \tau_m\).
The base-case of the recursion is given by the order \(1\) splines
$$
B_{i, 1}(t) =
\left\{\begin{array}{ll}
1 & \mbox{if } \tau_i \leq t < \tau_{i+1} \\
0 & \mbox{otherwise}
\end{array}\right.
$$
The derivative of a basis spline \(B\) is then given by
$$
B_{i, k}'(t) = (k-1)\left(\alpha_{i, k} B_{i, k-1}(t) - \alpha_{i+1, k} B_{i+1, k-1}(t)\right)
$$
where \(\alpha_{i, k} = 1/(\tau_{i+k-1} - \tau_i)\). We can prove this easily with induction.
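We can also verify the identity numerically with scipy's B-spline implementation (a quick sketch of my own; the knot vector and indices \(i=1\), \(k=4\) are illustrative):

```python
import numpy as np
from scipy.interpolate import BSpline

## knots tau_1, ..., tau_5 (1-indexed, as in the text)
tau = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

## order-4 (cubic) basis spline B_{1,4} on all five knots, and the
## two order-3 basis splines appearing in the derivative formula
B14 = BSpline.basis_element(tau)        # B_{1,4}, degree 3
B13 = BSpline.basis_element(tau[:4])    # B_{1,3}, degree 2
B23 = BSpline.basis_element(tau[1:])    # B_{2,3}, degree 2

## alpha_{1,4} = 1/(tau_4 - tau_1) and alpha_{2,4} = 1/(tau_5 - tau_2)
a14 = 1.0 / (tau[3] - tau[0])
a24 = 1.0 / (tau[4] - tau[1])

## compare B'_{1,4}(t) with (k-1)(alpha_{1,4} B_{1,3}(t) - alpha_{2,4} B_{2,3}(t))
ts = np.linspace(1.1, 2.9, 7)  # interior points, inside both supports
lhs = B14.derivative()(ts)
rhs = 3 * (a14 * B13(ts) - a24 * B23(ts))
```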
<br/><br/>
The following Stan functions are implementations of B-splines and their derivative.
The file <code>splines.stan</code> can be included in other Stan models with an
<code>#include "splines.stan"</code> statement in the <code>functions</code> block.
<br/><br/>
<script src="https://gist.github.com/chvandorp/f5bd674f7cea39f0b10ffa0f20dac9dc.js"></script>
<br/>
Next, we need a way to compute the integral defined by \(\mathcal{S}[u]\).
In general, this will not have a closed form expression, and therefore we use
numerical integration. According to Simpson's rule, we can approximate the integral
$$
\int_a^b g(x) dx = \tfrac{h}{3}\left(g(a) + 4 g(\tfrac12(b+a)) + g(b)\right) + \mathcal{O}(h^5)
$$
where \(h = \tfrac12 (b-a)\).
Hence, we will partition the interval \([0,T]\) into \(k-1\) sub-intervals of equal length \(2h = \frac{T}{k-1}\),
and we have to know the value of the spline \(u(t)\) and its derivative \(\dot{u}(t)\) at the
\(2k-1\) end- and midpoints of these intervals.
The penalty functional \(\mathcal{S}[u]\) is then approximated with
$$
\mathcal{S}[u] \approx \tfrac{h}{3} \sum_{i=1}^{k-1} \left(L(2(i-1)h) + 4 L((2i-1)h) + L(2ih)\right)
$$
where \(L(t) = \tfrac12 \| \dot{u}(t) - f(t, u(t), \theta) \|^2 \). The number of "grid points" \(k\) is defined in the
Stan model's <code>data</code> block as <code>NumGridPts</code>.
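As a sanity check of this quadrature scheme, here is a small Python sketch (the function name <code>simpson_grid</code> is mine, not part of the Stan model) that evaluates the composite rule on the same \(2k-1\)-point grid of end- and midpoints:

```python
import numpy as np

def simpson_grid(g, T, k):
    """Approximate the integral of g over [0, T] with Simpson's rule on
    k-1 equal sub-intervals, evaluating g at the 2k-1 grid points
    (end- and midpoints) used in the Stan model."""
    ts = np.linspace(0.0, T, 2 * k - 1)  # end- and midpoints
    h = T / (2 * (k - 1))                # half sub-interval length
    w = np.ones(2 * k - 1)
    w[1:-1:2] = 4.0                      # midpoints get weight 4
    w[2:-1:2] = 2.0                      # shared endpoints get weight 2
    return h / 3.0 * np.sum(w * g(ts))

## Simpson's rule is exact for cubics: the integral of t^3 on [0, 2] is 4
val = simpson_grid(lambda t: t**3, 2.0, 5)
```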
<br/><br/>
The Stan model is organized as follows. In the <code>functions</code> block, we define
the system of ODEs (copied directly from a <a href="http://tbz533.blogspot.com/2019/11/using-constant-stan-functions-for.html">previous post</a>).
Also, we define a convenient function <code>seq</code> to make sequences.
In the <code>data</code> block, we import the observations and observation times,
define the number of knots, grid points, the degree of the splines, the constant \(K\) and the weight
of the penalty \(\lambda\).
In the <code>transformed data</code> block, we define a knot vector and the grid
for numerical integration. Then, we build the spline and spline derivative matrices
<code>BObs</code>, <code>BGrid</code>, and <code>DBGrid</code>
for the observation times and grid points.
The <code>parameters</code> block declares the predator-prey model parameters
\(a, b, c, d\), and the spline weights \(\upsilon\).
In the <code>model</code> block, we compute the values of the spline \(u\) and spline
derivative \(\dot{u}\) at the observation times (<code>u = BObs * upsilon</code>)
and grid points (<code>u_grid = BGrid * upsilon</code> and <code>du_grid = DBGrid * upsilon</code>).
These values are used for numerical integration of the penalty, and in the "sampling statement"
of the Poisson-distributed observations.
<br/><br/>
<script src="https://gist.github.com/chvandorp/c17d411c649c7300c329094ce72e9897.js"></script>
<br/>
<h2> Fitting the model to generated data </h2>
I will now demonstrate the Stan model with simulated data from the predator-prey model.
We will use Python to generate data. First, we have to import some modules and
compile the Stan model.
A GitHub "gist" with all the python code in a single script is available
<a href="https://gist.github.com/chvandorp/82eeb7ae8f5ad7d437d1d100eb44959a">here</a>.
<pre class="brush: py">
import pystan
import numpy as np
import scipy.stats as sts
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 18})
from matplotlib.gridspec import GridSpec
from scipy.integrate import solve_ivp
import sdeint ## will be used later
## compile the Stan model
sm = pystan.StanModel(file="gen-prof-LV.stan")
</pre>
Now, we define the predator-prey model and choose some parameters \(\theta = (a, b, c, d)\),
and initial condition \(u_0 = (x_0, y_0)^T\). We also have to define the times we observe the system
and the number of observations. Then we simulate the model using the <code>solve_ivp</code>
method from <code>scipy.integrate</code> and generate noisy observations using
<code>poisson.rvs</code> from <code>scipy.stats</code>.
<pre class="brush: py">
## choose nice parameter values
a = 1.0
b = 0.4
c = 0.4
d = 0.5
theta = [a, b, c, d]
## observation times and initial conditions
NumObs = 25
tmin, tmax = 0, 50
TimesObs = np.linspace(tmin, tmax, NumObs)
## initial values
u0 = [1.0, 1.0] ## x0, y0
## the "system size" parameter K
K = 10
## define the Lotka-Volterra predator-prey model
def LV_sys(t, u):
    return [a*u[0] - b*u[0]*u[1], c*b*u[0]*u[1] - d*u[1]]
## use an ODE integrator to produce a trajectory
sol = solve_ivp(LV_sys, (tmin, tmax), u0, t_eval=TimesObs)
## generate random data (observations)
Obs = sts.poisson.rvs(sol.y.T*K)
## K determines the measurement noise
</pre>
Next, we define a function <code>run_gen_prof</code>
that makes the correct data dictionary for Stan and starts the Stan model.
One of the arguments of the function is \(\lambda\), the weight of the penalty.
We will run the Stan model twice. Once with a small \(\lambda = 1\), and once with a
large \(\lambda = 100 \). The function <code>run_gen_prof</code> also returns
the grid points used for numerical integration of the penalty term, which we will
use for plotting the spline and its derivative below.
<pre class="brush: py">
def run_gen_prof(sm, obs, times, lam, system_size, deg=3,
                 chains=4, chain_len=1000, thin=5):
    """
    convenient function to make a data dictionary for Stan
    and run the Stan model
    """
    n = len(times)
    ## put a knot at every observation and between two observations
    num_knots = 2*n-1
    ## number of points for numerical integration
    num_grid_pts = 3*num_knots-1
    grid_pts = np.linspace(times[0], times[n-1], num_grid_pts)
    data = {
        "NumKnots" : num_knots,
        "SplineDeg" : deg,
        "NumGridPts" : num_grid_pts,
        "NumObs" : n,
        "TimesObs" : times,
        "Obs" : obs,
        "K" : system_size,
        "Lambda" : lam,
    }
    settings = {
        "max_treedepth" : 15,
        "adapt_delta" : 0.9
    }
    sam = sm.sampling(data=data, chains=chains, iter=chain_len,
                      control=settings, thin=thin)
    ## return the grid points for plotting
    return sam, grid_pts
## fit the model twice with different lambdas
lam_small = 1
sam_small, GridPts = run_gen_prof(sm, Obs, TimesObs, lam_small, K)
lam_large = 100
sam_large, _ = run_gen_prof(sm, Obs, TimesObs, lam_large, K)
</pre>
In order to visualize the result,
we define a function <code>plot_gen_prof_fit</code>
that makes a plot of the data and fit.
The function also plots the derivative of the spline \(\dot u\) and the
vector field \(f(t, u(t), \theta)\), such that we can see how
well the spline resembles a trajectory of the predator-prey model.
<pre class="brush: py">
def plot_gen_prof_fit(sam, times, obs, grid_pts, system_size, n=None):
    """
    Make a figure with the data and the fitted spline.
    Also add the derivative of the spline and the vector field
    to give an indication of the deviation from the LV model
    """
    if n is None:
        n = len(times)
    chain_dict = sam.extract(permuted=True)
    fig = plt.figure(figsize=(14, 7))
    gs = GridSpec(4,1)
    ax = fig.add_subplot(gs[:2,0])
    bxs = [fig.add_subplot(gs[2,0], sharex=ax),
           fig.add_subplot(gs[3,0], sharex=ax)]
    labels = ["Prey ($X$)", "Predator ($Y$)"]
    colors = ["tab:blue", "tab:orange"]
    pcts = [2.5, 97.5]
    ## make plots for predators and prey
    for i, color in enumerate(colors):
        ax.scatter(times[:n], obs[:n,i], color=color, edgecolors='k',
                   zorder=3, label=labels[i])
        ## plot trajectories
        uss = chain_dict["uhat"][:,:,i].T
        mean_uhat = [system_size*np.mean(us) for us in uss]
        ax.plot(grid_pts, mean_uhat, color='k', zorder=2,
                label='fit' if i == 0 else None)
        range_uhat = [system_size*np.percentile(us, pcts) for us in uss]
        ax.fill_between(grid_pts, *np.array(range_uhat).T, color=color,
                        alpha=0.5, linewidth=0, zorder=1)
        ## plot simulations
        uss = chain_dict["usim"][:,:,i].T
        range_usim = [np.percentile(us, pcts) for us in uss]
        ax.fill_between(grid_pts, *np.array(range_usim).T, color=color,
                        alpha=0.3, linewidth=0)
        ## plot derivative of the spline and the target derivative
        uss = chain_dict["duhat_real"][:,:,i].T
        mean_duhat_real = [system_size*np.mean(us) for us in uss]
        bxs[i].plot(grid_pts, mean_duhat_real, color=color,
                    linewidth=3, label="spline")
        uss = chain_dict["duhat_target"][:,:,i].T
        mean_duhat_target = [system_size*np.mean(xs) for xs in uss]
        bxs[i].plot(grid_pts, mean_duhat_target, color='k',
                    linestyle='--', label="LV model")
        bxs[i].legend(loc=1, ncol=2, prop={'size': 10})
    ## some labels etc...
    ax.legend(loc=1, ncol=3, prop={'size': 10})
    ax.set_ylabel("data and fit")
    for i, c in enumerate('xy'):
        bxs[i].set_ylabel(f"$\\frac{{d{c}}}{{dt}}$",
                          rotation=0, va='center')
    ax.get_xaxis().set_visible(False)
    bxs[0].get_xaxis().set_visible(False)
    bxs[1].set_xlabel("Time ($t$)")
    fig.align_ylabels()
    return fig, (ax, *bxs)
</pre>
Let us first have a look at the fit with \(\lambda = 1\).
<pre class="brush: py">
fig, axs = plot_gen_prof_fit(sam_small, TimesObs, Obs, GridPts, K)
fig.savefig("gen-prof-fit-small-lambda.png", dpi=300, bbox_inches='tight')
</pre>
This is the result (click on the image to make it larger):
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZArXO7fYeVAGDykEfU6TtpxG4dSkMK8d_Gw2qSaQuuln4lYOEySxDpiVQ_EkXKgZBsdGxsZb2H6cT5q_6gw0EmIyBJ1GJnb714zusig8r8adOosIGyZZ6vriD3dpALgDFumkvuXlawFcl/s1600/gen-prof-fit-small-lambda.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZArXO7fYeVAGDykEfU6TtpxG4dSkMK8d_Gw2qSaQuuln4lYOEySxDpiVQ_EkXKgZBsdGxsZb2H6cT5q_6gw0EmIyBJ1GJnb714zusig8r8adOosIGyZZ6vriD3dpALgDFumkvuXlawFcl/s400/gen-prof-fit-small-lambda.png" width="400" height="205" data-original-width="1600" data-original-height="818" /></a></div>
The scaled spline \(K u\) goes through almost all the data points \(X_i\) and \(Y_i\),
and the derivative \(\dot{u}\) roughly follows the vector field \(f(t, u(t), \theta)\).
However, the prediction intervals (dark colored bands) are quite wide, meaning that the
uncertainty in \(u\) is very large between observations.
<br/><br/>
In order to make our spline-based model more closely related to the ODE model,
we have to increase \(\lambda\). Now let us look at the fit with \(\lambda = 100\).
<pre class="brush: py">
fig, axs = plot_gen_prof_fit(sam_large, TimesObs, Obs, GridPts, K)
fig.savefig("gen-prof-fit-large-lambda.png", dpi=300, bbox_inches='tight')
</pre>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjukD0Dygdy94YRitNqqAyVfi2ej_eLy4MP5ofFxJzeJyzN-m0Cr3qz02nP5hi6d003Kjhwydl478OwCzdGEs59Oqg1edC2Vuk48lCKCiaXK5JjhtzBLAqQ2iJIedg3lY6Fm0V3qZrZGg3-/s1600/gen-prof-fit-large-lambda.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjukD0Dygdy94YRitNqqAyVfi2ej_eLy4MP5ofFxJzeJyzN-m0Cr3qz02nP5hi6d003Kjhwydl478OwCzdGEs59Oqg1edC2Vuk48lCKCiaXK5JjhtzBLAqQ2iJIedg3lY6Fm0V3qZrZGg3-/s400/gen-prof-fit-large-lambda.png" width="400" height="205" data-original-width="1600" data-original-height="818" /></a></div>
The result of taking \(\lambda\) large is that the spline \(K u\) no longer goes through all the
data points \(X_i, Y_i\), as it should, since the data are sampled from a Poisson distribution
with mean \(K u(t_i) \). In addition, the derivative of the spline \(\dot{u}\) is now almost
identical to the vector field \(f(t, u(t), \theta)\).
<br/><br/>
We will now look at the parameter estimates, and the effect of choosing small and large \(\lambda\).
Again, we define a simple function <code>plot_par_est</code>
to plot the estimates, and then use this function to create the two plots.
<pre class="brush: py">
def plot_par_est(ax, sam, real_par_vals):
    """
    plot parameter estimates and compare them with the real values
    """
    chain_dict = sam.extract(permuted=True)
    parnames = ["a", "b", "c", "d"]
    latex_parnames = [f"${x}$" for x in parnames]
    pcts = [2.5, 97.5]
    ## plot estimates and 95 percentiles
    pos = range(len(parnames))
    means = [np.mean(chain_dict[x]) for x in parnames]
    ranges = [np.percentile(chain_dict[x], pcts) for x in parnames]
    ax.scatter(pos, means,
               color='tab:red', label="estimate", marker='D', zorder=1)
    for p, r in zip(pos, ranges):
        ax.plot([p, p], r, color='tab:red', linewidth=2, zorder=1)
    ax.set_xticks(pos)
    ax.set_xticklabels(latex_parnames)
    ## plot real parameter values
    ax.scatter(pos, real_par_vals, color='k', label="real value", zorder=2)
    ax.legend(loc=1, ncol=1, prop={'size': 10})
    ax.set_ylabel("parameter value")
    ax.set_xlabel("parameter name")
## compare the parameter estimates with small and large lambda
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7,3), sharey=True)
plot_par_est(ax1, sam_small, theta)
plot_par_est(ax2, sam_large, theta)
ax1.set_title(f"$\\lambda = {lam_small}$")
ax2.set_title(f"$\\lambda = {lam_large}$")
fig.savefig("gen-prof-estimates.png", dpi=300, bbox_inches='tight')
</pre>
This results in the following figure.
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtZrvtprPZxGKwjbuv2aOZzs_4EfScxEUBxbJiLa2Jx10UfKnhaYFJv3Q9driUJkr3W_5ZCAuZ1XkUaL458pBbnN9hj3sHJLiv9NCQhU6KOkV0hpIBaVZP06Vi-z-4Gzva53aKIFuzNsrb/s1600/gen-prof-estimates.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtZrvtprPZxGKwjbuv2aOZzs_4EfScxEUBxbJiLa2Jx10UfKnhaYFJv3Q9driUJkr3W_5ZCAuZ1XkUaL458pBbnN9hj3sHJLiv9NCQhU6KOkV0hpIBaVZP06Vi-z-4Gzva53aKIFuzNsrb/s400/gen-prof-estimates.png" width="400" height="213" data-original-width="1600" data-original-height="852" /></a></div>
The estimates (red diamonds) are pretty close to the real values (black dots) for both small and large \(\lambda\),
but the 95% credible intervals (CrI; the red vertical lines) are much smaller when \(\lambda = 100\).
<blockquote>
<b>disclaimer</b>: I am not sure that these red lines <i>are</i> in fact 95% CrIs, as we are not fitting the true model to the data.
</blockquote>
The question now is what \(\lambda\) we should choose. In this case, the larger \(\lambda\),
the better, because the spline gets closer to a trajectory of the model.
However, as \(u\) is a B-spline, and not an <i>arbitrary</i> differentiable function,
the identity \(\dot{u}(t) = f(t, u(t), \theta)\) can hold at no more than finitely many \(t\),
and hence the penalty functional \(\mathcal{S}[u]\) will never be zero,
unless the number of knots goes to infinity.
In practice, one can try a number of increasing values for \(\lambda\)
and different numbers of knots, and see when the estimates (and uncertainties)
stabilize.
<br/><br/>
<h2>Model misspecification: the predator-prey model with intrinsic noise</h2>
An interesting benefit of the generalized profiling method over ODE integrators is
that it can deal with some level of misspecification of the model.
For biological systems, practically all models are misspecified.
One source of misspecification is intrinsic noise, i.e., the true dynamics
are not deterministic but stochastic.
For the Lotka-Volterra predator-prey model, intrinsic noise has
interesting consequences. The oscillatory dynamics predicted by the deterministic model
have a fixed period and amplitude. When we add some noise to the system,
we find that the stochastic trajectory quickly gets out of phase with the deterministic one,
and that the amplitude also varies over time. This makes it very
difficult to use an ODE integrator to fit the model to data.
We will see that the generalized profiling approach still does the job.
<br/><br/>
First, we will add diagonal multiplicative noise to the predator-prey model.
We will use the <code><a href="https://pypi.org/project/sdeint/">sdeint</a></code> module
to simulate stochastic differential equations (SDEs).
This module is still a pre-release, but it suffices for our purpose.
Our stochastic predator-prey model is given by the SDEs
$$
dx_t = (a x_t - bx_t y_t)dt + \sigma dW^1_t
$$
$$
dy_t = (cb x_t y_t - d y_t)dt + \sigma dW^2_t
$$
where \(W_t\) is a 2-dimensional Wiener process, and \(\sigma\) is the volatility.
We use this model to generate data as follows
<pre class="brush: py">
## add process noise to the model
sigma = 0.1
## other parameters stay the same
NumObs = 50
tmin, tmax = 0, 100
Thin = 10
TimesObsSDE = np.linspace(tmin, tmax, Thin*(NumObs-1)+1)
TimesObs = TimesObsSDE[::Thin]
## define the system
def LV_sys_drift(u, t):
    return np.array([a*u[0] - b*u[0]*u[1], c*b*u[0]*u[1] - d*u[1]])
def LV_sys_diffusion(u, t):
    return sigma * np.diag(u)
sol = sdeint.itoint(LV_sys_drift, LV_sys_diffusion, u0, TimesObsSDE)
sol_ode = solve_ivp(LV_sys, (tmin, tmax), u0, t_eval=TimesObsSDE)
## generate random data (observations)
Obs = sts.poisson.rvs(sol[::Thin,:]*K)
</pre>
We then plot the trajectory and observations, and compare them to the trajectory
predicted by the deterministic model.
<pre class="brush: py">
## make a figure of the stochastic process,
## the data and the deterministic skeleton
pcts = [2.5, 97.5] ## percentiles used later for CrIs
colors = ["tab:blue", "tab:orange"] ## color for prey and predator
fig, axs = plt.subplots(2, 1, figsize=(14, 7), sharex=True)
for i, color in enumerate(colors):
    axs[i].scatter(TimesObs, Obs[:,i], color=color,
                   edgecolors='k', zorder=2, label='observations')
    axs[i].plot(TimesObsSDE, sol[:,i]*K, color=color,
                linewidth=3, zorder=1, label='stochastic process')
    axs[i].plot(TimesObsSDE, sol_ode.y[i]*K, color='k', alpha=0.75,
                label='deterministic skeleton')
    axs[i].legend(loc=0, ncol=3, prop={'size': 10})
axs[-1].set_xlabel("Time ($t$)")
axs[0].set_ylabel("$x$ SDE (blue)\n$x$ ODE (black)")
axs[1].set_ylabel("$y$ SDE (orange)\n$y$ ODE (black)")
fig.savefig("stochastic-LV-sim.png", dpi=300, bbox_inches='tight')
</pre>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6nKFvZijThWfbgWx8tqp_l7UYENhdyH2CsKxiT2HOmtqTWKbqEEkR6XkAhyphenhyphenjfYwo09AKLXi3Nbc5kNcgO2xsBSJ7-OdJCuiV_whmgelsy6yylVwqME3Qx0OlKAiWAr6bmGhjxyYBHKhnB/s1600/stochastic-LV-sim.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6nKFvZijThWfbgWx8tqp_l7UYENhdyH2CsKxiT2HOmtqTWKbqEEkR6XkAhyphenhyphenjfYwo09AKLXi3Nbc5kNcgO2xsBSJ7-OdJCuiV_whmgelsy6yylVwqME3Qx0OlKAiWAr6bmGhjxyYBHKhnB/s400/stochastic-LV-sim.png" width="400" height="201" data-original-width="1600" data-original-height="804" /></a></div>
As mentioned, the stochastic trajectory gets out of phase with the deterministic model (black curves)
and also the amplitude varies over time.
<br/><br/>
We now fit the same Stan model as before to the stochastic predator-prey data.
We again use the <code>run_gen_prof</code> function to start the Stan program,
and plot figures with the <code>plot_gen_prof_fit</code> and
<code>plot_par_est</code> functions.
<pre class="brush: py">
## choose a lambda, and fit the model...
lam = 10
sam, GridPts = run_gen_prof(sm, Obs, TimesObs, lam, K)
## plot the fit
fig, axs = plot_gen_prof_fit(sam, TimesObs, Obs, GridPts, K)
fig.savefig("gen-prof-fit-sde.png", dpi=300, bbox_inches='tight')
## plot estimates
fig, ax = plt.subplots(1, 1, figsize=(4,3))
plot_par_est(ax, sam, theta)
fig.savefig("gen-prof-estimates-sde.png", dpi=300, bbox_inches='tight')
</pre>
The data and model fit are shown in the following figure. The generalized profiling fit stays close to the data,
even though the stochastic path gets out of phase with the deterministic trajectory.
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhI2t6oD5jVSkkkamwK3ojAXY_iYByFgKD7LYB4dwzmf-39bWFIMa94xRN49YXyEhoiNy-zrpMApkKqQrmYDTPcDdGNkN-CY0b9jpZbQOX9dhBGfnxo3x74zpWy_o_IDBve8Wpwf4uIr8T/s1600/gen-prof-fit-sde.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhI2t6oD5jVSkkkamwK3ojAXY_iYByFgKD7LYB4dwzmf-39bWFIMa94xRN49YXyEhoiNy-zrpMApkKqQrmYDTPcDdGNkN-CY0b9jpZbQOX9dhBGfnxo3x74zpWy_o_IDBve8Wpwf4uIr8T/s400/gen-prof-fit-sde.png" width="400" height="205" data-original-width="1600" data-original-height="818" /></a></div>
Also, the parameter estimates are still very close to the real values, as shown by the following figure
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYVmdhVJlzpB00hRxZE8lH5k3pvVEYYHjMYTOSCh5tg67uDYWuv5vMP9w_LTt7zzBK4T6EGHqrcj_ian03Jo4CL52xpB4R_8FLipaFtwI1DfdmrS9ViqMJF4aR1GptQDUXhrJtyhTZQEyD/s1600/gen-prof-estimates-sde.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYVmdhVJlzpB00hRxZE8lH5k3pvVEYYHjMYTOSCh5tg67uDYWuv5vMP9w_LTt7zzBK4T6EGHqrcj_ian03Jo4CL52xpB4R_8FLipaFtwI1DfdmrS9ViqMJF4aR1GptQDUXhrJtyhTZQEyD/s320/gen-prof-estimates-sde.png" width="320" height="234" data-original-width="1275" data-original-height="931" /></a></div>
<br/>
For this fit, I used a \(\lambda = 10\). Again, it is not clear what value for \(\lambda\) we should choose.
In this case, a bigger \(\lambda\) is not necessarily better, as a finite \(\lambda\) allows the
spline to deviate from the deterministic model, which is exactly what we need.
In a follow-up paper <a href="https://doi.org/10.1098/rsif.2010.0412">Hooker et al.</a>
fit a model to a subset of the data and then compare model
predictions with the left-out data points (i.e. forward cross-validation). This is repeated for different values
of \(\lambda\) to determine the penalty weight yielding the best predictive performance.
I experimented with this, but will leave it for a future blog post.
<br/><br/>
<h2>Theoretical considerations and further developments</h2>
The motivation for the penalty functional \(\mathcal{S}[u]\) may appear to be <i>ad hoc</i>,
but in fact the justification for such a functional is given by some very interesting mathematics and physics.
The theory of small random perturbations of dynamical systems developed by
<a href="https://en.wikipedia.org/wiki/Freidlin%E2%80%93Wentzell_theorem">Freidlin and Wentzell</a>
tells us that the probability that a stochastic trajectory is "close" to the path \(u\) is roughly \(\exp(-\sigma^{-2} \mathcal{S}[u])\).
Here the volatility \(\sigma\) should be very small and the noise is <i>additive</i>.
Hence, in a way, the penalty (or <i>action</i>) functional \(\mathcal{S}[u]\) is proportional to the log-likelihood of the path \(u\).
<br/><br/>
There is no reason to choose B-splines for \(u\) other than convenience.
In the so called <i>tracking</i> approach of
<a href="https://projecteuclid.org/euclid.ejs/1452004955">Clairon and Brunel (2017)</a>,
the path \(u(t)\) is instead defined in terms of a \(2n\) dimensional system of ODEs
(where \(n\) is the dimension of the original system \(\dot{u} = f(t, u, \theta)\)).
This system is derived as follows.
<br/><br/>
First, the discrete observations \(U_i\) at times \(0 \leq t_i \leq T\) are replaced by a continuous
approximation \(U(t)\) with \(0\leq t\leq T\) (e.g. using a B-spline), and it is assumed that the measurement error
is normally distributed with precision \(\kappa\).
Hence, the log-likelihood of the smoothed data \(U(t)\) given trajectory \(u(t)\) is (up to a constant) equal to
$$
-\frac{\kappa}{2}\int_0^T \| u(t) - U(t)\|^2 dt\,.
$$
The penalty of a path \( u(t) \) is again given by
$$
\frac{\lambda}{2}\int_0^T \| \dot{u}(t) - f(t, u(t), \theta)\|^2 dt\,.
$$
Given a parameter \(\theta\), we now have to find a path \(u(t)\) with some initial condition \(u(0) = u_0\)
that minimizes the action functional
$$
\mathcal{S}[u] = \int_0^T \left(\frac{\kappa}{2} \| u(t) - U(t) \|^2 + \frac{\lambda}{2} \| \dot{u}(t) - f(t, u(t), \theta) \|^2 \right) dt\,.
$$
This is a <a href="https://en.wikipedia.org/wiki/Calculus_of_variations">variational problem</a>,
and can be solved in terms of Hamilton's equations.
First, we have to transform the Lagrangian (the integrand of the action functional)
$$
\mathcal{L}(t, u, \dot{u}) = \frac{\kappa}{2} \| u(t) - U(t) \|^2 + \frac{\lambda}{2} \| \dot{u}(t) - f(t, u(t), \theta) \|^2
$$
<a href="https://en.wikipedia.org/wiki/Hamiltonian_mechanics">into a Hamiltonian</a> \(\mathcal{H}\).
For this, we define the conjugate momentum
$$
p = \frac{\partial \mathcal{L}}{\partial \dot{u}} = \lambda (\dot{u} - f(t, u, \theta))\,.
$$
The Hamiltonian is then equal to
$$
\mathcal{H}(t, u, p) = \langle \dot{u}, p \rangle - \mathcal{L}(t, u, \dot{u}) = \langle \lambda^{-1} p + f(t, u, \theta) , p \rangle - \frac{\kappa}{2} \|u-U\|^2 - \frac{\lambda^{-1}}{2} \|p\|^2
$$
which simplifies to
$$
\mathcal{H}(t, u, p) = \frac{\lambda^{-1}}{2} \|p\|^2 + \langle f(t, u, \theta), p\rangle - \frac{\kappa}{2} \|u - U\|^2
$$
The path \(u\) that minimizes the action functional then has to satisfy Hamilton's equations:
$$
\dot{u} = \frac{\partial \mathcal{H}}{\partial p} = f(t, u, \theta) + \lambda^{-1} p
$$
and
$$
\dot{p} = -\frac{\partial \mathcal{H}}{\partial u} = \kappa (u-U) - \left(\frac{\partial f}{\partial u}\right)^T p
$$
together with the boundary conditions \(u(0) = u_0\) and \(p(T) = 0\).
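As an illustration (not part of the original derivation), Hamilton's equations together with these boundary conditions form a two-point boundary value problem that can be solved numerically, e.g. with <code>scipy.integrate.solve_bvp</code>. The sketch below assumes a hypothetical one-dimensional model \(f(t, u, \theta) = -\theta u\) and takes its exact solution as the smoothed data \(U(t)\), so the minimizing path should satisfy \(u \approx U\) and \(p \approx 0\):

```python
import numpy as np
from scipy.integrate import solve_bvp

# hypothetical one-dimensional example: f(t, u, theta) = -theta*u, so df/du = -theta
theta, kappa, lam, T, u0 = 1.0, 10.0, 100.0, 5.0, 2.0

def U(t):
    # "smoothed data": here the exact solution of du/dt = -theta*u with u(0) = u0
    return u0 * np.exp(-theta * t)

def hamilton(t, y):
    u, p = y
    du = -theta * u + p / lam            # du/dt = f(t, u, theta) + p/lambda
    dp = kappa * (u - U(t)) + theta * p  # dp/dt = kappa*(u - U) - (df/du)*p
    return np.vstack([du, dp])

def bc(ya, yb):
    # boundary conditions u(0) = u0 and p(T) = 0
    return np.array([ya[0] - u0, yb[1]])

ts = np.linspace(0.0, T, 101)
sol = solve_bvp(hamilton, bc, ts, np.zeros((2, ts.size)))
```

Since the data here is noiseless, the solver should recover \(u(t) \approx U(t)\) with a momentum close to zero.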
More details and a much more general derivation can be found in
<a href="https://projecteuclid.org/euclid.ejs/1452004955">the paper</a> by
Clairon and Brunel. In a future blog post I want to apply the tracking method to the
predator-prey example.
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com1tag:blogger.com,1999:blog-8385427779430646568.post-2676804063008131322019-11-15T18:02:00.000-08:002019-11-15T18:03:35.325-08:00My Matplotlib Cheatsheet (#1)I find myself Googling the same matplotlib/pyplot queries over and over again.
To make life easy, I started collecting some of them in a series of blog posts.
<br/>
In this first post, I'll look at subplots and annotations.
<br/><br/>
<h2>Turn off axes </h2>
<br/>
We can remove axes, ticks, borders, etc. with <code>ax.axis('off')</code>.
This can be used to remove unnecessary subplots created with the <code>subplots</code> function
from <code>pyplot</code>,
for instance, when we only want to use the upper triangle of the grid. Here's an example:
<br/><br/>
<script src="https://gist.github.com/chvandorp/099bbc6a6056fd7a477885a1b1d911e0.js"></script>
<br/>
The result is:
<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinv5yALaZACu-qQ7s2SPC6FEdTtyd2hyXl27VR1yC3X8YikhN3YjW7qPcslEiH0Y1VmGIDnnyrDJUyoHRAKsnfrkSNSrtRiWn7BTYrDQwZF7MZx7Ux5udev4uhMnzpNnA3s6AGAdeMXz2l/s1600/axis-off.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinv5yALaZACu-qQ7s2SPC6FEdTtyd2hyXl27VR1yC3X8YikhN3YjW7qPcslEiH0Y1VmGIDnnyrDJUyoHRAKsnfrkSNSrtRiWn7BTYrDQwZF7MZx7Ux5udev4uhMnzpNnA3s6AGAdeMXz2l/s320/axis-off.png" width="320" height="320" data-original-width="1374" data-original-height="1373" /></a></div>
<br/><br/>
<h2>Share axes between subplots after plotting </h2>
<br/>
An alternative way of making a ragged array of subplots makes use of <code>gridspec</code>.
The problem here is that it is a bit more difficult to share x and y axes.
Of course <code>add_subplot</code> has keyword arguments <code>sharex</code> and <code>sharey</code>,
but then we have to distinguish between the first and subsequent subplots. A better solution is the
<code>get_shared_x_axes()</code> method. Here's an example:
<br/><br/>
<script src="https://gist.github.com/chvandorp/5b1c8c5febbf5fc086adcd9db663dae6.js"></script>
<br/>
The result is:
<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrC20T4i6qnzq1oNsMnRTx4wMBz9FEW6cPkTyTCTASXPGx3kzJtgUPhm3SP970tGdmxxsUG86N1qvVnuPxlRg7Z82FI7gRplnFoT-KLPLxL6Ph-tEwlqfYnqCh87XurAecMfo2fUQRhheO/s1600/distorted-lissajous.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrC20T4i6qnzq1oNsMnRTx4wMBz9FEW6cPkTyTCTASXPGx3kzJtgUPhm3SP970tGdmxxsUG86N1qvVnuPxlRg7Z82FI7gRplnFoT-KLPLxL6Ph-tEwlqfYnqCh87XurAecMfo2fUQRhheO/s400/distorted-lissajous.png" width="400" height="386" data-original-width="1188" data-original-height="1147" /></a></div>
<br/>
<h2>Hybrid (or blended) transforms for annotations</h2>
<br/>
When you annotate a point in a plot, the location of the text is often relative to the data
in one coordinate, but relative to the axis (e.g. in the middle) in the other.
I used to do this with inverse transforms, but it turns out that there is a better way:
the <code>blended_transform_factory</code> function from <code>matplotlib.transforms</code>.
<br/>
Suppose that I want to annotate three points in three subplots. The arrows should point to these three
points, and I want the text to be located above the point, but the text in the 3 subplots
has to be vertically aligned.
<br/><br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY0SdhBWP0h4sPjFA-FWhx3_0TPmbnZrmWqWlDSUgrOXJ8pJeL7r7rTTnZBNkFqIymosMnERILpe46XoeYNDNJG4_kKdJr50bHMjHaBSkcfpcHIZlWL4vrHF6tXYkDodcnxyj5JAOPjxtA/s1600/annotations.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY0SdhBWP0h4sPjFA-FWhx3_0TPmbnZrmWqWlDSUgrOXJ8pJeL7r7rTTnZBNkFqIymosMnERILpe46XoeYNDNJG4_kKdJr50bHMjHaBSkcfpcHIZlWL4vrHF6tXYkDodcnxyj5JAOPjxtA/s400/annotations.png" width="400" height="147" data-original-width="1600" data-original-height="588" /></a></div>
<br/><br/>
Notice that the y-axes are not shared between the subplots!
To accomplish the alignment, we have to use the <code>annotate</code> method with a custom <code>transform</code>.
<br/><br/>
<script src="https://gist.github.com/chvandorp/9fdde7b9b5a21ddd2708148c8e912939.js"></script>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com1tag:blogger.com,1999:blog-8385427779430646568.post-72969010582943310042019-11-11T11:48:00.000-08:002019-11-11T11:48:03.986-08:00Using constant Stan functions for parameter array indices in ODE models<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfqKLYblw5Y5hRLDpoyVIFG5xxFzZ4kxdqXKsJEC0zVOtBeh56E_yN2P6VgNV2UDQo71Ggf0AJyQPpmOZs1MvO3bnW1709u2bTvK_1KzulqWtBjIF6FG-YTDbAnkYgn1Jd6pV7BeqQC8pq/s1600/LV-model-fit.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfqKLYblw5Y5hRLDpoyVIFG5xxFzZ4kxdqXKsJEC0zVOtBeh56E_yN2P6VgNV2UDQo71Ggf0AJyQPpmOZs1MvO3bnW1709u2bTvK_1KzulqWtBjIF6FG-YTDbAnkYgn1Jd6pV7BeqQC8pq/s400/LV-model-fit.png" width="400" height="278" data-original-width="1272" data-original-height="885" /></a></div>
<br/>
Stan's ODE solver has a fixed signature. The state, parameters and data have to be given as arrays.
This can get confusing during development of a model, since parameters can change to constants (i.e. data),
the number of parameters can change and even the state variables could change. Errors can easily creep in
since the user has to correctly change all indices in the ODE function, and in (e.g.) the <code>model</code> block.
A nice way to solve this is using constant functions. Constants in Stan are functions without an argument that
return the constant value. For instance <code>pi()</code> returns 3.14159265... The user can define such constants
in the <code>functions</code> block. Suppose that we want to fit the <a href="https://en.wikipedia.org/wiki/Lotka%E2%80%93Volterra_equations">Lotka-Volterra</a> model to data. The system of ODEs is given by
\[ \frac{dx}{dt} = ax - bxy\,,\quad \frac{dy}{dt} = cbxy - dy \]
and so we have a 2-dimensional state, and we need a parameter vector of length 4.
In the <code>functions</code> block, we will define a function <code>int idx_a() { return 1; }</code>
that returns the index of the parameter \(a\) in the parameter vector, and we define similar functions
for the other parameters. The full model can be implemented in Stan as shown below. The data <code>Preys</code>
and <code>Predators</code> are assumed to be Poisson-distributed with mean \(Kx\) and \(Ky\), respectively,
for some large constant \(K\).
I fitted the model to some randomly generated data, which resulted in the figure above.
<br />
<br />
<script src="https://gist.github.com/chvandorp/d76c2b8f1e4e6bb961dc6cc29bce2d83.js"></script>
<br />
Of course, this is still a low-dimensional model with a small number of parameters,
but I found that even in a simple model, defining parameter indices in this way keeps everything concise.
<br />
I used the following Python script to generate the data and interface with Stan.
<br />
<br />
<script src="https://gist.github.com/chvandorp/42e7b8a830b6233786a60d9d6dfdc64e.js"></script>
<br />
The following figure shows the parameter estimates together with the "real" parameter values.
<br/>
<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEis1xSv-DxGarumFoL74FxrtSHUI4rmFCKseJHAiJKVaIhR-AoaUac-pJo19fn42COc0RJC6zuoS5Ka1Z9R5UHVuOEgI_FYxbixOn5HzywktOdCE2xVZX_7lrsHddvbWYIVAPEYKJf0M_Uu/s1600/LV-model-estimates.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEis1xSv-DxGarumFoL74FxrtSHUI4rmFCKseJHAiJKVaIhR-AoaUac-pJo19fn42COc0RJC6zuoS5Ka1Z9R5UHVuOEgI_FYxbixOn5HzywktOdCE2xVZX_7lrsHddvbWYIVAPEYKJf0M_Uu/s400/LV-model-estimates.png" width="400" height="185" data-original-width="1263" data-original-height="583" /></a></div>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-54549038267748193032019-08-03T16:01:00.000-07:002019-08-03T16:28:39.221-07:00Tracking Jupyter notebooks with GitRecently, I came across a
<a href="https://doi.org/10.1371/journal.pcbi.1007007">PLOS Ten Simple Rules paper</a>
about Jupyter notebooks. Rule number 6 is about version control and suggests the use of
<a href="https://github.com/jupyter/nbdime">nbdime</a> to merge notebooks, or saving the notebooks
as python scripts and tracking those with your version control system.
I experienced problems with using Git to track changes in Jupyter notebooks a while ago,
and have been using the following solution.
<br/>
<br/>
For me, the largest nuisance is that figures are stored as very long strings in the output of cells.
This makes it almost impossible to manually resolve conflicts while merging two branches.
The solution is simple: just clear the output of the cells before committing the notebook.
I'm doing this with a simple script (which I found somewhere on the Internet).
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/a3ab32d281faad1e524ed766a16ca2be.js"></script>
<br/>
The script <code>clear_output_ipynb.py</code> lives in the same folder (called <code>notebooks</code>) as my Jupyter notebooks. I don't track changes in the <code>.ipynb</code> files, but have "clean" copies of the notebooks
(with extension <code>.ipynb.cln</code>) that are part of the Git repository.
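The gist above contains the script I use; a minimal version (assuming the nbformat 4 JSON layout, in which each code cell has <code>outputs</code> and <code>execution_count</code> fields) could look like:

```python
import json

def clear_notebook_output(infile, outfile):
    """Write a copy of a Jupyter notebook with all code-cell output removed."""
    with open(infile) as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []          # drop figures, text output, etc.
            cell["execution_count"] = None
    with open(outfile, "w") as f:
        json.dump(nb, f, indent=1)
        f.write("\n")
```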
To make life easy, I have two makefiles in my project folder called <code>cln.makefile</code> and
<code>nbs.makefile</code>. Before I stage the changes in my notebooks, I first run
<pre>
$ make -f cln.makefile
</pre>
which runs the script <code>clear_output_ipynb.py</code> for each notebook in my <code>notebooks</code> folder.
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/971dd80ccae684fc1067680eaa8eafa2.js"></script>
<br/>
After I pull changes from a remote repository, or switch to another branch, I have to copy
all <code>.ipynb.cln</code> files to <code>.ipynb</code> files. For this I have another makefile,
and so I run
<pre>
$ make -f nbs.makefile
</pre>
before using and modifying the notebooks.
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/95a07fd860cfcd508d5bd1fb4c3e6439.js"></script>
<br/>
Of course, sometimes I forget to clean the notebooks before committing, or I forget to
make the <code>.ipynb</code> files. I've tried to automate the process of cleaning and
copying with Git "hooks", but I have not been able to make that work. If somebody knows how, let me know!
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-83296659926369235442019-07-21T19:22:00.000-07:002019-07-21T19:50:33.692-07:00Neat encoding of censored data in StanSuppose that some of the observations in your data set are left- or right-censored.
A way to handle such data in likelihood-based models is to integrate out the censored observations.
In that case you have to replace the likelihood of an observation given the parameters with
a cumulative likelihood. In <a href="https://mc-stan.org/">Stan</a>, you can replace a sampling statement such as <code>X ~ normal(mu, sigma)</code> with <code>target += normal_lcdf(X | mu, sigma)</code> when \(X\) is in fact the upper bound of a left-censored observation that is \(\mathcal{N}(\mu, \sigma^2)\)-distributed. For a right-censored
observation you have to use the complementary cumulative distribution function <code>normal_lccdf(X | mu, sigma)</code>.
<br/>
<br/>
A way to encode censoring in a Stan model such that all types of data can be treated equally is by using a custom probability distribution function <code>censored_normal_lpdf</code> that takes an additional argument <code>cc</code> that determines the type of censoring. Below we use the following codes:
<br/>
<table>
<tr>
<td>Uncensored</td>
<td>0</td>
</tr>
<tr>
<td>Left-censored</td>
<td>1</td>
</tr>
<tr>
<td>Right-censored</td>
<td>2</td>
</tr>
<tr>
<td>Missing data</td>
<td>3</td>
</tr>
</table>
<br/>
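As a quick illustration of these codes (with hypothetical detection limits), censored normal data could be generated like this; code 3 (missing data) is not produced here:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
N, mu, sigma = 1000, 0.0, 1.0
lower, upper = -1.5, 1.5  # hypothetical detection limits

X = rng.normal(mu, sigma, size=N)
# record the detection limit instead of the true value for censored observations
Observations = np.clip(X, lower, upper)
# 0 = uncensored, 1 = left-censored, 2 = right-censored (3 = missing, not used here)
CensorCodes = np.where(X < lower, 1, np.where(X > upper, 2, 0))
```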
The function <code>censored_normal_lpdf</code> is defined in the <code>functions</code> block of the Stan model below.
In addition to <code>N</code> <code>Observations</code>, we add <code>N</code> <code>CensorCodes</code> that
must be either 0, 1, 2, or 3 in the <code>data</code> block.
In the <code>model</code> block, we now can use the
sampling statement <code>Observations[n] ~ censored_normal(mu, sigma, CensorCodes[n])</code>,
because this is just syntactic sugar for
<code>target += censored_normal_lpdf(Observations[n] | mu, sigma, CensorCodes[n])</code>.
Vectorization unfortunately does not work with this method.
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/d2d9887f017ef7015015d5c2f3959b80.js"></script>
<br/>
To demonstrate the <code>censored_normal</code> distribution,
I added the following python script that generates some random data
and compiles and runs the Stan model.
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/db9d0248e69588a5aad4bf9ffc9c61f0.js"></script>
<br/>
The output of the script will be something like
<pre>
mu: -0.018, 95% CrI: [-0.176, 0.133]
sigma: 1.071, 95% CrI: [0.953, 1.203]
</pre>Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-14808412572428529982019-05-18T09:28:00.000-07:002019-05-18T09:28:32.864-07:00A thread-safe global random number generator using C++ <random> and <mutex>In a <a href="https://tbz533.blogspot.com/2018/11/the-simplest-example-on-c11.html">previous post</a>,
Sandro showed how to use the C++ <random> header to define a global random number generator (RNG).
If at some point the user decides to parallelize their program, they would have to make the global RNG
thread-safe. This requires only a small modification.
<br/>
<br/>
I have copied Sandro's code and added thread-safety using the C++ headers <thread> and <mutex> (available since C++11).
In the header file <code>thread_safe_random.hpp</code>, the functions <code>RANDOM</code> and <code>Seed</code> are declared.
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/96e803158754caa6dc497da0b30b8366.js"></script>
The C++ file <code>thread_safe_random.cpp</code> defines the functions <code>RANDOM</code> and <code>Seed</code>,
and the RNG and uniform distribution as before. Additionally, a mutex <code>my_rng_mutex</code> is defined to guard <code>my_rng</code>.
I am using a <code>lock_guard</code> to lock and release the mutex. When one thread of execution calls the function <code>RANDOM</code>,
it acquires the mutex. Any other thread that calls <code>RANDOM</code>, has to wait until the mutex is released.
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/f11a88eda85b1ead6173d768886cb78c.js"></script>
In order to demonstrate <code>thread_safe_random</code>, I created <code>max</code> threads in the <code>main</code> function
that use the auxiliary function <code>fetch_random_number</code> to call <code>RANDOM</code>.
<br/>
<br/>
<script src="https://gist.github.com/chvandorp/589286990342a11a3535fb44f969cf83.js"></script>
The result should look like
<pre>
$ g++ --std=c++11 -pthread main.cpp thread_safe_random.cpp -o test_safe_random
$ ./test_safe_random
0.284779
0.243487
0.161906
0.338338
0.235765
0.502853
0.389262
0.165401
0.244871
0.194046
</pre>
However, the order of these numbers can change each time you execute the program.
This means that the program is no longer deterministic (although we can argue about what it means to be deterministic or not),
because the OS determines the order in which the threads call <code>RANDOM</code>.
Another problem with this implementation is that it will be slow when each thread needs many random numbers.Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-35294309508810991122019-05-17T15:53:00.000-07:002019-05-17T15:53:30.127-07:00Copying polymorphic C++ objects using an inherited dup() methodIn order to copy polymorphic objects in C++, it can be convenient to equip the base and derived classes with a <code>.dup()</code> method (or <code>.clone()</code> method) that returns a pointer to a copy of the (derived) object. When you have a large amount of different derived classes, overriding the base class's <code>.dup()</code> method for each of them can be a bit of a nuisance. In order to solve this, I sometimes use an "intermediate" class template that can be inherited instead of the base class to provide the <code>.dup()</code> method. This solution is not perfect, because it does not provide the possibility of <a href="https://en.wikipedia.org/wiki/Covariant_return_type">covariant return types</a>.
<br/>
<br/>
The class template <code> Cloneable </code> is defined as follows:
<br/>
<script src="https://gist.github.com/chvandorp/1324022eebd092d611299b93cb145fb1.js"></script>
<br/>
In the following code snippet, the use of the <code> Cloneable </code> is demonstrated:
<br/>
<script src="https://gist.github.com/chvandorp/a3836d1a62e6d269739f43456579b5f6.js"></script>
If someone has a better way to do this, let me know.Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-65528780107456887842018-11-01T10:18:00.000-07:002018-11-05T09:05:22.558-08:00The simplest example on c++11 <random>For some reason, the internet lacks a beginner-level tutorial on how to use the c++11 <random> library when your code is made up of different files.<br />
This is not a tutorial, this is just what I managed to put together... <br />
<br />
I want <br />
<ul>
<li>one global random engine (e.g. the mersenne twister)</li>
<li>one real-value uniform distribution in the interval [0,1), let's call this function RANDOM()</li>
<li>to be able to call RANDOM() from anywhere in my code. </li>
</ul>
<br />
There are three files: random.h, random.cpp and main.cpp (any additional .cpp file that includes random.h can use the function RANDOM() ). <br />
The content of the files is as follow:<br />
<br />
random.h <br />
<pre class="brush: cpp">// random.h
#ifndef _RND_HH_
#define _RND_HH_
#include <random> //--- FOR THIS YOU NEED c++11, enable with -std=c++11 flag
// Declare engine - single instance for the whole code
//extern std::mt19937 my_rng;
extern std::mt19937_64 my_rng;
//Declare distributions:
extern std::uniform_real_distribution<double> my_unif_real_dist;
//extern std::uniform_int_distribution<int> my_unif_int_dist;
int Seed(int seed);
double RANDOM();
#endif
// end of random.h
</pre>
random.cpp <br />
<pre class="brush: cpp">//random.cpp
#include <stdio.h>
#include <iostream>
#include <chrono>
#include "random.h"
//std::mt19937 my_rng {};
std::mt19937_64 my_rng {}; // Defines an engine
std::uniform_real_distribution<double> my_unif_real_dist(0., 1.); //Define distribution
// Function to seed the random number generator from main file
// useful if you want the seed from a parameter file
// a negative value for seed gets you a random seed
// outputs the seed itself
int Seed(int seed)
{
if (seed < 0) {
long unsigned int rseed = static_cast<long unsigned int>(std::chrono::high_resolution_clock::now().time_since_epoch().count());
std::cerr << "Randomizing random generator, seed is "<<rseed<<std::endl;
my_rng.seed(rseed);
return rseed;
} else {
std::cerr << "User-provided seed is "<<seed<<std::endl;
my_rng.seed(seed);
return seed;
}
}
// This is the function to call if you want a random number in the interval [0,1)
double RANDOM(void)
{
return my_unif_real_dist(my_rng);
}
// end of random.cpp
</pre>
And finally for the main.cpp file <br />
<pre class="brush: cpp">//main.cpp
#include <stdio.h>
#include <iostream>
#include "random.h"
int main()
{
int max = 10;
int my_seed = 235;
int my_new_Seed = Seed(my_seed);
for(int i=0; i<max;++i){
double one_random_number = RANDOM();
std::cerr << one_random_number << std::endl;
}
}// end of main.cpp
</pre>
That's it! This is really all you need.<br />
Compile it with: <br />
g++ -std=c++11 random.cpp main.cpp -o my_pretty_random_numbers <br />
and happy random number generation.<br />
<br />
By the way, this seems to work fine on my machine (running Ubuntu 18). <br />
Can this be improved in simplicity and/or performance? Are there bugs?<br />
Please let me know!Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-3964369538460214222018-06-03T07:03:00.000-07:002018-06-04T09:08:55.493-07:00Easy (Bayesian) multidimensional scaling with Stan<div dir="ltr" style="text-align: left;" trbidi="on">
Multidimensional scaling (MDS) is a data visualization technique in which the dimension of the data is reduced in a non-linear way. The data is represented as a \(N\times N\) distance matrix \((d_{ij})_{ij}\), and \(N\) points \(x_i\) in
a \(D\) dimensional space (typically \(D=2\)) are chosen such that the Euclidean distances \(\|x_i - x_j\|\)
resemble the input distances \(d_{ij}\) "as well as possible".
<br />
<br />
In metric MDS, an objective function \(E(x) = \sum_{1\leq i < j \leq N} (d_{ij} - \|x_i - x_j\|)^2\) is defined
that needs to be minimized. For different flavors of MDS, this objective function is defined differently.
In order to minimize the objective function, e.g. the conjugate gradient descent method is used.
This method requires that one calculates the gradient \(\nabla E\) of the objective function. Of course, this is not
so difficult in the case of metric MDS, but more complicated objective functions might require more effort.
Enter <a href="http://mc-stan.org/">Stan</a>.
<br />
<br />
Stan uses automatic differentiation for Hamiltonian Monte Carlo, but Stan can also be used for maximum likelihood. Hence, if we can formulate the MDS problem in terms of a likelihood function, we can let Stan do all the work. The parameters of the model are the \(x_i\), the data is given by the distances \(d_{ij}\). If we assume that given the parameters, the data is distributed as
\[
d_{ij} \sim \mathcal{N}(\|x_i - x_j\|, \sigma^2)\,,
\]
then maximizing the (log) likelihood is equivalent to minimizing the function \(E\).
The parameter \(\sigma^2\) is merely a nuisance parameter that needs to be estimated as well.
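To see that maximizing this likelihood for fixed \(\sigma\) is exactly minimizing \(E(x)\), here is a small numerical sketch with scipy (synthetic distances, no censoring; the starting point is placed near the true configuration to avoid poor local minima):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

rng = np.random.default_rng(seed=3)
N, D = 20, 2
X_true = rng.normal(size=(N, D))
d = pdist(X_true)  # condensed vector of input distances d_ij, i < j

def E(x_flat):
    # objective: sum over pairs of squared distance discrepancies
    return np.sum((d - pdist(x_flat.reshape(N, D))) ** 2)

# start near the true configuration to avoid poor local minima
x0 = (X_true + 0.1 * rng.normal(size=(N, D))).ravel()
res = minimize(E, x0, method="L-BFGS-B")
```

Since the synthetic points genuinely live in two dimensions, the objective should be driven to (numerically) zero.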
<br />
<br />
<h3> An implementation of MDS in the Stan programming language </h3>
Implementing MDS in Stan is fairly straightforward,
but there are a few snags that we should be aware of.
First, if \(x\) solves the MDS problem, then also any Euclidean transformation of \(x\) is a solution.
Hence, the model as stated above has too many parameters. We solve this by fixing the first
point at the origin, restricting the next point to a \(1\)-dimensional half space, the third point
to a \(2\)-dimensional half space et cetera. The last \(N-D-1\) points are unrestricted.
In Stan, we can accomplish this by using a <code> cholesky_factor_cov </code> matrix:
a lower-triangular matrix with a positive diagonal. We then use the <code> transformed parameters </code>
block to concatenate the points together into a single matrix.
<br />
<br />
Secondly, the data that we use in the example below is highly censored.
Many of the distances are missing, and some are right censored. In such a case
MDS can be used to infer the missing distances, and not merely visualize the data.
The data that is passed to Stan, therefore, is a list of edges, a list of distances,
and a list of codes that determine the type of censoring.
<br />
<br />
Thirdly, as the title of this post suggests, we will use Stan to do some sort of
Bayesian MDS. In this case, we will sample a collection of "maps" \(x\) from a posterior distribution,
that gives information about the location of each point, but also the uncertainty of this location.
In this case, the fact that we restrict the first \(D+1\) points, comes back to bite us, as the
uncertainty of these points will be different from that of the unrestricted points.
Furthermore, it might be hard to compare the individual maps to one another,
and for instance compute sensible mean locations of the points, as some maps may be
"twisted" more than others.
Therefore, we use the <code> generated quantities </code> block to center and rotate
(cf. PCA) the sampled maps.
<br />
<br />
<script src="https://gist.github.com/chvandorp/dcfc60b019b1fd369938adbe9fb3f60c.js"></script>
<br />
<br />
<h3> Example: Antigenic cartography of the influenza virus </h3>
An interesting application of MDS is antigenic cartography of the influenza virus.
Influenza virus circumvents human antibody responses by continually evolving its surface proteins,
in particular, hemagglutinin (HA). This is known as antigenic drift.
In order to decide whether flu vaccines need to be updated,
the hemagglutination inhibition (HI) assay is used to determine if the induced antibody response is still
effective against future strains. The titers measured in the HI assay can be used to define "distances"
between antisera and antigens. Using MDS, the antisera and antigens can be drawn into a "map",
that shows the antigenic drift of the virus. This was done by
<a href="https://doi.org/10.1126/science.1097211"> Smith et al. </a> in 2004.
Conveniently, the data used for their map is <a href="http://antigenic-cartography.org/Science-2004/"> available online</a>.
This table gives HI titers \(H_{ij}\) of antigen \(i\) and antiserum \(j\).
A small titer corresponds to a large distance, which are defined as \(d_{ij} = \log_2(H_{\max,j}) - \log_2(H_{ij})\),
where \(H_{\max,j} = \max_{k} H_{kj}\).
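With a small hypothetical titer table, this distance computation is a one-liner:

```python
import numpy as np

# hypothetical HI titer table H_ij: rows = antigens i, columns = antisera j
H = np.array([[1280.0,  640.0,   80.0],
              [ 160.0, 1280.0,  320.0],
              [  40.0,  160.0, 1280.0]])

# d_ij = log2(H_max,j) - log2(H_ij), with column maximum H_max,j = max_k H_kj
d = np.log2(H.max(axis=0)) - np.log2(H)
```

Every column then contains at least one zero distance (the antigen with the maximal titer against that antiserum), and an eight-fold drop in titer corresponds to a distance of 3.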
As an example, I recreated their antigenic map using the Stan model above, and a Python script below.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWqr2NjX3D0CxHRmEvJ5y2aUT2W4tg0Fe8cEXwuQzTfk7FMXGyWyI7MuExmzlDTHhhz0UXayZhCJxrsdBsi6r8HRlduVA6xacFnQBrjVPoKVDoKwbwESuvY9tO33E5GJ4AVIMLCfKihGee/s1600/mds-iav-ag-cart.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWqr2NjX3D0CxHRmEvJ5y2aUT2W4tg0Fe8cEXwuQzTfk7FMXGyWyI7MuExmzlDTHhhz0UXayZhCJxrsdBsi6r8HRlduVA6xacFnQBrjVPoKVDoKwbwESuvY9tO33E5GJ4AVIMLCfKihGee/s640/mds-iav-ag-cart.png" width="424" height="640" data-original-width="1061" data-original-height="1600" /></a></div>
<br />
The white squares denote the relative positions of the antisera in the "antigenic space", while the colored circles
represent the antigens. The colors map to the years in which the influenza strains were isolated.
<br />
<br />
<h3> Bayesian multidimensional scaling </h3>
For antigenic cartography of IAV, Bayesian MDS has been introduced by <a href="https://doi.org/10.7554/eLife.01914"> Bedford et al.</a>,
who used multiple HI assay results per antigen/antiserum pair to incorporate the uncertainty of these measurements in their antigenic map. Moreover, they were able to use genetic and temporal information about the antigens (i.e. the RNA sequences of HA and their isolation dates) to inform the position of the antigens and antisera on the map. We will not go this far in this post, but since we have already formulated the MDS algorithm in Stan, we might as well make a "Bayesian" antigenic map. This can give some insight into the uncertainty of the positions of the antigens and antisera. This is not unlike the confidence areas as drawn by Smith et al. (the ellipsoid shapes). The result is given by the following figure.
<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixl2-7TpVMa89QPpeJCHOH8xtnJWaaZliW7Le1DsRfV-a1urT6qBxsivqmCrJHto8qThegP6oEwOxnyOU1rg2BpOt68mZ9wtIsOlmqsnszqz9PwQzCjsJ7Bt1SkWqm21yAfYTroN_3t00k/s1600/bmds-iav-ag-cart.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixl2-7TpVMa89QPpeJCHOH8xtnJWaaZliW7Le1DsRfV-a1urT6qBxsivqmCrJHto8qThegP6oEwOxnyOU1rg2BpOt68mZ9wtIsOlmqsnszqz9PwQzCjsJ7Bt1SkWqm21yAfYTroN_3t00k/s640/bmds-iav-ag-cart.png" width="424" height="640" data-original-width="1061" data-original-height="1600" /></a></div>
<br />
Again, squares indicate antisera and colored circles the antigens. All the individual MCMC samples are represented by the grey dots.
The MCMC samples for each antigen or antiserum are used to draw a two-dimensional error bar (i.e. ellipse) around the mean location.
<br />
A Python script for parsing the HI titer data, compiling and running the Stan model and drawing the maps is added below.
For it to work, you will need to download the <code> mds_model.stan </code> file and make a csv file called
<code> baselinemap.csv </code> with the <a href="http://antigenic-cartography.org/Science-2004/"> HI table </a>
<br />
<br />
<script src="https://gist.github.com/chvandorp/165eb81476bd2ea70a8e19b01686a011.js"></script>
</div>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com2tag:blogger.com,1999:blog-8385427779430646568.post-37974347001693503712018-02-06T09:08:00.000-08:002018-02-10T01:30:35.690-08:00Computing q-values with C++<div dir="ltr" style="text-align: left;" trbidi="on">
When looking for associations between features \(i = 1,\dots, m\) and some trait,
it is often necessary to have some sort of multiple-testing correction.
A very conservative method is the Bonferroni correction, that minimizes the family-wise error rate (FWER),
but at the cost of many false negatives.
This is not desirable when one wants to discover features or associations,
and therefore other methods have been developed.
One particularly intuitive method is based on the false discovery rate (FDR) and uses so-called \(q\)-values,
which are (under certain conditions) elegantly analogous to \(p\)-values.
<br/>
<h2> False discovery rate </h2>
Let \(S\) be the number of features called significant, and \(F\) the number of false positives
among the significant features (i.e. false discoveries).
In <a href="https://projecteuclid.org/download/pdf_1/euclid.aos/1074290335">an article by John D. Storey</a>,
the (positive) false discovery rate is defined as
\[ {\rm pFDR} := \mathbb{E}\left[\left.\frac{F}{S}\right| S > 0 \right]\,. \]
Hence, the pFDR is the expected fraction of false positives among the features that are called significant.
The condition \(S > 0\) ensures that it is well defined.
<br/>
<br/>
In the case of hypothesis testing, one typically has a test statistic \(T\),
and one wants to test if the null hypothesis is true (\(H = 0\)),
or rather that the alternative hypothesis is true (\(H = 1\)).
The statistical model specifies the distribution of \(T | H = 0\),
and the null hypothesis is rejected when the realization of \(T\) falls into
a pre-defined significance region \(\Gamma\).
<br />
When testing multiple features, we typically have a sequence \((T_i, H_i)_{i=1}^m\),
here assumed to be identically distributed and independent.
The \({\rm pFDR}\) then depends on \(\Gamma\):
\[ {\rm pFDR}(\Gamma) = \mathbb{E}\left[\left.\frac{F(\Gamma)}{S(\Gamma)}\right| S(\Gamma) > 0 \right] \,, \]
where \(F(\Gamma) := \#\{i : T_i \in \Gamma \wedge H_i = 0 \} \) and \(S(\Gamma) := \#\{i : T_i \in \Gamma\}\).
Storey derives that under certain conditions, we can write
\[ {\rm pFDR}(\Gamma) = \mathbb{P}[H = 0 | T \in \Gamma] = \frac{\mathbb{E}[F(\Gamma)]}{\mathbb{E}[S(\Gamma)]} \]
<br/>
<h2> The q-value </h2>
Let \(\{\Gamma_{\alpha}\}_{\alpha=0}^1\) be a nested family of significance regions.
That is, \(\Gamma_{\alpha} \subseteq \Gamma_{\alpha'}\) whenever
\(\alpha \leq \alpha'\) and \(\mathbb{P}[T \in \Gamma_{\alpha} | H=0] = \alpha\).
For instance, if \(T | H = 0 \sim \mathcal{N}(0,1)\),
then we could choose \(\Gamma_{\alpha} = [z_{\alpha}, \infty)\),
where \(z_{\alpha} = \Phi^{-1}(1-\alpha)\), with \(\Phi\) the CDF of \(\mathcal{N}(0,1)\).
<br />
The \(q\)-value of a realization \(t\) of \(T\) is then defined as
\[ q(t) = \inf_{\{\Gamma_{\alpha} : t \in \Gamma_{\alpha}\}} {\rm pFDR}(\Gamma_{\alpha}) \]
We can now give the above-mentioned analogy between \(p\)-values and \(q\)-values.
The \(p\)-value is defined as:
\[ p(t) = \inf_{\{\Gamma_{\alpha} : t \in \Gamma_{\alpha}\}} \mathbb{P}[T \in \Gamma_{\alpha} | H = 0] \]
While under the right conditions, the \(q\)-value can be written as:
\[ q(t) = \inf_{\{\Gamma_{\alpha} : t \in \Gamma_{\alpha}\}} \mathbb{P}[H = 0 | T \in \Gamma_{\alpha} ] \]
<h2>Computing q-values</h2>
In order to compute \(q\)-values, given a sequence of \(p\)-values, we follow the steps given in
<a href="https://doi.org/10.1073/pnas.1530509100">a paper by Storey and Tibshirani</a>.
In this scenario, the \(p\)-value plays the role of the realization \(t\) of the statistic \(T\).
Under the null hypothesis, these \(p\)-values are uniformly distributed.
As a family of significance regions, we simply take \(\Gamma_{\alpha} = [0,\alpha]\),
and write for instance \(S(\alpha) := S(\Gamma_{\alpha})\).
<br/>
<br/>
First, we have to estimate \(\mathbb{E}[S(\alpha)]\), for which we use \(\#\{i : p_i \leq \alpha\}\),
and we estimate \(\mathbb{E}[F(\alpha)]\) with \(m \cdot \hat{\pi}_0 \cdot \alpha\),
where \(\hat{\pi}_0\) is an estimate for \(\pi_0 = \mathbb{P}[H=0]\).
<br/>
The most difficult part of computing \(q\)-values is estimating \(\pi_0\).
An often used estimate is the average \(p\)-value, but we can do a little bit better
by making use of the empirical CDF of the \(p\)-values.
The figure below shows a "typical" histogram of \(p\)-values, where the \(p\)-values
sampled from the alternative distribution are given by the gray bars.
The marginal distribution of \(p\)-values becomes relatively flat towards the right,
where most \(p\)-values should come from the null distribution, which is \({\rm Uniform}(0,1)\).
The right panel of the figure shows a "rotated" empirical CDF of the \(p\)-values,
i.e.
\[ x \mapsto R(x) = 1-{\rm CDF}(1-x)\,, \]
and the fact that large \(p\)-values should be uniformly distributed
is reflected in the fact that \(R(x)\) is a straight line for small \(x\).
The slope of this straight line is an estimate of \(\pi_0\).
In the C++ code below, I use <a href="https://www.gnu.org/software/gsl/doc/html/index.html">GSL</a>
to fit a line through \(R\) (indicated by the red line in the figure),
using a weight function \(x \mapsto (1-x)^2\) to give more weight to small values of \(x\),
thereby mostly ignoring the non-straight part of \(R\).
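For illustration, the same weighted fit can be sketched with NumPy alone instead of GSL: since \(R(0) = 0\), we can fit a line through the origin by weighted least squares. This is only a sketch, and the function name <code>estimate_pi0</code> is mine:

```python
import numpy as np

def estimate_pi0(pvals, grid=200):
    """Estimate pi0 as the slope of a line through the origin, fitted to the
    rotated ECDF R(x) = 1 - ECDF(1 - x) with weights (1 - x)^2."""
    p = np.asarray(pvals)
    xs = np.linspace(0.0, 1.0, grid)
    # empirical R(x): fraction of p-values larger than 1 - x
    R = np.array([np.mean(p > 1.0 - x) for x in xs])
    w = (1.0 - xs) ** 2
    # weighted least squares for a line through the origin:
    # slope = sum(w * x * R) / sum(w * x^2)
    return np.sum(w * xs * R) / np.sum(w * xs ** 2)
```

For uniformly distributed \(p\)-values this returns a value close to \(1\), and for the mixture shown in the figure it should return a value close to \(\pi_0\).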
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiT4e7qJgHLblyWX7xLWxh1T8jjwS80JsXoalfqltd8gab4RVlDjeRY-DcS7Y190KXlyH4PMXND2I_17edKb6t30ScSNlsQtCKGky5KX7ts4AvGWzLfJ_ux3j1-t5YeDm9fjb_G5MhyphenhyphenJ7CN/s1600/pvals.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiT4e7qJgHLblyWX7xLWxh1T8jjwS80JsXoalfqltd8gab4RVlDjeRY-DcS7Y190KXlyH4PMXND2I_17edKb6t30ScSNlsQtCKGky5KX7ts4AvGWzLfJ_ux3j1-t5YeDm9fjb_G5MhyphenhyphenJ7CN/s400/pvals.png" width="400" height="196" data-original-width="1600" data-original-height="785" /></a>
<br />
In this example, the marginal \(p\)-values are sampled from a mixture distribution \(\pi_0 {\rm Uniform}(0,1) + (1-\pi_0) {\rm Beta}(10,1)\),
where \(\pi_0 = 3/4\).
</div>
<br />
<br />
Now we sort the \(p\)-values such that \(p_1 \leq p_2 \leq \dots \leq p_m\),
so that \(\#\{j : p_j \leq p_i\} = i\),
and first determine the \(q\)-value corresponding to \(p_m\):
\[ q_m = \hat{\pi}_0 \cdot p_m \]
The \(q\)-value \(q_i\) corresponding to \(p\)-value \(p_i\) is computed as
\[ q_i = \min(\hat{\pi}_0 p_i m/i, q_{i+1}) \]
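In Python, this recursion can be sketched in a few lines (assuming an estimate \(\hat{\pi}_0\) is already available; <code>qvalues</code> is a hypothetical helper, not the C++ function below):

```python
import numpy as np

def qvalues(pvals, pi0):
    """Compute q-values from p-values, given an estimate pi0 of P[H=0]."""
    m = len(pvals)
    order = np.argsort(pvals)
    p = np.asarray(pvals)[order]          # sorted p-values p_1 <= ... <= p_m
    q = np.empty(m)
    q[m-1] = pi0 * p[m-1]                 # q_m = pi0 * p_m
    for i in range(m-2, -1, -1):          # q_i = min(pi0 * p_i * m/i, q_{i+1})
        q[i] = min(pi0 * p[i] * m / (i + 1), q[i+1])
    qq = np.empty(m)
    qq[order] = q                         # restore the original order
    return qq
```

Note that the loop index is 0-based, so the rank \(i\) in the formula corresponds to <code>i + 1</code> in the code.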
Recently, I had to implement this algorithm in C++. The function is given in the following header file,
and accepts a "keyed" list of \(p\)-values, where the key is used to identify the feature.
The function returns a keyed list of \(q\)-values.
<br />
<br />
<script src="https://gist.github.com/chvandorp/8ec2489d275a2b3f555c1e3911f9d8eb.js"></script>
<br />
<br />
The following code gives an example of how to use qvalues.hpp, and can be used for testing:
<br />
<br />
<script src="https://gist.github.com/chvandorp/06cc3cab9d8547b44ac71386c6e74a11.js"></script>
<br />
After compiling and running the program, the result should be something like:
<pre>
$ g++ -O3 -std=c++11 qvalues_test.cpp -lgsl -lgslcblas -o qvaltest
$ ./qvaltest 1728
true false
discoveries: 143 32
negatives: 712 113
realized FDR: 0.182857
</pre>
</div>Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-4118484635242778982017-04-18T10:56:00.000-07:002017-04-19T05:20:28.140-07:00How to calculate WBIC with Stan<div dir="ltr" style="text-align: left;" trbidi="on">
In a <a href="http://tbz533.blogspot.nl/2016/05/computing-bayes-factors-with-stan-and.html">previous post</a>, I showed how to use <a href="http://mc-stan.org/">Stan</a> to calculate the Bayes marginal likelihood with the path sampling method.
For large models, this method can be computationally expensive, and therefore estimates for the marginal likelihood have been developed. The most famous one is the Bayesian Information Criterion (BIC), an estimate for the Bayes free energy.
Although the name suggests otherwise, the BIC is not a "true Bayesian" estimate, since it depends only on a point estimate, and not the entire posterior distribution:
\[
{\sf BIC} = -2 L(\hat{\theta}) + k \cdot \log(n)\,,
\]
where \(L\) is the log-likelihood function, \(\hat{\theta}\) is the maximum likelihood estimate of the model's parameters, \(k\) is the dimension of \(\theta\), and \(n\) is the number of data points.
<br />
More importantly, the BIC does not work for singular models. As <a href="http://www.jmlr.org/papers/volume14/watanabe13a/watanabe13a.pdf">Watanabe points out</a>,
if a statistical model contains hierarchical layers or hidden variables, then it is singular.
Watanabe therefore proposes a generalization of the BIC, the WBIC, which works for singular models as well,
and is truly Bayesian. This is not unlike AIC versus WAIC.
<br />
<br />
Suppose that you have some sort of MCMC algorithm to sample from the posterior distribution of your model,
given your data. When you want to compute the WAIC, you will just have to store the likelihoods of each observation,
given each sampled set of parameters. This can be done during an actual run of the MCMC algorithm.
For the WBIC, however, you will need to do a separate run at a different "temperature". In fact,
<a href="http://www.jmlr.org/papers/volume14/watanabe13a/watanabe13a.pdf">Watanabe gives the following formula</a>:
\[
{\sf WBIC} = -2\mathbb{E}_{\theta}^{\beta}[L(\theta)]\,,
\]
where \(\beta = 1/\log(n)\), and the notation \(\mathbb{E}_{\theta}^{\beta}[f(\theta)]\) expresses that
the expectation of \(f\) is taken with respect to the distribution with PDF
\[
\theta \mapsto \frac{\exp(\beta L(\theta)) \pi(\theta)}{\int\exp(\beta L(\theta)) \pi(\theta)d\theta}
\]
which equals the posterior distribution when \(\beta = 1\) and the prior distribution \(\pi\) when \(\beta = 0\).
<br />
<br />
Something very similar happens in the <a href="http://tbz533.blogspot.nl/2016/05/computing-bayes-factors-with-stan-and.html">path sampling example</a>, with the exception that the "path" is replaced by a single point.
In a recent paper about another alternative to the BIC, the Singular BIC (sBIC),
<a href="http://www.rss.org.uk/Images/PDF/publications/Drton-Plummer-Oct-16.pdf">Drton and Plummer</a>
point out the similarity to the <a href="https://en.wikipedia.org/wiki/Mean_value_theorem">mean value theorem</a>
for integrals. Using the path sampling method, the log-marginal-likelihood is computed from the following integral
\[
\log(P(D|M)) = \int_0^1 \mathbb{E}_{\theta}^{\beta}[L(\theta)]\, d\beta\,.
\]
Here, we write \(P(D|M)\) for the marginal likelihood of model \(M\), and data \(D\).
According to the mean value theorem
\[
\exists \beta^{\ast} \in (0,1): \log(P(D|M)) = \mathbb{E}_{\theta}^{\beta^{\ast}}[L(\theta)]\,,
\]
and Watanabe argues that choosing \(1/\log(n)\) for \(\beta^{\ast}\) gives a good approximation.
Notice that the mean value theorem does not provide a recipe to compute \(\beta^{\ast}\),
and that Watanabe uses rather advanced algebraic geometry to prove that his choice for \(\beta^{\ast}\)
is good; the mean value theorem only gives a minimal justification for Watanabe's result.
<br />
<br />
<h2>Implementing WBIC in Stan</h2>
Let us start with a very simple example: the biased coin model. Suppose that we have \(N\) coin flips,
with outcomes \(x \in \{0,1\}^N\) (heads = 1 and tails = 0), and let \(K\) denote the number of heads.
We compare the null model (the coin is fair) with the alternative model (the coin is biased) using \(\Delta {\sf WBIC}\)
and see if we get the "right" answer. Notice that for this simple example, we can compute the Bayes factor exactly.
<br />
<br />
In order to have a Stan model that can be used for both regular sampling, and estimating WBIC,
a sampling <code>mode</code> is passed with the data to determine the desired behavior.
In the <code>transformed data</code> block, a parameter <code>watanabe_beta</code> is then defined,
determining the sampling "temperature".
<br />
The actual model is defined in the <code>transformed parameters</code> block, such that <code>log_likes</code>
can be used in both the <code>model</code> block and the <code>generated quantities</code> block.
Instead of a sampling statement (<code>x[n] ~ bernoulli(p)</code>),
we have to use the <code>bernoulli_lpmf</code> function, which allows us to scale the log-likelihood
with <code>watanabe_beta</code> (i.e. <code>target += watanabe_beta * sum(log_likes)</code>).
<pre class="brush: cpp">
// coin_model.stan
data {
int<lower=3> N; // number of coin tosses
int<lower=0, upper=1> x[N]; // outcomes (heads = 1, tails = 0)
int<lower=0, upper=1> mode; // 0 = normal sampling, 1 = WBIC sampling
}
transformed data {
real<lower=0, upper=1> watanabe_beta; // WBIC parameter
if ( mode == 0 ) {
watanabe_beta = 1.0;
}
else { // mode == 1
watanabe_beta = 1.0/log(N);
}
}
parameters {
real<lower=0, upper=1> p; // probability of heads
}
transformed parameters {
vector[N] log_likes;
for ( n in 1:N ) {
log_likes[n] = bernoulli_lpmf(x[n] | p);
}
}
model {
p ~ beta(1, 1);
target += watanabe_beta * sum(log_likes);
}
generated quantities {
real log_like;
log_like = sum(log_likes);
}
</pre>
Using the <code>pystan</code> module for Python (3), one could estimate \(p\) from data \(x\) as follows:
<pre class="brush:python">
import pystan
import numpy as np
import scipy.stats as sts
## compile the model
sm = pystan.StanModel(file="coin_model.stan")
## create random data
N = 50
p_real = 0.75
x = sts.bernoulli.rvs(p_real, size=N)
## prepare data for Stan
data = {'N' : N, 'x' : x, 'mode' : 0}
pars = ['p', 'log_like']
## run the Stan model
fit = sm.sampling(data=data)
chain = fit.extract(permuted=True, pars=pars)
ps = chain['p']
print("p =", np.mean(ps), "+/-", np.std(ps))
</pre>
Which should give output similar to
<pre>
p = 0.67187824104 +/- 0.0650602081466
</pre>
In order to calculate the WBIC, the <code>mode</code> has to be set to 1:
<pre class="brush:python">
## import modules, compile the Stan model, and create data as before...
## prepare data for Stan
data = {'N' : N, 'x' : x, 'mode' : 1} ## notice the mode
pars = ['p', 'log_like']
## run the Stan model
fit = sm.sampling(data=data)
chain = fit.extract(permuted=True, pars=pars)
WBIC = -2*np.mean(chain["log_like"])
print("WBIC =", WBIC)
</pre>
which should result in something similar to
<pre>
WBIC = 66.036854275
</pre>
In order to test whether this result is any good, we first compute the WBIC of the null model.
This is easy, since the null model does not have any parameters:
\[
{\sf WBIC} = -2\cdot L(x) = -2 \cdot \log(\tfrac12^{K}(1-\tfrac12)^{N-K}) = 2N\cdot\log(2)
\]
and hence, when \(N=50\), we get \({\sf WBIC} \approx 69.3\). Hence, we find positive
evidence against the null model, since \(\Delta{\sf WBIC} \approx 3.3\).
<br />
<br />
For the coin-toss model, we can compute the marginal likelihood exactly, namely
\[
P(x|M) = \int_0^1 p^K (1-p)^{N-K} dp = B(K+1, N-K+1)\,,
\]
where \(M\) denotes the alternative (i.e. biased) model, and \(B\) denotes the Beta function.
In the above example, we had \(N = 50\) and \(K = 34\), and hence
\(-2\cdot\log(P(x|M)) \approx 66.31 \), which is remarkably close to the \({\sf WBIC}\) of the alternative model.
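These two reference values are easy to verify with scipy (a quick sketch, with \(N = 50\) and \(K = 34\) as above):

```python
import numpy as np
from scipy.special import betaln

N, K = 50, 34
null_wbic = 2 * N * np.log(2)               # -2*log((1/2)^N) for the fair-coin model
alt_exact = -2 * betaln(K + 1, N - K + 1)   # -2*log(P(x|M)) = -2*log B(K+1, N-K+1)
print(round(null_wbic, 2), round(alt_exact, 2))  # 69.31 66.31
```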
Similar results hold for several values of \(p\), as demonstrated by the following figure
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj21mjHcScusJxyTIDn34ucC1-DVEEBmZnJnxeV-AzHjDizoZh8z7yQARXIg_vaMksKhRHnGSrPEB155M_O6QBAtH2cK8vJLWv0YhXuaPzULzYnQnxxM9fM89TyeOFjFl6dCr0Ya6_DbXYX/s1600/wbic_performance.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj21mjHcScusJxyTIDn34ucC1-DVEEBmZnJnxeV-AzHjDizoZh8z7yQARXIg_vaMksKhRHnGSrPEB155M_O6QBAtH2cK8vJLWv0YhXuaPzULzYnQnxxM9fM89TyeOFjFl6dCr0Ya6_DbXYX/s400/wbic_performance.png" width="400" height="208" /></a></div>
<h2>A non-trivial example</h2>
<i> As the example above is non-singular, and WBIC is supposed to work also for singular models,
I plan to present an example with a singular model here.
</i>
<br /></div>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com5tag:blogger.com,1999:blog-8385427779430646568.post-88455896176897805412016-11-22T08:29:00.000-08:002016-11-22T10:36:02.358-08:00Using the "ones trick" to handle unbalanced missing data with JAGS<div dir="ltr" style="text-align: left;" trbidi="on">
The so-called "ones trick" for JAGS and BUGS models allows the user to sample from distributions that are not
in the standard list. Here I show another application for "unbalanced" missing data.
More examples using the ones (or zeros) trick can be found in <a href="http://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-the-bugs-book/">The BUGS Book</a>, and in particular in <a href="http://www.mrc-bsu.cam.ac.uk/wp-content/uploads/bugsbook_chapter9.pdf">Chapter 9</a>.
<br />
<br />
Suppose that we want to find an association between a trait \(x\) (possibly continuous) and another categorical trait \(k \in \{1, 2,\dots, K\}\). We have \(N\) observations \(x_i\); however, some of the \(k_i\) are missing. Whenever there is no information on the missing \(k_i\) whatsoever, we can simply write the following JAGS model:
<pre class="brush: cpp">
data {
/* alpha is the parameter for the Dirichlet prior of p,
* with p the estimated frequency of k
*/
for ( j in 1:K ) {
alpha[j] <- 1
}
}
model {
for ( i in 1:N ) {
/* Some model for x, given k */
x[i] ~ dnorm(xhat[k[i]], tau_x)
/* Sample missing values of k, and inform
* the distribution p with the known k's
*/
k[i] ~ dcat(p)
}
/* Model the distribution p_k of the trait k */
p ~ ddirch(alpha)
/* The precision of the likelihood for x */
tau_x ~ dgamma(0.01, 0.01)
/* Give the xhat's a shared prior to regularize them */
for ( j in 1:K ) {
xhat[j] ~ dnorm(mu_xhat, tau_xhat)
}
mu_xhat ~ dnorm(0, 0.01)
tau_xhat ~ dgamma(0.01, 0.01)
}
</pre>
The data file must contain the vector \(k\), with NA in the place of the missing values.
<br />
<br />
Suppose now that the missing \(k_i\)s are not completely unknown.
Instead, suppose that we know that some of the \(k_i\) must belong to a group \(g_i\).
The group \(g_i\) is encoded as a binary vector,
where a \(g_{i,j} = 1\) indicates that the trait value \(j\) is in the group,
and \(0\) that it is not. In particular, when \(k_i\) is known, then
\[
g_{i,j} = \left\{ \begin{array}{ll} 1 & {\rm if\ } j = k_i \\ 0 & {\rm otherwise} \end{array} \right.
\]
If \(k_i\) is completely unknown, then we just let \(g_{i,j} = 1\) for each \(j\).
<pre class="brush: cpp">
data {
for ( j in 1:K ) {
alpha[j] <- 1
}
/* for the "ones trick",
* we need a 1 for every observation
*/
for ( i in 1:N ) {
ones[i] <- 1
}
}
model {
for ( i in 1:N ) {
x[i] ~ dnorm(xhat[k[i]], tau_x)
/* sample missing k's using the group-vector g */
k[i] ~ dcat(g[i,])
/* in order to inform p correctly,
* we need to multiply the posterior with p[k[i]],
* which can be done by observing a 1,
* assumed to be Bernoulli(p[k[i]]) distributed
*/
ones[i] ~ dbern(p[k[i]])
}
p ~ ddirch(alpha)
tau_x ~ dgamma(0.01, 0.01)
for ( j in 1:K ) {
xhat[j] ~ dnorm(mu_xhat, tau_xhat)
}
mu_xhat ~ dnorm(0, 0.01)
tau_xhat ~ dgamma(0.01, 0.01)
}
</pre>
Instead of using the "ones trick", it would have been more clear-cut if we were able to write
<pre class="brush: cpp">
k[i] ~ dcat(g[i,]) /* sampling statement */
k[i] ~ dcat(p) /* statement to inform p */
</pre>
but in this way JAGS cannot tell that \(k_i\) should be sampled from \({\rm Categorical}(g_i)\),
and not from \({\rm Categorical}(p)\).
<br />
Similarly, it might have been tempting to write
<pre class="brush: cpp">
k[i] ~ dcat(g[i,] * p) /* point-wise vector multiplication */
</pre>
This would correctly prefer the common \(k\,\)s in group \(g_i\)
over the rare \(k\,\)s, but the fact that \(k_i\) must come from the group
\(g_i\) is not used to inform \(p\).
<br />
<br />
As an example, I generated some random \(k_i\)s sampled from \({\rm Categorical}(p)\)
(I did not bother to sample any \(x_i\)s).
I have taken \(K = 15\), \(N = 2000\), and \(3\) randomly chosen groups.
For \(1000\) of the observations, I "forget" the actual \(k_i\), and only know the group \(g_i\).
The goal is to recover \(p\) and the missing \(k_i\)s.
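For concreteness, data of this shape can be simulated in Python roughly as follows. This is only a sketch: the three groups here form an arbitrary partition (I actually used randomly chosen groups), and the trait values are 0-based, unlike in JAGS:

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 15, 2000
p = rng.dirichlet(np.ones(K))            # true trait frequencies
k = rng.choice(K, size=N, p=p)           # true traits (0-based here)
# three hypothetical groups partitioning the K trait values
groups = [range(0, 5), range(5, 10), range(10, 15)]
g = np.zeros((N, K), dtype=int)
for i in range(N):
    if i < 1000:                         # "forget" k_i: only its group is known
        grp = next(gr for gr in groups if k[i] in gr)
        g[i, list(grp)] = 1
    else:                                # k_i known: indicator vector
        g[i, k[i]] = 1
```

The matrix <code>g</code> (and the vector <code>k</code> with the first \(1000\) entries set to <code>NA</code>) would then be passed to JAGS as data.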
<br />
Using a chain of length \(20000\) (using the first \(10000\) as a burn-in and \(1/10\)-th thinning)
we get the following result:
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhR0VJK62RRfCw3f7J1TQE_Y1105nHGSqk5J6KuxDj94JXa26Hc8qowGcdylN5KIcZGq2Dy8ERIj1RPp5KKxJQRoa6oGRoW3gUOWcJ4EqxD75uRAzvwhkqTKOHY8_yDspgKAcPd4TCdJdMW/s1600/jags-ones-trick.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhR0VJK62RRfCw3f7J1TQE_Y1105nHGSqk5J6KuxDj94JXa26Hc8qowGcdylN5KIcZGq2Dy8ERIj1RPp5KKxJQRoa6oGRoW3gUOWcJ4EqxD75uRAzvwhkqTKOHY8_yDspgKAcPd4TCdJdMW/s400/jags-ones-trick.png" width="400" height="400" /></a></div>
<br />
<br />
</div>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-27932429001075179022016-05-05T08:36:00.004-07:002016-11-22T09:15:06.677-08:00Computing Bayes factors with Stan and a path-sampling method<div dir="ltr" style="text-align: left;" trbidi="on">
<a href="http://mc-stan.org/">Stan</a>
is a great program for MCMC (or <a href="https://en.wikipedia.org/wiki/Hybrid_Monte_Carlo">HMC</a>, really).
Vehtari <i>et al.</i> explain <a href="http://arxiv.org/abs/1507.04544">here</a> how to use Stan to compute WAIC.
For the Bayes factor, however, I have not found a method yet,
and therefore I would like to demonstrate a possible method here.
This will obviously not work well for every model; this is merely an experiment.
<br/>
Recently, I was really intrigued by a paper by <a href="https://projecteuclid.org/euclid.ss/1028905934">Gelman and Meng</a>, where several methods for computing Bayes factors, or normalizing constants, are explained and connected
(even the <a href="https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/">really bad ones</a>).
Here, I will use the <i>path sampling</i> method.
<br/>
Let us implement a simple model in Stan, for which we can explicitly compute the marginal likelihood.
Then we can try to estimate this marginal likelihood with the path-sampling method, and compare it with the exact value.
<br/>
A very simple model is the 'fair coin' example (taken directly from <a href="https://en.wikipedia.org/wiki/Bayes_factor">wikipedia</a>). The Bayes factor between a null-model \(M_0\) and a model that incorporates a bias, \(M_1\), can be computed directly as the quotient of the marginal likelihoods,
and moreover, the null model does not have any parameters.
<br/>
Let \(n\) denote the number of coin tosses, and \(k\) the number of 'heads'. Hence, the data \(D\) is given by the pair \((n,k)\).
Given a prior \(\theta \sim {\rm Beta}(\alpha, \beta)\) on the probability of throwing heads, we get the posterior
\(\theta \sim {\rm Beta}(\alpha + k, \beta +n-k)\), and we can compute the marginal likelihood exactly:
<br/>
\[ p(D|M_1)
= \int_0^1 p(D|\theta)\pi(\theta) d\theta
\]
\[
= {n \choose k}\int_0^1 \theta^{k+\alpha-1}(1-\theta)^{n-k+\beta-1} d\theta
= {n \choose k} B(k+\alpha, n-k+\beta)\,,
\]
<br/>
where \(B\) denotes the Beta function. Meanwhile,
<br/>
\[ p(D | M_0)
= {n \choose k} \left(\tfrac12\right)^k\left(1-\tfrac12\right)^{n-k}\,.
\]
<br/>
In this instance of path sampling
(and closely following <a href="https://projecteuclid.org/euclid.ss/1028905934">Gelman and Meng</a>),
we consider a family of (un-normalized) distributions \(Q_T\), indexed by a parameter \(T \in [0,1]\),
such that \(Q_0(\theta) = \pi(\theta)\) and \(Q_1(\theta) = p(D|\theta)\pi(\theta)\).
The normalizing constants are denoted by \(z(T)\).
Notice that \(z(0) = 1\) and \(z(1) = p(D|M_1)\).
<br/>
Let \(\Theta = [0,1]\) denote the support of \(\theta\). Since
<br/>
\[ \frac{d}{dT} \log z(T)
= \frac{1}{z(T)} \frac{d}{dT} z(T)
= \frac{1}{z(T)} \frac{d}{dT} \int_{\Theta} Q_T(\theta) d\theta\,,
\]
<br/>
we get that
<br/>
\[ \frac{d}{dT} \log z(T)
= \int_{\Theta} \frac{1}{z(T)} \frac{d}{dT} Q_T(\theta) d\theta\,,
\]
<br/>
and hence
<br/>
\[ \frac{d}{dT} \log z(T)
= \int_{\Theta} \frac{Q_T(\theta)}{z(T)} \frac{d}{dT} \log(Q_T(\theta)) d\theta\,.
\]
<br/>
When we denote by \(\mathbb{E}_T\) the expectation under \(P_T\), we get that
<br/>
\[ \frac{d}{dT} \log z(T)
= \int_{\Theta} P_T(\theta) \frac{d}{dT} \log(Q_T(\theta)) d\theta
= \mathbb{E}_T\left[ \frac{d}{dT} \log(Q_T(\theta)) \right]\,.
\]
<br/>
We can think of \(U(\theta, T) := \frac{d}{dT} \log(Q_T(\theta))\) as 'potential energy', and we get
<br/>
\[
\int_0^1 \mathbb{E}_T\left[U(\theta, T)\right] dT = \int_0^1 \frac{d}{dT} \log(z(T)) dT
\]
\[
= \log(z(1)) - \log(z(0)) = \log(z(1)/z(0)) =: \lambda\,.
\]
<br/>
Notice that in our case \(\lambda = \log(P(D|M_1))\).
We can interpret \(\int_0^1 \mathbb{E}_T\left[U(\theta, T)\right] dT\)
as the expectation of \(U\) over the joint probability density of \(T\) (with a uniform prior) and \(\theta\):
<br/>
\[ \lambda = \mathbb{E}\left[ U(\theta, T)\right]\,. \]
<br/>
This suggests an estimator \(\hat{\lambda}\) for \(\lambda\):
<br/>
\[ \hat{\lambda} = \frac{1}{N} \sum_{i=1}^N U(\theta_i, T_i)\,, \]
<br/>
where \((\theta_i, T_i)_i\) is a sample from the joint distribution of \(\theta\) and \(T\).
A way of creating such a sample, is first sampling \(T_i\) from its marginal (uniform) distribution, and then
sampling \(\theta_i\) from \(P_{T_i}\). This last step <i>might</i> require some Monte Carlo sampling.
<br/>
First, we need to choose a \(1\)-parameter family of distributions. A simple choice is the <i>geometric path</i>:
<br/>
\[ Q_T(\theta) = \pi(\theta)^{1-T} (\pi(\theta)p(D|\theta))^{T} = \pi(\theta) p(D|\theta)^T\,. \]
<br/>
In this case, the potential energy simply equals \(\frac{d}{dT}\log(Q_T(\theta)) = \log p(D|\theta)\).
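Before involving Stan, note that for this toy model the estimator \(\hat{\lambda}\) can be checked without any MCMC: the Beta prior is conjugate, so \(P_T\) is itself a \({\rm Beta}(\alpha + Tk,\, \beta + T(n-k))\) distribution and \(\theta_i\) can be drawn exactly. A sketch (using the same \(n\), \(k\) and flat prior as in the code below):

```python
import numpy as np
from scipy.special import betaln, comb

rng = np.random.default_rng(42)
n, k, alp, bet = 100, 10, 1.0, 1.0
N = 100000
Ts = rng.uniform(size=N)                       # T_i ~ Uniform(0,1)
# for the geometric path, Q_T is proportional to a Beta density,
# so theta_i | T_i can be sampled exactly (no MCMC needed)
thetas = rng.beta(alp + Ts * k, bet + Ts * (n - k))
# potential energy U = log p(D|theta)
U = np.log(comb(n, k)) + k * np.log(thetas) + (n - k) * np.log1p(-thetas)
lam_hat = np.mean(U)                           # estimate of log p(D|M_1)
exact = np.log(comb(n, k)) + betaln(k + alp, n - k + bet)  # = -log(n+1) here
```

The estimate should land close to \(\log\frac{1}{n+1} \approx -4.615\), matching the exact computation above.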
<br/>
<h2>The Stan model</h2>
Using the <a href="http://pystan.readthedocs.io/en/latest/">pystan</a> interface,
we can implement the model as follows.
The most important parts are the parameter \(T\) (declared in the data section),
and the "generated quantity" \(U\).
<pre class="brush: python">
## import some modules
import pystan
import scipy.stats as sts
import scipy.special as spl
import numpy as np
import multiprocessing
## define a Stan model
model = """
data {
int<lower=0> n;
int<lower=0, upper=n> k;
real<lower=0> alpha;
real<lower=0> beta;
real<lower=0, upper=1> T; // parameter for path sampling
}
parameters {
real<lower=0, upper=1> theta;
}
model {
theta ~ beta(alpha, beta);
increment_log_prob(T*binomial_log(k, n, theta));
// replaces sampling statement "k ~ binomial(n, theta)"
}
generated quantities {
real U;
U <- binomial_log(k, n, theta);
}
"""
## let Stan translate this into C++, and compile...
sm = pystan.StanModel(model_code=model)
</pre>
<h2> A parallel method </h2>
We need to generate samples \(T_i\) from \({\rm Uniform}(0,1)\), and then,
given \(T_i\) we generate a sample \(\theta_i\) from \(P_{T_i}\).
The simplest way is just to make a partition \(T_i = \tfrac{i}{N}\) of \([0,1]\)
and then for each \(i=0,\dots,N\), use the Stan model with \(T = T_i\).
Notice that for each \(i\), we will generate multiple (\(K\), say) samples from \(P_{T_i}\).
This method lends itself well for multi-processing,
as all \(N+1\) Stan sessions can run in parallel.
<pre class="brush: python">
## choose some parameters
k = 10 ## heads
n = 100 ## coin tosses
alp = 1 ## determines prior for q
bet = 1 ## determines prior for q
K = 100 ## length of each chain
N = 1000 ## number of Ts
## a function that prepares a data dictionary,
## and then runs the Stan model
def runStanModel(T):
coin_data = {
'n' : n,
'k' : k,
'alpha' : alp,
'beta' : bet,
'T' : T
}
fit = sm.sampling(data=coin_data, iter=2*K,
warmup=K, chains=1)
la = fit.extract(permuted=True)
return la['U'] ## U is a "generated quantity"
## make a partition of [0,1]
Ts = np.linspace(0, 1, N+1)
## start a worker pool
pool = multiprocessing.Pool(4) ## 4 threads
## for each T in Ts, run the Stan model
Us = np.array(pool.map(runStanModel, Ts))
</pre>
Let's have a look at the result. Notice that for \(\alpha=\beta=1\),
the marginal likelihood does not depend on \(k\) as
\[ P(D|M_1)
= {n \choose k} \frac{\Gamma(k+1)\Gamma(n-k+1)}{\Gamma(n+2)}
= {n \choose k} \frac{k! (n-k)!}{(n+1)!}
= \frac{1}{n+1}
\]
We could take for \(\hat{\lambda}\) the average of <i>all</i> \((N+1)\cdot K\) samples,
but in my experience, the standard error is more realistic
when I only take one sample per \(T_i\).
<pre class="brush: python">
## take one sample for each T
lamhat = np.mean(Us[:,-1])
## we can also compute a standard error!!
se_lamhat = sts.sem(Us[:,-1])
print "estimated lambda = %f +/- %f"%(lamhat, se_lamhat)
print "estimated p(D|M_1) = %f"%np.exp(lamhat)
exactMargLike = spl.beta(k+alp, n-k+bet) * spl.binom(n,k)
exactMargLoglike = np.log(exactMargLike)
print "exact lambda = %f"%exactMargLoglike
print "exact p(D|M_1) = %f"%exactMargLike
</pre>
In my case, the result is
<pre>
estimated lambda = -4.724850 +/- 0.340359
estimated p(D|M_1) = 0.008872
exact lambda = -4.615121
exact p(D|M_1) = 0.009901
</pre>
<h2> A serial method </h2>
Another method does not use parallel processing,
but uses the fact that the distributions \(P_{T_i}\) and \(P_{T_{i+1}}\) are very similar when
\(T_{i}-T_{i+1} = \frac{1}{N}\) is small.
When we have a sample from \(P_{T_i}\), we can use
it as the initial condition for the Stan run with \(T = T_{i+1}\).
We then only need very little burn-in (warm-up) time before we are actually sampling from \(P_{T_{i+1}}\).
We can specify the number of independent chains that Stan computes,
and also separate initial parameters for each of the chains.
Hence, we can take multiple samples \(P_{T_i}\) as initial choices for the next chain.
For this very simple model, this "serial" method is much slower than the parallel method,
but my guess is that it could be a lot faster for more complicated models.
I hope to prove this claim in a future post.
<pre class="brush: python">
## choose some parameters
k = 10 ## heads
n = 100 ## coin tosses
alp = 1 ## determines prior for q
bet = 1 ## determines prior for q
K = 100 ## size initial chain
N = 200 ## number of Ts
## initially, do a longer run with T=0
coin_data = {
'n' : n,
'k' : k,
'alpha' : alp,
'beta' : bet,
'T' : 0
}
fit = sm.sampling(data=coin_data, iter=2*K,
warmup=K, chains=1)
la = fit.extract(permuted=True)
## instead of length K,
## now use a much shorter chain (of length L)
L = 10
chains = 4
Us = np.zeros(shape=(N+1,L*chains))
Ts = np.linspace(0, 1, N+1)
## now run the 'chain of chains'
for i, Ti in enumerate(Ts):
coin_data['T'] = Ti ## take another T
## take some thetas from the previous sample
thetas = np.random.choice(la["theta"], chains)
initial_guesses = [{'theta' : theta} for theta in thetas]
fit = sm.sampling(data=coin_data, iter=2*L, warmup=L,
chains=chains, init=initial_guesses)
la = fit.extract(permuted=True)
Us[i,:] = la['U']
</pre>
Ok, let us have another look at the result. For this, I used the same code as above:
<pre>
estimated lambda = -5.277354 +/- 1.120274
estimated p(D|M_1) = 0.005106
exact lambda = -4.615121
exact p(D|M_1) = 0.009901
</pre>
As you can see, the estimate is less precise, but this is due to the fact that \(N=200\)
instead of \(1000\).
<br/>
I wrote and ran the above code fragments in a <a href="http://jupyter.org/">Jupyter notebook</a>. As compiling and sampling can take a lot of time, such an interface can be very convenient.
Please let me know if this was somehow useful for you or if you have any questions, and also please tell me if I did something stupid...
<br /></div>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-5113387799221052722016-05-02T11:29:00.000-07:002016-05-02T12:33:17.374-07:00A Sunset Colormap for Python<div dir="ltr" style="text-align: left;" trbidi="on">
Room Z533 in the Kruyt building in Utrecht can have a wonderful view, especially at sunset. Not too long ago, I took the following photograph.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnz-l5Wzg258THAZHkUGEqBx3Nnw2IpHLxBmRiERKVZozSIcARN6T3IEVOoXtvCq1q-fwr11A2BswTgihsT0cwj6jbMJzFwKwPUuBEpJSSIiFjYbTdOtevrtnFNI2Y6K_CgZEPC3tvj8A8/s1600/view-uithof.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnz-l5Wzg258THAZHkUGEqBx3Nnw2IpHLxBmRiERKVZozSIcARN6T3IEVOoXtvCq1q-fwr11A2BswTgihsT0cwj6jbMJzFwKwPUuBEpJSSIiFjYbTdOtevrtnFNI2Y6K_CgZEPC3tvj8A8/s400/view-uithof.jpg" width="400" /></a></div>
Having a love-hate relationship with <a href="http://matplotlib.org/examples/color/colormaps_reference.html" target="_blank">colormaps</a>, and in particular with <a href="http://matplotlib.org/users/colormaps.html" target="_blank">choosing</a> a good one, I INSTANTLY noticed the beauty of the gradient of the sky, and hence, its application as a colormap.<br />
<br />
In Python, it is easy to import figures as a matrix, and then extract the RGB values along a line. To make things easier, I first cropped out the part of the sky I wanted:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilgXeAzlyoxGkVaoaNWdaZHOxnImKBuUAxL4XHdSSoykuUdTMR7GW7qO41Q69YumAV9NMgw9XambgDFaIkwcj84VA0HzkabQN2JJwSUtmDfwl_BwlzDHfaIeBOc6XoW5kG8zXAnq4eAv0y/s1600/uithof-sunset2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilgXeAzlyoxGkVaoaNWdaZHOxnImKBuUAxL4XHdSSoykuUdTMR7GW7qO41Q69YumAV9NMgw9XambgDFaIkwcj84VA0HzkabQN2JJwSUtmDfwl_BwlzDHfaIeBOc6XoW5kG8zXAnq4eAv0y/s200/uithof-sunset2.jpg" width="12" /></a></div>
Then, I had a look at <a href="http://matplotlib.org/examples/pylab_examples/custom_cmap.html" target="_blank">this</a> website.<br />
Let's import the "slice of sky" in Python and have a look:<br />
<pre class="brush: python">
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap
from PIL import Image
im = Image.open("/path/to/sunset.jpg")
pix = im.load()
xs = range(im.size[1])  ## pixel rows: scan down the image
y = im.size[0] // 2     ## middle column (integer index, also in Python 3)
rs = [pix[y, x][0] for x in xs]
gs = [pix[y, x][1] for x in xs]
bs = [pix[y, x][2] for x in xs]
fig = plt.figure(figsize=(5,2))
ax = fig.add_subplot(111)
ax.plot(xs, rs, color='red')
ax.plot(xs, gs, color='green')
ax.plot(xs, bs, color='blue')
ax.set_xlabel("pixel")
ax.set_ylabel('r/g/b value')
plt.savefig('rgb-plot.png', dpi=300, bbox_inches='tight')
</pre>
<br />
This results in the following figure:
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoi3EQVldT4e2rCRcSJMGVofUp55-FWFtgziyZpoQFl85-55u_HuSiNJZ0JWOMD0YZQKyC6dWxonD6pdTJn7mqLNS0_u1nQ0oTxHp_SSEFCmVEzft6Gy3NQmnG_mE6l5SkzISpHeJaTpDZ/s1600/rgb-plot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoi3EQVldT4e2rCRcSJMGVofUp55-FWFtgziyZpoQFl85-55u_HuSiNJZ0JWOMD0YZQKyC6dWxonD6pdTJn7mqLNS0_u1nQ0oTxHp_SSEFCmVEzft6Gy3NQmnG_mE6l5SkzISpHeJaTpDZ/s320/rgb-plot.png" /></a></div>
<br /></div>
Now let's make the actual colormap.
<br />
<pre class="brush: python">
Dp = 1 ## determines the number of pixels between "nodes"
xs = np.linspace(0, 1, im.size[1]//Dp + 1) ## points between 0 and 1 (integer count)
idxs = range(0,im.size[1],Dp) ## indices in the original picture matrix
redlist = [rs[idx]/255. for idx in idxs] + [rs[-1]/255., 0]
greenlist = [gs[idx]/255. for idx in idxs] + [gs[-1]/255., 0]
bluelist = [bs[idx]/255. for idx in idxs] + [bs[-1]/255., 0]
## LinearSegmentedColormap wants these weird triples,
## where some end points are ignored...
redtuples = [(x, redlist[i], redlist[i+1]) for i, x in enumerate(xs)]
greentuples = [(x, greenlist[i], greenlist[i+1]) for i, x in enumerate(xs)]
bluetuples = [(x, bluelist[i], bluelist[i+1]) for i, x in enumerate(xs)]
cdict = {'red' : redtuples, 'green' : greentuples, 'blue' : bluetuples}
cmap = LinearSegmentedColormap('tbz533', cdict) ## choose a name
plt.register_cmap(cmap=cmap) ## newer matplotlib: matplotlib.colormaps.register(cmap)
## now we can use our new colormap!
</pre>
<br />
OK. Let's make some 70s wallpaper to celebrate the new colormap:
<br />
<pre class="brush: python">
xs = np.linspace(-5,5,1000)
ys = np.linspace(-5,5,1000)
zs = np.array([[np.sin(x)*np.cos(y) for x in xs] for y in ys])
fig = plt.figure(figsize=(5.5,5))
ax1 = fig.add_subplot(1,1,1)
C = ax1.pcolormesh(xs, ys, zs, cmap='tbz533')
fig.colorbar(C)
fig.savefig('my70swallpaper.png', dpi=200, bbox_inches='tight')
</pre>
<br />
This results in the following figure:
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHwZgkJ4zx6bW3NkMbp34hZXlFkmm4YxTT-W0blJpBnFbtGDZSc6tt11Vt9PFwrH_QugZWA2yjzaeoIuYX18ZdurxXVzBgIupGWT8zEdOCdZBVkH40LPYHjOD7cCwzva1lEqnc31c3rw-J/s1600/KruytZ533_example.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHwZgkJ4zx6bW3NkMbp34hZXlFkmm4YxTT-W0blJpBnFbtGDZSc6tt11Vt9PFwrH_QugZWA2yjzaeoIuYX18ZdurxXVzBgIupGWT8zEdOCdZBVkH40LPYHjOD7cCwzva1lEqnc31c3rw-J/s320/KruytZ533_example.png" /></a></div>
<br />
That's it! Please leave your own favorite colormap in the comments.Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com1tag:blogger.com,1999:blog-8385427779430646568.post-330392583210748222015-11-06T15:13:00.001-08:002016-05-02T11:58:01.857-07:00ETA for C++<div dir="ltr" style="text-align: left;" trbidi="on">
Simulations can take some time, and I'd like to know how long.
This is easy, right? Yes, it is. I've done it lots of times,
but every time I do, I curse myself for not reusing an old piece of code.
<br />
Most likely there is some standard, best way of doing this, but I haven't found it.
Most recently, I did the following:
I made a simple class "EtaEstimator" that can be updated after every (costly) time step
and asked for an estimated time of "arrival" at any point. Here's the header:
<pre class="brush: cpp">
// eta.hpp
#include <ctime>
#include <cmath> // floor
#include <iostream>
class EtaEstimator {
public:
EtaEstimator(int );
// construction starts the clock; pass the number of steps
void update();
void print(std::ostream & ) const;
private:
double ct, etl; // cumulative time, estimated time left
int n, N; // steps taken, total number of steps
clock_t tick; // time after update ((c) matlab)
// statics...
static const int secperday = 86400;
static const int secperhour = 3600;
static const int secperminute = 60;
};
std::ostream & operator<<(std::ostream & , const EtaEstimator & );
</pre>
The member definitions are straightforward too:
<pre class="brush: cpp">
// eta.cpp
#include "eta.hpp"
EtaEstimator::EtaEstimator(int N) :
ct(0.0), etl(0.0), n(0), N(N) {
tick = clock();
}
void EtaEstimator::update() {
clock_t dt = clock() - tick;
tick += dt;
ct += (double(dt)/CLOCKS_PER_SEC); // prevent integer division
// CLOCKS_PER_SEC is defined in ctime
++n;
etl = (ct/n) * (N-n);
}
void EtaEstimator::print(std::ostream & os) const {
double etlprime = etl;
int days = int(std::floor(etlprime / secperday));
etlprime -= days * secperday;
int hours = int(std::floor(etlprime / secperhour));
etlprime -= hours * secperhour;
int minutes = int(std::floor(etlprime / secperminute));
etlprime -= minutes * secperminute;
int seconds = int(std::floor(etlprime));
os << (days > 0 ? std::to_string(days) + " " : "")
<< hours << ":"
<< (minutes < 10 ? "0" : "") << minutes << ":"
<< (seconds < 10 ? "0" : "") << seconds;
}
std::ostream & operator<<(std::ostream & os,
const EtaEstimator & eta) {
eta.print(os);
return os;
}
</pre>
Typical usage of EtaEstimator would be the following:
<pre class="brush: cpp">
#include <iostream>
#include "eta.hpp"
// about to do lots of work...
int N = 1000;
EtaEstimator eta(N);
for ( int n = 0; n < N; ++n ) {
// do something very expensive
eta.update();
std::cout << "\rETA: " << eta << std::flush;
}
// ...
</pre>
PS: std::to_string is a C++11 feature; it can be avoided by using something like
<pre class="brush: cpp">
if ( days > 0 ) os << days << " "; // else nothing at all
</pre>
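A side note: clock() measures CPU time, not wall-clock time, so the estimate can drift for multithreaded programs. Since C++11 one could use std::chrono instead. Here is a minimal sketch of the same idea (my own class and names, not the EtaEstimator above):

```cpp
#include <cassert>
#include <chrono>

// Minimal wall-clock variant of the ETA idea using std::chrono.
class ChronoEta {
public:
    explicit ChronoEta(int N) :
        n(0), N(N), start(std::chrono::steady_clock::now()) {}
    void update() { ++n; } // call after each completed step
    // estimated seconds left, based on the average time per step so far
    double secondsLeft() const {
        using namespace std::chrono;
        double elapsed = duration<double>(steady_clock::now() - start).count();
        return n > 0 ? (elapsed / n) * (N - n) : 0.0;
    }
private:
    int n, N; // steps taken, total number of steps
    std::chrono::steady_clock::time_point start;
};
```

Usage would mirror the class above: construct with the number of steps, call update() after each one, and query secondsLeft() for the estimate.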
<br /></div>Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-2793043206232117222015-10-26T03:14:00.000-07:002016-05-02T12:43:32.836-07:00Carnes's life span distribution<div dir="ltr" style="text-align: left;" trbidi="on">
In a paper by <a href="http://www.ncbi.nlm.nih.gov/pubmed/16732401">Carnes <i>et al.</i></a>, a simple parametric but realistic life span distribution is given, and here I show how you can sample from it.
In addition, assuming a demographic equilibrium, the age of individuals will have a particular distribution. I will show what this distribution is, and again how to sample from it. Sampling ages instead of life spans might be useful for initializing simulations. I model epidemics, and I want my disease-free (a.k.a. virgin) population to have the 'right' age distribution.
<br/>
The life span distribution has hazard
\[\lambda(a) = e^{u_1 a + v_1} + e^{u_2 a + v_2}\,.\]
Typical parameters are given by \(u_1 = 0.1\), \(v_1 = -10.5\), \(u_2 = -0.4\), and \(v_2 = -8\), so that infants have a slightly increased hazard of dying, and after the age of 60, the hazard rapidly starts to grow, until it becomes exceedingly large around \(a = 100\).
<br/>
The survival function \(S(a) = {\rm Pr}(A > a)\), where \(A\) is the random variable denoting 'age at death' is given by
\(S(a) = e^{-\Lambda(a)}\), with \(\Lambda(a) := \int_0^a \lambda(a')da'\) denoting the cumulative hazard.
The cumulative hazard \(\Lambda\) is easily calculated:
\[ \Lambda(a) = \frac{e^{v_1}}{u_1}(e^{u_1 a}-1) + \frac{e^{v_2}}{u_2}(e^{u_2 a}-1)\,, \]
but its inverse, or the inverse of the survival function is more difficult to calculate.
<br/>
We need the inverse of \(S\), because sampling random deviates typically involves uniformly sampling a number \(u\in [0,1)\). The number \(S^{-1}(u)\) is then the desired deviate.
<br/>
<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgPXiRPygiWueigEJy6TUz4dJDe4Zw5lLl1ZIhjX_G0JoBC-y3pRRfpd6yWrzD48-4URp_LIJq7JJV2v_dowChF5nzj6DCkQ_dotAS5c1nyEp8ORS9jlRYdiXAeeggqnWt-yyQg_Kgf-me/s1600/carnes.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgPXiRPygiWueigEJy6TUz4dJDe4Zw5lLl1ZIhjX_G0JoBC-y3pRRfpd6yWrzD48-4URp_LIJq7JJV2v_dowChF5nzj6DCkQ_dotAS5c1nyEp8ORS9jlRYdiXAeeggqnWt-yyQg_Kgf-me/s320/carnes.png" /></a></div>
In a future post, I will show how to use the <a href="http://www.gnu.org/software/gsl/">GNU Scientific Library (GSL)</a> to sample deviates from \(A\).
<br/>
<br/>
Suppose that the birth rate \(b\) in our population is constant. A PDE describing the population is given by
\[\frac{\partial N(a,t)}{\partial t} + \frac{\partial N(a,t)}{\partial a} = -\lambda(a)N(a,t)\,,\]
where \(N(a,t)\) is the number (density) of individuals of age \(a\), alive at time \(t\).
The boundary condition (describing birth) is given by
\[N(0,t) = b\]
When we assume that the population is in a demographic equilibrium, the time derivative with respect to \(t\) vanishes, and we get an ODE for
the age distribution:
\[\frac{\partial N(a)}{\partial a} = -\lambda(a) N(a)\,,\quad N(0) = b\,,\]
where we omitted the variable \(t\). This equation can be solved:
\[\frac{1}{N}\frac{\partial N}{\partial a} = \frac{\partial \log(N)}{\partial a} = -\lambda(a)
\implies
N(a) = c \cdot e^{-\Lambda(a)}\]
for some constant \(c\). Since \(b = N(0) = c \cdot e^{-\Lambda(0)} = c\), we have
\[N(a) = b\cdot e^{-\Lambda(a)}\,.\]
Hence, we now know the PDF of the age distribution (up to a constant). Unfortunately, we can't get a closed form
formula for the CDF, let alone invert it. Therefore, when we want to sample, we need another trick.
I've used a method from <a href="http://numerical.recipes/">Numerical Recipes in C</a>. It involves finding a dominating function of the PDF,
with an easy, and easily invertible, primitive.
<br/>
Let's just assume that \(b = 1\), so that the objective PDF is \(N(a) = e^{-\Lambda(a)}\). Please notice that \(N\) is not a proper PDF, since, in general, it does not integrate to \(1\). We need to find a simple, dominating function for \(N\).
A stepwise defined function might be a good choice, since the hazard is practically zero when the age is below \(50\), and then increases rapidly. We first find a comparison cumulative hazard \(\tilde{\Lambda}\) that <i>is dominated by</i> the actual cumulative hazard \(\Lambda\). Many choices are possible, but one can take for instance
\[ \tilde{\Lambda}(a) = \left\{ \begin{array}{ll} 0 & \mbox{if } a < a_0 \\
\lambda(a^{\ast})\cdot (a-a_0) & \mbox{otherwise} \end{array}\right. \]
where
\[ a_0 = a^{\ast} - \frac{\Lambda(a^{\ast})}{\lambda(a^{\ast})}\,. \]
The constant \(a_0\) is chosen such that the cumulative hazards \(\Lambda\) and \(\tilde{\Lambda}\) are tangent at
\(a^{\ast}\).
<br/>
Since \(\Lambda\) dominates \(\tilde{\Lambda}\), the survival function \(\tilde{S}\) defined by
\(\tilde{S}(a) = e^{-\tilde{\Lambda}(a)}\) dominates \(S\).
It is easy to find the inverse of \(a\mapsto\int_0^a \tilde{S}(a')da'\), and hence we can easily sample random
deviates from the age distribution corresponding to \(\tilde{\Lambda}\). In order to sample from the desired
age distribution, one can use a <i>rejection method</i>: (i) sample an age \(a\) from the easy age distribution.
(ii) compute the ratio \(r = S(a)/\tilde{S}(a) \leq 1\). (iii) sample a deviate \(u \sim \mbox{Uniform}(0,1)\).
(iv) <i>accept</i> the age \(a\) when \(u \leq r\), and <i>reject</i> \(a\) otherwise. Repeat these steps until an age \(a\) was accepted.
<br/>
<br/>
The only thing we still need to do, is to find a good value for \(a^{\ast}\). To make the sampler as efficient
as possible, we want to minimize the probability that we have to reject the initially sampled age \(a\) from \(S\).
This boils down to minimizing
\[\int_0^{\infty} \tilde{S}(a)da =
a_0 + \frac{1}{\lambda(a^{\ast})} =
a^{\ast} + \frac{1 - \Lambda(a^{\ast})}{\lambda(a^{\ast})}\,.
\]
The derivative of \(a^{\ast} \mapsto \int_0^{\infty} \tilde{S}(a)da\) equals
\[
\frac{(\Lambda(a^{\ast}) - 1)\lambda'(a^{\ast})}{\lambda(a^{\ast})^2}
\]
and thus, we find an extreme value for \(\int_0^{\infty} \tilde{S}(a)da\) when \(\Lambda(a^{\ast}) = 1\) or when
\(\lambda'(a^{\ast}) = \frac{d\lambda}{da^{\ast}} = 0\). The second condition can only correspond to
a very small \(a^{\ast}\), and therefore will not minimize \(\int_0^{\infty} \tilde{S}(a)da\). Hence, we have to
solve \(\Lambda(a^{\ast}) = 1\). When we ignore the second term of \(\Lambda\), we find that:
\[ a^{\ast} = \log(1 + u_1 \exp(-v_1))/u_1\]
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7OXDECXkgJKAgTGqpzQdy42uJPqwImGpp96tn-OPU6oVVGZk_p7ZhYqlkfQ5tDd3WN1gj7G3JwEQ9X5G9rRdHOotkxY91rABtQRv6nco7duYq5cN4YljeXkZi2yKEEkPMOX-XfaZ5FxFg/s1600/carnes-dom.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7OXDECXkgJKAgTGqpzQdy42uJPqwImGpp96tn-OPU6oVVGZk_p7ZhYqlkfQ5tDd3WN1gj7G3JwEQ9X5G9rRdHOotkxY91rABtQRv6nco7duYq5cN4YljeXkZi2yKEEkPMOX-XfaZ5FxFg/s320/carnes-dom.png" /></a></div>
<br/></div>Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-88715191374758001272015-09-18T03:17:00.001-07:002016-05-02T11:58:01.862-07:00Basic statistics using higher-order functions in C++<div dir="ltr" style="text-align: left;" trbidi="on">
I do a lot of individual-based simulations, and often I want to calculate some population statistics 'on the fly'. I found that it can be helpful to use C/C++'s basic functional capabilities. <br />
<br />
A higher-order function is a function that takes other functions as arguments (or returns a function as a result).
In C/C++, functions can be passed to other functions, but the notation can be a bit awkward, as one needs to pass
a pointer to the function. If the function is a method of some class, the notation gets even more involved.
You can make your life easier by using a typedef.<br />
<br />
The following code snippet shows the header file of a simple example.
The goal is to calculate some statistics on a population of "Critters". These Critters have multiple "traits", and
the traits are accessed by methods of the class Critter of signature "double Critter::trait() const".
Suppose that we want to calculate the mean of the trait "happiness". It's trivial to write a function that
does this, but then we might also get interested in the average "wealth". The function that calculates
the average wealth is identical to the function that calculates average happiness, except that happiness is replaced by wealth. We can get rid of this redundancy by defining a typedef Trait for a pointer to a (const) method of Critter, and writing a function that takes the average of an <i>arbitrary</i> Trait.<br />
<br />
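Before the actual code, here is a tiny self-contained illustration of the idea (my own toy example, not the code from the gists): Trait is a typedef for a pointer to a const member function of Critter, so a single function can average any trait.

```cpp
#include <cassert>
#include <vector>

// Toy Critter with two traits, accessed by const methods.
class Critter {
public:
    Critter(double h, double w) : h(h), w(w) {}
    double happiness() const { return h; }
    double wealth() const { return w; }
private:
    double h, w;
};

// Trait is a pointer to a const member function of Critter.
typedef double (Critter::*Trait)() const;

double averageTrait(const std::vector<Critter>& pop, Trait x) {
    if (pop.empty()) return 0.0;
    double sum = 0.0;
    for (const Critter& c : pop) sum += (c.*x)(); // de-reference and call
    return sum / pop.size();
}
```

With this, averageTrait(pop, &Critter::happiness) and averageTrait(pop, &Critter::wealth) share all their code.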
<script src="https://gist.github.com/chvandorp/e765e161d37e76ad4493.js"></script>
Let us now look at the source file. The most important things to notice are... <br />
(1) whenever you pass a member "trait" (e.g. wealth) of Critter to a function, you should pass it as "&amp;Critter::trait" (i.e. pass the member pointer).<br />
(2) when you want to evaluate Trait x of Critter alice, you'll need to de-reference x, and call the resulting function: "(alice.*x)()" <br />
<br />
<script src="https://gist.github.com/chvandorp/f2667a5435daa2210b32.js"></script>
If you want to play with this example, put the header in a file called "main.hpp", and the source in a file called "main.cpp", and compile main.cpp by typing "g++ main.cpp -o critters" in your terminal (I assume that you are using Linux and have the gcc compiler installed).
<br /></div>
Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-67419507366385682152015-07-10T07:20:00.001-07:002016-05-02T11:57:31.245-07:00Blob plots, like violin plots, but more preciseViolin plots are beautiful, useful and much clearer than drawing several histograms on the same plot.<br />
In my opinion, the smoothing that the Python implementation applies does not really make up for the loss of precision with its visual appeal.<br />
Why not plot the whole distribution, then?<br />
<br />
I modified the violin plot source code that I found <a href="http://pyinsci.blogspot.nl/2009/09/violin-plot-with-matplotlib.html">HERE</a>.<br />
<br />
And produced... let's call them blob plots:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifheChRlTxhwdoMoD4Y9P7gDzUpUQoPpqxOxu-5VmbPWKRBpQefYujcmS4sDo-Vn7-dLkCQ0bpcMHWmbTi4-Y1TSh9jieBRsdMC0TnLr83S61dk6SiMFW5ElvonkDR3r5_YA6G5ySP5Io/s1600/blobplot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifheChRlTxhwdoMoD4Y9P7gDzUpUQoPpqxOxu-5VmbPWKRBpQefYujcmS4sDo-Vn7-dLkCQ0bpcMHWmbTi4-Y1TSh9jieBRsdMC0TnLr83S61dk6SiMFW5ElvonkDR3r5_YA6G5ySP5Io/s400/blobplot.png" width="400" /></a></div>
<br />
Here is the code.<br />
<br /><script src="https://gist.github.com/chvandorp/a436afffa5d7fe8f18fd.js"></script>
Feel free to take it and do whatever you want with it.<br />
If you have ideas to improve it, let me know!<br />
If you like the result, go say thanks to the violin plot maker (I did very little)Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8385427779430646568.post-8411271404113128472015-06-26T07:32:00.000-07:002016-05-02T11:58:01.859-07:00A simple class wrapper for parallelism in C++<div dir="ltr" style="text-align: left;" trbidi="on">
Concurrency can be extremely complicated, and causes problems that will
<a href="https://en.wikipedia.org/wiki/Concurrency_(computer_science)">haunt you in your dreams</a>.
The classical libraries in C/C++ don't protect you from this horror in
any way, and you will have to figure it out yourself. Parallelism is
supposed to be a lot easier, but C/C++ does not have standard libraries
like---for instance---Python's parallelism modules. The <a class="wiki_link_ext" href="http://www.boost.org/" rel="nofollow">boost</a> libraries, however, provide an extensive interface for concurrency and parallelism.<br />
If you don't want to use boost, don't panic. There are other options. The POSIX <a class="wiki_link_ext" href="https://en.wikipedia.org/wiki/POSIX_Threads" rel="nofollow">pthread</a>
library provides a down-to-the-metal concurrency model. It took me a
while to find out what it is all about, and I haven't successfully
applied all its possibilities. What I have managed to apply is a
so-called "worker-pool" model. This is one of the easier concurrency
applications (it falls under the parallelism category) of the pthread
library, but can be quite useful. Here I will demonstrate a "wrapper" that C++ classes can inherit.<br />
<br />
Suppose that you have a class Fun, that needs to do some computations. We declare Fun in---say---Fun.hpp:
<br />
<script src="https://gist.github.com/chvandorp/33b39d004d0026873a10.js"></script>
Fun inherits the worker pool functionality from the class "WorkerPoolWrapper" (imported from WorkerPoolWrapper.hpp).
The class WorkerPoolWrapper has a "pure virtual" member executeJob(void* ).
You, the user, must specify what you want to compute in your own definition of the executeJob method.
Besides implementing executeJob, you must also initiate the worker pool, and somewhere before Fun's destructor returns, the threads must be "joined", i.e., all worker threads must finish their execution.
In this case, I use the constructor and destructor of Fun to accomplish these things:
<br />
<script src="https://gist.github.com/chvandorp/e9276effd64d38202f01.js"></script>
The methods initWorkerPool and waitForWorkerPoolToExit are inherited from WorkerPoolWrapper. Let's use Fun to compute the number of primes pi(n) below a number n. We overload operator() as follows:
<br />
<script src="https://gist.github.com/chvandorp/3836891a5d9296c36782.js"></script>
Notice that this implementation of pi(n) is not the most efficient one. It checks for every integer i between 0 and n whether i is prime or not. This prime test is performed by executeJob.
In the background, WorkerPoolWrapper has a list of jobs that have to be executed. Jobs can be added to this job list using addNewJob(void* ). Once executed, the result of a job must somehow be stored in the job again. Above, the number pi is the sum of the array ns, which makes sense when we look at the implementation of executeJob:
<br />
<script src="https://gist.github.com/chvandorp/d97ded1320d7483bd0de.js"></script>
Hence, executeJob transforms the number i pointed to by job into 0 if i is composite, or 1 if i is prime, such that the sum of the i's equals pi(n).
Before gathering the results in Fun::operator(), we called syncWorkerThreads(). This method makes the program wait until every job in the job list has been executed.
<br />
Using the functionoid Fun now works as follows:
<br />
<script src="https://gist.github.com/chvandorp/8e7630e0252301976f8c.js"></script>
The class WorkerPoolWrapper is declared here:
<br />
<script src="https://gist.github.com/chvandorp/2d1e15f49a442e5b5fbf.js"></script>
And the members are defined here:
<script src="https://gist.github.com/chvandorp/b54fb35ebdda96794df5.js"></script>
The credits for combining pthread with C++ classes go to <a href="http://stackoverflow.com/users/131930/jeremy-friesner">Jeremy Friesner</a>.
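As an aside: since C++11 the standard library itself provides threads, mutexes and condition variables, so the same worker-pool model can be sketched without raw pthreads. The following is my own minimal sketch (not the WorkerPoolWrapper above), with jobs stored as std::function objects:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal worker pool: n threads pull jobs from a shared queue.
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([this]{ workerLoop(); });
    }
    ~SimplePool() { // drain the queue, then join all worker threads
        { std::lock_guard<std::mutex> lk(mtx); done = true; }
        cv.notify_all();
        for (std::thread& t : workers) t.join();
    }
    void addNewJob(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(mtx); jobs.push(std::move(job)); ++pending; }
        cv.notify_one();
    }
    void syncWorkerThreads() { // block until every queued job has run
        std::unique_lock<std::mutex> lk(mtx);
        idle.wait(lk, [this]{ return pending == 0; });
    }
private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(mtx);
                cv.wait(lk, [this]{ return done || !jobs.empty(); });
                if (done && jobs.empty()) return;
                job = std::move(jobs.front());
                jobs.pop();
            }
            job(); // run outside the lock
            { std::lock_guard<std::mutex> lk(mtx); --pending; }
            idle.notify_all();
        }
    }
    std::vector<std::thread> workers;
    std::queue<std::function<void()> > jobs;
    std::mutex mtx;
    std::condition_variable cv, idle;
    unsigned pending = 0;
    bool done = false;
};
```

The prime-counting example above could then queue one lambda per integer instead of overriding executeJob; each job writes its 0/1 result into its own slot of a preallocated array, so no extra locking is needed for the results. (On Linux, compile with -pthread.)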
</div>Christiaan van Dorphttp://www.blogger.com/profile/09480120443706387617noreply@blogger.com0