Saturday, 9 December 2023

Contributing code to Stan

This post is about me trying to contribute some code to the Stan codebase. My colleagues and I often use the Dirichlet-Multinomial distribution (e.g. in this preprint), which is not a native distribution in the Stan language (v2.33, December 9, 2023). It is not hard to define this as a user-defined distribution, but it still would be convenient to have this as a native function; it helps with avoiding bugs, repetitive code or having to #include files. This is going to be my first pull request, so mistakes will be made.
I based all this on tutorials from the Stan Math website (specifically the sections on "adding new functions" and "adding new distributions"), and the stanc webpage. The first link explains how to add functions to the Stan Math library, and the second explains how to make such functions available in the Stan language.

Some updates about the submission process:

2023-12-09: After trying three times, I managed to pass all automated tests for the Stan Math library (see Steps 2 and 7 below). Now I'm waiting for code reviews from a Stan developer.

2023-12-23: Last week a Stan developer reviewed my code, made improvements and requested some changes. I discuss the improvements below (Step 7), as they are really instructive. My PR was then approved and merged with the "develop" branch. I then submitted PRs for the documentation and the stan-to-c++ compiler. This went through without any issues. This means that the Dirichlet-multinomial distribution should be available in Stan version 2.34.

2024-01-15: I posted this blog post on the Stan furum and received some useful feed-back. I've added some notes below.


All code and documentation published in this blog post is licensed under the following licenses:

Step 1: Coding the distribution as a user-defined function in Stan

A good place to start is to write a working Stan model that we can use for testing and benchmarking. The Dirichlet-Multinomial distribution is a Multinomial distribution in which the probability vector (simplex) is sampled from a Dirichlet distribution. We can actually use this fact to generate random samples in Stan. Lets start there, because it is easy.

To define a custom RNG in Stan, you write a function in the functions block with name distribution_name_rng where distribution_name is the name of your distribution. In our case, that's dirichlet_multinomial. The RNG function typically uses native Stan RNGs to generate random samples, which are then transformed, or rejected until they satisfy some condition, or whatever you need. In this case, we first sample a probability vector from the Dirichlet distribution using dirichlet_rng(alpha). Then, we use this probability vector as input for the Multinomial distribution to get a integer array with multinomial_rng(dirichlet_rng(alpha), N). Here N is the "population size" parameter of the multinomial distribution (also referred to as the "number of trials"). Using some python code, we can use the Stan model to sample random Dirichelet Multinomial deviates \(X\) and plot \(X / N\) on a \(K\)-simplex. Here we generate \(40\,000\) samples using \(N = 1000\) and \(K = 3\).



Next, we define a probability mass function (PMF), which, like everything has to be defined on the log scale in Stan. The PMF of the Dirichlet Multinomial distribution is \[ p(x | \alpha) = \frac{N B(\alpha_0, N)}{\prod_{k : x_k > 0} x_k B(\alpha_k, x_k)}\] where \(\alpha_0 = \sum_{k=1}^K \alpha_k \), \( N = \sum_{k=1}^K x_k \) and \(K\) is the number of categories. The function \(B\) is the Beta function. We can code this in Stan almost verbatim using the lbeta function, which computes \(\log \circ B\), and replacing products with sums.

This implementation contains a small bug, as it does not work for \(x_1 = x_2 = \cdots = x_K = 0\), but we'll fix that later.
To do a simple test, I colored randomly sampled points (using \(\alpha = (2.4, 4.0, 6.0)'\) and \(N = 1000\)) scaled to fit in a 3-simplex in three ways. First I used the scipy.stats implementation of the Dirichlet-Multinomial PMF, then I used the dirichlet_multinomial_lpmf function defined in the functions block (and exponentiated the values). Finally, I used a Gaussian KDE (from scipy.stats) to compute an emperical density of points in the sample (this is how I colored points in the simplex plot above). The tree methods of coloring give very similar results.

Step 2: Adding the distribution to the Stan math library

Stan is built on top of the Stan math library. All native functions are defined here. We first have to clone the Stan math library from github with git clone git@github.com:stan-dev/math.git. Documentation for the Stan math library can be found here. It shows you how to write programs that use the math library, giving you access to automatic differentiation, Stan functions, etc. The distributions can be found in the folder math/stan/math/prim/prob. Here we want to add the files dirichlet_multinomial_rng.hpp for the random number generator, dirichlet_multinomial_lpmf.hpp for the probability mass function and dirichlet_multinomial_log.hpp which is only there for legacy reasons. In the same folder, we also find the Dirichlet distribution, the Multinomial distribution and the Beta-Binomial distribution. If we somehow combine these three, we should get our Dirichlet-Multinomial distribution, so these examples are very useful. Let's start with the RNG, which is easiest.

We'll talk a bit about the docstring later, so let's ignore that for now. The actual logic of the code is really simple: we make use of the Dirichlet and Multinomial RNGs. Therefore, we have to #include their header files. The difficulty is understanding some C++ tricks, error checking, and working with the Eigen library. The first observation is that Stan math functions are defined in the namespace stan::math. Next, we see that the parameter alpha has a somewhat complicated type

const Eigen::Matrix<double, Eigen::Dynamic, 1>& alpha

This means that alpha is a \(K \times 1\) matrix (a vector) with double entries. \(K\) is not determined at compile time. The variable function is used for generating error messages and contains the name of the C++ function. The alpha_ref variable is the result of the to_ref function call. What this does is "evaluating" Eigen expressions. Eigen expressions can be "lazy" in the sense that they are not immediately evaluated. In this case, alpha could be the result of some complicated vector computation. Here, we need the actual values of alpha. If alpha is already evaluated, then to_ref does nothing.
Next, we validate the input. The variable alpha has to be positive, and not infinite. There is also a check_positive test, that does allow positive infinity. I guess \(+\infty\) could make sense if only one of the \(\alpha_k\) is infinite. But if multiple \(\alpha_k\) are infinite, the distribution is no longer well defined. So it's better to avoid infinities. The integer N has to be non-negative, and all integers are finite.
As mentioned, the logic is identical to the user-defined Stan function. There is one exception: the case where \(N = 0\). If we look at the RNG for the Binomial distribution (v4.7.0 (September 5, 2023)), we find that \(N\) has to be positive:

check_positive(function, "number of trials variables", N);

and using this RNG with \(N=0\) indeed results in an exception:

Exception: multinomial_rng: number of trials variables is 0,
but must be positive!

On the other hand, the multinomial_lpmf function works fine when \(N = \sum_{k=1}^K n_k = 0\), and return \(0 = \log(1)\). So I don't think the check_positive check is correct. I might file an issue on GitHub in the future. For now, we have to make sure that the case \(N = 0\) is handled separately by our RNG: we just return the zero vector.

Next, we are going to write the code for the probability mass function. The distribution is discrete, so we have to define a function than ends with _lpmf instead of _lpdf. If you read the example for "adding new distributions" on the Stan Math page, you'll find that a lot of effort is put in computing the gradients of the function, and also making sure the function is properly vectorized. The binomial example that I based my implementation on did not do any of these things. It relies on automatic differentiation and is currently not vectorized. In fact none of the multivariate discrete distributions are currently vectorized. Therefore I did not try to implement vectorization for the Dirichlet-Multinomial either. In the non vectorized case, it also does not make much sense to manually compute derivatives. The log-PMF is just a sum of log-Beta functions. I'll leave vectorization and gradients for a future PR.

Alas, as it turns out, the Stan team did not let me get away with this so easily. See section "Revisions" in "Step 7" below. Thankfully, I got a lot of help. Also, the version of the code with analytic derivatives is somewhat faster than my lazy version shown here.

Again, the logic of the actual computations are pretty simple and very similar to our user-defined Stan function. We also perform similar domain checks as in the RNG function. In addition, we check that ns and alpha have compatible sizes. Then we define lp the log-probability. The type of lp has to be the same as the base type of alpha, and this can be achieved with return_type_t. If you had multiple parameters where some have base type double and others Stan's autodiff type var, then passing all parameter types to return_type_t would result in the appropriate return type var. The template parameter T_prior_size is the type of the "prior size" (or intensity) parameter alpha. There are some restrictions on what alpha is allowed to be, and the template parameter require_eigen_col_vector_t<T_prior_size>* = nullptr ensures that alpha has the correct Eigen type. I would like to understand this better, but that will have to wait.
Next let's look at the propto template parameter. This is used for dropping constant terms in the computation of the log-PMF. In a Stan sampling statement, we don't have to compute any constant terms of the log-probability or log-density, as constant terms do not matter in MCMC. As \(n\) is an integer vector and therefore always constant "data", we don't have to compute log(sum_ns) or log(ns[i]). Not computing these logs can same time and energy. In some cases, we do want to know the absolute log-PMG. For instance in the generated quantities block when we want to use log-probabilities for model comparison (WAIC, LOO_IC, etc.). It is also possible that alpha is data (meaning that the base type is double. In that case, there is nothing to compute if we're only interested in the log PMF up-to constants. Hence, we return 0.0. To test if "summands" depending on the "prior size" have to be included in the log PMF, if propto == true, we can use include_summand<propto, T_prior_size>::value. Next, we compute the terms of the log PMF that include alpha, and in the same loop, we compute the sum of the ns. If sum_ns is zero (\(N = 0\)), then lp is still zero and we can simply return this value.
Then we compute \(\alpha_0 = \sum_{k=1}^K \alpha_k\) using Eigen's sum method. Note that the type is value_type_t<T_prior_size>, which is the base type of alpha. The difference between return_type_t and value_type_t is that the former takes multiple arguments and returns the most "general" base type that fits all of them. The arguments of return_type_t don't have to be containers.
If propto is false, we still have to add the terms that only depend on ns.

To give the user access to the functions we just defined, we have to add #include directives in the file prob.hpp in folder stan/math/prim. This file contains a list of include directions for all distributions in alphabetical order. So we just add our new files in the right place

Let's now write a C++ program that imports the Stan library to test our new function. Instructions on how to use the Stan math library in your project can be found here. In this example, I will simply sample a random vector from the Dirichlet-Multinomial distribution with fixed intensity. I then compute the log-probability of that sample. I put some effort in generating "real" random numbers using std::random_device, and I wrote a function to print std::vector. It helps to read a bit about the Eigen library, which has some methods for formatting matrices and vectors.

This program prints something like:

random sample from dirichlet_multinomial(100, (10.1,  3.5,  7.2)) = (30, 8, 62)
log dirichlet_multinomial((30, 8, 62) | (10.1,  3.5,  7.2)) = -8.60311

To compile a program that makes use of the Stan math library, Stan provides a useful tool called "standalone". Compilation of the above code is done as follows

make -f path_to_stan_math/make/standalone hello_dirmult

Building the Stan math library can be done in a similar way

make -j4 -f path_to_stan_math/make/standalone math-libs

The -j4 flag determines the number of CPU cores used for the task (or threads). I have an old core i5. So I'll use 4.

If we look at the makefile in the Stan math directory, we see that there is a target called doxygen. This is for building the documentation with doxygen using docstrings in the code. In our case, we have used some LaTeX equations (using delimiters \f$ ... \f$ and \f[ ... \f]), that are rendered by doxygen. Let's check that this is rendered correctly, and that we didn't make any TeX errors. We can build the documentation with make doxygen. We then find the HTML pages in doc/api/html and view them in a browser. Here's a screenshot
And here's one for the RNG documentation
Building the docs revealed several mistakes (and I left at least one in the screenshots above), so doing this was definitely helpful.

Step 3: Writing unit tests

This step may sound a bit boring, but I found out that it is quite important. First of all the Stan team will not accept your code without any unit tests. Second, testing your code helps. I managed to catch two errors in my (very small) piece of code, all because of writing tests. Stan makes use of the GoogleTest framework. There are lots of examples of tests for distributions in the folder math/test/unit/math/prim/prob/. I took multinomial_test.cpp as a template, which I modified. One of the tests checks that a exception is thrown if one of the intensity parameters are infinite. For the multinomial distribution, the function checks that the parameters form a simplex. Hence, it also checks of the values are finite, as they are in the interval \((0,1)\). In our case, the intensities only have to be positive, and so initially I changed check_simplex into check_positive. However, infinity is also positive and therefore this check does not raise an error if one of the parameters is infinite. Therefore I changed the check to check_positive_finite (provided in math/stan/math/prim/err.hpp) and this time my test passed. The other error was produced by a case in which all counts are zero. In our definition of the PMF, we would lead to \(0 / 0\) which gives a NaN. However, if the number of trials (or population size) is zero, then it makes sense to define the probability of the event \((0,0,\dots,0)'\) as \(1\), since this is the only possible event. I fixed this by checking for this condition in dirichlet_multinomial_lpmf. I will not show you all the unit tests I wrote/modified but just to give an idea, this is the first one where I compare the value of the Dirichlet-Multinomial PMF with a known value (calculated with scipy.stats in Python)

What's happening here is that TEST is a macro defined in gtest.h. The identifiers ProbDistributionsDirichletMultinomial and DirichletMultinomial are used to name the test function. The test function's body is defined in curly brackets. We then create a real vector \(\theta = (2.0, 3.0, 5.0)'\) and an integer vector \(n = (1,2,3)'\) and test if \[ \log(p(n | \theta)) = -2.477938 \] where \(p\) is the PMF of the Dirichlet-Multinomial distribution.
You can run all or a subset of tests with the runTests.py python script. For instance

python3 runTests.py test/unit/math/prim/prob/dirichlet_multinomial_test

runs only the new tests. Warning: running all tests takes a lot of time.

Step 4: Adding the distribution to the Stan documentation

Before we expose our new distribution to the user (i.e. make it a native function in the Stan programming language), we have to write some documentation for the Stan functions reference. We can clone the documentation with git clone git@github.com:stan-dev/docs.git. The documentation is written in R markdown. This is the part that I added to multivariate_discrete_distributions.Rmd in the folder docs/src/functions-reference

This is largely based on the multinomial example. Note that all functions have an HTML-commented function signature. The README.md says this about these comments:
The Stan Functions Reference contains HTML comments which describe the function signature for all functions. The script extract_function_sigs.py is used to scrape these signatures into a plain text file.
This feature exists for Stan-specific syntax highlighting. Syntax highlighters (see this link for a list) have to know which variable names correspond to native functions so that they can be colored distictly. To build the documentation, use the python script build.py as follows

python3 build.py 2 34

The numbers 2 and 34 correspond to the major and minor version numbers. The R markdown file contains strings such as `r since("2.34")` which correspond to the version number when a function became available. It was not clear to me what version I should use in my new documentation. At the time of writing, the version was 2.33, and I reasoned that this new distribution would become available in the next version: 2.34.
The build script should work if you have R studio installed, which I don't (I have rstudio-server, in case you're wondering how I can survive). In such a case, you'll have to install both pandoc and pandoc-citeproc with sudo apt install (on Ubuntu 22.04, that is). Make sure the R packages bookdown, dplyr, ggplot2, kableExtra are installed with install.packages(). However, I also found that I needed to install the Ubuntu packages libfontconfig1-dev, libcurl4-openssl-dev, libxml2-dev with sudo apt install. Here's a screenshot of the new page in the functions reference pdf.


Step 5: adding the distribution to the stanc3 compiler

In terms of lines of code, this step is really simple. The stanc3 compiler is written in the OCaml language which might be a bit unfamiliar (it certainly is for me). Also the OCaml ecosystem has a bit of a steep learning curve. We start by cloning stanc3 using git clone git@github.com:stan-dev/stanc3.git. The file that contains all native functions and distributions can then be found in stanc3/src/middle/Stan_math_signatures.ml. We just have to add lines for dirichlet_multinomial_rng and dirichlet, and we can basically copy the code from the signatures of multinomial_rng and multinomial.

Instructions on how to install all dependencies and stanc can be found here. the stanc repository contains a script that does basically everything for you. However, if you want to install OCaml yourself, check this page. You will need the dune package. I started working on this a while ago, and when I picked up the work recently, I found that I had to update my OCaml version. This can be done with opam switch create [version number] (with the right version number).

Naturally, the stanc compiler has its own tests which includes a couple of Stan programs to test function signatures. The Stan program stanc3/test/integration/good/function_signatures/distributions/rngs.stan contains function calls for all RNGs. We can add our new RNG to this file by adding the line

  ns = dirichlet_multinomial_rng(alpha, 20);

The variables ns and alpha are already defined or declared. I then added a folder stanc3/test/integration/good/function-signatures/distributions/multivariate/discrete/dirichlet_multinomial and created a file dirichlet_multinomial_lpmf.stan in this folder. I basically copied the content of the test file for the Multinomial distribution (located at stanc3/test/integration/good/function-signatures/distributions/multivariate/discrete/multinomial) and changed multinomial to dirichlet_multinomial. The fact that the multinomial distribution takes simplices and the Dirichlet Multinomial positive vectors does not seem to matter for these tests: it's just about function signatures. This is the test file that I added:

Finally, Use dune runtest to check changes in the code, and then run dune promote to accept these changes. Otherwise the integration tests (see Step 7 below) will fail. More instructions on exposing new functions can be found here

Step 6: Compiling a model with cmdstanpy

To test if everything works as expected, I will now write a Stan model that can be used to estimate intensity parameters of the Dirichlet-Multinomial distribution, given some data that was sampled from this distribution with known parameters. To make life easy (i.e. no messing around with cmdstan, json files etc.), I want to do this in Python using the cmdstanpy interface. Let's first write our Stan model.

This is pretty straightforward. We have \(K\) categories, \(N\) observations \(X\) and for sampling pseudo-observations, we also pass a population size parameter \(M\) to the model. In the parameters block, we define \(\alpha \in \mathbb{R}_+^K\), which is the intensity, and in the model block we give \(\alpha\) a weakly-informative log-normal prior. We have to write the "sampling statements" for \(X\) in a for loop, as the Dirichlet-Multinomial distribution does not allow vectorization at this point. In the generated quantities block, we sample "fake data" \(Y\) and compute the likelihood of the data \(X\), again in a for loop. This is not the most exciting model, but it's good enough for this test.

Important note: The "linking" solution presented in the following section works, but there is a better "official" solution, which I explain a little bit later.
So far, we have only used the Stan Math library in C++ directly. In order to get cmdstan to work with (1) our extended Stan Math library, and (2) the updated stanc compiler, we have to link all these components together. First, we have to point stan to the right math library. If you clone stan from github (with git clone git@github.com:stan-dev/stan.git) you will find a folder stan/lib/stan_math. This folder is empty and should contain the contents for the Stan Math library. It is possible replace the folder stan_math with a link to the math folder. First cd to the stan/lib folder and remove the empty stan_math folder. Then create a link with

ln -s path/to/math stan_math

Then clone cmdstan with git clone git@github.com:stan-dev/cmdstan.git. The cmdstan folder contains an empty stan folder. Replace this with a link to the modified stan folder.

This approach of creating links worked for me, but there is an easier way. It is possible to clone cmdstan from github "recursively", i.e. also clone the content of the stan folder, and the content of the lib/stan_math folder at once. Download and installation instructions for cmdstan can be found here. As indicated in the "Installing from GitHub repo" section, we can clone the stan and math repositories with make stan-update from the cmdstan folder. Or, instead of just cloning the cmdstan repository, you can clone recursively with git clone https://github.com/stan-dev/cmdstan.git --recursive. This cloning is quite slow, because you don't only get the latest version, but also the entire history of Stan development. To speed things up, you could do a "shallow" clone with the additional flags --depth 1 --shallow-submodules. We can then build cmdstan with make build. This will also download a pre-built stanc compiler. This is not quite what we want, though, because we have to link to our own version of the math library, and our modified stanc compiler.
To tell cmdstan that it should use our modified stanc version, we have to create a file called local in the folder cmdstan/make. In fact, there is a file local.example already present in this folder which contains come other (commented-out) options. We can copy this file with cp local.example local and then open it in a text editor and add the line STANC3=path/to/stanc3. I have stanc3 and cmdstan in the same folder, so for me this line is STANC3=../stanc3

It turns out that a similar trick can be used to use a custom math repository, and so we don't have to create symbolic links. In the cmdstan makefile, you'll find these lines:
-include make/local     # user-defined variables

STAN ?= stan/
MATH ?= $(STAN)lib/stan_math/
The ?= assignment in makefiles only works when the variable on the left-hand-side hasn't already been defined. So if we define MATH=path/to/math in our make/local file (which is included in the main makefile), then we override the MATH ?= stan/lib/stan_math/ assignment. We could do the same for the STAN variable.


To compile and run the above Stan model with cmdstanpy, we have to make sure cmdstanpy can find the right cmdstan installation. This can simpy be done using cmdstanpy.set_cmdstan_path("path/to/cmdstan") where "path/to/cmdstan" is the location of the modified cmdstan installation.

The python script makes a plot of the ground truth parameter vector alpha (black dots), together with gray violins representing the posterior distribution. Judging from this plot, everything seems to be working.
It also helps to look at the diagnostics (with print(sam.diagnose()) in python). They look reassuring:

Processing csv files: /tmp/tmp_vwn59j0/dirmultqq672ujg/dirmult-20231126104727_1.csv, ...

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

Step 7: creating a pull request (PR) on GitHub

As Stan has multiple components, we have to submit multiple PRs. First, we have to submit our code to the Stan Math library. Then we have to provide new documentation for the functions reference, and then we have to submit our modifications to stanc3.

Stan Math PR

I will start with creating a PR for the math library. All PRs must correspond to an issue. So typically you first "request" a feature and then submit a PR to add it. In this case, there already exists an issue with a feature request for the Dirichlet-Multinomial distribution (from 2014), so we'll use that.
There's a lot of information to digest before you make a PR. A place to start is the "contributing" page. A much more detailed (and essential) "Developer process overview" can be found here, although these instructions are sometimes specific for the stan repository (not math). Then there is also this Introduction to Stan for New Developers. It is also a good idea to have a look at the pull request template (which is math specific).

On GitHub, fork the Stan math repository. Then clone the fork to your local machine. Make sure you're on the develop branch with git branch, and create a new branch with git checkout -b feature/issue-NN-short-description (this creates and switches to the new branch) where NN is the issue number and "short-description" is... a short description of the feature. In Steps 1-6, I had been working on just a clone of the math repository. So at this point I copied all new files and applied changes to the header file. The Stan team asks that you only commit well formatted code (more about this later), so be careful with what you commit to the feature branch. When you're ready, add files with git add path/to/file and commit changes with git commit -a -m "explain what changed". The -a option makes sure that all tracked files are added before the commit. To push the new brach to the remote fork, use git push -u origin feature/issue-NN-short-description

There are some requirements not covered in Steps 1-6 above. To make sure your code passes Stan's automated tests, these tests should pass:

make test-headers
make test-math-dependencies
runTests.py test/unit
make doxygen
make cpplint

For me, cpplint failed as lines were longer than 80 characters, and some lines ended in whitespace. In order to run cpplint, I had to install python2 (as Ubuntu 22.04 only has python3) with sudo apt install python2. On the web, there are some pages explaining how to make python2 the default python version. I think that's a really bad idea. However, you can tell make to use python2 with

export PYTHON=python2
make cpplint

The rule make test-math-dependencies did not work for me, but this make target just points to ./runChecks.py which has a hash-bang that did not work for me. So I just ran the script directly with

python3 runChecks.py

This ran without producing any output. Which means no errors were found.

My first PR failed very quickly. It turned out that passing cpplint was not sufficient to ensure proper formatting. To make sure that your code is formatted correctly, you have to use clang-formatter.
Installing clang-format in Ubuntu requires that you to first add the PPA for LLVM. Instructions are found here. On ubuntu 22.04, you would need the line "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy main". First get the signature with "wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -" (see section "Install (stable branch)". Then add the PPA with sudo add-apt-repository followed by the PPA string. It should also be possible to use the GUI (software and updates). On the coding style page instructions are given to install clang-format-5.0. For me, this did not work, but installing clang-format (without the 5.0) did. In the hooks folder, a script is provided to install a pre-commit git "hook". This makes sure that clang-format is run on all cpp and hpp files that were changed.

My second PR failed as well. This time at the "verify changes" stage
I got the following message

remote: Not Found
fatal: repository 'https://github.com/chvandorp/stan_math/math/' not found
error: pathspec 'forkedOrigin/feature/issue-54-dirichlet-multinomial' 
	did not match any file(s) known to git
script returned exit code 1

Which makes some sense because my fork is called stan_math and not stan_math/math. I renamed the repository math to stan_math to have a somewhat more descriptive name. The solution to this was simple: just change the name of the fork on GitHub. A name change is automatically updated the PR. All you have to do is change the remote URL in your local .git/config file. In the meantime, the Stan team has fixed this "bug" in their testing pipeline: it should now be possible to choose a different name for your fork.

After attempt 2, I also received a request to add an additional unit test. This test had to be added to test/unit/math/mix/prob/ using the expect_ad function. I was able to use the Dirichlet distribution as an example. expect_ad compares the AD with a value based on finite difference method. These tests also pass if an exception is thrown. It then just checks that all AD types also lead to the same exception. With these changes I tried submitting the PR again.
This time all checks passed.

Revisions

My dirichlet_multinomial_lpmf function is written in an ugly imperative style, while the Eigen and Stan math libraries provide methods to vectorize a lot of these operations. A Stan developer re-wrote my function in a much nicer, more functional style, and also added analytic derivatives.

I definitely do not understand all details of this revised function, but let's go through it anyway. The first line that's different defines a variable alpha_val as follows

const auto& alpha_val = as_value_array_or_scalar(alpha_ref);
As I understand it, the point of this is that if we want to compute gradients ourselves, then we want to temporarily "turn off" automatic differentiation. Hence, we have to use the "values" of Stan autodiff types instead of the full types in our computations. This is what as_value_array_or_scalar does. Next, we see that Stan math provides an overloaded function sum that takes sums of a lot of things. So there's no need for for loops or std::accumulate. The integer valued ns are then cast to double values using Eigen's cast<double> method (I'm not sure if the template disambiguater is required here).

Next, we are going to compute the log-probability lp. First, we add \(\log(N) - \sum_{k : x_k > 0} \log(x_k)\) to lp, but only if we are interested in constants. Notice that we can do elementwise comparison ns_array > 0 on Eigen arrays, and use the select method to compute an elementwise ternary operation. One important thing to realize here is that the computations are "lazy". This means although the expression log(ns_array) occurs in the C++ code, the log of the elements of ns_array that are zero are never evaluated. Similarly, we add \(\log(B(\alpha_0, N)) - \sum_{k : x_k > 0} \log(B(\alpha_k, x_k))\) to lp.

Now we will compute the gradient. For this, we use a object of the class partials_propagator, using the function make_partials_propagator. This function takes a variable number of arguments, one for each of the parameters or random variable that might require a gradient. In this case, this only holds for alpha, because ns is always constant. To put the "manually calculated" gradient in the right "slot", we use the partials<0> function, where the template argument 0 is used because we only had a single non-constant parameter alpha. The gradient of the distribution is given by \[ \frac{\partial\log(p(x | \alpha))}{\partial\alpha_i} = \frac{\partial\log(B(\alpha_0, N))}{\partial \alpha_i} - \sum_{k : x_k > 0} \frac{\partial \log(B(\alpha_k, x_k))}{\partial \alpha_k} \] Then we write \( \log(B(a, b)) = \log(\Gamma(a)) + \log(\Gamma(b)) - \log(\Gamma(a+b)) \) and consider two cases, \(x_i = 0\) (case I) and \(x_i > 0\) (case II). In case I, we get \[ \frac{\partial\log(p(x | \alpha))}{\partial\alpha_i} = \psi(\alpha_0) - \psi(\alpha_0 + N) \] where \(\psi(a) = \Gamma'(a) / \Gamma(a) \) denotes the digamma function. In case II, we get \[ \frac{\partial\log(p(x | \alpha))}{\partial\alpha_i} = \psi(\alpha_0) - \psi(\alpha_0 + N) - \psi(\alpha_i) + \psi(\alpha_i + x_i) \] Notice that we can use the latter expression for both cases, because \(\psi(\alpha_i) - \psi(\alpha_i + x_i) = 0\) when \(x_i = 0\). However, evaluating the digamma function is not free. Therefore we use the select method. The build method of the partials_propagator class then combines gradients and values in the right way and returns the appropriate value.

The other PRs

The remaining PRs (documentation and stanc) are quite uneventful. I did not create separate issues for the doc and stanc PRs, but referred to the accepted PR on stan-dev/math. For user-facing changes to stanc3, you must provide a PR which documents these changes. So first submit the PR for the doc, and then create a PR for stanc3.