X ~ normal(mu, sigma)
with target += normal_lcdf(X | mu, sigma)
when \(X\) is in fact the upper bound of a left-censored observation, that is \(\mathcal{N}(\mu, \sigma^2)\) distributed. For a right-censored
observation you have to use the complementary cumulative distribution function normal_lccdf(X | mu, sigma)
.
A way to encode censoring in a Stan model such that all types of data can be treated equally is by using a custom probability distribution function
censored_normal_lpdf
that takes an additional argument cc
that determines the type of censoring. Below we use the following codes:
Uncensored | 0 |
Left-censored | 1 |
Right-censored | 2 |
Missing data | 3 |
The function
censored_normal_lpdf
is defined in the functions
of the Stan model below.
In addition to N
Observations
, we add N
CensorCodes
that
must be either 0, 1, 2, or 3 in the data
block.
In the model
block, we now can use the
sampling statement Observations[n] ~ censored_normal(mu, sigma, CensorCodes[n])
,
because this is just syntactic sugar for
target += censored_normal_lpdf(Observations[n] | mu, sigma, CensorCodes[n])
.
Vectorization unfortunately does not work with this method.
To demonstrate the
censored_normal
distribution,
I added the following python script that generates some random data
and compiles and runs the Stan model.
The output of the script will be something like
mu: -0.018, 95% CrI: [-0.176, 0.133] sigma: 1.071, 95% CrI: [0.953, 1.203]