X ~ normal(mu, sigma)
with target += normal_lcdf(X | mu, sigma)
when X is in fact the upper bound of a left-censored observation, that is N(μ,σ2) distributed. For a right-censored
observation you have to use the complementary cumulative distribution function normal_lccdf(X | mu, sigma)
.
A way to encode censoring in a Stan model such that all types of data can be treated equally is by using a custom probability distribution function
censored_normal_lpdf
that takes an additional argument cc
that determines the type of censoring. Below we use the following codes:
Uncensored | 0 |
Left-censored | 1 |
Right-censored | 2 |
Missing data | 3 |
The function
censored_normal_lpdf
is defined in the functions
of the Stan model below.
In addition to N
Observations
, we add N
CensorCodes
that
must be either 0, 1, 2, or 3 in the data
block.
In the model
block, we now can use the
sampling statement Observations[n] ~ censored_normal(mu, sigma, CensorCodes[n])
,
because this is just syntactic sugar for
target += censored_normal_lpdf(Observations[n] | mu, sigma, CensorCodes[n])
.
Vectorization unfortunately does not work with this method.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
functions { | |
real censored_normal_lpdf(real x, real mu, real sigma, int cc) { | |
real ll; | |
if ( cc == 0 ) { // uncensored | |
ll = normal_lpdf(x | mu, sigma); | |
} else if ( cc == 1 ) { // left-censored | |
ll = normal_lcdf(x | mu, sigma); | |
} else if ( cc == 2 ) { // right-censored | |
ll = normal_lccdf(x | mu, sigma); | |
} else if ( cc == 3 ) { // missing data | |
ll = 0.0; | |
} else { // any other censoring code is invalid | |
reject("invalid censoring code in censored_normal_lpdf"); | |
} | |
return ll; | |
} | |
} | |
data { | |
int<lower=0> N; | |
real Observations[N]; | |
int<lower=0, upper=3> CensorCodes[N]; | |
} | |
parameters { | |
real mu; | |
real<lower=0> sigma; | |
} | |
model { | |
for ( n in 1:N ) { | |
Observations[n] ~ censored_normal(mu, sigma, CensorCodes[n]); | |
} | |
} |
To demonstrate the
censored_normal
distribution,
I added the following python script that generates some random data
and compiles and runs the Stan model.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#! /usr/bin/python3 | |
import numpy as np | |
import pystan | |
from scipy.stats import norm | |
## define the codes used for the type of censoring | |
uncensored_code = 0 | |
left_censored_code = 1 | |
right_censored_code = 2 | |
missing_code = 3 | |
## number of Observations | |
N = 200 | |
## generate some (uncensored) data | |
UncensObs = norm.rvs(loc=0, scale=1, size=N) | |
## choose artificial "limits of detection" | |
LowerBound = -1 | |
UpperBound = 2 | |
## apply censoring accoring to the chosen bounds | |
def make_cens_obs(x): | |
if x < LowerBound: | |
return (LowerBound, left_censored_code) | |
elif x > UpperBound: | |
return (UpperBound, right_censored_code) | |
else: | |
return (x, uncensored_code) | |
ObsCensCodes = [make_cens_obs(x) for x in UncensObs] | |
## collect the data in a dictionary | |
data_dict = { | |
"N" : N, | |
"Observations" : [x[0] for x in ObsCensCodes], | |
"CensorCodes" : [x[1] for x in ObsCensCodes] | |
} | |
## compile the stan model | |
sm = pystan.StanModel(file="censored-normal.stan") | |
## sample from the posterior | |
sam = sm.sampling(data=data_dict) | |
chain_dict = sam.extract(permuted=True) | |
## look at the results | |
mu_est = np.mean(chain_dict["mu"]) | |
mu_cri = np.percentile(chain_dict["mu"], [2.5, 97.5]) | |
sigma_est = np.mean(chain_dict["sigma"]) | |
sigma_cri = np.percentile(chain_dict["sigma"], [2.5, 97.5]) | |
print(f"mu: {mu_est:0.3f}, 95% CrI: [{mu_cri[0]:0.3f}, {mu_cri[1]:0.3f}]") | |
print(f"sigma: {sigma_est:0.3f}, 95% CrI: [{sigma_cri[0]:0.3f}, {sigma_cri[1]:0.3f}]") |
The output of the script will be something like
mu: -0.018, 95% CrI: [-0.176, 0.133] sigma: 1.071, 95% CrI: [0.953, 1.203]