Monday 26 October 2015

Carnes's life span distribution

In a paper by Carnes et. al, a simple parametric, but realistic life span distribution is given, and here I show how you can sample from it. In addition, assuming a demographic equilibrium, the age of individuals will have a particular distribution. I will show what this distribution is, and again how to sample from it. Sampling ages instead of lifespans might be useful for initiating simulations. I model epidemics, and I want my disease free (a.k.a. virgin) population to have the 'right' age distribution.
The life span distribution has hazard \[\lambda(a) = e^{u_1 a + v_1} + e^{u_2 a + v_2}\,.\] Typical parameters are given by \(u_1 = 0.1\), \(v_1 = -10.5\), \(u_2 = -0.4\), and \(v_2 = -8\), so that infants have a slightly increased hazard of dying, and after the age of 60, the hazard rapidly starts to grow, until it becomes exceedingly large around \(a = 100\).
The survival function \(S(a) = {\rm Pr}(A > a)\), where \(A\) is the random variable denoting 'age at death' is given by \(S(a) = e^{-\Lambda(a)}\), with \(\Lambda(a) := \int_0^a \lambda(a')da'\) denoting the cumulative hazard. The cumulative hazard \(\Lambda\) is easily calculated: \[ \Lambda(a) = \frac{e^{v_1}}{u_1}(e^{u_1 a}-1) + \frac{e^{v_2}}{u_2}(e^{u_2 a}-1)\,, \] but its inverse, or the inverse of the survival function is more difficult to calculate.
We need the inverse of \(S\), because sampling random deviates typically involves uniformly sampling a number \(u\in [0,1)\). The number \(S^{-1}(u)\) is then the desired deviate.

In a future post, I will show how to use the GNU Scientific Library (GSL) to sample deviates from \(A\).

Suppose that the birth rate \(b\) in our population is constant. A PDE describing the population is given by \[\frac{\partial N(a,t)}{\partial t} + \frac{\partial N(a,t)}{\partial a} = -\lambda(a)N(a,t)\,,\] where \(N(a,t)\) is the number (density) of individuals of age \(a\), alive at time \(t\). The boundary condition (describing birth) is given by \[N(0,t) = b\] When we assume that the population is in a demographic equilibrium, the time derivative with respect to \(t\) vanishes, and we get an ODE for the age distribution: \[\frac{\partial N(a)}{\partial a} = -\lambda(a) N(a)\,,\quad N(0) = b\,,\] where we omitted the variable \(t\). This equation can be solved: \[\frac{1}{N}\frac{\partial N}{\partial a} = \frac{\partial \log(N)}{\partial a} = -\lambda(a) \implies N(a) = c \cdot e^{-\Lambda(a)}\] for some constant \(c\). Since \(b = N(0) = c \cdot e^{-\Lambda(0)} = c\), we have \[N(a) = b\cdot e^{-\Lambda(a)}\,.\] Hence, we now know the PDF of the age distribution (up to a constant). Unfortunately, we can't get a closed form formula for the CDF, let alone invert it. Therefore, when we want to sample, we need another trick. I've used a method from Numerical Recipies in C. It involves finding a dominating function of the PDF, with an easy, and easily invertible, primitive.
Let's just assume that \(b = 1\), so that the objective PDF is \(N(a) = e^{-\Lambda(a)}\). Please notice that \(N\) is not a proper PDF, since, in general, the expectation won't be \(1\). We need to find a simple, dominating function for \(N\). A stepwise defined function might be a good choice, since the hazard is practically zero when the age is below \(50\), and then increases rapidly. We first find a comparison cumulative hazard \(\tilde{\Lambda}\) that is dominated by the actual cumulative hazard \(\Lambda\). Many choices are possible, but one can take for instance \[ \tilde{\Lambda}(a) = \left\{ \begin{array}{ll} 0 & \mbox{if } a < a_0 \\ \lambda(a^{\ast})\cdot (a-a_0) & \mbox{otherwise} \end{array}\right. \] where \[ a_0 = a^{\ast} - \frac{\Lambda(a^{\ast})}{\lambda(a^{\ast})}\,. \] The constant \(a_0\) is chosen such that the cumulative hazards \(\Lambda\) and \(\tilde{\Lambda}\) are tangent at \(a^{\ast}\).
Since \(\Lambda\) dominates \(\tilde{\Lambda}\), the survival function \(\tilde{S}\) defined by \(\tilde{S}(a) = e^{-\tilde{\Lambda}(a)}\) dominates \(S\). It is easy to find the inverse of \(a\mapsto\int_0^a \tilde{S}(a')da'\), and hence we can easily sample random deviates from the age distribution corresponding to \(\tilde{\Lambda}\). In order to sample from the desired age distribution, one can use a rejection method: (i) sample an age \(a\) from the easy age distribution. (ii) compute the ratio \(r = S(a)/\tilde{S}(a) \leq 1\). (iii) sample a deviate \(u \sim \mbox{Uniform}(0,1)\). (iv) accept the age \(a\) when \(u \leq r\), and reject \(a\) otherwise. Repeat these steps until an age \(a\) was accepted.

The only thing we still need to do, is to find a good value for \(a^{\ast}\). To make the sampler as efficient as possible, we want to minimize the probability that we have to reject the initially sampled age \(a\) from \(S\). This boils down to minimizing \[\int_0^{\infty} \tilde{S}(a)da = a_0 + \frac{1}{\lambda(a^{\ast})} = a^{\ast} + \frac{1 - \Lambda(a^{\ast})}{\lambda(a^{\ast})}\,. \] The derivative of \(a^{\ast} \mapsto \int_0^{\infty} \tilde{S}(a)da\) equals \[ \frac{(\Lambda(a^{\ast}) - 1)\lambda'(a^{\ast})}{\lambda(a^{\ast})^2} \] and thus, we find an extreme value for \(\int_0^{\infty} \tilde{S}(a)da\) when \(\Lambda(a^{\ast}) = 1\) or when \(\lambda'(a^{\ast}) = \frac{d\lambda}{da^{\ast}} = 0\). The second condition can only correspond to a very small \(a^{\ast}\), and therefore will not minimize \(\int_0^{\infty} \tilde{S}(a)da\). Hence, we have to solve \(\Lambda(a^{\ast}) = 1\). When we ignore the second term of \(\Lambda\), we find that: \[ a^{\ast} = \log(1 + u_1 \exp(-v_1))/u_1\]

No comments:

Post a Comment