In a paper by Carnes

The life span distribution has hazard \[\lambda(a) = e^{u_1 a + v_1} + e^{u_2 a + v_2}\,.\] Typical parameters are given by \(u_1 = 0.1\), \(v_1 = -10.5\), \(u_2 = -0.4\), and \(v_2 = -8\), so that infants have a slightly increased hazard of dying, and after the age of 60, the hazard rapidly starts to grow, until it becomes exceedingly large around \(a = 100\).

The survival function \(S(a) = {\rm Pr}(A > a)\), where \(A\) is the random variable denoting 'age at death' is given by \(S(a) = e^{-\Lambda(a)}\), with \(\Lambda(a) := \int_0^a \lambda(a')da'\) denoting the cumulative hazard. The cumulative hazard \(\Lambda\) is easily calculated: \[ \Lambda(a) = \frac{e^{v_1}}{u_1}(e^{u_1 a}-1) + \frac{e^{v_2}}{u_2}(e^{u_2 a}-1)\,, \] but its inverse, or the inverse of the survival function is more difficult to calculate.

We need the inverse of \(S\), because sampling random deviates typically involves uniformly sampling a number \(u\in [0,1)\). The number \(S^{-1}(u)\) is then the desired deviate.

In a future post, I will show how to use the GNU Scientific Library (GSL) to sample deviates from \(A\).

Suppose that the birth rate \(b\) in our population is constant. A PDE describing the population is given by \[\frac{\partial N(a,t)}{\partial t} + \frac{\partial N(a,t)}{\partial a} = -\lambda(a)N(a,t)\,,\] where \(N(a,t)\) is the number (density) of individuals of age \(a\), alive at time \(t\). The boundary condition (describing birth) is given by \[N(0,t) = b\] When we assume that the population is in a demographic equilibrium, the time derivative with respect to \(t\) vanishes, and we get an ODE for the age distribution: \[\frac{\partial N(a)}{\partial a} = -\lambda(a) N(a)\,,\quad N(0) = b\,,\] where we omitted the variable \(t\). This equation can be solved: \[\frac{1}{N}\frac{\partial N}{\partial a} = \frac{\partial \log(N)}{\partial a} = -\lambda(a) \implies N(a) = c \cdot e^{-\Lambda(a)}\] for some constant \(c\). Since \(b = N(0) = c \cdot e^{-\Lambda(0)} = c\), we have \[N(a) = b\cdot e^{-\Lambda(a)}\,.\] Hence, we now know the PDF of the age distribution (up to a constant). Unfortunately, we can't get a closed form formula for the CDF, let alone invert it. Therefore, when we want to sample, we need another trick. I've used a method from Numerical Recipies in C. It involves finding a dominating function of the PDF, with an easy, and easily invertible, primitive.

Let's just assume that \(b = 1\), so that the objective PDF is \(N(a) = e^{-\Lambda(a)}\). Please notice that \(N\) is not a proper PDF, since, in general, the expectation won't be \(1\). We need to find a simple, dominating function for \(N\). A stepwise defined function might be a good choice, since the hazard is practically zero when the age is below \(50\), and then increases rapidly. We first find a comparison cumulative hazard \(\tilde{\Lambda}\) that

Since \(\Lambda\) dominates \(\tilde{\Lambda}\), the survival function \(\tilde{S}\) defined by \(\tilde{S}(a) = e^{-\tilde{\Lambda}(a)}\) dominates \(S\). It is easy to find the inverse of \(a\mapsto\int_0^a \tilde{S}(a')da'\), and hence we can easily sample random deviates from the age distribution corresponding to \(\tilde{\Lambda}\). In order to sample from the desired age distribution, one can use a

The only thing we still need to do, is to find a good value for \(a^{\ast}\). To make the sampler as efficient as possible, we want to minimize the probability that we have to reject the initially sampled age \(a\) from \(S\). This boils down to minimizing \[\int_0^{\infty} \tilde{S}(a)da = a_0 + \frac{1}{\lambda(a^{\ast})} = a^{\ast} + \frac{1 - \Lambda(a^{\ast})}{\lambda(a^{\ast})}\,. \] The derivative of \(a^{\ast} \mapsto \int_0^{\infty} \tilde{S}(a)da\) equals \[ \frac{(\Lambda(a^{\ast}) - 1)\lambda'(a^{\ast})}{\lambda(a^{\ast})^2} \] and thus, we find an extreme value for \(\int_0^{\infty} \tilde{S}(a)da\) when \(\Lambda(a^{\ast}) = 1\) or when \(\lambda'(a^{\ast}) = \frac{d\lambda}{da^{\ast}} = 0\). The second condition can only correspond to a very small \(a^{\ast}\), and therefore will not minimize \(\int_0^{\infty} \tilde{S}(a)da\). Hence, we have to solve \(\Lambda(a^{\ast}) = 1\). When we ignore the second term of \(\Lambda\), we find that: \[ a^{\ast} = \log(1 + u_1 \exp(-v_1))/u_1\]

*et. al*, a simple parametric, but realistic life span distribution is given, and here I show how you can sample from it. In addition, assuming a demographic equilibrium, the age of individuals will have a particular distribution. I will show what this distribution is, and again how to sample from it. Sampling ages instead of lifespans might be useful for initiating simulations. I model epidemics, and I want my disease free (a.k.a. virgin) population to have the 'right' age distribution.The life span distribution has hazard \[\lambda(a) = e^{u_1 a + v_1} + e^{u_2 a + v_2}\,.\] Typical parameters are given by \(u_1 = 0.1\), \(v_1 = -10.5\), \(u_2 = -0.4\), and \(v_2 = -8\), so that infants have a slightly increased hazard of dying, and after the age of 60, the hazard rapidly starts to grow, until it becomes exceedingly large around \(a = 100\).

The survival function \(S(a) = {\rm Pr}(A > a)\), where \(A\) is the random variable denoting 'age at death' is given by \(S(a) = e^{-\Lambda(a)}\), with \(\Lambda(a) := \int_0^a \lambda(a')da'\) denoting the cumulative hazard. The cumulative hazard \(\Lambda\) is easily calculated: \[ \Lambda(a) = \frac{e^{v_1}}{u_1}(e^{u_1 a}-1) + \frac{e^{v_2}}{u_2}(e^{u_2 a}-1)\,, \] but its inverse, or the inverse of the survival function is more difficult to calculate.

We need the inverse of \(S\), because sampling random deviates typically involves uniformly sampling a number \(u\in [0,1)\). The number \(S^{-1}(u)\) is then the desired deviate.

In a future post, I will show how to use the GNU Scientific Library (GSL) to sample deviates from \(A\).

Suppose that the birth rate \(b\) in our population is constant. A PDE describing the population is given by \[\frac{\partial N(a,t)}{\partial t} + \frac{\partial N(a,t)}{\partial a} = -\lambda(a)N(a,t)\,,\] where \(N(a,t)\) is the number (density) of individuals of age \(a\), alive at time \(t\). The boundary condition (describing birth) is given by \[N(0,t) = b\] When we assume that the population is in a demographic equilibrium, the time derivative with respect to \(t\) vanishes, and we get an ODE for the age distribution: \[\frac{\partial N(a)}{\partial a} = -\lambda(a) N(a)\,,\quad N(0) = b\,,\] where we omitted the variable \(t\). This equation can be solved: \[\frac{1}{N}\frac{\partial N}{\partial a} = \frac{\partial \log(N)}{\partial a} = -\lambda(a) \implies N(a) = c \cdot e^{-\Lambda(a)}\] for some constant \(c\). Since \(b = N(0) = c \cdot e^{-\Lambda(0)} = c\), we have \[N(a) = b\cdot e^{-\Lambda(a)}\,.\] Hence, we now know the PDF of the age distribution (up to a constant). Unfortunately, we can't get a closed form formula for the CDF, let alone invert it. Therefore, when we want to sample, we need another trick. I've used a method from Numerical Recipies in C. It involves finding a dominating function of the PDF, with an easy, and easily invertible, primitive.

Let's just assume that \(b = 1\), so that the objective PDF is \(N(a) = e^{-\Lambda(a)}\). Please notice that \(N\) is not a proper PDF, since, in general, the expectation won't be \(1\). We need to find a simple, dominating function for \(N\). A stepwise defined function might be a good choice, since the hazard is practically zero when the age is below \(50\), and then increases rapidly. We first find a comparison cumulative hazard \(\tilde{\Lambda}\) that

*is dominated by*the actual cumulative hazard \(\Lambda\). Many choices are possible, but one can take for instance \[ \tilde{\Lambda}(a) = \left\{ \begin{array}{ll} 0 & \mbox{if } a < a_0 \\ \lambda(a^{\ast})\cdot (a-a_0) & \mbox{otherwise} \end{array}\right. \] where \[ a_0 = a^{\ast} - \frac{\Lambda(a^{\ast})}{\lambda(a^{\ast})}\,. \] The constant \(a_0\) is chosen such that the cumulative hazards \(\Lambda\) and \(\tilde{\Lambda}\) are tangent at \(a^{\ast}\).Since \(\Lambda\) dominates \(\tilde{\Lambda}\), the survival function \(\tilde{S}\) defined by \(\tilde{S}(a) = e^{-\tilde{\Lambda}(a)}\) dominates \(S\). It is easy to find the inverse of \(a\mapsto\int_0^a \tilde{S}(a')da'\), and hence we can easily sample random deviates from the age distribution corresponding to \(\tilde{\Lambda}\). In order to sample from the desired age distribution, one can use a

*rejection method*: (i) sample an age \(a\) from the easy age distribution. (ii) compute the ratio \(r = S(a)/\tilde{S}(a) \leq 1\). (iii) sample a deviate \(u \sim \mbox{Uniform}(0,1)\). (iv)*accept*the age \(a\) when \(u \leq r\), and*reject*\(a\) otherwise. Repeat these steps until an age \(a\) was accepted.The only thing we still need to do, is to find a good value for \(a^{\ast}\). To make the sampler as efficient as possible, we want to minimize the probability that we have to reject the initially sampled age \(a\) from \(S\). This boils down to minimizing \[\int_0^{\infty} \tilde{S}(a)da = a_0 + \frac{1}{\lambda(a^{\ast})} = a^{\ast} + \frac{1 - \Lambda(a^{\ast})}{\lambda(a^{\ast})}\,. \] The derivative of \(a^{\ast} \mapsto \int_0^{\infty} \tilde{S}(a)da\) equals \[ \frac{(\Lambda(a^{\ast}) - 1)\lambda'(a^{\ast})}{\lambda(a^{\ast})^2} \] and thus, we find an extreme value for \(\int_0^{\infty} \tilde{S}(a)da\) when \(\Lambda(a^{\ast}) = 1\) or when \(\lambda'(a^{\ast}) = \frac{d\lambda}{da^{\ast}} = 0\). The second condition can only correspond to a very small \(a^{\ast}\), and therefore will not minimize \(\int_0^{\infty} \tilde{S}(a)da\). Hence, we have to solve \(\Lambda(a^{\ast}) = 1\). When we ignore the second term of \(\Lambda\), we find that: \[ a^{\ast} = \log(1 + u_1 \exp(-v_1))/u_1\]

## No comments:

## Post a Comment