Processing math: 100%

Monday, 26 October 2015

Carnes's life span distribution

In a paper by Carnes et. al, a simple parametric, but realistic life span distribution is given, and here I show how you can sample from it. In addition, assuming a demographic equilibrium, the age of individuals will have a particular distribution. I will show what this distribution is, and again how to sample from it. Sampling ages instead of lifespans might be useful for initiating simulations. I model epidemics, and I want my disease free (a.k.a. virgin) population to have the 'right' age distribution.
The life span distribution has hazard λ(a)=eu1a+v1+eu2a+v2. Typical parameters are given by u1=0.1, v1=10.5, u2=0.4, and v2=8, so that infants have a slightly increased hazard of dying, and after the age of 60, the hazard rapidly starts to grow, until it becomes exceedingly large around a=100.
The survival function S(a)=Pr(A>a), where A is the random variable denoting 'age at death' is given by S(a)=eΛ(a), with Λ(a):=a0λ(a)da denoting the cumulative hazard. The cumulative hazard Λ is easily calculated: Λ(a)=ev1u1(eu1a1)+ev2u2(eu2a1), but its inverse, or the inverse of the survival function is more difficult to calculate.
We need the inverse of S, because sampling random deviates typically involves uniformly sampling a number u[0,1). The number S1(u) is then the desired deviate.

In a future post, I will show how to use the GNU Scientific Library (GSL) to sample deviates from A.

Suppose that the birth rate b in our population is constant. A PDE describing the population is given by N(a,t)t+N(a,t)a=λ(a)N(a,t), where N(a,t) is the number (density) of individuals of age a, alive at time t. The boundary condition (describing birth) is given by N(0,t)=b When we assume that the population is in a demographic equilibrium, the time derivative with respect to t vanishes, and we get an ODE for the age distribution: N(a)a=λ(a)N(a),N(0)=b, where we omitted the variable t. This equation can be solved: 1NNa=log(N)a=λ(a)N(a)=ceΛ(a) for some constant c. Since b=N(0)=ceΛ(0)=c, we have N(a)=beΛ(a). Hence, we now know the PDF of the age distribution (up to a constant). Unfortunately, we can't get a closed form formula for the CDF, let alone invert it. Therefore, when we want to sample, we need another trick. I've used a method from Numerical Recipies in C. It involves finding a dominating function of the PDF, with an easy, and easily invertible, primitive.
Let's just assume that b=1, so that the objective PDF is N(a)=eΛ(a). Please notice that N is not a proper PDF, since, in general, the expectation won't be 1. We need to find a simple, dominating function for N. A stepwise defined function might be a good choice, since the hazard is practically zero when the age is below 50, and then increases rapidly. We first find a comparison cumulative hazard ˜Λ that is dominated by the actual cumulative hazard Λ. Many choices are possible, but one can take for instance ˜Λ(a)={0if a<a0λ(a)(aa0)otherwise where a0=aΛ(a)λ(a). The constant a0 is chosen such that the cumulative hazards Λ and ˜Λ are tangent at a.
Since Λ dominates ˜Λ, the survival function ˜S defined by ˜S(a)=e˜Λ(a) dominates S. It is easy to find the inverse of aa0˜S(a)da, and hence we can easily sample random deviates from the age distribution corresponding to ˜Λ. In order to sample from the desired age distribution, one can use a rejection method: (i) sample an age a from the easy age distribution. (ii) compute the ratio r=S(a)/˜S(a)1. (iii) sample a deviate uUniform(0,1). (iv) accept the age a when ur, and reject a otherwise. Repeat these steps until an age a was accepted.

The only thing we still need to do, is to find a good value for a. To make the sampler as efficient as possible, we want to minimize the probability that we have to reject the initially sampled age a from S. This boils down to minimizing 0˜S(a)da=a0+1λ(a)=a+1Λ(a)λ(a). The derivative of a0˜S(a)da equals (Λ(a)1)λ(a)λ(a)2 and thus, we find an extreme value for 0˜S(a)da when Λ(a)=1 or when λ(a)=dλda=0. The second condition can only correspond to a very small a, and therefore will not minimize 0˜S(a)da. Hence, we have to solve Λ(a)=1. When we ignore the second term of Λ, we find that: a=log(1+u1exp(v1))/u1