The so-called "ones trick" for JAGS and BUGS models allows the user to sample from distributions that are not in the standard list. Here I show another application: handling "unbalanced" missing data. More examples using the ones (or zeros) trick can be found in The BUGS Book, in particular in Chapter 9.
Suppose that we want to find an association between a trait $x$ (possibly continuous) and another, categorical trait $k \in \{1, 2, \ldots, K\}$. We have $N$ observations $x_i$; however, some of the $k_i$ are missing. Whenever there is no information on the missing $k_i$ whatsoever, we can simply write the following JAGS model:
data {
    /* alpha is the parameter for the Dirichlet prior of p,
     * with p the estimated frequency of k
     */
    for ( j in 1:K ) {
        alpha[j] <- 1
    }
}
model {
    for ( i in 1:N ) {
        /* Some model for x, given k */
        x[i] ~ dnorm(xhat[k[i]], tau_x)
        /* Sample missing values of k, and inform
         * the distribution p with the known k's
         */
        k[i] ~ dcat(p)
    }
    /* Model the distribution p_k of the trait k */
    p ~ ddirch(alpha)
    /* The precision of the likelihood for x */
    tau_x ~ dgamma(0.01, 0.01)
    /* Give the xhat's a shared prior to regularize them */
    for ( j in 1:K ) {
        xhat[j] ~ dnorm(mu_xhat, tau_xhat)
    }
    mu_xhat ~ dnorm(0, 0.01)
    tau_xhat ~ dgamma(0.01, 0.01)
}

The data file must contain the vector k, with NA in the place of the missing values.
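For instance, with $K = 3$ and some made-up toy values, a data file in the R dump format that JAGS reads could look like this:

N <- 6
K <- 3
x <- c(0.12, -1.40, 0.33, 2.05, -0.88, 1.17)
k <- c(1, 3, NA, 2, NA, 1)

Here the trait value is missing for the third and fifth observations, and JAGS will sample it from dcat(p).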
Suppose now that the missing $k_i$'s are not completely unknown. Instead, suppose that we know that some of the $k_i$ must belong to a group $g_i$. The group $g_i$ is encoded as a binary vector, where $g_{i,j} = 1$ indicates that the trait value $j$ is in the group, and $0$ that it is not. In particular, when $k_i$ is known, then

$$g_{i,j} = \begin{cases} 1 & \text{if } j = k_i \\ 0 & \text{otherwise.} \end{cases}$$

If $k_i$ is completely unknown, then we just let $g_{i,j} = 1$ for each $j$.
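For example, take $K = 4$ and suppose that $k_1 = 2$ is known exactly, that $k_2$ is only known to lie in the group $\{1, 3\}$, and that $k_3$ is completely unknown. The corresponding rows of $g$ are

$$g = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}.$$

The model from above can then be adapted as follows: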
data {
    for ( j in 1:K ) {
        alpha[j] <- 1
    }
    /* for the "ones trick",
     * we need a 1 for every observation
     */
    for ( i in 1:N ) {
        ones[i] <- 1
    }
}
model {
    for ( i in 1:N ) {
        x[i] ~ dnorm(xhat[k[i]], tau_x)
        /* sample missing k's using the group-vector g */
        k[i] ~ dcat(g[i,])
        /* in order to inform p correctly,
         * we need to multiply the posterior with p[k[i]],
         * which can be done by observing a 1,
         * assumed to be Bernoulli(p[k[i]]) distributed
         */
        ones[i] ~ dbern(p[k[i]])
    }
    p ~ ddirch(alpha)
    tau_x ~ dgamma(0.01, 0.01)
    for ( j in 1:K ) {
        xhat[j] ~ dnorm(mu_xhat, tau_xhat)
    }
    mu_xhat ~ dnorm(0, 0.01)
    tau_xhat ~ dgamma(0.01, 0.01)
}
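To see why this has the intended effect (ignoring the likelihood of $x_i$ for clarity): the statement k[i] ~ dcat(g[i,]) puts a uniform distribution on the trait values in the group, and observing the 1 multiplies the posterior density by $\mathrm{Bernoulli}(1 \mid p_{k_i}) = p_{k_i}$, so that

$$P(k_i = j \mid p) \propto g_{i,j}\, p_j\,,$$

which is exactly the categorical distribution restricted to the group. Summing over $j$, each observation informs $p$ through the factor $\sum_j g_{i,j}\, p_j = P(k_i \in g_i \mid p)$, i.e., the likelihood of observing that $k_i$ falls in its group.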
Instead of using the "ones trick", it would have been more clear-cut if we were able to write

k[i] ~ dcat(g[i,])  /* sampling statement */
k[i] ~ dcat(p)      /* statement to inform p */

but JAGS does not allow a node to be defined twice, and in any case it could not tell that it should sample from $\mathrm{Categorical}(g_i)$, and not from $\mathrm{Categorical}(p)$.
Similarly, it might have been tempting to write
k[i] ~ dcat(g[i,] * p)  /* point-wise vector multiplication */

This would correctly prefer the common $k$'s in group $g_i$ over the rare ones, but the fact that $k_i$ must come from the group $g_i$ is still not used to inform $p$.
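The reason is that dcat normalizes its argument internally, so that this statement contributes the factor

$$P(k_i = j \mid p) = \frac{g_{i,j}\, p_j}{\sum_{l} g_{i,l}\, p_l}\,,$$

in which the denominator cancels exactly the group-membership probability $P(k_i \in g_i \mid p)$ that should enter the likelihood: $p$ would only learn the relative frequencies of the trait values within each group.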
As an example, I generated some random $k_i$'s sampled from $\mathrm{Categorical}(p)$ (I did not bother to sample any $x_i$'s). I have taken $K = 15$, $N = 2000$, and 3 randomly chosen groups. For 1000 of the observations, I "forget" the actual $k_i$, and only know the group $g_i$. The goal is to recover $p$ and the missing $k_i$'s.
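A sketch of how such data can be generated in R (the way the groups are constructed below is an arbitrary choice, and all variable names are hypothetical):

set.seed(1729)

K <- 15
N <- 2000

## true frequencies of the trait: a flat Dirichlet draw,
## obtained by normalizing Gamma(1) variates
p_true <- prop.table(rgamma(K, shape = 1))

## true trait values
k_true <- sample(1:K, N, replace = TRUE, prob = p_true)

## randomly partition the K trait values into 3 groups
group_of <- sample(1:3, K, replace = TRUE)

## "forget" the actual k for 1000 random observations
k <- k_true
hidden <- sample(1:N, 1000)
k[hidden] <- NA

## build the binary group-matrix g
g <- matrix(0, nrow = N, ncol = K)
for ( i in 1:N ) {
    if ( is.na(k[i]) ) {
        ## only the group of the true value is known
        g[i, group_of == group_of[k_true[i]]] <- 1
    } else {
        ## the value itself is known
        g[i, k[i]] <- 1
    }
}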
I used a chain of length 20000, with the first 10000 samples as burn-in and a thinning factor of 10.
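In rjags, such a run could look like the following sketch (the file name "ones-trick-model.bug" and the exact burn-in/thinning split are my choices for illustration):

library(rjags)

## data for the ones-trick model (without the x-part, which is
## left out here; the ones are constructed in the data block)
jags_data <- list(N = N, K = K, g = g)

model <- jags.model("ones-trick-model.bug", data = jags_data, n.chains = 1)
update(model, 10000)   # burn-in

samples <- coda.samples(model, variable.names = c("p", "k"),
                        n.iter = 10000, thin = 10)

With these samples we get the following result: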