When looking for associations between features i = 1, …, m and some trait,
some form of multiple-testing correction is usually necessary.
A very conservative method is the Bonferroni correction, which controls the family-wise error rate (FWER),
but at the cost of many false negatives.
This is undesirable when the goal is to discover features or associations,
and therefore other methods have been developed.
One particularly intuitive method is based on the false discovery rate (FDR) and uses so-called q-values,
which are (under certain conditions) elegantly analogous to p-values.
False discovery rate
Let S be the number of features called significant, and F the number of false positives among the significant features (i.e. false discoveries). In an article by John D. Storey, the (positive) false discovery rate is defined as pFDR := E[F/S | S > 0]. Hence, the pFDR is the expected fraction of false positives among the features that are called significant. The condition S > 0 ensures that it is well defined. In the case of hypothesis testing, one typically has a test statistic T, and one wants to test whether the null hypothesis is true (H = 0), or rather the alternative hypothesis (H = 1). The statistical model specifies the distribution of T | H = 0, and the null hypothesis is rejected when the realization of T falls into a pre-defined significance region Γ.
When testing multiple features, we typically have a sequence (T_i, H_i)_{i=1}^m, here assumed to be independent and identically distributed. The pFDR then depends on Γ: pFDR(Γ) = E[F(Γ)/S(Γ) | S(Γ) > 0], where F(Γ) := #{i : T_i ∈ Γ ∧ H_i = 0} and S(Γ) := #{i : T_i ∈ Γ}. Storey derives that under certain conditions, we can write pFDR(Γ) = P[H = 0 | T ∈ Γ] = E[F(Γ)] / E[S(Γ)].
The q-value
Let {Γ_α}_{α=0}^1 be a nested family of significance regions. That is, Γ_α ⊆ Γ_α′ whenever α ≤ α′, and P[T ∈ Γ_α | H = 0] = α. For instance, if T | H = 0 ∼ N(0, 1), then we could choose Γ_α = [z_α, ∞), where z_α = Φ^{-1}(1 − α), with Φ the CDF of N(0, 1). The q-value of a realization t of T is then defined as q(t) = inf_{Γ_α : t ∈ Γ_α} pFDR(Γ_α). We can now give the above-mentioned analogy between p-values and q-values. The p-value is defined as p(t) = inf_{Γ_α : t ∈ Γ_α} P[T ∈ Γ_α | H = 0], while under the right conditions, the q-value can be written as q(t) = inf_{Γ_α : t ∈ Γ_α} P[H = 0 | T ∈ Γ_α].
Computing q-values
In order to compute q-values, given a sequence of p-values, we follow the steps given in a paper by Storey and Tibshirani. In this scenario, the p-value plays the role of the realization t of the statistic T. Under the null hypothesis, these p-values are uniformly distributed. As a family of significance regions, we simply take Γ_α = [0, α], and write for instance S(α) := S(Γ_α). First, we have to estimate E[S(α)], for which we use #{i : p_i ≤ α}, and we estimate E[F(α)] with m · π̂_0 · α, where π̂_0 is an estimate of π_0 = P[H = 0].
The most difficult part of computing q-values is estimating π_0. An often-used estimate is twice the average p-value, but we can do a little better by making use of the empirical CDF of the p-values. The figure below shows a "typical" histogram of p-values, where the p-values sampled from the alternative distribution are given by the gray bars. The marginal distribution of p-values becomes relatively flat towards the right, where most p-values should come from the null distribution, which is Uniform(0, 1). The right panel of the figure shows a "rotated" empirical CDF of the p-values, i.e. x ↦ R(x) = 1 − CDF(1 − x), and the fact that large p-values should be uniformly distributed shows up as R(x) being a straight line for small x. The slope of this straight line is an estimate of π_0. In the C++ code below, I use GSL to fit a line through R (indicated by the red line in the figure), using a weight function x ↦ (1 − x)² to give more weight to small values of x, thereby mostly ignoring the non-straight part of R.

In this example, the marginal p-values are sampled from a mixture distribution π_0 · Uniform(0, 1) + (1 − π_0) · Beta(1, 20), where π_0 = 3/4.
Now we sort the p-values such that p_1 ≤ p_2 ≤ ⋯ ≤ p_m, so that #{j : p_j ≤ p_i} = i, and first determine the q-value corresponding to p_m: q_m = π̂_0 · p_m. The q-value q_i corresponding to p-value p_i is then computed recursively as q_i = min(π̂_0 · p_i · m / i, q_{i+1}). Recently, I had to implement this algorithm in C++. The function is given in the following header file, and accepts a "keyed" list of p-values, where the key is used to identify the feature. The function returns a keyed list of q-values.
/** qvalues.hpp
 * transform a keyed list of p-values into a keyed list of q-values
 */
#ifndef QVALUES_HPP_
#define QVALUES_HPP_

#include <map>
#include <list>
#include <vector>
#include <algorithm> // min, count_if, for_each
#include <stdexcept>
#include <cmath> // sqrt
#include <gsl/gsl_errno.h> // GSL_SUCCESS
#include <gsl/gsl_fit.h> // gsl_fit_wmul

template<class KeyType>
std::map<KeyType, double> computeQValues(const std::map<KeyType, double> & pValues) {
    int m = pValues.size();
    // sort keys by p-values
    std::list<KeyType> keys;
    for ( auto kp : pValues ) {
        keys.push_back(kp.first);
    }
    keys.sort([&](KeyType lk, KeyType rk){return pValues.at(lk) < pValues.at(rk);});
    // make a partition of the interval [0, 1]
    int n = int(2 * std::sqrt(double(m)));
    double mesh = 1.0 / n;
    std::vector<double> xs(n, 0.0);
    for ( int i = 0; i < n; ++i ) {
        xs[i] = (i+1) * mesh;
    }
    // make a rotated CDF, i.e.: rcdf(x) = 1 - cdf(1-x)
    std::vector<double> rcdf(n, 0.0);
    for ( int i = 0; i < n; ++i ) {
        auto pred = [&](std::pair<KeyType, double> pair){return pair.second < 1.0 - xs[i];};
        rcdf[i] = 1.0 - std::count_if(pValues.begin(), pValues.end(), pred) / double(m);
    }
    // make a weight vector
    std::vector<double> ws = xs;
    std::for_each(ws.begin(), ws.end(), [](double & x){x = (1-x)*(1-x);});
    // fit a line through the rotated cdf
    double hatPi0, cov11, sumsq; // passed by ref to GSL function
    int err_code_fit = gsl_fit_wmul(xs.data(), 1, ws.data(), 1, rcdf.data(),
            1, n, &hatPi0, &cov11, &sumsq);
    if ( err_code_fit != GSL_SUCCESS ) {
        throw std::runtime_error("unable to estimate pi_0");
    }
    // compute the q-values, starting with the largest p-value
    std::map<KeyType, double> qValues; // to-be-returned
    double q_previous = 1.0;
    keys.reverse();
    int i = m; // index in the for loop over the keys
    for ( auto k : keys ) {
        q_previous = std::min(q_previous, (m * hatPi0 * pValues.at(k)) / (i--));
        // postfix i-- returns the old value
        qValues[k] = q_previous;
    }
    return qValues;
}

#endif // QVALUES_HPP_
The following code gives an example of how to use qvalues.hpp, and can be used for testing:
/** qvalues_test.cpp
 * program for testing qvalues.hpp
 * compile with
 *   g++ -O3 -std=c++11 qvalues_test.cpp -lgsl -lgslcblas -o qvaltest
 * then choose a seed for the random number generator (e.g. 1728) and run
 *   ./qvaltest 1728
 */
#include <iostream>
#include <iomanip> // for printing a decent table
#include <cstdlib> // atoi
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include "qvalues.hpp"

int main(int argc, char* argv[]) {
    // choose a random seed
    unsigned long seed = 144169;
    if ( argc > 1 ) {
        seed = atoi(argv[1]);
    }
    // initialize a random number generator
    gsl_rng* rng = gsl_rng_alloc(gsl_rng_taus);
    gsl_rng_set(rng, seed);
    // choose the probability of a null result
    double pi0 = 0.75;
    // and the number of tests
    int m = 1000;
    /* p-values from the alternative distribution are assumed
     * to be Beta distributed with parameters a and b
     */
    double a = 1;
    double b = 20;
    // create p-values from a mixture distribution
    std::map<int, double> pValues;
    std::map<int, bool> Hs;
    for ( int i = 0; i < m; ++i ) {
        /* H = 0 means null hypothesis is true,
         * H = 1 means alternative hypothesis is true
         */
        bool H = gsl_ran_bernoulli(rng, 1-pi0);
        Hs.emplace(i, H);
        // p-values from the null model have a flat (uniform) distribution
        double p = H ? gsl_ran_beta(rng, a, b) : gsl_ran_flat(rng, 0.0, 1.0);
        pValues.emplace(i, p);
    }
    // transform p-values into q-values
    auto qValues = computeQValues(pValues);
    // call features with a q-value < 0.2 significant
    double qThreshold = 0.2;
    int fds(0), tds(0), fns(0), tns(0); // count false/true discoveries/negatives
    for ( auto pair : pValues ) {
        int i = pair.first;
        double q = qValues.at(i);
        bool H = Hs[i];
        if ( q < qThreshold ) { // call significant
            if ( H ) tds++;
            else fds++;
        } else { // call negative
            if ( H ) fns++;
            else tns++;
        }
    }
    // compute the "realized" false discovery rate
    double FDR = double(fds) / (tds + fds);
    // print some results
    std::cout << std::setw(20) << "true"
              << std::setw(6) << "false" << std::endl
              << std::setw(14) << "discoveries:" << std::setw(6) << tds
              << std::setw(6) << fds << std::endl
              << std::setw(14) << "negatives:" << std::setw(6) << tns
              << std::setw(6) << fns << std::endl;
    std::cout << "realized FDR: " << FDR << std::endl;
    // delete the random number generator
    gsl_rng_free(rng);
    return 0;
}
After compiling and running the program, the result should be something like:
$ g++ -O3 -std=c++11 qvalues_test.cpp -lgsl -lgslcblas -o qvaltest
$ ./qvaltest 1728
                true false
  discoveries:   143    32
    negatives:   712   113
realized FDR: 0.182857