Original Article Effect of genome-wide simultaneous hypotheses tests on the discovery rate
Susana Eyheramendy, Christian Gieger, Maris Laan, Thomas Illig, Thomas Meitinger, Erich Wichmann
Institute of Epidemiology, GSF-National Research Institute for Health and Environment, Ingolstädter Landstrasse, Neuherberg, Germany; Dr. Susana Eyheramendy, Department of Statistics, Facultad de Matemáticas, P. Universidad Católica de Chile, Avenida Vicuña Mackenna 4860, Santiago, Chile; Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu, Estonia; Psychiatry Department, University of Mainz, Untere Zahlbacherstr 8, 55131 Mainz, Germany; Institute of Bioinformatics, GSF-National Research Institute for Health and Environment, Ingolstädter Landstrasse 1, Neuherberg, Germany; Institute of Human Genetics, GSF-National Research Institute for Health and Environment, Ingolstädter Landstrasse 1, Neuherberg, Germany.
Received June 30, 2010; accepted April 22, 2011; Epub May 5, 2011; published May 20, 2011
Abstract: An increasing number of genome-wide association studies are being performed on hundreds of thousands of single nucleotide polymorphisms (SNPs). Many such studies carry out a second stage in which a selected number of SNPs are genotyped in new individuals in order to validate the genome-wide findings. Unfortunately, a large proportion of these studies have been unable to validate their genome-wide findings. In this study we aim to better understand how to distinguish truly associated features from false positives in genome-wide scans. To this end, we use empirical data to examine three aspects that may play a key role in determining which features are called associated with the phenotype. First, we examine the usual assumption of a uniform distribution for null p-values and assess whether it affects which features are called significant and how many. Second, we compare the global behavior of the p-value distribution genome-wide with its local behavior on each chromosome. Third, we look at the effect of minor allele frequency on the p-value distribution. We show empirically that the uniform distribution is not a generally valid assumption, and we find that, as a consequence, strikingly different conclusions can be drawn regarding which associations are called significant and how many significant findings there are. We propose that, in order to better assign significance to potential associations, one needs to estimate the true distribution of null and non-null p-values. (IJMEG1006004).
Key words: Genome-wide association study (GWAS), single nucleotide polymorphisms (SNPs), p-value distribution
Address all correspondence to: Susana Eyheramendy, PhD, Department of Statistics, Facultad de Matemáticas, P. Universidad Católica de Chile, Avenida Vicuña Mackenna 4860, Santiago, Chile. E-mail: suzanne.eyheramendy@gsf.de