International Journal of Molecular Epidemiology and Genetics

Int J Mol Epidemiol Genet 2010;2(2):163-177.

Original Article
Effect of genome-wide simultaneous hypotheses tests on the discovery rate

Susana Eyheramendy, Christian Gieger, Maris Laan, Thomas Illig, Thomas Meitinger, Erich Wichmann

Institute of Epidemiology, GSF-National Research Institute for Health and Environment, Ingolstädter Landstrasse, Neuherberg,
Germany; Dr. Susana Eyheramendy, Department of Statistics, Facultad de Matemáticas, P. Universidad Católica de Chile,
Avenida Vicuña Mackenna 4860, Santiago, Chile; Department of Biotechnology, Institute of Molecular and Cell Biology,
University of Tartu, Riia 23, 51010 Tartu, Estonia; Psychiatry Department, University of Mainz, Untere Zahlbacherstr 8, 55131
Mainz, Germany; Institute of Bioinformatics, GSF-National Research Institute for Health and Environment, Ingolstädter
Landstrasse 1, Neuherberg, Germany; Institute of Human Genetics, GSF-National Research Institute for Health and
Environment, Ingolstädter Landstrasse 1, Neuherberg, Germany.

Received June 30, 2010; accepted April 22, 2011; Epub May 5, 2011; published May 20, 2011

Abstract: An increasing number of genome-wide association studies are being performed on hundreds of thousands of single
nucleotide polymorphisms (SNPs). Many such studies carry out a second stage in which a selected number of SNPs are
genotyped in new individuals in order to validate genome-wide findings. Unfortunately, a large proportion of such studies
have been unable to validate those findings. In this study we aim to better understand how to distinguish the truly
associated features from the false positives in genome-wide scans. To achieve this goal we use empirical data to examine
three aspects that may play a key role in determining which features are called associated with the phenotype. First, we
examine the usual assumption of a uniform distribution of null p-values and assess whether it affects which features
are called significant and how many features are called significant. Second, we compare the global behavior of the p-value distribution
genome-wide with the local behavior on each chromosome. Third, we look at the effect of minor allele frequency on the p-value
distribution. We show empirically that the uniform distribution is not a generally valid assumption, and we find that, as a
consequence, strikingly different conclusions can be drawn regarding which associations are called significant and the number of
significant findings. We propose that, to better assign significance to potential associations, one needs to estimate the
true distribution of null and non-null p-values. (IJMEG1006004).
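
As an illustration of the first aspect above (checking whether null p-values look uniform), the following is a minimal sketch and not the authors' procedure. It computes the one-sample Kolmogorov-Smirnov distance between a set of p-values and the Uniform(0, 1) distribution. The function name `ks_uniform_statistic`, the simulated data, and the choice of the Kolmogorov-Smirnov distance are all assumptions made for illustration; the abstract does not say which diagnostic the study uses.

```python
import random


def ks_uniform_statistic(pvalues):
    """Kolmogorov-Smirnov distance between the empirical CDF of
    `pvalues` and the Uniform(0, 1) CDF (hypothetical helper)."""
    xs = sorted(pvalues)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one p-value")
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x;
        # the Uniform(0, 1) CDF at x is x itself.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


random.seed(0)
n = 10000

# Simulated p-values that really are uniform on (0, 1).
uniform_p = [random.random() for _ in range(n)]

# Hypothetical non-uniform null p-values, skewed toward 1
# (simulated, not taken from the study's data).
skewed_p = [random.random() ** 0.5 for _ in range(n)]

d_uniform = ks_uniform_statistic(uniform_p)
d_skewed = ks_uniform_statistic(skewed_p)

# Asymptotic 5% critical value of the one-sample KS statistic.
critical = 1.36 / n ** 0.5

print(f"uniform sample: D = {d_uniform:.4f}")
print(f"skewed sample:  D = {d_skewed:.4f}")
print(f"5% critical value (approx.): {critical:.4f}")
```

A distance well above the critical value suggests that thresholds calibrated under a uniform null could misstate the number of significant features. Real genome-wide p-values are correlated through linkage disequilibrium, so the critical value here is only a rough guide.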

Key words: Genome-wide association study (GWAS), single nucleotide polymorphisms (SNPs), p-value distribution


Address all correspondence to:
Susana Eyheramendy, PhD
Department of Statistics
Facultad de Matemáticas,
P. Universidad Católica de Chile
Avenida Vicuña Mackenna 4860,
Santiago, Chile
E-mail: suzanne.eyheramendy@gsf.de