Epistasis Blog

From the Computational Genetics Laboratory at Dartmouth Medical School (www.epistasis.org)

Monday, April 14, 2014

To replicate or not to replicate? The case of pharmacogenetic studies

Statistical replication has always been the gold standard in genome-wide association studies (GWAS). However, as we have previously pointed out, there are many good reasons why true genetic associations might not replicate (Greene et al. 2009). This 2013 paper explores the issue with respect to pharmacogenetic studies. The focus of GWAS has now shifted to identifying new drug targets from genetic association results. If so, should biological validation matter more than statistical replication?

Aslibekyan S, Claas SA, Arnett DK. To replicate or not to replicate: the case of pharmacogenetic studies: Establishing validity of pharmacogenomic findings: from replication to triangulation. Circ Cardiovasc Genet. 2013 Aug;6(4):409-12 [PubMed]

Sunday, April 13, 2014

Why human disease-associated residues appear as the wild-type in other species: genome-scale structural evidence for the compensation hypothesis

Xu J, Zhang J. Why human disease-associated residues appear as the wild-type in other species: genome-scale structural evidence for the compensation hypothesis. Mol Biol Evol. 2014 [PubMed]


Many human-disease associated amino acid residues (DARs) appear as the wild-type in other species. This phenomenon is commonly explained by the presence of compensatory residues in these other species that alleviate the deleterious effects of the DARs. The general validity of this hypothesis, however, is unclear, because few compensatory residues have been identified. Here we test the compensation hypothesis by assembling and analyzing 1077 DARs located in 177 proteins of known crystal structures. Because destabilizing protein structures is a primary reason why DARs are deleterious, we focus on protein stability in this analysis. We discover that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This and other findings provide genome-scale evidence for the compensation hypothesis and have important implications for understanding epistasis in protein evolution and for using animal models of human diseases.

Saturday, April 12, 2014

Detection and replication of epistasis influencing transcription in humans

This study demonstrates that replicable epistasis is common at the level of transcription.

Hemani G, Shakhbazov K, Westra HJ, Esko T, Henders AK, McRae AF, Yang J, Gibson G, Martin NG, Metspalu A, Franke L, Montgomery GW, Visscher PM, Powell JE. Detection and replication of epistasis influencing transcription in humans. Nature. 2014 Apr 10;508(7495):249-53. [PubMed]


Epistasis is the phenomenon whereby one polymorphism's effect on a trait depends on other polymorphisms present in the genome. The extent to which epistasis influences complex traits and contributes to their variation is a fundamental question in evolution and human genetics. Although often demonstrated in artificial gene manipulation studies in model organisms, and some examples have been reported in other species, few examples exist for epistasis among natural polymorphisms in human traits. Its absence from empirical findings may simply be due to low incidence in the genetic control of complex traits, but an alternative view is that it has previously been too technically challenging to detect owing to statistical and computational issues. Here we show, using advanced computation and a gene expression study design, that many instances of epistasis are found between common single nucleotide polymorphisms (SNPs). In a cohort of 846 individuals with 7,339 gene expression levels measured in peripheral blood, we found 501 significant pairwise interactions between common SNPs influencing the expression of 238 genes (P = 2.91 × 10^-16). Replication of these interactions in two independent data sets showed both concordance of direction of epistatic effects (P = 5.56 × 10^-31) and enrichment of interaction P values, with 30 being significant at a conservative threshold of P < 9.98 × 10^-5. Forty-four of the genetic interactions are located within 5 megabases of regions of known physical chromosome interactions (P = 1.8 × 10^-10). Epistatic networks of three SNPs or more influence the expression levels of 129 genes, whereby one cis-acting SNP is modulated by several trans-acting SNPs. For example, MBNL1 is influenced by an additive effect at rs13069559, which itself is masked by trans-SNPs on 14 different chromosomes, with nearly identical genotype-phenotype maps for each cis-trans interaction. This study presents the first evidence, to our knowledge, for many instances of segregating common polymorphisms interacting to influence human traits.
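
To make the idea of a pairwise SNP-SNP interaction concrete, here is a minimal sketch of one common way to test for it: compare a linear model with only additive effects of two SNPs against a model that also includes their product term. This is not the method used in the paper above; the function name, the 0/1/2 genotype coding, the allele frequencies, the effect sizes, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def interaction_f_stat(g1, g2, y):
    """F statistic (1 numerator df) for adding a g1*g2 term to an additive model.

    g1, g2: genotypes coded 0/1/2 (an assumed coding).
    y: a quantitative trait such as an expression level.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    y = np.asarray(y, dtype=float)
    ones = np.ones(y.size)

    x_additive = np.column_stack([ones, g1, g2])
    x_interact = np.column_stack([ones, g1, g2, g1 * g2])

    def rss(x):
        # Residual sum of squares from an ordinary least-squares fit.
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ beta
        return float(resid @ resid)

    rss_add = rss(x_additive)
    rss_int = rss(x_interact)
    df_resid = y.size - x_interact.shape[1]
    return (rss_add - rss_int) / (rss_int / df_resid)


# Simulated example (hypothetical allele frequencies and effect sizes).
rng = np.random.default_rng(0)
n = 846  # cohort size borrowed from the abstract; everything else is made up
snp_a = rng.binomial(2, 0.3, n)
snp_b = rng.binomial(2, 0.4, n)
noise = rng.normal(size=n)

trait_epistatic = 1.0 * snp_a * snp_b + noise            # effect of A depends on B
trait_additive = 0.5 * snp_a + 0.5 * snp_b + noise       # no interaction

f_epi = interaction_f_stat(snp_a, snp_b, trait_epistatic)
f_add = interaction_f_stat(snp_a, snp_b, trait_additive)
print(f"F with simulated epistasis:    {f_epi:.1f}")
print(f"F without simulated epistasis: {f_add:.1f}")
```

A large F statistic for the product term suggests that the effect of one SNP depends on the genotype at the other. A genome-wide scan repeats a test like this over an enormous number of SNP pairs, which is where the statistical and computational challenges mentioned in the abstract come from.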

Friday, February 14, 2014

Promotion and tenure review at universities

There is an interesting thread on promotion and tenure by @phylogenomics on Twitter using the hashtag #publishperish14. He is tweeting comments from @lindakateh, the chancellor at UC Davis. All the tweets from the conference have been archived here. Some of the discussion was related to whether everyone should have tenure, since very few are actually denied it. I also like the comments about the tenure process reducing creativity and risk-taking. She also discussed the idea of rewarding faculty for other forms of expression, including blogs.

Wednesday, February 12, 2014

P-values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume

This piece in Nature about the limitations of p-values is a must-read. It is very relevant to genetics, epidemiology and, especially, genome-wide association studies (GWAS), which have put so much emphasis on p-values. We have also written about the limitations of p-values in a short editorial.

Thursday, January 30, 2014

Reconciling clinical importance and statistical significance in GWAS

Genome-wide association studies (GWAS) have identified many risk-associated SNPs with very small effects. The mantra for identifying more associations is to greatly increase the sample size to be able to detect smaller and smaller effects. This wonderful letter in the European Journal of Human Genetics points out that at some point the effect size goes below the measurement error, calling into question the clinical significance of these GWAS hits. If I were funding a big GWAS, I would first want to know whether increasing the sample size is justified given the effect sizes to be detected and the error of the phenotype measures.

Shriner D, Adeyemo A, Rotimi CN. Reconciling clinical importance and statistical significance. Eur J Hum Genet. 2014 Feb;22(2):158-9. [EJHG]
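
As a rough illustration of the trade-off described above, the sketch below uses a standard normal approximation to estimate how many individuals are needed to detect a SNP that explains a given fraction of trait variance, and how phenotype measurement error inflates that number. It is not taken from the letter: the function name, the genome-wide threshold of 5 × 10^-8, the 80% power target, and the use of a simple reliability factor (true variance divided by observed variance) are all assumptions.

```python
from statistics import NormalDist


def required_sample_size(var_explained, alpha=5e-8, power=0.8, reliability=1.0):
    """Approximate n for a 1-df test of a SNP on a quantitative trait.

    var_explained: fraction of true trait variance explained by the SNP.
    reliability: assumed ratio of true to observed phenotype variance;
        classical measurement error shrinks the observable variance
        explained by this factor.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)
    q_obs = var_explained * reliability
    return (z_alpha + z_power) ** 2 * (1 - q_obs) / q_obs


# Hypothetical effect sizes, each with and without measurement error.
for q in (0.01, 0.001, 0.0001):
    exact = required_sample_size(q)
    noisy = required_sample_size(q, reliability=0.5)
    print(f"variance explained {q:g}: n ~ {exact:,.0f} "
          f"(~ {noisy:,.0f} if only half the observed variance is signal)")
```

Under these assumptions, the required sample size grows roughly in inverse proportion to the variance explained, and a noisier phenotype raises it further, which is why it is worth asking up front whether the effects being chased are larger than the measurement error.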

Saturday, January 25, 2014

My take on the FDA's decision to regulate 23andMe

In 2013 the FDA ordered 23andMe to stop selling its genetic testing services for health-related purposes. This was a very controversial ruling that generated lots of discussion in the media. A collection of links to media coverage and opinions put together by writer David Dobbs can be found here. My take on the issue can be found in this Dartmouth Medicine Magazine piece.

Saturday, January 11, 2014

Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants

I just ran across this interesting new study that evaluated the relationship between the score that an NIH R01 grant receives during peer review and the future impact of the grant as measured by number and quality of publications. The bottom line is that a grant scoring in the 10th percentile does not produce publications with greater impact than a grant in the 30th percentile, which would not be funded by 2014 criteria.

Danthi N, Wu CO, Shi P, Lauer MS. Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants. Circ Res. 2014 Jan 9. [PubMed]


Rationale: Funding decisions for cardiovascular R01 grant applications at NHLBI largely hinge on percentile rankings. It is not known whether this approach enables the highest impact science.

Objective: To conduct an observational analysis of percentile rankings and bibliometric outcomes for a contemporary set of funded NHLBI cardiovascular R01 grants.

Methods and Results: We identified 1492 investigator-initiated de novo R01 grant applications that were funded between 2001 and 2008, and followed their progress for linked publications and citations to those publications. Our co-primary endpoints were citations received per million dollars of funding, citations obtained within 2-years of publication, and 2-year citations for each grant's maximally cited paper. In 7654 grant-years of funding that generated $3004 million of total NIH awards, the portfolio yielded 16,793 publications that appeared between 2001 and 2012 (median per grant 8, 25th and 75th percentiles 4 and 14, range 0 - 123), which received 2,224,255 citations (median per grant 1048, 25th and 75th percentiles 492 and 1,932, range 0 - 16,295). We found no association between percentile ranking and citation metrics; the absence of association persisted even after accounting for calendar time, grant duration, number of grants acknowledged per paper, number of authors per paper, early investigator status, human versus non-human focus, and institutional funding. An exploratory machine-learning analysis suggested that grants with the very best percentile rankings did yield more maximally cited papers.

Conclusions: In a large cohort of NHLBI-funded cardiovascular grants, we were unable to find a monotonic association between better percentile ranking and higher scientific impact as assessed by citation metrics.

Monday, December 30, 2013

Epistasis Blog Posts from 2013

January, 2013 

Gene-gene interactions in a pathway-based analysis of genetic susceptibility to bladder cancer

Complex effects of nucleotide variants in a mammalian cis-regulatory element

Four tips for success in graduate school and beyond

Gene-based testing of interactions in association studies of quantitative traits

Alternative definitions of epistasis

Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach

Multifactor dimensionality reduction reveals a three-locus epistatic interaction associated with susceptibility to pulmonary tuberculosis

ViSEN: Methodology and software for visualization of statistical epistasis networks

Statistical epistasis networks reduce the computational complexity of searching three-locus genetic models

An information-gain approach to detecting three-way epistatic interactions in genetic association studies

Things genes can't do. Shall we have pie or stew?

Probabilistic multifactor causation - what do we mean?

Journal impact factors - updated

A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection

JAMIA special issue on Translational Bioinformatics

A simple extension of Multifactor Dimensionality Reduction (MDR) for detecting epistasis effects on quantitative traits

The effect of genetic background on genetic interaction networks

Genotype-environment interactions reveal causal pathways that mediate genetic effects on phenotype

Best paper award at Translational Bioinformatics Conference

Big data analysis on autopilot?

Wednesday, December 11, 2013

Big Data Analysis on Autopilot?

My latest editorial with Scott Williams asks whether big data analysis in genomics and other disciplines has shifted into autopilot, with potentially dangerous consequences for the study of human health. This paper is open-access.

Williams SM, Moore JH. Big Data analysis on autopilot? BioData Min. 2013;6(1):22. [PubMed] [PDF]