Epistasis Blog

From the Computational Genetics Laboratory at Dartmouth Medical School (www.epistasis.org)

Thursday, February 26, 2009

Evolutionary Systems Biology

This looks interesting. Let me know if you try it.

Loewe L. A framework for evolutionary systems biology. BMC Syst Biol. 2009 Feb 24;3(1):27. [PubMed]


BACKGROUND: Many difficult problems in evolutionary genomics are related to mutations that have weak effects on fitness, as the consequences of mutations with large effects are often simple to predict. Current systems biology has accumulated much data on mutations with large effects and can predict the properties of knockout mutants in some systems. However experimental methods are too insensitive to observe small effects. RESULTS: Here I propose a novel framework that brings together evolutionary theory and current systems biology approaches in order to quantify small effects of mutations and their epistatic interactions in silico. Central to this approach is the definition of fitness correlates that can be computed in some current systems biology models employing the rigorous algorithms that are at the core of much work in computational systems biology. The framework exploits synergies between the realism of such models and the need to understand real systems in evolutionary theory. This framework can address many longstanding topics in evolutionary biology by defining various levels of the adaptive landscape. Addressed topics include the distribution of mutational effects on fitness, as well as the nature of advantageous mutations, epistasis and robustness. Combining corresponding parameter estimates with population genetics models raises the possibility of testing evolutionary hypotheses at a new level of realism. CONCLUSIONS: EvoSysBio is expected to lead to a more detailed understanding of the fundamental principles of life by combining knowledge about well-known biological systems from several disciplines. This will benefit both evolutionary theory and current systems biology. Understanding robustness by analysing distributions of mutational effects and epistasis is pivotal for drug design, cancer research, responsible genetic engineering in synthetic biology and many other practical applications.
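
For readers who want a concrete handle on "epistatic interactions" between two mutations, here is a minimal sketch using the common multiplicative definition: epistasis is the difference between the double mutant's relative fitness and the product of the two single mutants' relative fitnesses. The function name and the fitness values are hypothetical, and the paper's own fitness correlates may be defined differently.

```python
def epistasis(w_wt, w_a, w_b, w_ab):
    """Multiplicative epistasis for two mutations a and b.

    All arguments are fitness correlates (e.g. a simulated growth rate).
    Returns 0 when the mutations act independently, a negative value when
    the double mutant is less fit than expected, positive when more fit.
    """
    rel_a = w_a / w_wt      # relative fitness of single mutant a
    rel_b = w_b / w_wt      # relative fitness of single mutant b
    rel_ab = w_ab / w_wt    # relative fitness of the double mutant
    return rel_ab - rel_a * rel_b


# Hypothetical values, not taken from the paper.
eps = epistasis(w_wt=1.0, w_a=0.9, w_b=0.8, w_ab=0.6)
print(f"epistasis = {eps:.3f}")
```

A value near zero means the two mutations combine as expected under independence. Small deviations of this kind are what the abstract says are hard to measure experimentally.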

Tuesday, February 24, 2009

What are genes for? What is the question?

Great new paper in BioEssays from Anne Buchanan and Ken Weiss.

Buchanan AV, Sholtis S, Richtsmeier J, Weiss KM. What are genes "for" or where are traits "from"? What is the question? Bioessays. 2009 Feb;31(2):198-208. [PubMed] [Wiley]


For at least a century it has been known that multiple factors play a role in the development of complex traits, and yet the notion that there are genes "for" such traits, which traces back to Mendel, is still widespread. In this paper, we illustrate how the Mendelian model has tacitly encouraged the idea that we can explain complexity by reducing it to enumerable genes. By this approach many genes associated with simple as well as complex traits have been identified. But the genetic architecture of biological traits, or how they are made, remains largely unknown. In essence, this reflects the tension between reductionism as the current "modus operandi" of science, and the emerging knowledge of the nature of complex traits. Recent interest in systems biology as a unifying approach indicates a reawakened acceptance of the complexity of complex traits, though the temptation is to replace "gene for" thinking by comparably reductionistic "network for" concepts. Both approaches implicitly mix concepts of variants and invariants in genetics. Even the basic question is unclear: what does one need to know to "understand" the genetic basis of complex traits? New operational ideas about how to deal with biological complexity are needed.

Monday, February 23, 2009

MDR 2.0 beta

The beta version of MDR 2.0 was released in late January. There are several bug fixes in the new version. Be sure to let us know what you think! You can download MDR 2.0 beta here.

Sunday, February 22, 2009

MDR Publications

Click here to carry out a PubMed search for papers with "multifactor dimensionality reduction" in the title or abstract. There are more than 140 now.

Thursday, February 19, 2009

In Silico Epistasis

This is a neat paper.

Imielinski M, Belta C. Exploiting the pathway structure of metabolism to reveal high-order epistasis. BMC Syst Biol. 2008 Apr 30;2:40. [PubMed]


BACKGROUND: Biological robustness results from redundant pathways that achieve an essential objective, e.g. the production of biomass. As a consequence, the biological roles of many genes can only be revealed through multiple knockouts that identify a set of genes as essential for a given function. The identification of such "epistatic" essential relationships between network components is critical for the understanding and eventual manipulation of robust systems-level phenotypes. RESULTS: We introduce and apply a network-based approach for genome-scale metabolic knockout design. We apply this method to uncover over 11,000 minimal knockouts for biomass production in an in silico genome-scale model of E. coli. A large majority of these "essential sets" contain 5 or more reactions, and thus represent complex epistatic relationships between components of the E. coli metabolic network. CONCLUSION: The complex minimal biomass knockouts discovered with our approach illuminate robust essential systems-level roles for reactions in the E. coli metabolic network. Unlike previous approaches, our method yields results regarding high-order epistatic relationships and is applicable at the genome-scale.
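
To make the idea of a "minimal knockout" concrete, here is a toy sketch. It is a brute-force search over a made-up five-reaction network, not the network-based method from the paper, and it treats biomass production as simple reachability from a nutrient. The reaction names and network layout are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical toy network: reaction name -> (substrate, product).
REACTIONS = {
    "r1": ("nutrient", "A"),
    "r2": ("nutrient", "B"),
    "r3": ("A", "biomass"),
    "r4": ("B", "biomass"),
    "r5": ("A", "B"),
}


def produces_biomass(active):
    """Return True if 'biomass' is reachable from 'nutrient' via active reactions."""
    reached = {"nutrient"}
    changed = True
    while changed:
        changed = False
        for name in active:
            src, dst = REACTIONS[name]
            if src in reached and dst not in reached:
                reached.add(dst)
                changed = True
    return "biomass" in reached


def minimal_knockouts(max_size):
    """Enumerate minimal reaction sets whose removal stops biomass production."""
    names = sorted(REACTIONS)
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            knocked_out = set(combo)
            # Skip supersets of a smaller essential set: they are not minimal.
            if any(prev <= knocked_out for prev in found):
                continue
            if not produces_biomass(set(names) - knocked_out):
                found.append(knocked_out)
    return found


for ko in minimal_knockouts(max_size=3):
    print(sorted(ko))
```

No single reaction is essential in this toy network because of the redundant routes, so every essential set involves two or more reactions. Brute force like this does not scale to a genome-scale model, which is the problem the paper's approach addresses.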

Tuesday, February 17, 2009

New evaluation measures for multifactor dimensionality reduction (MDR)

There are two recent papers that propose alternatives to accuracy as a measure of the quality of MDR models. See:

Namkung J, Kim K, Yi S, Chung W, Kwon MS, Park T. New evaluation measures for multifactor dimensionality reduction classifiers in gene-gene interaction analysis. Bioinformatics. 2009 Feb 1;25(3):338-45. [PubMed]

Bush WS, Edwards TL, Dudek SM, McKinney BA, Ritchie MD. Alternative contingency table measures improve the power and detection of multifactor dimensionality reduction. BMC Bioinformatics. 2008 May 16;9:238. [PubMed]

Which of these would users like to see added to our open-source MDR software package? Let me know.
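
As a quick illustration of why accuracy alone can mislead, here is a small sketch comparing plain accuracy with balanced accuracy on an unbalanced 2x2 table. Balanced accuracy is used only as a familiar stand-in; it is not necessarily one of the measures proposed in the two papers above, and the counts are hypothetical.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all subjects classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity; insensitive to class imbalance."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical table: 10 cases and 90 controls, with most cases missed.
table = dict(tp=2, fp=2, tn=88, fn=8)
print(f"accuracy          = {accuracy(**table):.3f}")
print(f"balanced accuracy = {balanced_accuracy(**table):.3f}")
```

The model misses 8 of the 10 cases, yet plain accuracy still looks high because controls dominate the table.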

Monday, February 16, 2009

MDR Survival Analysis

Our paper on using MDR to carry out a survival analysis of bladder cancer has been accepted for publication in Human Genetics.

Andrew, A.S., Gui, J., Sanderson, A.C., Mason, R.A., Morlock, E.V., Schned, A.R., Kelsey, K.T., Marsit, C.J., Moore, J.H., Karagas, M.R. Bladder cancer SNP panel predicts susceptibility and survival. Human Genetics, in press (2009).


Bladder cancer is the fourth most common malignancy in men and the eighth most common in women in western countries. A hereditary component is likely since a family history of bladder cancer and variations in genes that detoxify aromatic amines and repair DNA are associated with increased risk. SNPs in many other genes that regulate processes altered in carcinogenesis including telomere maintenance, mitosis, inflammation, and apoptosis have not been assessed extensively for this disease. Using a population-based study with 1191 controls and 832 bladder cancer cases, we assessed the relationship between genetic variation and cancer susceptibility or survival. Findings included an increased risk associated with variants in the methyl-metabolism gene, MTHFD2 (adjusted OR 1.7 95%CI 1.3-2.3), the telomerase TEP1 (adjusted OR 1.8 95%CI 1.2-2.6) and decreased risk for the inflammatory response gene variant IL8RB (adjusted OR 0.6 95%CI 0.5-0.9) compared to wild-type. We filtered for gene-gene interactions using Multifactor Dimensionality Reduction to predict combinations of SNPs associated with risk. Shorter survival was associated with apoptotic gene variants, including CASP9 (adjusted HR 1.8 95%CI 1.1-3.0). Variants in the detoxification gene EPHX1 experienced longer survival (adjusted HR 0.4 95%CI 0.2-0.8). These results suggest that genetic variation in processes frequently altered in carcinogenesis is associated with bladder cancer risk and prognosis. These genes can now be assessed in multiple study populations to identify and validate SNPs appropriate for clinical use.

Thursday, February 12, 2009

Shadows of Complexity

Our paper on "Shadows of complexity: what biological networks reveal about epistasis and pleiotropy" is now out in BioEssays. [PubMed] [Wiley]

Monday, February 09, 2009

MDR for DNA Sequence Analysis

Our paper on using MDR for DNA sequence analysis has been accepted for publication in BioData Mining. This paper shows how our MDR method and software can be used for other data mining questions in bioinformatics.

Eric Arehart, Scott Gleim, Bill White, John Hwa and Jason H. Moore. Multifactor Dimensionality Reduction analysis identifies specific nucleotide patterns promoting genetic polymorphisms. BioData Mining, in press (2009).



The fidelity of DNA replication serves as the nidus for both genetic evolution and genomic instability fostering disease. Single nucleotide polymorphisms (SNPs) constitute greater than 80% of the genetic variation between individuals. A new theory regarding DNA replication fidelity has emerged where selectivity is governed by base-pair geometry and interactions between the selected nucleotide, complementary strand and the polymerase active site. We hypothesize that certain sequence combinations in the flanking regions of SNP fragments may predispose toward mutation.


We assembled a dataset from the Broad Institute as a first attempt at testing the hypothesis that flanking region motifs are associated with mutagenesis (n=2194). We expanded our inquiry by assembling another dataset of human SNPs and their flanking sequences (n = 29967) collected from the National Center for Biotechnology Information (NCBI) database and a control set of human sequences randomly selected from the NCBI database (n=909,364). The relationship between DNA sequence and mutation type was modeled using the novel multifactor dimensionality reduction (MDR) approach. MDR was originally developed to detect synergistic interactions between multiple SNPs that are predictive of disease susceptibility.


The present study represents the first use of this computational methodology for modeling nonlinear patterns in molecular genetics. We discovered six significant models in the smaller Broad Institute dataset. We also found significant models (p<< 0.001) for each SNP type examined in the larger NCBI dataset. Importantly, we also discovered a consistent motif of flanking region sites that predisposed to SNP genesis and that this motif was elongated or truncated depending on the SNP type examined. The MDR approach was able to effectively discern single sites within SNP and their respective identities and also their collective contribution to SNP genesis.
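
For anyone new to MDR, the core pooling step can be sketched in a few lines. This is a simplified illustration, not the code in our MDR software: it omits cross-validation and permutation testing, and the function names, the toy two-base "flanking sequences", and the threshold of 1.0 (a common choice when cases and controls are balanced) are all assumptions for the example.

```python
from collections import Counter


def mdr_high_risk_cells(rows, status, loci, threshold=1.0):
    """Pool multilocus cells into a high-risk set.

    rows      : sequences of attribute values (genotypes or nucleotides)
    status    : 1 for case, 0 for control, one entry per row
    loci      : indices of the attributes to combine
    threshold : case:control ratio at or above which a cell is high-risk
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, status):
        cell = tuple(row[i] for i in loci)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    high_risk = set()
    for cell in set(cases) | set(controls):
        # Compare counts by multiplication so empty control cells do not
        # cause a division by zero.
        if cases[cell] >= threshold * controls[cell]:
            high_risk.add(cell)
    return high_risk


def mdr_predict(row, loci, high_risk):
    """Classify a row using the one-dimensional high/low-risk attribute."""
    return 1 if tuple(row[i] for i in loci) in high_risk else 0


# Hypothetical toy data: two flanking positions with an XOR-like pattern.
rows = ["AA", "AG", "GA", "GG"] * 2
status = [0, 1, 1, 0] * 2
high = mdr_high_risk_cells(rows, status, loci=(0, 1))
print(sorted(high))
print([mdr_predict(r, (0, 1), high) for r in rows[:4]])
```

Neither position is informative on its own in this toy data, but the combination separates cases from controls perfectly. This is the kind of nonlinear pattern MDR is meant to pick up.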

Sunday, February 08, 2009

Interpretation of Genetic Association Studies

This is a great new paper on the usefulness of genetic associations for personalized medicine. Very important.

Jakobsdottir J, Gorin MB, Conley YP, Ferrell RE, Weeks DE. Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers. PLoS Genet. 2009 Feb;5(2):e1000337 [PubMed]

Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, and encourage positive lifestyle changes and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine is here yet. We focus on the clinical validity of genetic testing with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration dataset. By using an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10(-13), 10(-13), and 10(-3), respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for age groups 80 y, 65 y, and 40 y and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and two SNPs achieve an AUC of only 0.56 for prostate cancer. Nine SNPs were not sufficient to improve the discrimination power over that of nongenetic predictors for risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls. The scientific community should be cautious to avoid overstating the value of association findings in terms of personalized medicine before their time.
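
The point about prevalence is easy to see with Bayes' rule. Here is a minimal sketch of how the fraction of true cases among those classified as high risk (the positive predictive value) falls as prevalence drops. The sensitivity and specificity of 0.7 are hypothetical placeholders, not the operating point used in the paper, so the printed values will not match the paper's percentages; only the three prevalences come from the abstract.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a subject classified as high risk is a true case."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalences from the abstract; sensitivity and specificity are assumed.
for prevalence in (0.15, 0.055, 0.015):
    ppv = positive_predictive_value(0.7, 0.7, prevalence)
    print(f"prevalence {prevalence:.3f} -> PPV {ppv:.3f}")
```

With the classifier held fixed, a lower prevalence means false positives outnumber true positives, which is why a marker with an impressive odds ratio can still be a poor classifier.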

Saturday, February 07, 2009

IEEE CEC Papers Accepted

We had three papers accepted for publication and presentation as part of the 2009 IEEE Congress on Evolutionary Computation (CEC) to be held in May in Norway. Here are the paper titles. Preprints will be available upon request once we have finished the revisions.

1) Anna Tyler, Bill C. White, Casey S. Greene, Peter C. Andrews, Richard Cowper-Sal·lari and Jason H. Moore. Development and evaluation of an open-ended computational evolution system for the creation of digital organisms with complex genetic architecture. Proceedings of the IEEE Congress on Evolutionary Computation, in press (2009).

2) Casey S. Greene, Jeff Kiralis and Jason H. Moore. Nature-Inspired Algorithms for the Genetic Analysis of Epistasis in Common Human Diseases: A Theoretical Assessment of Wrapper vs. Filter Approaches. Proceedings of the IEEE Congress on Evolutionary Computation, in press (2009).

3) Casey S. Greene, Bill C. White and Jason H. Moore. Sensible Initialization Using Expert Knowledge for Genome-Wide Analysis of Epistasis Using Genetic Programming. Proceedings of the IEEE Congress on Evolutionary Computation, in press (2009).