Epistasis Blog

From the Computational Genetics Laboratory at the University of Pennsylvania (www.epistasis.org)

Sunday, January 30, 2005

Data Mining Books

Data mining methods play an important role in detecting, characterizing, and interpreting nonlinear interactions between multiple genetic and environmental risk factors. The following is a short list of books that we have found particularly useful.

The Elements of Statistical Learning by Hastie et al., Springer (2001)

How to Solve It: Modern Heuristics by Michalewicz and Fogel, Springer (2000)

Introduction to Machine Learning by Alpaydin, MIT Press (2004)

Machine Learning by Mitchell, McGraw-Hill (1997)

Pattern Classification by Duda et al., Wiley (2000)

Pattern Recognition by Theodoridis and Koutroumbas, Academic Press (2003)

Friday, January 28, 2005

Hypothesis Testing

A recent paper by Lipton ("Testing hypotheses: prediction and prejudice", Science. 2005 Jan 14;307(5707):219-21) discusses 'prediction' vs. 'accommodation' for hypothesis construction. This discussion is highly relevant to genetic studies of human disease.

New Epistasis Paper

Leamy et al. (Heredity 2005) document abundant epistasis in the genetic architecture of fluctuating asymmetry of tooth size and shape in mice.

CFP: BioGEC Workshop

The fourth annual workshop on Biological Applications of Genetic and Evolutionary Computation (BioGEC), organized in connection with the 2005 Genetic and Evolutionary Computation Conference (GECCO-2005) in Washington DC, USA, is intended to explore and critically evaluate the application of GEC to biological problems. Specifically, the goal is to bring biologists and computer scientists together to foster an exchange of ideas whose emergent properties will move the field forward in unpredictable ways.

In order to facilitate interaction and discussion, the workshop invites papers in the form of commentaries, essays, perspectives, surveys, tutorials, and reviews that focus on ideas for discussion. Details on the call for papers can be found here.

Software: Open-Source MDR

Multifactor Dimensionality Reduction (MDR) is a nonparametric and genetic model-free alternative to logistic regression for detecting and characterizing nonlinear interactions among discrete genetic and environmental attributes. The MDR method combines attribute selection, attribute construction, and classification with cross-validation and permutation testing to provide a comprehensive and powerful approach to detecting nonlinear interactions. See Moore (Expert Review of Molecular Diagnostics, 4:795-803, 2004) for a recent review of the MDR method and its application to real data.
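For readers who want a feel for the steps described above, here is a minimal Python sketch of the core idea. It is not the MDR software itself, and it makes several simplifying assumptions: the function names (fit_cells, error_rate, mdr_search) are hypothetical, the case/control threshold defaults to 1.0 (reasonable only for balanced data), genotype cells unseen in training are treated as low-risk, folds are assigned by row index rather than at random, and each attribute combination is simply ranked by its mean cross-validated testing error. Permutation testing is left out.

```python
from collections import Counter
from itertools import combinations


def fit_cells(rows, labels, attrs, threshold=1.0):
    """Attribute construction: return the set of multifactor cells labeled high-risk.

    A cell is one combination of values at the chosen attributes. It is
    labeled high-risk when its case count is at least `threshold` times its
    control count (an assumption of this sketch).
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, labels):
        cell = tuple(row[a] for a in attrs)
        (cases if y == 1 else controls)[cell] += 1
    return {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] >= threshold * controls[cell]
    }


def error_rate(rows, labels, attrs, high_risk):
    """Classification: predict 'case' for high-risk cells, 'control' otherwise."""
    wrong = sum(
        (tuple(row[a] for a in attrs) in high_risk) != (y == 1)
        for row, y in zip(rows, labels)
    )
    return wrong / len(rows)


def mdr_search(rows, labels, order=2, folds=5):
    """Attribute selection: try every `order`-way combination of attributes
    and return (attrs, mean_testing_error) for the best one."""
    n = len(rows)
    best = None
    for attrs in combinations(range(len(rows[0])), order):
        errors = []
        for f in range(folds):
            # Deterministic fold assignment by index (a simplification).
            train = [i for i in range(n) if i % folds != f]
            test = [i for i in range(n) if i % folds == f]
            high_risk = fit_cells(
                [rows[i] for i in train], [labels[i] for i in train], attrs
            )
            errors.append(
                error_rate(
                    [rows[i] for i in test], [labels[i] for i in test],
                    attrs, high_risk,
                )
            )
        mean_error = sum(errors) / folds
        if best is None or mean_error < best[1]:
            best = (attrs, mean_error)
    return best
```

Calling mdr_search(rows, labels) with rows given as tuples of discrete attribute values and labels coded 1 for cases and 0 for controls returns the best pair of attribute indices together with its mean testing error. A fuller treatment would also track how consistently a model is chosen across folds and would assess significance by permuting the labels.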

We will be releasing an open-source Java version of the MDR software sometime in February. Please check the 'Open-Source MDR Project' web page for updates.

Thursday, January 27, 2005


Welcome to the Dartmouth Computational Genetics Laboratory (CGL) blog site. Starting today, CGL members, alumni, and colleagues will blog on epistasis, computational genetics, and related topics. Our goal is to facilitate an online discussion on the detection, characterization, and interpretation of DNA sequence variations that play a role in human disease susceptibility primarily through nonlinear interactions with other DNA sequence variations and environmental factors.

Please join us by sharing your comments and thoughts on the topics presented here.