Epistasis Blog

From the Computational Genetics Laboratory at Dartmouth Medical School (www.epistasis.org)

Sunday, September 17, 2006

2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology

On September 28 I will give the keynote lecture at the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) in Toronto. The title of my talk is "Genome-Wide Genetic Analysis using Computational Intelligence: The Importance of Expert Knowledge". This presentation will review our recent work on using expert knowledge to guide the genome-wide analysis of epistasis using stochastic search algorithms such as genetic programming. See several of my recent blog posts for more information.

On September 29 I will present our peer-reviewed paper, "Feature Selection using a Random Forests Classifier for the Integrated Analysis of Multiple Data Types" by Reif et al., at CIBCB. Here is the citation:

Reif DM, Motsinger A, McKinney B, Crowe J, Moore JH. Feature Selection using a Random Forests Classifier for the Integrated Analysis of Multiple Data Types. In: Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology. in press (2006).

Tuesday, September 12, 2006

Exploiting Expert Knowledge for Genome-Wide Genetic Analysis of Epistasis

Our paper on "Exploiting expert knowledge for genome-wide genetic analysis using genetic programming" (See May 31 post here) has been published by Springer in the Lecture Notes in Computer Science series. I presented the paper today at the Parallel Problem Solving from Nature (PPSN) IX conference in Iceland. Here is the final citation:

Moore JH, White BC. Exploiting expert knowledge for genome-wide genetic analysis using genetic programming. In: Runarsson et al. (eds.) Parallel Problem Solving from Nature - PPSN IX, Lecture Notes in Computer Science 4193, 969-977 (2006).

This paper should be available sometime soon from the Springer website. I would be happy to email you a copy if you can't find it.

Also, the conference has been wonderful. PPSN is always a nice mix of machine learning, evolutionary algorithms, and artificial life. The general theme is biologically-inspired computing.

Saturday, September 09, 2006

Symbolic Modeling of Epistasis

We have spent the last few months extending our Symbolic Discriminant Analysis (SDA) approach (see Moore et al. Genetic Epidemiology 23, 57-69, 2003 [PubMed]) for detecting, characterizing, and interpreting epistasis. I presented this new method this morning at the Bio-Inspired Computing in Computational Biology workshop held in conjunction with the Parallel Problem Solving from Nature (PPSN) IX conference in Iceland. This work has been accepted for publication (pending revisions) in a special issue of Human Heredity that will focus on gene-gene and gene-environment interactions. An open-source software package for Symbolic Modeling (SyMod) is in development and will be available later this fall. Here is the title and abstract for the paper:

Symbolic Modeling of Epistasis

Jason H. Moore, Nate Barney, Chia-Ti Tsai, Fu-Tien Chiang, Bill C. White

The workhorse of modern genetic analysis is the parametric linear model. The advantages of the linear modeling framework are many and include a mathematical understanding of the model fitting process and ease of interpretation. However, an important limitation is that linear models make assumptions about the nature of the data being modeled. These assumptions may not be realistic for complex biological systems such as disease susceptibility, where nonlinearities in the genotype-to-phenotype mapping relationship that result from epistasis, plastic reaction norms, locus heterogeneity, and phenocopy, for example, are the norm rather than the exception. We have previously developed a flexible modeling approach called symbolic discriminant analysis (SDA) that makes no assumptions about the patterns in the data. Rather, SDA lets the data dictate the size, shape, and complexity of a symbolic discriminant function that could include any set of mathematical functions from a list of candidates supplied by the user. Here, we outline a new five-step process for symbolic model discovery that uses genetic programming (GP) for coarse-grained stochastic searching, experimental design for parameter optimization, graphical modeling for generating expert knowledge, and estimation of distribution algorithms for fine-grained stochastic searching. Finally, we introduce function mapping as a new method for interpreting symbolic discriminant functions. We show that function mapping, when combined with measures of interaction information, facilitates statistical interpretation by providing a graphical approach to decomposing complex models to highlight synergistic, redundant, and independent effects of polymorphisms and their composite functions. We illustrate this five-step SDA modeling process with a real case-control dataset.
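
For readers unfamiliar with interaction information, the following is a minimal Python sketch of one common way to compute it. It is not taken from the paper or from SyMod. The function names, the toy genotype data, and the sign convention (positive values read as synergy, negative as redundancy) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(*columns):
    """Shannon entropy (in bits) of the joint distribution of the columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_information(x_columns, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), where X may span several columns."""
    return entropy(*x_columns) + entropy(y) - entropy(*x_columns, y)


def interaction_information(a, b, c):
    """I(A,B;C) - I(A;C) - I(B;C).

    Convention assumed here: positive means A and B are synergistic with
    respect to C, negative means redundant, zero means independent effects.
    """
    return (mutual_information([a, b], c)
            - mutual_information([a], c)
            - mutual_information([b], c))


# Hypothetical toy data: two binary "polymorphisms" covering all four
# genotype combinations once each.
snp_a, snp_b = zip(*product([0, 1], repeat=2))

# Purely epistatic status: determined by A XOR B, so neither SNP is
# informative on its own but the pair is fully informative.
status_xor = [a ^ b for a, b in zip(snp_a, snp_b)]
synergy = interaction_information(snp_a, snp_b, status_xor)

# Redundant case: the second SNP is a copy of the first, and the status
# simply equals that SNP.
redundancy = interaction_information(snp_a, snp_a, list(snp_a))

print(f"XOR model:       {synergy:+.2f} bits")
print(f"Duplicated SNP:  {redundancy:+.2f} bits")
```

In the XOR case all of the information about status comes from the pair jointly, so the measure is positive. In the duplicated case each SNP alone already carries everything the pair does, so the measure is negative.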