Epistasis Blog

From the Computational Genetics Laboratory at Dartmouth Medical School (www.epistasis.org)

Friday, June 01, 2012

The Challenges of Personalized Medicine and Genomics

This new piece in Science magazine summarizes nicely some of my concerns about the integration of whole genome sequencing into the clinic. I have tweeted about a number of these issues. A must read.

Liam R. Brunham, Michael R. Hayden. Whole-Genome Sequencing: The New Standard of Care? Science 336:1112-1113. [Science]

Rapid advances in DNA sequencing technology have made whole-genome sequencing (WGS) both technically and economically feasible. WGS has been used with great effect in specific settings to clarify molecular diagnosis and even to guide therapy. But are we ready for the routine use of WGS in the care of healthy individuals?

Saturday, May 05, 2012

Gene-Based Multifactor Dimensionality Reduction (MDR)

A new paper by Oh et al. in BMC Bioinformatics describes a new way to apply our Multifactor Dimensionality Reduction (MDR) method to genome-wide association studies. The approach first tests for within-gene interactions as a way to cut down on the number of tests. Seems like a good idea.

Sohee Oh, Jaehoon Lee, Min-Seok Kwon, Bruce Weir, Kyooseob Ha and Taesung Park. A novel method to identify high order gene-gene interactions in genome-wide association studies: gene-based MDR. BMC Bioinformatics 2012, 13(Suppl 9):S5 [BMC]

Abstract (provisional)

Background

Because common complex diseases are affected by multiple genes and environmental factors, it is essential to investigate gene-gene and/or gene-environment interactions to understand genetic architecture of complex diseases. After the great success of large scale genome-wide association (GWA) studies using the high density single nucleotide polymorphism (SNP) chips, the study of gene-gene interaction becomes a next challenge. Multifactor dimensionality reduction (MDR) analysis has been widely used for the gene-gene interaction analysis. In practice, however, it is not easy to perform high order gene-gene interaction analyses via MDR in genome-wide level because it requires exploring a huge search space and suffers from a computational burden due to high dimensionality.

Results

We propose dimensional reduction analysis, Gene-MDR analysis for the fast and efficient high order gene-gene interaction analysis. The proposed Gene-MDR method is composed of two-step applications of MDR: within- and between-gene MDR analyses. First, within-gene MDR analysis summarizes each gene effect via MDR analysis by combining multiple SNPs from the same gene. Second, between-gene MDR analysis then performs interaction analysis using the summarized gene effects from within-gene MDR analysis. We apply the Gene-MDR method to bipolar disorder (BD) GWA data from Wellcome Trust Case Control Consortium (WTCCC). The results demonstrate that Gene-MDR is capable of detecting high order gene-gene interactions associated with BD.

Conclusion

By reducing the dimension of genome-wide data from SNP level to gene level, Gene-MDR efficiently identifies high order gene-gene interactions. Therefore, Gene-MDR can provide the key to understand complex disease etiology.
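To make the two-step idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a binary case/control label, collapses all SNPs in a gene at once (with no SNP selection or cross-validation), and scores gene combinations by balanced accuracy on the training data. The function names, the toy data and the gene-to-SNP map are all hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def mdr_collapse(rows, labels):
    """Pool multi-locus genotype cells into one binary high/low-risk attribute."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    cases, ctrls = Counter(), Counter()
    for cell, y in zip(rows, labels):
        (cases if y == 1 else ctrls)[cell] += 1
    # A cell is high risk if its case:control ratio exceeds the overall ratio
    # (written with integer cross-multiplication to avoid division).
    high_risk = {cell for cell in set(rows)
                 if cases[cell] * n_ctrl > ctrls[cell] * n_case}
    return [1 if cell in high_risk else 0 for cell in rows]


def balanced_accuracy(pred, labels):
    """Mean of sensitivity and specificity; needs at least one case and one control."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    return 0.5 * (tp / n_case + tn / n_ctrl)


def gene_mdr(genotypes, labels, gene_to_snps, order=2):
    """Within-gene collapse, then a between-gene search over gene combinations."""
    # Step 1: summarize each gene as a single binary attribute.
    gene_attr = {}
    for gene, snps in gene_to_snps.items():
        rows = [tuple(ind[s] for s in snps) for ind in genotypes]
        gene_attr[gene] = mdr_collapse(rows, labels)
    # Step 2: run MDR again on combinations of the gene-level attributes.
    results = []
    for combo in combinations(sorted(gene_attr), order):
        rows = list(zip(*(gene_attr[g] for g in combo)))
        pred = mdr_collapse(rows, labels)
        results.append((balanced_accuracy(pred, labels), combo))
    return max(results)


# Toy data: 16 individuals, 4 binary SNPs; gene A holds SNPs 0 and 1.
genotypes = list(product((0, 1), repeat=4))
labels = [int((g[0] != g[1]) and g[2] == 1) for g in genotypes]
gene_to_snps = {"A": [0, 1], "B": [2], "C": [3]}
best_score, best_combo = gene_mdr(genotypes, labels, gene_to_snps)
print(best_combo, best_score)
```

On this toy data the search picks the A-B gene pair, because each of those genes also carries a marginal signal. A gene whose SNPs show no marginal association at all would be collapsed to a constant in step 1 of this sketch, which is worth keeping in mind when thinking about what a gene-level summary can and cannot preserve.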

Tuesday, April 03, 2012

The predictive capacity of personal genome sequencing and missing heritability

There are two papers I would like to call your attention to that provide fuel for a healthy debate about the complexity of the genotype-to-phenotype mapping relationship. The first, from Eric Lander's group (Zuk et al. 2012), calls into question how heritability is estimated. This is an interesting paper that argues that epistasis, or gene-gene interaction, is likely to explain a significant amount of the missing heritability. Many heritability estimates are based only on additive effects. My only concern with this paper is that this is not a new observation. My group and many others have been writing about the importance of epistasis for many years, and much of this work is not acknowledged. The second paper, from Roberts et al. (2012), calls into question the usefulness of personal genome sequencing for predicting disease. This paper shows that it is difficult to predict disease in monozygotic (MZ) twins. In other words, twins do not always die of the same diseases. Both papers call into question the assumption, made by genome-wide association studies (GWAS) of common alleles and by whole-genome sequencing for rare alleles, that there will be single loci with big effects that will in turn be useful for personalized medicine.

Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci U S A. 2012 Jan 24;109(4):1193-8. [PubMed]

Roberts et al., The predictive capacity of personal genome sequencing. Science Transl Med (2012), in press. [PubMed]

Here are several other blog posts and news stories that provide positive and negative perspectives on the Roberts/Vogelstein paper. It is interesting to note that a number of geneticists are upset that the paper got published and that it received so much press (e.g. Nature News Blog). I think they are more worried about the negative publicity for genome technology than about the actual science (which they claim is flawed).

Alzheimer Research Forum

Genomes Unzipped

Nature News Blog

Science News

Thursday, March 29, 2012

Is Life Law-Like?

This is a must read for those of you interested in epistasis and biological complexity. Fabulous paper.

Weiss KM, Buchanan AV. Is life law-like? Genetics. 2011 Aug;188(4):761-71. [PubMed]

Abstract

Genes are generally assumed to be primary biological causes of biological phenotypes and their evolution. In just over a century, a research agenda that has built on Mendel's experiments and on Darwin's theory of natural selection as a law of nature has had unprecedented scientific success in isolating and characterizing many aspects of genetic causation. We revel in these successes, and yet the story is not quite so simple. The complex cooperative nature of genetic architecture and its evolution include teasingly tractable components, but much remains elusive. The proliferation of data generated in our "omics" age raises the question of whether we even have (or need) a unified theory or "law" of life, or even clear standards of inference by which to answer the question. If not, this not only has implications for the widely promulgated belief that we will soon be able to predict phenotypes like disease risk from genes, but also speaks to the limitations in the underlying science itself. Much of life seems to be characterized by ad hoc, ephemeral, contextual probabilism without proper underlying distributions. To the extent that this is true, causal effects are not asymptotically predictable, and new ways of understanding life may be required.

Wednesday, March 07, 2012

Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges

Great new paper on pathway analysis. Must read.

Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012 Feb;8(2):e1002375. [PLoS]

Abstract

Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base–driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.

Saturday, February 11, 2012

Six Degrees of Epistasis

This is a nice overview of network methods for addressing gene-gene interactions. Our recent paper on the topic came out too late to be included. See Hu et al. (2011).

McKinney BA, Pajewski NM. Six Degrees of Epistasis: Statistical Network Models for GWAS. Front Genet. 2011;2:109. [PubMed]

Abstract

There is growing evidence that much more of the genome than previously thought is required to explain the heritability of complex phenotypes. Recent studies have demonstrated that numerous common variants from across the genome explain portions of genetic variability, spawning various avenues of research directed at explaining the remaining heritability. This polygenic structure is also the motivation for the growing application of pathway and gene set enrichment techniques, which have yielded promising results. These findings suggest that the coordination of genes in pathways that are known to occur at the gene regulatory level also can be detected at the population level. Although genes in these networks interact in complex ways, most population studies have focused on the additive contribution of common variants and the potential of rare variants to explain additional variation. In this brief review, we discuss the potential to explain additional genetic variation through the agglomeration of multiple gene-gene interactions as well as main effects of common variants in terms of a network paradigm. Just as is the case for single-locus contributions, we expect each gene-gene interaction edge in the network to have a small effect, but these effects may be reinforced through hubs and other connectivity structures in the network. We discuss some of the opportunities and challenges of network methods for analyzing genome-wide association studies (GWAS) such as the study of hubs and motifs, and integrating other types of variation and environmental interactions. Such network approaches may unveil hidden variation in GWAS, improve understanding of mechanisms of disease, and possibly fit into a network paradigm of evolutionary genetics.

Tuesday, February 07, 2012

Genetic Epidemiology with a Capital E

Great new article by Duncan Thomas that raises a number of very interesting questions about genome-wide association studies (GWAS), Big Science, dominant journals with large influence, and the role of consortia in the sociology of science. Worth a read.

Here is an excerpt:

"Another feature of GWAS having become routine is the emergence of consortia for discovering smaller and smaller risks because of the need for enormous sample sizes [Hunter et al., 2007]. This pressure towards Big Science will doubtless become even stronger as we move into sequence data and look for rarer variants. There are some issues in the sociology of science that are worth attention:

How are new investigators to find their niche in such an environment without becoming lost in a list of hundreds of authors?

What is the role of investigator-initiated studies and novel or paradigm-shifting ideas?

Is the dominance of a single journal, by virtue of its impact factor, in setting the agenda for the entire field a good thing?

Is the huge burden of time and effort required to establish these consortia really worth the yield of smaller and smaller effect sizes?

Certainly these consortia can be expected to yield more and more—and finer and finer—gold dust, but what about the nuggets?

How are we to deal with the requirement of replication when a consortium has essentially the corner on all the available data or in unique situations (e.g., a gene-environment interaction with an unusual or unusually well-characterized exposure)—perhaps by some form of internal cross-validation?"

Duncan C. Thomas. Genetic Epidemiology with a Capital E: Where Will We Be in Another 10 Years? Genetic Epidemiology, in press (2012) [Wiley]

Abstract:

In a commentary on the evolution of the field of genetic epidemiology over the past 10 years, Khoury et al. (2011) highlight several important developments, including the emergence of evaluation of genetic discoveries for their translational utility and of standards for reporting genetic findings. In this companion to their article, I reflect on some of these trends and speculate about the direction of the field in the future. In particular, I emphasize the opportunities posed by novel technologies like next-generation sequencing and the biological insights emerging from integrative genomics, but I also question the utility of large consortia. The basic principles of population-based research and the importance of taking account of the environment remain important to the field.

Tuesday, January 17, 2012

Lower-order effects adjustment in quantitative traits model-based multifactor dimensionality reduction

Here is a new MDR paper from Kristel Van Steen.

Mahachie John JM, Cattaert T, Van Lishout F, Gusareva ES, Van Steen K. Lower-order effects adjustment in quantitative traits model-based multifactor dimensionality reduction. PLoS One. 2012;7(1):e29594. [PubMed]

Abstract

Identifying gene-gene interactions or gene-environment interactions in studies of human complex diseases remains a big challenge in genetic epidemiology. An additional challenge, often forgotten, is to account for important lower-order genetic effects. These may hamper the identification of genuine epistasis. If lower-order genetic effects contribute to the genetic variance of a trait, identified statistical interactions may simply be due to a signal boost of these effects. In this study, we restrict attention to quantitative traits and bi-allelic SNPs as genetic markers. Moreover, our interaction study focuses on 2-way SNP-SNP interactions. Via simulations, we assess the performance of different corrective measures for lower-order genetic effects in Model-Based Multifactor Dimensionality Reduction epistasis detection, using additive and co-dominant coding schemes. Performance is evaluated in terms of power and familywise error rate. Our simulations indicate that empirical power estimates are reduced with correction of lower-order effects, likewise familywise error rates. Easy-to-use automatic SNP selection procedures, SNP selection based on "top" findings, or SNP selection based on p-value criterion for interesting main effects result in reduced power but also almost zero false positive rates. Always accounting for main effects in the SNP-SNP pair under investigation during Model-Based Multifactor Dimensionality Reduction analysis adequately controls false positive epistasis findings. This is particularly true when adopting a co-dominant corrective coding scheme. In conclusion, automatic search procedures to identify lower-order effects to correct for during epistasis screening should be avoided. The same is true for procedures that adjust for lower-order effects prior to Model-Based Multifactor Dimensionality Reduction and involve using residuals as the new trait. We advocate using "on-the-fly" lower-order effects adjusting when screening for SNP-SNP interactions using Model-Based Multifactor Dimensionality Reduction analysis.
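The "signal boost" concern in the abstract can be illustrated with a small Python sketch. This is not Model-Based MDR and not the authors' procedure. It is only a plain two-way contrast that removes co-dominant main effects from the nine cell means of a two-SNP genotype table, and it assumes every genotype combination is observed. The function name and the toy traits are hypothetical.

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def interaction_contrasts(snp_a, snp_b, trait):
    """Two-locus cell means with co-dominant main effects of both SNPs removed."""
    cells = defaultdict(list)
    for a, b, y in zip(snp_a, snp_b, trait):
        cells[(a, b)].append(y)
    cell_mean = {cell: mean(values) for cell, values in cells.items()}
    a_levels = sorted({a for a, _ in cell_mean})
    b_levels = sorted({b for _, b in cell_mean})
    # Unweighted marginal means over cells; assumes no genotype cell is empty.
    row = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    col = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}
    grand = mean(cell_mean.values())
    return {(a, b): cell_mean[(a, b)] - row[a] - col[b] + grand
            for a in a_levels for b in b_levels}


# Toy data: one individual per genotype combination (genotypes coded 0, 1, 2).
pairs = list(product(range(3), repeat=2))
snp_a = [a for a, _ in pairs]
snp_b = [b for _, b in pairs]

# Purely additive trait: cell means differ, but nothing is left after adjustment.
additive = interaction_contrasts(snp_a, snp_b, [2 * a + 3 * b for a, b in pairs])

# Multiplicative trait: a non-zero contrast remains after adjustment.
epistatic = interaction_contrasts(snp_a, snp_b, [a * b for a, b in pairs])
```

In the additive case the raw two-locus table still shows large differences between cells, which is how unadjusted main effects can look like an interaction. Only the second trait leaves a non-zero contrast once the main effects of both SNPs are taken out.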

Saturday, January 14, 2012

Imaging Genetics

I am collaborating with Drs. Andy Saykin and Li Shen at IUPUI to develop and apply novel methods for the genetic analysis of neuroimaging phenotypes. This is a really hot new area. I will be giving a talk on some of our recent results at the 8th International Imaging Genetics Conference, to be held on January 16th and 17th, 2012 at the Beckman Center of the National Academy of Sciences in Irvine, CA. We have applied some of our recent network science methods (see paper below) to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data.

Hu T, Sinnott-Armstrong NA, Kiralis JW, Andrew AS, Karagas MR, Moore JH. Characterizing genetic interactions in human disease association studies using statistical epistasis networks. BMC Bioinformatics. 2011 Sep 12;12:364. [PubMed]

Sunday, January 01, 2012

List of Epistasis Blog Posts from 2011

January, 2011

Yeast genetics is complex. What about humans?

The Meaning of Interaction

Model-based multifactor dimensionality reduction for detecting epistasis

Application of the Explicit Test of Epistasis to Colon Cancer

Real-world comparison of CPU and GPU implementations of SNPrank

NIH/NIGMS Funding by Priority Score

Layers of Epistasis

February, 2011

Gene-Gene Interaction Analysis Using ReliefF and MDR

A genome-wide screen of gene-gene interactions for rheumatoid arthritis susceptibility

Epistatic Interactions in Genetic Regulation of t-PA and PAI-1 Levels in a Ghanaian Population

Dissecting genetic networks underlying complex phenotypes: the theoretical framework

A Comparison of Multifactor Dimensionality Reduction and Penalized Regression

March, 2011

Interactome Networks and Human Disease

Gene–Environment Interactions in Human Disease

Model-Based Multifactor Dimensionality Reduction to detect epistasis for quantitative traits in the presence of error-free and noisy data

April, 2011

Genetic analysis of complex traits in the emerging collaborative cross

Travelling the world of gene-gene interactions

May, 2011

Detecting genetic interactions for quantitative traits with U-statistics

Transcriptional robustness and protein interactions are associated in yeast

The effects of linkage disequilibrium in large scale SNP datasets for MDR

Computational Intelligence Using Genetic Programming

Microbiome Studies at the 2012 Pacific Symposium on Biocomputing

June, 2011

Pathway of distinction analysis

Molecular mechanisms of epistasis

Two Epistasis Papers in Science

July, 2011

Generating data with complex genotype-phenotype relationships

Powerful SNP-set analysis for case-control genome-wide association studies

August, 2011

People are inherently biased against creative ideas

New Center Grant on Gene-Environment Interactions

September, 2011

Gene-environment interaction in psychiatric research

Characterizing Genetic Interactions in Human Disease Association Studies Using Statistical Epistasis Networks

HyperCube Rule Mining

An R Package Implementation of Multifactor Dimensionality Reduction

The 24/7 Lab - Does Creativity Suffer?

November, 2011

Entropy-based information gain approaches to detect and to characterize gene-gene and gene-environment interactions

December, 2011

The Causes of Epistasis