Epistasis Blog

From the Computational Genetics Laboratory at Dartmouth Medical School (www.epistasis.org)

Monday, July 20, 2009

Fostering Innovation in a University Setting

I was asked today by another faculty member what universities can do to foster innovative research. Innovation is usually defined as the act of introducing something new [e.g. Dictionary.com]. According to Wikipedia, innovation may be incremental, radical or revolutionary and differs from invention in that it represents an idea that has been successfully applied to some problem. My own opinion is that 'significant' innovation is usually characterized by a 'radical' new approach to a particular problem. In molecular biology, PCR was truly innovative because it allowed investigators to pursue new and important research questions that were otherwise not feasible. However, I don't see the many incremental derivatives of PCR (e.g. rtPCR) as innovative because they are all fundamentally based on the same innovative idea.

What can universities do to foster innovation from their faculty? Here are some initial ideas. Send me your suggestions. I will update this list over the next few days.

1) Provide recurring discretionary money. Unfortunately, the NIH peer-review system does not encourage or reward innovative thinking. Most research proposals that are funded by the NIH are those that present incremental advances on previous ideas. Using the molecular biology example from above, a grant proposing to develop PCR would have a much harder time getting funded than a grant proposing to develop rtPCR once PCR had already been established. It is much easier to convince your peers that an incremental advance on an existing idea will work than a new idea. Conventional wisdom says that you need to have 1/4 to 1/2 the research already done to convince the NIH reviewers that you can actually do the work. By that time, the idea is no longer innovative. An important way universities can foster innovative research is to provide talented faculty with recurring discretionary funds that they can use to pursue the kind of innovative ideas that the NIH doesn't typically fund. The best way to do this is to establish endowed chairs that return 90% or more of the interest back to the investigator. Some universities do this and some do not.

2) Require or encourage faculty to take sabbaticals at other universities. I am a firm believer that innovation is stimulated by a change in scenery. Universities should require and pay their faculty to take short sabbaticals (e.g. one month) at least once every two years and long sabbaticals (6-12 months) every five years. Ideally sabbaticals would be taken at other universities where the investigator would get exposed to new faculty and new research environments. Our ability to innovate is significantly influenced by our local environment. Alternatively, the short sabbatical could be replaced by hosting visiting professors for one month. No university official has ever recommended that I take a sabbatical of any kind.

3) Require or encourage faculty to attend multiple scientific conferences. Departments and centers should encourage their faculty to attend at least 4-5 scientific conferences each year across a range of disciplines. Those of us in biomedical research should be attending conferences in economics or meteorology in addition to cell biology and genetics. Innovation often comes from seeing how others solve complex problems. Knowing the state of the art in your own field only encourages incremental science. This could be facilitated by the department or institution paying for their faculty to attend one conference per year that is in a radically different discipline.

4) Require or encourage graduate students to take courses in other disciplines. Graduate students can be a wonderful source of innovation and we need to provide them with the same opportunities for stimulating creative thought. One way to do this is to require them to take a course in a completely different area of their choosing and give them graduate level credit for it. For example, a graduate student in cell biology could take a graduate level course in music, psychology, art or economics. Allowing a graduate student to be innovative greatly influences the level of innovation in the research lab as a whole. I require all my students to take at least one year of additional coursework in a different area. One of my students is working on a Ph.D. in Genetics and doing an M.S. in Computer Science at the same time. This ensures they can speak multiple languages and also ensures there is a constant flow of new ideas back to the lab.

5) Provide institutional recognition for innovative research. It is critical that faculty who successfully develop innovative ideas are appropriately rewarded. This can come in the form of promotion, annual awards from the institution, additional discretionary research dollars or salary increases, for example. The challenge of course is knowing when an innovative idea has been developed and then proactively recognizing it. Institutions should not wait until an innovative faculty member threatens to leave to provide recognition.

Friday, July 17, 2009

GPU MDR

Our technical note on adapting MDR to run on a Graphics Processing Unit (GPU) has been accepted for publication in BMC Research Notes. Email me for a pre-print. The source code (mdrgpu) is available on sourceforge.net. The benchmarking results are VERY impressive.

Sinnott-Armstrong, N.A., Greene, C.S., Cancare, F., Moore, J.H. Accelerating epistasis analysis in human genetics with consumer graphics hardware. BMC Research Notes, in press (2009).

Abstract

Background: Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual’s disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions.

Findings: We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000 while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500.

Conclusions: Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
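
For readers who have not seen MDR before, here is a rough Python sketch of the core idea behind an exhaustive pairwise scan. It is not the mdrgpu source and leaves out cross-validation and everything GPU-specific. The function names, the 0/1 genotype coding and the toy data are all made up for illustration. Each multi-locus genotype cell is labeled high-risk when its case:control ratio exceeds the overall ratio, and each SNP combination is scored by the balanced accuracy of that labeling.

```python
from collections import Counter
from itertools import combinations
from math import comb


def mdr_balanced_accuracy(genotypes, status, snps):
    """Score one SNP combination (hypothetical helper, no cross-validation).

    genotypes: list of rows, one genotype code per SNP.
    status: list of 1 (case) or 0 (control); both classes must be present.
    snps: tuple of column indices to combine.
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[i] for i in snps)  # multi-locus genotype
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())

    true_pos = true_neg = 0
    for cell in set(cases) | set(controls):
        # High-risk if the cell's case:control ratio exceeds the overall
        # ratio (cross-multiplied to avoid dividing by an empty cell).
        if cases[cell] * n_controls > controls[cell] * n_cases:
            true_pos += cases[cell]
        else:
            true_neg += controls[cell]
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def exhaustive_mdr(genotypes, status, order=2):
    """Try every SNP combination of the given order and keep the best."""
    n_snps = len(genotypes[0])
    return max(
        (mdr_balanced_accuracy(genotypes, status, snps), snps)
        for snps in combinations(range(n_snps), order)
    )


# Toy data (invented): status is the XOR of SNP 0 and SNP 1, SNP 2 is noise.
genotypes = [
    [0, 0, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1],
    [0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0],
]
status = [0, 1, 1, 0, 0, 1, 1, 0]

best_score, best_pair = exhaustive_mdr(genotypes, status, order=2)
print(best_score, best_pair)

# Number of SNP pairs an exhaustive scan of one million SNPs must score.
n_pairs = comb(1_000_000, 2)
print(n_pairs)
```

Neither SNP 0 nor SNP 1 predicts status on its own in the toy data, but the pair does, which is the kind of pattern MDR is meant to find. The last line shows why the exhaustive search is expensive: the number of combinations grows combinatorially with the interaction order, and each combination can be scored independently of the others.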

Thursday, July 16, 2009

Can genes predict drug responses?

The Summer 2009 issue of Biomedical Computation Review has a nice article by Dr. Chandra Shekhar on pharmacogenetics. Discussed is a 2009 paper in the New England Journal of Medicine on warfarin dosing. I like this article because there are several statements about the importance of interactions. Also, my former student, Dr. Marylyn Ritchie, is quoted. For more information about the detection of epistasis or gene-gene interaction in pharmacologic studies see our 2005 paper in Nature Reviews Drug Discovery and our 2008 paper in Current Pharmacogenomics and Personalized Medicine.

Also in this issue of Biomedical Computation Review (see pp. 2-3) is a discussion of whether grant applications for the development and maintenance of biomedical software should compete head to head with basic research applications. There are good points on both sides of this argument. I don't have a problem with software grants competing with basic research grants as long as the reviewers are qualified to review both types.

Wednesday, July 15, 2009

GWAS Analysis Using Gene Ontology

Peter Holmans has published a very nice paper in the American Journal of Human Genetics on using Gene Ontology to analyze genome-wide association study (GWAS) data. See also my Dec. 6, 2008 post on our paper by Askland et al. that approaches the problem in the same way.

Holmans P, Green EK, Pahwa JS, Ferreira MA, Purcell SM, Sklar P; Wellcome Trust Case-Control Consortium, Owen MJ, O'Donovan MC, Craddock N. Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am J Hum Genet. 2009 Jul;85(1):13-24. [PubMed]

Abstract

We present a method for testing overrepresentation of biological pathways, indexed by gene-ontology terms, in lists of significant SNPs from genome-wide association studies. This method corrects for linkage disequilibrium between SNPs, variable gene size, and multiple testing of nonindependent pathways. The method was applied to the Wellcome Trust Case-Control Consortium Crohn disease (CD) data set. At a general level, the biological basis of CD is relatively well known for a complex genetic trait, and it thus acted as a test of the method. The method, known as ALIGATOR (Association LIst Go AnnoTatOR), successfully detected biological pathways implicated in CD. The method was also applied to a meta-analysis of bipolar disorder, and it implicated the modulation of transcription and cellular activity, including that which occurs via hormonal action, as an important player in pathogenesis.

Tuesday, July 07, 2009

Diversity and Complexity in DNA Recognition by Transcription Factors

Why did anyone think it would be so simple? Let's first assume complexity and not be surprised when we find it.

Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, Kuznetsov H, Wang CF, Coburn D, Newburger DE, Morris Q, Hughes TR, Bulyk ML. Diversity and complexity in DNA recognition by transcription factors. Science. 2009 Jun 26;324(5935):1720-3. [PubMed]

Abstract

Sequence preferences of DNA binding proteins are a primary mechanism by which cells interpret the genome. Despite the central importance of these proteins in physiology, development, and evolution, comprehensive DNA binding specificities have been determined experimentally for only a few proteins. Here, we used microarrays containing all 10-base pair sequences to examine the binding specificities of 104 distinct mouse DNA binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in the evolution of transcriptional regulatory networks.

Monday, June 29, 2009

Gene-Environment Interaction in Asthma

The complexity of the genotype-phenotype mapping relationship seems to be catching on in asthma genetics. Why would gene-environment interaction not be expected to play a role in asthma? Why would anyone investigate the genetic basis of asthma one SNP at a time? ...or one environment at a time?

von Mutius E. Gene-environment interactions in asthma. J Allergy Clin Immunol. 2009 Jan;123(1):3-11 [PubMed]

Martinez FD. Gene-environment interaction in complex diseases: asthma as an illustrative case. Novartis Found Symp. 2008;293:184-92 [PubMed]

See also my previous post on this topic.

Saturday, June 27, 2009

European Conference on Artificial Life (ECAL'09)

The following two papers were accepted for publication and presentation as part of the 2009 European Conference on Artificial Life (ECAL'09) to be held in Budapest in September. Hope to see you there!

Greene CS, Hill DP, Moore JH. An Open-Ended Computational Evolution Strategy for Evolving Parsimonious Solutions to Human Genetics Problems. Lecture Notes in Computer Science, in press (2009).

Abstract

In human genetics a primary goal is the discovery of genetic factors that predict individual susceptibility to common human diseases, but this has proven difficult to achieve because these diseases are likely to result from the joint failure of two or more interacting components. Currently geneticists measure genetic variations from across the genomes of individuals with and without the disease. The association of single variants with disease is then assessed. Our goal is to develop methods capable of identifying combinations of genetic variations predictive of discrete measures of health in human population data. “Artificial evolution” approaches loosely based on real biological processes have been developed and applied, but it has recently been suggested that “computational evolution” approaches will be more likely to solve problems of interest to biomedical researchers. Here we introduce a method to evolve parsimonious solutions in an open-ended computational evolution framework that more closely mimics the complexity of biological systems. In ecological systems a highly specialized organism can fail to thrive as the environment changes. By introducing numerous small changes into training data, i.e. the environment, during evolution we drive evolution towards general solutions. We show that this method leads to smaller solutions and does not reduce the power of an open-ended computational evolution system. This method of environmental perturbation fits within the computational evolution framework and is an effective method of evolving parsimonious solutions.

Gilmore JM, Greene CS, Andrews PC, Moore JH. An Analysis of New Expert Knowledge Scaling Methods for Biologically Inspired Computing. Lecture Notes in Computer Science, in press (2009).

Abstract

High-throughput genotyping has made genome-wide data on human genetic variation commonly available, however, finding associations between specific variations and common diseases has proven difficult. The size of these datasets presents an informatics challenge because exhaustive searching for even only pair-wise interactions is computationally expensive. Instead, search methods must be used which efficiently and effectively mine these datasets. Furthermore, individual susceptibility to common diseases likely depends on gene-gene interactions, i.e. epistasis, and not merely on independent genes. To meet these challenges, we turn to a biologically inspired ant colony optimization strategy. We have previously developed an ant system which allows the incorporation of expert knowledge as heuristic information. One method of scaling expert knowledge to probabilities usable in the algorithm, an exponential distribution function which respects intervals between raw expert knowledge scores, has been previously examined. Here, we develop and evaluate three additional expert knowledge scaling methods and find parameter sets for each which maximize power.

Wednesday, June 17, 2009

Neglected Advances in Classical Genetics

I ran across this great paper the other day. We don't do a good enough job teaching classical genetics. Papers on 'omics' approaches have replaced the papers that form the foundation of genetics. This is not a good trend and we need both.

Wilmer J. Miller and Willard F. Hollander. Three neglected advances in classical genetics. BioScience Vol. 45 No 2 Feb. 1995 pp. 98-104. [Web]

"Geneticists now concentrate on the wondrous new molecular techniques. Their promise is being fulfilled in applied as well as theoretical advances. But, while attention is diverted elsewhere, some advances in the classical areas have been neglected."

Monday, June 15, 2009

Recipe for Successful Graduate Students

Dr. John Holland was interviewed recently for SIGEVOlution magazine. He was asked what his recipe was for successful graduate students since he has had so many. Here is what he said. I use the same recipe.

1) Have the student find a broad question that really interests them.

I give my students great freedom to find a general topic they are really interested in. This helps provide the motivation that is sometimes lacking in graduate school.

2) Have the students learn a lot about a lot of different disciplines.

I require my students to take at least one additional year of courses to become fluent in another discipline.

3) Have the student find a mentor that will stand up for them no matter how crazy the idea.

Graduate school is the last time in your career that you will have true freedom and time to explore novel and crazy ideas. I encourage students to go out on a limb and explore the fringes of science. This is where all the really good ideas come from. I am not a believer in incremental me-too science.

Thursday, June 11, 2009

MDR 2.0 beta 4 released

This week we released a new version of our open-source multifactor dimensionality reduction (MDR) software package. It can be downloaded from sourceforge.net.

New features include our novel Spatially Uniform ReliefF (SURF) algorithm that improves the power to filter SNPs involved in interactions from a large list. The paper reporting these results is under review. SURF can also be combined with our previously developed Tuned ReliefF (TuRF) algorithm to give SURF and TuRF. There are also some minor bug fixes in this new version.