Cutting Edge: Changing the rules of the game
Genomics researchers have built a Facebook game that allows members of the public to join the effort to understand a disease that has killed millions of ash trees across Europe.
Crowdsourcing encompasses a wide range of activities, all of which involve getting something done using labour contributed by people who enjoy doing the thing you want to be done. It draws strength from community spirit and a sense of involvement in the project, and is made possible by the fact that social networks and mobile computing devices mean that many people are always online and able to contribute very easily.
Scientists have been quick to make use of the power of crowds in both the physical and the life sciences (see Box 1). Possibly the most audacious and visually exciting application of crowdsourcing is the Eyewire project. For Eyewire, researchers have used a laser-guided microscope to capture images of the retina, in particular the neurons within the retina, at different depths. People playing Eyewire have to colour in the neurons within these images so that, eventually, it becomes possible to combine all these two-dimensional images to make a three-dimensional model of the complex network of neurons in the retina.
Other examples of crowdsourcing in science
Examples of crowdsourcing in science range from foldit, an online protein-folding game, to Galaxy zoo, which involves members of the public identifying different types of galaxies in images captured by the Hubble Space Telescope.
A novel use of Facebook for scientific crowdsourcing happened in 2011 when Devin Bloom, a PhD student at the University of Toronto Scarborough, and his colleagues needed to identify the species of 5000 fish that had been caught in the Cuyuni River in Guyana as part of a research project. Pressed for time, they uploaded photographs of the fish onto Facebook and asked other scientists and fish experts to help. Within 24 hr, all 5000 fish had been identified.
Also, the rapid release of genetic data during an outbreak of E. coli in Germany in 2011 was instrumental in the rapid identification of the strain responsible for the outbreak because it allowed bacterial genomicists from around the world to contribute analyses of the data.
More recently, scientists from the Wellcome Trust Sanger Institute mentored budding game developers in a short coding competition to produce a game based on genomics and genetics. And Cancer Research UK is collaborating with Amazon, Facebook and Google to run a ‘GameJam’ to design and develop a mobile phone game that would analyse genetic data as part of the effort to speed up the development of improved treatments for cancer.
The usefulness of using crowdsourcing to do genetics research has been demonstrated by the results of the Phylo project. Phylo is a game in which the four bases in DNA are represented by blocks of different colour: the aim of the game is to move groups of blocks to the left and right in order to find the best possible alignment of groups with each other. Or, in the language of bioinformatics, to work on a computationally intractable problem called ‘multiple sequence alignment’. The game has provided the Phylo team with many thousands of improved solutions to this problem for genes related to disease in humans and 43 other vertebrate species (see Kawrykow et al., 2012; note that ‘Phylo players’ are listed as an author on this paper).
More than a game
Both of these games require a low amount of investment: from the players’ point of view, they can be dipped into in a few spare minutes and played for their intrinsic reward as a game and still return useful results. However, this minor effort per player does not mean that the results are trivial: rather, there are certain types of problems that the human brain and eye can solve much faster than a computer can solve them.
For problems that require a high level of investment in time and application, and perhaps specialist knowledge from a different field, incentives beyond those provided by a game have been provided. For example, Karim Lakhani of Harvard Business School and co-workers ran a prize-based contest in which they invited algorithmic scientists from outside of biomedicine to develop algorithms to annotate (i.e. correctly label) genetic recombination in human T cells (Lakhani et al., 2013). The advances sought here were substantially technical, and the contest brought to the fore a variety of techniques that had not previously been tried by the computational biology community: moreover, some of the entries achieved algorithmic speeds that were faster than existing techniques by a factor of over 1000, and some entries achieved accuracies that were close to the theoretical maximum for the test data set.
A crucial component of this approach, apart from the promise of a prize, was the appeal to a pre-existing community. The contest was launched through the TopCoder.com website, a community of more than 500,000 coders who regularly compete to solve coding problems. To reach this community effectively, the genetics problem the researchers were interested in was recast as an algorithmic problem that was better suited to the target audience.
When we want to make use of crowds to do specialist analyses, one of the challenges is to accommodate their non-expert status (or, in the previous example, their other-expert status) by translating the problem into a form that is readily understandable by the crowd. Also, in many cases non-experts will not appreciate the value of the thing they are working towards, so other goals and rewards must be wrapped around the activity. Game interfaces are able to mix these three factors—a problem that can be understood, a goal and a reward—together. This is the approach that the OpenAshDieBack project—run by researchers at the Sainsbury Laboratory, the John Innes Centre and the Genome Analysis Centre, all in Norwich in the UK—is now taking to gain a better understanding of ash dieback, a fungal disease that has killed millions of ash trees (Fraxinus excelsior) across Europe.
An open approach to genomics
Ash dieback is caused by a fungus, Chalara fraxinea, that emerged in Poland in the early 1990s and spread west across Europe. There is no known treatment for the disease, so when it reached the UK in 2012, the author and colleagues set up the OpenAshDieBack project to rapidly generate genomic sequence data for C. fraxinea and release these data directly to the scientific community (even before we had a chance to analyse them ourselves). This crowdsourcing approach enabled us to gather analyses from scientific experts around the globe, and we now have a variety of new genomic resources, including draft genome assemblies of both F. excelsior and C. fraxinea.
To encourage input from beyond the scientific community we have now developed a game interface that allows non-geneticists and non-scientists to contribute to our genomic analyses by analysing genetic variants within Facebook. The game, which is called Fraxinus, is not merely an exercise in outreach, it is designed to return data that are useful to scientists.
The process of preparing a genome sequence is complex because genomes are extremely long and because existing sequencing techniques have various shortcomings. The genome of C. fraxinea is 63 million bases long and the genome of the ash tree is even longer (approximately 954 million bases). The biochemical process we use to sequence these genomes can give us fragments of less than 200 bases: this means that we have millions of fragments (or ‘reads’) for each genome, and we must assemble these reads together into longer contiguous sections in order to understand the genetic code of that organism.
Individuals from the same species will have DNA sequences that are almost identical. However, the differences between them are important because they contribute to differences in, for example, the ability to cause disease (for pathogens) or the susceptibility to disease (for trees). Therefore, by comparing the DNA sequences from different samples with a reference genome for that species, we should be able to identify the genetic variations that make some strains of C. fraxinea so deadly and/or the variations that make some trees resistant to ash dieback.
Common variants include a simple change in the identity of a base—such as an adenine (A) being replaced by a thymine (T)—or the insertion/deletion of a base compared with the reference (a variant called an indel). A lot of research has gone into developing computer programs that can identify these variants automatically, but computational approaches have some significant shortcomings, especially when identifying indels in certain types of genomes, such as genomes that are highly repetitive or contain a lot of cytosine (C) and guanine (G) bases. In these situations, the visual acuity and excellent pattern recognition skills of humans can be invaluable and can help us to find these elusive and potentially crucial variants.
Game on for genomics
The basic idea of Fraxinus is to compare reads from different samples (which could be pathogens or trees) with the relevant reference genome to identify important variants. To make this challenge manageable, we have selected locations within the reference genome that we know (based on a preliminary analysis of the data) are likely to contain genetic variants and included the bases on either side of these locations to make target patterns that contain a total of 21 bases. We have then extracted all the reads that overlap to some extent with each target pattern. To begin with, the game will only contain data on the pathogens, but the game could be played with data from any species, and we plan to include data on the trees within the next few weeks.
In the game the A, C, G and T bases within both the target and read patterns are represented by coloured leaf shapes (see Figure 1). The aim of the game, for the player, is to align the read patterns with the target pattern: the better the alignment between these two patterns, the higher the player’s score. The player can improve the alignment—and, therefore, his or her score—by shifting the whole read pattern, and then hiding individual leaves (to represent a deletion) or introducing a space between two leaves (an insertion). And if there already is a space between two leaves, it can be increased in size or removed.
The player’s score is recalculated after every action, and play ends when the player decides that he or she cannot improve the score any further. If the player’s score is the highest for that target pattern, the player ‘owns’ the pattern. The player can then either tackle a target pattern that is not owned by another player, or attempt to ‘steal’ a target pattern owned by someone else by getting a higher score for it. Players know that the alignments they create will be stored by the scientists running the game and that their names will also be stored (if they consent) so that they can be credited for their contribution to science.
Parts of the genome of C. fraxinea are rich in G and C bases, which makes it difficult to identify variants with computational techniques, so there is a good chance that the game will supply us with a rich supply of new variants to analyse. With more accurate lists of genetic variants we will be much better placed to assess genetic variability between strains and to carry out experiments to associate particular organismal traits (such as the ability to cause or withstand infection) with their genetic root. Ultimately this will help us to breed trees that are resistant to ash dieback (or other diseases), to identify treatments, and to plan for better management of our woodlands and tree stocks.
By embedding Fraxinus in the Facebook platform, we hope to encourage competition between players and other members of their social network, as this will lead to multiple independent analyses of the same data and to more accurate results. However, anyone attempting to do something similar should be aware that game design is difficult, and it requires input from experienced software engineers, gameplay experts and graphic designers. (We worked with a company called Team Cooper to help us develop Fraxinus.) A key challenge is to bridge the gap between the players and the scientists—if the game is not exciting to play, the project will fail no matter how exciting the science is to the scientists.
We think that with Fraxinus we have created a game that exists in a happy middle ground as a game platform that effectively and simply encapsulates the problem we need to solve, that is enjoyable and competitive for the player and, we hope, can provide us with the results we need.
Article and author information
- Version of Record published: August 13, 2013 (version 1)
© 2013, MacLean
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
- Page views
Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
- Genetics and Genomics
Mutant RHO is the most frequent genetic cause of autosomal dominant retinitis pigmentosa. Here, we developed an allele-specific gene editing therapeutic drug to selectively target the human T17M RHO mutant allele while leaving the wild-type RHO allele intact for the first time. We identified a Staphylococcus aureus Cas9 (SaCas9) guide RNA that was highly active and specific to the human T17M RHO allele. In vitro experiments using HEK293T cells and patient-specific induced pluripotent stem cells (iPSCs) demonstrated active nuclease activity and high specificity. Subretinal delivery of a single adeno-associated virus serotype 2/8 packaging SaCas9 and sgRNA to the retinas of the RHO humanized mice showed that this therapeutic drug targeted the mutant allele selectively, thereby downregulating the mutant RHO mRNA expression. Administration of this therapeutic drug resulted in a long-term (up to 11 months after treatment) improvement of retinal function and preservation of photoreceptors in the mutant humanized heterozygous mice. Our study demonstrated a dose-dependent therapeutic effect in vivo. Unwanted off-target effects were not observed at the whole-genome sequencing level. Our study provides strong support for the further development of this effective therapeutic drug to treat RHO-T17M associated autosomal dominant retinitis pigmentosa (adRP), also offers a generalizable framework for developing gene editing medicine. Furthermore, our success in restoring the vision loss in the suffering RHO humanized mice verifies the feasibility of allele-specific CRISPR/Cas9-based medicines for other autosomal dominant inherited retinal dystrophies.
- Computational and Systems Biology
- Genetics and Genomics
Mitochondria play an important role in both normal heart function and disease etiology. We report analysis of common genetic variations contributing to mitochondrial and heart functions using an integrative proteomics approach in a panel of inbred mouse strains called the Hybrid Mouse Diversity Panel (HMDP). We performed a whole heart proteome study in the HMDP (72 strains, n=2-3 mice) and retrieved 848 mitochondrial proteins (quantified in ≥50 strains). High-resolution association mapping on their relative abundance levels revealed three trans-acting genetic loci on chromosomes (chr) 7, 13 and 17 that regulate distinct classes of mitochondrial proteins as well as cardiac hypertrophy. DAVID enrichment analyses of genes regulated by each of the loci revealed that the chr13 locus was highly enriched for complex-I proteins (24 proteins, P=2.2E-61), the chr17 locus for mitochondrial ribonucleoprotein complex (17 proteins, P=3.1E-25) and the chr7 locus for ubiquinone biosynthesis (3 proteins, P=6.9E-05). Follow-up high resolution regional mapping identified NDUFS4, LRPPRC and COQ7 as the candidate genes for chr13, chr17 and chr7 loci, respectively, and both experimental and statistical analyses supported their causal roles. Furthermore, a large cohort of Diversity Outbred mice was used to corroborate Lrpprc gene as a driver of mitochondrial DNA (mtDNA)-encoded gene regulation, and to show that the chr17 locus is specific to heart. Variations in all three loci were associated with heart mass in at least one of two independent heart stress models, namely, isoproterenol-induced heart failure and diet-induced obesity. These findings suggest that common variations in certain mitochondrial proteins can act in trans to influence tissue-specific mitochondrial functions and contribute to heart hypertrophy, elucidating mechanisms that may underlie genetic susceptibility to heart failure in human populations.