Computational and Systems Biology

Point of View: Data science for the scientific life cycle

Daphne Ezer (corresponding author) and Kirstie Whitaker
Alan Turing Institute, United Kingdom; University of Warwick, United Kingdom; University of Cambridge, United Kingdom
Feature Article
Cite this article as: eLife 2019;8:e43979 doi: 10.7554/eLife.43979

Abstract

Data science can be incorporated into every stage of a scientific study. Here we describe how data science can be used to generate hypotheses, to design experiments, to perform experiments, and to analyse data. We also present our vision for how data science techniques will be an integral part of the laboratory of the future.

Introduction

A key tenet of the scientific method is that we learn from previous work. In principle we observe something about the world and generate a hypothesis. We then design an experiment to test that hypothesis, set up the experiment, collect the data and analyse the results. And when we report our results and interpretation of them in a paper, we make it possible for other researchers to build on our work.

In practice, there are impediments at every step of the process. In particular, our work depends on published research that often does not contain all the information required to reproduce what was reported. There are too many possible experimental parameters to test under our time and budget constraints, so we make decisions that affect how we interpret the outcomes of our experiments. As researchers, we should not be complacent about these obstacles: rather, we should always look towards new technologies, such as data science, to help us improve the quality and efficiency of scientific research.

Data science could easily be dismissed as a simple rebranding of "science" – after all, nearly all scientists analyse data in some form. A more useful definition is that a data scientist is someone who develops new computational or statistical analysis techniques that can easily be adapted to a wide range of scenarios, or who applies such techniques to answer a specific scientific question. While there is no clear dividing line between data science and statistics, data science generally involves larger datasets. Moreover, data scientists often think in terms of training predictive models that can be applied to other datasets, rather than limiting the analysis to an existing dataset.

Data science emerged as a discipline largely because the internet led to the creation of incredibly large datasets (such as ImageNet, a database of 14 million annotated images; Krizhevsky et al., 2012). The availability of these datasets enabled researchers to apply a variety of machine learning algorithms which, in turn, led to the development of new techniques for analysing large datasets. One area in which progress has been rapid is the automated annotation and interpretation of images and texts on the internet, and these techniques are now being applied to other data-rich domains, including genetics and genomics (Libbrecht and Noble, 2015) and the study of gravitational waves (Abbott et al., 2016).

It is clear that data science can inform the analysis of an experiment, either to test a specific hypothesis or to make sense of large datasets that have been collected without a specific hypothesis in mind. What is less obvious, albeit equally important, is how these techniques can improve other aspects of the scientific method, such as the generation of hypotheses and the design of experiments.

Data science is an inherently interdisciplinary approach to science. New experimental techniques have revolutionised biology over the years, from DNA sequencing and microarrays in the past to CRISPR and cryo-EM more recently. Data science differs in that it is not a single technique, but rather a framework for solving a whole range of problems. The potential for data science to answer questions in a range of different disciplines is what excites so many researchers. That said, there are social challenges that cannot be fixed with a technical solution, and it is all too easy for expertise to be "lost in translation" when people from different academic backgrounds come together.

In October 2018, we brought together statisticians, experimental researchers, and social scientists who study the behaviour of academics in the lab (and in the wild) at a workshop at the Alan Turing Institute in London to discuss how we can harness the power of data science to make each stage of the scientific life cycle more efficient and effective. Here we summarise the key points that emerged from the workshop, and propose a framework for integrating data science techniques into every part of the research process (Figure 1). Statistical methods can optimise the power of an experiment by selecting which observations should be collected. Robotics and software pipelines can automate data collection and analysis, and incorporate machine learning analyses to adaptively update the experimental design based on incoming data. And the traditional output of research, a static PDF manuscript, can be enhanced to include analysis code and well-documented datasets to make the next iteration of the cycle faster and more efficient. We also highlight several of the challenges, both technical and social, that must be overcome to translate theory into practice, and share our vision for the laboratory of the future.

Figure 1. Integrating data science into the scientific life cycle.

Data science can be used to generate new hypotheses, optimally design which observations should be collected, automate and provide iterative feedback on this design as data are being observed, reproducibly analyse the information, and share all research outputs in a way that is findable, accessible, interoperable and reusable (FAIR). We propose a virtuous cycle through which experiments can effectively and efficiently "stand on the shoulders" of previous work in order to generate new scientific insights.

Data science for planning experiments

Hypothesis-driven research usually requires a scientist to change an independent variable and measure a dependent variable. However, there are often too many parameters to take account of. In plant science, for instance, these parameters might include temperature, exposure to light, access to water and nutrients, humidity and so on, and the plant might respond to a change in each of these in a context-dependent way.

Data scientists interested in designing optimal experiments must find ways of transforming a scientific question into an optimisation problem. For instance, let us say that a scientist wants to fit a regression model of how temperature and light exposure influence wheat growth. Initially they might measure the height of the wheat at a number of combinations of temperature and light exposure. Then, the scientist could ask: what other combinations of temperature and light exposure should I grow the wheat at in order to improve my ability to predict wheat growth, considering the cost and time constraints of the project?
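To make this concrete, below is a minimal sketch of how such a question can be phrased as an optimisation problem: a greedy D-optimal criterion picks the next (temperature, light) combination that most reduces uncertainty in the fitted regression coefficients. The candidate grid, the measured conditions and the linear growth model are all invented for illustration.

```python
# Sketch: greedy D-optimal selection of the next (temperature, light)
# combination for a linear regression of wheat growth.
import numpy as np

def design_matrix(points):
    """Linear model with intercept: growth ~ b0 + b1*temp + b2*light."""
    points = np.atleast_2d(points)
    return np.column_stack([np.ones(len(points)), points])

# Hypothetical (temperature in C, light in hours/day) conditions already measured
measured = np.array([[15.0, 8.0], [20.0, 12.0], [25.0, 16.0]])
candidates = np.array([[t, l] for t in range(10, 31, 5) for l in range(6, 19, 4)])

X = design_matrix(measured)
best_gain, best_point = -np.inf, None
for c in candidates:
    X_new = np.vstack([X, design_matrix(c)])
    gain = np.linalg.det(X_new.T @ X_new)  # D-optimality: maximise information
    if gain > best_gain:
        best_gain, best_point = gain, c

print("next condition to test:", best_point)
```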

At the workshop Stefanie Biedermann (University of Southampton) discussed how to transform a wide range of experimental design questions into optimisation problems. She and her colleagues have applied these methods to find optimal ways of selecting parameters for studies of enzyme kinetics (Dette and Biedermann, 2003) and medical applications (Tompsett et al., 2018). Other researchers have used data science to increase the production of a drug while reducing unwanted by-products (Overstall et al., 2018). The process iteratively builds on a small number of initial experiments that are conducted with different choices of experimental parameters (such as reagent concentrations, temperature or timing): new experimental parameters are then suggested until the optimal set is identified, as in the sketch below.
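The iterative loop described above can be sketched in a few lines: fit a surrogate model to the yields measured so far, propose the parameter value the surrogate predicts to be best, run that experiment, and repeat. Here the "experiment" is a simulated yield function and the surrogate is a simple quadratic; the methods cited above use far more principled models.

```python
# Sketch of the iterative design loop: measure, fit a surrogate model,
# propose the next reagent concentration, and repeat. The yield
# function and all numbers here are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_yield = lambda c: -((c - 2.4) ** 2) + 6.0  # hidden optimum at c = 2.4

concentrations = list(rng.uniform(0.5, 4.0, size=3))  # initial experiments
yields = [true_yield(c) + rng.normal(0, 0.1) for c in concentrations]

for round_ in range(5):
    # Quadratic surrogate fitted to all data collected so far
    coeffs = np.polyfit(concentrations, yields, deg=2)
    grid = np.linspace(0.5, 4.0, 200)
    proposal = grid[np.argmax(np.polyval(coeffs, grid))]
    concentrations.append(proposal)                    # "run" the experiment
    yields.append(true_yield(proposal) + rng.normal(0, 0.1))

print(f"estimated optimal concentration: {concentrations[-1]:.2f}")
```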

Optimal experimental design can also help researchers fit the parameters of dynamical models, which can help them develop a mechanistic understanding of biological systems. Ozgur Akman (University of Exeter) focuses on dynamic models that explain how gene expression changes over time, in which the model parameters represent molecular properties such as transcription rates or mRNA degradation rates. As an example he explained how he had used this approach to find the optimal parameters for a mathematical model of the circadian clock (Aitken and Akman, 2013). Akman also described how it is possible to search for possible gene regulatory networks that can explain existing experimental data (Doherty, 2017), and then select new experiments to help distinguish between these alternative hypotheses (Sverchkov and Craven, 2017). For instance, the algorithm might suggest performing a certain gene knockout experiment, followed by RNA-seq, to gain more information about the network structure.
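As a hedged illustration of this kind of model fitting, the sketch below estimates the transcription and degradation rates of a one-gene expression model, dm/dt = k_tx - k_deg * m, from a simulated time course. The model, parameter values and data are ours for illustration, not Akman's.

```python
# Sketch: fitting mechanistic parameters (transcription rate k_tx and
# degradation rate k_deg) of a one-gene expression model to a
# time course. The "observed" data here are simulated.
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

def mrna_model(t, k_tx, k_deg):
    """Integrate dm/dt = k_tx - k_deg * m from m(0) = 0."""
    def dmdt(m, t):
        return k_tx - k_deg * m
    return odeint(dmdt, y0=0.0, t=t).ravel()

t_obs = np.linspace(0, 10, 12)
noise = np.random.default_rng(1).normal(0, 0.1, t_obs.size)
m_obs = mrna_model(t_obs, 5.0, 0.8) + noise  # pretend these were measured

(k_tx_hat, k_deg_hat), _ = curve_fit(mrna_model, t_obs, m_obs, p0=[1.0, 0.1])
print(f"k_tx = {k_tx_hat:.2f}, k_deg = {k_deg_hat:.2f}")
```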

A clear message from the workshop was that statisticians need to be involved in the experimental design process as early as possible, rather than being asked to analyse the data at the end of a project. Involving statisticians before data collection makes it more likely that the scientist will be able to answer the research questions they are interested in. Another clear message was that the data, software, infrastructure and protocols generated during a research project are just as important as the results and interpretations that constitute a scientific paper.

Data science for performing experiments

In order to effectively plan an experiment, it is necessary to have some preliminary data as a starting point. Moreover, ensuring that the data collected during a particular experiment is used to inform the planning process for future experiments will make the whole process more efficient. For standard molecular biology experiments, this kind of feedback loop can be achieved through laboratory automation.

Ross King (University of Manchester) and co-workers have developed the first robot scientists – laboratory robots that physically perform experiments and use machine learning to generate hypotheses, plan experiments, and perform deductive reasoning to come to scientific conclusions. The first of these robots, Adam, successfully identified the yeast genes encoding orphan enzymes (King et al., 2009), and the second, Eve, intelligently screened drug targets for neglected tropical diseases (Williams et al., 2015). King is convinced that robot scientists improve research productivity, and also help scientists to develop a better understanding of science as a process (King et al., 2018). For instance, an important step towards building these robotic scientists was the development of a formal language for describing scientific discoveries. Humans might enjoy reading about scientific discoveries described in English or some other human language, but such languages are subject to ambiguity and exaggeration. Translating the deductive logic of research projects into the formal languages of "robotic scientists" should lead to a more precise description of our scientific conclusions (Sparkes et al., 2010).

Let us imagine that a research team observe that plants with a gene knockout are shorter than wild-type plants. Their written report of the experiment will state that this gene knockout results in shorter plants. They are likely to leave unsaid the caveat that this result was only observed under their experimental set-up and, therefore, may not hold under all possible experimental parameters. The mutant might be taller than a wild-type plant under certain lighting conditions or temperatures that were not tested in the original study. In the future, researchers may be able to record their research outcomes in an unambiguous way, so that it is clear that the evidence came from a very specific experimental set-up (see the sketch below). The work done to communicate these conditions in a computer-readable format will benefit the human scientists who extend and replicate the original work.
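A minimal sketch of what such an unambiguous record might look like, assuming a made-up (non-standard) vocabulary of field names:

```python
# Sketch of recording a result together with the exact conditions under
# which it was observed, so the claim "knockout plants are shorter" is
# scoped to the tested conditions. Field names are illustrative, not a
# standard vocabulary.
from dataclasses import dataclass, asdict
import json

@dataclass
class Observation:
    genotype: str
    mean_height_cm: float
    temperature_c: float
    light_hours_per_day: float
    n_plants: int

result = [
    Observation("wild_type", 41.2, 22.0, 16.0, 24),
    Observation("geneX_knockout", 28.7, 22.0, 16.0, 24),
]

# The serialised record states the evidence and its scope explicitly:
# the height difference was observed at these conditions only.
print(json.dumps([asdict(o) for o in result], indent=2))
```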

Even though laboratory automation technology has existed for a number of years, it has yet to be widely adopted in academic research environments. Laboratory automation involves complex hardware that is difficult to use, but a few start-ups are beginning to build tools that help researchers communicate with their laboratory robots more effectively. Vishal Sanchania (Synthace) discussed how the company's software tool Antha enables scientists to easily develop workflows for controlling laboratory automation. Furthermore, these workflows can be iterative: data collected by the laboratory robots can be used within the workflow to plan the next experimental procedure (Fell et al., 2018).

One benefit of having robotic platforms perform experiments as a service is that researchers are able to publish their experimental protocols as executable code. Any other researcher, anywhere in the world, can then run this code on another automated laboratory system, improving the reproducibility of experiments.

Data science for reproducible data analysis

As the robot scientists (and their creators) realised, a lot more information must be captured and shared for another researcher to reproduce an experiment. It is important that data collection and its analysis are reproducible. All too often, there is no way to verify the results in published papers because the reader has access neither to the data nor to the information needed to repeat the same, often complex, analyses (Ioannidis et al., 2014). At our workshop Rachael Ainsworth (University of Manchester) highlighted Peng's description of the reproducibility spectrum, which ranges from "publication only" to "full replication" with linked and executable code and data (Peng, 2011). Software engineering tools and techniques that are commonly applied in data science projects can nudge researchers towards the full replication end of the spectrum. These tools include interactive notebooks (Jupyter, Rmarkdown), version control and collaboration tools (git, GitHub, GitLab), package managers and containers to capture computational environments (Conda, Docker), and workflows to test and continuously integrate updates to the project (Travis CI). See Beaulieu-Jones and Greene (2017) for an overview of how to repurpose these tools for scientific analyses; a minimal illustration of recording provenance follows.
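As one small, hedged illustration of these practices, the snippet below stores an analysis result together with a record of the computational environment that produced it; tools such as Conda and Docker capture environments far more completely than this sketch does.

```python
# Sketch: recording the computational environment alongside an analysis
# result, one small step toward the "full replication" end of Peng's
# spectrum. The result value is a stand-in for a real analysis output.
import json
import platform
import sys
import numpy as np

result = {"mean_height_cm": float(np.mean([41.2, 28.7]))}
provenance = {
    "python": sys.version.split()[0],     # interpreter version used
    "platform": platform.platform(),      # operating system and architecture
    "numpy": np.__version__,              # version of each key dependency
}
with open("analysis_record.json", "w") as f:
    json.dump({"result": result, "provenance": provenance}, f, indent=2)
```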

Imaging has always been a critical technology for cell and developmental biology (Burel et al., 2015), ever since scientists looked at samples through a microscope and made drawings of what they saw. Photography came next, followed by digital image capture and analysis. Sébastien Besson (University of Dundee) presented a candidate for the next technology in this series: a set of open-source software and format standards called the Open Microscopy Environment (OME). This technology has already supported projects as diverse as the development of a deep learning classifier to identify patients with clinical heart failure (Nirschl et al., 2018) and the generation of ultra-large, high-resolution electron microscopy maps of human, mouse and zebrafish tissue (Faas et al., 2012).

The OME project also subscribes to the philosophy that data must be FAIR: findable, accessible, interoperable and reusable (Wilkinson et al., 2016). It does this as follows: i) data are made findable by hosting them online and providing links to the papers the data have been used in; ii) data are made accessible through an open API (application programming interface) and the availability of highly curated metadata; iii) data are made interoperable via the Bio-Formats software, which allows more than 150 proprietary imaging file formats to be converted into a variety of open formats using a common vocabulary (Linkert et al., 2010); iv) data, software and other outputs are made reusable under permissive open licences or through "copyleft" licences, which require the user to release anything they derive from the resource under the same open licence. (Alternatively, companies can pay for private access through Glencoe Software, which provides a commercially licenced version of the OME suite of tools.)

Each group that uploads data to be shared through OME's Image Data Resource can choose its own licence, although groups are strongly encouraged to use the most open of the Creative Commons licences (CC-BY or CC0). When shared in this way, these resources open up new avenues for replication and verification studies, methods development, and exploratory work that leads to the generation of new hypotheses.
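As a sketch of what "accessible" can mean in practice, the snippet below queries imaging metadata programmatically. The endpoint reflects our understanding of the Image Data Resource's public JSON API and may differ from the current interface, so treat it as illustrative rather than as documentation.

```python
# Sketch: programmatic access to openly shared imaging metadata.
# The URL below is our assumption of the Image Data Resource's public
# JSON API; check the IDR documentation before relying on it.
import json
from urllib.request import urlopen

url = "https://idr.openmicroscopy.org/api/v0/m/projects/?limit=5"
with urlopen(url) as response:
    projects = json.load(response)["data"]

for p in projects:
    print(p.get("Name"))  # list a few openly available imaging projects
```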

Data science for hypothesis generation

A hypothesis is essentially an "educated guess" by a researcher about what they think will happen when they do an experiment. A new hypothesis usually comes from theoretical models or from a desire to extend previously published experimental research. However, the traditional process of hypothesis generation is limited by the amount of knowledge an individual researcher can hold in their head and the number of papers they can read each year, and it is also susceptible to their personal biases (van Helden, 2013).

In contrast, machine learning techniques such as text mining of published abstracts or electronic health records (Oquendo et al., 2012), or exploratory meta-analyses of datasets pooled from laboratories around the world, can be used for automated, reproducible and transparent hypothesis generation. For example, teams at IBM Research have mined 100,000 academic papers to identify new protein kinases that interact with a protein tumour suppressor (Spangler, 2014) and predicted hospital re-admissions from the electronic health records of 5,000 patients (Xiao et al., 2018). However, the availability of datasets, ideally datasets that are FAIR, is a prerequisite for automated hypothesis generation (Hall and Pesenti, 2017).
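A toy version of such co-occurrence mining is sketched below: count how often each kinase name appears in the same abstract as a tumour suppressor of interest. The abstracts and gene names are invented; systems like the one described by Spangler (2014) operate at a vastly larger scale, with proper entity recognition.

```python
# Sketch of literature-scale hypothesis generation by co-occurrence
# mining. Abstracts and kinase names are made up for illustration.
from collections import Counter

abstracts = [
    "We show that KINASE_A phosphorylates p53 in response to damage.",
    "KINASE_B activity was unchanged; p53 levels were not measured.",
    "KINASE_A and p53 co-immunoprecipitate in HEK293 cells.",
]
kinases = ["KINASE_A", "KINASE_B"]

cooccurrence = Counter()
for text in abstracts:
    if "p53" in text:
        for k in kinases:
            if k in text:
                cooccurrence[k] += 1

# Kinases most often mentioned alongside p53 become candidate hypotheses.
print(cooccurrence.most_common())
```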

The challenges of translating theory into practice

When we use data science techniques to design and analyse experiments, we need to ensure that the techniques we use are transparent and interpretable. And when we use robot scientists to design, perform and analyse experiments, we need to ensure that science continues to explore a broad range of scientific questions. Other challenges include avoiding positive feedback loops and algorithmic bias, equipping scientists with the skills they need to thrive in this new multidisciplinary environment, and ensuring that scientists in the global south are not left behind. We discuss all these points in more detail below.

Interpreting experimental outcomes

When data science is used to make decisions about how to perform an experiment, we need to ensure that scientists calibrate their level of trust appropriately. On one hand, biologists will need to relinquish some level of control and to trust the computer program to make important decisions for them. On the other hand, we must make sure that scientists do not trust the algorithms so much that they stop thinking critically about the outcomes of their experiments and end up misinterpreting their results. We also want to ensure that scientists remain creative and open to serendipitous discoveries.

We discussed above the importance of having a formal (machine-readable) language that can be used to describe both scientific ideas and experimental protocols to robots. However, it is equally important that the results and conclusions of these experiments are expressed in a human-understandable format. Ultimately, the goal of science is not just to be able to predict natural phenomena, but also to give humans a deeper insight into the mechanisms driving the observed processes. Some machine learning methods, such as deep learning, while excellent for predicting an outcome, suffer from a lack of interpretability (Angermueller et al., 2016). How to balance predictability and interpretability for the human reader is an open question in machine learning.

Positive feedback loops and algorithmic bias

As with all applications of data science to new disciplines, there are risks related to algorithmic bias (Hajian et al., 2016). Recently there have been concerns over algorithmic bias in face recognition used in criminal justice: the software was more likely to produce a false positive for a Black face than for a white face because of biases in the dataset on which it was trained (Snow, 2017; Buolamwini and Gebru, 2018). Societal biases shape the input data that are fed into machine learning models, and if actions are taken on the basis of the models' outputs, these biases will only be amplified.

There are parallel issues with data science for experimental biology – for instance there are certain research questions that are popular within a community through accidents of history. Many people study model organisms such as roundworms and fruit flies because early genetics researchers studied them, and now there are more experimental tools that have been tried and tested on them – a positive feedback loop (Stoeger et al., 2018).

We need to be careful to ensure that any attempt to design experiments has the correct balance between exploring new research ideas and exploiting the existing data and experimental tools available in well-established sub-disciplines.

Implementation and training

According to Chris Mellingwood (University of Edinburgh), some biologists are amphibious and fluidly move between "wet" laboratories and "dry" computing (Mellingwood, 2017). However, many biologists do not know how to code or do not have the mathematical background needed to reframe their research questions as data science problems, so it may be difficult for them to find ways of using these new tools to design experiments in their own laboratories. They might not even realise that a tool exists to help them solve the experimental design problems they face. Researchers may need specialised training in order to learn how to interact with data science tools in an efficient and effective way.

Reproducible data analysis alone requires an understanding of version control; at least one, if not multiple, programming languages; and techniques such as testing, containerisation and continuous integration. Machine learning and optimisation algorithms require detailed statistical knowledge along with the technical expertise – sometimes including high-performance computing skills – to implement them. Acquiring all these skills, along with the robotics engineering expertise to build an automated lab, is beyond the capacity of most researchers trained by the current system.

Infrastructure and accessibility

Even once a system is built, it needs to be constantly adapted as science progresses. There is a risk that by the time a platform is developed, it might be out of date. Sarah Abel (University of Iceland) discussed how university incentive systems do not always reward the types of activities that would be required for incorporating data science into a laboratory, such as interdisciplinary collaborations or maintenance of long-term infrastructure.

Furthermore, due to the burden of developing and maintaining the infrastructure needed for this new approach to science, some researchers may be left behind. Louise Bezuidenhout (University of Oxford) explained that even though one of the goals of "data science for experimental design" is to have open and "accessible" data available around the world, scientists in the global south might not have access to computing resources needed for this approach (Bezuidenhout et al., 2017). Therefore, we need to consider how the benefits of data science and laboratory automation techniques are felt around the world.

Augmentation, not automation

As we discuss the role of data science in the cycle of research, we need to be aware that these technologies should be used to augment, not replace, human researchers. These new tools will release researchers from the tasks that a machine can do well, giving them time and space to work on the tasks that only humans can do. Humans are able to think "outside the box", while the behaviour of any algorithm will inherently be restricted by its code.

Perhaps the last person you might imagine supporting the integration of artificial and human intelligence is Garry Kasparov, the chess grandmaster. Kasparov lost to the IBM supercomputer Deep Blue in 1997, but more than 20 years later he is optimistic about the potential for machines to provide insights into how humans see the world (Kasparov, 2017). An example that is closer to the life sciences is the citizen science game BrainDr, in which participants quickly assess the quality of brain imaging scans by swiping left or right (Keshavan et al., 2018). Over time, there were enough ratings to train an algorithm to assess the quality of the images automatically. The tool saves researchers thousands of hours, permits the quality assessment of very large datasets, improves the reliability of the results, and is really fun!
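A hedged sketch of the underlying idea: noisy human swipe ratings are aggregated into labels, which then train a classifier to rate new scans automatically. The features, swipe probabilities and model below are simulated stand-ins, not the BrainDr pipeline.

```python
# Sketch: turning crowd-sourced swipe ratings (1 = pass, 0 = fail) into
# training labels for an automated quality-control classifier, in the
# spirit of BrainDr. All data here are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_scans = 200
snr = rng.normal(20, 5, n_scans)       # image-quality feature (signal-to-noise)
motion = rng.normal(1.0, 0.4, n_scans) # head-motion feature
quality = (snr - 10 * motion + rng.normal(0, 2, n_scans)) > 8

# Each scan receives five noisy human swipes; the majority vote is the label
swipes = rng.random((n_scans, 5)) < np.where(quality, 0.9, 0.1)[:, None]
labels = swipes.mean(axis=1) > 0.5

X = np.column_stack([snr, motion])
clf = LogisticRegression().fit(X, labels)
print(f"training accuracy: {clf.score(X, labels):.2f}")
```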

So where does this leave the biologist of the future? Experimentalists can continue to do creative research by, for example, designing new protocols that enable them to study new phenomena and measure new variables. There are also many experimental protocols that are performed so rarely that it would be inefficient to automate them. However, humans will not need to carry out standard protocols, such as purifying DNA, but they might still need to know how to perform various specialised tasks, such as dissecting specimens. They will also need to constantly update the robotic platform to incorporate new experimental protocols.

Finally and most crucially, the biologist of the future will need to provide feedback into the cycle of research – providing insight into what hypotheses are interesting to the community, thinking deeply about how experimental results fit into the theories proposed by the broader community, and finding innovative connections across disciplinary boundaries. Essentially, they will be focused on the varied and interesting parts of science, rather than the mundane and repetitive parts.

Box 1.

What can you do now?

Planning an experiment

In the lab of the future, we envision that experimental parameters will be chosen in a theoretically sound way, rather than through ad hoc human decision making. There are already plenty of tools to help researchers plan their experiments, including tools for selecting optimal time points for conducting an experiment (Kleyman et al., 2017; Ezer and Keir, 2018), a collection of R packages that enable optimisation of experimental design (CRAN), and the acebayes package, which takes prior information about the system as input and then designs experiments that are most likely to produce the best outputs (Overstall et al., 2017). A toy version of time-point selection is sketched below.
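As promised above, here is a toy version of optimal time-point selection: given two plausible models of a gene's expression over time, sample where their predictions diverge most, because those measurements best discriminate between the models. The models are invented, and a full treatment (as in the tools cited above) would also penalise redundant, closely spaced samples.

```python
# Sketch: pick the sampling times that best discriminate between two
# candidate models of mRNA decay. Both models are invented.
import numpy as np

t = np.linspace(0, 24, 97)          # candidate sampling times (hours)
model_a = 5 * np.exp(-0.10 * t)     # slow mRNA decay
model_b = 5 * np.exp(-0.30 * t)     # fast mRNA decay

# Measurements are most informative where the rival models disagree most
disagreement = np.abs(model_a - model_b)
best = np.sort(t[np.argsort(disagreement)[-3:]])
print("sample at t =", best, "hours")
```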

Performing an experiment

In the future, standard molecular biology experiments will be performed by robots, and executable experimental protocols will be published alongside each journal article to ensure reproducibility. Although many labs do not have access to laboratory automation, there are many associated techniques that will improve the reproducibility of research. For instance, systems like Protocols.io can help researchers to describe protocols in unambiguous ways that can be easily understood by other researchers. Sharing laboratory know-how, using tools such as OpenWetWare, will also enable a tighter feedback loop between performing and planning experiments.

Analysing experimental data

In a few years, we hope that many more data formats and pipelines will be standardised, reproducible and open-access. For researchers who are most comfortable in a wet-lab environment, the article "Five selfish reasons to work reproducibly" makes a strong case for learning how to code (Markowetz, 2015). Jupyter notebooks are an easy way to share data analyses with embedded text descriptions, executable code snippets, and figures. Workshops run by The Carpentries are a good way to learn software skills such as the Unix shell, version control with git, and programming in Python or R, as well as domain-specific techniques for data wrangling, analysis and visualisation.

Sharing your work

For anyone keen to know more about archiving their data, preprints, open access and the pre-registration of studies, we recommend the "Rainbow of Open Science Practices" (Kramer and Bosman, 2018) and Rachael Ainsworth's slides from our workshop (Ainsworth, 2018).

Generating new hypotheses

Pooling studies across laboratories and textual analysis of publications will help identify scientific questions worth studying. The beginning of a new cycle of research might start with an automated search of the literature for similar research (Extance, 2018) with tools such as Semantic Scholar from the Allen Institute for Artificial Intelligence. Alternatively, you could search for new data to investigate using Google's Dataset Search, or more specific resources from the European Bioinformatics Institute or the National Institute of Mental Health Data Archive.
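As a sketch of such an automated query, the snippet below searches the literature through what we understand to be the Europe PMC REST API (an EBI service); check the current documentation before relying on the exact URL or response fields.

```python
# Sketch: starting a new research cycle with an automated literature
# query. The URL and fields follow our understanding of the Europe PMC
# REST API and are assumptions, not documentation.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({
    "query": "circadian clock gene expression arabidopsis",
    "format": "json",
    "pageSize": 5,
})
url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/search?{params}"
with urlopen(url) as response:
    hits = json.load(response)["resultList"]["result"]

for h in hits:
    print(h.get("pubYear"), h.get("title"))
```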

References

  1. Abbott BP, Abbott R, Abbott TD, et al. (LIGO Scientific Collaboration and Virgo Collaboration) (2016) Observation of gravitational waves from a binary black hole merger. Physical Review Letters 116:061102. https://doi.org/10.1103/PhysRevLett.116.061102
  2. Beaulieu-Jones B, Greene C (2017) Reproducibility: automated. Accessed February 26, 2019.
  3. Doherty K (2017) Optimisation and landscape analysis of computational biology models: a case study. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '17), pp. 1644–1651. https://doi.org/10.1145/3067695.3084609
  4. Hajian S, Bonchi F, Castillo C (2016) Algorithmic bias: from discrimination discovery to fairness-aware data mining, parts 1 & 2. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2939672.2945386
  5. Kasparov G (2017) Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins. London: John Murray.
  6. Kramer B, Bosman J (2018) Rainbow of open science practices. Zenodo. https://doi.org/10.5281/zenodo.1147025
  7. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Weinberger KQ, Pereira F, Burges CJC, Bottou L (eds), Advances in Neural Information Processing Systems 25. Curran Associates, Inc. pp. 1097–1105.
  8. Spangler S (2014) Automated hypothesis generation based on mining scientific literature. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2623330.2623667

Article and author information

Author details

  1. Daphne Ezer

    Daphne Ezer is at the Alan Turing Institute, London, and in the Department of Statistics, University of Warwick, Coventry, United Kingdom

    Contribution
    Conceptualization, Writing—original draft, Writing—review and editing
    Contributed equally with
    Kirstie Whitaker
    For correspondence
    dezer@turing.ac.uk
    Competing interests
    No competing interests declared
ORCID: 0000-0002-1685-6909
  2. Kirstie Whitaker

    Kirstie Whitaker is at the Alan Turing Institute, London, and in the Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

    Contribution
    Conceptualization, Writing—original draft, Writing—review and editing
    Contributed equally with
    Daphne Ezer
    Competing interests
    No competing interests declared
ORCID: 0000-0001-8498-4059

Funding

Engineering and Physical Sciences Research Council (EP/S001360/1)

  • Daphne Ezer

Alan Turing Institute (TU/A/000017)

  • Daphne Ezer
  • Kirstie Whitaker

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank the speakers and attendees at the Data Science for Experimental Design Workshop, and Anneca York and the Events Team at the Alan Turing Institute.

Publication history

  1. Received: November 28, 2018
  2. Accepted: February 27, 2019
  3. Version of Record published: March 6, 2019 (version 1)

Copyright

© 2019, Ezer and Whitaker

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
