Computational and Systems Biology

Point of View: Data science for the scientific life cycle

Alan Turing Institute, United Kingdom
University of Warwick, United Kingdom
University of Cambridge, United Kingdom

Mar 6, 2019

Open access

Abstract

Data science can be incorporated into every stage of a scientific study. Here we describe how data science can be used to generate hypotheses, to design experiments, to perform experiments, and to analyse data. We also present our vision for how data science techniques will be an integral part of the laboratory of the future.

Introduction

A key tenet of the scientific method is that we learn from previous work. In principle we observe something about the world and generate a hypothesis. We then design an experiment to test that hypothesis, set up the experiment, collect the data and analyse the results. And when we report our results and interpretation of them in a paper, we make it possible for other researchers to build on our work.

In practice, there are impediments at every step of the process. In particular, our work depends on published research that often does not contain all the information required to reproduce what was reported. There are too many possible experimental parameters to test under our time and budget constraints, so we make decisions that affect how we interpret the outcomes of our experiments. As researchers, we should not be complacent about these obstacles: rather, we should always look towards new technologies, such as data science, to help us improve the quality and efficiency of scientific research.

Data science could easily be dismissed as a simple rebranding of "science" – after all, nearly all scientists analyse data in some form. An alternative definition of a data scientist is someone who develops new computational or statistical analysis techniques that can easily be adapted to a wide range of scenarios, or who can apply these techniques to answer a specific scientific question. While there is no clear dividing line between data science and statistics, data science generally involves larger datasets. Moreover, data scientists often think in terms of training predictive models that can be applied to other datasets, rather than limiting the analysis to an existing dataset.

Data science emerged as a discipline largely because the internet led to the creation of incredibly large datasets (such as ImageNet, a database of 14 million annotated images; Krizhevsky et al., 2012). The availability of these datasets enabled researchers to apply a variety of machine learning algorithms which, in turn, led to the development of new techniques for analysing large datasets. One area in which progress has been rapid is the automated annotation and interpretation of images and texts on the internet, and these techniques are now being applied to other data-rich domains, including genetics and genomics (Libbrecht and Noble, 2015) and the study of gravitational waves (Abbott et al., 2016).

It is clear that data science can inform the analysis of an experiment, either to test a specific hypothesis or to make sense of large datasets that have been collected without a specific hypothesis in mind. What is less obvious, albeit equally important, is how these techniques can improve other aspects of the scientific method, such as the generation of hypotheses and the design of experiments.

Data science is an inherently interdisciplinary approach to science. New experimental techniques have revolutionised biology over the years, from DNA sequencing and microarrays in the past to CRISPR and cryo-EM more recently. Data science differs in that it is not a single technique, but rather a framework for solving a whole range of problems. The potential for data science to answer questions in a range of different disciplines is what excites so many researchers. That said, however, there are social challenges that cannot be fixed with a technical solution, and it is all too easy for expertise to be "lost in translation" when people from different academic backgrounds come together.

In October 2018, we brought together statisticians, experimental researchers, and social scientists who study the behaviour of academics in the lab (and in the wild) at a workshop at the Alan Turing Institute in London to discuss how we can harness the power of data science to make each stage of the scientific life cycle more efficient and effective. Here we summarise the key points that emerged from the workshop, and propose a framework for integrating data science techniques into every part of the research process (Figure 1). Statistical methods can optimise the power of an experiment by selecting which observations should be collected. Robotics and software pipelines can automate data collection and analysis, and incorporate machine learning analyses to adaptively update the experimental design based on incoming data. And the traditional output of research, a static PDF manuscript, can be enhanced to include analysis code and well-documented datasets to make the next iteration of the cycle faster and more efficient. We also highlight several of the challenges, both technical and social, that must be overcome to translate theory into practice, and share our vision for the laboratory of the future.

Figure 1

Integrating data science into the scientific life cycle.

Data science can be used to generate new hypotheses, optimally design which observations should be collected, automate and provide iterative feedback on this design as data are being observed, reproducibly analyse the information, and share all research outputs in a way that is findable, accessible, interoperable and reusable (FAIR). We propose a virtuous cycle through which experiments can effectively and efficiently "stand on the shoulders" of previous work in order to generate new scientific insights.

Data science for planning experiments

Hypothesis-driven research usually requires a scientist to change an independent variable and measure a dependent variable. However, there are often too many parameters to take account of. In plant science, for instance, these parameters might include temperature, exposure to light, access to water and nutrients, humidity and so on, and the plant might respond to a change in each of these in a context-dependent way.

Data scientists interested in designing optimal experiments must find ways of transforming a scientific question into an optimisation problem. For instance, let us say that a scientist wants to fit a regression model of how temperature and light exposure influence wheat growth. Initially they might measure the height of the wheat at a number of combinations of temperature and light exposure. Then, the scientist could ask: what other combinations of temperature and light exposure should I grow the wheat at in order to improve my ability to predict wheat growth, considering the cost and time constraints of the project?
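As a minimal sketch of what "transforming a scientific question into an optimisation problem" can look like, the example below picks the next growth condition for the wheat experiment using the D-optimality criterion, one common choice among several. It assumes a simple linear model (height ~ 1 + temperature + light), rescales both variables to the range -1 to 1, and ignores cost and time constraints. The function names and the candidate grid are hypothetical and are not taken from any of the studies cited here.

```python
from itertools import product


def information_matrix(points):
    """Return X^T X for the assumed linear model height ~ 1 + temperature + light."""
    rows = [(1.0, temp, light) for temp, light in points]
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def next_design_point(measured, candidates):
    """Pick the candidate that maximises det(X^T X) once added (D-optimality)."""
    return max(candidates, key=lambda c: det3(information_matrix(measured + [c])))


# Hypothetical coded units: temperature and light exposure both rescaled to [-1, 1].
measured = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
candidates = list(product([-1.0, 0.0, 1.0], repeat=2))

best = next_design_point(measured, candidates)
print("Next condition to grow the wheat at (temperature, light):", best)
```

With three corners of the design space already measured, the criterion selects the remaining corner, which completes a two-level factorial design. A realistic version would add a cost term for each candidate and a richer model, but the structure (score every feasible experiment, then run the best one) stays the same.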

At the workshop Stefanie Biedermann (University of Southampton) discussed how to transform a wide range of experimental design questions into optimisation problems. She and her colleagues have applied these methods to find optimal ways of selecting parameters for studies of enzyme kinetics (Dette and Biedermann, 2003) and medical applications (Tompsett et al., 2018). Other researchers have used data science to increase the production of a drug while reducing unwanted by-products (Overstall et al., 2018). The process iteratively builds on a small number of initial experiments that are conducted with different choices of experimental parameters (such as reagent concentrations, temperature or timing): new experimental parameters are then suggested until the optimal set is identified.

Optimal experimental design can also help researchers fit parameters of dynamical models, which can help them develop a mechanistic understanding of biological systems. Ozgur Akman (University of Exeter) focuses on dynamic models that can explain how gene expression changes over time and where the model parameters represent molecular properties, such as the transcription rates or mRNA degradation rates. As an example he explained how he had used this approach to find the optimal parameters for a mathematical model for the circadian clock (Aitken and Akman, 2013). Akman also described how it is possible to search for possible gene regulatory networks that can explain existing experimental data (Doherty, 2017), and then select new experiments to help distinguish between these alternative hypotheses (Sverchkov and Craven, 2017). For instance, the algorithm might suggest performing a certain gene knockout experiment, followed by RNA-seq, to gain more information about the network structure.
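To illustrate what fitting the parameters of a dynamical model involves, the sketch below recovers a transcription rate k and an mRNA degradation rate d from a time course, assuming the simplest possible model, dm/dt = k - d*m with m(0) = 0. This is an illustrative toy, not the circadian clock model or the fitting method used in the studies cited above: the data are synthetic and noise-free, the parameter values are invented, and a coarse grid search stands in for a proper optimiser.

```python
import math


def mrna_level(t, k, d):
    """Closed-form solution of dm/dt = k - d*m with m(0) = 0."""
    return (k / d) * (1.0 - math.exp(-d * t))


def sum_squared_error(params, times, observed):
    """Misfit between the model prediction and the observed time course."""
    k, d = params
    return sum((mrna_level(t, k, d) - y) ** 2 for t, y in zip(times, observed))


def fit_by_grid_search(times, observed, k_grid, d_grid):
    """Return the (k, d) pair on the grid with the smallest squared error."""
    candidates = [(k, d) for k in k_grid for d in d_grid]
    return min(candidates, key=lambda p: sum_squared_error(p, times, observed))


# Synthetic, noise-free measurements generated from hypothetical "true" rates.
times = [0.5, 1.0, 2.0, 4.0, 8.0]
true_k, true_d = 2.0, 0.5
observed = [mrna_level(t, true_k, true_d) for t in times]

k_grid = [0.5 * i for i in range(1, 9)]    # 0.5, 1.0, ..., 4.0
d_grid = [0.1 * i for i in range(1, 11)]   # 0.1, 0.2, ..., 1.0

k_hat, d_hat = fit_by_grid_search(times, observed, k_grid, d_grid)
print("Estimated transcription rate:", k_hat, "degradation rate:", d_hat)
```

The choice of time points matters here: measurements taken only after the system has reached steady state constrain the ratio k/d but not the two rates separately, which is exactly the kind of question optimal experimental design is meant to answer.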

A clear message from the workshop was that statisticians need to be involved in the experimental design process as early as possible, rather than being asked to analyse the data at the end of a project. Involving statisticians before data collection makes it more likely that the scientist will be able to answer the research questions they are interested in. Another clear message was that the data, software, infrastructure and protocols generated during a research project are just as important as the results and interpretations that constitute a scientific paper.

Data science for performing experiments

In order to effectively plan an experiment, it is necessary to have some preliminary data as a starting point. Moreover, ensuring that the data collected during a particular experiment is used to inform the planning process for future experiments will make the whole process more efficient. For standard molecular biology experiments, this kind of feedback loop can be achieved through laboratory automation.

Ross King (University of Manchester) and co-workers have developed the first robot scientists – laboratory robots that physically perform experiments and use machine learning to generate hypotheses, plan experiments, and perform deductive reasoning to come to scientific conclusions. The first of these robots, Adam, successfully identified the yeast genes encoding orphan enzymes (King et al., 2009), and the second, Eve, intelligently screened drug targets for neglected tropical diseases (Williams et al., 2015). King is convinced that robot scientists improve research productivity, and also help scientists to develop a better understanding of science as a process (King et al., 2018). For instance, an important step towards building these robot scientists was the development of a formal language for describing scientific discoveries. Humans might enjoy reading about scientific discoveries that are described in English or some other human language, but such languages are subject to ambiguity and exaggeration. However, translating the deductive logic of research projects into the formal languages of robot scientists should lead to a more precise description of our scientific conclusions (Sparkes et al., 2010).

Let us imagine that a research team observe that plants with a gene knockout are shorter than wild type plants. Their written report of the experiment will state that this gene knockout results in shorter plants. They are likely to leave unsaid the caveat that this result was only observed under their experimental set-up and, therefore, that this may not be the case under all possible experimental parameters. The mutant might be taller than a wild type plant under certain lighting conditions or temperatures that were not tested in the original study. In the future, researchers may be able to write their research outcomes in an unambiguous way so that it is clear that the evidence came from a very specific experimental set-up. The work done to communicate these conditions in a computer-readable format will benefit the human scientists who extend and replicate the original work.
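The idea of a result that carries its own experimental context can be sketched in a few lines. The example below is not the formal language used by the robot scientists; it is a hypothetical, much simpler record in which the claim about the gene knockout is stored together with the conditions under which it was observed. The field names and the condition values are invented for illustration.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Finding:
    """A claim stored together with the set-up under which it was observed."""
    claim: str
    genotype: str
    comparison: str
    conditions: dict

    def applies_to(self, other_conditions):
        """True only if every recorded condition matches the other set-up."""
        return all(other_conditions.get(key) == value
                   for key, value in self.conditions.items())


# Hypothetical growth conditions; the original study would supply real ones.
finding = Finding(
    claim="shorter than comparison",
    genotype="gene knockout",
    comparison="wild type",
    conditions={"temperature_c": 22, "light_hours_per_day": 16},
)

# A computer-readable version that could be shared alongside the paper.
record = json.dumps(asdict(finding), sort_keys=True)
print(record)

# The claim is only asserted for set-ups that match the recorded conditions.
print(finding.applies_to({"temperature_c": 22, "light_hours_per_day": 16}))
print(finding.applies_to({"temperature_c": 28, "light_hours_per_day": 16}))
```

A reader (or a program) planning a follow-up study can then check explicitly whether the reported effect has been tested under the conditions they care about, instead of inferring it from the methods section.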

Even though laboratory automation technology has existed for a number of years, it has yet to be widely incorporated into academic research environments. Laboratory automation is full of complex hardware that is difficult to use, but a few start-ups are beginning to build tools to help researchers communicate with their laboratory robots more effectively. Vishal Sanchania (Synthace) discussed how their software tool Antha enables scientists to easily develop workflows for controlling laboratory automation. Furthermore, these workflows can be iterative: that is, data collected by the laboratory robots can be used within the workflow to plan the next experimental procedure (Fell et al., 2018).

One benefit of having robotic platforms perform experiments as a service is that researchers are able to publish their experimental protocols as executable code, which any other researcher, from anywhere around the world, can run on another automated laboratory system, improving the reproducibility of experiments.

Data science for reproducible data analysis

As the robot scientists (and their creators) realised, far more information must be captured and shared for another researcher to reproduce an experiment than is usually reported. It is important that both data collection and data analysis are reproducible. All too often, there is no way to verify the results in published papers because the reader does not have access to the data, nor to the information needed to repeat the same, often complex, analyses (Ioannidis et al., 2014). At our workshop Rachael Ainsworth (University of Manchester) highlighted Peng's description of the reproducibility spectrum, which ranges from "publication only" to "full replication" with linked and executable code and data (Peng, 2011). Software engineering tools and techniques that are commonly applied in data science projects can nudge researchers towards the full replication end of the spectrum. These tools include interactive notebooks (Jupyter, R Markdown), version control and collaboration tools (git, GitHub, GitLab), package managers and containers to capture computational environments (Conda, Docker), and workflows to test and continuously integrate updates to the project (Travis CI). See Beaulieu-Jones and Greene, 2017 for an overview of how to repurpose these tools for scientific analyses.
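Even without containers or continuous integration, a small step towards the "full replication" end of the spectrum is to record which data and which computational environment produced a result. The sketch below, which uses only the Python standard library, stores a checksum of the input data alongside the interpreter version. The function name, the field names and the example data are hypothetical, and a record like this complements rather than replaces tools such as Conda or Docker.

```python
import hashlib
import json
import platform
import sys


def provenance_record(data_bytes, analysis_name):
    """Describe which data and which Python environment an analysis used."""
    return {
        "analysis": analysis_name,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }


# Hypothetical input data; in practice this would be read from the data file.
data = b"sample,height_cm\nwt,12.1\nko,9.8\n"

record = provenance_record(data, "height_comparison")
print(json.dumps(record, indent=2, sort_keys=True))
```

If the checksum in a shared record does not match the data file a reader has downloaded, they know immediately that they are not analysing the same data, before any numbers are compared.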

Imaging has always been a critical technology for cell and developmental biology (Burel et al., 2015), ever since scientists looked at samples through a microscope and made drawings of what they saw. Photography came next, followed by digital image capture and analysis. Sébastien Besson (University of Dundee) presented a candidate for the next technology in this series, a set of open-source software and format standards called the Open Microscopy Environment (OME). This technology has already supported projects as diverse as the development of a deep learning classifier to identify patients with clinical heart failure (Nirschl et al., 2018) and the generation of ultra-large high resolution electron microscopy maps in human, mouse and zebrafish tissue (Faas et al., 2012).

The OME project also subscribes to the philosophy that data must be FAIR: findable, accessible, interoperable and reusable (Wilkinson et al., 2016). It does this as follows: i) data are made findable by hosting them online and providing links to the papers the data have been used in; ii) data are made accessible through an open API (application programming interface) and the availability of highly curated metadata; iii) data are made interoperable via the Bio-Formats software, which allows more than 150 proprietary imaging file formats to be converted into a variety of open formats using a common vocabulary (Linkert et al., 2010); iv) data, software and other outputs are made reusable under permissive open licences or through "copyleft" licences which require the user to release anything they derive from the resource under the same open licence. (Alternatively, companies can pay for private access through Glencoe Software, which provides a commercially licensed version of the OME suite of tools.)

Each group who upload data to be shared through OME's Image Data Resource can choose their own licence for sharing their data, although they are strongly encouraged to use the most open of the Creative Commons licences (CC-BY or CC0). When shared in this way, these resources open up new avenues for replication and verification studies, methods development, and exploratory work that leads to the generation of new hypotheses.

Data science for hypothesis generation

A hypothesis is essentially an "educated guess" by a researcher about what they think will happen when they do an experiment. A new hypothesis usually comes from theoretical models or from a desire to extend previously published experimental research. However, the traditional process of hypothesis generation is limited by the amount of knowledge an individual researcher can hold in their head and the number of papers they can read each year, and it is also susceptible to their personal biases (van Helden, 2013).

In contrast, machine learning techniques such as text mining of published abstracts or electronic health records (Oquendo et al., 2012), or exploratory meta-analyses of datasets pooled from laboratories around the world, can be used for automated, reproducible and transparent hypothesis generation. For example, teams at IBM Research have mined 100,000 academic papers to identify new protein kinases that interact with a protein tumour suppressor (Spangler, 2014) and predicted hospital re-admissions from the electronic health records of 5,000 patients (Xiao et al., 2018). However, the availability of datasets, ideally datasets that are FAIR, is a prerequisite for automated hypothesis generation (Hall and Pesenti, 2017).

The challenges of translating theory into practice

When we use data science techniques to design and analyse experiments, we need to ensure that the techniques we use are transparent and interpretable. And when we use robot scientists to design, perform and analyse experiments, we need to ensure that science continues to explore a broad range of scientific questions. Other challenges include avoiding positive feedback loops and algorithmic bias, equipping scientists with the skills they need to thrive in this new multidisciplinary environment, and ensuring that scientists in the global south are not left behind. We discuss all these points in more detail below.

Interpreting experimental outcomes

When data science is used to make decisions about how to perform an experiment, we need to ensure that scientists calibrate their level of trust appropriately. On one hand, biologists will need to relinquish some level of control and to trust the computer program to make important decisions for them. On the other hand, we must make sure that scientists do not trust the algorithms so much that they stop thinking critically about the outcomes of their experiments and end up misinterpreting their results. We also want to ensure that scientists remain creative and open to serendipitous discoveries.

We discussed above the importance of having a formal (machine-readable) language that can be used to describe both scientific ideas and experimental protocols to robots. However, it is equally important that the results and conclusions of these experiments are expressed in a human-understandable format. Ultimately, the goal of science is not just to be able to predict natural phenomena, but also to give humans a deeper insight into the mechanisms driving the observed processes. Some machine learning methods, such as deep learning, while excellent for predicting an outcome, suffer from a lack of interpretability (Angermueller et al., 2016). How to balance predictive accuracy and interpretability for the human reader is an open question in machine learning.

Positive feedback loops and algorithmic bias

As with all applications of data science to new disciplines, there are risks related to algorithmic bias (Hajian et al., 2016). Recently there have been some concerns over algorithmic bias related to face-recognition of criminals – the face-recognition software was more likely to report a false-positive of a black face than a white face due to biases in the dataset that the software was trained on (Snow, 2017; Buolamwini and Gebru, 2018). Societal parameters shape the input data that is fed into machine learning models, and if actions are taken on the basis of their output, these societal biases will only be amplified.

There are parallel issues with data science for experimental biology – for instance there are certain research questions that are popular within a community through accidents of history. Many people study model organisms such as roundworms and fruit flies because early genetics researchers studied them, and now there are more experimental tools that have been tried and tested on them – a positive feedback loop (Stoeger et al., 2018).

We need to be careful to ensure that any attempt to design experiments has the correct balance between exploring new research ideas and exploiting the existing data and experimental tools available in well-established sub-disciplines.
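One standard way of making this balance explicit is an epsilon-greedy rule: most of the time the best-supported option is chosen, but a fixed fraction of choices is reserved for options picked at random. The sketch below is an illustration of that rule only. It is not a method proposed at the workshop, and the topics and their "estimated value" scores are invented.

```python
import random


def choose_topic(estimated_value, epsilon, rng):
    """Explore a random topic with probability epsilon, otherwise exploit the best one."""
    topics = sorted(estimated_value)
    if rng.random() < epsilon:
        return rng.choice(topics)                            # explore
    return max(topics, key=lambda t: estimated_value[t])     # exploit


# Hypothetical scores reflecting how well supported each system currently is.
estimated_value = {"fruit fly": 0.9, "roundworm": 0.8, "understudied species": 0.1}

rng = random.Random(0)  # fixed seed so the simulation is repeatable
picks = [choose_topic(estimated_value, 0.2, rng) for _ in range(1000)]
share_explored = sum(p != "fruit fly" for p in picks) / len(picks)
print("Share of choices that went beyond the best-supported system:", share_explored)
```

With epsilon set to zero the rule always returns the best-supported system, which reproduces the positive feedback loop described above. Raising epsilon trades some short-term efficiency for a guaranteed share of effort on less well-studied questions.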

Implementation and training

According to Chris Mellingwood (University of Edinburgh), some biologists are amphibious and fluidly move between "wet" laboratories and "dry" computing (Mellingwood, 2017). However, many biologists do not know how to code or do not have the required mathematical background to be able to reframe their research questions as data science problems, so it may be difficult for biologists to find ways of using these new tools to design experiments in their own laboratories. They might not even realise that there is a tool available to help them resolve the experimental design problems that they face. Researchers may need specialised training in order to learn how to interact with data science tools in an efficient and effective way.

Reproducible data analysis alone requires an understanding of version control, at least one (if not several) programming languages, and techniques such as testing, containerisation and continuous integration. Machine learning and optimisation algorithms require detailed statistical knowledge along with the technical expertise – sometimes including high performance computing skills – to implement them. Requiring all these skills along with the robotics engineering expertise to build an automated lab is outside of the capacity of most researchers trained by the current system.

Infrastructure and accessibility

Even once a system is built, it needs to be constantly adapted as science progresses. There is a risk that by the time a platform is developed, it might be out of date. Sarah Abel (University of Iceland) discussed how university incentive systems do not always reward the types of activities that would be required for incorporating data science into a laboratory, such as interdisciplinary collaborations or maintenance of long-term infrastructure.

Furthermore, due to the burden of developing and maintaining the infrastructure needed for this new approach to science, some researchers may be left behind. Louise Bezuidenhout (University of Oxford) explained that even though one of the goals of "data science for experimental design" is to have open and "accessible" data available around the world, scientists in the global south might not have access to computing resources needed for this approach (Bezuidenhout et al., 2017). Therefore, we need to consider how the benefits of data science and laboratory automation techniques are felt around the world.

Augmentation, not automation

As we discuss the role of data science in the cycle of research, we need to be aware that these technologies should be used to augment, not replace, human researchers. These new tools will release researchers from the tasks that a machine can do well, giving them time and space to work on the tasks that only humans can do. Humans are able to think "out-of-the-box", while the behaviour of any algorithm will inherently be restricted by its code.

Perhaps the last person you might imagine supporting the integration of artificial and human intelligence is Garry Kasparov, the chess grandmaster. Kasparov lost to the IBM supercomputer Deep Blue in 1997, but more than 20 years later he is optimistic about the potential for machines to provide insights into how humans see the world (Kasparov, 2017). An example that is closer to the life sciences is the citizen science game BrainDr, in which participants quickly assess the quality of brain imaging scans by swiping left or right (Keshavan et al., 2018). Over time, there were enough ratings to train an algorithm to automatically assess the quality of the images. The tool saves researchers thousands of hours, permits the quality assessment of very large datasets, improves the reliability of the results, and is really fun!

So where does this leave the biologist of the future? Experimentalists can continue to do creative research by, for example, designing new protocols that enable them to study new phenomena and measure new variables. There are also many experimental protocols that are performed so rarely that it would be inefficient to automate them. However, humans will not need to carry out standard protocols, such as purifying DNA, but they might still need to know how to perform various specialised tasks, such as dissecting specimens. They will also need to constantly update the robotic platform to incorporate new experimental protocols.

Finally and most crucially, the biologist of the future will need to provide feedback into the cycle of research – providing insight into what hypotheses are interesting to the community, thinking deeply about how experimental results fit into the theories proposed by the broader community, and finding innovative connections across disciplinary boundaries. Essentially, they will be focused on the varied and interesting parts of science, rather than the mundane and repetitive parts.

Box 1.

What can you do now?

Planning an experiment

In the lab of the future, we envision that experimental parameters will be chosen in a theoretically sound way, rather than through ad hoc human decision making. There are already plenty of tools to help researchers plan their experiments, including tools for selecting optimal time points for conducting an experiment (Kleyman et al., 2017; Ezer and Keir, 2018), a collection of R packages that enable optimisation of experimental design (CRAN) and the acebayes package, which takes prior information about the system as input, and then designs experiments that are most likely to produce the best outputs (Overstall et al., 2017).

Performing an experiment

In the future, standard molecular biology experiments will be performed by robots, and executable experimental protocols will be published alongside each journal article to ensure reproducibility. Although many labs do not have access to laboratory automation, there are many associated techniques that will improve the reproducibility of research. For instance, systems like Protocols.io can help researchers to describe protocols in unambiguous ways that can be easily understood by other researchers. Sharing laboratory know-how, using tools such as OpenWetWare, will also enable a tighter feedback loop between performing and planning experiments.

Analysing experimental data

In a few years, we hope that many more data formats and pipelines will be standardised, reproducible and open-access. For researchers who are most comfortable in a wet-lab environment, the article "Five selfish reasons to work reproducibly" makes a strong case for learning how to code (Markowetz, 2015). Jupyter notebooks are an easy way to share data analyses with embedded text descriptions, executable code snippets, and figures. Workshops run by The Carpentries are a good way to learn software skills such as the Unix shell, version control with git, and programming in Python or R, as well as domain-specific techniques for data wrangling, analysis and visualisation.

Sharing your work

For anyone keen to know more about archiving their data, preprints, open access and the pre-registration of studies, we recommend the "Rainbow of Open Science Practices" (Kramer and Bosman, 2018) and Rachael Ainsworth's slides from our workshop (Ainsworth, 2018).

Generating new hypotheses

Pooling studies across laboratories and textual analysis of publications will help identify scientific questions worth studying. The beginning of a new cycle of research might start with an automated search of the literature for similar research (Extance, 2018) with tools such as Semantic Scholar from the Allen Institute for Artificial Intelligence. Alternatively, you could search for new data to investigate using Google's Dataset Search, or more specific resources from the European Bioinformatics Institute or National Institute of Mental Health Data Archive.

References

1. Abbott BP
2. Abbott R
3. Abbott TD
4. Abernathy MR
5. Acernese F
6. Ackley K
7. Adams C
8. Adams T
9. Addesso P
10. Adhikari RX
11. Adya VB
12. Affeldt C
13. Agathos M
14. Agatsuma K
15. Aggarwal N
16. Aguiar OD
17. Aiello L
18. Ain A
19. Ajith P
20. Allen B
21. Allocca A
22. Altin PA
23. Anderson SB
24. Anderson WG
25. Arai K
26. Arain MA
27. Araya MC
28. Arceneaux CC
29. Areeda JS
30. Arnaud N
31. Arun KG
32. Ascenzi S
33. Ashton G
34. Ast M
35. Aston SM
36. Astone P
37. Aufmuth P
38. Aulbert C
39. Babak S
40. Bacon P
41. Bader MK
42. Baker PT
43. Baldaccini F
44. Ballardin G
45. Ballmer SW
46. Barayoga JC
47. Barclay SE
48. Barish BC
49. Barker D
50. Barone F
51. Barr B
52. Barsotti L
53. Barsuglia M
54. Barta D
55. Bartlett J
56. Barton MA
57. Bartos I
58. Bassiri R
59. Basti A
60. Batch JC
61. Baune C
62. Bavigadda V
63. Bazzan M
64. Behnke B
65. Bejger M
66. Belczynski C
67. Bell AS
68. Bell CJ
69. Berger BK
70. Bergman J
71. Bergmann G
72. Berry CP
73. Bersanetti D
74. Bertolini A
75. Betzwieser J
76. Bhagwat S
77. Bhandare R
78. Bilenko IA
79. Billingsley G
80. Birch J
81. Birney R
82. Birnholtz O
83. Biscans S
84. Bisht A
85. Bitossi M
86. Biwer C
87. Bizouard MA
88. Blackburn JK
89. Blair CD
90. Blair DG
91. Blair RM
92. Bloemen S
93. Bock O
94. Bodiya TP
95. Boer M
96. Bogaert G
97. Bogan C
98. Bohe A
99. Bojtos P
100. Bond C
101. Bondu F
102. Bonnand R
103. Boom BA
104. Bork R
105. Boschi V
106. Bose S
107. Bouffanais Y
108. Bozzi A
109. Bradaschia C
110. Brady PR
111. Braginsky VB
112. Branchesi M
113. Brau JE
114. Briant T
115. Brillet A
116. Brinkmann M
117. Brisson V
118. Brockill P
119. Brooks AF
120. Brown DA
121. Brown DD
122. Brown NM
123. Buchanan CC
124. Buikema A
125. Bulik T
126. Bulten HJ
127. Buonanno A
128. Buskulic D
129. Buy C
130. Byer RL
131. Cabero M
132. Cadonati L
133. Cagnoli G
134. Cahillane C
135. Calderón Bustillo J
136. Callister T
137. Calloni E
138. Camp JB
139. Cannon KC
140. Cao J
141. Capano CD
142. Capocasa E
143. Carbognani F
144. Caride S
145. Casanueva Diaz J
146. Casentini C
147. Caudill S
148. Cavaglià M
149. Cavalier F
150. Cavalieri R
151. Cella G
152. Cepeda CB
153. Cerboni Baiardi L
154. Cerretani G
155. Cesarini E
156. Chakraborty R
157. Chalermsongsak T
158. Chamberlin SJ
159. Chan M
160. Chao S
161. Charlton P
162. Chassande-Mottin E
163. Chen HY
164. Chen Y
165. Cheng C
166. Chincarini A
167. Chiummo A
168. Cho HS
169. Cho M
170. Chow JH
171. Christensen N
172. Chu Q
173. Chua S
174. Chung S
175. Ciani G
176. Clara F
177. Clark JA
178. Cleva F
179. Coccia E
180. Cohadon PF
181. Colla A
182. Collette CG
183. Cominsky L
184. Constancio M
185. Conte A
186. Conti L
187. Cook D
188. Corbitt TR
189. Cornish N
190. Corsi A
191. Cortese S
192. Costa CA
193. Coughlin MW
194. Coughlin SB
195. Coulon JP
196. Countryman ST
197. Couvares P
198. Cowan EE
199. Coward DM
200. Cowart MJ
201. Coyne DC
202. Coyne R
203. Craig K
204. Creighton JD
205. Creighton TD
206. Cripe J
207. Crowder SG
208. Cruise AM
209. Cumming A
210. Cunningham L
211. Cuoco E
212. Dal Canton T
213. Danilishin SL
214. D'Antonio S
215. Danzmann K
216. Darman NS
217. Da Silva Costa CF
218. Dattilo V
219. Dave I
220. Daveloza HP
221. Davier M
222. Davies GS
223. Daw EJ
224. Day R
225. De S
226. DeBra D
227. Debreczeni G
228. Degallaix J
229. De Laurentis M
230. Deléglise S
231. Del Pozzo W
232. Denker T
233. Dent T
234. Dereli H
235. Dergachev V
236. DeRosa RT
237. De Rosa R
238. DeSalvo R
239. Dhurandhar S
240. Díaz MC
241. Di Fiore L
242. Di Giovanni M
243. Di Lieto A
244. Di Pace S
245. Di Palma I
246. Di Virgilio A
247. Dojcinoski G
248. Dolique V
249. Donovan F
250. Dooley KL
251. Doravari S
252. Douglas R
253. Downes TP
254. Drago M
255. Drever RW
256. Driggers JC
257. Du Z
258. Ducrot M
259. Dwyer SE
260. Edo TB
261. Edwards MC
262. Effler A
263. Eggenstein HB
264. Ehrens P
265. Eichholz J
266. Eikenberry SS
267. Engels W
268. Essick RC
269. Etzel T
270. Evans M
271. Evans TM
272. Everett R
273. Factourovich M
274. Fafone V
275. Fair H
276. Fairhurst S
277. Fan X
278. Fang Q
279. Farinon S
280. Farr B
281. Farr WM
282. Favata M
283. Fays M
284. Fehrmann H
285. Fejer MM
286. Feldbaum D
287. Ferrante I
288. Ferreira EC
289. Ferrini F
290. Fidecaro F
291. Finn LS
292. Fiori I
293. Fiorucci D
294. Fisher RP
295. Flaminio R
296. Fletcher M
297. Fong H
298. Fournier JD
299. Franco S
300. Frasca S
301. Frasconi F
302. Frede M
303. Frei Z
304. Freise A
305. Frey R
306. Frey V
307. Fricke TT
308. Fritschel P
309. Frolov VV
310. Fulda P
311. Fyffe M
312. Gabbard HA
313. Gair JR
314. Gammaitoni L
315. Gaonkar SG
316. Garufi F
317. Gatto A
318. Gaur G
319. Gehrels N
320. Gemme G
321. Gendre B
322. Genin E
323. Gennai A
324. George J
325. Gergely L
326. Germain V
327. Ghosh A
328. Ghosh A
329. Ghosh S
330. Giaime JA
331. Giardina KD
332. Giazotto A
333. Gill K
334. Glaefke A
335. Gleason JR
336. Goetz E
337. Goetz R
338. Gondan L
339. González G
340. Gonzalez Castro JM
341. Gopakumar A
342. Gordon NA
343. Gorodetsky ML
344. Gossan SE
345. Gosselin M
346. Gouaty R
347. Graef C
348. Graff PB
349. Granata M
350. Grant A
351. Gras S
352. Gray C
353. Greco G
354. Green AC
355. Greenhalgh RJ
356. Groot P
357. Grote H
358. Grunewald S
359. Guidi GM
360. Guo X
361. Gupta A
362. Gupta MK
363. Gushwa KE
364. Gustafson EK
365. Gustafson R
366. Hacker JJ
367. Hall BR
368. Hall ED
369. Hammond G
370. Haney M
371. Hanke MM
372. Hanks J
373. Hanna C
374. Hannam MD
375. Hanson J
376. Hardwick T
377. Harms J
378. Harry GM
379. Harry IW
380. Hart MJ
381. Hartman MT
382. Haster CJ
383. Haughian K
384. Healy J
385. Heefner J
386. Heidmann A
387. Heintze MC
388. Heinzel G
389. Heitmann H
390. Hello P
391. Hemming G
392. Hendry M
393. Heng IS
394. Hennig J
395. Heptonstall AW
396. Heurs M
397. Hild S
398. Hoak D
399. Hodge KA
400. Hofman D
401. Hollitt SE
402. Holt K
403. Holz DE
404. Hopkins P
405. Hosken DJ
406. Hough J
407. Houston EA
408. Howell EJ
409. Hu YM
410. Huang S
411. Huerta EA
412. Huet D
413. Hughey B
414. Husa S
415. Huttner SH
416. Huynh-Dinh T
417. Idrisy A
418. Indik N
419. Ingram DR
420. Inta R
421. Isa HN
422. Isac JM
423. Isi M
424. Islas G
425. Isogai T
426. Iyer BR
427. Izumi K
428. Jacobson MB
429. Jacqmin T
430. Jang H
431. Jani K
432. Jaranowski P
433. Jawahar S
434. Jiménez-Forteza F
435. Johnson WW
436. Johnson-McDaniel NK
437. Jones DI
438. Jones R
439. Jonker RJ
440. Ju L
441. Haris K
442. Kalaghatgi CV
443. Kalogera V
444. Kandhasamy S
445. Kang G
446. Kanner JB
447. Karki S
448. Kasprzack M
449. Katsavounidis E
450. Katzman W
451. Kaufer S
452. Kaur T
453. Kawabe K
454. Kawazoe F
455. Kéfélian F
456. Kehl MS
457. Keitel D
458. Kelley DB
459. Kells W
460. Kennedy R
461. Keppel DG
462. Key JS
463. Khalaidovski A
464. Khalili FY
465. Khan I
466. Khan S
467. Khan Z
468. Khazanov EA
469. Kijbunchoo N
470. Kim C
471. Kim J
472. Kim K
473. Kim NG
474. Kim N
475. Kim YM
476. King EJ
477. King PJ
478. Kinzel DL
479. Kissel JS
480. Kleybolte L
481. Klimenko S
482. Koehlenbeck SM
483. Kokeyama K
484. Koley S
485. Kondrashov V
486. Kontos A
487. Koranda S
488. Korobko M
489. Korth WZ
490. Kowalska I
491. Kozak DB
492. Kringel V
493. Krishnan B
494. Królak A
495. Krueger C
496. Kuehn G
497. Kumar P
498. Kumar R
499. Kuo L
500. Kutynia A
501. Kwee P
502. Lackey BD
503. Landry M
504. Lange J
505. Lantz B
506. Lasky PD
507. Lazzarini A
508. Lazzaro C
509. Leaci P
510. Leavey S
511. Lebigot EO
512. Lee CH
513. Lee HK
514. Lee HM
515. Lee K
516. Lenon A
517. Leonardi M
518. Leong JR
519. Leroy N
520. Letendre N
521. Levin Y
522. Levine BM
523. Li TG
524. Libson A
525. Littenberg TB
526. Lockerbie NA
527. Logue J
528. Lombardi AL
529. London LT
530. Lord JE
531. Lorenzini M
532. Loriette V
533. Lormand M
534. Losurdo G
535. Lough JD
536. Lousto CO
537. Lovelace G
538. Lück H
539. Lundgren AP
540. Luo J
541. Lynch R
542. Ma Y
543. MacDonald T
544. Machenschalk B
545. MacInnis M
546. Macleod DM
547. Magaña-Sandoval F
548. Magee RM
549. Mageswaran M
550. Majorana E
551. Maksimovic I
552. Malvezzi V
553. Man N
554. Mandel I
555. Mandic V
556. Mangano V
557. Mansell GL
558. Manske M
559. Mantovani M
560. Marchesoni F
561. Marion F
562. Márka S
563. Márka Z
564. Markosyan AS
565. Maros E
566. Martelli F
567. Martellini L
568. Martin IW
569. Martin RM
570. Martynov DV
571. Marx JN
572. Mason K
573. Masserot A
574. Massinger TJ
575. Masso-Reid M
576. Matichard F
577. Matone L
578. Mavalvala N
579. Mazumder N
580. Mazzolo G
581. McCarthy R
582. McClelland DE
583. McCormick S
584. McGuire SC
585. McIntyre G
586. McIver J
587. McManus DJ
588. McWilliams ST
589. Meacher D
590. Meadors GD
591. Meidam J
592. Melatos A
593. Mendell G
594. Mendoza-Gandara D
595. Mercer RA
596. Merilh E
597. Merzougui M
598. Meshkov S
599. Messenger C
600. Messick C
601. Meyers PM
602. Mezzani F
603. Miao H
604. Michel C
605. Middleton H
606. Mikhailov EE
607. Milano L
608. Miller J
609. Millhouse M
610. Minenkov Y
611. Ming J
612. Mirshekari S
613. Mishra C
614. Mitra S
615. Mitrofanov VP
616. Mitselmakher G
617. Mittleman R
618. Moggi A
619. Mohan M
620. Mohapatra SR
621. Montani M
622. Moore BC
623. Moore CJ
624. Moraru D
625. Moreno G
626. Morriss SR
627. Mossavi K
628. Mours B
629. Mow-Lowry CM
630. Mueller CL
631. Mueller G
632. Muir AW
633. Mukherjee A
634. Mukherjee D
635. Mukherjee S
636. Mukund N
637. Mullavey A
638. Munch J
639. Murphy DJ
640. Murray PG
641. Mytidis A
642. Nardecchia I
643. Naticchioni L
644. Nayak RK
645. Necula V
646. Nedkova K
647. Nelemans G
648. Neri M
649. Neunzert A
650. Newton G
651. Nguyen TT
652. Nielsen AB
653. Nissanke S
654. Nitz A
655. Nocera F
656. Nolting D
657. Normandin ME
658. Nuttall LK
659. Oberling J
660. Ochsner E
661. O'Dell J
662. Oelker E
663. Ogin GH
664. Oh JJ
665. Oh SH
666. Ohme F
667. Oliver M
668. Oppermann P
669. Oram RJ
670. O'Reilly B
671. O'Shaughnessy R
672. Ott CD
673. Ottaway DJ
674. Ottens RS
675. Overmier H
676. Owen BJ
677. Pai A
678. Pai SA
679. Palamos JR
680. Palashov O
681. Palomba C
682. Pal-Singh A
683. Pan H
684. Pan Y
685. Pankow C
686. Pannarale F
687. Pant BC
688. Paoletti F
689. Paoli A
690. Papa MA
691. Paris HR
692. Parker W
693. Pascucci D
694. Pasqualetti A
695. Passaquieti R
696. Passuello D
697. Patricelli B
698. Patrick Z
699. Pearlstone BL
700. Pedraza M
701. Pedurand R
702. Pekowsky L
703. Pele A
704. Penn S
705. Perreca A
706. Pfeiffer HP
707. Phelps M
708. Piccinni O
709. Pichot M
710. Pickenpack M
711. Piergiovanni F
712. Pierro V
713. Pillant G
714. Pinard L
715. Pinto IM
716. Pitkin M
717. Poeld JH
718. Poggiani R
719. Popolizio P
720. Post A
721. Powell J
722. Prasad J
723. Predoi V
724. Premachandra SS
725. Prestegard T
726. Price LR
727. Prijatelj M
728. Principe M
729. Privitera S
730. Prix R
731. Prodi GA
732. Prokhorov L
733. Puncken O
734. Punturo M
735. Puppo P
736. Pürrer M
737. Qi H
738. Qin J
739. Quetschke V
740. Quintero EA
741. Quitzow-James R
742. Raab FJ
743. Rabeling DS
744. Radkins H
745. Raffai P
746. Raja S
747. Rakhmanov M
748. Ramet CR
749. Rapagnani P
750. Raymond V
751. Razzano M
752. Re V
753. Read J
754. Reed CM
755. Regimbau T
756. Rei L
757. Reid S
758. Reitze DH
759. Rew H
760. Reyes SD
761. Ricci F
762. Riles K
763. Robertson NA
764. Robie R
765. Robinet F
766. Rocchi A
767. Rolland L
768. Rollins JG
769. Roma VJ
770. Romano JD
771. Romano R
772. Romanov G
773. Romie JH
774. Rosińska D
775. Rowan S
776. Rüdiger A
777. Ruggi P
778. Ryan K
779. Sachdev S
780. Sadecki T
781. Sadeghian L
782. Salconi L
783. Saleem M
784. Salemi F
785. Samajdar A
786. Sammut L
787. Sampson LM
788. Sanchez EJ
789. Sandberg V
790. Sandeen B
791. Sanders GH
792. Sanders JR
793. Sassolas B
794. Sathyaprakash BS
795. Saulson PR
796. Sauter O
797. Savage RL
798. Sawadsky A
799. Schale P
800. Schilling R
801. Schmidt J
802. Schmidt P
803. Schnabel R
804. Schofield RM
805. Schönbeck A
806. Schreiber E
807. Schuette D
808. Schutz BF
809. Scott J
810. Scott SM
811. Sellers D
812. Sengupta AS
813. Sentenac D
814. Sequino V
815. Sergeev A
816. Serna G
817. Setyawati Y
818. Sevigny A
819. Shaddock DA
820. Shaffer T
821. Shah S
822. Shahriar MS
823. Shaltev M
824. Shao Z
825. Shapiro B
826. Shawhan P
827. Sheperd A
828. Shoemaker DH
829. Shoemaker DM
830. Siellez K
831. Siemens X
832. Sigg D
833. Silva AD
834. Simakov D
835. Singer A
836. Singer LP
837. Singh A
838. Singh R
839. Singhal A
840. Sintes AM
841. Slagmolen BJ
842. Smith JR
843. Smith MR
844. Smith ND
845. Smith RJ
846. Son EJ
847. Sorazu B
848. Sorrentino F
849. Souradeep T
850. Srivastava AK
851. Staley A
852. Steinke M
853. Steinlechner J
854. Steinlechner S
855. Steinmeyer D
856. Stephens BC
857. Stevenson SP
858. Stone R
859. Strain KA
860. Straniero N
861. Stratta G
862. Strauss NA
863. Strigin S
864. Sturani R
865. Stuver AL
866. Summerscales TZ
867. Sun L
868. Sutton PJ
869. Swinkels BL
870. Szczepańczyk MJ
871. Tacca M
872. Talukder D
873. Tanner DB
874. Tápai M
875. Tarabrin SP
876. Taracchini A
877. Taylor R
878. Theeg T
879. Thirugnanasambandam MP
880. Thomas EG
881. Thomas M
882. Thomas P
883. Thorne KA
884. Thorne KS
885. Thrane E
886. Tiwari S
887. Tiwari V
888. Tokmakov KV
889. Tomlinson C
890. Tonelli M
891. Torres CV
892. Torrie CI
893. Töyrä D
894. Travasso F
895. Traylor G
896. Trifirò D
897. Tringali MC
898. Trozzo L
899. Tse M
900. Turconi M
901. Tuyenbayev D
902. Ugolini D
903. Unnikrishnan CS
904. Urban AL
905. Usman SA
906. Vahlbruch H
907. Vajente G
908. Valdes G
909. Vallisneri M
910. van Bakel N
911. van Beuzekom M
912. van den Brand JF
913. Van Den Broeck C
914. Vander-Hyde DC
915. van der Schaaf L
916. van Heijningen JV
917. van Veggel AA
918. Vardaro M
919. Vass S
920. Vasúth M
921. Vaulin R
922. Vecchio A
923. Vedovato G
924. Veitch J
925. Veitch PJ
926. Venkateswara K
927. Verkindt D
928. Vetrano F
929. Viceré A
930. Vinciguerra S
931. Vine DJ
932. Vinet JY
933. Vitale S
934. Vo T
935. Vocca H
936. Vorvick C
937. Voss D
938. Vousden WD
939. Vyatchanin SP
940. Wade AR
941. Wade LE
942. Wade M
943. Waldman SJ
944. Walker M
945. Wallace L
946. Walsh S
947. Wang G
948. Wang H
949. Wang M
950. Wang X
951. Wang Y
952. Ward H
953. Ward RL
954. Warner J
955. Was M
956. Weaver B
957. Wei LW
958. Weinert M
959. Weinstein AJ
960. Weiss R
961. Welborn T
962. Wen L
963. Weßels P
964. Westphal T
965. Wette K
966. Whelan JT
967. Whitcomb SE
968. White DJ
969. Whiting BF
970. Wiesner K
971. Wilkinson C
972. Willems PA
973. Williams L
974. Williams RD
975. Williamson AR
976. Willis JL
977. Willke B
978. Wimmer MH
979. Winkelmann L
980. Winkler W
981. Wipf CC
982. Wiseman AG
983. Wittel H
984. Woan G
985. Worden J
986. Wright JL
987. Wu G
988. Yablon J
989. Yakushin I
990. Yam W
991. Yamamoto H
992. Yancey CC
993. Yap MJ
994. Yu H
995. Yvert M
996. Zadrożny A
997. Zangrando L
998. Zanolin M
999. Zendri JP
1000. Zevin M
1001. Zhang F
1002. Zhang L
1003. Zhang M
1004. Zhang Y
1005. Zhao C
1006. Zhou M
1007. Zhou Z
1008. Zhu XJ
1009. Zucker ME
1010. Zuraw SE
1011. Zweizig J
1012. LIGO Scientific Collaboration and Virgo Collaboration
(2016) Observation of gravitational waves from a binary black hole merger
Physical Review Letters 116:061102.

https://doi.org/10.1103/PhysRevLett.116.061102
Conference
1. Ainsworth R
(2018) Reproducibility and open science
Data Science for Experimental Design (DSED).

https://doi.org/10.5281/zenodo.1464853
1. Aitken S
2. Akman OE
(2013) Nested sampling for parameter inference in systems biology: application to an exemplar circadian model
BMC Systems Biology 7:72.

https://doi.org/10.1186/1752-0509-7-72
(2016) Deep learning for computational biology
Molecular Systems Biology 12:878.

https://doi.org/10.15252/msb.20156651
Website
1. Beaulieu-Jones B
2. Greene C
(2017) Reproducibility: automated
Accessed February 26, 2019.

https://elifesciences.org/labs/e623676c/reproducibility-automated
(2017) ‘$100 Is Not Much To You’: Open Science and neglected accessibilities for scientific research in Africa
Critical Public Health 27:39–49.

https://doi.org/10.1080/09581596.2016.1252832
Website
1. Buolamwini J
2. Gebru T
(2018) Gender shades: intersectional accuracy disparities in commercial gender classification (PMLR 81:77-91)
Accessed February 26, 2019.

http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
1. Burel JM
2. Besson S
3. Blackburn C
4. Carroll M
5. Ferguson RK
6. Flynn H
7. Gillen K
8. Leigh R
9. Li S
10. Lindner D
11. Linkert M
12. Moore WJ
13. Ramalingam B
14. Rozbicki E
15. Tarkowska A
16. Walczysko P
17. Allan C
18. Moore J
19. Swedlow JR
(2015) Publishing and sharing multi-dimensional image data with OMERO
Mammalian Genome 26:441–447.

https://doi.org/10.1007/s00335-015-9587-6
1. Dette H
2. Biedermann S
(2003) Robust and efficient designs for the Michaelis–Menten model
Journal of the American Statistical Association 98:679–686.

https://doi.org/10.1198/016214503000000585
Conference
1. Doherty K
(2017) Optimisation and landscape analysis of computational biology models: a case study
In: Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '17). pp. 1644–1651.

https://doi.org/10.1145/3067695.3084609
1. Extance A
(2018) How AI technology can tame the scientific literature
Nature 561:273–274.

https://doi.org/10.1038/d41586-018-06617-5
Preprint
1. Ezer D
2. Keir JC
(2018) Selection of time points for costly experiments: a comparison between human intuition and computer-aided experimental design
bioRxiv.

https://doi.org/10.1101/301796
(2012) Virtual nanoscopy: generation of ultra-large high resolution electron microscopy maps
Journal of Cell Biology 198:457–469.

https://doi.org/10.1083/jcb.201201140
Website
1. Fell T
2. Ward S
3. Gershater M
4. Watson M
5. Crane P
6. Wiederhold R
(2018) Computer-Aided biology
Accessed February 26, 2019.

https://static1.squarespace.com/static/5af46322620b851d41f3f64f/t/5bb1d987e5e5f08a8c7fb24a/1538383791006/Computer_Aided_Biology_Synthace_10_18.pdf
Conference
(2016) Algorithmic bias: from discrimination discovery to Fairness-Aware data mining part 1 & 2
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

https://doi.org/10.1145/2939672.2945386
Website
1. Hall W
2. Pesenti J
(2017) Growing the artificial intelligence industry in the UK
Accessed February 26, 2019.

https://www.gov.uk/government/publications/growing-the-artificial-intelligence-industry-in-the-uk
(2014) Publication and other reporting biases in cognitive sciences: detection, prevalence, and prevention
Trends in Cognitive Sciences 18:235–241.

https://doi.org/10.1016/j.tics.2014.02.010
Book
1. Kasparov G
(2017)
Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins

London: John Murray.
Preprint
(2018) Combining citizen science and deep learning to amplify expertise in neuroimaging
bioRxiv.

https://doi.org/10.1101/363382
1. King RD
2. Rowland J
3. Aubrey W
4. Liakata M
5. Markham M
6. Soldatova LN
7. Whelan KE
8. Clare A
9. Young M
10. Sparkes A
11. Oliver SG
12. Pir P
(2009) The robot scientist Adam
Computer 42:46–54.

https://doi.org/10.1109/MC.2009.270
(2018) Automating sciences: philosophical and social dimensions
IEEE Technology and Society Magazine 37:40–46.

https://doi.org/10.1109/MTS.2018.2795097
1. Kleyman M
2. Sefer E
3. Nicola T
4. Espinoza C
5. Chhabra D
6. Hagood JS
7. Kaminski N
8. Ambalavanan N
9. Bar-Joseph Z
(2017) Selecting the most appropriate time points to profile in high-throughput studies
eLife 6:e18541.

https://doi.org/10.7554/eLife.18541
1. Kramer B
2. Bosman J
(2018) Rainbow of open science practices
Zenodo.

https://doi.org/10.5281/zenodo.1147025
Book
(2012)
ImageNet Classification with Deep Convolutional Neural Networks

In: Weinberger K. Q, Pereira F, Burges C. J. C, Bottou L, editors. Advances in Neural Information Processing Systems, 25. Curran Associates, Inc. pp. 1097–1105.
1. Libbrecht MW
2. Noble WS
(2015) Machine learning applications in genetics and genomics
Nature Reviews Genetics 16:321–332.

https://doi.org/10.1038/nrg3920
1. Linkert M
2. Rueden CT
3. Allan C
4. Burel JM
5. Moore W
6. Patterson A
7. Loranger B
8. Moore J
9. Neves C
10. Macdonald D
11. Tarkowska A
12. Sticco C
13. Hill E
14. Rossner M
15. Eliceiri KW
16. Swedlow JR
(2010) Metadata matters: access to image data in the real world
Journal of Cell Biology 189:777–782.

https://doi.org/10.1083/jcb.201004104
1. Markowetz F
(2015) Five selfish reasons to work reproducibly
Genome Biology 16:274.

https://doi.org/10.1186/s13059-015-0850-7
Website
1. Mellingwood C
(2017) What about the frogs?: reflections on 'Community and Identity in the Techno-Sciences' workshop
Accessed February 26, 2019.

https://blogs.sps.ed.ac.uk/engineering-life/2017/03/30/what-about-the-frogs-reflections-on-community-and-identity-in-the-techno-sciences-workshop/
(2018) A deep-learning classifier identifies patients with clinical heart failure using whole-slide images of H&E tissue
PLOS ONE 13:e0192726.

https://doi.org/10.1371/journal.pone.0192726
(2012) Machine learning and data mining: strategies for hypothesis generation
Molecular Psychiatry 17:956–959.

https://doi.org/10.1038/mp.2011.173
Preprint
(2017) Acebayes: an R package for bayesian optimal design of experiments via approximate coordinate exchange
arXiv.

https://arxiv.org/abs/1705.08096
Website
(2018) Bayesian prediction for physical models with application to the optimization of the synthesis of pharmaceutical products using chemical kinetics
Computational Statistics & Data Analysis.
Accessed February 26, 2019.

https://eprints.soton.ac.uk/425529/
1. Peng RD
(2011) Reproducible research in computational science
Science 334:1226–1227.

https://doi.org/10.1126/science.1213847
Website
1. Snow J
(2017) Amazon's face recognition falsely matched 28 members of congress with mugshots
Accessed February 26, 2019.

https://www.aclu.org/blog/privacy-technology/surveillance-technologies/amazons-face-recognition-falsely-matched-28
Conference
1. Spangler S
(2014) Automated hypothesis generation based on mining scientific literature
In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

https://doi.org/10.1145/2623330.2623667
1. Sparkes A
2. Aubrey W
3. Byrne E
4. Clare A
5. Khan MN
6. Liakata M
7. Markham M
8. Rowland J
9. Soldatova LN
10. Whelan KE
11. Young M
12. King RD
(2010) Towards robot scientists for autonomous scientific discovery
Automated Experimentation 2:1.

https://doi.org/10.1186/1759-4499-2-1
(2018) Large-scale investigation of the reasons why potentially important genes are ignored
PLOS Biology 16:e2006643.

https://doi.org/10.1371/journal.pbio.2006643
1. Sverchkov Y
2. Craven M
(2017) A review of active learning approaches to experimental design for uncovering biological networks
PLOS Computational Biology 13:e1005466.

https://doi.org/10.1371/journal.pcbi.1005466
(2018) Simultaneous confidence sets for several effective doses
Biometrical Journal 60:703–720.

https://doi.org/10.1002/bimj.201700161
1. van Helden P
(2013) Data-driven hypotheses
EMBO Reports 14:104.

https://doi.org/10.1038/embor.2012.207
1. Wilkinson MD
2. Dumontier M
3. Aalbersberg IJ
4. Appleton G
5. Axton M
6. Baak A
7. Blomberg N
8. Boiten JW
9. da Silva Santos LB
10. Bourne PE
11. Bouwman J
12. Brookes AJ
13. Clark T
14. Crosas M
15. Dillo I
16. Dumon O
17. Edmunds S
18. Evelo CT
19. Finkers R
20. Gonzalez-Beltran A
21. Gray AJ
22. Groth P
23. Goble C
24. Grethe JS
25. Heringa J
26. 't Hoen PA
27. Hooft R
28. Kuhn T
29. Kok R
30. Kok J
31. Lusher SJ
32. Martone ME
33. Mons A
34. Packer AL
35. Persson B
36. Rocca-Serra P
37. Roos M
38. van Schaik R
39. Sansone SA
40. Schultes E
41. Sengstag T
42. Slater T
43. Strawn G
44. Swertz MA
45. Thompson M
46. van der Lei J
47. van Mulligen E
48. Velterop J
49. Waagmeester A
50. Wittenburg P
51. Wolstencroft K
52. Zhao J
53. Mons B
(2016) The FAIR guiding principles for scientific data management and stewardship
Scientific Data 3:160018.

https://doi.org/10.1038/sdata.2016.18
1. Williams K
2. Bilsland E
3. Sparkes A
4. Aubrey W
5. Young M
6. Soldatova LN
7. De Grave K
8. Ramon J
9. de Clare M
10. Sirawaraporn W
11. Oliver SG
12. King RD
(2015) Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases
Journal of the Royal Society Interface 12:20141289.

https://doi.org/10.1098/rsif.2014.1289
1. Xiao C
2. Ma T
3. Dieng AB
4. Blei DM
5. Wang F
(2018) Readmission prediction via deep contextual embedding of clinical concepts
PLOS ONE 13:e0195024.

https://doi.org/10.1371/journal.pone.0195024

Article and author information

Author details

Daphne Ezer

Daphne Ezer is at the Alan Turing Institute, London, and in the Department of Statistics, University of Warwick, Coventry, United Kingdom

Contribution
Conceptualization, Writing—original draft, Writing—review and editing

Contributed equally with
Kirstie Whitaker

For correspondence
dezer@turing.ac.uk

Competing interests
No competing interests declared

Additional information
Corresponding author

ORCID iD: 0000-0002-1685-6909

Kirstie Whitaker

Kirstie Whitaker is at the Alan Turing Institute, London, and in the Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

Contribution
Conceptualization, Writing—original draft, Writing—review and editing

Contributed equally with
Daphne Ezer

Competing interests
No competing interests declared

ORCID iD: 0000-0001-8498-4059

Funding

Engineering and Physical Sciences Research Council (EP/S001360/1)

Daphne Ezer

Alan Turing Institute (TU/A/000017)

Daphne Ezer
Kirstie Whitaker

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank the speakers and attendees at the Data Science for Experimental Design Workshop, and Anneca York and the Events Team at the Alan Turing Institute.

Publication history

Received: November 28, 2018
Accepted: February 27, 2019
Version of Record published: March 6, 2019

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.