Integrating Large Data into Plant Science

21–22 April 2016
Dartington Hall, Totnes, Devon


Sabina Leonelli, Ruth Bastow, Geraint Parry, David Salt

Aims of the workshop

This workshop brought together prominent biologists, data scientists, database leads, publishers, representatives of learned societies and funders to discuss ways of harnessing and integrating large plant data to foster discovery.

Over the last decade, data infrastructures such as clouds, grids and repositories have garnered attention and funding as crucial tools to facilitate the re-use of existing datasets. Enabling such re-use is a complex task, and within plant science a variety of strategies have been developed to collect, combine and mine research data for new purposes.

This workshop aimed to review these strategies, identify examples of best practice and successful re-use both within and beyond plant science, and discuss the technical and institutional conditions for effective data mining.

In particular, workshop participants

  1. assessed how effective the mining of existing large datasets and their re-use by others has been in advancing plant biology,
  2. identified current bottlenecks and barriers in the data dissemination, mining and re-use pipelines,
  3. highlighted areas of plant science that are falling behind in the big data era,
  4. explored how data infrastructures can effectively harness community knowledge,
  5. evaluated business models and incentives for data users to donate resources, acknowledge databases and provide feedback that can be widely shared and add value to the resource.

The workshop was jointly organized by the Exeter Centre for the Study of Life Sciences (Egenis) and GARNet, with funding from BBSRC and the European Research Council. It was a follow-up to the 2012 GARNet/Egenis data sharing workshop, which explored how successful data sharing projects work, how data are being integrated and how the process should be improved, and which resulted in a report available here.

Participation in this workshop was free, but numbers were limited. Prospective participants were asked to register as soon as possible by emailing a brief statement of their motivation for attending.

Meeting Schedule

The workshop booklet with schedule and abstracts is available here as a PDF. All presentations are 20min talks plus 5min discussion.

Thursday April 21st

12:00-12:45 Lunch

12:45-1:15 Introduction: Ruth Bastow & Sabina Leonelli.

1:15-3:00 Session 1: Cases of Data Re-Use. Chair and introduction: David Salt.

  • Angela Hancock (MFPL, University of Vienna): Data integration for evolutionary analysis.
  • Gordon Simpson (University of Dundee): What do genomes really encode? Analysing Arabidopsis transcriptomes and epitranscriptomes.
  • George Bassel (University of Birmingham): 3D digital single cell analysis.
  • Dan Bebber (University of Exeter): Big data and the global food security debate.

3:00-3:30 Coffee break

3:30-5:35 Session 2: Data Infrastructures to Foster Re-Use. Chair and introduction: David Studholme. [20min talks, 5min questions]

  • Carole Goble (University of Manchester): FAIRDOM — FAIR asset management and sharing experiences in systems biology.
  • David Salt (iHub, University of Aberdeen): iHub — an information and collaborative management platform for ionomic research.
  • Nick Provart (BAR, University of Toronto): Raising the BAR for hypothesis generation in plant biology using large data sets.
  • Tomasz Zielinski (BioDare, University of Edinburgh): Tipping the balance — introducing data management on a centre-wide level.
  • David Johnson (OERC, University of Oxford): Data Infrastructures to foster data re-use.

5:45-6:45 Panel Discussion 1: Challenges of Re-Use. Chair: Geraint Parry. Panelists:

  • Angela Hancock
  • George Bassel
  • Carole Goble
  • David Salt
  • Nick Provart

7:30 Conference dinner

Friday April 22nd

9:00-11:05 Session 3: Integrating Community Knowledge. Chair and Introduction: George Littlejohn.

  • Eva Huala (TAIR/Phoenix Bioinformatics): Integration of community data to produce high quality foundational datasets.
  • Matthew Vaughn (Texas Advanced Computing Centre, Araport): Arabidopsis Information Portal (AIP): A Community-Extensible Platform for Open, Reusable Data and Visualisation.
  • Elizabeth Arnaud (CropOntology): CropOntology — traits and variables harmonizing field data for meta-analysis.
  • Georgios Gkoutos (University of Birmingham): Exploring the phenome.
  • Dan MacLean (The Sainsbury Lab, Norwich): Crowdsourcing from scientists — how much bang for very little buck?

11:05-11:30 Coffee break

11:30-12:30 Panel Discussion 2: Strategies for Community Involvement. Chair: Ruth Bastow. Panelists:

  • Eva Huala
  • Matthew Vaughn
  • Georgios Gkoutos
  • Dan MacLean

12:30-13:30 Lunch

13:30-15:00 Panel Discussion 3: Business Models for Data Infrastructures. Chair: Sabina Leonelli. Panelists:

  • Geoffrey Boulton OBE (CODATA and Royal Society)
  • Rowan McKibbin/Michael Ball (BBSRC)
  • Derek Scuffel (Syngenta)
  • Chris Surridge (Nature Plants)
  • Rebecca Cunning (Journal of Experimental Botany)

15:00-15:30 Concluding session and steps forward.

Workshop presentations

The following presentations are available: the organizers' introduction, and the talks by Eva Huala, Matt Vaughn, Nick Provart and David Salt.

Workshop summary in tweets

You can see what was tweeted during the workshop below (or here for a stand-alone version, until 16 May 2018, when Storify shuts down).