Towards Responsible Plant Data Linkage: Global Challenges for Food Security and Governance (March 2021)

An Alan Turing Institute & University of Exeter Workshop

Date and time: March 5, 12, 19 & 26, 14:00–16:00 BST.
Location: Online (Zoom) due to the Covid-19 pandemic
Organisers: Hugh Williamson, Sabina Leonelli

Please find the programme and videos/slides of each presentation below! (click on presentation titles)

Funding kindly provided by the Alan Turing Institute (project From Local Fields to Global Indicators) and the University of Exeter. Administrative assistance from Egenis and the Institute for Data Science and AI.

Workshop brief

The development of reliable infrastructures for managing and linking plant data has become critical to international efforts to ensure global food security. Understanding and addressing the complex environmental and socioeconomic challenges of the twenty-first century, including the impact of climate change on agriculture and the persistent structures of poverty identified in the UN’s Sustainable Development Goals, requires integrating data of multiple types and from diverse sources and domains, ranging from basic plant science through crop field trials, socioeconomic studies and climate modelling. Much progress has been made on the development of tools and specifications for data sharing, standardisation and analysis, but the diversity, multiplicity and rapid development of these resources is creating technical challenges in linking them effectively and consistently. Alongside these technical challenges, data linkage poses political and social challenges that need to be addressed in order to ensure that data-centric solutions to food security are equitable, responsible and accountable, qualities that will be essential to their long-term resilience.

This workshop will examine the contemporary contours of such challenges through sustained engagement with current and historical initiatives and discussion of best practices and prospective future directions for ensuring responsible data linkage. To this end, it will bring together representatives of key global initiatives for plant data with scholars in the history, philosophy and social studies of plant and agricultural science, thus combining technical expertise in data governance with an in-depth understanding of local situations of data use as well as their historical, social and scientific contexts and implications. This exchange of perspectives will provide a novel platform for addressing the technological and social implications of data linkage for food security together and in detail.

The workshop will be divided into four sessions, each with a distinct theme, held over the course of four weeks. Following the workshop, papers and commentaries will be developed and assembled into an edited collection in Open Access format, which will be published in summer 2020.

Programme

Introduction & Session 1: Experiences from the Trenches (March 5, 14:00-16:00 BST)

How is data managed in practice? To start the workshop, this session will discuss case studies of plant data use and linkage in the context of particular research projects and breeding programs, drawn from contemporary experience as well as historical research. Consideration of these cases will ground the thematic discussion of the following sessions, and provide an opportunity to reflect on the practical dimensions of the various challenges of data linkage and their solutions. The session will begin with a general introduction by the organisers to the goals and format of the online workshop.

14:00

Introduction by organisers

14:10

Between Subsistence and Agronomy: Carl Linnaeus (1707-1778) on Famine Foods
Staffan Müller-Wille (University of Cambridge)

Having witnessed a catastrophic famine in his native province of Småland in 1725, the Swedish naturalist Carl Linnaeus (1707–1778) always had food, and especially famine foods – or what to eat when nothing is left to eat – on his mind. In my contribution, I will focus on information Linnaeus collected during his Laplandic Journey (1732) about food sources used by settler and reindeer-herding communities in the very north of Sweden. Following the trajectories of this information in Linnaeus's later works, I will show that he possessed a keen eye for the way in which sustainable subsistence practices problematize accepted definitions of food and blur dividing lines between the wild and the cultivated, foraging and agriculture, and poverty and wealth. At the same time, Linnaeus propagated the idea of the North as a barren wilderness that needed to be “cultivated,” resulting in the displacement of extant livelihoods by an extractive plantation economy. This tension, I will argue, was intrinsic to Linnaeus’s taxonomic enterprise, which was infused by a logic of re-placement that continues to inform current efforts to attain food security.

14:35

Managing Data in Crop Breeding: A Hundred Year Challenge
Richard Harrison (NIAB) & Mario Caccamo (NIAB)

The rediscovery of Mendelian genetics at the dawn of the 20th century ushered in a revolution in agriculture. For the first time, varieties with known performance characteristics were systematically developed, based upon the principles of heredity and the genetic control of traits. This inevitably raised questions over the uniformity, distinctiveness and stability of distributed genetic material throughout the supply chain and led to the development and implementation of data standards for measurement of key traits of agronomic importance. A prime example of this is the co-development of certification standards by NIAB in partnership with the Plant Breeding Institute (PBI) and the seed industry in the early 20th century. These systems have ultimately led to our modern-day varietal testing and certification systems and meant that records of the output of much of the breeding progress of the past 100 years were kept in a reliable and robust system.

Within many breeding programmes, records stretch back over 100 years, and for many the data from the early years of publicly funded breeding are published in annual reports and are in the public domain. From the 1980s onwards, as public breeding programmes (in many European countries) have moved into the private sector, many proprietary datasets are no longer public, except at the point of release, where national and recommended listing systems exist that serve not only to evaluate the relative performance of varieties but also to provide key trait information. As there is no longer an imperative to release all data into the public domain, in some cases this has led to a relaxation in data capture standards which, coupled with the proliferation of digital standards over the past thirty years, has created issues around data longevity. Yet maintenance, curation and linkage of historical data can prove valuable, as we will demonstrate through examples drawn from across NIAB.

With the advent of multi-omic data and the proliferation of data types used within modern breeding programmes, managing and maintaining gold-standard record keeping has never been harder. New skill sets and infrastructure are needed, and these are often siloed within sectors. Moreover, the perception of the febrile nature of cloud-based or open-source platforms has led to inertia over their adoption. The risk of valuable data loss is now greater than ever before, reducing the options for future exploitation.

We speculate as to what approaches may be best to ensure that, within breeding programmes, data is captured and archived in such a way that it may have longevity. We also consider what role expanded or aligned programmes of data collection at the point of varietal registration and varietal evaluation could have in ensuring the best use of public data to address some of the challenges agriculture faces over the next 100 years, namely the transformation to sustainable, low-emissions and biodiversity-promoting farming, all of which can be addressed in part by breeding and require data-driven solutions.

15:00

Data, Duplication, and the Decentralisation of Crop Collections
Helen Anne Curry (University of Cambridge)

In the 1970s, the number of accessions held in national and international collections of crop germplasm increased steadily. Concerns about rapid 'genetic erosion' arising in the wake of the Green Revolution prompted efforts to forestall such erosion by assembling or augmenting collections of landraces and crop wild relatives. By the 1980s, this growth, initially a source of pride, was increasingly recognized as a liability. Too many accessions lacked the basic information necessary for researchers to make requests of gene bank managers, let alone put samples to work knowledgeably in breeding programmes. Many gene banks came under scrutiny for poor management practices, and several prominent banks found themselves accused of mishandling a 'global patrimony' entrusted to them by the international community. In this paper, I explore a response to these failings, real and perceived, that attracted attention from many in the germplasm conservation community: creating linked, standardised databases of collections. Calls for more thorough and consistent data about accessions often emphasised, and still emphasise today, that these data will make collections easier to navigate and therefore more valued and more used. Here I take a close look at the use of data collation and standardisation as a means of 'rationalising' collections, a motivation that has not been advertised as prominently. For some researchers and collection managers, the identification of duplicates would allow the channelling of limited time and money to only the most unique accessions, even creating the possibility of de-accessioning items known to be held elsewhere. As I show, the vision of achieving efficiencies through close collaboration depended not only on overcoming technical hurdles in data management but also on social and political alliances. Efforts to identify and weed out duplicates in the interest of stretching gene bank resources appear to have been pursued most vigorously by communities of researchers whose boundaries were delineated by European Union membership and who were already connected by their expertise in particular crop species.

15:25

Data Management in a Multi-Disciplinary African RTB Crop Breeding Program
Afolabi Agbona (IITA), Prasad Peteti (IITA), Elizabeth Parkes (IITA), Ismail Rabbi (IITA), Lukas A. Mueller (Boyce Thompson Institute), Chiedozie Egesi (IITA) & Peter Kulakow (IITA)

High-quality phenotype and genotype data are important for the success of a breeding program. Like most programs, African breeding programs generate large multi-disciplinary phenotypic and genotypic datasets from many locations that must be carefully managed through the use of an appropriate database management system (DBMS) in order to generate reliable and accurate information for decision making. A DBMS is essential for data collection, storage, retrieval, validation, curation and analysis in plant breeding programs to enhance the ultimate goal of increasing genetic gain. The International Institute of Tropical Agriculture (IITA), working on root, tuber and banana (RTB) crops such as cassava (https://cassavabase.org/), yam (https://yambase.org/), and banana and plantain (https://musabase.org/), has deployed a FAIR-compliant (Findable, Accessible, Interoperable, Reusable) web-based database, BREEDBASE (https://breedbase.org). The functionalities of these databases in data management and data analysis have been instrumental in achieving breeding goals. Such capabilities include ontology-driven data management (https://www.cropontology.org/), statistical analyses, interfaces with the Breeding API (BrAPI), and barcode-based data collection using the PhenoApps (http://phenoapps.org/). User-friendly PhenoApp examples include Fieldbook for phenotype data collection, Coordinate for genotype tissue sample collection and tracking, and Inventory for weighing samples without the need for data transcription. Standard Operating Procedures (SOPs) for each breeding process have been developed to allow a cognitive walkthrough for the users. This has further helped to increase the usage and enhance the acceptability of the system. The wide acceptance gained among breeders in the global RTB research programs has resulted in improvements in the precision and quality of genotyping and phenotyping data, and in improved progress towards breeding program goals.

15:50

Final discussion and wrap-up

Session 2: Technical Challenges of Data Linkage (March 12, 14:00-16:00 BST)

Making plant data FAIR (Findable, Accessible, Interoperable, Reusable) has been the subject of much effort. Extensive semantic tools are now available, including the multiple, intersecting ontologies that comprise the Planteome project, as are metadata standards such as the Minimum Information About a Plant Phenotyping Experiment (MIAPPE). Such tools nevertheless require collective work to develop and maintain. Beyond ensuring data themselves are FAIR, actively linking and circulating data poses further challenges. These include finding ways to link biologically, experimentally or geographically related yet heterogeneous datasets consistently, and to make data usable in practice to potential users with divergent aims and resources, not only reusable in theory. This session will address the technical challenges of data linkage, including the development of standards and infrastructures; epistemic issues; and the organizational requirements of this work.

14:00

Introduction by organisers

14:05

Linking Legacies: Realising the Potential of Long-Term Agricultural Experiments
Richard Ostler (Rothamsted Research)

Long-term agricultural experiments (LTEs) are vital resources for assessing the sustainability of food production and soil health. For researchers to effectively use a long-term experiment it is essential to have access to relevant historical data and necessary metadata. In turn, new datasets generated from investigations using an LTE should be resolvable back to the source LTE as part of that experiment’s continuing narrative. Further value from LTEs can be derived if experiments sharing common characteristics, such as cropping system, treatment, management or environment, can be identified and their datasets integrated.

LTEs can generate very diverse data types, from annually collected yield traits and periodic and ad hoc surveys to continuous sensor data. To be usefully findable, interoperable and re-usable, LTE datasets need not only to be described using community-accepted semantic and metadata standards but also to be accompanied by knowledge of how they relate to each other in time, space and scale, and of when they do not. Within a single experimental system an LTE can therefore encapsulate key challenges facing plant data linkage, and these challenges are only amplified when attempting to link data across LTEs.

This presentation reviews the approach being taken at Rothamsted Research to apply FAIR data principles to its long-term datasets, and how Rothamsted is working with the wider agricultural data and long-term experiments communities to address some of the technical and cultural challenges faced.