NIFA FACT Coordinated Innovation Network LOI

Project Director
Name: Dorrie Main
Title: Professor of Bioinformatics
Department: Horticulture
Institution: Washington State University
Email: dorrie@wsu.edu

Co-Project Directors

Name: Jacqueline Campbell
Title: Research Analyst II – Data curator
Department: Computer Science
Institution: Iowa State University

Name: Ethalinda Cannon
Title: Associate Scientist
Department: Computer Science
Institution: Iowa State University

Name: Lisa Harper
Title: Plant Geneticist
Department: Corn Insects and Crop Genetics Research Unit
Institution: USDA-ARS

Name: Eva Huala
Title: Executive Director
Institution: Phoenix Bioinformatics

Name: Sook Jung
Title: Assistant Research Professor of Bioinformatics
Department: Horticulture
Institution: Washington State University

Name: Monica Poelchau
Title: Geneticist
Department: Knowledge Services Division, National Agricultural Library
Institution: USDA-ARS

Program Area or Program Area Priority
AFRI Foundational and Applied Science FACT program area priority (A1541) - Coordinated Innovation Networks

Title:  AgBioData: A Coordinated, Collaborative and Innovative Network of Genomic, Genetic and Breeding Databases for Enhanced Agricultural Research Outcomes

Rationale:  Advances in agricultural science are increasingly driven by data-driven discovery. Data become significantly more valuable when they are properly stored, described, integrated and shared for use in future analyses. Initiated in 2015, AgBioData (https://www.agbiodata.org) is a consortium of more than 100 scientists representing all the major US agricultural genomic, genetic and breeding databases and allied resources. Consortium members are working together to identify common issues in database development, curation and management, with the goal of making database products more Findable, Accessible, Interoperable and Reusable (FAIR). Implementing the AgBioData recommendations, described in a white paper accepted for publication in Database (Oxford), requires resources for communication, coordination, collaboration and database sustainability assessment, as well as data-literacy training for scientists. Support through a NIFA FACT Coordinated Innovation Network will facilitate these value-added efforts.

Overall Hypothesis or Goal:  Establishing standards and best practices for the acquisition, curation, visualization and retrieval of genomic, genetic and breeding data, through enhanced collaboration among AgBioData consortium databases, will facilitate agricultural research.

Specific Objectives:
Obj. 1: Develop and implement standards for AgBioData data curation
Obj. 2: Establish common practices for broad use of ontologies, specifically the Gene Ontology (GO), Plant Ontology (PO), Plant Trait Ontology (TO) and Phenotype And Trait Ontology (PATO), in the curation efforts of AgBioData members, and provide tools and training for researchers
Obj. 3: Establish metadata standards across AgBioData members and promote compliance
Obj. 4: Identify opportunities for a federated model of data exchange for AgBioData
Obj. 5: Identify funding options for long-term database sustainability
Obj. 6: Work with funding agencies and journals to improve the submission of data by researchers

Approach: We propose a three-fold approach.  First, hold regular meetings of AgBioData members through an annual in-person workshop and monthly online meetings.  Second, draw on the more than 100 AgBioData members to form Task Groups that accomplish Objectives 1-4, and (a) develop online data management modules for training scientists at the undergraduate, graduate and postdoctoral levels, (b) conduct cross-training webinars and in-person meetings of AgBioData members, (c) conduct a pilot database sustainability study, and (d) collaborate with federal funding agencies and journals on enforcing FAIR principles for data management by researchers.  Third, maintain the AgBioData website and hire a part-time AgBioData Coordinator to help organize and manage consortium activities.

Potential Impact and Expected Outcomes:
This project brings together multidisciplinary scientists from all the major US agricultural genomic, genetic and breeding (GGB) databases, and allied resources, to further accelerate the synergistic efforts of the AgBioData Consortium. By adopting best practices for the management, access and retrieval of GGB data across databases, providing researchers with data-literacy training, and exploring options for database sustainability, this project will help maximize the use of data that are generated, shared and reused, leading to enhanced scientific outcomes. AgBioData is a model for how databases can work together to be more resource-efficient and how, as central resources for the communities they serve, they can use a collective voice to lead efforts toward better data management and greater availability of database resources.