Monthly Webinar Schedule

General Information:

AgBioData webinars are generally held on the first Wednesday of each month at 10 AM PT | 11 AM MT | 12 PM CT | 1 PM ET.
Connection details are distributed about a week before the webinar via email to the AgBioData members.
Visit the registration page to join and sign up for the email list.


  • October 4th -  Pascal Neveu, UMR MISTEA, INRAE (France)
    PHIS, an ontology-based Information System for Plant Phenotyping

    The European EMPHASIS infrastructure aims to enable researchers to use facilities, resources and services for high-throughput phenotyping of plants. Within the infrastructure we are leading actions to help scientists better understand plant performance and translate this knowledge into applications. This presentation will look at some examples of data management and implementation of data standards carried out in this context to add value to the phenotyping data. In particular, we will look at PHIS, an ontology-based information system based on the OpenSILEX framework.

Past Meetings


  • September 6th -  Harry Caufield, Lawrence Berkeley National Laboratory
    Staying grounded: assembling structured biological knowledge with help from large language models
    Developing comprehensive knowledge bases and ontologies demands meticulous curation. The emergence of highly flexible, artificial intelligence-driven approaches to natural language processing offers novel ways to expedite this process. Current methods often rely on extensive training data, however, and struggle with complex, nested knowledge structures. In this talk, I will describe a new approach, Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES). This method for information extraction leverages the capability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) with a variety of natural-language prompts. SPIRES operates with predefined data schemas, enabling information extraction that adheres to these structures. It also grounds concepts with well-established ontologies and vocabularies, avoiding the "hallucinations" common to text generated by LLMs. 
    Within OntoGPT, an LLM-querying framework we have developed, SPIRES supports rapid application to summarization and modeling across plant science and biology. Notably, this approach allows customization for new tasks and topics without a need for new training data. We have found that OntoGPT and SPIRES are capable of extracting structured knowledge from large literature collections and constructing knowledge graphs from the resulting relationships. Through harnessing the language comprehension capabilities of LLMs, SPIRES streamlines knowledge acquisition from across agricultural science and beyond.
    | Recording | Slides |
  • August 2nd -  Virtual round table on phenotypic data management issues of the AgBioData member databases
    In Year 2 of our current NSF RCN grant, the AgBioData database community identified sharing and managing phenotypic data as one of the primary data issues to address. We have surveyed our member databases, and many of them agreed that curating phenotypic data is challenging due to the diversity of data types (e.g., images vs. spreadsheets) and data sources (e.g., breeding programs, experimental trials, or literature), as well as the lack of standardization. We are inviting the AgBioData member databases and the larger community to discuss these challenges, to understand their importance for AgBioData member databases and whether they can be addressed entirely or partially in a new AgBioData working group.
    | Recording | Slides |
  • July 12th - Sarah Lippincott, Dryad
    Companion planting: How generalist and specialist repositories can work together to promote agricultural data sharing and reuse.
    This conversation explores strategies for enhancing open sharing and reuse of agricultural data through collaboration between disciplinary and generalist repositories, specifically Dryad, an open data publishing platform and community. Generalist and specialist repositories bring distinct strengths to data sharing and reuse. Generalist repositories offer a home for a wide range of data types and support serendipitous discovery, while discipline-specific repositories offer granular metadata, specialized tools, and deep understanding of community needs. Agricultural data creators, and future data re-users, can benefit from collaboration between these different solutions, including building connections with complementary datasets stored in multiple repositories; consistency in metadata standards; and federated discovery systems. In this conversation, Dryad’s Head of Community Engagement, Sarah Lippincott, will describe Dryad’s stewardship of agricultural data and engage attendees in an exploration of how Dryad can work with the agricultural research community to improve data sharing, discovery, and reuse.
    | Recording | Slides |
  • June 7th - Peter Selby, Cornell University
    Applications and impacts of the BrAPI project on plant breeding

    Modern genomic breeding methods rely heavily on very large amounts of phenotypic and genotypic data, presenting new challenges in effective data management and integration. The datasets are often large and complex, and the data is often stored on multiple systems, sometimes separated by country and organization. As the common analysis methods increasingly require aggregation of datasets from diverse sources, data exchange between disparate systems becomes a challenge. This webinar will be an introduction to The Breeding API (BrAPI) Project. The BrAPI Project began in 2014 when a small group of plant breeding and technology experts came together to try to standardize their data. Since then, BrAPI has become internationally accepted as one of the primary data exchange standards in the plant breeding domain. This webinar will give an overview of what BrAPI is, how it works, what it is capable of, and the impact the project has had so far on the community.
    | Recording | Slides |
  • May 1-2 - 2023 AgBioData community workshop (Chicago, IL)
    | Recording |
  • April 5th - Krystal Tsosie, Arizona State University
    From Green Revolution to “Rescue” Indigeneity: Using Digital Data Tools and Machine Learning Approaches to Protect Indigenous Knowledge and Biodiversity

    Comprising less than 5% of the world's population, Indigenous people protect 80% of global biodiversity. The next genomic “discoveries” in industry and academia may co-opt Indigenous knowledge or disenfranchise Indigenous peoples, who are often last to benefit and are least protected from intellectual property claims. Ethical and sustainable research necessitates new digital data approaches as grounded in machine learning and Indigenous stewardship models to operationalize CARE data governance principles, direct benefit sharing, and equitable engagement and partnerships with Indigenous communities.
    | Slides |
  • March 1st - Irene Cobo Simón, Institute of Forest Science (ICIFOR-INIA, CSIC; Spain)
    CartograPlant: Cyberinfrastructure to improve plant health and productivity in the context of a changing climate.
    Climate change is threatening plant health and productivity at all spatial scales. To date, it remains largely unknown whether plant breeding and agricultural management practices can keep pace with the rate and direction of environmental change, as well as species’ rate of adaptation to rapid environmental change. In addition, the incidence of invasive pests and pathogens is increasing as a consequence of globalization. This trend is being exacerbated by climate change. Thus, future plant health and productivity will depend on the match between genotypes (and their resulting phenotypes) and new environments. However, these analyses are challenging since they require the integration of diverse data types, usually decentralized and lacking in standardization: genotypic, phenotypic and environmental. Hence, centralized and up-to-date platforms which integrate, visualize and analyze high-throughput biological data are key, especially in the current big data era in plant biology. CartograPlant is a web-based application that integrates, visualizes, and analyzes genotypic, phenotypic, and environmental data, and their associated metadata, from georeferenced plants. Environmental data is available through advanced integration of global and regional layers. The genotype and phenotype metrics are collected through direct submission of studies at the time of publication or through the biocuration efforts of the affiliated databases and applications (TreeGenes, BIEN, TreeSnap). Data analysis is enabled by accessing the metadata associated with the public studies and providing appropriate workflows through Galaxy. This metadata collection, using ontologies and standards, allows integration and analysis of data from different studies, which is key to performing both mega- and meta-analysis.
    Mega-analysis and meta-analysis of GWAS (GxP association) and landscape genomics (GxE association) studies can improve the power to detect association signals by increasing sample size and by examining more variants throughout the genome than each dataset alone. Thus, they allow users to answer unprecedented and ambitious adaptive questions, taking advantage of the potential of high-throughput biological data. This talk will describe the recent updates in data sources, functionalities, and analytic workflows offered by CartograPlant.
    | Recording | Slides |
  • February 1st - Monica Munoz-Torres, University of Colorado School of Medicine
    The Monarch Initiative: harmonizing cross-species data for disease diagnostics and discovery.
    Addressing complex scientific challenges requires weaving together data from diverse sources, organisms, contexts, formats, and granularities. Building a coherent, holistic view of this data landscape to address any given problem is non-trivial: much of the relevant information is scattered and not readily accessible for searching or analysis. The Monarch Initiative is a consortium and a set of resources aiming to overcome these limitations by integrating the fragmented data landscape into the most comprehensive open collection of genotype-phenotype data available. Monarch seeks to bridge the space between basic, applied, and clinical research, developing tools that facilitate connecting data across a variety of scientific approaches and disciplines including genomics, proteomics, molecular modeling, diagnosis of disease and syndromes, and the organization of patient record data. The Monarch Knowledge Graph (KG) links together clinical, biomedical, and basic science research data spanning multiple species, and it supports reasoning across a wide range of organisms, body systems, and diseases. We founded the Human Phenotype Ontology (HPO), one of the most widely used biomedical ontologies and the gold standard for describing human phenotypes, and are also creators of the Mondo unified disease ontology, the Unified Phenotype Ontology (uPheno), the cross-species anatomy ontology (Uberon), the Environmental Conditions and Treatments Ontology (ECTO), and most recently, the Vertebrate Breed Ontology (VBO), a single source for data standardization and integration of all breed names. We also created the Simple Standard for Sharing Ontology Mappings (SSSOM) to harmonize the ontologies that are used by the sources, and the only ISO-approved standard for exchanging detailed, case-level phenotype data, Phenopackets. Monarch tools and resources are publicly available and are designed for informatics users as well as clinical and basic research use cases.
    By making data more interoperable, our widely-used standards for data annotation and exchange help support a wide range of data sharing and reuse by projects and organizations around the world, and reduce the effort they need to devote to data harmonization. During this presentation, we will introduce you to a few of these resources and offer you the information to find and implement the ones that best serve your scientific needs.
    | Recording |


  • December 7th - Sushma Naithani, Oregon State University
    Plant Reactome: Using OMICs data for biocuration of plant genes and pathways
    The major challenge in analyzing and connecting genotype to phenotype data at the organismal level is their integration and visualization for knowledge synthesis, which is required for generating OMICs data-driven predictive models for precision breeding of crops as well as assessing the needs of conservation of biodiversity and long-term sustainability. The Plant Reactome is one such platform that allows integration of data from heterogeneous sources (i.e., published literature, transcriptome, proteome and metabolome data, orthology-based projections) for synthesizing in silico modeling of system-level plant pathway networks including metabolic pathways, biological processes associated with plant development and reproduction, and genetic-regulatory mechanisms that mediate plant survival under varied stress conditions. It provides a valuable framework for understanding how a gene, a group of connected genes, or genotypic differences culminate in a phenotype and supports the generation of data-driven hypotheses for understanding the intra- and inter-species differences for basic and translational research and precision breeding. Here, we emphasize our recent efforts in using Omics data for improving gene/gene family functional annotations and biocuration of gene-gene networks.
    | Recording |

  • November 2nd - Jennifer E. Cross, Colorado State University 
    Science of Team Science: Using Developmental Evaluation to Advance Transdisciplinary Teams and Evolve Science
    For the past 20 years or more, science has been evolving to answer more complex questions, which require more complex teams. While scientists are eager to engage with diverse colleagues, academic institutions are slow to change and present a variety of barriers to advancing science. I will explore how the field of “science of team science” has been growing and how team assessment, evaluation, and coaching can help teams become more effective. Case studies of interdisciplinary teams will be shared to illustrate how developmental evaluation and assessment can help accelerate team growth and overcome institutional and infrastructural challenges and barriers.
    | Recording |

  • October 5th - Chris Mungall, Lawrence Berkeley National Laboratory 
    The Gene Ontology: Making functional annotation of plants and animals FAIR.
    The Gene Ontology is one of the most widely used databases in the biosciences, covering functional annotation of genes and gene products across a wide range of species. The GO is ubiquitously used to analyse a variety of types of high-throughput experimental data. Originally created to unify functional annotation across a handful of model organism databases, the GO has grown to encompass more species, and the structure of the GO has been extended to integrate with other ontologies such as CHEBI and the Plant Ontology. The structure of annotations has also evolved, and the GO now includes more expressive pathway-oriented annotations in the form of GO-CAMs (Causal Activity Models). In this talk I will give a practical guide to the structure of GO, how to find and request terms, how to search and create annotations, and how to use GO tools. I will also talk about how the broader AgBioData community can contribute to the GO consortium to help seed functional annotation efforts in a more diverse range of organisms, and in particular with agriculturally relevant species.
    | Recording |

  • September 7th - Nicholas J. Provart, University of Toronto
    Raising the BAR for Hypothesis Generation in Plant Biology Using Open Big Data.
    We have developed tools, available as part of the Bio-Analytic Resource, for exploring large data sets from plants, to allow deeper insights into biological questions. My lab’s three visual analytic tools for transcriptomic data (eFP Browser, ePlant, and eFP-Seq Browser) allow for rapid access to comprehensive gene expression compendia we have curated for identifying tissues, cell-types, or perturbations in which a gene is active or alternatively spliced. Interactions, be they protein-protein or regulatory, create networks. We have developed new tools for exploring such data, either from large collections of experimentally-supported protein-protein or protein-DNA interactions or from predicted interactions, including protein-protein interactions inferred from molecular docking studies. We are currently working on integrating large-scale phenotype data from field trials monitored by drone-based sensors into ePlants we have developed for several agronomically-important species to improve understanding of links between genotype and phenotype.
    | Recording |

  • August 3rd - ThankGod Ebenezer, EMBL-EBI
    The African BioGenome Project (AfricaBP): Genomics in the service of African biological diversity.
    Food security and biodiversity conservation represent substantial issues worldwide and require local solutions, as highlighted in the UN’s Sustainable Development Goals. I will discuss the progress and process of establishing a pan-African network to address this challenge through genomic science and how this could inform and influence policy across Africa.
    | Recording |

  • June 1st - Camille Rustenholz, University of Strasbourg (France)
    COST ACTION INTEGRAPE: Data integration to maximise the power of omics in grapevine improvement and beyond.
    The European network INTEGRAPE seeks the establishment of an open, international, and representative network, ensuring that omics and phenotyping data generated in the grapevine research community are being produced in a secure and standardized format, following the F.A.I.R. principles of findability, accessibility, interoperability, and reusability. Among the most significant deliverables of INTEGRAPE are:

    - the elaboration of Guideline ‘cookbooks’ and Dictionary of unified grape-sample ontologies;

    - the release of the PN40024 fourth genome assembly and its annotation;

    - the creation of the Gene Reference Catalogue;

    - the listing of online repositories and tools for omics data exploration and visualization, which to date are not yet interoperable with one another.

    To tackle this last challenge, we applied for a COST Innovative Grant with the GRAPEDIA project (Grapevine Encyclopedia of genes and omics), whose goal is to provide the community with a single open-access database, allowing data exploration and visualization of all grapevine resources, with tools for comparative analysis and customized services. In the GRAPEDIA database, we aim to centralize, interconnect, and showcase these dispersed resources, and to integrate them with the genomic efforts generated by the worldwide community. The target group is the entire scientific community working on the grapevine or using grapevine as their model plant for an “orphan” plant species, and also the private sector working on R&D in vitiviniculture.
    | Recording |

  • May 4th - Karen Yook and Daniela Raciti, microPublication Biology 
    Bridging the gap between data production and database curation through microPublications
    To solve a long-standing problem in data loss and accessibility, we developed a publishing platform, microPublication Biology, to bridge data publishing and database curation. Our journal accepts single experiment articles (microPublications) and embeds curation within the article submission/publishing workflow. microPublication Biology is an online, peer-reviewed, open-access journal published by the Caltech Library and discoverable in PubMed. Starting with articles focused on nematode biology, we continually expand to more organism communities, including Arabidopsis and most recently Dictyostelium, Maize, and Cotton. Our system is set up so that upon publication, atomized data is delivered directly to authoritative databases for each community (e.g., WormBase, FlyBase, PomBase, TAIR), ensuring timely delivery to biological databases for deep data integration. We will give an overview of our journal and its integrated curation workflow and present our latest publishing metrics.
    | Recording |
  • March 15-17 - 2022 AgBioData Community Workshop
    Facilitating crosstalk and network building across Working Groups
    Our three-day, all-hands, online workshop will provide a forum for the working groups to pose questions to and gather feedback from the AgBioData community. Each day will have a two-hour session (7-9 AM Pacific Time), with short presentations from selected working groups at the beginning, followed by breakout sessions, where WG and non-WG members can meet and discuss relevant topics, and a brief reporting period at the end. Your participation can help move FAIR data sharing and management forward!
    | Recording |
  • February 2nd  -   Baron Koylass and Timothee Cezard, EMBL-EBI
    The European Variation Archive: Genetic variation archiving and accessioning
    The European Variation Archive (EVA) is a primary open repository for archiving, accessioning, and distributing genetic variation, including single nucleotide variants, short insertions and deletions (indels), and larger structural variants (SVs) in any species. Created in 2014 to provide FAIR access to genetic variation data, it has since grown to be a primary resource for genomic variants hosting >3 billion records and now maintains and provides the permanent variant locus identifiers (rs IDs) for all non-human species.
    | Recording | Slides |


  • December 1st  -  Meet the new AgBioData Working Groups!
    The purpose of this meeting will be to quickly introduce each of the Working Groups and their initial plans. This will be an opportunity to learn what each working group is planning to focus on, followed by a short discussion. AgBioData members who have not signed up for a working group, or who wish to join an additional group, formally or informally, will have an opportunity to contact working group chairs. 
    | Recording |
  • November 10th  -  Silvie Fexova, Plant Expression Atlas
    Expression Atlas and Single Cell Expression Atlas – home of cross-species gene expression data
    From submission to data visualisation – Our team at EBI maintains and develops a number of resources aimed to support (FAIR)sharing, re-use, integration and visualisation of functional genomics data from a broad range of species including many agricultural species (both plants and animals). In this webinar I will briefly introduce our archival services and tools as well as our two knowledgebases, the Expression Atlas and Single Cell Expression Atlas, that host thousands of publicly available transcriptomics experiments across species and biological conditions – re-analysed and visualised in a user-friendly interface for the scientific community to use and explore.
    | Recording | Slides |
  • October 6th  -  Allyson Lister, FAIRsharing
    FAIRsharing: promoting the discovery of data standards, policies and databases across all research domains
    FAIRsharing is an informative and educational resource on interlinked standards, repositories and policies, three key elements of the FAIR ecosystem. FAIRsharing promotes the existence and value of these standards, repositories and policies, fostering a culture change within the research community into one where the use of these resources for FAIRer data is pervasive and seamless. This is achieved by guiding consumers to discover, select and use these resources with confidence, and helping producers to make their resources more visible, more widely adopted and cited. This presentation will highlight key collaborative, successful activities as well as next steps within FAIRsharing. It will also provide information on how to become a recommended repository in FAIRsharing and how to use FAIRsharing to engage with your stakeholders as well as with journal publishers and their data policies.
    | Recording | Slides |
  • September 1st  -  AgBioData RCN grant - Lisa Harper & Eva Huala

    Help us chart the future of agricultural data!
    Do you want easy access to better quality data? 
    We are THRILLED to announce that AgBioData has received a three-year NSF RCN award to expand our community committed to improving quality and access to agricultural data. New activities will include organizing workshops, establishing new working groups, and developing FAIR curriculum for scientists. We are expanding the consortium and welcome new members, especially students, post-docs, big-data scientists, funding agency scientists and members of the scientific publishing community interested in solving common FAIR data issues.

    We will discuss:

    • RCN objectives,

    • benefits of joining AgBioData,

    • and how YOU can make a difference in the biological data environment for years to come.

    | Recording | Slides |

  • August 4th - Noah Fahlgren and Malia Gehan, both from the Donald Danforth Plant Science Center, discuss interactions between the phenomics and database communities
    High-throughput phenotyping has emerged as a promising area in plant, animal, and agricultural sciences that brings together researchers from life sciences, engineering, computer science, data science, mathematics, and other research fields to develop technologies for rapidly and accurately measuring phenotypes using robotics, imaging, and other tools. High-throughput phenotyping can be done at different scales, from cellular to ecological, typically using image-based approaches for data collection and analysis. The development of computer vision and machine learning approaches to extract biologically meaningful measurements from images, including physical, physiological, morphological, and qualitative properties of crops and livestock, is a major activity within the field. Phenotype datasets can be used for a variety of purposes, but in conjunction with large genomic datasets, are a powerful tool for linking phenotype to genotype, training genomic prediction models, and other approaches that integrate genetic, phenotypic, and environmental datasets. We will introduce our efforts to develop PlantCV, an open-source platform for image-based plant phenotyping, and discuss opportunities for collaboration between the phenomics and database communities.
    | Recording | Slides |
  • June 2nd - Lisa Harper
    Dealing with gene models from 50 different reference genomes. A progress report from MaizeGDB.
    MaizeGDB now hosts over 50 reference-quality genome assemblies and their associated gene model sets and metadata. We have started to use a "Pan-Gene" concept to group syntelogs. We define a pan-gene as the set of gene models from multiple genomes that appear to represent the same gene. After I show you how we are implementing this at MaizeGDB, let's have a discussion about how other databases are dealing with this gene model explosion.
    | Recording | Slides |
  • May 5th - Monica Poelchau
    Recommendations from the AgBioData GFF3 working group
    Over a year ago, AgBioData convened a discussion on GFF3 formatting issues, led by Scott Cain. This discussion led us to form the AgBioData GFF3 working group. Our goals are to 1) identify common problems with the GFF3 format; 2) recommend solutions for these problems; and 3) promote community adoption of these recommendations, so that data can be formatted in standard ways across databases. Members of AgBioData, Alliance of Genome Resources, and NCBI have been working on these goals for the past year. We are now ready to receive feedback from the AgBioData community on our recommendations, in order to get traction on the final goal – community adoption of these solutions.
    | Recording | Slides |
  • April 7th: Guest speaker Peifen Zhang. PhyloGenes presents precomputed phylogenetic trees of plant gene families along with known functions for individual family members. By displaying experimentally validated gene functions associated with individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes also facilitates the study of function evolution. The current PhyloGenes release (version 2.2) includes 40 plant genomes covering a broad taxonomic range and including all major crops, along with 10 non-plant model organisms represented in over 8,000 gene families. Over two-thirds of the families have at least one member with a validated known function as GO terms. To increase the predictive power of PhyloGenes, future work will involve community contribution and emphasize incorporating new functional annotations of family members across families and subfamilies, and also adding complementary functional datasets such as gene expression and mutant phenotype.
    | Recording | Slides |
  • March 3rd - Jack Gardiner and Lisa Harper will lead a discussion on: Metabolomics: What is it, and how might it lend insight into our understanding of complex biological traits?
    | Recording | Slides |
  • February 3rd - Chuck Cook: The Global Biodata Coalition (GBC) is working with funders to encourage more efficient collaboration in funding data resources and to sustain funding for critical data resources. More info, including PDFs of past talks, is available at the GBC website.
    Contact Chuck via email 
    | Recording | Slides |
  • January 13th - Imma Subirats & Kristin Kolshus, AGROVOC Ontologies.   During this webinar, the AGROVOC Team from the Food and Agriculture Organization (FAO) of the United Nations will introduce how AGROVOC is kept up to date with a number of institutions and individual domain experts serving as focal points for specific languages and topics. 
    | Recording | Slides |


  • December 2: Dr. Anne Brown (PostDoc USDA-ARS) and Andrew Wilkey (ORISE Fellow) will talk about the Genotype Comparison Visualization Tool (GCViT). | Recording |
  • October 7th: Guest Speaker: WheatIS | Recording |
  • September 2nd: Guest Speaker: Dr. Sierra Moxon will talk about data models and exchange protocols. | Recording | Slides |
  • August 5th: Group discussion: "What has AgBioData done for you & your database?" | Notes |
  • June 3rd: Guest Speaker: Dr. Julie Dunning Hotopp will talk about secondary data usage | Recording | Slides |
  • May 6th: Group discussion: Pan-genomes | Recording | Slides |
  • April 8th: Guest speaker: Medha Devare from CGIAR will be talking about the Gardian platform | Recording |
  • March 4th: Group discussion: GFF format nightmares | Recording | Slides | Notes |
  • February 5th: Moira Sheehan will talk about the Breeding Insights Platform | Recording | Slides |
  • January 13th: 10am - 12pm PST - In Person Meeting at PAG XXVIII