General Information:
AgBioData webinars are generally held on the first Wednesday of each month at 10A PT | 11A MT | 12P CT | 1P ET.
Connection details are distributed about a week before the webinar via email to the AgBioData members.
Visit the registration page to join and sign up for the email list.
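As a convenience for planning, the "first Wednesday of each month" rule above can be sketched in a few lines of Python. This is only an illustration: the function name `first_wednesday` is hypothetical, and because webinars are only *generally* held on that day, the emailed connection details remain the authoritative source for any given month.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday() counts Monday as 0, so Wednesday is 2


def first_wednesday(year: int, month: int) -> date:
    """Return the date of the first Wednesday in the given month."""
    first = date(year, month, 1)
    # Days to move forward from the 1st to reach a Wednesday (0-6).
    offset = (WEDNESDAY - first.weekday()) % 7
    return first + timedelta(days=offset)


if __name__ == "__main__":
    # Print the nominal webinar dates for autumn 2024.
    for month in (9, 10, 11):
        print(first_wednesday(2024, month).isoformat())
```

The modulo keeps the offset between 0 and 6, so a month that starts on a Wednesday maps to the 1st itself.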
NEXT WEBINAR:
2024
- November 6 - Virtual round-table on new AgBioData working groups on machine learning and natural language processing (NLP) for biocuration
PAST WEBINARS:
2024
- October 2 - Dr. Montana Smith (Pacific Northwest National Lab)
NMDC: Advancing microbiome science through FAIR and standardized metadata and data
The National Microbiome Data Collaborative (NMDC)’s mission is to support a FAIR microbiome data-sharing network through infrastructure, data standards, and community building that addresses pressing challenges in environmental sciences. In this webinar, we will dive into what the NMDC is and how standardized metadata capture enables FAIR data. We will walk through the 4 NMDC products and how they’re lowering barriers for experimental scientists to conduct their research in a way that ensures data re-use.
| Recording |
- September 4 - Dr. David Emms (InstaDeep)
AgroNT: A Foundational Large Language Model for Plant Genomics
Foundational large language models can be pre-trained on large unlabelled datasets and subsequently fine-tuned to a wide range of specific tasks. We’ll present AgroNT (Agro Nucleotide Transformer), a foundational DNA large language model pre-trained on reference genomes from 48 plant species with a predominant focus on crops. We have shown that AgroNT can be fine-tuned to obtain state-of-the-art predictions of many genomic elements, including polyadenylation sites, splice sites, open chromatin and enhancer regions. Furthermore, AgroNT can be fine-tuned to e.g. predict tissue-specific gene expression levels or to prioritize functional variants.
Building on our Nucleotide Transformer, the novel SegmentNT model is able to make nucleotide resolution predictions, well suited to tasks such as de novo genome annotation of previously unseen species. Both our AgroNT and SegmentNT models are open-sourced for academic research and non-commercial uses on our GitHub repository https://github.com/instadeepai/nucleotide-transformer and HuggingFace space https://huggingface.co/InstaDeepAI.
| Recording |
- August 7 - Seth Murray (Texas A&M University, TAMU)
Capturing Nature AND Nurture with Temporal Field Phenomics to Breed Better Crops
An organism’s phenome results from genotype (nature), environment and management effects (nurture) and their interactions, as well as measurement error. For over 30 years, DNA sequencing and genomics tools have advanced genotyping to where genomes can now be routinely saturated with measurements. In contrast, most focus in high throughput phenotyping and phenomics to date has been on automating previously known “traits” as measurable and interpretable phenotypes; akin to focusing on measuring a single DNA marker rather than measuring a saturated genome. Tools such as unoccupied aerial systems (UAS, aka UAVs, drones) collecting temporal phenomic measurements in the field now allow novel methods in plant breeding and new insights into plant biology. Viewing phenomics as a platform for discovery, similar to genomics, opens new methods for capturing phenomena in nature and nurture. To date, our experience with phenomic prediction from UAS in maize breeding for cumulative, complex phenotypes such as grain yield suggests it’s possible to predict organismal performance in untested environments; in fact possibly better than gold-standard genomic methods. Surprising insights into biology have also been made through these activities, including predicting plant disease and resistance, evaluating genotypic resilience to stress, and identifying early season growth periods for crop improvement that could not previously be selected for. Method development and data analytics in phenomics are large investments, but worth making. Successfully measuring the phenome will impact every aspect of science and society, in biological disciplines from germplasm curators and physiologists to breeders, to education, the courtroom and policy.
| Recording | Slide |
- June 5 - Ethy Cannon (USDA-ARS)
Pan-genomic resources at MaizeGDB
Pan-genomes, encompassing the entirety of genetic sequences found in a collection of genome assemblies within a clade, can be more useful than single reference genomes. This is especially true for Zea mays, which has a particularly diverse and complex genome. Presenting full pan-genome data is challenging, especially for a diverse species, but valuable when pan-genomic data can be linked to extensive gene model and gene data, including classical gene information, markers, insertions, expression and proteomic data, and protein structures, as is the case at MaizeGDB. I will present the pan-gene analysis pipeline Pandagma, and MaizeGDB’s pan-gene data center, which offers a variety of browsing and visualization tools, including sequence alignment visualization, gene trees and more, which enable exploration of pan-genes in Zea.
| Recording |
- April 29-30, May 2 - 2024 AgBioData virtual community workshop (agenda available here)
This three-day meeting will feature presentations from ending and current AgBioData working groups (WGs) about their accomplishments and recommendations, breakout room discussions on data-related issues, and updates on the consortium's future.
| Recording |
- April 3 - Zachary Miller (Cornell University)
Introducing The Practical Haplotype Graph Version 2: A Streamlined and Simple Pangenome System
The Practical Haplotype Graph (PHG) is a powerful tool for representing diverse plant pangenomes and imputing new sample genotypes for breeding programs and other purposes. Low-coverage sequencing data from various technologies (DArT, GBS, etc.) is sufficient to identify paths through the graph, which can be stored efficiently within the PHG database or used to call variants and create custom genomes for alignment. PHGv2 refines and streamlines the original PHGv1 platform.
| Recording | Slides |
- March 6 - Cyril Pommier (INRAE-URGI)
FAIR Plant Phenomics Data Management Tools and Guidelines from ELIXIR and Emphasis European Infrastructures
Plant phenomics data management has been greatly facilitated over the past ten years at several levels: data standards to organize and describe data, databases for the management of the experiments, data repositories to ensure long-term accessibility supplemented by data portals to maximise findability, and finally guidelines to ease their usage. We will review the recent advances from joint initiatives involving two European infrastructures: ELIXIR (Life science data) and EMPHASIS (Plant phenomics). First, we will update the current status of MIAPPE (www.miappe.org), a data standard interoperable with the Breeding API that enables not only the formalisation of phenotyping experiments but also their linking with genotyping. We will also give an overview of its usage in generic data repositories such as Dataverse or Zenodo and their relation with experimental databases such as PHIS. Finding the right documentation to use those tools and standards is not always straightforward. The RDMKit (https://rdmkit.elixir-europe.org/) is a guidelines portal that has been built to help researchers find the information subset they need. Through dedicated sections, such as the plant domain page (https://rdmkit.elixir-europe.org/), it shows the complementarity between standards and tools and provides the guidance needed for data management. Finally, we will also update the status of FAIDARE (https://urgi.versailles.inrae.fr/faidare/), a global data portal that indexes 30 databases using either BrAPI or a generic minimal format.
| Recording | Slides |
- February 7th - This webinar will feature two presentations from:
Paul D. Thomas (University of Southern California and Gene Ontology Consortium)
Accurate annotation of protein sequences at large scale, using evolutionary modeling
Inferring (aka “annotating” or “predicting”) the functions of the vast numbers of known protein sequences has been a longstanding challenge in genomics. Over the last decade, a comprehensive system has been developed for addressing this challenge based on constructing and applying models of function evolution in protein families. The main components of the system, including PANTHER phylogenetic trees, Gene Ontology phylogenetic annotations and TreeGrafter software (now implemented in InterProScan), work together in an integrated software and data suite that is now beginning to be broadly used to annotate the functions of protein-coding genes. I will describe each of these components, as well as how the tool can be easily used to annotate any set of protein-coding genes and how users can give feedback to help improve the annotations.
Alex Ignatchenko (EMBL-EBI):
The Gene Ontology (GO) Annotation (GOA) project at EMBL-EBI aims to provide high-quality GO annotations to proteins in the UniProt Knowledgebase (UniProtKB), RNA molecules from RNACentral and protein complexes from the Complex Portal. Currently, the GOA database hosts 5 million manually curated GO annotations from over 70 research groups. This set is used as a foundation for 15 automatic GO annotation pipelines. The output data are regenerated every 2 months and are commonly referred to as Inferred from Electronic Annotation (IEA). The IEA pipelines use a range of statistical, rule-based and machine learning algorithms to enrich existing GO annotation coverage. The generated IEA set of over 1.1 billion GO annotations is subject to over 130 checks, constraints and filters to ensure the quality of predicted GO annotations. The GOA data is publicly available from the GOA FTP site and the GO annotation browser QuickGO. The GOA team is constantly looking for ways to improve the quality of GO annotations and gene product coverage.
TreeGrafter is a method for predicting GO annotations based on PANTHER family/subfamily and InterPro signatures. The project is a collaboration between PANTHER and the InterPro team at EMBL-EBI. The algorithm was published in 2019, and it was incorporated into InterPro in the second half of 2023. The TreeGrafter mappings were processed and added to the GOA database for testing shortly after. This implementation resulted in about 301 million GO annotations after the GOA pipeline checks and filters. More importantly, the final set has over 200 million GO annotations that are not predicted by any other IEA method. The GOA team plans to integrate the TreeGrafter GO annotation pipeline into the GOA database and release it to the public in the first half of 2024.
| Recording |
2023
- December 6th - This webinar will feature two presentations from:
Benjamin Cole (Joint Genome Institute) on "Data management considerations for plant single-cell genomics."
While plants have arrived on the single-cell scene relatively late, the number and complexity of plant single-cell datasets have exploded over the past four years. With that massive increase in data has come a pressing need to ensure accurate documentation of the experimental provenance of plant single-cell datasets, not only for reproducibility but also for reusability in meta-analyses. During this presentation, I will discuss the current state of plant single-cell research as well as the most common practices for data storage. I will also argue for the need for better standards in the field, and what that could potentially enable.
Christopher Tuggle (Iowa State University) & Muskan Kapoor (Iowa State University) on "Single-Cell genomics data incorporation into agricultural G2P research by building a FAIR data ecosystem."
We will describe a pilot-scale project to determine if our current metadata standards for livestock and crops can be used to ingest scRNAseq datasets in a manner consistent with HCA DCP standards and if established resources (e.g., Terra) can be used to analyze the ingested data. Currently, the most comprehensive data ingestion portal for high throughput sequencing datasets from plants, fungi, protists, and animals/humans is Annotare (located at EMBL-European Bioinformatics Institute). For agricultural animal datasets, another EMBL-EBI portal, the FAANG portal, has been developed. scRNAseq data/metadata can be submitted to FAANG using a semi-automated process. We have extended this tool for scRNAseq data so that files can be validated using the HCA DCP metadata and data validation service. These files are incorporated using EMBL-EBI’s HCA DCP ingestion service and transferred to Terra for further analysis. We will also describe a Shiny-based web application, implemented in R and called Shiny-PIGGI, for the single cell-level transcriptomic study of pig immune tissues and peripheral blood mononuclear cells, which will be an important resource for improved annotation of porcine immune genes and cell types (https://shinypiggi.ansci.iastate.edu). We intend to further build upon these existing tools to construct a scientist-friendly data resource and analytical ecosystem to facilitate single cell-level genomic analysis through data ingestion, storage, retrieval, re-use, visualization, and comparative annotation across agricultural species.
| Recording |
- November 1st - Ben Rosen, USDA-ARS
The Ruminant T2T Consortium
The first draft of the human genome assembly was released over twenty years ago. However, a gapless telomere-to-telomere (T2T) “complete” assembly remained elusive until last year. The highly repetitive nature of pericentromeric, subtelomeric, and duplicated gene families, such as rRNA arrays, made them impossible to assemble. It was only with advances in long-read sequencing technologies and new bioinformatic tools that these structures were resolved. Recently, we proposed the application of these new resources, tools, and knowledge in support of a “Ruminant T2T Consortium.” Our goal is to generate complete genomes for the ruminant evolutionary lineage. The ruminant Suborder comprises six Families and 66 living genera. These species are found in geographically dispersed areas and have adapted to a wide variety of environments. They have also been subjected to both natural and artificial selection. Our hypothesis is that T2T assemblies of ruminant species with relatedness varying from those capable of interbreeding to higher evolutionary distances (up to the estimated 25 million years ago last common ancestor) will inform our understanding of the underpinnings of ruminant evolution. It will also shed light on the genomic consequences of domestication and enhance our knowledge of the functional roles of heterochromatin and other repeat regions of the genome.
| Recording | Slides |
- October 4th - Pascal Neveu, UMR MISTEA, INRAE (France)
PHIS, an ontology-based Information System for Plant Phenotyping
The European EMPHASIS infrastructure aims to enable researchers to use facilities, resources and services for high-throughput phenotyping of plants. Within the infrastructure we are leading actions to help scientists better understand plant performance and translate this knowledge into applications. This presentation will look at some examples of data management and implementation of data standards carried out in this context to add value to the phenotyping data. In particular, we will look at PHIS, an ontology-based information system based on the OpenSILEX framework.
| Recording | Slides |
- September 6th - Harry Caufield, Lawrence Berkeley National Laboratory
Staying grounded: assembling structured biological knowledge with help from large language models
Developing comprehensive knowledge bases and ontologies demands meticulous curation. The emergence of highly flexible, artificial intelligence-driven approaches to natural language processing offers novel ways to expedite this process. Current methods often rely on extensive training data, however, and struggle with complex, nested knowledge structures. In this talk, I will describe a new approach, Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES). This method for information extraction leverages the capability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) with a variety of natural-language prompts. SPIRES operates with predefined data schemas, enabling information extraction that adheres to these structures. It also grounds concepts with well-established ontologies and vocabularies, avoiding the "hallucinations" common to text generated by LLMs.
Within OntoGPT, an LLM-querying framework we have developed, SPIRES supports rapid application to summarization and modeling across plant science and biology. Notably, this approach allows customization for new tasks and topics without a need for new training data. We have found that OntoGPT and SPIRES are capable of extracting structured knowledge from large literature collections and constructing knowledge graphs from the resulting relationships. Through harnessing the language comprehension capabilities of LLMs, SPIRES streamlines knowledge acquisition from across agricultural science and beyond.
| Recording | Slides |
- August 2nd - Virtual round table on phenotypic data management issues of the AgBioData member databases
In Year 2 of our current NSF RCN grant, the AgBioData database community indicated sharing and managing phenotypic data as one of the primary data issues to address. We have surveyed our member databases, and many of them agreed that curating phenotypic data is challenging due to their diversity in terms of data types (e.g., images vs. spreadsheets) and data sources (e.g., breeding program, experimental trials, or literature), as well as the lack of standardization. We are inviting the AgBioData member databases and the larger community to discuss these challenges to understand their importance for AgBioData member databases and if they can be addressed entirely or partially in a new AgBioData working group.
| Recording | Slides |
- July 12th - Sarah Lippincott, Dryad
Companion planting: How generalist and specialist repositories can work together to promote agricultural data sharing and reuse.
This conversation explores strategies for enhancing open sharing and reuse of agricultural data through collaboration between disciplinary and generalist repositories, specifically Dryad, an open data publishing platform and community. Generalist and specialist repositories bring distinct strengths to data sharing and reuse. Generalist repositories offer a home for a wide range of data types and support serendipitous discovery, while discipline-specific repositories offer granular metadata, specialized tools, and deep understanding of community needs. Agricultural data creators, and future data re-users, can benefit from collaboration between these different solutions, including building connections with complementary datasets stored in multiple repositories; consistency in metadata standards; and federated discovery systems. In this conversation, Dryad’s Head of Community Engagement, Sarah Lippincott, will describe Dryad’s stewardship of agricultural data and engage attendees in an exploration of how Dryad can work with the agricultural research community to improve data sharing, discovery, and reuse.
| Recording | Slides |
- June 7th - Peter Selby, Cornell University
Applications and impacts of the BrAPI project on plant breeding
Modern genomic breeding methods rely heavily on very large amounts of phenotypic and genotypic data, presenting new challenges in effective data management and integration. The datasets are often large and complex, and the data is often stored on multiple systems, sometimes separated by country and organization. As the common analysis methods increasingly require aggregation of datasets from diverse sources, data exchange between disparate systems becomes a challenge. This webinar will be an introduction to The Breeding API (BrAPI) Project. The BrAPI Project began in 2014 when a small group of plant breeding and technology experts came together to try to standardize their data. Since then, BrAPI has become internationally accepted as one of the primary data exchange standards in the plant breeding domain. This webinar will give an overview of what BrAPI is, how it works, what it is capable of, and the impact the project has had so far on the community.
| Recording | Slides |
- May 1-2 - 2023 AgBioData community workshop (Chicago, IL)
| Recording |
- April 5th - Krystal Tsosie, Arizona State University
From Green Revolution to “Rescue” Indigeneity: Using Digital Data Tools and Machine Learning Approaches to Protect Indigenous Knowledge and Biodiversity
Comprising less than 5% of the world's population, Indigenous people protect 80% of global biodiversity. The next genomic “discoveries” in industry and academia may co-opt Indigenous knowledge or disenfranchise Indigenous peoples, who are often last to benefit and are least protected from intellectual property claims. Ethical and sustainable research necessitates new digital data approaches grounded in machine learning and Indigenous stewardship models to operationalize CARE data governance principles, direct benefit sharing, and equitable engagement and partnerships with Indigenous communities.
| Slides |
- March 1st - Irene Cobo Simón, Institute of Forest Science (ICIFOR-INIA, CSIC; Spain)
CartograPlant: Cyberinfrastructure to improve plant health and productivity in the context of a changing climate.
Climate change is threatening plant health and productivity at all spatial scales. To date, it remains largely unknown whether plant breeding and agricultural management practices can keep pace with the rate and direction of environmental change, as well as species’ rate of adaptation to rapid environmental change. In addition, the incidence of invasive pests and pathogens is increasing as a consequence of globalization. This trend is being exacerbated by climate change. Thus, future plant health and productivity will depend on the match between genotypes (and their resulting phenotypes) and new environments. However, these analyses are challenging since they require the integration of diverse data types, usually decentralized and lacking in standardization: genotypic, phenotypic and environmental. Hence, centralized and up-to-date platforms which integrate, visualize and analyze high-throughput biological data are key, especially in the current big data era in plant biology. CartograPlant (https://cartograplant.org/) is a web-based application that integrates, visualizes, and analyzes genotypic, phenotypic, environmental data, and their associated metadata, from georeferenced plants. Environmental data is available through advanced integration of global and regional layers. The genotype and phenotype metrics are collected through direct submission of studies at the time of publication or through the biocuration efforts of the affiliated databases and applications (TreeGenes, BIEN, TreeSnap). Data analysis is enabled by accessing the metadata associated with the public studies and providing appropriate workflows through Galaxy (https://galaxyproject.org/). This metadata collection, using ontologies and standards, allows integration and analysis of data from different studies, which is key to performing both mega- and meta-analysis.
Mega-analysis and meta-analysis of GWAS (GxP association) and landscape genomics (GxE association) studies can improve the power to detect association signals by increasing sample size and by examining more variants throughout the genome than each dataset alone. Thus, they allow users to answer unprecedented and ambitious adaptive questions, taking advantage of the potential of high-throughput biological data. This talk will describe the recent updates in data sources, functionalities, and analytic workflows offered by CartograPlant.
| Recording | Slides |
- February 1st - Monica Munoz-Torres, University of Colorado School of Medicine
The Monarch Initiative: harmonizing cross-species data for disease diagnostics and discovery.
Addressing complex scientific challenges requires weaving together data from diverse sources, organisms, contexts, formats, and granularities, and building a coherent holistic view of this data landscape to address any given problem is non-trivial – much of the relevant information is scattered and not readily accessible for searching or analysis. The Monarch Initiative is a consortium and a set of resources aiming to overcome these limitations by integrating the fragmented data landscape into the most comprehensive open collection of genotype-phenotype data available. Monarch seeks to bridge the space between basic and applied and clinical research, developing tools that facilitate connecting data across a variety of scientific approaches and disciplines including genomics, proteomics, molecular modeling, diagnosis of disease and syndromes, and the organization of patient record data. The Monarch Knowledge Graph (KG) links together clinical, biomedical, and basic science research data spanning multiple species, and it supports reasoning across a wide range of organisms, body systems, and diseases. We founded the Human Phenotype Ontology (HPO), one of the most widely used biomedical ontologies and the gold standard for describing human phenotypes, and are also creators of the Mondo unified disease ontology, the Unified Phenotype Ontology (uPheno), the cross-species anatomy ontology (Uberon), the Environmental Conditions and Treatments Ontology (ECTO), and most recently, the Vertebrate Breed Ontology (VBO), a single source for data standardization and integration of all breed names. We also created the Simple Standard for Sharing Ontology Mappings (SSSOM) to harmonize the ontologies that are used by the sources, and the only ISO-approved standard for exchanging detailed, case-level phenotype data, Phenopackets. Monarch tools and resources are publicly available and are designed for informatics users as well as for clinical and basic research use cases.
By making data more interoperable, our widely-used standards for data annotation and exchange help support a wide range of data sharing and reuse by projects and organizations around the world, and reduce the effort they need to devote to data harmonization. During this presentation, we will introduce you to a few of these resources and offer you the information to find and implement the ones that best serve your scientific needs.
| Recording |
2022
- December 7th - Sushma Naithani, Oregon State University
Plant Reactome: Using OMICs data for biocuration of plant genes and pathways
The major challenge in analyzing and connecting genotype to phenotype data at the organismal level is their integration and visualization for knowledge synthesis, which is required for generating OMICs data-driven predictive models for precision breeding of crops as well as assessing the needs of conservation of biodiversity and long-term sustainability. The Plant Reactome (https://plantreactome.gramene.org) is one such platform that allows integration of data from heterogeneous sources (i.e., published literature, transcriptome, proteome and metabolome data, orthology-based projections) for synthesizing in silico models of system-level plant pathway networks including metabolic pathways, biological processes associated with plant development and reproduction, and genetic-regulatory mechanisms that mediate plant survival under varied stress conditions. It provides a valuable framework for understanding how a gene, a group of connected genes, or genotypic differences culminate into a phenotype and supports the generation of data-driven hypotheses for understanding the intra- and inter-species differences for basic and translational research and precision breeding. Here, we emphasize our recent efforts in using Omics data for improving gene/gene family functional annotations and biocuration of gene-gene networks.
| Recording |
- November 2nd - Jennifer E. Cross, Colorado State University
Science of Team Science: Using Developmental Evaluation to Advance Transdisciplinary Teams and Evolve Science
For the past 20 years or more, science has been evolving to answer more complex questions, which require more complex teams. While scientists are eager to engage with diverse colleagues, academic institutions are slow to change and present a variety of barriers to advancing science. I will explore how the field “science of team science” has been growing and how team assessment, evaluation, and coaching can help teams become more effective. Case studies of interdisciplinary teams will be shared to illustrate how developmental evaluation and assessment can help accelerate team growth, and overcome institutional and infrastructural challenges and barriers.
| Recording |
- October 5th - Chris Mungall, Lawrence Berkeley National Laboratory
The Gene Ontology: Making functional annotation of plants and animals FAIR.
The Gene Ontology is one of the most widely used databases in the biosciences, covering functional annotation of genes and gene products across a wide range of species. The GO is ubiquitously used to analyse a variety of types of high-throughput experimental data. Originally created to unify functional annotation across a handful of model organism databases, the GO has grown to encompass more species, and the structure of the GO has been extended to integrate with other ontologies such as CHEBI and the Plant Ontology. The structure of annotations has also evolved, and the GO now includes more expressive pathway-oriented annotations in the form of GO-CAMs (Causal Activity Models). In this talk I will give a practical guide to the structure of GO, how to find and request terms, how to search and create annotations, and how to use GO tools. I will also talk about how the broader AgBioData community can contribute to the GO consortium to help seed functional annotation efforts in a more diverse range of organisms, and in particular with agriculturally relevant species.
| Recording |
- September 7th - Nicholas J. Provart, University of Toronto
Raising the BAR for Hypothesis Generation in Plant Biology Using Open Big Data.
We have developed tools, available as part of the Bio-Analytic Resource at http://bar.utoronto.ca, for exploring large data sets from plants, to allow deeper insights into biological questions. My lab’s three visual analytic tools for transcriptomic data (eFP Browser, ePlant, and eFP-Seq Browser) allow for rapid access to comprehensive gene expression compendia we have curated for identifying tissues, cell-types, or perturbations in which a gene is active or alternatively spliced. Interactions, be they protein-protein or regulatory, create networks. We have developed new tools for exploring such data, either from large collections of experimentally-supported protein-protein or protein-DNA interactions or from predicted interactions, including protein-protein interactions inferred from molecular docking studies. We are currently working on integrating large-scale phenotype data from field trials monitored by drone-based sensors into ePlants we have developed for several agronomically-important species to improve understanding of links between genotype and phenotype.
| Recording |
- August 3rd - ThankGod Ebenezer, EMBL-EBI
The African BioGenome Project (AfricaBP): Genomics in the service of African biological diversity.
Food security and biodiversity conservation represent substantial issues worldwide and require local solutions, as highlighted in the UN’s Sustainable Development Goals. I will discuss the progress and process of establishing a pan-African network to address this challenge through genomic science and how this could inform and influence policy across Africa.
| Recording |
- June 1st - Camille Rustenholz, University of Strasbourg (France)
COST ACTION INTEGRAPE: Data integration to maximise the power of omics in grapevine improvement and beyond.
The European network INTEGRAPE seeks the establishment of an open, international, and representative network, ensuring that omics and phenotyping data generated in the grapevine research community are produced in a secure and standardized format, following the F.A.I.R. principles of findability, accessibility, interoperability, and reusability. Amongst the most significant deliverables of INTEGRAPE:
- the elaboration of Guideline ‘cookbooks’ and Dictionary of unified grape-sample ontologies;
- the release of the PN40024 fourth genome assembly and its annotation;
- the creation of the Gene Reference Catalogue;
- the listing of online repositories and tools for omics data exploration and visualization, which to date are not yet interoperable with one another.
To tackle this last challenge, we applied for a COST Innovative Grant with the GRAPEDIA project (Grapevine Encyclopedia of genes and omics), whose goal is to provide the community with a single open-access database, allowing data exploration and visualization of all grapevine resources, with tools for comparative analysis and customized services. In the GRAPEDIA database, we aim to centralize, interconnect, and showcase these dispersed resources, and to integrate them with the genomic efforts of the worldwide community. The target group is the entire scientific community working on the grapevine or using grapevine as their model plant for an “orphan” plant species, and also the private sector working on R&D in vitiviniculture.
| Recording |
- May 4th - Karen Yook and Daniela Raciti, microPublication Biology
Bridging the gap between data production and database curation through microPublications
To solve a long-standing problem in data loss and accessibility, we developed a publishing platform, microPublication Biology, to bridge data publishing and database curation. Our journal accepts single-experiment articles (microPublications) and embeds curation within the article submission/publishing workflow. microPublication Biology is an online, peer-reviewed, open-access journal published by the Caltech Library and discoverable in PubMed. Starting with articles focused on nematode biology, we continually expand to more organism communities, including Arabidopsis and most recently Dictyostelium, Maize, and Cotton. Our system is set up so that upon publication, atomized data is delivered directly to authoritative databases for each community (e.g., WormBase, FlyBase, PomBase, TAIR), ensuring timely delivery to biological databases for deep data integration. We will give an overview of our journal and its integrated curation workflow and present our latest publishing metrics.
| Recording |
- March 15-16-17 - 2022 AgBioData Community Workshop.
Facilitating crosstalk and network building across Working Groups
Our three-day, all-hands, online workshop will provide a forum for the working groups to pose questions to and gather feedback from the AgBioData community. Each day will have a two-hour session (7-9 AM Pacific Time), with short presentations of selected working groups at the beginning, followed by breakout sessions, where WG and non-WG members can meet and discuss relevant topics, and a brief reporting period at the end. Your participation can help move FAIR data sharing and management forward!
| Recording |
- February 2nd - Baron Koylass and Timothee Cezard, EMBL-EBI
The European Variation Archive: Genetic variation archiving and accessioning
The European Variation Archive (EVA) is a primary open repository for archiving, accessioning, and distributing genetic variation, including single nucleotide variants, short insertions and deletions (indels), and larger structural variants (SVs) in any species. Created in 2014 to provide FAIR access to genetic variation data, it has since grown to be a primary resource for genomic variants hosting >3 billion records and now maintains and provides the permanent variant locus identifiers (rs IDs) for all non-human species.
| Recording | Slides |
2021
- December 1st - Meet the new AgBioData Working Groups!
The purpose of this meeting will be to quickly introduce each of the Working Groups and their initial plans. This will be an opportunity to learn what each working group is planning to focus on, followed by a short discussion. AgBioData members who have not signed up for a working group, or who wish to join an additional group, formally or informally, will have an opportunity to contact working group chairs.
| Recording |
- November 10th - Silvie Fexova, Plant Expression Atlas
Expression Atlas and Single Cell Expression Atlas – home of cross-species gene expression data
From submission to data visualisation – Our team at EBI maintains and develops a number of resources aimed to support (FAIR)sharing, re-use, integration and visualisation of functional genomics data from a broad range of species including many agricultural species (both plants and animals). In this webinar I will briefly introduce our archival services and tools as well as our two knowledgebases, the Expression Atlas and Single Cell Expression Atlas, that host thousands of publicly available transcriptomics experiments across species and biological conditions – re-analysed and visualised in a user-friendly interface for the scientific community to use and explore.
| Recording | Slides |
- October 6th - Allyson Lister, FAIRsharing
FAIRsharing: promoting the discovery of data standards, policies and databases across all research domains
FAIRsharing is an informative and educational resource on interlinked standards, repositories and policies, three key elements of the FAIR ecosystem. FAIRsharing promotes the existence and value of these standards, repositories and policies, fostering a culture change within the research community into one where the use of these resources for FAIRer data is pervasive and seamless. This is achieved by guiding consumers to discover, select and use these resources with confidence, and helping producers to make their resources more visible, more widely adopted and cited. This presentation will highlight key collaborative, successful activities as well as next steps within FAIRsharing. It will also provide information on how to become a recommended repository in FAIRsharing and how to use FAIRsharing to engage with your stakeholders as well as with journal publishers and their data policies.
| Recording | Slides |
- September 1st - AgBioData RCN grant - Lisa Harper & Eva Huala
Help us chart the future of agricultural data!
Do you want easy access to better quality data?
We are THRILLED to announce that AgBioData (https://www.agbiodata.org/) has received a three-year NSF RCN award to expand our community committed to improving quality and access to agricultural data. New activities will include organizing workshops, establishing new working groups, and developing FAIR curriculum for scientists. We are expanding the consortium and welcome new members, especially students, post-docs, big-data scientists, funding agency scientists and members of the scientific publishing community interested in solving common FAIR data issues.
- RCN objectives,
- benefits of joining AgBioData,
- and how YOU can make a difference in the biological data environment for years to come.
- August 4th - Noah Fahlgren and Malia Gehan, both from the Donald Danforth Plant Science Center, discuss interactions between the phenomics and database communities
High-throughput phenotyping has emerged as a promising area in plant, animal, and agricultural sciences that brings together researchers from life sciences, engineering, computer science, data science, mathematics, and other research fields to develop technologies for rapidly and accurately measuring phenotypes using robotics, imaging, and other tools. High-throughput phenotyping can be done at different scales, from cellular to ecological, typically using image-based approaches for data collection and analysis. The development of computer vision and machine learning approaches to extract biologically meaningful measurements from images, including physical, physiological, morphological, and qualitative properties of crops and livestock, is a major activity within the field. Phenotype datasets can be used for a variety of purposes, but in conjunction with large genomic datasets, are a powerful tool for linking phenotype to genotype, training genomic prediction models, and other approaches that integrate genetic, phenotypic, and environmental datasets. We will introduce our efforts to develop PlantCV (https://plantcv.danforthcenter.org/), an open-source platform for image-based plant phenotyping, and discuss opportunities for collaboration between the phenomics and database communities.
| Recording | Slides |
- June 2nd - Lisa Harper
Dealing with gene models from 50 different reference genomes. A progress report from MaizeGDB.
MaizeGDB now hosts over 50 reference-quality genome assemblies and their associated gene model sets and metadata. We have started to use a "Pan-Gene" concept to group syntelogs. We define a pan-gene as the set of gene models from multiple genomes that appear to represent the same gene. After I show you how we are implementing this at MaizeGDB, let's have a discussion about how other databases are dealing with this gene model explosion.
| Recording | Slides |
- May 5th - Monica Poelchau
Recommendations from the AgBioData GFF3 working group
Over a year ago, AgBioData convened a discussion on GFF3 formatting issues, led by Scott Cain. This discussion led us to form the AgBioData GFF3 working group. Our goals are to 1) identify common problems with the GFF3 format; 2) recommend solutions for these problems; and 3) promote community adoption of these recommendations, so that data can be formatted in standard ways across databases. Members of AgBioData, Alliance of Genome Resources, and NCBI have been working on these goals for the past year. We are now ready to receive feedback from the AgBioData community on our recommendations, in order to get traction on the final goal: community adoption of these solutions.
| Recording | Slides |
- April 7th: Guest speaker Peifen Zhang
PhyloGenes (phylogenes.org) presents precomputed phylogenetic trees of plant gene families along with known functions for individual family members. By displaying experimentally validated gene functions associated with individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes also facilitates the study of function evolution. The current PhyloGenes release (version 2.2) includes 40 plant genomes covering a broad taxonomic range and including all major crops, along with 10 non-plant model organisms, represented in over 8,000 gene families. Over two-thirds of the families have at least one member with a validated known function expressed as GO terms. To increase the predictive power of PhyloGenes, future work will involve community contribution and will emphasize incorporating new functional annotations of family members across families and subfamilies, as well as adding complementary functional datasets such as gene expression and mutant phenotype.
| Recording | Slides |
- March 3rd - Jack Gardiner and Lisa Harper will lead a discussion on: Metabolomics: what is it, and how might it lend insight into our understanding of complex biological traits?
| Recording | Slides |
- February 3rd - Chuck Cook
The Global Biodata Coalition (GBC) is working with funders to encourage more efficient collaboration in funding data resources and to sustain funding for critical data resources. More info at the GBC website, including pdfs of past talks: www.globalbiodata.org.
Contact Chuck via email
| Recording | Slides |
- January 13th - Imma Subirats & Kristin Kolshus, AGROVOC Ontologies
During this webinar, the AGROVOC Team from the Food and Agriculture Organization (FAO) of the United Nations will introduce how AGROVOC is kept up to date with a number of institutions and individual domain experts serving as focal points for specific languages and topics.
| Recording | Slides |
2020
- October 7: Guest speaker: WheatIS | Recording |
- August 5th: Group discussion: "What has AgBioData done for you & your database?" | Notes |
- April 8th: Guest speaker: Medha Devare from CGIAR will be talking about the Gardian platform | Recording |
- January 13th: 10am - 12pm PST - In Person Meeting at PAGXXVIII
2019
- November 6th: AgBioData Discussion: The Future of agriculture-related data resources | Contact agbiodata@gmail.com for notes & recording
- August 7th: Kimberly Van Auken will talk about Textpresso | Recording |
- June 5th: Ethy Cannon (PeanutBase) will lead a discussion about metadata and the Metadata and Persistence Working Group and Sook Jung (Main Lab databases) will lead a discussion about ontologies and the Ontologies Working Group. | Recording |
- May 1st: Tanya Berardini (TAIR) and Lisa Harper (MaizeGDB) will lead a discussion about curation and the Curation Working Group | Recording | Slides |
- April 3rd: Meg Staton will lead a discussion about Data Sharing and the Data Sharing using Web Services Working Group | Recording | Slides |
- March 6th: Daureen Nesdill (University of Utah) and Carolyn Lawrence-Dill (Iowa State University) discuss the APLU-AAU Accelerating Public Access to Research Workshop. | Slides |
- February 6th: Jacqueline Campbell leading a discussion on AgBioData business topics
- January 14th: In-Person Meeting at PAGXXVII (2019) | Notes |
2018
- December 5th: James Wilgenbusch will be talking about the GEMs (GxExMxS) platform | Recording
- October 3rd: Michael Cherry from Alliance of Genome Resources | Recording
- August 1st: Cynthia Parr from AgData Commons | Slides
- June 6th: Alex Pico from Wiki Pathways | Recording
- May 2nd: Esther Dzale-Yeumo from Research Data Alliance (RDA) | Recording
- March 7th: Marcela Tello-Ruiz from Gramene
- February 7th: Gary Saunders from EVA at EMBL | Slides