2022 Survey of Genomic, Genetic, and Breeding (GGB) Database Stakeholders
In partnership with Michael Coe (Washington State University)
In Year 1 of our NSF RCN grant (Award Abstract # 2126334), the AgBioData Consortium, in partnership with Washington State University, surveyed database stakeholders on standardized data curation principles and their implementation in data repositories for agricultural research and breeding programs. We will run a similar survey again at the end of the RCN funding period to assess whether changes in the perception or understanding of FAIR data in our community have occurred.
Summary of Baseline Survey Data
In total, we received 80 usable survey responses, which are summarized below. The full survey report is available here.
1. Survey Sample, Participant Characteristics, and Familiarity and Experience with GGB Databases
The majority of the respondents (85%) reported working at a university that offers related Ph.D. degrees, at a Land Grant University State Agricultural Experiment Station, or at the USDA (Table 1). As displayed in Table 2, most reported their primary professional role as research scientist (60%), post-doctoral intern (9%), or bioinformatics professional (8%).
Most respondents (85%) reported a professional focus on plants, including major plant crops (28%), horticultural specialty crops (19%), and plants grown for purposes other than human consumption (9%; Table 3). Animals were a primary focus for 9% of respondents, including major livestock animals (5% of respondents) and minor livestock animals (6%). Pests, diseases, physiological stressors, and other threats were a focus for 18% of respondents; 15% reported working on understanding wild organisms not directly used in agriculture.
Table 1. Organizational Affiliations of Survey Respondents
Table 2. Primary Professional Role of Survey Respondents
Table 3. Professional Focus of Survey Respondents
The survey asked participants to rate their interest in or familiarity with a list of 34 specific AgBioData GGB databases. Respondents were most familiar with TAIR, Gramene, GRIN, MaizeGDB, and Solanaceae Genomics Network; for each of these, more than a quarter of survey participants reported being “somewhat familiar,” “very familiar,” or having “used this a lot and could teach others to use it.” Other databases rated at similar levels of familiarity by 15% or more of respondents included Soybase, Planteome, AgBase, Genome Database for Rosaceae, and GrainGenes.
As displayed in Table 4, over half of the participants had not tried to submit data to a GGB database, while 90% or more had attempted to search for or retrieve data from a GGB database, and 76% had attempted to reuse data that had been retrieved from a GGB database. Approximately 20% reported that they had attempted to perform these functions and found it to be impossible, very difficult, or not easy to do in a satisfactory way.
Table 4. Participant Experience with Specific GGB Database Functions
2. Baseline Ratings and Recommendations on FAIR Data and GGB Databases
Survey participants were asked to rate their level of agreement with a series of statements about their knowledge and experience with FAIR data and GGB databases. They were also asked to rate their priorities for the future development of these databases and to provide related comments and written recommendations (Table 5). There were 17 questions in total, and the responses are summarized below:
- Baseline familiarity with FAIR data and value placed on these principles (questions C1-C3):
On average, more than 74% of the respondents were slightly, moderately, or strongly familiar with the concept of FAIR data and management, could explain or supervise the implementation of the FAIR principles, and would choose a data resource that follows and implements them.
- Baseline implementation quality or value and usefulness of FAIR principles and related tools and procedures in GGB databases (questions C4-C11):
There is little outright disagreement that these databases are currently doing a good job with these issues, and the majority (70.25%, on average) of the survey participants slightly, moderately, or strongly agreed that the GGB databases related to their work:
- highlight the importance of the FAIR principles and provide educational material on them
- provide detailed guidelines on the metadata and data standards to implement when submitting data
- offer access to data in common formats that are easy to work with
- curate and catalog data using standard terms, making it easy to search for data
- Importance of GGB databases as resources of data and analysis tools for stakeholders (e.g., researchers and breeders; questions C12-C16):
About 89.5% of the respondents, on average, slightly, moderately, or strongly agreed that the process of contributing to and retrieving data from GGB databases helps researchers and breeders improve their understanding and ability to work with data according to the FAIR principles. Also, 92% of respondents agreed that GGB databases provide users with essential tools and data that facilitate their work in a cost-effective manner. About 84% of the survey participants agreed that the GGB databases related to their work provide useful resources for learning how to use them (e.g., tutorials and FAQs), while 78% agreed that agricultural graduate students would benefit from a formal introduction to using GGB databases during their courses.
Survey participants were asked to rate the importance of six potential priorities for improved data curation; their responses are summarized in Table 5. All six were rated as being “very important” or “highest priority” by more than 60% of respondents. The highest ratings were given to “timely and up-to-date availability of curated data,” “visualization of integrated data,” and “training materials for FAIR data (for data submission).” The majority of the respondents' comments on these priorities highlighted the importance of better quality and consistency of data curation and more educational opportunities for students and other database users.
Table 5. Baseline Priorities for Further Development of FAIR Data Practices in GGB Databases
3. Recommendations for Topics and Formats of Training Opportunities for Users of GGB Databases
Survey participants were asked an open-ended question: “What sort of training opportunities/formats or content/topics for users of these databases would be most helpful?” Most of the comments fall into the following categories:
- Synchronous (live) educational events online (e.g., webinars) or asynchronous online video presentations, demonstrations, or tutorials (20 comments). After being recorded, events like webinars can be cut into segments and made into brief asynchronous, "static" video recordings. It is often helpful to design and organize webinars or similar online presentations with this segmentation and re-use in mind.
- Static online educational materials, such as tutorials or manuals (11 comments).
- Online courses, synchronous and/or asynchronous or static (6 comments). Curriculum materials developed for an online or in-person course can then be repurposed and provided to instructors for use in their own courses. Curriculum and assistance for course instructors were specifically mentioned in some comments.
- In-person workshops, either standalone or in conjunction with conferences and meetings (5 comments).
- Synchronous "office hours" or asynchronous "discussion forums" in which people can ask questions and get timely advice and assistance (3 comments).
- Regular updates of any static online tutorials, manuals, or similar materials so that they reflect the current status of the databases they reference (2 comments).