Sustainability Survey of AgBioData member databases
Sabarinath Subramaniam, Director of Business Development, Phoenix Bioinformatics
Josh Young, Executive Director, Phoenix Bioinformatics
One of the core aims of our NSF RCN grant (Award Abstract # 2126334) is to develop a roadmap for a sustainable integrated database ecosystem. Most genomic, genetic, and breeding (GGB) databases rely on short-term funding for a majority of their operating costs and are vulnerable to loss of personnel and knowledge if funding lapses, even while demand for their services by researchers continues to increase. Researchers rely upon databases for data discovery, analysis, and management. To ensure that researchers continue to have access to reliable, high-quality, curated, and FAIR data in the future, we need to plan and develop infrastructure, strategies, and tools to ensure the long-term sustainability of GGB data and GGB databases.
To achieve this goal, we self-assessed the long-term financial stability of the AgBioData member databases through written surveys and interviews. We gathered data from 36 AgBioData member databases about the cost of operations, staff level, sources of funding, usage level, data types, species and strains, stakeholders served, and anticipated future needs. The 36 member databases surveyed did not include TAIR and CyVerse, because they have implemented a sustainability plan based on subscriptions.
The data reflect only the 19 databases that responded, not all AgBioData member databases.
This section asked for general information about each of the databases.
Database funding, expenses, and stakeholders
This section asked for general information about funding sources, expenses, and stakeholders.
Sustainability strategies and placing
Usage data capture mechanism and User surveys
A mechanism to capture usage statistics is very useful for identifying sustainability options. Our survey asked whether the AgBioData resource had a mechanism for usage capture, such as Google Analytics. Of the 19 databases that returned the survey, 17 answered this question: 16 indicated that they already have a mechanism for capturing usage data, while 1 said it does not.
Understanding the value that the data and tools within a resource represent to its users is very important for judging the users' willingness to support a sustainability model for the resource. Educating users about the value of the resource and drawing their attention to the potential consequences if the resource were to disappear are key aspects of a user survey. We asked AgBioData databases whether they had conducted user surveys in the past 2 years. Of the 19 databases that returned the survey, 17 answered this question: 11 indicated that they had conducted user surveys, while 6 said they had not.
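The counts reported for these two questions can be turned into response rates and shares with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the survey analysis itself: the function name `summarize` and the dictionary labels are illustrative assumptions, and the only inputs are the counts stated in the text above.

```python
RETURNED = 19  # databases that returned the survey (from the text above)

# question label -> (databases that answered, databases that answered "yes")
# Labels are illustrative; counts are the ones reported in the text above.
questions = {
    "usage capture mechanism": (17, 16),
    "user survey in past 2 years": (17, 11),
}


def summarize(answered, yes, returned=RETURNED):
    """Return the response rate, the share of 'yes' answers, and the 'no' count."""
    return {
        "response_rate": answered / returned,  # share of returned surveys with an answer
        "yes_share": yes / answered,           # share of answers that were "yes"
        "no_count": answered - yes,            # answers that were "no"
    }


for name, (answered, yes) in questions.items():
    s = summarize(answered, yes)
    print(
        f"{name}: {s['response_rate']:.0%} answered, "
        f"{s['yes_share']:.0%} yes, {s['no_count']} no"
    )
```

Expressing the shares relative to the 17 databases that answered, rather than the 19 that returned the survey, keeps non-responses from being counted as "no".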
Acceptable sustainability model(s) for each AgBioData database
We surveyed AgBioData databases about the types of sustainable funding models the resource users would be willing to support. Our survey asked respondents to select all models that would be acceptable to the users of the resource from the following:
Shared infrastructure (Chado DB, Tripal, BrAPI)
Database federation (e.g., Alliance for Genome Resources model)
Of the 19 AgBioData databases that returned the survey, 14 answered this question. Seven indicated shared infrastructure, database federation, and/or subscriptions as models acceptable to their users. Four indicated shared infrastructure, database federation, and voluntary contributions. Two selected shared infrastructure as the only option, and one chose voluntary contributions as the only option.
Steps taken by resource to reduce annual expenses
We surveyed AgBioData databases to understand any steps the resource took to reduce annual expenses (Table 1).
Sharing data with other databases
Data sharing is one way to reduce costs. We asked AgBioData databases whether they share data with other AgBioData databases and, if so, to list the names of those databases. Fourteen of the 19 respondents answered this question, each listing at least one database they share data with. We grouped these 14 answers into two categories: databases that contain data pertaining to a specific organism (e.g., TAIR) and databases that are part of a larger data repository (e.g., Ensembl).
Summary and recommendations
Based on the results of our survey, we make the following recommendations to better understand the value of AgBioData databases among their users and to identify at least one viable sustainability strategy for each resource or for the consortium as a whole.
Stakeholder surveys and interviews
Our survey identified multiple stakeholders for the respondent AgBioData databases. Surveying or interviewing these stakeholders will give a clearer picture of stakeholder buy-in for any sustainability strategy and will help us understand the value of each resource among its stakeholders.
Usage statistics and User surveys
Sixteen databases have a mechanism to capture usage statistics. Identifying a viable sustainability strategy requires understanding user behavior, and studying the usage statistics from these 16 databases would be a good starting point.
User surveys are a valuable tool in determining a viable sustainability strategy. Eleven databases indicated they had conducted user surveys, and these surveys would be a good start to understanding user behavior. Fourteen out of 19 respondents identified at least one of our suggested sustainability models as appropriate for their user community. A user survey would help gauge which of these strategies is acceptable to the users of these databases.
For databases that are sharing their data through public databases like Ensembl, it would be good to understand how their users access the data (e.g., How many access the data from the primary resource versus the public repository?).
Five databases said they get their data from public sources. Further discussion with these databases will help us understand what fraction of their data comes from public sources and is collected, curated, and integrated. Eleven respondents indicated their data is archived or duplicated in other databases.