September 6th Webinar

Monday, August 21, 2023
Staying grounded: assembling structured biological knowledge with help from large language models

 

Developing comprehensive knowledge bases and ontologies demands meticulous curation. The emergence of highly flexible, artificial intelligence-driven approaches to natural language processing offers novel ways to expedite this process. Current methods, however, often rely on extensive training data and struggle with complex, nested knowledge structures. In this talk, I will describe a new approach, Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES). This method for information extraction leverages the capability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) with a variety of natural-language prompts. SPIRES operates with predefined data schemas, so the information it extracts adheres to these structures. It also grounds concepts in well-established ontologies and vocabularies, which helps avoid the "hallucinations" common to text generated by LLMs.
Within OntoGPT, an LLM-querying framework we have developed, SPIRES supports rapid application to summarization and modeling across plant science and biology. Notably, this approach allows customization for new tasks and topics without the need for new training data. We have found that OntoGPT and SPIRES are capable of extracting structured knowledge from large literature collections and constructing knowledge graphs from the resulting relationships. By harnessing the language comprehension capabilities of LLMs, SPIRES streamlines knowledge acquisition across agricultural science and beyond.
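
To make the idea of schema-guided extraction with grounding more concrete, here is a minimal sketch in Python. It is not the OntoGPT or SPIRES implementation: the schema, the vocabulary, the identifiers (the `EX:` prefix), and every function name are hypothetical and chosen only for illustration. The sketch builds a prompt from a predefined schema, parses a model response back into that schema, and keeps an identifier only for terms that match a known vocabulary. The LLM call itself and the recursive handling of nested structures are left out.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One slot of a hypothetical extraction schema."""
    name: str
    description: str
    multivalued: bool = False


# Hypothetical schema and toy vocabulary (placeholder identifiers, not real ontology IDs).
SCHEMA = [
    Field("gene", "the gene discussed in the text"),
    Field("phenotypes", "semicolon-separated list of phenotypes", multivalued=True),
]
VOCAB = {"dwarfism": "EX:0001", "late flowering": "EX:0002"}


def build_prompt(text, schema):
    """Turn the schema into a fill-in-the-blanks prompt for a language model."""
    lines = ["From the text below, extract the following fields as 'name: value'.", ""]
    for field in schema:
        lines.append(f"{field.name}: <{field.description}>")
    lines += ["", "Text:", text]
    return "\n".join(lines)


def parse_response(response, schema):
    """Keep only lines that match a schema field; split multivalued fields on ';'."""
    fields = {field.name: field for field in schema}
    result = {}
    for line in response.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or name not in fields:
            continue  # ignore anything the schema does not define
        value = value.strip()
        if fields[name].multivalued:
            result[name] = [v.strip() for v in value.split(";") if v.strip()]
        else:
            result[name] = value
    return result


def ground(term, vocab):
    """Return the vocabulary identifier for a term, or None if it is unknown."""
    return vocab.get(term.lower())
```

In use, the string returned by `build_prompt` would be sent to a model of your choice, and the reply passed to `parse_response`. Terms for which `ground` returns `None` have no match in the vocabulary and can be flagged for review rather than accepted as-is.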