Congratulations to Jipeng Di, Julie Vaughn, Joshua Proulx, Sadie Tayler Nordstrand, Bryce Daines, Ph.D., Katrisa Ward, Philip Lupo, Jianhong Hu, Mullai Murugan, and Adam Hansen, PhD on our paper, "An Agentic Approach to Phenotype Mapping from Rare Disease Surveys," being accepted at the 2025 Machine Learning for Health Symposium with an oral presentation!
TL;DR: GenOMA is the first LLM agent purpose-built for rare disease survey mapping, transforming survey questions into standardized ontology codes with state-of-the-art accuracy and reliability.
Abstract: Rare disease patients worldwide often experience years-long diagnostic delays, in part due to fragmented and unstructured phenotypic information. Patient-reported surveys provide valuable insights but are typically unstructured and hard to integrate with structured data. We present GenOMA, a Large Language Model (LLM) agent built on the LangGraph framework and integrated with a Unified Medical Language System (UMLS) API for precise extraction and ontology mapping of phenotypic terms.
Using a modular, node-based architecture for context-aware extraction, iterative refinement, candidate ranking, and semantic validation, GenOMA maps data to standardized Human Phenotype Ontology (HPO) codes without local ontology deployment. We evaluate GenOMA on the question fields of three rare disease surveys, mapping them to HPO terms, and compare its performance with other leading methods. On the Xia-Gibbs Syndrome (XGS) Registry, GenOMA achieved 0.92 accuracy, 0.94 precision, 0.97 recall, and 0.96 F1. On the Down Syndrome Phenotyping Acute Leukemia Study (DS-PALS) dataset, it obtained 0.92 accuracy, 0.93 precision, 0.98 recall, and 0.96 F1. Finally, on the GenomeConnect (GC) dataset, it obtained 0.91 accuracy, 0.91 precision, 1.0 recall, and 0.96 F1. In all tasks, GenOMA outperformed MetaMap, PhenoTagger, PhenoBERT, cTAKES, and GPT-5.
These results show that GenOMA effectively converts unstructured survey data to structured phenotype information. To our knowledge, this is the first ontology mapping system specifically designed for patient-reported rare disease surveys, a critical but underexplored data modality.
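To make the node-based flow in the abstract more concrete (extraction, iterative refinement, candidate ranking, validation), here is a minimal Python sketch of that control flow. It is not GenOMA's implementation: it uses no LLM, no LangGraph, and no UMLS API. The `TOY_HPO` table, the stopword list, the 0.8 threshold, and every function name are assumptions made up for illustration, and plain string similarity stands in for the LLM-based extraction and semantic validation.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Tiny illustrative subset of HPO; the real system queries a UMLS API instead.
TOY_HPO = {
    "HP:0001250": "Seizure",
    "HP:0001263": "Global developmental delay",
    "HP:0000252": "Microcephaly",
}

# Hypothetical survey boilerplate words to drop during extraction.
STOPWORDS = {"has", "have", "your", "child", "ever", "had", "a", "an", "been",
             "diagnosed", "with", "does", "the", "any", "of", "do", "you"}


@dataclass
class State:
    """Shared state passed from node to node."""
    question: str
    term: str = ""
    candidates: List[Tuple[float, str, str]] = field(default_factory=list)
    hpo_id: Optional[str] = None


def extract(state: State) -> State:
    # Extraction node: keep only non-boilerplate words (stand-in for an LLM call).
    words = re.findall(r"[a-z]+", state.question.lower())
    state.term = " ".join(w for w in words if w not in STOPWORDS)
    return state


def rank(state: State) -> State:
    # Ranking node: score every ontology label against the current term.
    scored = [(SequenceMatcher(None, state.term, label.lower()).ratio(), code, label)
              for code, label in TOY_HPO.items()]
    state.candidates = sorted(scored, reverse=True)
    return state


def validate(state: State, threshold: float = 0.8) -> State:
    # Validation node: accept the top candidate only if it is similar enough.
    score, code, _ = state.candidates[0]
    state.hpo_id = code if score >= threshold else None
    return state


def refine(state: State) -> State:
    # Refinement node: retry with a narrower term by dropping the leading word.
    state.term = " ".join(state.term.split()[1:])
    return state


def map_question(question: str, max_rounds: int = 3) -> Optional[str]:
    # Run extract once, then loop rank -> validate -> refine until a code is
    # accepted, the term is exhausted, or max_rounds is reached.
    state = extract(State(question))
    for _ in range(max_rounds):
        if not state.term:
            break
        state = validate(rank(state))
        if state.hpo_id:
            break
        state = refine(state)
    return state.hpo_id


if __name__ == "__main__":
    print(map_question("Has your child ever had a seizure?"))
    print(map_question("Has your child had a severe seizure?"))
    print(map_question("What is your favorite color?"))
```

In this toy version, the second question fails validation on the first pass ("severe seizure" is not close enough to any label) and succeeds after one refinement step, while the third question is left unmapped rather than forced onto a code.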
https://lnkd.in/gVYpGnah
https://lnkd.in/gaQZE_EQ