A recent study shows how a transformer-based model can predict human age from microbial communities with up to 28% lower prediction error than the previous best-performing methods.
The human microbiome—the vast community of microorganisms living in and on our bodies—undergoes profound changes as we age, influencing metabolism, immune function, and disease risk. These microbial shifts follow somewhat predictable patterns, creating a biological fingerprint that reflects not just how many years we’ve lived, but how well our bodies’ systems and cells are functioning. Understanding these patterns has become increasingly important as researchers seek to develop microbiome-based interventions for healthy aging and to distinguish between chronological age (years lived) and biological age (functional health status).
Accurately predicting age from microbiome data has proven difficult, requiring sophisticated analyses to decode the relationships between hundreds of microbial species and their collective impact on human health. A study published in Communications Biology, “Chronological Age Estimation from Human Microbiomes with Transformer-based Robust Principal Component Analysis,” presents a novel artificial intelligence (AI) approach that improves the accuracy of age prediction from human microbiome samples. The research on this model, called Transformer-based Robust Principal Component Analysis (TRPCA), was led by Tyler Myers, a bioengineering PhD graduate and researcher at the Center for Microbiome Innovation (CMI), in collaboration with fellow researchers at UC San Diego and international experts in artificial intelligence, microbiome science, and bioinformatics.
The methodology represents a significant departure from conventional microbiome analysis approaches. Unlike previous models that treat microbial features as independent variables, TRPCA uses a transformer architecture, the same technology that powers advanced language models like ChatGPT, to capture the contextual relationships between different microbial species. The team tested their approach on an extensive dataset comprising 16S rRNA gene sequencing data from 8,959 samples across 10 studies and metagenomic sequencing data from 9,356 samples across 56 studies, examining microbiomes from three body sites: skin, oral cavity, and gut. The study aimed to assess how accurately age could be predicted from individuals’ microbiome samples, and to investigate whether recent AI advances could enhance the analysis of microbiome data.
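To make the idea of contextual relationships between microbial features more concrete, the following is a minimal sketch of single-head scaled dot-product self-attention, followed by mean pooling and a linear regression head. It is not the study’s implementation: the feature vectors, the identity query/key/value projections, the untrained `head_weights` and `bias`, and the function names are all hypothetical choices made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the input vectors themselves
    (identity projections); a real transformer learns these projections.
    Returns the contextualized vectors and the attention weight matrix.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this feature to every feature in the sample.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted mix of all feature vectors.
        outputs.append(
            [sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)]
        )
    return outputs, weights


def predict_age(tokens, head_weights, bias):
    """Mean-pool the attended vectors and apply a linear regression head."""
    contextual, _ = self_attention(tokens)
    d = len(tokens[0])
    pooled = [sum(vec[i] for vec in contextual) / len(contextual) for i in range(d)]
    return sum(p * w for p, w in zip(pooled, head_weights)) + bias


# Hypothetical 2-dimensional vectors for three microbial features in one sample.
taxa_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, attn = self_attention(taxa_vectors)

# Hypothetical, untrained regression head parameters.
age = predict_age(taxa_vectors, head_weights=[5.0, 3.0], bias=30.0)
print(f"toy age estimate: {age:.2f}")
```

Because each output vector mixes information from every feature in the sample, the prediction depends on which features occur together rather than on each feature in isolation, which is the contrast with independent-feature models drawn above.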
The results show improvements in age prediction accuracy over the previous best-performing models across multiple body sites and sequencing technologies. TRPCA achieved its most notable performance gains for skin microbiome samples, reducing prediction errors by 28% for metagenomic data and 14% for 16S rRNA sequencing data. The model showed smaller but consistent improvements for gut and oral microbiomes.
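As a rough illustration of what a percentage reduction in prediction error means, the sketch below compares two sets of age predictions using mean absolute error (MAE). The choice of MAE as the metric is an assumption, and the ages and predictions are invented so that the arithmetic works out to a 28% reduction; none of these numbers come from the study.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def relative_error_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical ages and predictions, for illustration only.
true_ages = [30.0, 45.0, 60.0]
baseline_predictions = [40.0, 35.0, 70.0]  # each off by 10 years
new_predictions = [37.2, 37.8, 67.2]       # each off by about 7.2 years

baseline_mae = mean_absolute_error(true_ages, baseline_predictions)
new_mae = mean_absolute_error(true_ages, new_predictions)
reduction = relative_error_reduction(baseline_mae, new_mae)

print(f"baseline MAE: {baseline_mae:.1f} years")
print(f"new MAE: {new_mae:.1f} years")
print(f"relative reduction: {reduction:.0f}%")
```

Under these made-up inputs, an average error that drops from about 10 years to about 7.2 years corresponds to a 28% reduction. The reduction is relative to the baseline model’s error, so its practical meaning depends on how large that baseline error was.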
“We’re exploring AI models grounded in biological processes that could transform how we understand aging at the microbial level,” noted Andrew Bartko, corresponding author and executive director at CMI. “Our approach improves our ability to not only predict age accurately but also the microbes associated with the observation, providing researchers and clinicians better tools to monitor aging and develop more targeted interventions.”
The implications of this research extend beyond computational improvements, offering profound insights into the biological mechanisms of aging. The research team identified specific microbial signatures associated with healthy aging across different body sites, as they observed trends in the abundance, or lack, of specific bacteria as individuals age. Their findings provide targets for interventions aimed at promoting healthy aging through microbiome modulation, potentially informing the development of personalized probiotics, dietary recommendations, and therapeutic strategies. Myers adds, “The AI methods in combination with the age prediction models now provide a baseline for us to assess aging from different human associated microbiomes—potentially catching accelerated or unhealthy aging early and, ideally, providing the opportunity to change the trajectory.”
Additional co-authors include Se Jin Song, Yang Chen, Lora Khatib, Daniel McDonald, Richard Gallo and Rob Knight at UC San Diego; Britta De Pessemier and Chris Callewaert at Ghent University; Shi Huang at The University of Hong Kong; Aki S. Havulinna at Finnish Institute for Health and Welfare; Leo Lahti at University of Turku, Finland; Guus Roeselers, Manolo Laiola and Sudarshan A. Shetty at Danone Research and Innovation; and Scott T. Kelley at San Diego State University.
CMI is proud to include Se Jin Song, Andrew Bartko and Rob Knight on its leadership team.
Acknowledgements: This work was funded by Danone Nutricia Research and the Center for Microbiome Innovation and supported by The Microsetta Initiative. B.D.P. was supported by the Research Foundation Flanders (grant numbers 1S04122N and V477223N).
About the UC San Diego Center for Microbiome Innovation (CMI): CMI inspires and sustains vibrant collaborations between industry leaders and interdisciplinary teams of UC San Diego researchers to fuel discovery and innovation in the world of microbiome science. The Center encompasses a diverse range of expertise in microbiome sampling, -omics technologies and data analysis, using high-performance computing environments, statistical frameworks and AI methodologies. If you are interested in discussing potential partnership opportunities with the CMI team, contact cmiinfo@ucsd.edu for more information.
Figure 1: Preprocessing and model overview of Transformer-based RPCA. Samples represented as count tables are visualized and converted to RPCA vectors. RPCA vectors are input as sequences into a transformer encoder model with multi-head attention. The transformer model outputs are provided to a classification (CLS) or regression (REG) head for classification, regression, or multi-task learning.




