The emergence of microbiome research as a burgeoning field has yielded an ever-increasing supply of data sets to study. While these massive amounts of data present opportunities for a variety of novel studies, their intrinsic properties, being high-dimensional, underdetermined (far more features than samples), sparse, and compositional, also pose challenges.
This explosion of data, in both volume and complexity, has revealed a reproducibility crisis. While this crisis is hardly unique to microbial studies, the field’s relative adolescence carries with it a lack of standards and protocols to guide the collection and processing of data. Methodological research has focused on identifying optimal feature selection methods, but candidate methods are typically evaluated on model prediction performance, such as Area Under the Curve (AUC), Mean Squared Error (MSE), or R-squared, rather than on reproducibility.
In an effort to facilitate more reproducible research and data analysis, a group of researchers with the Center for Microbiome Innovation (CMI) at the University of California San Diego (UC San Diego), led by Lingjing Jiang and Loki Natarajan, collaborated with a team at IBM Research through the Artificial Intelligence for Healthy Living Center to evaluate the utility of Stability as a feature selection criterion. Their work was published online by Biometrics, in an article entitled “Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data.”
The study’s approach centered on two model prediction metrics, AUC and MSE, together with the researchers’ proposed Stability criterion. The team evaluated four popular feature selection methods, namely lasso, elastic net, random forests, and compositional lasso, through both simulations and experimental microbiome applications, focusing on feature selection in the context of continuous or binary outcomes.
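The article does not reproduce the Stability formula itself, but the underlying idea is to refit a feature selection method on many random subsamples of the data and measure how consistently the same features are chosen. The following Python sketch illustrates that idea under stated assumptions: it uses one common variance-based stability index (in the spirit of Nogueira et al., 2018) and scikit-learn’s lasso, and the function names, subsampling fraction, and penalty value are illustrative choices rather than the study’s exact protocol.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Lasso

def stability_index(Z):
    """Variance-based stability of a binary selection matrix Z
    (rows = subsamples, columns = features): 1 means identical feature
    sets across subsamples, values near 0 mean essentially random selection."""
    M, p = Z.shape
    k_bar = Z.sum(axis=1).mean()             # average number of features selected
    denom = (k_bar / p) * (1 - k_bar / p)    # expected variance under random selection
    return 1.0 - Z.var(axis=0, ddof=1).mean() / denom

def selection_stability(model, X, y, n_subsamples=50, frac=0.8, seed=0):
    """Refit `model` on random subsamples of the data, record which features
    receive nonzero coefficients, and score the agreement across subsamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = np.zeros((n_subsamples, p), dtype=int)
    for m in range(n_subsamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        fitted = clone(model).fit(X[idx], y[idx])
        Z[m] = (fitted.coef_ != 0).astype(int)
    return stability_index(Z)

# Example: stability of the lasso's feature selection on simulated data
# with many more features (p = 300) than samples (n = 100).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 300))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=100)
print(selection_stability(Lasso(alpha=0.1), X, y))
```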
“We were surprised by the poor correlation between model performance metric – especially MSE – and the underlying truth (false positive/negative rates) in the simulations, and concerned by the common practice of using model performance metric alone to evaluate feature selection methods,” Natarajan noted when discussing the study. “We were also surprised to see that no feature selection method performed the best in all scenarios, even the latest methods developed specifically for microbiome feature selection. Hence, identifying the proper quantity to evaluate methods is probably more important than using a single approach for all analyses.”
Based on the study’s findings, the team hopes their work will lead researchers away from using a single model performance metric in evaluating feature selection methods. “Incorporating a reproducibility criterion such as Stability into commonly used model performance metrics can be a simple yet powerful practice to improve reproducibility in high dimensional data analysis,” according to Jiang.
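As a hypothetical illustration of that recommendation, a stability score such as the subsampling sketch above can be reported alongside a held-out prediction metric when comparing methods, rather than ranking methods on prediction error alone. The models, penalty values, and toy data below are placeholders, not the study’s benchmark; `X`, `y`, and `selection_stability` refer to the earlier sketch.

```python
from sklearn.linear_model import Lasso, ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy comparison: report prediction error and selection stability side by side.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
for name, model in [("lasso", Lasso(alpha=0.1)),
                    ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    mse = mean_squared_error(y_te, model.fit(X_tr, y_tr).predict(X_te))
    stab = selection_stability(model, X_tr, y_tr)
    print(f"{name}: test MSE = {mse:.2f}, stability = {stab:.2f}")
```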
Additional co-authors include Ho-Cheol Kim at IBM Research-Almaden, Niina Haiminen and Laxmi Parida at the IBM Thomas J. Watson Research Center, Anna-Paola Carrieri at IBM Research Europe, and Shi Huang, Yoshiki Vazquez-Baeza, Austin D. Swafford, and Rob Knight, all at UC San Diego.
The Center for Microbiome Innovation is proud to include Yoshiki Vázquez-Baeza, Austin D. Swafford, and Rob Knight on its leadership team.
Figure 1. The relationship between MSE or AUC and the false positive rate, compared with the relationship between Stability and the false positive rate, under three correlation structures for continuous or binary outcomes. The first two columns show results for the continuous outcome; the last two columns show results for the binary outcome. Colored dots represent values from different feature selection methods: compositional lasso (red), elastic net (green), lasso (blue), and random forests (purple). Dot size indicates the feature-to-sample-size ratio p/n.
This piece was written by CMI’s contributing editor Cassidy Symons.