Document Type: Original Article
Authors
1 Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
2 Cellular and Molecular Endocrine Research Centre, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
3 Department of Statistics, Faculty of Statistics, Mathematics and Computer, Allameh Tabataba’i University, Tehran, Iran
Abstract
Objective: Metabolic syndrome (MetS) is a complex multifactorial disorder that considerably burdens healthcare
systems. We aim to classify MetS using regularized machine learning models in the presence of the risk variants of
GCKR, BUD13 and APOA5, and environmental risk factors.
Materials and Methods: A cohort study was conducted on 2,346 cases and 2,203 controls from eligible Tehran
Cardiometabolic Genetic Study (TCGS) participants whose data were collected from 1999 to 2017. We used different
regularization approaches [least absolute shrinkage and selection operator (LASSO), ridge regression (RR), elastic net
(ENET), adaptive LASSO (aLASSO), and adaptive ENET (aENET)] and a classical logistic regression (LR) model
to classify MetS and select influential variables that predict MetS. Demographics, clinical features, and common
polymorphisms in the GCKR, BUD13, and APOA5 genes of eligible participants were assessed to classify TCGS
participant status in MetS development. The models’ performance was evaluated by 10-fold cross-validation repeated
10 times. Sensitivity, specificity, classification accuracy, and the areas under the receiver operating
characteristic (AUC-ROC) and precision-recall (AUC-PR) curves were used to compare the models.
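As an illustration of the assessment measures named above, the following minimal sketch computes sensitivity, specificity, classification accuracy, and AUC-ROC from predicted risk scores. It is not the study's code: the function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made only for this example, and AUC-ROC is computed here as the proportion of case-control pairs in which the case receives the higher score (ties count one half).

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels coded 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity_specificity_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the TCGS cohort).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
threshold = 0.5  # assumed cut-off for turning scores into class labels
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec, acc = sensitivity_specificity_accuracy(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
print(f"AUC-ROC={auc_roc(y_true, scores):.4f}")
```

In a repeated cross-validation design such as the one described, measures like these would typically be computed on each held-out fold and then averaged over all folds and repetitions.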
Results: During the follow-up period, 50.38% of participants developed MetS. The groups were not similar in terms of
baseline characteristics and risk variants. MetS was significantly associated with age, gender, schooling years, body
mass index (BMI), and alternate alleles in all the risk variants, as indicated by LR. A comparison of accuracy, AUC-ROC,
and AUC-PR metrics indicated that the regularization models outperformed LR. Regularized machine learning
models provided comparable classification performances, whereas the aLASSO model was more parsimonious and
selected fewer predictors.
Conclusion: Regularized machine learning models provided more accurate and parsimonious MetS classification
models. These high-performing diagnostic models can lay the foundation for clinical decision support tools that use
genetic and demographic variables to identify individuals at high risk for MetS.