Regularized Machine Learning Models for Prediction of Metabolic Syndrome Using GCKR, APOA5, and BUD13 Gene Variants: Tehran Cardiometabolic Genetic Study

Document Type: Original Article


1 Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran

2 Cellular and Molecular Endocrine Research Centre, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

3 Department of Statistics, Faculty of Statistics, Mathematics and Computer, Allameh Tabataba’i University, Tehran, Iran


Objective: Metabolic syndrome (MetS) is a complex multifactorial disorder that considerably burdens healthcare
systems. We aim to classify MetS using regularized machine learning models in the presence of the risk variants of
GCKR, BUD13 and APOA5, and environmental risk factors.
Materials and Methods: A cohort study was conducted on 2,346 cases and 2,203 controls from eligible Tehran
Cardiometabolic Genetic Study (TCGS) participants whose data were collected from 1999 to 2017. We used five
regularization approaches [least absolute shrinkage and selection operator (LASSO), ridge regression (RR), elastic net
(ENET), adaptive LASSO (aLASSO), and adaptive ENET (aENET)] and a classical logistic regression (LR) model
to classify MetS and select the influential variables that predict it. Demographics, clinical features, and common
polymorphisms in the GCKR, BUD13, and APOA5 genes were assessed to classify the MetS status of eligible TCGS
participants. Model performance was evaluated by 10-fold cross-validation repeated 10 times. Sensitivity,
specificity, classification accuracy, the area under the receiver operating characteristic curve (AUC-ROC), and
the area under the precision-recall curve (AUC-PR) were used to compare the models.
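The evaluation scheme described above (repeated k-fold cross-validation scored by AUC-ROC) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `repeated_kfold` and `auc_roc`, the random seed, and the toy labels and risk scores are hypothetical and introduced here for the example.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        for k in range(n_splits):
            test = idx[k::n_splits]  # every n_splits-th shuffled index forms fold k
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random case outranks a random control.

    Ties between a case and a control count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up numbers (not study data).
folds = list(repeated_kfold(n_samples=50))  # 10 repeats x 10 folds
y_toy = [0, 0, 1, 1]
risk_toy = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_roc(y_toy, risk_toy)
```

In a full analysis, each model would be fitted on the training indices of a fold and scored on its test indices, and the resulting metrics would be averaged over all 100 train/test splits.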
Results: During the follow-up period, 50.38% of participants developed MetS. The case and control groups differed in
baseline characteristics and risk variants. According to LR, MetS was significantly associated with age, gender,
schooling years, body mass index (BMI), and alternate alleles in all the risk variants. A comparison of accuracy,
AUC-ROC, and AUC-PR indicated that the regularized models outperformed LR. The regularized machine learning
models provided comparable classification performance, whereas the aLASSO model was more parsimonious,
selecting fewer predictors.
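For orientation, the penalties behind the compared models can be written in one common form. The notation below ($\lambda$, $\alpha$, $\gamma$, $w_j$, and the initial estimate $\tilde{\beta}_j$) and the glmnet-style scaling of the elastic net are assumptions for illustration; the abstract does not state the exact parametrization used.

```latex
\begin{align*}
\hat{\beta} &= \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\,\eta_i - \log\!\big(1 + e^{\eta_i}\big)\Big]
  + \lambda \sum_{j=1}^{p} P(\beta_j),
  \qquad \eta_i = \beta_0 + x_i^{\top}\beta, \\[4pt]
\text{LASSO:}\quad  & P(\beta_j) = |\beta_j| \\
\text{RR:}\quad     & P(\beta_j) = \tfrac{1}{2}\,\beta_j^{2} \\
\text{ENET:}\quad   & P(\beta_j) = \alpha\,|\beta_j| + \tfrac{1-\alpha}{2}\,\beta_j^{2},
                      \qquad 0 \le \alpha \le 1 \\
\text{aLASSO:}\quad & P(\beta_j) = w_j\,|\beta_j|,
                      \qquad w_j = 1 / |\tilde{\beta}_j|^{\gamma},\; \gamma > 0 \\
\text{aENET:}\quad  & P(\beta_j) = \alpha\,w_j\,|\beta_j| + \tfrac{1-\alpha}{2}\,\beta_j^{2}
\end{align*}
```

Because the adaptive weights $w_j$ grow as the initial estimate $\tilde{\beta}_j$ shrinks, weak predictors are penalized more heavily and are more likely to be set exactly to zero, which is consistent with aLASSO selecting fewer predictors.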
Conclusion: Regularized machine learning models provided more accurate and parsimonious MetS classification
models. These high-performing diagnostic models can lay the foundation for clinical decision support tools that use
genetic and demographic variables to identify individuals at high risk for MetS.

