Chronic Obstructive Pulmonary Disease: Novel Genes Detection with Penalized Logistic Regression

Document Type: Original Article


1 Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran

2 Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden

3 HPGC Research Group, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran

4 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran


Objective: This study aimed to introduce novel techniques for identifying the genes associated with developing
chronic obstructive pulmonary disease (COPD) and to prioritize COPD candidate genes using regression methods.
Materials and Methods: This is a secondary analysis of data from an experimental study. We used penalized
logistic regression with three types of penalty: the least absolute shrinkage and selection operator
(LASSO), the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD). The models were
trained on genome-wide expression profiles to define gene networks relevant to the COPD stages. A 10-fold
cross-validation scheme was used to evaluate the performance of the methods. In addition, we validated our
results using an external validation approach. We reported the sensitivity, specificity, and area under the curve (AUC) of
the models.
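
The three penalties named above can be compared with a minimal sketch. It uses the standard textbook forms of the LASSO, SCAD, and MCP penalties for a single coefficient. The function names and the tuning values a = 3.7 and gamma = 3.0 are illustrative assumptions (conventional defaults), not settings reported in this study.

```python
def lasso_penalty(beta, lam):
    """LASSO: grows linearly with |beta|, so large effects keep being shrunk."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD (requires a > 2): linear near zero, quadratic blend, then constant."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP (requires gamma > 1): the penalty rate tapers to zero at gamma * lam."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2 * gamma)
    return 0.5 * gamma * lam * lam


if __name__ == "__main__":
    # Hypothetical coefficient values, chosen only to show the shapes.
    for beta in (0.5, 2.0, 5.0):
        print(beta,
              lasso_penalty(beta, 1.0),
              scad_penalty(beta, 1.0),
              mcp_penalty(beta, 1.0))
```

For large coefficients the SCAD and MCP penalties level off while the LASSO penalty keeps growing, which is why the two nonconvex penalties shrink strong effects less than LASSO does.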
Results: There were 21, 22, and 18 significantly associated genes for the LASSO, SCAD, and MCP models, respectively.
The most statistically conservative method (the one detecting the fewest significant features) was MCP, which detected 18 genes,
all of which were also detected by the other two approaches. The best-performing approach was SCAD-penalized logistic regression
(AUC=96.26, sensitivity=94.2, specificity=86.96). All three models shared a common panel of 18 genes
that showed significant positive or negative associations with COPD, of which RNF130, STX6, PLCB1,
CACNA1G, LARP4B, LOC100507634, SLC38A2, and STIM2 had an odds ratio (OR) greater than 1. Overall, the
differences between the penalized methods were slight.
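
As an illustration of the reported measures, the sketch below computes sensitivity, specificity, and AUC from class labels and predicted scores, and converts a logistic regression coefficient to an odds ratio. The labels, scores, 0.5 threshold, and function names are hypothetical and are not taken from the study data.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def odds_ratio(coef):
    """Odds ratio for a one-unit increase in a predictor: exp(coefficient)."""
    return math.exp(coef)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = COPD case, 0 = control.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity_specificity(y_true, y_pred))
    print(auc(y_true, scores))
    print(odds_ratio(0.3))  # a positive coefficient gives OR > 1
```

A gene with a positive coefficient in the fitted model has an odds ratio above 1, and a gene with a negative coefficient has an odds ratio below 1.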
Conclusion: Regularization addresses the serious dimensionality problem that arises when this kind of regression is
applied to genome-wide expression data. It thereby allows faster exploration of how these genes affect the outcome
and its underlying mechanisms. The regression-based approaches presented here could be applied to overcome this problem.

