Chronic Obstructive Pulmonary Disease Molecular Subtyping and Pathway Deviation-Based Candidate Gene Identification

Document Type : Original Article


1 Department of Respiratory Medicine, The Affiliated Hospital of Qingdao University, Qingdao, China

2 Department of Respiratory Medicine, People’s Hospital of Rizhao Lanshan, Rizhao, China

3 Department of Pharmacy, Qilu Hospital of Shandong University (Qingdao), Qingdao, China

4 Department of President’s Office, The Affiliated Hospital of Qingdao University, Qingdao, China


Objective
The aim of this study was to identify the molecular subtypes of chronic obstructive pulmonary disease (COPD) and to prioritize COPD candidate genes using bioinformatics methods.
Materials and Methods
In this bioinformatics study, the gene expression dataset GSE76705 (including 229 COPD samples) and known COPD-related genes (candidate genes) were downloaded from the Gene Expression Omnibus (GEO) and the Online Mendelian Inheritance in Man (OMIM) databases, respectively. Based on the expression values of the candidate genes, COPD samples were divided into molecular subtypes through hierarchical clustering analysis. Candidate genes were then allocated to the defined molecular subtypes, and functional enrichment analysis was performed. Pathway deviation scores were analyzed next, followed by analysis of the clinical indicators of COPD patients in each subtype: forced expiratory volume in 1 second (FEV1), the ratio of FEV1 to forced vital capacity (FEV1/FVC), age and gender. Prediction models were then constructed. Finally, the gene expression dataset GSE71220 was used to validate the results.
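As an illustration of the clustering step, the following minimal sketch groups samples by their candidate-gene expression profiles using agglomerative hierarchical clustering. The distance metric (Euclidean), the linkage rule (average linkage), the function name `hierarchical_clusters` and the toy expression values are all assumptions made for this example. They are not taken from the study, which does not state these settings here.

```python
from itertools import combinations
from math import dist


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering of samples (rows of `profiles`).

    Assumptions for illustration only: Euclidean distance between
    expression profiles and average linkage between clusters.
    """
    # Start with every sample in its own cluster.
    clusters = [[i] for i in range(len(profiles))]

    def average_linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y

    # Convert cluster membership into one label per sample (1-based).
    labels = [0] * len(profiles)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy matrix: rows are samples, columns are candidate genes.
toy_expression = [
    [1.0, 1.1, 0.9],
    [1.2, 0.9, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 4.8, 5.0],
    [9.0, 9.1, 8.8],
    [8.9, 9.2, 9.0],
]
subtype_labels = hierarchical_clusters(toy_expression, n_clusters=3)
print(subtype_labels)
```

This naive implementation recomputes all pairwise linkages at every merge, so it is only suitable for small examples. For a dataset of several hundred samples, a dedicated library routine would be the more practical choice.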
Results
A total of 213 COPD-related genes were identified, which divided samples into three subtypes based on the gene expression values. After intersection analysis, 160 common genes including transforming growth factor β1 (TGFB1), epidermal growth factor receptor (EGFR) and interleukin 13 (IL13) were obtained. Functional enrichment analysis identified 22 pathways such as ‘hsa04060: cytokine-cytokine receptor interaction’, ‘hsa04110: cell cycle’ and ‘hsa05222: small cell lung cancer’. Pathways in subtype 2 had higher deviation scores. Furthermore, three receiver operating characteristic (ROC) curves (accuracies >80%) were constructed. The three subtypes in COPD samples were also identified in the validation dataset GSE71220.
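To make the ROC evaluation concrete, the sketch below shows how a ROC curve and its area under the curve (AUC) can be computed for a binary "subtype versus rest" predictor. The scores, the labels and the function names `roc_points` and `auc` are hypothetical and are not taken from the study's prediction models.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points obtained
    by sweeping a decision threshold from the highest score downward."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all samples tied at this score are counted.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical predicted scores and true labels (1 = sample is in the subtype).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(toy_scores, toy_labels)
toy_auc = auc(curve)
print(curve)
print(toy_auc)
```

With three subtypes, one such curve per subtype (each subtype against the other two) is a common way to arrive at three ROC curves, although the abstract does not say whether this scheme was used.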
Conclusion
COPD may be further subdivided into several molecular subtypes, which may help tailor COPD therapy to the molecular subtype of an individual patient.