Please use this identifier to cite or link to this item:
Title: Lung Cancer Classification with Discriminant Features of Mutated Genes using Machine Learning
Authors: Sattar, Mohsin
Keywords: Computer Science
Issue Date: 2019
Publisher: Pakistan Institute of Engineering & Applied Sciences, Islamabad.
Abstract: Machine learning based mathematical and statistical models are employed to develop improved classification systems. These decision-based systems can learn automatically from complex sequential data. In this work, machine learning models are developed for the classification of lung cancer. Early classification of lung cancer is critical for successful treatment. Genes and proteins are important to the normal functioning of the human body. Abnormal processes caused by somatic mutations transform normal cells into cancer cells. Somatic mutations in genes are ultimately reflected in gene expression and in protein amino acid sequences. Influential information is extracted through statistical analysis of gene expression and protein amino acid sequence data. This information is transformed into discriminant feature spaces using physicochemical properties. The learning capability of the models is exploited using the discriminant information of mutated genes in proteomic and genomic data. This study aims to develop artificially intelligent lung cancer classification systems. The development was carried out in three main phases. In the first phase, a lung cancer classification system using protein amino acid sequences is developed by employing various individual learning algorithms. In the second phase, a lung cancer classification system using protein amino acid sequences is developed by employing multi-gene genetic programming. This approach exploits evolutionary learning by optimally combining the selected discriminant features with primitive functions. The third phase focuses on developing an improved lung cancer classification system that uses influential features of gene expression from an imbalanced dataset by employing rotation forest. Extensive experiments are conducted to evaluate the performance of the various lung cancer classification systems. The proposed systems obtained accuracy values in the range of 95% to 99%. The comparative analysis indicates that the proposed systems outperform previous approaches. The research outcome is expected to have an impact on the diagnosis, prevention, and effective treatment of lung cancer.
Gov't Doc #: 18616
Appears in Collections: PhD Thesis of All Public / Private Sector Universities / DAIs.

Files in This Item:
File: Mohsin Sattar CS year 2019 03-7P1-002-2014.pdf
Size: 4.29 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.