
Please use this identifier to cite or link to this item:
http://prr.hec.gov.pk/jspui/handle/123456789/18902
Title: | Machine Learning Based Prediction of Multi Label Subcellular Protein Localization of Prokaryotic cells |
Authors: | Javed, Faisal |
Keywords: | Computer Science Computer & IT |
Issue Date: | 2021 |
Publisher: | Abdul Wali Khan University, Mardan |
Abstract: | Subcellular protein localization is a key area of modern medicine and research, since a living cell contains numerous proteins that are distinguished by their location inside the cell as well as by the functions they perform. A protein that moves from its usual location, or that changes in size, may signal an anomaly that needs to be studied and targeted as a potential marker of disease or abnormality. Protein localization has traditionally been determined manually by expert personnel using specialized equipment and chemicals in dedicated laboratories. Such a procedure is not only expensive but also time-consuming, and it becomes impractical for modern, rapidly growing genome projects. Machine-learning-assisted protein classification models can therefore serve as a better alternative: once properly trained, these models can yield promising results in significantly less time. They are also highly cost-effective and need little involvement from expert biologists. One of the key subcellular compartments is the Golgi apparatus, which is responsible for processing and bundling macromolecules, including proteins and lipids, as they are synthesized in the cell. Working as the cell’s post office, it sorts, modifies and packages proteins that are to be secreted. A considerable body of research addresses Golgi proteins, and the literature reports their involvement in a range of complex diseases. Structurally, the Golgi apparatus is made up of flattened, sac-like structures known as cisternae, which are divided into three main sections: the cis-Golgi, which takes in the macromolecules; the medial Golgi, which processes them; and the trans-Golgi, which dispatches the processed proteins and lipids to their designated areas, either inside the cell or at the cell membrane. 
Considering the importance of Golgi proteins, sub-Golgi protein classification using a machine-assisted model is a challenging task that different researchers have approached with various techniques. However, the reported performance measures indicate that further research is needed to develop a precise model that is reliable, robust and consistent. The current study focuses on the development of one such prediction system, which takes multiple feature extraction techniques into account to better identify discriminating sub-sequence portions that could lead to effective learning by the classifier. Along with the feature extraction techniques, a number of feature selection mechanisms are applied to the collected feature set to reduce the size of the working vector, which not only speeds up processing but also improves training. The results of the current study are computed and evaluated rigorously using different industry-accepted statistical measures, with tests carried out on all the known test beds. The outcomes of this model encourage its adoption in modern equipment by industry, as well as its use by future researchers in similar areas. Key Words: Golgi, kNN, PSSM, PseAAC, Split-PseAAC, Dipeptide Composition, SMOTE |
Gov't Doc #: | 22370 |
URI: | http://prr.hec.gov.pk/jspui/handle/123456789/18902 |
Appears in Collections: | PhD Thesis of All Public / Private Sector Universities / DAIs. |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Faisal javed 2020 CS awk mardan.pdf | phd.Thesis | 1.86 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.