Title: Machine Learning Based Protein Function Annotation
Authors: Gull, Sadaf
Keywords: Computer Science; Computer & IT
Issue Date: 2020
Publisher: Pakistan Institute of Engineering & Applied Sciences, Islamabad.
Abstract: Proteins are the most important class of biomolecules in living organisms because they perform a wide variety of biological functions: they aid the digestion of food, energy production, and the growth of tissues and bones, and they form antibodies, among other roles. The function of a protein is determined by its structure and sequence. Because biologists are interested in understanding protein function, predicting the function of a given protein is an important research problem. Many machine learning models have been developed for protein function annotation, but they are limited in scope and have low prediction accuracy. Novel ideas and approaches are required for accurate prediction of functional annotations, especially for proteins involved in antimicrobial therapeutics, which is the primary focus of this dissertation. With antibiotic resistance increasing, it has become necessary to identify antimicrobial peptides (AMPs). We have attempted to overcome the problems associated with existing AMP predictors by developing a multi-label Antimicrobial Activity predictor (AMAP) that can simultaneously predict whether a peptide sequence is an AMP, the type of its biological activity, and the effect of mutations on its activity. We have performed a stringent performance evaluation and comparison with existing methods by accounting for sequence similarity between training and test folds in cross-validation. A webserver for the proposed method is also available. We have also developed a targeted AMP activity predictor called AMP0 that can predict whether a peptide is effective against a given microbial species. The predictor takes the amino acid sequence of the peptide and the genomic sequence of a target microbial species as input to generate targeted predictions, and it can generate predictions for species that are not part of its training set. A webserver for this method is also available.
We have also developed a webserver for Multiple Instance Learning based AMyloid Proteins (MILAMP), a machine learning method that can simultaneously predict amyloid proteins, their hotspot regions, and the effects of point mutations in such proteins. The webserver takes as input a single protein sequence and, optionally, mutation information for that sequence. We have evaluated the performance of MILAMP on some amyloid proteins that were not included in training the model.
Gov't Doc #: 22040
Appears in Collections: PhD Thesis of All Public / Private Sector Universities / DAIs.

Files in This Item:
File: Sadaf gull CS 2020 pieas isb.pdf
Description: phd.Thesis
Size: 4.01 MB
Format: Adobe PDF
