Title: Robust Algorithm for Genome Sequence Short Read Error Correction using Hadoop-MapReduce
Authors: Tahir, Muhammad
Keywords: Computer science, information & general works; Computer science
Issue Date: 2016
Publisher: Iqra University Islamabad Campus
Abstract: DNA sequences are composed of the four nucleotides A, C, G, and T and encode vital information about living organisms. Advances in computing and sequencing technologies, especially next-generation sequencing (NGS), have caused genomic data to grow at a rapid rate. This growth poses significant research challenges in bioinformatics, such as sequence alignment, short-read error correction, and phylogenetic inference. Next-generation high-throughput sequencing technologies have also opened new and thought-provoking research opportunities. In particular, next-generation sequencers produce a massive amount of short-read data in a single run. However, these short reads are highly error-prone compared with reads produced by shotgun sequencing. There is therefore a pressing need for faster and more accurate statistical and computational tools to analyze such data. This research presents HaShRECA, a novel and robust algorithm for correcting errors in genome sequence short reads. The algorithm is based on a probabilistic model that analyzes potential errors in reads, and it uses the Hadoop-MapReduce framework to speed up computation. Experimental results show that HaShRECA is more accurate, as well as more time- and space-efficient, than previous algorithms.
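To illustrate how short-read error detection can be split into map and reduce phases, the sketch below uses a generic k-mer counting approach: count every k-mer across all reads, then flag read positions that are covered only by rare ("weak") k-mers as likely errors. This is not HaShRECA, whose probabilistic model is not described here; it is a minimal, single-machine sketch in plain Python that mimics the map, shuffle, and reduce steps. The names `map_read`, `shuffle`, `reduce_counts`, and `suspicious_positions`, the k-mer length `K`, and the cutoff `THRESHOLD` are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

K = 3          # hypothetical k-mer length; real tools use a much larger k
THRESHOLD = 2  # hypothetical cutoff: k-mers seen fewer times are treated as "weak"


def map_read(read: str, k: int = K) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the counts emitted for each k-mer."""
    return {kmer: sum(values) for kmer, values in groups.items()}


def suspicious_positions(read: str, counts: Dict[str, int],
                         k: int = K, threshold: int = THRESHOLD) -> List[int]:
    """Return positions of the read that no solid (frequent) k-mer covers."""
    covered_by_solid = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts.get(read[i:i + k], 0) >= threshold:
            for j in range(i, i + k):
                covered_by_solid[j] = True
    return [pos for pos, ok in enumerate(covered_by_solid) if not ok]


if __name__ == "__main__":
    # Toy reads: the third read carries an A->T substitution at position 4.
    reads = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]
    pairs = (pair for read in reads for pair in map_read(read))
    counts = reduce_counts(shuffle(pairs))
    for read in reads:
        print(read, suspicious_positions(read, counts))
```

Run on the toy reads, the script prints an empty list for the two error-free reads and `[4]` for `ACGTTCGT`, the position of the substituted base. In an actual Hadoop deployment the map and reduce functions would run in parallel across the cluster (for instance through Hadoop Streaming), which is where the speed-up described above would come from; a probabilistic model would replace the fixed `THRESHOLD` with a likelihood-based decision.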
Appears in Collections: PhD Thesis of All Public / Private Sector Universities / DAIs.

Files in This Item:
File: Muhammad_Tahir_Computer_Science_2016_Iqra_Univ_10.05.2016.pdf
Description: Complete Thesis
Size: 1.43 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.