Please use this identifier to cite or link to this item: http://prr.hec.gov.pk/jspui/handle/123456789/10235
Title: Imputation of missing values in the In/Out procedure of Random Forest
Authors: Ali, Amjad
Keywords: Statistics
Issue Date: 2019
Publisher: University of Peshawar, Peshawar.
Abstract: The performance of a classifier can be affected to a great extent by the presence of missing values in a dataset. In the literature, several methods have been proposed to treat missing data, and the one used most frequently is deleting instances that contain at least one missing feature value. In this part of the study, we compare three methods for dealing with missing values, namely case deletion, simple random imputation, and the modified random imputation procedure, to evaluate their effect on the misclassification error rate of a non-parametric classifier. The classifiers considered were the conventional random forest and the In/Out procedure of the random forest. The missing data problem is common and often unavoidable, especially when dealing with large data sets from several real-world sources. Many new computational tools have been developed to tackle missing data problems. In some cases, the procedures used for missing data involve temporary removal or substitution of the missing values. Existing methods have been successfully applied to well-defined parametric models; however, their usefulness has yet to be established for tree-based models. Missing values, out-of-bag error, and misclassification rates in imbalanced data are difficult to deal with in the Random Forest technique. In this study, a new imputation method is proposed for the In/Out procedure of Random Forest. The principal advantage of the proposed method is that it does not depend on the missing data mechanism, which rectifies a disadvantage of other imputation methods. Its performance has been evaluated and compared with results on data sets without missing values. It is concluded that the newly proposed method reduced the Out-Of-Bag error in the presence of missing values under different Random Forest procedures.
Gov't Doc #: 18166
URI: http://prr.hec.gov.pk/jspui/handle/123456789/10235
Appears in Collections: PhD Thesis of All Public / Private Sector Universities / DAIs.

Files in This Item:
File: Amjad Ali_UoP_2019.pdf
Size: 1.46 MB
Format: Adobe PDF

