Please use this identifier to cite or link to this item:
http://prr.hec.gov.pk/jspui/handle/123456789/17733
Title: | A Comparative Study of Biased, Robust and Biased-Robust Regression and Some Improved Diagnostic and Estimation Strategies |
Authors: | Ilyas, Muhammad |
Keywords: | Physical Sciences; Statistics |
Issue Date: | 2021 |
Publisher: | University of Peshawar, Peshawar. |
Abstract: | In this thesis, efforts have been made to evaluate the performance of various regression methods under numerous circumstances. First, we explore different regression methods, including ordinary least squares (OLS) and several robust regression procedures such as M-regression, least trimmed squares (LTS), least median of squares (LMS), MM-estimation and S-estimation, under different outlier scenarios combined with varying levels of collinearity. For the sake of comparison, several performance measures are used: total absolute bias (TAB), total mean squared error (TMSE) and boxplots of absolute bias, along with graphs of TAB and TMSE. A full discussion of the performance of these methods in each scenario is provided. When the error distribution is standard normal, the methods (OLS, M, LMS, LTS, MM, S) show a broadly similar pattern for all numbers of predictors p = 2, 4, 6 and sample sizes n = 50, 100, 150, 200. The results clearly show that at lower levels of collinearity the TMSE values of all methods differ little, whereas at higher levels of collinearity they differ substantially. It is also evident that LTS and LMS perform worst, followed by S, while OLS, M and MM perform reasonably well even at high levels of collinearity. With high collinearity and 10% outliers, OLS is the worst, LTS, LMS and S are next, and M and MM perform reasonably well, with MM the best of all. With 20% outliers together with high collinearity, MM is the best and M the second best, while LTS, LMS and S behave similarly and poorly, and OLS is the worst of all. With 30% outliers and high collinearity, MM is the best, LTS, LMS and S form the second-best group, while OLS and M-estimation perform poorly, with M-estimation the worst of all. However, with 40% outliers combined with a high level of multicollinearity, the ranking changes: the MM method, the best in all other cases, becomes the worst of all; M and OLS also have high TMSE values, while the remaining three (LTS, LMS, S) have low TMSE values.

Secondly, various WLAD-lasso versions are compared whose weights are obtained from robust distances derived from different robust location and scatter estimators, such as the minimum covariance determinant (MCD) estimator, the minimum volume ellipsoid (MVE) estimator, the orthogonalized Gnanadesikan–Kettenring (OGK) estimator, the constrained M-estimator of location and scatter (covMest), the MM-estimate of multivariate location and scatter (covMMest) and the S-estimate of multivariate location and scatter (covSest). A new WLAD-lasso method is proposed, based on a new weighting scheme in which the weights are derived from the predictor space independently of the covariance matrix. The results of the new method are compared with those of the existing ones using the percentage of correctly estimated coefficients, the percentage of correctly classified zeros, the false positive rate (FPR), the false negative rate (FNR) and the normalized mean squared error (NMSE) across various simulation settings. For sample size n = 20 and a low contamination level of 0.1, the results of all performance measures are similar across methods and no single method is clearly dominant. At a slightly higher contamination level of 0.2, the proposed method and the one based on the OGK estimator outperform all other methods. At a contamination level of 0.3, the proposed method clearly dominates, with a high percentage of correctly estimated coefficients, a high percentage of correctly classified zeros, a low percentage of incorrectly classified zeros, a small false positive rate and a small average normalized mean squared error (ANMSE). At a contamination level of 0.4, the proposed method dominates the remaining methods even more decisively in terms of a high percentage of correctly estimated coefficients, a high percentage of correctly classified zeros, a low percentage of incorrectly classified zeros, a small false positive rate (FPR) and a small average normalized mean squared error (ANMSE). Finally, an improved lasso, the IRW lasso, is derived and its performance is compared with that of the usual lasso under different contamination rates.

The overall objective of the thesis is to compare the performance of various robust regression methods under various circumstances and to adopt strategies for coping with simultaneous violations of the classical assumptions in the data. In brief, the thesis investigates various robust regression methods under outliers and varying levels of multicollinearity. It is concluded that the performance of robust regression methods is strongly affected when high multicollinearity is present simultaneously with outliers. It is therefore strongly recommended to inspect the underlying assumptions of the modelling tools first in order to avoid misleading results. |
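The abstract names total absolute bias (TAB) and total mean squared error (TMSE) as the simulation summaries but does not define them. Below is a minimal sketch, assuming their usual definitions (the sum over coefficients of the absolute bias, and the sum over coefficients of the mean squared error, both taken across simulation replicates); the thesis may define them slightly differently.

```python
# Sketch only: assumed definitions of TAB and TMSE, not taken from the thesis.
import numpy as np

def tab_tmse(beta_hats, beta_true):
    """beta_hats: (replications, p) array of estimates from one method;
    beta_true: length-p vector of true coefficients."""
    err = np.asarray(beta_hats, dtype=float) - np.asarray(beta_true, dtype=float)
    tab = np.abs(err.mean(axis=0)).sum()    # total absolute bias over coefficients
    tmse = (err ** 2).mean(axis=0).sum()    # total mean squared error over coefficients
    return tab, tmse

# Toy usage: 1000 replicates of a 4-coefficient estimate around (1, 2, 3, 4)
rng = np.random.default_rng(1)
beta_true = np.array([1.0, 2.0, 3.0, 4.0])
beta_hats = beta_true + rng.normal(scale=0.5, size=(1000, 4))
print(tab_tmse(beta_hats, beta_true))
```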
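The WLAD-lasso variants compared in the thesis obtain observation weights from robust distances computed under estimators such as MCD, MVE or OGK. The sketch below is illustrative only: it derives MCD-based robust distances with scikit-learn and converts them to weights via a chi-square cutoff, a common choice in the WLAD-lasso literature. The abstract does not state the exact weighting function used in the thesis, and the proposed covariance-free weighting scheme is not reproduced here.

```python
# Sketch: observation weights for a WLAD-lasso from MCD-based robust distances.
# The weighting function is a common textbook choice, not the thesis's own scheme.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
X[:5] += 6.0                       # a few leverage points in predictor space

# Robust location/scatter via the minimum covariance determinant (MCD)
mcd = MinCovDet(random_state=0).fit(X)
rd2 = mcd.mahalanobis(X)           # squared robust (Mahalanobis) distances

# Down-weight observations whose robust distance exceeds a chi-square cutoff
cutoff = chi2.ppf(0.975, df=p)
w = np.minimum(1.0, cutoff / rd2)

# These weights would multiply the absolute residuals in the penalized LAD
# objective:  sum_i w_i * |y_i - x_i' beta| + lambda * sum_j |beta_j|
print(np.round(w[:10], 3))
```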
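The comparison of the lasso variants relies on variable-selection measures (FPR, FNR, correctly classified zeros, NMSE). The following short sketch uses their usual definitions, which the thesis may state slightly differently: FPR is the share of truly zero coefficients estimated as nonzero, FNR the share of truly nonzero coefficients estimated as zero, and NMSE the squared estimation error normalized by the squared norm of the true coefficient vector.

```python
# Sketch only: standard variable-selection metrics, assumed definitions.
import numpy as np

def selection_metrics(beta_true, beta_hat, tol=1e-8):
    beta_true = np.asarray(beta_true, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    true_zero = np.abs(beta_true) <= tol
    est_zero = np.abs(beta_hat) <= tol

    fpr = np.mean(~est_zero[true_zero]) if true_zero.any() else np.nan
    fnr = np.mean(est_zero[~true_zero]) if (~true_zero).any() else np.nan
    correct_zeros = np.mean(est_zero[true_zero]) if true_zero.any() else np.nan
    nmse = np.sum((beta_hat - beta_true) ** 2) / np.sum(beta_true ** 2)
    return {"FPR": fpr, "FNR": fnr, "correct_zeros": correct_zeros, "NMSE": nmse}

# Toy example: true model has 3 active and 5 zero coefficients
beta_true = np.array([3.0, 1.5, 2.0, 0, 0, 0, 0, 0])
beta_hat  = np.array([2.8, 1.2, 0.0, 0, 0, 0.4, 0, 0])
print(selection_metrics(beta_true, beta_hat))
```

Averaging NMSE over simulation replicates gives the average normalized mean squared error (ANMSE) referred to in the abstract.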
Gov't Doc #: | 23853 |
URI: | http://prr.hec.gov.pk/jspui/handle/123456789/17733 |
Appears in Collections: | PhD Thesis of All Public / Private Sector Universities / DAIs. |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Muhammad Ilyas statistics 2021 uop peshwar.pdf | phd.Thesis | 2.88 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.