Title: Study of Missing Values Different Imputation Methods
Authors: Sohail, Muhammad Umair
Keywords: Physical Sciences
Issue Date: 2020
Publisher: Quaid-i-Azam University, Islamabad.
Abstract: In this research, we suggest different families of estimators for imputing values that are missing completely at random (MCAR) when estimating the finite population mean (Ȳ) and total (ΣY) of the study variable (y) under different sampling schemes. Known population parameters of the auxiliary variable, namely the mean (X̄), ranks R(x), truncated mean (V̄), variance (Sx²), coefficient of variation (Cx), coefficient of kurtosis β2(x), and correlation coefficient (ρyx), are used to estimate the parameters of interest more precisely. Mathematical expressions for the bias and mean square error of the resulting estimators are obtained up to the first order of approximation. For comparison, theoretical conditions are also given under which the proposed imputation procedures perform better than their counterparts. The numerical comparison of the proposed procedures with existing ones is performed through Monte Carlo experiments on hypothetical populations and real-life data sets, repeating the process N n times at varying response rates. Generalized families of ratio- and difference-type estimators are proposed for handling non-response bias by utilizing the ranks of the auxiliary variable under simple random sampling (SRS). The numerical results show that the proposed families of estimators perform much better than the traditional ones. A modified class of ratio-type estimators is suggested for imputing MCAR values by using the higher-order moments of the auxiliary variable. The suggested imputation procedure for estimating the finite population mean dominates the competing estimators at varying response rates. The idea of truncating the auxiliary information is also used in this research for estimating the missing values.
A class of ratio-exponential-type estimators is proposed to impute the missing values by using the truncated auxiliary variable (v) and the ranks of the auxiliary variable r(x). The proposed class of estimators performs better than its counterparts. We support our results through two real-life data sets at response rates between 20% and 80%. Under the comparative measures, when the observation units vary in size and no auxiliary information is in hand, a mixture of two-phase and probability proportional to size (PPS) sampling schemes is proposed by combining their features. A class of estimators is reformulated for four possible situations of non-response in the study variable and/or the auxiliary variable under the proposed sampling scheme. A comprehensive numerical comparison in terms of percent relative efficiency (PRE) is also carried out at varying response rates for each of these situations. The imputation of non-response in ranked set sampling (RSS) is considered by modifying an existing family of estimators. Theoretical results for the bias and mean square error are also reported up to the first order of approximation. The relative comparison between the proposed and existing imputation strategies is made through simulation and real-life data sets. The simulation study of the modified procedures is carried out by generating random numbers from two hypothetical populations: (i) a population generated from the normal distribution with mean (µ) and variance (σ²), and (ii) a population generated from the exponential distribution with mean (λ), at distinct parametric values. The suggested imputation mechanism performs better than conventional mechanisms. In the last few decades, the use of raw moments for the estimation of finite population parameters has received substantial attention in the field of survey sampling. We define an imputation strategy that uses the raw moments of the auxiliary variable for filling in item non-response.
A ratio-exponential-type family of estimators is defined that makes substantial use of the auxiliary information. To assess the relative performance, four different real-life populations are studied along with simulated data sets generated from (i) a bivariate normal distribution with means (µx and µy) and variances (σx² and σy²), and (ii) a gamma distribution with parameters (a and b). The small area predictive estimation of the population total is also illustrated for the cases of known and unknown domain/area membership (Di). Three population models, namely (i) the homogeneous population model (HPM), (ii) the ratio population model (RPM), and (iii) the linear population model (LPM), are used to define the different estimators for the imputation of non-response. For the cases of known and unknown area membership, two distinct real-life data sets are used for the predictive estimation of the population total (Σy). The ranks of the auxiliary variable (w) are used to define a new population model for handling the non-response bias. The reported results show that the new imputation population model is more efficient than the traditional model.
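The kind of comparison the abstract describes (imputing MCAR values with auxiliary information, then comparing procedures by Monte Carlo at several response rates) can be illustrated with a minimal sketch. It is not the thesis's proposed estimators: it uses the classical ratio method of imputation against plain mean imputation. The population model (y roughly proportional to x), the sizes, and the names make_population, impute_and_average, and simulate_pre are all assumptions made for illustration, and the auxiliary variable x is assumed to be fully observed.

```python
import random
import statistics


def make_population(N, rng):
    """Hypothetical population (an assumption): y is roughly proportional to x."""
    x = [rng.uniform(10.0, 50.0) for _ in range(N)]
    y = [2.0 * xi + rng.gauss(0.0, 5.0) for xi in x]
    return x, y


def impute_and_average(x_sample, y_sample, responded, method):
    """Fill in missing y values, then return the mean of the completed sample."""
    y_resp = [y for y, r in zip(y_sample, responded) if r]
    x_resp = [x for x, r in zip(x_sample, responded) if r]
    if method == "mean":
        # Mean imputation: every missing y gets the respondent mean.
        fill = statistics.fmean(y_resp)
        completed = [y if r else fill for y, r in zip(y_sample, responded)]
    elif method == "ratio":
        # Classical ratio imputation: missing y_i is replaced by b * x_i,
        # where b is the ratio of respondent totals of y and x.
        b = sum(y_resp) / sum(x_resp)
        completed = [y if r else b * x
                     for x, y, r in zip(x_sample, y_sample, responded)]
    else:
        raise ValueError("method must be 'mean' or 'ratio'")
    return statistics.fmean(completed)


def simulate_pre(N=1000, n=100, response_rate=0.6, reps=1000, seed=1):
    """Monte Carlo MSE of both methods and the PRE of ratio vs. mean imputation."""
    rng = random.Random(seed)
    x, y = make_population(N, rng)
    true_mean = statistics.fmean(y)
    r = max(2, round(n * response_rate))  # number of respondents per sample
    sq_err = {"mean": 0.0, "ratio": 0.0}
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # simple random sample without replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # MCAR: respondents are picked at random, independently of x and y.
        resp = set(rng.sample(range(n), r))
        responded = [i in resp for i in range(n)]
        for method in sq_err:
            est = impute_and_average(xs, ys, responded, method)
            sq_err[method] += (est - true_mean) ** 2
    mse = {m: s / reps for m, s in sq_err.items()}
    pre = 100.0 * mse["mean"] / mse["ratio"]
    return mse, pre


if __name__ == "__main__":
    for rate in (0.2, 0.4, 0.6, 0.8):
        mse, pre = simulate_pre(response_rate=rate)
        print(f"response rate {rate:.0%}: PRE of ratio vs. mean imputation = {pre:.1f}")
```

A PRE above 100 means the ratio method has the smaller simulated mean square error for that response rate. How large the gain is depends on the assumed relationship between y and x, so the output of this sketch says nothing about the thesis's own numerical results.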
Gov't Doc #: 20747
Appears in Collections: PhD Thesis of All Public / Private Sector Universities / DAIs.

Files in This Item:
File                                    Description  Size     Format
Muhammad Umair Sohail 2020 qau isb.pdf  Phd.Thesis   2.18 MB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.