Title: Efficient Predictive Modeling in Modern Interpolation Regime by the Smoothing Effect Through Balancing Normal and Complex Signals
Authors: Aman, Fazal
Keywords: Physical Sciences
Computer Sciences
Issue Date: 2022
Publisher: University of Peshawar, Peshawar
Abstract: In the last few years, machine learning has gained unprecedented popularity and undergone a remarkable expansion in research. Predictive machine learning techniques are used to build models that forecast future outcomes. The dataset is divided into two sets: the training set, used to build the model, and the test set, used to check the model's predictive accuracy. While training a predictive model, the dataset sometimes contains data points with either very small or very large values in the probability distribution. These points are called extreme values or outliers, and they strongly influence predictive models. There has been much debate in the literature about how to treat extreme or influential data points. In the classical approach to machine learning, statisticians commonly handle outliers through omission or winsorization. In the omission method, the extreme values are removed from the dataset, whereas winsorization replaces the extreme values with the nearest normal values. More recently, computer scientists have handled extreme values through extensive training of machine learning models. In modern machine learning practice, the model is fit completely to the training data until it reaches the interpolation point, where the bias error approaches zero while the variance error remains high. The models are then trained further, beyond the interpolation point, to minimize the impact of extreme values on prediction accuracy; experimental results have shown that this additional training reduces the variance error. The problem with the classical approach is that it discards extreme values from the dataset, which undermines the model's predictive power. On the other hand, modern machine learning (MML) practice ignores the cost to the accuracy and efficiency of the models incurred by training on all complex signals (outliers).
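The two classical treatments mentioned above can be illustrated with a minimal sketch. The fences here follow Tukey's usual inner-fence rule (Q1 − 1.5·IQR, Q3 + 1.5·IQR); the function names and the toy data are illustrative, not taken from the dissertation.

```python
import numpy as np

def tukey_fences(x, k=1.5):
    """Inner fences of Tukey's boxplot: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def omit_outliers(x, k=1.5):
    """Classical omission: drop every point outside the fences."""
    lo, hi = tukey_fences(x, k)
    return x[(x >= lo) & (x <= hi)]

def winsorize(x, k=1.5):
    """Classical winsorization: clip extreme values to the nearest fence."""
    lo, hi = tukey_fences(x, k)
    return np.clip(x, lo, hi)

x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 50.0])
print(omit_outliers(x))  # the extreme value 50.0 is dropped
print(winsorize(x))      # 50.0 is clipped to the upper fence (6.0 here)
```

Omission shrinks the dataset and loses information, while winsorization keeps the sample size but distorts the tail values; this is precisely the trade-off the dissertation revisits.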
A novel pre-processing technique is introduced in this research to address the issues stated above. The proposed technique performs a trade-off analysis of complex signals and suggests an optimal point for the dataset. This dissertation proposes a trade-off method for outliers that employs Tukey's schematic boxplot to identify the extreme values in the dataset. The trade-off analysis is performed by shifting the inner and outer fences of the boxplot to find an optimal point at which the maximum number of complex signals present in the data is retained with minimum impact on the model's performance. The experimental results show that models trained on the resulting dataset are more robust and efficient than those built under the classical and MML approaches: the proposed Complex Signal Balancer (CSB) approach outperformed the MML approach in accuracy and efficiency, and outperformed the classical approach in terms of the models' predictive capability and information loss.
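The fence-shifting trade-off described above can be sketched as a small search over the fence multiplier. This is only a hypothetical illustration of the general idea, assuming a grid of multipliers and a user-supplied scoring function; the actual CSB procedure, its search grid, and its performance criterion are defined in the dissertation itself.

```python
import numpy as np

def tukey_fences(x, k):
    """Boxplot fences with multiplier k (k=1.5 inner, k=3.0 outer)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def csb_tradeoff(x, y, score_fn, ks=(1.5, 2.0, 2.5, 3.0)):
    """Hypothetical sketch of a fence-shifting trade-off: widen the fences
    step by step, keeping more complex signals each time, and return the
    multiplier whose retained subset scores best under score_fn."""
    best_k, best_score = None, -np.inf
    for k in ks:
        lo, hi = tukey_fences(x, k)
        mask = (x >= lo) & (x <= hi)
        score = score_fn(x[mask], y[mask])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy demonstration: a linear signal with a few injected extreme values.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
y = 2.0 * x + rng.normal(0.0, 0.1, 200)
x[:3] = [8.0, -9.0, 10.0]   # complex signals far outside the bulk
y[:3] = [0.0, 0.0, 0.0]

def score_fn(xs, ys):
    """Negative residual variance of a straight-line fit (higher is better)."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return -np.var(ys - (slope * xs + intercept))

best_k, best_score = csb_tradeoff(x, y, score_fn)
print(best_k, best_score)
```

Each step of the grid trades a little model performance for a few more retained complex signals; the optimal multiplier is the one where widening the fences further would cost more than the extra signals are worth.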
Gov't Doc #: 27197
Appears in Collections:PhD Thesis of All Public / Private Sector Universities / DAIs.

Files in This Item:
File: Fazal Aman Computer Science 2022 uop peshawar.pdf 11.10.22.pdf
Description: Ph.D thesis
Size: 1.91 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.