Please use this identifier to cite or link to this item:
Title: Performance-aware Cloud Resource Management for Microservices-based Applications
Authors: Abdullah, Muhammad
Keywords: Computer Science
Computer & IT
Issue Date: 2020
Publisher: University of the Punjab, Lahore
Abstract: Recent advancements in web application design introduced the microservices architecture, which is gaining traction for building modern applications. A typical microservices application consists of multiple fine-grained, small, independent web services that interact to serve incoming user requests. However, the performance of typical microservices-based applications is significantly lower than that of traditional monolithic implementations. Nowadays, large monolithic web applications are manually decomposed into microservices for many reasons, including adopting a modern architecture to ease maintenance and increase reusability. However, existing approaches to refactoring a monolithic application do not inherently consider application scalability and performance. Cloud computing offers on-demand resource provisioning to automatically scale application resources for maintaining application performance. The intrinsic design principles of microservices can leverage this cloud computing feature to scale microservice resources horizontally for better performance. Autoscaling methods are important to ensure response time guarantees for cloud-hosted microservices. Most existing state-of-the-art autoscaling methods use rule-based reactive policies with static thresholds defined either on monitored resource consumption metrics, such as CPU and memory utilization, or on application-level metrics, such as the response time. However, it is challenging to determine the most appropriate threshold values to minimize resource consumption and performance violations. Predictive autoscaling methods can help address these challenges, but they require considerable time to collect sufficient performance traces, representing different resource provisioning possibilities for a target infrastructure, to train a useful predictive autoscaling model.
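The rule-based reactive policy with static thresholds described above can be pictured with a minimal sketch; the metric (CPU utilization), threshold values, and replica bounds here are illustrative assumptions, not the dissertation's actual configuration.

```python
def reactive_autoscale(cpu_util, replicas, upper=0.8, lower=0.3,
                       min_replicas=1, max_replicas=10):
    """Static-threshold reactive scaling: add a replica when average CPU
    utilization exceeds the upper threshold, remove one when it falls
    below the lower threshold. All thresholds are illustrative."""
    if cpu_util > upper and replicas < max_replicas:
        return replicas + 1  # scale out
    if cpu_util < lower and replicas > min_replicas:
        return replicas - 1  # scale in
    return replicas          # within the band: no change

print(reactive_autoscale(0.9, 3))  # 4
```

The difficulty the abstract points at is visible even in this toy: the quality of the policy hinges entirely on picking `upper` and `lower` well for a given workload.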
Containers provide a lightweight runtime environment for microservices applications while enabling better server utilization. Different workloads can also benefit from containerization to boost performance and reduce hosting costs. However, it remains challenging to dynamically allocate CPU cores to containers hosting ML workloads to minimize the job completion time while maximizing the number of concurrently running jobs. In this dissertation, we address the above-discussed challenges. First, we propose a novel method to automatically decompose a monolithic application into microservices to improve application scalability and performance. Our proposed decomposition method is based on a black-box approach that uses the application access logs and an unsupervised machine-learning method to auto-decompose the application into microservices mapped to URL partitions with similar performance and resource requirements. In particular, we propose a complete automated system to decompose an application into microservices, deploy the microservices using appropriate resources, and autoscale the microservices to maintain the desired response time. We evaluate the proposed system using real web applications on a public cloud infrastructure. The experimental evaluation shows improved performance of the auto-created microservices compared with the monolithic version of the application and the manually created microservices. Second, we propose a system that models the response time of microservices through stress testing and then uses a trace-driven simulation to learn a predictive autoscaling model for automatically satisfying response time requirements. The proposed solution reduces the need for collecting performance traces to learn a predictive autoscaling model. Our experimental evaluation on the AWS cloud, using a microservice under realistic dynamic workloads, validates the proposed solution.
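As a rough illustration of the black-box decomposition idea, one could cluster URL endpoints by per-request features extracted from access logs, so that endpoints with similar performance and resource profiles land in the same microservice. The tiny stdlib-only k-means below is a sketch under that assumption; the feature choice (mean response time, mean CPU time) and the algorithm itself are illustrative, not the dissertation's actual method.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means over 2-D feature vectors, e.g. (mean response
    time ms, mean CPU ms) per URL endpoint from access logs."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Recompute centers; keep the old center if a cluster is empty.
        centers = [(sum(p[0] for p in c) / len(c),
                    sum(p[1] for p in c) / len(c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Hypothetical per-URL features: three lightweight endpoints, three heavy ones.
urls = [(10, 5), (12, 6), (11, 4), (200, 80), (210, 90), (190, 85)]
groups = kmeans(urls, 2)
```

Each resulting cluster would correspond to one candidate microservice boundary; the full system additionally handles deployment and autoscaling, which this fragment does not attempt to show.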
The validation results also show excellent performance in satisfying the response time requirement, with only 4.5% extra cost for the proposed autoscaling method compared to the reactive autoscaling method. Third, we propose a novel predictive autoscaling method for microservices running on fog micro data centers (MDCs) with containerized infrastructure to satisfy the application response time service-level objectives (SLOs). Initially, our proposed approach uses a reactive rule-based autoscaling method to gather the training dataset for building the predictive autoscaling model. The proposed approach is efficient, as it can learn the predictive autoscaling model using an increasing synthetic workload. The learned predictive autoscaling model is then used to effectively manage the application resources serving different realistic workloads. The proposed approach is capable of scaling in and scaling out the number of containers allocated to the microservices running at the MDC to satisfy the desired response time SLO. Our experimental evaluation, using two synthetic and three realistic workloads for two benchmark microservice applications on a real MDC, shows excellent performance compared to the existing state-of-the-art baseline rule-based autoscaling method. Fourth, we propose a novel burst-aware autoscaling method that detects bursts in dynamic workloads using workload forecasting, resource prediction, and scaling decisions while minimizing response time SLO violations. Our approach is evaluated through a trace-driven simulation, using multiple synthetic and realistic bursty workloads for containerized microservices, and improves performance compared with existing state-of-the-art autoscaling methods. The experiments show a 1.09× increase in total processed requests, a 5.17× reduction in SLO violations, and a 0.767× cost compared to the baseline method.
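One way to picture the burst-detection step of the fourth contribution is to flag any window whose observed request rate far exceeds a short moving-average forecast. The window length and multiplier below are illustrative assumptions, not the dissertation's tuned parameters, and the real method couples detection with resource prediction and scaling decisions that this fragment omits.

```python
def detect_bursts(rates, window=3, factor=2.0):
    """Flag a burst at index t when the request rate exceeds `factor`
    times the moving average of the previous `window` samples.
    Parameters are illustrative, not the dissertation's values."""
    bursts = []
    for t in range(window, len(rates)):
        forecast = sum(rates[t - window:t]) / window
        if rates[t] > factor * forecast:
            bursts.append(t)
    return bursts

# A ~100 req/s baseline with a sudden spike at index 5.
print(detect_bursts([100, 110, 105, 100, 102, 400, 120]))  # [5]
```

A detected burst would then trigger a larger, pre-emptive scale-out than the steady-state policy would choose on its own.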
Finally, we introduce a new machine learning-based approach for dynamically allocating CPU cores to containers hosting machine learning (ML) workloads to minimize the job completion time. First, we explore different CPU allocation configurations for the incoming jobs and collect the performance traces. Then, we train a deep neural network to predict the jobs' execution times for different CPU allocation configurations. This predictor is used to adaptively allocate CPU cores to minimize the overall job completion time while maximizing the number of concurrent jobs. Our approach exploits the observed law of diminishing marginal returns: the incremental performance gain of adding more CPU cores holds only up to a certain point, after which allocating additional cores yields no meaningful performance improvement (i.e., the higher the allocated CPU cores, the smaller the effect on the job execution time). Our approach uses the law of diminishing marginal returns to determine the optimum number of CPU cores to allocate to a container for maximum performance gain, thus minimizing over-provisioning of CPU cores and allowing more jobs to execute concurrently. The proposed system is evaluated using real ML workloads on a Docker-based containerized infrastructure with different state-of-the-art placement algorithms. The results demonstrate the effectiveness of the proposed solution in reducing job completion times by 23% to 74% compared to commonly used static CPU allocation methods.
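The diminishing-returns rule described above can be sketched as picking the smallest core count beyond which the marginal speedup falls under some cutoff. The predicted runtimes and the 5% cutoff below are hypothetical stand-ins for the deep neural network's output and the system's tuned parameters.

```python
def pick_cores(runtime_by_cores, min_gain=0.05):
    """Given predicted job runtimes keyed by core count, return the
    smallest core count after which adding cores improves runtime by
    less than `min_gain` (fractional). Values are hypothetical."""
    cores = sorted(runtime_by_cores)
    best = cores[0]
    for prev, cur in zip(cores, cores[1:]):
        gain = (runtime_by_cores[prev] - runtime_by_cores[cur]) / runtime_by_cores[prev]
        if gain < min_gain:
            break  # marginal return too small: stop adding cores
        best = cur
    return best

# Hypothetical runtimes (seconds) predicted for one ML job.
predicted = {1: 100.0, 2: 55.0, 4: 32.0, 8: 30.5, 16: 30.2}
print(pick_cores(predicted))  # 4
```

Capping each container at such a knee point is what frees cores for additional concurrent jobs instead of letting one job absorb them for negligible gain.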
Gov't Doc #: 25782
Appears in Collections:PhD Thesis of All Public / Private Sector Universities / DAIs.

Files in This Item:
File: Muhammad Abdullah Computer Science 2020 uop lhr.pdf 12.5.22.pdf
Description: PhD Thesis
Size: 5.46 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.