KubeSphere®️ 2020 All Rights Reserved.

Creating a Horizontal Pod Autoscaler for a Deployment

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a deployment based on observed CPU utilization or memory usage. The controller periodically adjusts the number of replicas in a deployment so that the observed average CPU utilization matches the target value specified by the user.

How the HPA Works

The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag (with a default value of 30 seconds). For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each pod targeted by the Horizontal Pod Autoscaler. See Horizontal Pod Autoscaler for more details.

After an HPA is created for a deployment, the controller manager queries the metrics server to obtain the average utilization or the average raw value (depending on the target type) of the metric across the pods targeted by the HPA. It then compares that average with the target configured in the HPA and calculates the desired number of replicas for the deployment. Note that a pod's CPU and memory limits and requests matter here: kube-scheduler schedules pods based on their requests, and the HPA computes utilization as a percentage of the pods' resource requests, so autoscaling will not work if the pods do not set requests.
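The replica computation at the heart of this control loop can be sketched in a few lines. This is a simplified illustration of the algorithm, not actual Kubernetes source code; the function name and the 10% tolerance band are assumptions for the sketch:

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [min_replicas, max_replicas]."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band, the HPA leaves the replica count unchanged
    # to avoid thrashing on small metric fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# Example: 2 replicas observed at 100% average CPU against a 50% target
# doubles the deployment to 4 replicas.
print(desired_replicas(2, 100, 50, min_replicas=1, max_replicas=10))  # 4
```

The same rule drives scale-down: when observed utilization falls below the target, the ratio drops under 1 and the desired count shrinks, bounded below by the configured minimum.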


This document walks you through an example of configuring Horizontal Pod Autoscaler for the hpa-example deployment.

We will create a deployment that sends an infinite loop of queries to the hpa-example application, demonstrating its autoscaling behavior and the HPA principle.

Estimate Time

About 25 minutes.


Hands-on Lab

Step 1: Create a Deployment

1.1. Enter demo-project, then select Workload → Deployments and click the Create Deployment button.

1.2. Fill in the basic information in the pop-up window, e.g. Name: hpa-example, then click Next when you're done.

Step 2: Configure the HPA

2.1. Choose Horizontal Pod Autoscaling, and fill in the fields as follows:

  • Min Replicas Number: 2
  • Max Replicas Number: 10
  • CPU Request Target(%): 50 (represents the percent of target CPU utilization)

Then click on the Add Container button.

2.2. Fill in the Pod Template with the following values, then click Save to save these settings.

  • Image: mirrorgooglecontainers/hpa-example

  • Service Settings

    • Name: port
    • port: 80 (TCP protocol by default)

2.3. Skip the Volume and Label settings and click the Create button. The hpa-example deployment has now been created successfully.

Step 3: Create a Service

3.1. Choose Network & Services → Services on the left menu, then click on the Create Service button.

3.2. Fill in the basic information, e.g. Name: hpa-example, then click Next.

3.3. For the service settings, choose the first option, Virtual IP: Access the service through the internal IP of the cluster.

3.4. In the Selector section, click Specify Workload and select hpa-example as the backend workload. Then click Save and fill in the Ports fields.

  • Ports:

    • Name: port
    • Protocol: TCP
    • Port: 80
    • Target port: 80

Click Next → Create to complete the creation. Now the hpa-example service has been created successfully.

Step 4: Create Load-generator

4.1. In the current project, navigate to Workload → Deployments. Click the Create button and fill in the basic information in the pop-up window, e.g. Name: load-generator. Click Next when you're done.

4.2. Click on the Add Container button, and fill in the Pod template as follows:

  • Image: busybox
  • Scroll down to Start command, add commands and parameters as following:
# Commands

# Parameters (Note: the http service address takes the form http://{$service name}.{$project name}.svc.cluster.local)
while true; do wget -q -O- http://hpa-example.demo-project.svc.cluster.local; done

Click on the Save button when you're done, then click Next.

4.3. Click Next → Create to complete creation.

So far, we've created 2 deployments (i.e. hpa-example and load-generator) and 1 service (i.e. hpa-example).

Step 5: Verify the HPA

5.1. Click into hpa-example and inspect the changes. Pay attention to the HPA status and the CPU utilization, as well as the Pods monitoring graphs.

Step 6: Verify the Auto Scaling

6.1. When all of the load-generator pods have been created and begin to access the hpa-example service, as shown in the following figure, the CPU utilization increases significantly after refreshing the page (rising to 722% here), and the desired and current replicas rise to 10/10.

Note: With the Horizontal Pod Autoscaler now in effect, the load-generator continuously requests the hpa-example service, driving CPU utilization up rapidly. Once the HPA kicks in, the service backend scales out quickly to handle the large number of requests, and the replica count of hpa-example keeps increasing as CPU utilization rises, which demonstrates the working principle of the HPA.
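Plugging the observed numbers into the HPA scaling rule explains the 10/10 figure. Assuming the 722% average utilization was observed while the initial 2 replicas were running (a simplifying assumption for this sketch), the desired count far exceeds the configured maximum, so the HPA caps it at 10:

```python
import math

current_replicas = 2   # Min Replicas Number configured in Step 2
observed_cpu = 722     # average CPU utilization (%) seen after refreshing
target_cpu = 50        # CPU Request Target (%) configured in Step 2
max_replicas = 10      # Max Replicas Number configured in Step 2

# desired = ceil(current * currentMetric / targetMetric)
desired = math.ceil(current_replicas * observed_cpu / target_cpu)
print(desired)                     # 29 replicas wanted...
print(min(desired, max_replicas))  # ...capped at the configured maximum: 10
```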

6.2. In the monitoring graph, you can see that the CPU usage of the first Pod we originally created shows a significant upward trend. Once the HPA started working, its CPU usage dropped noticeably and finally leveled off, while CPU usage rose correspondingly on the newly created Pods.

Step 7: Stop the Load Generation

7.1. Navigate to Workload → Deployments and delete load-generator to stop generating load.

7.2. Inspect the status of hpa-example again. You'll find that its current CPU utilization slowly drops to 10% within a few minutes, and eventually the HPA reduces the deployment's replicas to 1 (the initial value). The trend shown by the monitoring curve can also help us to further understand the working principle of the HPA.

7.3. You can also inspect the monitoring graph of the deployment; the CPU utilization and network inbound/outbound trends match the HPA example above.

Modify HPA Settings

If you need to modify the HPA settings, click into the deployment, then click More → Horizontal Pod Autoscaler.

Cancel HPA

Click the ··· button on the right and choose Cancel if you no longer need the HPA for this deployment.