Setting up a Horizontal Pod Autoscaler for Kubernetes cluster

In a Docker-based microservice setup, applications that do not scale automatically during peak business hours can face significant performance bottlenecks and potential downtime, negatively impacting user experience. To address this, the team at CloudifyOps, a leading cloud consulting company, recommended implementing a Horizontal Pod Autoscaler (HPA) within the Kubernetes environment.

With HPA in place, application pods dynamically scale based on CPU and memory utilization, ensuring high performance during peak usage while minimizing resource consumption during off-hours. This approach not only enhances system responsiveness but also promotes cost efficiency by eliminating the need to maintain excess replicas around the clock.

As an expert in AWS managed services, CloudifyOps also emphasized the value of integrating cloud-native scaling tools and Cloud Security Frameworks to protect infrastructure. These frameworks enforce security policies, monitor access, and mitigate threats—ensuring a secure and resilient microservices architecture. Additionally, by leveraging multi-cloud management tools, organizations can achieve seamless workload distribution across cloud environments, further boosting scalability, availability, and operational efficiency.

Introduction:

Kubernetes autoscaling:

The three scalability tools that Kubernetes has are the Horizontal pod autoscaler, Vertical pod autoscaler (VPA) and the cluster autoscaler. HPA and VPA tools are used to scale up and monitor the application layer.

Horizontal pod autoscaling:

When a spike or drop in consumption occurs, Kubernetes can automatically decrease or increase the number of pods that serve the workload.

Vertical pod autoscaling:

Deciding how much compute resources to dedicate to a particular workload is challenging. With the right configuration, Kubernetes can help you get the most out of the allocated resources.

Requirement:

We need one Kubernetes cluster configuration ready to deploy.

Steps to follow:

Installing the metrics-server: The goal of the HPA is to make scaling decisions based on the per-pod resource metrics that are retrieved from the metrics API (metrics.k8s.io).

Create the cluster without giving the –yes argument to it. This will only create the configuration. Now, we need to make the below changes to the metrics server configuration.

The metrics server will be useful in creating the HPA.

For Cluster created with KOPS, follow the steps:

kops edit cluster <.cluster name.>

add the below configuration to your cluster configuration under kubelet

kubelet:

anonymousAuth: false

authorizationMode: Webhook

authenticationTokenWebhook: true

After making the changes, we should update the cluster. With the below command, the cluster will be created with the required configuration.

kops update cluster –name bittergourd.xyz –yes –admin

If you are changing the configuration after deploying the cluster, you need to run the rolling-update, which causes the master to terminate and redeploy. Later, new nodes will be deployed and the old nodes will be terminated.

To avoid this recreation, we are following the above steps while creating the Kubernetes cluster with kops (Kubernetes operations).

Now we need to install the metrics server

Kubectl apply -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/metrics-server/v1.16.x.yaml

Below is the output of the metrics server creation step.

To confirm the metrics server installation

kubectl get pods -n kube-system

You will find metrics server pod in the pods list.

We can find out the memory and CPUutilization of pods and nodes using below commands

kubectl top node <.node name.>
kubectl top pod -n <.pod_name.>
kubectl top nodes
kubectl top -n <.namespace.>

The output should look like this.

Resource Requests and limits:

If the resource limit of a pod is exceeded, then it can use more than its requested resource. However, a container can’t use more than its resource limit.

If you set a memory request for 256 MiB, and a container is in a scheduled pod, then it can use more RAM.

If the limit is set at 4GiB, the kubelet enforces the limit. The runtime stops the process that tries to consume more than the permitted amount of memory.

Configuring HPA:

It is important that we have resource requests and limits mentioned in the container resources like shown in the above image.

First, we will start a deployment running the image and expose it as a service using the following command

kubectl apply -f https://k8s.io/examples/application/php-apache.yaml

One new deployment and service will be created with the above command. After completing the deployment, we need to deploy the HPA.

Follow below commands:

echo ‘apiVersion: autoscaling/v1

kind: HorizontalPodAutoscaler

metadata:

name: php-apache

spec:

scaleTargetRef:

apiVersion: apps/v1

kind: Deployment

name: php-apache

minReplicas: 1

maxReplicas: 10

targetCPUUtilizationPercentage: 50 ‘ | kubectl apply -f –

The HPA will be deployed as shown here.

As there is no load applied on the deployment, the targets show 0%/50%. To test the HPA, we shall apply load on the deployment.

kubectl run -i –tty load-generator –rm –image=busybox –restart=Never — /bin/sh -c “while sleep 0.01; do wget -q -O- http://php-apache; done”

Run the above command to apply some load on the deployment. You will get an output like below.

The deployment is scaled up. If you see in the below image, when the load increases above 50%, the deployment scaled up to 7 pods.

The default time to scale down is 300 seconds. The scale down time can be customized to suit different requirements.

Note: If you use AWS EKS, the metric server needs to be enabled with the following command.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml

With the above exercise, we applied load on the CPU. Similarly, we can configure for memory usage as well.

To learn more about these cutting edge technologies & real time industry applied best practices, follow our LinkedIn Page. To explore our services, visit our website

Services

Solutions

Recent Blogs

Find us on Glassdoor.

CloudifyOps Trust Center

Our Services

Our Solutions

Offices

CloudifyOps Pvt Ltd, Ground Floor, Block C, DSR Techno Cube, Survey No.68, Varthur Rd, Thubarahalli, Bengaluru, Karnataka 560066

CloudifyOps Pvt Ltd, Cove Offices OMR, 10th Floor, Prince Infocity 1, Old Mahabalipuram Road, 50,1st Street, Kandhanchavadi, Perungudi, Chennai, Tamil Nadu - 600096

CloudifyOps Inc.,
200, Continental Dr Suite 401,
Newark, Delaware 19713,
United States of America

Contact us

sales@cloudifyops.com

Copyright 2025 CloudifyOps. All Rights Reserved