Optimizing Kubernetes Workloads with Keda: Custom Metric-Driven Pod Autoscaling

Our customer, a media advertising company, ran a data-processing application with sufficient resources to handle its request load. The Kubernetes pods used Horizontal Pod Autoscaling (HPA) based on CPU and memory utilization. The customer wanted to scale the pods of a deployment based on the number of incoming requests, similar to the native AWS auto-scaling behavior driven by "requests per target group". Ideally, the solution would leverage existing CloudWatch metrics rather than modifying the application code base to expose new metrics to be scraped.

The CloudifyOps solution was to incorporate Kubernetes Event-Driven Autoscaling (Keda) to achieve this type of autoscaling. Keda acts as a Kubernetes metrics server and lets users define autoscaling rules through a dedicated Kubernetes custom resource definition.

Architecture

Keda provides several capabilities that enhance the way Kubernetes clusters handle event-driven workloads:

Event-Driven Scaling: It enables automatic scaling of Kubernetes pods based on the number of events in event sources such as message queues (e.g., Azure Queue, RabbitMQ, Kafka), AWS CloudWatch events, and custom metrics. This means your application can scale dynamically in response to events.

Scaling to Zero: It allows your Kubernetes pods to scale down to zero when there are no incoming events. This capability is crucial for serverless and event-driven architectures, where resources should only be allocated when there’s actual work to be done. Scaling to zero helps save costs and resources during idle periods.

Integration with Various Workloads: It can scale various types of Kubernetes workloads, including Deployments, StatefulSets, and other controllers. This flexibility ensures that different types of applications can benefit from event-driven autoscaling.

Metrics Aggregation: Keda aggregates metrics from event sources, enabling better monitoring and understanding of the workload patterns.

Deploying Keda

We can either clone the chart repository or add it as a Helm repository.

git clone -b v2.10.0 https://github.com/kedacore/charts.git
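
Alternatively, the upstream chart repository can be added to Helm directly:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update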

We are using Keda v2.10.0 since we are running this on a Kubernetes 1.23 cluster. We need an IAM role with CloudWatch read access, and the Keda operator's service account must be added to the trust relationship of that role (IAM Roles for Service Accounts, or IRSA).
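
As a minimal sketch of that trust relationship, assuming an EKS cluster with an OIDC provider and the keda-operator service account in the keda namespace (the account ID, region, and OIDC provider ID below are placeholders; the role also needs a policy granting CloudWatch read access, such as cloudwatch:GetMetricData):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::account-id:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:keda:keda-operator"
        }
      }
    }
  ]
}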

Edit the values.yaml:

Give a name to the service account:

serviceAccount.name: keda-operator

Give the ARN of the role that has CloudWatch access:

aws.irsa.roleArn: "arn:aws:iam::account-id:role/role-name"
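
Put together, a minimal sketch of the relevant values.yaml section might look like the following, assuming the chart's podIdentity.aws.irsa keys (verify the exact key names for your chart version):

serviceAccount:
  name: keda-operator
podIdentity:
  aws:
    irsa:
      enabled: true
      roleArn: "arn:aws:iam::account-id:role/role-name"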

Now install Keda with the Helm command (run from the chart directory): helm install keda ./ --namespace keda --create-namespace
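
Once the release is installed, verify that the operator pods are running in the keda namespace:

kubectl get pods -n keda

You should see pods such as keda-operator and keda-operator-metrics-apiserver in a Running state.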

Scaling of Deployments and StatefulSets

Keda allows you to define the Kubernetes Deployment or StatefulSet that you want it to scale based on a scale trigger. Keda monitors that resource and, based on the events that occur, automatically scales it out or in accordingly.

Now we can write the ScaledObject manifest, where we add the trigger (aws-cloudwatch in our case) and the configuration Keda uses to scale up and down.

ScaledObject spec

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aws-cloudwatch-keda-scaledobject
  namespace: dash-cft-dev
spec:
  scaleTargetRef:
    name: backend
  minReplicaCount: 2   # Never scale below 2 replicas
  maxReplicaCount: 25  # Never scale above 25 replicas
  pollingInterval: 10  # How frequently to check each trigger (in seconds)
  cooldownPeriod: 60   # How many seconds to wait before scaling back down
  fallback:                              # Optional. Fallback options
    failureThreshold: 3                  # Consecutive failures to fetch metrics before falling back. Mandatory if fallback is included
    replicas: 5
  advanced:                              # Optional. Advanced options
    restoreToOriginalReplicaCount: true  # Restore the replica count from the deployment file if the ScaledObject is deleted
    horizontalPodAutoscalerConfig:       # Optional. HPA-related options
      name: keda-hpa-be                  # Optional. Default: keda-hpa-{scaled-object-name}
      behavior:                          # Optional. Use to modify the HPA's scaling behavior
        scaleDown:
          stabilizationWindowSeconds: 60
          policies:
          - type: Percent
            value: 100
            periodSeconds: 10
  triggers:
  - type: memory
    metricType: AverageValue # Allowed types are 'Utilization' or 'AverageValue'
    metadata:
      value: "1434Mi"
  - type: cpu
    metricType: AverageValue # Allowed types are 'Utilization' or 'AverageValue'
    metadata:
      value: "500m"
  - type: aws-cloudwatch
    metadata:
      # Required: namespace
      namespace: AWS/ApplicationELB
      # Expression with an AND condition on both the load balancer and the target group:
      # expression: SELECT COUNT(RequestCountPerTarget) FROM SCHEMA("AWS/ApplicationELB", LoadBalancer,TargetGroup) WHERE TargetGroup = 'targetgroup/k8s-tg-name' AND LoadBalancer = 'app/k8s-lb-name'
      # Expression with the load balancer only:
      expression: SELECT COUNT(RequestCountPerTarget) FROM SCHEMA("AWS/ApplicationELB", LoadBalancer,TargetGroup) WHERE LoadBalancer = 'app/k8s-lb-name'
      metricName: RequestCountPerTarget
      targetMetricValue: "300"
      minMetricValue: "6"
      # Required: region
      awsRegion: "us-west-2"
      identityOwner: operator    # Optional. Default: pod
      # Optional: metric collection time
      metricCollectionTime: "60" # default 300
      # Optional: metric statistic
      metricStat: "Average"      # default "Average"; SampleCount is the number of data points during the period
      # Optional: metric statistic period
      metricStatPeriod: "30"     # default 300
      # Optional: metric unit
      metricUnit: "Count"        # default ""
      # Optional: metric end-time offset
      # metricEndTimeOffset: "30" # default 0

  • spec.scaleTargetRef.name: The name of the target Deployment or StatefulSet that needs to scale.
  • pollingInterval: The interval on which each trigger is checked.
  • cooldownPeriod: The period to wait after the last trigger reported active before scaling the resource back to 0. By default, it is 5 minutes (300 seconds).
  • minReplicaCount: The minimum number of replicas Keda will scale the resource down to. By default, it scales to zero, but any other value can be used as well.
  • maxReplicaCount: This setting is passed to the HPA definition that Keda creates for the resource and holds the maximum number of replicas of the target resource.
  • fallback: Optional. It defines how many replicas to fall back to if a scaler is in an error state.
  • advanced.restoreToOriginalReplicaCount: Specifies whether the target resource (Deployment, StatefulSet) should be scaled back to its original replica count after the ScaledObject is deleted.
  • advanced.horizontalPodAutoscalerConfig.name: The name of the HPA that Keda will spin up.
  • advanced.horizontalPodAutoscalerConfig.behavior: The autoscaling API allows scaling behavior to be configured through the HPA behavior field. This way, one can directly influence the scaling of 1<->N replicas, which is handled internally by the HPA.

Under spec.triggers, we can define multiple triggers. In our case, these were based on CPU utilization, memory utilization, and request count.
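
Once the manifest is applied, Keda creates the underlying HPA, which can be inspected like any other HPA; with multiple triggers, it scales out on whichever trigger proposes the highest replica count. For example, assuming the manifest is saved as scaledobject.yaml:

kubectl apply -f scaledobject.yaml
kubectl get hpa keda-hpa-be -n dash-cft-dev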

Load Test

There are multiple ways to generate load, Locust being one popular tool. Here we will look at a simple approach: hitting the application's DNS endpoint at a rate above the configured threshold value. To do this, we use a Python script like the following:

import requests
import threading
import time

url = "Enter your endpoint"  # e.g. the DNS name of the load balancer
requests_per_minute = 300    # keep this above the targetMetricValue threshold
delay = 60 / requests_per_minute  # spacing between requests, in seconds

def send_request():
    try:
        response = requests.get(url)
        print(f"Request sent. Status code: {response.status_code}")
    except requests.RequestException as e:
        print(f"Request failed: {e}")

def send_requests():
    while True:
        threads = []
        for _ in range(requests_per_minute):
            thread = threading.Thread(target=send_request)
            threads.append(thread)
            thread.start()
            time.sleep(delay)  # space the requests evenly across each minute
        for thread in threads:
            thread.join()

if __name__ == "__main__":
    send_requests()

Once you hit the URL at the request rate set in this script, the pods will scale up and down as required.
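
While the script is running, you can watch the HPA and the pods react in another terminal (names taken from the spec above):

kubectl get hpa keda-hpa-be -n dash-cft-dev -w
kubectl get pods -n dash-cft-dev -w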

To learn more about our containerization services, write to us today at sales@cloudifyops.com.
