Our customer, a media advertising company, runs a data-processing application with sufficient resources to handle its request load. The Kubernetes pods were configured for Horizontal Pod Autoscaling (HPA) based on CPU and memory utilization. The customer wanted to scale the pods based on the number of requests coming in to the deployment, similar to native AWS auto-scaling behavior driven by "requests per target group". Ideally, the solution should leverage existing CloudWatch metrics rather than modify the application code base to expose new metrics that can then be scraped.
The CloudifyOps solution was to adopt Kubernetes Event-Driven Autoscaling, or Keda, to achieve this type of autoscaling. Keda serves as a Kubernetes metrics server and lets users define autoscaling rules through a dedicated Kubernetes custom resource definition.
Keda provides several capabilities that enhance the way Kubernetes clusters handle event-driven workloads:
Event-Driven Scaling: It enables automatic scaling of Kubernetes pods based on the number of events in event sources such as message queues (e.g., Azure Queue, RabbitMQ, Kafka), AWS CloudWatch events, and custom metrics. This means your application can scale dynamically in response to events.
Scaling to Zero: It allows your Kubernetes pods to scale down to zero when there are no incoming events. This capability is crucial for serverless and event-driven architectures, where resources should only be allocated when there’s actual work to be done. Scaling to zero helps save costs and resources during idle periods.
Integration with Various Workloads: It can scale various types of Kubernetes workloads, including Deployments, StatefulSets, and other controllers. This flexibility ensures that different types of applications can benefit from event-driven autoscaling.
Metrics Aggregation: Keda aggregates metrics from event sources, enabling better monitoring and understanding of the workload patterns.
We can either clone the chart repository or add it as a Helm repo.
git clone -b v2.10.0 https://github.com/kedacore/charts.git
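Alternatively, instead of cloning, the chart repository can be registered with Helm directly:

```shell
# Add the public KEDA chart repository and refresh the local index
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
```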
We are using Keda v2.10.0 because this cluster runs Kubernetes 1.23. We also need an IAM role with CloudWatch read access, with the Keda service account added to the role's trust relationship (IRSA).
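The trust relationship on that role looks roughly like the following sketch; the account ID, OIDC provider ID, and the namespace/service-account names are placeholders to be replaced with your own values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::account-id:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:keda:keda-operator"
        }
      }
    }
  ]
}
```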
Edit the values.yaml:
Give a name to the service account:
serviceAccount.name: keda-operator
Provide the ARN of the role that has CloudWatch read access:
aws.irsa.roleArn: "arn:aws:iam::account-id:role/role-name"
Now install Keda with the Helm command: helm install keda ./ --namespace keda
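A quick sanity check after installation (assuming the keda namespace used above) is to verify the operator pods are running and the external metrics API that Keda registers is available:

```shell
# Operator and metrics-apiserver pods should be Running
kubectl get pods -n keda
# Keda registers the external metrics API service
kubectl get apiservice v1beta1.external.metrics.k8s.io
```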
Keda allows you to define the Kubernetes Deployment or StatefulSet that you want it to scale based on a scale trigger. Keda monitors that service and, based on the events that occur, automatically scales your resource out or in accordingly.
Now we can write a ScaledObject manifest, where we add the trigger (aws-cloudwatch) and the configuration Keda uses to scale up and scale down.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aws-cloudwatch-keda-scaledobject
  namespace: dash-cft-dev
spec:
  scaleTargetRef:
    name: backend
  minReplicaCount: 2    # Minimum number of replicas
  maxReplicaCount: 25   # We don't want more than 25 replicas
  pollingInterval: 10   # How frequently to poll for metrics (in seconds)
  cooldownPeriod: 60    # How many seconds to wait before scaling down
  fallback:             # Optional. Section to specify fallback options
    failureThreshold: 3 # Keda tracks the number of consecutive times each scaler fails to get metrics from its source
    replicas: 5         # Mandatory if the fallback section is included
  advanced:             # Optional. Section to specify advanced options
    restoreToOriginalReplicaCount: true # true to restore the replica count from the deployment file if the ScaledObject is deleted
    horizontalPodAutoscalerConfig:      # Optional. Section to specify HPA-related options
      name: keda-hpa-be                 # Optional. Default: keda-hpa-{scaled-object-name}
      behavior:                         # Optional. Used to modify the HPA's scaling behavior
        scaleDown:
          stabilizationWindowSeconds: 60
          policies:
            - type: Percent
              value: 100
              periodSeconds: 10
  triggers:
    - type: memory
      metricType: AverageValue # Allowed types are 'Utilization' or 'AverageValue'
      metadata:
        value: "1434Mi"
    - type: cpu
      metricType: AverageValue # Allowed types are 'Utilization' or 'AverageValue'
      metadata:
        value: "500m"
    - type: aws-cloudwatch
      metadata:
        namespace: AWS/ApplicationELB # Required: CloudWatch namespace
        # Expression with an AND condition on both load balancer and target group:
        # expression: SELECT COUNT(RequestCountPerTarget) FROM SCHEMA("AWS/ApplicationELB", LoadBalancer,TargetGroup) WHERE TargetGroup = 'targetgroup/k8s-tg-name' AND LoadBalancer = 'app/k8s-lb-name'
        # Expression filtering on the load balancer only:
        expression: SELECT COUNT(RequestCountPerTarget) FROM SCHEMA("AWS/ApplicationELB", LoadBalancer,TargetGroup) WHERE LoadBalancer = 'app/k8s-lb-name'
        metricName: RequestCountPerTarget
        targetMetricValue: "300"
        minMetricValue: "6"
        awsRegion: "us-west-2"      # Required: region
        identityOwner: operator     # Optional. Default: pod
        metricCollectionTime: "60"  # Optional. Default: 300
        metricStat: "Average"       # Optional. Default: "Average"; SampleCount is the number of data points during the period
        metricStatPeriod: "30"      # Optional. Default: 300
        metricUnit: "Count"         # Optional. Default: ""
        # metricEndTimeOffset: "30" # Optional. Default: 0
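Assuming the manifest is saved as scaledobject.yaml, it can be applied and inspected like this (the namespace matches the one in the manifest):

```shell
kubectl apply -f scaledobject.yaml
# The ScaledObject and the HPA that Keda creates from it
kubectl get scaledobject -n dash-cft-dev
kubectl get hpa -n dash-cft-dev
```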
The restoreToOriginalReplicaCount property specifies whether the target resource (Deployment, StatefulSet) should be scaled back to its original replica count after the ScaledObject is deleted.
Under spec.triggers, we can define multiple triggers. In our case, triggers were added for CPU utilization, memory utilization, and request count.
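When multiple triggers are defined, the HPA that Keda creates evaluates each metric independently and scales to the largest result, using the standard HPA formula desiredReplicas = ceil(currentReplicas × currentValue / targetValue). A small sketch of that arithmetic, using illustrative metric readings (not values taken from the cluster):

```python
import math

def desired_replicas(current_replicas, current_value, target_value):
    """Standard HPA scaling formula."""
    return math.ceil(current_replicas * current_value / target_value)

# Illustrative current readings against the three trigger targets above
current_replicas = 2
per_trigger = {
    "cpu":            desired_replicas(current_replicas, 900, 500),    # 900m observed vs 500m target -> 4
    "memory":         desired_replicas(current_replicas, 1100, 1434),  # 1100Mi observed vs 1434Mi target -> 2
    "aws-cloudwatch": desired_replicas(current_replicas, 750, 300),    # 750 requests vs 300 target -> 5
}

# The HPA scales to the largest of the per-trigger results
print(max(per_trigger.values()))  # -> 5
```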
There are multiple ways to generate test load, Locust being a popular example. Here we use a simple approach: hitting the DNS endpoint at a rate above the configured threshold. To do this, we have a Python script like this:
import requests
import threading
import time

url = "Enter your endpoint"
requests_per_minute = 300
delay = 60 / requests_per_minute

def send_request():
    try:
        response = requests.get(url)
        print(f"Request sent. Status code: {response.status_code}")
    except requests.RequestException as e:
        print(f"Request failed: {e}")

def send_requests():
    while True:
        threads = []
        for _ in range(requests_per_minute):
            thread = threading.Thread(target=send_request)
            threads.append(thread)
            thread.start()
            time.sleep(delay)  # spread the requests evenly across the minute
        for thread in threads:
            thread.join()

if __name__ == "__main__":
    send_requests()
Once the script drives traffic to the URL at the configured request rate, the pods will scale up and back down as required.
To learn more about our containerization services, write to us today at sales@cloudifyops.com.
Copyright 2024 CloudifyOps. All Rights Reserved