Our customer, whose EKS infrastructure was set up by the CloudifyOps team, had a fluctuating load on their worker nodes. The cluster was provisioned for peak load and remained largely under-utilized during off-peak hours, incurring unnecessary costs. The EKS cluster had two node groups that needed to be scaled independently. With the growth in their business, the customer was also concerned about the latency and availability of their services.
In today’s dynamic cloud environment, scaling Kubernetes clusters to meet varying workload demands is a critical challenge for cloud engineers. While Kubernetes offers built-in scaling capabilities, there are situations where more advanced auto-scaling solutions are needed.
Our solution was to leverage Karpenter and the scaling solutions it offers. This blog takes you through the steps involved in setting up Karpenter and testing it.
Why do we need Karpenter and its Custom Resource Definitions (CRDs)? Aren’t the built-in Kubernetes auto-scaling solutions enough?
To better understand Karpenter auto-scaling, let us compare the built-in auto-scaling solutions from Kubernetes against the auto-scaling features of Karpenter.
Karpenter provides definite advantages when the problem involves auto-scaling worker nodes and supporting custom metrics.
With its dynamic capacity provisioning and advanced customization features, Karpenter allowed us to cut our client’s infrastructure costs (by up to 30%), reduce downtime (by up to 80%), reduce manual intervention, and improve resource utilization and workload distribution (by up to 25%). In addition, it gave our client the option of auto-scaling based on custom metrics.
The steps below to set up Karpenter are specific to version v0.27.0. Since newer versions of Karpenter are released regularly, the install/upgrade steps may differ from those below; please check the official documentation for the latest version and installation steps. Karpenter should be set up after the EKS cluster has been created.
The IAM permissions and roles required by Karpenter are created as follows. First, set a few environment variables:
CLUSTER_NAME=my-eks-cluster
AWS_PARTITION="aws" # if you are using standard partitions
AWS_REGION="us-east-1"
OIDC_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} \
  --query "cluster.identity.oidc.issuer" --output text)"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
echo '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}' > node-trust-policy.json
aws iam create-role --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
  --assume-role-policy-document file://node-trust-policy.json

aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy

aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy

aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

aws iam create-instance-profile \
  --instance-profile-name "KarpenterNodeInstanceProfile-${CLUSTER_NAME}"

aws iam add-role-to-instance-profile \
  --instance-profile-name "KarpenterNodeInstanceProfile-${CLUSTER_NAME}" \
  --role-name "KarpenterNodeRole-${CLUSTER_NAME}"
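As an optional sanity check (not part of the original steps), you can confirm that the instance profile now carries the node role:

aws iam get-instance-profile \
  --instance-profile-name "KarpenterNodeInstanceProfile-${CLUSTER_NAME}" \
  --query 'InstanceProfile.Roles[].RoleName' --output text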
cat << EOF > controller-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT#*//}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_ENDPOINT#*//}:aud": "sts.amazonaws.com",
          "${OIDC_ENDPOINT#*//}:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
EOF
aws iam create-role --role-name KarpenterControllerRole-${CLUSTER_NAME} \
  --assume-role-policy-document file://controller-trust-policy.json
cat << EOF > controller-policy.json
{
  "Statement": [
    {
      "Action": [
        "ssm:GetParameter",
        "ec2:DescribeImages",
        "ec2:RunInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ec2:DeleteLaunchTemplate",
        "ec2:CreateTags",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:DescribeSpotPriceHistory",
        "pricing:GetProducts"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "Karpenter"
    },
    {
      "Action": "ec2:TerminateInstances",
      "Condition": {
        "StringLike": {
          "ec2:ResourceTag/karpenter.sh/provisioner-name": "*"
        }
      },
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "ConditionalEC2Termination"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}",
      "Sid": "PassNodeIAMRole"
    },
    {
      "Effect": "Allow",
      "Action": "eks:DescribeCluster",
      "Resource": "arn:${AWS_PARTITION}:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${CLUSTER_NAME}",
      "Sid": "EKSClusterEndpointLookup"
    }
  ],
  "Version": "2012-10-17"
}
EOF
aws iam put-role-policy --role-name KarpenterControllerRole-${CLUSTER_NAME} \
  --policy-name KarpenterControllerPolicy-${CLUSTER_NAME} \
  --policy-document file://controller-policy.json
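Optionally, confirm the inline policy is now attached to the controller role:

aws iam list-role-policies --role-name KarpenterControllerRole-${CLUSTER_NAME} --output text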
Next, tag the subnets and security groups that Karpenter should use with the key/value pair below. This tag will also be added to the provisioner file later, so Karpenter knows which subnets and security groups to use for the instances it spins up.

"karpenter.sh/discovery" = "my-eks-cluster"
for NODEGROUP in $(aws eks list-nodegroups --cluster-name ${CLUSTER_NAME} \
  --query 'nodegroups' --output text); do
  aws ec2 create-tags \
    --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}" \
    --resources $(aws eks describe-nodegroup --cluster-name ${CLUSTER_NAME} \
    --nodegroup-name $NODEGROUP --query 'nodegroup.subnets' --output text)
done
NODEGROUP=$(aws eks list-nodegroups --cluster-name ${CLUSTER_NAME} \
  --query 'nodegroups[0]' --output text)

LAUNCH_TEMPLATE=$(aws eks describe-nodegroup --cluster-name ${CLUSTER_NAME} \
  --nodegroup-name ${NODEGROUP} --query 'nodegroup.launchTemplate.{id:id,version:version}' \
  --output text | tr -s "\t" ",")
# If your EKS setup is configured to use only the cluster security group, execute:
SECURITY_GROUPS=$(aws eks describe-cluster \
  --name ${CLUSTER_NAME} --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text)
# If your setup uses the security groups in the launch template of a managed node group, then:
SECURITY_GROUPS=$(aws ec2 describe-launch-template-versions \
  --launch-template-id ${LAUNCH_TEMPLATE%,*} --versions ${LAUNCH_TEMPLATE#*,} \
  --query 'LaunchTemplateVersions[0].LaunchTemplateData.[NetworkInterfaces[0].Groups||SecurityGroupIds]' \
  --output text)
aws ec2 create-tags \
  --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}" \
  --resources ${SECURITY_GROUPS}
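Optionally, verify that the discovery tag landed on the expected resources; for example, for the subnets:

aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
  --query 'Subnets[].SubnetId' --output text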
The steps involved are: create the karpenter namespace, update the aws-auth ConfigMap so that nodes using the KarpenterNodeRole can join the cluster, and then install the Karpenter controller with Helm.
kubectl create ns karpenter
kubectl edit configmap aws-auth -n kube-system
Add the following entry under mapRoles, substituting the actual account ID and cluster name (shell variables are not expanded inside the ConfigMap):

- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}
  username: system:node:{{EC2PrivateDNSName}}
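To confirm the entry was saved, you can inspect the ConfigMap again:

kubectl get configmap aws-auth -n kube-system -o yaml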
export KARPENTER_VERSION=v0.27.0

helm template karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} --namespace karpenter \
  --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --set settings.aws.clusterName=${CLUSTER_NAME} \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi > karpenter.yaml
Edit the generated karpenter.yaml and set the controller deployment's node affinity so that it runs only on the existing node group nodes:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: karpenter.sh/provisioner-name
          operator: DoesNotExist
      - matchExpressions:
        - key: eks.amazonaws.com/nodegroup
          operator: In
          values:
          - ${NODEGROUP1}
          - ${NODEGROUP2}
kubectl create -f https://raw.githubusercontent.com/aws/karpenter/${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_provisioners.yaml
kubectl create -f https://raw.githubusercontent.com/aws/karpenter/${KARPENTER_VERSION}/pkg/apis/crds/karpenter.k8s.aws_awsnodetemplates.yaml
kubectl apply -f karpenter.yaml
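Once applied, verify that the Karpenter controller pods are running in the karpenter namespace:

kubectl get pods -n karpenter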
The two CRD YAML files above can be downloaded from the URLs listed below. AWS and Karpenter recommend against modifying or customizing these CRD files, as they are API resource definition files.
aws/karpenter/v0.27.0/pkg/apis/crds/karpenter.sh_provisioners.yaml
aws/karpenter/v0.27.0/pkg/apis/crds/karpenter.k8s.aws_awsnodetemplates.yaml
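As an optional check that both CRDs are registered with the API server:

kubectl get crd provisioners.karpenter.sh awsnodetemplates.karpenter.k8s.aws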
# This provisioner will provision general-purpose (t3a) instances
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: general-purpose
spec:
  consolidation:
    enabled: true
  requirements:
    # Include general-purpose instance families
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: [t3a]
    # Allow only these instance sizes
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values: [xlarge, medium]
    - key: karpenter.sh/capacity-type
      operator: In
      values: [on-demand]
    - key: topology.kubernetes.io/zone
      operator: In
      values: [us-east-1a, us-east-1b, us-east-1c]
  providerRef:
    name: default
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: my-eks-cluster # replace with your cluster name
  securityGroupSelector:
    karpenter.sh/discovery: my-eks-cluster # replace with your cluster name
  detailedMonitoring: true # boolean value
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 75Gi
        volumeType: gp3
        iops: 3000
        encrypted: true
        kmsKeyID: "arn:aws:kms:us-east-1:xxxxxxxxxxxx:key/14ce1495-9686-4518-9904-159c25608a29"
        deleteOnTermination: true
  amiSelector:
    aws-ids: ami-xxxxxxxxxxxxx
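Save the two manifests above to a file (for example, provisioner.yaml; the filename is our choice) and apply it:

kubectl apply -f provisioner.yaml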
Check the controller logs to confirm there are no errors and that Karpenter creates nodes for pending workloads:
kubectl logs -f -n karpenter -c controller -l app.kubernetes.io/name=karpenter
Now, to test the Karpenter setup, we created test deployments whose replica pods requested more CPU and memory than the cluster had free. This saturates the existing cluster resources, and Karpenter creates additional worker nodes in response to the pods that the Kubernetes scheduler marks as unschedulable. We also created additional provisioners to exercise other configuration options in the provisioner file. The test cases and their results are captured in the table below.
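For reference, a minimal sketch of such a test deployment, following the common Karpenter demo pattern (the inflate name, pause image, and request size here are illustrative, not the exact manifests we used):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7 # tiny no-op container; only the CPU request matters
          resources:
            requests:
              cpu: 1

Scaling it beyond the cluster's free capacity leaves pods unschedulable and should trigger Karpenter to provision nodes:

kubectl scale deployment inflate --replicas 10
kubectl get nodes -w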
In conclusion, Karpenter emerges as a powerful and reliable auto-scaling solution for managing workloads on Amazon Elastic Kubernetes Service (EKS). Its robust capabilities offer numerous quantifiable benefits, enhancing the efficiency and scalability of EKS clusters. By leveraging Karpenter, organizations can dynamically provision and scale the underlying infrastructure to match workload demands, resulting in optimized resource utilization and improved application performance.
One of the key advantages of Karpenter is its intelligent capacity provisioning, which helps organizations save infrastructure costs by avoiding underutilization or overprovisioning.
Karpenter enhances the reliability and fault tolerance of EKS clusters and helps ensure the high availability of the applications running on them. Furthermore, Karpenter provides advanced features such as node customization, affinity, and anti-affinity rules. These features enable organizations to fine-tune their infrastructure to specific requirements, resulting in better performance and efficiency. Karpenter also removes much of the manual intervention and administrative overhead from the auto-scaling process.
By leveraging Karpenter, organizations can unlock the true value of their EKS clusters, drive operational excellence, and deliver exceptional user experiences while achieving tangible measures of cost reduction, scalability improvement, and resource optimization.