Our customer, a leading fintech business in the micro banking sector, recently migrated their infrastructure to Amazon Elastic Kubernetes Service (EKS) for managing their containerized applications. As their business expanded rapidly, they faced challenges during node termination in AWS EKS, resulting in service disruptions and data loss.
After thorough research and analysis, our team discovered that a Node Termination Handler (NTH) could effectively handle the graceful shutdown of nodes in AWS EKS. The NTH would monitor node health, ensure proper pod eviction, and coordinate load balancer redirection for a seamless transition.
Graceful node termination with NTH is useful whenever nodes in a Kubernetes cluster must shut down in a smooth, controlled way, so that running pods are evicted before the underlying instance goes away.
The AWS NTH project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instances to become unavailable, such as EC2 maintenance events, EC2 Spot interruptions, ASG Scale-In, ASG AZ Rebalance, and EC2 instance termination via the API or Console.
The AWS Node Termination Handler (aws-node-termination-handler, NTH) can operate in two different modes: Instance Metadata Service (IMDS) and Queue Processor.
Instance Metadata Service (IMDS)
In IMDS mode, NTH must be deployed as a Kubernetes DaemonSet. The installation adds a ServiceAccount, ClusterRole, ClusterRoleBinding, and DaemonSet to your cluster. All four of these Kubernetes objects are required for the termination handler to run properly.
In this mode, NTH monitors the EC2 instance metadata for Spot Instance termination notifications, scheduled events, and instance rebalance recommendations.
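To make the IMDS monitoring path more concrete, here is a minimal Python sketch of how a Spot interruption notice could be interpreted. It is not NTH's actual implementation. It assumes the JSON document that IMDS serves at /latest/meta-data/spot/instance-action once an interruption is scheduled, and the function name parse_instance_action is hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_instance_action(body: str, now: datetime) -> tuple[str, float]:
    """Return (action, seconds remaining) from a Spot instance-action document."""
    doc = json.loads(body)
    # IMDS reports the deadline as a UTC timestamp such as 2017-09-18T08:22:00Z.
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return doc["action"], (deadline - now).total_seconds()


# Sample document (assumed shape); a real handler would fetch it from IMDS.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)

action, seconds_left = parse_instance_action(sample, now)
print(action, seconds_left)  # terminate 120.0
```

The remaining seconds are the window in which the node has to be cordoned and drained, which is why the handler reacts as soon as the notice appears.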
Helm install
helm upgrade --install aws-node-termination-handler --namespace kube-system --set enableSpotInterruptionDraining="true" --set enableRebalanceMonitoring="true" --set enableScheduledEventDraining="false" oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION
The enable* configuration flags above enable or disable IMDS monitoring paths.
Running Only On Specific Nodes:
helm upgrade --install aws-node-termination-handler --namespace kube-system --set nodeSelector.lifecycle=spot oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION
Webhook Configuration:
helm upgrade --install aws-node-termination-handler --namespace kube-system --set webhookURL=https://hooks.slack.com/services/YOUR/SLACK/URL oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION
Queue Processor
The workflow (shown in the diagram) consists of the following high-level steps:
You will need the following AWS infrastructure components:
Prerequisites:
An IAM policy that grants NTH the permissions it needs:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:CompleteLifecycleAction",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeTags",
        "ec2:DescribeInstances",
        "sqs:DeleteMessage",
        "sqs:ReceiveMessage"
      ],
      "Resource": "*"
    }
  ]
}
A trust policy for the IAM role, so that the NTH service account can assume it through the cluster's OIDC provider:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<Account ID>:oidc-provider/<OIDC URL>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<OIDC URL>:sub": "system:serviceaccount:kube-system:<Service Account Name>",
          "<OIDC URL>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
An access policy for the SQS queue that allows EventBridge and SQS to deliver messages to it:
{
  "Version": "2012-10-17",
  "Id": "__default_policy_ID",
  "Statement": [
    {
      "Sid": "__owner_statement",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "sqs.amazonaws.com",
          "events.amazonaws.com"
        ]
      },
      "Action": "SQS:*",
      "Resource": "<SQS ARN>"
    }
  ]
}
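The role trust policy among the prerequisites has three placeholders (account ID, OIDC URL, service account name) that are easy to mistype by hand. A small Python sketch like the following could render it from parameters. The helper build_trust_policy and all sample values are hypothetical and only for illustration.

```python
import json


def build_trust_policy(account_id: str, oidc_url: str, service_account: str,
                       namespace: str = "kube-system") -> dict:
    """Render the IAM role trust policy for a Kubernetes service account."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_url}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_url}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_url}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# Placeholder values (assumptions); substitute your own account and cluster details.
policy = build_trust_policy(
    account_id="111122223333",
    oidc_url="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    service_account="aws-node-termination-handler",
)
print(json.dumps(policy, indent=2))
```

Note that the OIDC URL is used without the https:// prefix, matching the placeholder format in the policy above.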
Set the SQS queue URL in values-aws.yaml as queueURL: <SQS Queue URL>
The following AWS CLI command creates a termination lifecycle hook on an existing ASG when using EventBridge.
Replace my-k8s-asg in --auto-scaling-group-name=my-k8s-asg with the name of your existing ASG.
aws autoscaling put-lifecycle-hook --lifecycle-hook-name=my-k8s-term-hook --auto-scaling-group-name=my-k8s-asg --lifecycle-transition=autoscaling:EC2_INSTANCE_TERMINATING --default-result=CONTINUE --heartbeat-timeout=300
Next, tag the ASGs and propagate the tags to your instances. By default, the aws-node-termination-handler only manages terminations for instances tagged with the key aws-node-termination-handler/managed. The value of the tag does not matter.
Replace my-auto-scaling-group in ResourceId=my-auto-scaling-group with the name of your Auto Scaling group.
aws autoscaling create-or-update-tags --tags ResourceId=my-auto-scaling-group,ResourceType=auto-scaling-group,Key=aws-node-termination-handler/managed,Value=,PropagateAtLaunch=true
To tag an individual EC2 instance:
Replace the value of --resources with your instance ID.
aws ec2 create-tags --resources i-12**********f0 --tags 'Key="aws-node-termination-handler/managed",Value='
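The tag check described above (only the key matters, not the value) can be sketched in Python as follows. This is an illustrative assumption about the logic, not NTH's source code. The tag list uses the Key/Value shape that the EC2 API returns, and is_managed is a hypothetical helper.

```python
MANAGED_TAG_KEY = "aws-node-termination-handler/managed"


def is_managed(tags: list[dict]) -> bool:
    """Return True if any tag carries the managed key, regardless of its value."""
    return any(tag.get("Key") == MANAGED_TAG_KEY for tag in tags)


# Hypothetical tag sets for two instances.
tagged = [{"Key": "Name", "Value": "worker-1"}, {"Key": MANAGED_TAG_KEY, "Value": ""}]
untagged = [{"Key": "Name", "Value": "worker-2"}]

print(is_managed(tagged), is_managed(untagged))  # True False
```

An empty value, as set by the commands above, is therefore enough to opt an instance in.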
Create Amazon EventBridge rules so that ASG termination events, Spot Interruptions, Instance state changes, Rebalance Recommendations, and AWS Health Scheduled Changes are sent to the SQS queue.
Replace <SQS ARN> in each --targets argument with the ARN of your SQS queue. The event patterns are wrapped in single quotes so that the shell passes the JSON through unchanged.
aws events put-rule --name MyK8sASGTermRule --event-pattern '{"source":["aws.autoscaling"],"detail-type":["EC2 Instance-terminate Lifecycle Action"]}'
aws events put-targets --rule MyK8sASGTermRule --targets "Id"="1","Arn"="<SQS ARN>"
aws events put-rule --name MyK8sSpotTermRule --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning"]}'
aws events put-targets --rule MyK8sSpotTermRule --targets "Id"="1","Arn"="<SQS ARN>"
aws events put-rule --name MyK8sRebalanceRule --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Instance Rebalance Recommendation"]}'
aws events put-targets --rule MyK8sRebalanceRule --targets "Id"="1","Arn"="<SQS ARN>"
aws events put-rule --name MyK8sInstanceStateChangeRule --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Instance State-change Notification"]}'
aws events put-targets --rule MyK8sInstanceStateChangeRule --targets "Id"="1","Arn"="<SQS ARN>"
aws events put-rule --name MyK8sScheduledChangeRule --event-pattern '{"source":["aws.health"],"detail-type":["AWS Health Event"],"detail":{"service":["EC2"],"eventTypeCategory":["scheduledChange"]}}'
aws events put-targets --rule MyK8sScheduledChangeRule --targets "Id"="1","Arn"="<SQS ARN>"
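Because shell quoting of JSON event patterns is easy to get wrong, one option is to generate the commands instead of typing them. The Python sketch below builds two of the rules above with json.dumps and shlex.quote. The helper names and the sample queue ARN are hypothetical, and the script only prints the commands; it does not call AWS.

```python
import json
import shlex

# Event patterns copied from two of the EventBridge rules above.
RULES = {
    "MyK8sASGTermRule": {
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Instance-terminate Lifecycle Action"],
    },
    "MyK8sSpotTermRule": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Spot Instance Interruption Warning"],
    },
}


def put_rule_command(name: str, pattern: dict) -> str:
    """Build a put-rule command with the JSON pattern safely shell-quoted."""
    return "aws events put-rule --name {} --event-pattern {}".format(
        shlex.quote(name), shlex.quote(json.dumps(pattern))
    )


def put_targets_command(name: str, queue_arn: str) -> str:
    """Build the matching put-targets command using the CLI shorthand syntax."""
    target = f"Id=1,Arn={queue_arn}"
    return f"aws events put-targets --rule {shlex.quote(name)} --targets {shlex.quote(target)}"


# Placeholder ARN (assumption); use the ARN of your own queue.
queue_arn = "arn:aws:sqs:us-east-1:111122223333:my-nth-queue"
commands = []
for rule_name, rule_pattern in RULES.items():
    commands.append(put_rule_command(rule_name, rule_pattern))
    commands.append(put_targets_command(rule_name, queue_arn))

print("\n".join(commands))
```

The remaining rules can be added to the RULES dictionary in the same way.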
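Finally, install NTH with Helm using the updated values file. This assumes the eks chart repository has already been added with helm repo add.
helm install aws-node-termination-handler eks/aws-node-termination-handler -f values-aws.yaml --namespace kube-system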
If you would like to read more about our work, check out the Case Studies page on our website. To know how we can help you, write to us at sales@cloudifyops.com.
Copyright 2024 CloudifyOps. All Rights Reserved