Our customer, a leading fintech business in the micro banking sector, recently migrated their infrastructure to Amazon Web Services (AWS) using Amazon Elastic Kubernetes Service (EKS) to efficiently manage their containerized applications. As their business expanded rapidly, they encountered challenges during node termination in AWS EKS, leading to service disruptions and potential data loss.
After thorough research and analysis, our team identified that implementing a Node Termination Handler (NTH) could effectively mitigate these risks. The NTH monitors node health, ensures proper pod eviction, and coordinates load balancer redirection to enable a seamless transition, minimizing downtime and enhancing Cloud Security.
By leveraging expertise from a Cloud Consulting Company, businesses can optimize AWS EKS operations, improve resilience, and enhance Cloud Security through proactive infrastructure management. This approach ensures seamless scaling while maintaining high availability and operational efficiency within Amazon Web Services (AWS) environments.
Graceful Node Termination with an NTH is used in scenarios where you want to ensure a smooth and controlled shutdown process for nodes in a Kubernetes cluster.
The AWS NTH project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instances to become unavailable, such as EC2 maintenance events, EC2 Spot-interruptions, ASG Scale-In, ASG AZ Rebalance, and EC2 Instance Termination via the API or Console.
The aws-node-termination-handler (NTH) can operate in two different modes: Instance Metadata Service (IMDS) Processor and Queue Processor.
Instance Metadata Service (IMDS)
In IMDS mode, NTH is deployed as a Kubernetes DaemonSet. The termination handler chart installs a ServiceAccount, ClusterRole, ClusterRoleBinding, and DaemonSet into your cluster; all four of these Kubernetes constructs are required for the termination handler to run properly.
In this mode, NTH monitors the EC2 Instance Metadata on each node for Spot Instance termination notices, scheduled maintenance events, and instance rebalance recommendations.
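For context, these signals surface as instance metadata endpoints that NTH polls on each node. A minimal manual check from a node might look like the following sketch (assuming IMDSv2 is enabled; each endpoint returns 404 until an event is actually pending):

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
# Spot Instance termination notice
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/spot/instance-action
# Scheduled maintenance events
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/events/maintenance/scheduled
# Rebalance recommendation
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/events/recommendations/rebalance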
Helm install
helm upgrade --install aws-node-termination-handler --namespace kube-system --set enableSpotInterruptionDraining="true" --set enableRebalanceMonitoring="true" --set enableScheduledEventDraining="false" oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION
The enable* configuration flags above enable or disable IMDS monitoring paths.
Running Only On Specific Nodes:
helm upgrade --install aws-node-termination-handler --namespace kube-system --set nodeSelector.lifecycle=spot oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION
Webhook Configuration:
helm upgrade --install aws-node-termination-handler --namespace kube-system --set webhookURL=https://hooks.slack.com/services/YOUR/SLACK/URL oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION
Queue Processor
The remaining setup applies to Queue Processor mode, in which NTH runs as a Deployment and consumes termination events from an Amazon SQS queue. The workflow consists of the following high-level steps:
1. Amazon EventBridge rules capture ASG termination lifecycle actions, Spot interruption warnings, rebalance recommendations, instance state changes, and AWS Health scheduled changes.
2. EventBridge delivers these events to an SQS queue.
3. The NTH deployment polls the queue, cordons and drains the affected node, and then completes the ASG lifecycle action so the instance can terminate.
You will need the following AWS infrastructure components: an SQS queue, the EventBridge rules, a termination lifecycle hook on the ASG, and an IAM role for the NTH service account (IRSA).
Prerequisites:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "autoscaling:CompleteLifecycleAction", "autoscaling:DescribeAutoScalingInstances", "autoscaling:DescribeTags", "ec2:DescribeInstances", "sqs:DeleteMessage", "sqs:ReceiveMessage" ], "Resource": "*" } ] }
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam:::oidc-provider/" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { ":sub": "system:serviceaccount:kube-system:", ":aud": "sts.amazonaws.com" } } } ] }
{ "Version": "2012-10-17", "Id": "__default_policy_ID", "Statement": [ { "Sid": "__owner_statement", "Effect": "Allow", "Principal": { "Service": [ "sqs.amazonaws.com", "events.amazonaws.com" ] }, "Action": "SQS:*", "Resource": "SQS ARN" } ] }
Update the SQS queue URL in values-aws.yaml under the queueURL key, as shown in the sketch below:
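A minimal values-aws.yaml sketch for Queue Processor mode might look like the following; the value names follow the upstream Helm chart and may differ between chart versions, and the region, queue URL, and role ARN are placeholders to replace with your own:

enableSqsTerminationDraining: true
awsRegion: "<your-region>"
queueURL: "https://sqs.<your-region>.amazonaws.com/<account-id>/<queue-name>"
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<nth-iam-role>"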
Here is the AWS CLI command to create a termination lifecycle hook on an existing ASG when using EventBridge.
Replace --auto-scaling-group-name=my-k8s-asg with the name of your existing ASG.
aws autoscaling put-lifecycle-hook --lifecycle-hook-name=my-k8s-term-hook --auto-scaling-group-name=my-k8s-asg --lifecycle-transition=autoscaling:EC2_INSTANCE_TERMINATING --default-result=CONTINUE --heartbeat-timeout=300
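As an optional check (assuming the same ASG name as above), you can confirm the hook was created:
aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name my-k8s-asg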
To tag ASGs and propagate the tags to your instances: by default, aws-node-termination-handler only manages terminations for instances tagged with the key aws-node-termination-handler/managed; the value of the key does not matter.
Replace ResourceId=my-auto-scaling-group with your Auto Scaling group name.
aws autoscaling create-or-update-tags --tags ResourceId=my-auto-scaling-group,ResourceType=auto-scaling-group,Key=aws-node-termination-handler/managed,Value=,PropagateAtLaunch=true
To tag an individual EC2 instance:
Make sure to replace i-12**********f0 in --resources with your instance ID.
aws ec2 create-tags --resources i-12**********f0 --tags 'Key="aws-node-termination-handler/managed",Value='
Create Amazon EventBridge rules so that ASG termination events, Spot Interruptions, Instance state changes, Rebalance Recommendations, and AWS Health Scheduled Changes are sent to the SQS queue.
Update the SQS queue ARN in each --targets "Id"="1","Arn"="<SQS-QUEUE-ARN>" argument below with your queue's ARN.
$ aws events put-rule --name MyK8sASGTermRule --event-pattern '{"source":["aws.autoscaling"],"detail-type":["EC2 Instance-terminate Lifecycle Action"]}'
$ aws events put-targets --rule MyK8sASGTermRule --targets "Id"="1","Arn"="<SQS-QUEUE-ARN>"
$ aws events put-rule --name MyK8sSpotTermRule --event-pattern '{"source": ["aws.ec2"],"detail-type": ["EC2 Spot Instance Interruption Warning"]}'
$ aws events put-targets --rule MyK8sSpotTermRule --targets "Id"="1","Arn"="<SQS-QUEUE-ARN>"
$ aws events put-rule --name MyK8sRebalanceRule --event-pattern '{"source": ["aws.ec2"],"detail-type": ["EC2 Instance Rebalance Recommendation"]}'
$ aws events put-targets --rule MyK8sRebalanceRule --targets "Id"="1","Arn"="<SQS-QUEUE-ARN>"
$ aws events put-rule --name MyK8sInstanceStateChangeRule --event-pattern '{"source": ["aws.ec2"],"detail-type": ["EC2 Instance State-change Notification"]}'
$ aws events put-targets --rule MyK8sInstanceStateChangeRule --targets "Id"="1","Arn"="<SQS-QUEUE-ARN>"
$ aws events put-rule --name MyK8sScheduledChangeRule --event-pattern '{"source": ["aws.health"],"detail-type": ["AWS Health Event"],"detail": {"service": ["EC2"],"eventTypeCategory": ["scheduledChange"]}}'
$ aws events put-targets --rule MyK8sScheduledChangeRule --targets "Id"="1","Arn"="<SQS-QUEUE-ARN>"
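As an optional check, you can list the rules and their targets to confirm each one points at the SQS queue:
$ aws events list-rules --name-prefix MyK8s
$ aws events list-targets-by-rule --rule MyK8sASGTermRule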
Finally, install the termination handler in Queue Processor mode using the values file:
helm install aws-node-termination-handler eks/aws-node-termination-handler -f values-aws.yaml --namespace kube-system
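Note that the eks/ chart reference assumes the eks-charts Helm repository has already been added; if not, add it first:
helm repo add eks https://aws.github.io/eks-charts
helm repo update
In Queue Processor mode, NTH runs as a Deployment rather than a DaemonSet; assuming the release name used above, you can verify it with:
kubectl get deployment -n kube-system aws-node-termination-handler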
If you would like to read more about our work, check out the Case Studies page on our website. To know how we can help you, write to us at sales@cloudifyops.com.