In today’s rapidly evolving cloud-native landscape, observability has become a critical aspect of managing and monitoring applications deployed on Kubernetes. One popular solution gaining traction is Loki, a highly scalable and cost-effective log aggregation system developed by Grafana Labs. Loki offers a powerful approach to efficiently store, search, and analyze logs, allowing developers and operators to gain valuable insights into their applications’ performance and behavior.
In this blog post, we will explore Loki and dive into setting up a distributed Loki architecture on Amazon Elastic Kubernetes Service (EKS), a managed Kubernetes service provided by Amazon Web Services (AWS). We will understand Loki’s architecture and components, and configure Loki in a distributed setup. By leveraging EKS, we can harness the benefits of both Loki and the flexibility of EKS to achieve robust log management and analysis for our containerized applications.
Whether you’re a DevOps engineer, a Kubernetes enthusiast, or simply looking to enhance your log management capabilities on EKS, this blog post will provide you with the knowledge and hands-on guidance needed to get started with Loki’s distributed setup.
Let’s understand how we can unlock the full potential of Loki and revolutionize the way we handle logs in our Kubernetes environments on EKS!
Now let us understand these components and their functionalities.
These Loki components work together to create a distributed log aggregation system that provides scalable log storage, efficient querying, fault tolerance, and high availability. Understanding the role and functionality of each component is essential. The descriptive details of each component can be found here.
Using Loki in a distributed manner offers several advantages over the traditional Loki stack setup. Here are some key benefits:
Scalability: By deploying Loki in a distributed manner, you can scale your log aggregation system to handle large volumes of logs from multiple sources. Distributed Loki setups allow for horizontal scaling, where you can add more resources and nodes to accommodate increased log ingestion and querying demands. This scalability ensures that your log management solution can grow with your application’s needs.
Fault Tolerance: In a distributed Loki setup, log data is replicated and distributed across multiple storage nodes. This redundancy provides fault tolerance and high availability. If a node or component fails, the system can continue to operate because other nodes hold copies of the data. This helps prevent data loss and keeps log data accessible even in the event of a failure.
Load Balancing: With a distributed Loki setup, you can distribute the load of log ingestion and querying across multiple nodes. This load balancing mechanism improves overall system performance by effectively utilizing available resources and preventing bottlenecks. Load balancing ensures that the system can handle high traffic and large query volumes without compromising performance or responsiveness.
Efficient Resource Utilization: In a distributed setup, Loki components can be deployed on separate nodes or clusters, allowing for optimized resource utilization. Each component can be scaled independently based on its specific resource requirements. This flexibility ensures efficient resource allocation and prevents resource contention, maximizing the overall efficiency of your log management infrastructure.
Improved Query Performance: Distributing the querying workload across multiple Queriers in a distributed Loki setup enables parallel processing of queries. This parallelization enhances query performance and reduces query response times, even when dealing with large volumes of log data. The distributed setup ensures that queries are efficiently distributed and processed, resulting in faster and more responsive log analysis.
Enhanced Availability and Redundancy: A distributed Loki setup provides built-in redundancy by replicating log data across multiple storage nodes. This redundancy ensures that log data is highly available and accessible, even in the face of node failures or network issues. By spreading the data across multiple nodes, the distributed setup provides increased resilience and minimizes the risk of data loss.
Overall, using Loki in a distributed manner offers significant benefits in terms of scalability, fault tolerance, load balancing, query performance, and resource utilization. It allows your log management system to handle increasing log volumes, ensures high availability, and provides a more efficient and robust solution for log aggregation and analysis.
Now let’s look at how to deploy Loki distributed with minimal components and understand each working component. Here, we use an S3 bucket to store the Loki logs.
Prerequisites:
Deployment Steps:
Before deploying Loki, we need to configure the service account, IAM role, and bucket policy so that the Loki pods can access the S3 bucket and push logs to it.
Configure IAM role and policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::12345678910:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/1847B92748AB2A2XYZ"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-west-1.amazonaws.com/id/1847B92748AB2A2XYZ:sub": "system:serviceaccount:loki:loki"
        }
      }
    }
  ]
}
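The trust policy above hard-codes the account ID, OIDC provider, namespace, and service account name. As a rough sketch, it could instead be rendered from shell variables so that it is easier to adapt to another cluster. The variable names and the `render_trust_policy` helper are illustrative only (they are not part of any AWS tooling), and the values are the placeholders used in this post.

```shell
# Placeholder values copied from the policy above; replace them with your own.
ACCOUNT_ID="12345678910"
OIDC_PROVIDER="oidc.eks.eu-west-1.amazonaws.com/id/1847B92748AB2A2XYZ"
NAMESPACE="loki"
SERVICE_ACCOUNT="loki"

# Hypothetical helper: prints the IAM trust policy as JSON on stdout.
render_trust_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"
        }
      }
    }
  ]
}
EOF
}

# Print the rendered policy so it can be reviewed before use.
render_trust_policy
```

The output can be redirected to a file (for example `trust-policy.json`, a name assumed here) and passed to `aws iam create-role` through its `--assume-role-policy-document file://trust-policy.json` option.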
Create the service account annotated with the IAM role ARN:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: loki
  namespace: loki
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::12345678910:role/loki-distributed-bucket-role"
Update the policy of the bucket in which we store the Loki logs
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::12345678910:role/loki-distributed-bucket-role"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::dframe-loki-distributed"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::12345678910:role/loki-distributed-bucket-role"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::dframe-loki-distributed/*"
    }
  ]
}
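One way to apply the bucket policy is with the AWS CLI. The sketch below assumes the policy above has been saved as `bucket-policy.json` (a file name chosen here for illustration) and that the AWS CLI is configured with permission to manage the bucket. The small `run` helper is hypothetical: by default it only prints each command so the steps can be reviewed first, and it executes them when `DRY_RUN=0` is set.

```shell
BUCKET="dframe-loki-distributed"      # bucket name used in this post
POLICY_FILE="bucket-policy.json"      # assumed file holding the policy above
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands only, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Attach the policy to the bucket.
run aws s3api put-bucket-policy --bucket "$BUCKET" --policy "file://$POLICY_FILE"

# Read the policy back to confirm what is attached.
run aws s3api get-bucket-policy --bucket "$BUCKET"
```

Running the block as-is prints the two commands. Running it again with `DRY_RUN=0` applies the policy and then reads it back.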
Deploy Loki through Helm.
serviceAccount:
  # -- Specifies whether a ServiceAccount should be created
  create: true
  # -- The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name: loki
  # -- Image pull secrets for the service account
  imagePullSecrets: []
  # -- Annotations for the service account
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::121234567890:role/loki-distributed-bucket-role
  # -- Set this toggle to false to opt out of automounting API credentials for the service account
  automountServiceAccountToken: true
storageConfig:
  boltdb_shipper:
    shared_store: aws
  aws:
    s3: s3://us-east-1
    bucketnames: dframe-loki-distributed
Add Loki Helm chart repository:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
Deploy Loki components:
The custom values file has been uploaded here for fast deployment.
helm upgrade -i loki grafana/loki-distributed -n loki -f loki-distributed-values.yaml
Check the status of the corresponding Pod after the installation is complete.
The installed Loki components are gateway, ingester, distributor, querier, query-frontend, query-scheduler, compactor, index-gateway, memcached-chunks, memcached-frontend, memcached-index-queries and memcached-index-writes.
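The status check can be done with `kubectl`. The sketch below assumes the `loki` namespace used in the Helm command and a `kubectl` context that points at the EKS cluster. The five-minute timeout is an arbitrary choice, and the `run` helper is hypothetical: it prints each command by default and executes it only when `DRY_RUN=0` is set.

```shell
NAMESPACE="loki"            # namespace used in the helm command above
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List the pods created by the chart.
run kubectl get pods -n "$NAMESPACE"

# Block until every pod in the namespace reports Ready, or give up after 5 minutes.
run kubectl wait --for=condition=Ready pods --all -n "$NAMESPACE" --timeout=300s
```

If `kubectl wait` times out, `kubectl describe pod <pod-name> -n loki` on the pod that is not ready usually shows the reason in its events.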
NOTE:
In the second part of this blog, we will talk about how the CloudifyOps team solved the log aggregation challenge for a client using the Loki distributed setup.
CloudifyOps has worked with enterprises and start-ups across industries, successfully helping them manage their cloud infrastructure and optimization challenges. To know how we can help you tackle your cloud issues, write to us at sales@cloudifyops.com today.
Copyright 2024 CloudifyOps. All Rights Reserved