In today’s rapidly evolving cloud-native landscape, observability has become a critical aspect of managing and monitoring applications deployed on Kubernetes. One popular solution gaining traction is Loki, a highly scalable and cost-effective log aggregation system developed by Grafana Labs. Loki offers a powerful approach to efficiently store, search, and analyze logs, allowing developers and operators to gain valuable insights into their applications’ performance and behavior.
In this blog post, we will explore Loki and dive into setting up a distributed Loki architecture on Amazon Elastic Kubernetes Service (EKS), a managed Kubernetes service provided by Amazon Web Services (AWS). We will understand Loki’s architecture and components, and configure Loki in a distributed setup. By leveraging EKS, we can harness the benefits of both Loki and the flexibility of EKS to achieve robust log management and analysis for our containerized applications.
Whether you’re a DevOps engineer, a Kubernetes enthusiast, or simply looking to enhance your log management capabilities on EKS, this blog post will provide you with the knowledge and hands-on guidance needed to get started with Loki’s distributed setup.
Let’s understand how we can unlock the full potential of Loki and revolutionize the way we handle logs in our Kubernetes environments on EKS!
Loki's distributed deployment is made up of several components. Let us now understand these components and their functionalities.
| Component | Description |
| --- | --- |
| Loki Gateway | It serves as a vital entry point, functioning as a reverse proxy and load balancer for efficient query routing and high availability in the Loki system. |
| Query Frontend | It acts as a centralized entry point, managing query routing, load balancing, and result aggregation to ensure efficient and responsive log data querying in the system. |
| Query Scheduler | It efficiently distributes and schedules queries across multiple Queriers in a distributed Loki setup, optimizing resource utilization by routing queries based on load balancing strategies. |
| Querier | It executes queries by retrieving and processing log chunks from the Ingester or Index Gateway, enabling scalable and parallel query processing across multiple Queriers in a distributed setup. |
| Ingester | It processes and stores log streams from log shippers, ensuring efficient and reliable storage in a distributed Loki setup. |
| Distributor | It handles the distribution, replication, and coordination of log chunks across storage nodes to ensure fault tolerance, high availability, and efficient retrieval of log data. |
| Alertmanager | It processes and manages alerts by applying rules to determine actions, such as sending notifications via email, Slack, or PagerDuty, ensuring reliable and efficient alert handling in a distributed setup. |
| Compactor | It manages the compaction process by periodically scanning and merging log chunks to improve storage utilization, query performance, and long-term scalability of the system. |
| Ruler | It processes and evaluates alerting rules against log data, generating alerts based on predefined conditions to enable proactive monitoring and notification of significant events or anomalies. |
| Index Gateway | It facilitates efficient querying and filtering of logs by managing index-related operations, optimizing log retrieval based on labels or filters to enhance overall query performance in a distributed setup. |
| memcached-chunks | It caches log data chunks in memory to improve query performance by avoiding disk reads. |
| memcached-frontend | It acts as a caching layer, storing and retrieving previously executed queries and their results to enhance query performance and reduce execution time. |
| memcached-index-queries | It accelerates index queries by caching their results, improving the performance of subsequent queries with similar label filters by leveraging the index’s mapping of log labels to log streams. |
| memcached-index-writes | It caches index writes, improving indexing performance by buffering writes in memory and periodically flushing them to persistent storage. |
These Loki components work together to create a distributed log aggregation system that provides scalable log storage, efficient querying, fault tolerance, and high availability. Understanding the role and functionality of each component is essential; the descriptive details of each component can be found here.
Using Loki in a distributed manner offers several advantages over the traditional Loki stack setup. Here are some key benefits:
Scalability: By deploying Loki in a distributed manner, you can scale your log aggregation system to handle large volumes of logs from multiple sources. Distributed Loki setups allow for horizontal scaling, where you can add more resources and nodes to accommodate increased log ingestion and querying demands. This scalability ensures that your log management solution can grow with your application’s needs.
Fault Tolerance: In a distributed Loki setup, log data is replicated and distributed across multiple storage nodes. This redundancy ensures fault tolerance and high availability. If a node or component fails, the system can continue to operate without interruption because the data is distributed and replicated across multiple nodes. This fault tolerance prevents data loss and ensures that log data remains accessible even in the event of a failure.
Load Balancing: With a distributed Loki setup, you can distribute the load of log ingestion and querying across multiple nodes. This load balancing mechanism improves overall system performance by effectively utilizing available resources and preventing bottlenecks. Load balancing ensures that the system can handle high traffic and large query volumes without compromising performance or responsiveness.
Efficient Resource Utilization: In a distributed setup, Loki components can be deployed on separate nodes or clusters, allowing for optimized resource utilization. Each component can be scaled independently based on its specific resource requirements. This flexibility ensures efficient resource allocation and prevents resource contention, maximizing the overall efficiency of your log management infrastructure.
Improved Query Performance: Distributing the querying workload across multiple Queriers in a distributed Loki setup enables parallel processing of queries. This parallelization enhances query performance and reduces query response times, even when dealing with large volumes of log data. The distributed setup ensures that queries are efficiently distributed and processed, resulting in faster and more responsive log analysis.
Enhanced Availability and Redundancy: A distributed Loki setup provides built-in redundancy by replicating log data across multiple storage nodes. This redundancy ensures that log data is highly available and accessible, even in the face of node failures or network issues. By spreading the data across multiple nodes, the distributed setup provides increased resilience and minimizes the risk of data loss.
Overall, using Loki in a distributed manner offers significant benefits in terms of scalability, fault tolerance, load balancing, query performance, and resource utilization. It allows your log management system to handle increasing log volumes, ensures high availability, and provides a more efficient and robust solution for log aggregation and analysis.
Now let’s understand how to deploy Loki in distributed mode with minimal components and understand each working component. Here, we are using an S3 bucket to store the Loki logs.
Prerequisites:
- A running EKS cluster with an IAM OIDC provider associated with it (required for IAM roles for service accounts)
- An S3 bucket to store the Loki logs (dframe-loki-distributed in this walkthrough)
- Helm 3 and kubectl configured against the cluster
- AWS CLI access with permissions to manage IAM roles and S3 bucket policies
Deployment Steps:
Before deploying Loki, we need to configure a service account, an IAM role, and bucket policies so that the Loki pods have access to the S3 bucket to push the logs.
Configure IAM role and policy
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::12345678910:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/1847B92748AB2A2XYZ"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-west-1.amazonaws.com/id/1847B92748AB2A2XYZ:sub": "system:serviceaccount:loki:loki"
        }
      }
    }
  ]
}
```
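If you are creating the role from the command line, a minimal sketch with the AWS CLI could look like the following. The trust-policy.json filename is an assumption; the role name matches the ARN referenced in the annotations below.

```bash
# Save the trust policy above as trust-policy.json, then create the IAM role.
# The role name must match the ARN used in the service account annotation.
aws iam create-role \
  --role-name loki-distributed-bucket-role \
  --assume-role-policy-document file://trust-policy.json
```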
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: loki
  namespace: loki
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::12345678910:role/loki-distributed-bucket-role"
```
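A quick sketch of applying this manifest, assuming it has been saved as loki-serviceaccount.yaml (a hypothetical filename). Note that the Helm values shown later also create a service account named loki (serviceAccount.create: true); if you apply this manifest yourself, set that toggle to false, or simply rely on the chart-managed service account and skip this manifest.

```bash
# Create the namespace and the annotated service account.
kubectl create namespace loki
kubectl apply -f loki-serviceaccount.yaml

# Verify the IRSA role annotation is in place.
kubectl get serviceaccount loki -n loki -o yaml
```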
Update the policy of the S3 bucket in which we store the Loki logs.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::12345678910:role/loki-distributed-bucket-role"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::dframe-loki-distributed"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::12345678910:role/loki-distributed-bucket-role"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::dframe-loki-distributed/*"
    }
  ]
}
```
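Assuming the policy above is saved as bucket-policy.json (a hypothetical filename), it can be attached to the bucket with the AWS CLI:

```bash
# Attach the policy to the bucket that will store the Loki chunks and indexes.
aws s3api put-bucket-policy \
  --bucket dframe-loki-distributed \
  --policy file://bucket-policy.json
```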
Deploy Loki through Helm. The snippets below show the relevant overrides in the custom values file: the service account configuration and the S3 storage configuration.
```yaml
serviceAccount:
  # -- Specifies whether a ServiceAccount should be created
  create: true
  # -- The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name: loki
  # -- Image pull secrets for the service account
  imagePullSecrets: []
  # -- Annotations for the service account
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::121234567890:role/loki-distributed-bucket-role
  # -- Set this toggle to false to opt out of automounting API credentials for the service account
  automountServiceAccountToken: true
```
```yaml
storageConfig:
  boltdb_shipper:
    shared_store: aws
  aws:
    s3: s3://us-east-1
    bucketnames: dframe-loki-distributed
```
Add the Grafana Helm chart repository (which hosts the loki-distributed chart) and update the local repo cache:
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```
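Optionally, confirm that the chart is visible from the added repository:

```bash
# List the available versions of the loki-distributed chart.
helm search repo grafana/loki-distributed --versions | head
```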
Deploy Loki components:
The custom values file has been uploaded here for a quick deployment.
```bash
helm upgrade -i loki grafana/loki-distributed -n loki -f loki-distributed-values.yaml
```
Check the status of the corresponding pods after the installation is complete.
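For example, assuming the release was installed into the loki namespace as above:

```bash
# All Loki components should eventually reach the Running state.
kubectl get pods -n loki

# Optionally watch until everything is ready.
kubectl get pods -n loki -w
```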
The installed Loki components are the gateway, ingester, distributor, querier, query-frontend, query-scheduler, compactor, index-gateway, memcached-chunks, memcached-frontend, memcached-index-queries, and memcached-index-writes.
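As a quick sanity check, you can port-forward the gateway service and query the Loki HTTP API through it. The service name below is an assumption based on the release name loki and the loki-distributed chart; list the services first if yours differs. The label list will be empty until a log shipper such as Promtail starts pushing logs.

```bash
# List the services created by the chart and forward the gateway to a local port.
kubectl get svc -n loki
kubectl port-forward -n loki svc/loki-loki-distributed-gateway 3100:80 &

# Query the Loki labels endpoint through the gateway.
curl -s http://localhost:3100/loki/api/v1/labels
```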
NOTE:
In the second part of this blog, we will talk about how the CloudifyOps team solved the log aggregation challenge for a client using the Loki distributed setup.
CloudifyOps has worked with enterprises and start-ups across industries, successfully helping them manage their cloud infrastructure and optimization challenges. To know how we can help you tackle your cloud issues, write to us at sales@cloudifyops.com today.