
A Comprehensive Guide to Setting up Loki in a Distributed Manner on Amazon EKS (Part 2)

In the previous blog, we learned about the components of the distributed Loki architecture. Let us now see how the CloudifyOps team used Loki's distributed configuration on Amazon EKS to solve log management challenges for a client with a SaaS platform product, achieving scaling, high availability, and multi-tenancy.

The client’s SaaS platform supports a number of tenants, each of whom runs their own apps and produces a sizable amount of logs. It is essential to have a reliable log management system in place to gather, archive, and individually analyze logs for each tenant. Furthermore, we must guarantee the distributed infrastructure’s scalability, security, and effective log querying.

The challenges tackled in this context include ensuring data privacy and compliance, maintaining robust log isolation between tenants, managing the growing volume of logs generated by multiple tenants as the platform expands, and optimizing query performance so that logs for specific tenants can be retrieved quickly from a distributed infrastructure. Additionally, it is crucial to maintain the security and privacy of log data while adhering to compliance standards, including access restrictions, data retention policies, and audit trail requirements.

In the following sections, we will delve into the key features that played a significant role in achieving an effective solution for log aggregation management.

Scaling and High Availability: Ensuring Resilience in Loki’s Distributed Setup

In a distributed Loki setup, scaling plays a crucial role in distributing workloads, maintaining optimal performance, balancing traffic, and using resources efficiently. This helps organizations handle growing log volumes effectively, improve query response times, and maintain high availability.

By scaling key Loki components such as the Gateway, Query Frontend, Querier, Distributor, and Ingester, we can achieve optimal performance, scalability, and resilience in a distributed Loki setup. Implementing scaling strategies for these components enables seamless operation and supports the continuous growth of log data in Loki. We will explore scaling and examine how it was utilized for these crucial Loki components.

The Gateway component acts as the entry point into Loki, routing log pushes and queries to the appropriate components. By scaling the Gateway, you can distribute incoming traffic across multiple instances. This ensures efficient log ingestion and query handling, reduces bottlenecks, and increases overall system throughput. Scaling the Gateway also provides fault tolerance, as traffic can be routed to available pods even if some of them experience failures or become overloaded.

The Query Frontend is responsible for handling log queries and serving query results to users. Scaling of the Query Frontend enables distributing query processing across multiple instances, improving query response times and accommodating a larger number of concurrent queries. It also enhances system availability by allowing the load to be balanced across multiple Query Frontend instances.

The Querier component performs the actual log querying by interacting with the storage backend and retrieving relevant log data. Scaling of the Querier allows for distributing query execution across multiple instances, enabling faster and more efficient log retrieval. It also provides fault tolerance, ensuring that log queries can still be processed even if some Querier instances are unavailable or experiencing issues.

The Distributor is the first stop on the write path: it receives incoming log streams, validates them, and distributes them across the Ingesters using consistent hashing. By scaling the Distributor, you can spread this write workload across multiple instances, improving overall system performance. This ensures an even distribution of streams across the ring and enables the system to handle larger log volumes seamlessly.

The Ingester component receives log data from the Distributors, batches it into chunks in memory, and flushes those chunks to the storage backend. By scaling the Ingester, you can distribute the ingestion and chunk-building workload across multiple instances, enabling faster, parallel ingestion of logs. This helps handle increased log volumes efficiently and ensures smooth log data transfer to the storage backend.

Deploying Loki components on EKS with scaling enabled is a straightforward process. The Helm values file for each component offers an autoscaling option that can be easily enabled. By specifying the minimum and maximum replica counts and defining CPU and memory thresholds, scaling is triggered automatically when the specified resource utilization percentages are reached. This simplifies management of the Loki deployment, ensuring that the system dynamically scales to meet the demands of log ingestion and query processing.

Let us see how scaling is enabled for the Loki Gateway component in the Helm values file.

# Configuration for the gateway
gateway:
  # -- Specifies whether the gateway should be enabled
  enabled: true
  # -- Number of replicas for the gateway
  replicas: 3
  # -- Enable logging of 2xx and 3xx HTTP requests
  verboseLogging: true
  autoscaling:
    # -- Enable autoscaling for the gateway
    enabled: true
    # -- Minimum autoscaling replicas for the gateway
    minReplicas: 3
    # -- Maximum autoscaling replicas for the gateway
    maxReplicas: 30
    # -- Target CPU utilisation percentage for the gateway
    targetCPUUtilizationPercentage: 90
    # -- Target memory utilisation percentage for the gateway
    targetMemoryUtilizationPercentage: 90
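
The same autoscaling pattern applies to the other components in the chart. As an illustrative sketch, assuming the chart exposes the same keys for these components as for the gateway (the replica counts and thresholds below are placeholders, not recommendations), the Querier and Distributor sections of the values file can be scaled in the same way:

# Illustrative values only; check the keys against the chart version in use
querier:
  autoscaling:
    # -- Enable autoscaling for the querier
    enabled: true
    minReplicas: 3
    maxReplicas: 20
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80

distributor:
  autoscaling:
    # -- Enable autoscaling for the distributor
    enabled: true
    minReplicas: 3
    maxReplicas: 20
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80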

The Loki capacity planning table helps plan the resource utilization for each component based on the expected queries, samples, and series.

Log Management for a Multi-Tenant Platform

Managing logs efficiently across multiple tenants is crucial for ensuring operational visibility, troubleshooting, and compliance. Loki, a powerful log aggregation system, provides a robust solution for multi-tenant log management. We will explore how the Loki distributed setup was leveraged to centralize logs from multiple tenants while maintaining data isolation, security, and scalability.

One of the key considerations in multi-tenant log management is maintaining data isolation between tenants. Loki addresses this challenge by partitioning data by tenant ID and organizing it with labels. Each tenant's logs can carry their own set of labels, allowing them to be tagged and categorized based on source and context, while the tenant ID keeps logs from different tenants fully segregated, preventing data leakage and maintaining strict data privacy.
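
In practice, the tenant boundary is set at ingestion time by the agent that ships the logs. A minimal sketch, assuming Promtail is used as the shipping agent (the gateway URL and tenant name below are placeholders):

clients:
  - url: http://loki-gateway/loki/api/v1/push  # placeholder gateway address
    tenant_id: tenant-a                        # sent as the X-Scope-OrgID header, isolating this tenant's streams
    external_labels:
      tenant: tenant-a                         # optional label for filtering within the tenant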

Loki acts as a centralized log aggregator, collecting logs from various sources across multiple tenants. Loki’s architecture allows for scalable log ingestion and storage, enabling efficient log aggregation across diverse environments.

Loki enables tenant-specific log queries, allowing each tenant to search and analyze their own logs independently. By leveraging Loki’s query language and powerful filtering capabilities, tenants can extract actionable insights from their logs without interference from other tenants. This ensures that each tenant has full control over their log data and can perform targeted troubleshooting or analysis.
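
On the read path, each query must carry the same tenant ID. One way to expose this in Grafana is to provision a separate Loki data source per tenant that sets the X-Scope-OrgID header; below is a hedged sketch with placeholder names:

apiVersion: 1
datasources:
  - name: Loki - tenant-a            # placeholder data source name
    type: loki
    url: http://loki-gateway         # placeholder gateway address
    jsonData:
      httpHeaderName1: X-Scope-OrgID # scopes every query from this data source to one tenant
    secureJsonData:
      httpHeaderValue1: tenant-a     # placeholder tenant ID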

Managing log retention and compliance requirements is critical in a multi-tenant environment. Loki provides flexible options for log retention, allowing you to define retention periods based on tenant-specific policies. By integrating with compliance processes and leveraging durable object storage backends such as Amazon S3 or Google Cloud Storage, Loki helps ensure that logs are stored securely and in line with data retention regulations.
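
Retention is typically enforced by Loki's compactor, with a global default that can be overridden per tenant. A minimal sketch of the relevant Loki configuration, with illustrative periods and tenant names:

compactor:
  retention_enabled: true          # let the compactor apply retention and delete expired chunks

limits_config:
  retention_period: 744h           # global default (roughly 31 days)

# Per-tenant overrides, usually supplied through the runtime/overrides config file
overrides:
  tenant-a:
    retention_period: 2160h        # roughly 90 days for tenant-a
  tenant-b:
    retention_period: 168h         # 7 days for tenant-b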

Securing access to tenant logs is vital in multi-tenant log management. Loki itself does not ship with an authentication layer; access is typically secured in front of it, at the gateway, a reverse proxy, or Grafana, which can integrate with external identity providers such as LDAP or OAuth and enforce fine-grained access control based on user roles and permissions. This enables granular control over log access, ensuring that only authorized users can view and interact with tenant-specific log data.
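
At the edge, the gateway shipped with the loki-distributed Helm chart can require credentials before any request reaches Loki. A hedged sketch assuming the chart's basic-auth option (the credentials below are placeholders; OAuth or LDAP integration is normally handled by a proxy or ingress in front of the gateway):

gateway:
  basicAuth:
    # -- Require HTTP basic auth at the gateway
    enabled: true
    username: tenant-a-user        # placeholder; prefer referencing an existing Kubernetes secret
    password: change-me            # placeholder; never commit real credentials to the values file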

As log volumes grow across multiple tenants, ensuring scalability and performance becomes crucial. Loki’s distributed architecture and horizontal scaling options allow expansion to handle increasing log workloads seamlessly. By leveraging features such as horizontal scaling of Loki components, load balancing, and auto-scaling, organizations can maintain optimal performance and accommodate growing log volumes efficiently.

Multi-tenant log management using the Loki distributed setup empowered the client's SaaS platform to centralize log aggregation, achieve data isolation, and enhance operational visibility across multiple tenants. By leveraging Loki's features for data isolation, centralized log aggregation, tenant-specific queries, log retention, access control, and scalability, the solution was able to efficiently manage logs, streamline troubleshooting, ensure compliance, and deliver better services to their tenants. Implementing Loki as a multi-tenant log management solution paved the way for enhanced operational efficiency and improved log analysis capabilities.

CloudifyOps has worked with enterprises and start-ups across industries, successfully helping them manage their cloud infrastructure and optimization challenges. To know how we can help you tackle your cloud issues, write to us at sales@cloudifyops.com today.
