You’re sitting at your desk, typing away, when all of a sudden you hear a “ping!” Unfortunately, you have a browser with fifteen tabs, a task management application, email, messaging applications, and calendars all open, making it difficult to know exactly which one just pinged you. To identify the source, you open your system settings and check the notifications section to see which applications you allow to make a sound.
Logging in Kubernetes serves the same purpose as that notifications setting on a laptop: it provides visibility into ongoing activities in the environment. Kubernetes’ power lies in its dynamic, distributed, and ephemeral nature, but these same characteristics introduce observability challenges. Effective logging enables IT and security teams to maintain application health, mitigate security risks, and troubleshoot complex issues in production. Understanding how logging works within a cluster and the different types of logs a cluster generates enables organizations to build an effective collection and analysis strategy.
With insight into Kubernetes logging best practices, IT and security teams can create a clear, coherent source for operational intelligence.
How Does Logging Work in Kubernetes?
Kubernetes treats whatever a container writes to its standard output (stdout) and standard error (stderr) streams as that container’s logs. When a containerized application writes a log message to either stream, the container runtime captures it and stores it as a file on the node, and Kubernetes exposes these logs through its API.
The kubelet, an agent running on each cluster node, manages pods and their containers. When a user requests a specific pod’s logs, the request goes to the Kubernetes API server, which proxies it to the appropriate kubelet. The kubelet retrieves the stored logs for that pod’s containers and sends them back to the user.
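The same request path can be exercised programmatically. The sketch below uses the official Kubernetes Python client to fetch a pod’s recent log lines; the pod, namespace, and container names are placeholders.

```python
# Minimal sketch using the official Kubernetes Python client; pod, namespace,
# and container names below are placeholders for your own workloads.
from kubernetes import client, config

config.load_kube_config()   # uses the same credentials kubectl does
v1 = client.CoreV1Api()

# The API server proxies this request to the kubelet on the pod's node, which
# returns the captured stdout/stderr stream for the container.
logs = v1.read_namespaced_pod_log(
    name="my-app-7d4b9",      # hypothetical pod name
    namespace="default",
    container="my-app",       # hypothetical container name
    tail_lines=100,
)
print(logs)
```

This is the same path `kubectl logs` takes under the hood.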
This node-local approach comes with limitations, including:
- Tying logs to the pod’s lifecycle can mean data loss if a pod is evicted, crashes, or deleted.
- Storing logs locally on the node makes aggregation and analysis difficult in a multi-node cluster.
What Types of Logs Are in a Kubernetes Cluster?
A Kubernetes cluster is a complex system with multiple layers that each generate valuable log data. By understanding the different log types, organizations can start building a comprehensive observability strategy.
Application Logs
The applications running inside containers generate these logs, the most common type. Typically written to stdout and stderr, these logs provide insight into:
- Application behavior: Logs showing functions that are running, requests being handled, and responses.
- Errors: Records of exceptions, failed operations, or crashes.
- Performance metrics: Timing and resource usage details, like latency, memory, and CPU.
- Business logic: Logs tied to key workflows or transactions that show core process behaviors.
Cluster Component Logs
The Kubernetes control plane and node components generate these logs, which help when diagnosing cluster health and operational issues. Some examples include logs from:
- API server: API requests and responses between cluster components and users.
- Scheduler: How pods are assigned to nodes, including scheduling decisions, constraints, and any placement failures.
- Controller manager: Controller actions that maintain desired cluster state.
- etcd: Cluster state persistence, configuration changes, and replication issues.
- Kubelet: Node-level operations such as pod lifecycle events, container health checks, and communication with the control plane.
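Where the control plane runs as pods (for example, on kubeadm-provisioned clusters), these component logs can be read through the same log API. The sketch below assumes the common `component=kube-apiserver` label; managed platforms typically expose control plane logs through their own tooling instead.

```python
# Hedged sketch: assumes control-plane components run as labeled pods in
# kube-system, as on kubeadm clusters. Managed services (EKS, GKE, AKS) often
# surface these logs through their own logging integrations instead.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod(
    namespace="kube-system",
    label_selector="component=kube-apiserver",  # assumed labeling scheme
)
for pod in pods.items:
    logs = v1.read_namespaced_pod_log(
        name=pod.metadata.name, namespace="kube-system", tail_lines=20
    )
    print(f"--- {pod.metadata.name} ---")
    print(logs)
```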
Node-Level System Logs
Each node in a cluster is a virtual or physical machine with its own operating system that generates system logs through facilities like journald or syslog. These can provide context for troubleshooting low-level issues related to:
- Kernel: Messages from the operating system kernel about core processes, driver activity, or crashes.
- Networking: Events related to network interfaces, DNS resolution, routing, or connectivity issues between nodes and pods.
- Hardware: Physical or virtual hardware problems, like disk I/O errors, CPU throttling, or memory failures.
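On systemd-based nodes, these system logs live in the journal. The hedged sketch below, run on the node itself (or by a node agent), pulls recent kubelet entries via journalctl:

```python
# Hedged sketch: shells out to journalctl, which is available on most
# systemd-based nodes; run this on the node or inside a privileged agent.
import json
import subprocess

def recent_journal(unit: str, lines: int = 20) -> list[dict]:
    """Return the last few journal entries for a systemd unit as dicts."""
    out = subprocess.run(
        ["journalctl", "-u", unit, "-n", str(lines), "-o", "json", "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [json.loads(line) for line in out.splitlines() if line.strip()]

for entry in recent_journal("kubelet"):
    print(entry.get("__REALTIME_TIMESTAMP"), entry.get("MESSAGE"))
```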
Audit Logs
These Kubernetes API server logs provide chronological, security-relevant records that answer questions about:
- Who took action on the cluster.
- When they took action.
- Where the action occurred.
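When the API server is configured with a log backend, audit events are written as JSON lines. A small sketch that extracts those who/when/where fields; the file path is whatever your audit flags configure and is only an assumption here:

```python
# Hedged sketch: parses audit.k8s.io/v1 events written by the API server's
# log backend. The path below is an assumption; use whatever --audit-log-path
# your cluster is configured with.
import json

with open("/var/log/kubernetes/audit.log") as f:
    for line in f:
        event = json.loads(line)
        who = event.get("user", {}).get("username")           # who took action
        when = event.get("requestReceivedTimestamp")          # when
        ref = event.get("objectRef") or {}
        where = f"{ref.get('namespace', '-')}/{ref.get('resource', '-')}"  # where
        print(when, who, event.get("verb"), where)
```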
Events
Kubernetes events are objects that provide insight into the lifecycle of other cluster objects, like pods, nodes, and deployments. They record information about:
- Scheduling events: Pod creation, pending scheduling, or successful node assignment.
- Lifecycle events: Key transitions like pod start, container restart, or termination.
- Warning events: Transient or recoverable errors, like failed image pulls, readiness probe failures, or back-off retries.
- Resource updates: Changes to deployments, replicas, or configuration objects.
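Events are regular API objects, so they can be pulled with the same client used for logs. A minimal sketch that lists recent events in a namespace, roughly what `kubectl get events` shows:

```python
# Minimal sketch: lists recent events in the default namespace using the
# official Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for ev in v1.list_namespaced_event(namespace="default").items:
    obj = f"{ev.involved_object.kind}/{ev.involved_object.name}"
    print(ev.last_timestamp, ev.type, ev.reason, obj, ev.message)
```

Keep in mind that events are only retained briefly (about an hour by default), which is one more reason to ship them to a centralized backend.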
What Is the Logging Architecture in Kubernetes?
Often, organizations adopt a cluster-level logging architecture that involves deploying a log forwarding agent on each node to collect, process, and ship logs to a centralized backend.
Node-Level Logging Agent
As the most common approach, this requires a dedicated agent, like Fluentd, deployed as a DaemonSet so an instance runs on every cluster node. The agent reads the containers’ log files, enriches them with Kubernetes metadata, like pod name, namespace, and labels, and then forwards them to a centralized logging system.
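Conceptually, the agent’s job looks like the following sketch: read the container log files on the node, attach metadata derived from the usual /var/log/containers file naming, and ship each record onward. A real agent such as Fluentd also handles tailing, buffering, and retries.

```python
# Illustrative sketch of what a node-level agent does; not a replacement for
# Fluentd. Assumes the common /var/log/containers naming convention:
# <pod>_<namespace>_<container>-<container-id>.log
import glob
import json
import os

def parse_metadata(path: str) -> dict:
    """Derive pod, namespace, and container from the log file name."""
    base = os.path.basename(path)[: -len(".log")]
    pod, namespace, rest = base.split("_", 2)
    container = rest.rsplit("-", 1)[0]
    return {"pod": pod, "namespace": namespace, "container": container}

for path in glob.glob("/var/log/containers/*.log"):
    meta = parse_metadata(path)
    with open(path) as f:
        for line in f:
            record = {"log": line.rstrip(), **meta}
            # A real agent forwards the enriched record to a central backend.
            print(json.dumps(record))
```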
Sidecar Container
In this pattern, a dedicated logging container runs alongside the pod’s application container. The application container writes its logs to a shared volume, then the sidecar container tails, processes, and forwards the log files. While this approach offers improved isolation and allows for application-specific logging configurations, it adds resource overhead for each pod.
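The sidecar’s logic is essentially a tail loop: follow the file the application writes to the shared volume and re-emit it on the sidecar’s own stdout (or forward it directly). A minimal sketch, with an assumed mount path:

```python
# Minimal sketch of a logging sidecar: tail the application's log file from a
# shared emptyDir volume and re-emit it on stdout, where the standard
# Kubernetes log pipeline picks it up. The path is an assumed mount point.
import time

LOG_PATH = "/var/log/app/app.log"  # assumed shared-volume mount

with open(LOG_PATH) as f:
    f.seek(0, 2)                   # jump to end of file, like `tail -f`
    while True:
        line = f.readline()
        if line:
            print(line, end="", flush=True)
        else:
            time.sleep(0.5)        # wait for the application to write more
```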
Direct Application Logging
With this approach, developers add logging code to the application itself so it sends log data straight to a centralized backend. Developers have maximum control, but the process tightly couples the application to the logging system. This dependency can impact application performance and bypasses the standard Kubernetes stdout/stderr logging mechanism.
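As a sketch of the pattern, the snippet below attaches a syslog handler that ships records straight to a centralized backend (for example, a syslog input on a log management platform); the host and port are placeholders.

```python
# Hedged sketch of direct application logging: records go straight to a log
# backend's syslog input rather than only to stdout. Host and port are
# placeholders; note how this couples the app to the logging system.
import logging
import logging.handlers

handler = logging.handlers.SysLogHandler(address=("logs.example.internal", 1514))

logger = logging.getLogger("checkout-service")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("order submitted order_id=12345 user_id=987")
```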
What Should Be Logged in Kubernetes?
A good logging strategy captures actionable data and provides context. Generally, the application logs should include:
- Timestamp: An accurate, timezone-aware timestamp for every log event.
- Severity Level: A clear indicator of the log’s importance (e.g., INFO, WARN, ERROR, FATAL).
- Service/Application Name: Identifies the source of the log message.
- Unique Request ID: A correlation ID that can be used to trace a single request as it passes through multiple microservices.
- Error Messages and Stack Traces: Detailed information for debugging failures.
- Key Business Events: Records of significant application events (e.g., user login, transaction completed).
- Performance Data: Information on the duration of critical operations.
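As a minimal sketch, a structured (JSON) log record carrying these fields might look like the following; the field names and request-ID plumbing are illustrative rather than a standard.

```python
# Illustrative structured-logging sketch; field names are not a standard.
import json
import sys
import uuid
from datetime import datetime, timezone

def log_event(level: str, message: str, request_id: str, **fields) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # timezone-aware
        "level": level,                                        # INFO/WARN/ERROR
        "service": "checkout-service",                         # log source
        "request_id": request_id,                              # correlation ID
        "message": message,
        **fields,                                              # business/perf data
    }
    print(json.dumps(record), file=sys.stdout, flush=True)     # goes to stdout

request_id = str(uuid.uuid4())
log_event("INFO", "payment authorized", request_id,
          order_id="A-1001", duration_ms=42)
```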
What Are Some Challenges Organizations Face with Kubernetes Logging?
Kubernetes introduces several logging challenges that a deliberate strategy can help address.
Lack of Log Centralization
By default, logs are siloed on individual nodes, which creates challenges when trying to get a holistic view of a distributed application. Correlating events from different microservices during a production incident becomes a manual and time-consuming process.
Lack of Built-In Log Management and Persistent Log Storage
Kubernetes lacks a native solution for long-term log storage, rotation, and analysis. Without a persistent storage solution, critical diagnostic information is lost when a container or pod is terminated, hindering debugging and historical analysis.
Diverse and Evolving Log Formats
Development teams may use different programming languages and logging libraries, leading to a variety of unstructured or semi-structured log formats. These varying formats introduce challenges when trying to parse and search logs or create dashboards and alerts.
Best Practices for Logging in Kubernetes
By implementing the following best practices, organizations can design a logging architecture that addresses the scale and complexity of modern cloud-native environments.
Set Up a Centralized Logging System
By consolidating logs from all applications, nodes, and cluster components, organizations can monitor systems, troubleshoot issues, and analyze security incidents. Using a node-level agent to collect logs and forward them streamlines this process.
Set Up Log Retention Policies
Defining clear log retention policies based on operational and compliance needs enables organizations to manage storage costs more effectively. Using a solution with built-in data management capabilities helps organizations handle verbose application logs, like debug or trace logs, that can help troubleshoot performance issues but may not be necessary for daily operations.
Configure Log Rotation and Storage Management
Managing log volume prevents logging from negatively impacting the node’s stability. By configuring the container runtime to rotate log files, organizations prevent them from filling up the node’s disk space.
Implement Structured Logging
Structured logs are machine-readable records made up of key-value pairs, making them easy to parse, query, and filter. By normalizing all Kubernetes logs, organizations can improve search performance, reduce storage costs, and leverage analytics for alerting.
Use Separate Clusters for Development and Production
By maintaining separate environments, organizations can prevent noisy, high-volume development logs from overwhelming the production logging system. Further, isolation improves security and lets developers test logging configurations and updates in a non-critical environment.
Secure and Control Log Access
Since logs contain sensitive information, any centralized logging platform should enable the organization to implement role-based access control (RBAC) that ensures users only view logs for the services they manage. Additionally, the organization should take measures to anonymize or mask sensitive data, like personally identifiable information (PII).
Graylog: Implement Kubernetes Logging Best Practices
Graylog enables organizations to centralize Kubernetes logs and manage them more effectively. With Graylog, organizations have several options for how to rotate Kubernetes logs, including easily setting up an index rotation and specifying when to delete, close, or archive logs. For archival purposes, you can move logs to a new location and compress them to save space.
With Graylog’s scalable, flexible platform, organizations can implement proper log retention and monitoring while minimizing the cost of Kubernetes log storage and management.