When you get behind the wheel of your car, one of the first things you see is the dashboard. Your dashboard provides various information about all the different technologies that make the car run smoothly, like helping you control your speed, providing insight into your fuel levels, and offering suggestions for regular maintenance, like oil changes.
For developers, Kubernetes acts as that one-glance dashboard to provide insights about container performance, maintenance needs, and storage requirements. Kubernetes gives you a single location for managing your containers across their lifecycle, including helping you schedule activities and controlling access to shared resources.
When using Kubernetes, you should understand both how it can improve your applications’ performance and the best practices for logging.
What is Kubernetes and why is it used?
Kubernetes is an open-source platform that simplifies complex tasks associated with containerized applications by automating their deployment, scaling, and management. Kubernetes provides a centralized control plane to coordinate containers, reducing the need to manage them individually.
Some of Kubernetes’ key capabilities include:
- Automatic, on-demand scaling of applications to ensure efficient resource allocation and minimal downtime
- Seamless operation across various environments, including on-premises, public, private, and hybrid clouds
- Ability to transform legacy applications into more efficient, cloud-native deployments with a microservices architecture
Kubernetes vs. Docker
Although developers often use Docker and Kubernetes together, they serve different purposes:
- Docker: a container engine for packaging and deploying applications in standard Docker containers
- Kubernetes: a container orchestration platform to automate container deployment, scaling, and management across a Cluster of Nodes
Kubernetes supports multiple container runtimes, of which Docker is only one. Since Kubernetes focuses on managing and orchestrating containers, its ability to handle Docker and other runtimes enables flexibility across different technology stacks.
Understanding the Kubernetes vocabulary
Before you can understand how Kubernetes works, you need to understand the terms used to describe its components and mechanisms.
Pod
A Pod is the smallest deployable unit, consisting of one or more containers that share resources like networking and storage. Each Pod has a unique IP address, similar to a virtual machine running its applications as containerized processes. Developers can manage Pods by using attribute labels, enabling them to organize and separate environments like development and production.
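To make this concrete, here is a minimal Pod manifest with a label separating environments. The names, label values, and image are illustrative, not prescribed by Kubernetes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend          # illustrative name
  labels:
    app: web-frontend
    environment: development  # label used to separate dev and prod workloads
spec:
  containers:
  - name: web
    image: nginx:1.25         # any container image works here
    ports:
    - containerPort: 80
```

Selecting on the `environment` label then lets you list or target only development Pods.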
Cluster
Clusters are sets of physical or virtual Nodes that run the workloads that the control plane manages. Clusters allow efficient resource scaling and management, with each Node hosting Pods that share resources and networking. Inside a Kubernetes Cluster, you have various components that enable the overall management, like:
- Kubernetes API: handles all internal and external requests
- Kubernetes scheduler: understands a Pod’s resource needs, then schedules it to the right compute Node
- Controller: coordinates with the scheduler and manages the Cluster to maintain availability
- etcd: stores configuration and Cluster-state data
Operator
Operators act as application-specific controllers, automating tasks like deployment, scaling, version upgrades, and hardware Node configurations. They incorporate domain knowledge to handle the software lifecycle with high efficiency.
Secrets
Secrets provide secure storage for sensitive information, such as OAuth tokens and SSH keys, so that developers can apply the principle of least privilege to control applications’ access to data. Using Secrets is more secure than embedding data in Pod definitions or container images.
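A Secret is defined as its own object rather than inline in the Pod spec. A minimal sketch (the name and key are illustrative, and the placeholder value stands in for a real credential you would never commit to source control):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: api-credentials       # illustrative name
type: Opaque
stringData:
  oauth-token: "replace-me"   # stored in etcd, referenced by Pods at runtime
```

Pods then consume the Secret as an environment variable or mounted volume instead of carrying the value in their own definition.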
Service
Services are abstractions that provide a stable IP address and logical way to distribute network traffic to a set of Pods. Services enable you to create a single outward-facing endpoint for workloads divided into multiple backends. Some typical Services include:
- ClusterIP: assigns each Service inside the Cluster a unique IP address
- NodePort: adds a port to a Node’s IP address so that the Service can be accessed from outside the Cluster
- LoadBalancer: integrates with the cloud provider’s load balancer to distribute traffic across Pods
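The list above maps directly onto a Service manifest. This sketch exposes Pods carrying an `app: web-frontend` label via a NodePort; all names and port values are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-frontend-svc      # illustrative name
spec:
  type: NodePort              # expose the Service on each Node's IP address
  selector:
    app: web-frontend         # traffic is distributed to Pods with this label
  ports:
  - port: 80                  # stable in-Cluster port on the Service's IP
    targetPort: 80            # container port on the backing Pods
    nodePort: 30080           # externally reachable port on each Node
```

Changing `type` to `ClusterIP` keeps the Service internal, while `LoadBalancer` asks the cloud provider to provision an external load balancer in front of it.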
Kubectl
Kubectl is the command-line interface (CLI) for interacting with Kubernetes Clusters. It uses standardized command syntax so that you can manage the Cluster operations through the Kubernetes API.
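A few representative kubectl commands illustrate that standardized syntax (verb, resource type, resource name); `web-frontend` and `pod.yaml` are illustrative names:

```shell
# Inspect workloads in the current namespace
kubectl get pods
kubectl describe pod web-frontend

# Apply a manifest, then stream a container's logs
kubectl apply -f pod.yaml
kubectl logs -f web-frontend

# Inspect Cluster-level objects
kubectl get nodes
kubectl get events
```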
Klog
Klog is the Kubernetes logging library that generates log messages for the system components. Logs can be formatted to provide more or less data, offering details that include step-by-step event traces like:
- HTTP access logs
- Pod state changes
- Controller actions
- Scheduler decisions
How Kubernetes works
Kubernetes operates as a control plane, overseeing each Node within a Cluster. Developers use Kubernetes to create a self-contained environment that includes all the components necessary for running applications, including:
- Applications
- Dependencies
- Libraries
- Configuration files
The Master Node controls and manages the Cluster. It runs the control plane, a collection of processes that manage all Kubernetes Nodes, enabling you to detect and respond to Cluster events.
Worker Nodes contain all the services related to managing the Pod lifecycle, enabling you to run your applications and workloads. Each of these Nodes runs a kubelet, a small agent that communicates with the Master Node and executes its instructions. To handle the Cluster’s internal and external network communications, each Node also runs a network proxy, kube-proxy.
What are the benefits of Kubernetes?
Using Kubernetes enables you to optimize the value of your container technologies across diverse environments. Some primary benefits Kubernetes provides include:
- Performance improvements: Automated load balancing efficiently distributes network traffic to maintain application performance.
- Cost reductions: Optimizing hardware resource allocation saves money while ensuring that applications run smoothly across diverse infrastructures.
- Scalability and flexibility: On-demand, automated resource adjustments improve resource allocation and management.
- Rapid deployment: Interoperability across various cloud providers and container runtimes enables container orchestration at scale for faster development and deployment.
Best practices for Kubernetes logging
Kubernetes logs provide insights into the applications running in your Pods and the Kubernetes system components. You can use logs for:
- Troubleshooting and debugging
- Monitoring performance and system health
- Tracking interactions and movements within the Cluster
- Monitoring security and documenting compliance
Check out this Graylog video: Demystifying Kubernetes for Security Analytics: Enhancing TDIR for Cloud Deployments
Collect all logs
A Kubernetes Cluster generates various logs, including ones from the control plane and individual applications. You should ensure that you capture the following logs:
- Application logs: application-level events, errors, and transactions for insights into application runtime state
- Kubernetes events: system-level activities, like pod creation and Node failure
- Kubernetes Cluster components: orchestration-layer insights to track events like pod scheduling, API requests, and Node communications
- Kubernetes Ingress: traffic entering a Cluster for insights into application accessibility, performance, and security
- Kubernetes audit logs: Cluster activity for insights into security and policy enforcement
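Several of these log types can be pulled directly with kubectl. The Pod and Node names below are illustrative, and the control plane example assumes a Cluster that runs its components as Pods in the kube-system namespace:

```shell
# Application logs from a Pod, including the previous container instance
kubectl logs my-app-pod --previous

# Kubernetes events: pod creation, Node failure, scheduling problems
kubectl get events --sort-by=.metadata.creationTimestamp

# Control plane component logs (when components run as kube-system Pods)
kubectl logs -n kube-system kube-scheduler-control-plane-node
```

For production use, a log collection agent running on each Node replaces ad-hoc kubectl calls and ships these streams to centralized storage.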
When configuring the processes to log, you should ensure that you:
- Turn on logging for all application areas
- Configure all application logs, including error, warning, and info
- Configure messages to provide enough context
- Review for meaningful information to prevent overly detailed logging that can become expensive
- Prefix logs with metadata to make searching easier
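The last two points can be combined in application code by emitting structured log entries prefixed with Pod metadata. A minimal Python sketch, assuming `POD_NAME` and `POD_NAMESPACE` environment variables are injected into the container (for example via the Kubernetes downward API); the variable names and schema are illustrative:

```python
import json
import logging
import os

class PodMetadataFormatter(logging.Formatter):
    """Emit JSON log entries prefixed with Pod metadata for easier searching."""

    def format(self, record):
        entry = {
            # Metadata assumed to be injected into the container's environment
            "pod": os.environ.get("POD_NAME", "unknown"),
            "namespace": os.environ.get("POD_NAMESPACE", "unknown"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(PodMetadataFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # capture info, warning, and error levels
logger.info("payment processed")
```

Because every entry carries the same keys, a centralized log platform can filter on `pod` or `namespace` without guessing at free-text formats.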
Centralize log storage
By centrally aggregating and storing your Kubernetes logs, you can correlate them with data from across your environment, including devices and servers. When evaluating centralized storage and management solutions, look for one with data routing that allows you to process data efficiently across:
- Active data: quick access to data necessary for real-time analysis, alerting, and troubleshooting
- Standby data: ability to retrieve data on an as-needed basis but not immediately necessary
- Long-term storage: historical data stored in cost-efficient locations, like data lakes
With data tiering, you no longer need to worry about precisely defining log sources and types because you can cost-effectively maintain more data.
Normalize log data
Prior to v1.19, Kubernetes’ control plane had no way to provide a uniform structure for log messages and references. However, the v1.19 release introduced new methods in the klog library for a more structured format. The structured logging capability supports key-value pairs and object references.
However, you should consider normalizing all your log data, including Kubernetes and Docker, to a standard format. Using a consistent format enables you to correlate and analyze the log data more effectively.
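As a sketch of what normalization involves, the snippet below maps two common raw formats, a classic (pre-structured) klog line and a Docker json-file log line, onto one shared schema. The schema's field names are illustrative, and real pipelines handle many more formats and edge cases:

```python
import json
import re
from datetime import datetime, timezone

# Classic klog header: severity letter, MMDD, time, thread id, file:line]
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ "
    r"(?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)
SEVERITIES = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}

def normalize_klog(line, year=2024):
    """Map a classic klog line onto the common schema (klog omits the year)."""
    m = KLOG_RE.match(line)
    ts = datetime.strptime(f"{year}{m['mmdd']} {m['time']}", "%Y%m%d %H:%M:%S.%f")
    return {
        "timestamp": ts.replace(tzinfo=timezone.utc).isoformat(),
        "level": SEVERITIES[m["sev"]],
        "source": m["src"],
        "message": m["msg"],
    }

def normalize_docker(line):
    """Map a Docker json-file driver log line onto the same schema."""
    raw = json.loads(line)
    return {
        "timestamp": raw["time"],
        "level": "info" if raw["stream"] == "stdout" else "error",
        "source": "docker",
        "message": raw["log"].rstrip("\n"),
    }

klog_line = "I0605 10:04:45.612261       1 scheduler.go:123] Successfully bound pod to node"
docker_line = '{"log":"payment processed\\n","stream":"stdout","time":"2024-06-05T10:04:45Z"}'
print(normalize_klog(klog_line))
print(normalize_docker(docker_line))
```

Once both sources share one schema, a single query can correlate a scheduler decision with the application traffic it affected.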
Integrate Kubernetes logging into overarching monitoring
With the data normalized, you can implement holistic monitoring across your production and development environments. With centralized logging and monitoring, you can:
- Identify issues within your deployment more effectively
- Investigate the root cause of an issue faster
- Incorporate Kubernetes security into your larger security and compliance monitoring
For example, you can create security alerts for abnormal activity indicating unauthorized access from compromised Pods or third-party component vulnerabilities that attackers can use as an entryway into your networks and systems.
Graylog: Comprehensive Logging, Monitoring, and Security Alerting for Kubernetes
With Graylog, you can integrate Kubernetes monitoring and security into your overarching monitoring. Graylog ingests, parses, and normalizes your Kubernetes logs so you can correlate and analyze data. By standardizing log formats across your entire environment, Graylog Enterprise introduces automation that rapidly identifies issues and enables faster investigations to reduce downtime.
Graylog Security offers a robust threat detection and incident response (TDIR) solution that simplifies daily security activities. With our pre-built content, you gain a library of curated event definitions, alerts, and dashboards that help uplevel your security and compliance initiatives. Our content packs and Threat Coverage Widget allow you to map detection to the MITRE ATT&CK Framework and improve alert fidelity so that you can reduce key security metrics, like mean time to detect (MTTD) and mean time to investigate (MTTI).
To complete your Kubernetes monitoring capabilities, you can leverage Graylog API Security. When you build your Kubernetes Cluster, you can integrate Graylog API Security to help discover new APIs and capture runtime analysis data for improved attack detection and API failures.
To learn how Graylog can improve your Kubernetes security, contact us today.