What Is OpenTelemetry and Why Do Organizations Use It?

Mining for information about environments is like trying to find gold. Prospecting can mean sifting through silty water or blasting through a mine. Some nuggets are so small they are nearly invisible, some finds look like gold but aren’t, and sometimes the miner strikes it rich with a larger nugget.

Trying to understand how a distributed system works means sifting through vast amounts of telemetry, looking for patterns. Like pyrite, some telemetry looks valuable but turns out to be fool’s gold. Even more challenging, aggregating and correlating this telemetry requires consistent data formats and schemas so that the organization can analyze the information and gain insights.

Backed by the Cloud Native Computing Foundation (CNCF), OpenTelemetry offers a unified framework for generating, collecting, and exporting telemetry data, enabling organizations to build vendor-neutral observability pipelines.


What Is OpenTelemetry?

OpenTelemetry (OTel) is an open-source observability framework comprising a collection of APIs, SDKs, and tools that standardize the telemetry data that infrastructures and cloud-native applications generate, like:

  • Traces: data about how applications handle requests.
  • Metrics: measurements about services captured at runtime.
  • Logs: timestamped text records and their metadata.

OpenTelemetry separates data generation from the backend that stores and analyzes it. The standard defines a consistent method and data format for how applications produce and export telemetry data, creating a vendor-agnostic pipeline for traces, metrics, and logs.


What Are the Three Pillars of Telemetry Data?

OpenTelemetry cohesively handles observability’s three core data types. Each pillar provides a different insight into system behavior, and together they deliver comprehensive visibility into performance.

Traces

Traces record a single request’s journey as it travels through a distributed system’s various services. Each trace consists of one or more spans, each representing a single unit of work along the journey, like an API call or database query. Spans can be enriched with metadata, timestamps, and events. Visualizing traces helps answer questions about why a request is slow or failing, enabling teams to:

  • Pinpoint latency bottlenecks.
  • Understand service dependencies.
  • Identify an error’s root cause.
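
As a concrete illustration, here is a minimal sketch of creating nested spans with the OpenTelemetry Python API and SDK; the service name, attribute, and event are illustrative, not required values:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Minimal SDK pipeline that prints finished spans to stdout.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # illustrative service name

# A parent span for the request, with a child span for one unit of work.
with tracer.start_as_current_span("handle-checkout") as span:
    span.set_attribute("http.method", "POST")  # metadata enriching the span
    with tracer.start_as_current_span("query-inventory"):
        pass  # e.g., a database call; its duration becomes the child span
    span.add_event("payment-authorized")  # a timestamped event on the span
```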

Metrics

Metrics are numerical measurements aggregated over time that describe a system’s health and performance. They help with:

  • Monitoring trends.
  • Creating dashboards.
  • Triggering alerts.

Storing and querying metrics helps answer questions about the current state of a system’s resources.
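
For example, the sketch below records a counter and a histogram with the OpenTelemetry Python SDK; the instrument names and attributes are illustrative rather than a required schema:

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Export aggregated metrics to stdout every five seconds.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=5000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("checkout-service")
requests = meter.create_counter(
    "http.server.requests", unit="1", description="Count of handled requests"
)
duration = meter.create_histogram(
    "http.server.duration", unit="ms", description="Request duration"
)

# Record measurements at runtime; the SDK aggregates them over time.
requests.add(1, {"http.route": "/checkout"})
duration.record(42.0, {"http.route": "/checkout"})
```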

Logs

Logs are timestamped text records, structured or unstructured, of discrete events that occurred within an application or system. They provide granular details about:

  • Specific error messages.
  • Stack traces.
  • Application state at a specific moment in time.

Logs provide the details and evidence needed during a forensic review to answer questions about what happened at a specific point in time.
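
OpenTelemetry’s log support is designed to bridge records from the logging libraries applications already use. As a simple sketch, the following emits a structured, timestamped error record, including a stack trace, with Python’s standard logging module; the logger name and order ID are illustrative:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("checkout-service")  # illustrative logger name

try:
    raise ValueError("card declined")  # simulate a discrete failure event
except ValueError:
    # exc_info=True captures the stack trace; the record carries a timestamp
    # and message that a forensic review can search later.
    log.error("payment failed for order %s", "order-123", exc_info=True)
```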


What Is the OpenTelemetry Architecture?

OpenTelemetry is built around a modular and flexible architecture that provides a complete telemetry data pipeline. Understanding how the architecture’s key components fit together enables organizations to implement an effective observability strategy.

Application Programming Interfaces (APIs) and Software Development Kits (SDKs)

The APIs and SDKs are the foundation for instrumentation. Developers can use them to generate telemetry data directly from the application code.

The API defines what the application calls to produce telemetry. Since this is a stable specification, the application’s telemetry-producing code stays the same even if the developers:

  • Switch SDKs
  • Change exporters
  • Reconfigure the pipeline
  • Send data to a different backend

The OpenTelemetry SDK defines how the application processes, enriches, and delivers the telemetry data. Since SDKs are configurable, organizations can change how the telemetry behaves without rewriting the application code. Developers use SDKs to control things like:

  • How telemetry is collected, batched, or sampled
  • Which exporters send data and where they send it
  • What metadata or resource attributes get added
  • How context flows through services and requests
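
The sketch below illustrates the split in Python: the instrumented function depends only on the API, while the SDK wiring beneath it can be swapped (for example, to a different exporter) without touching the instrumentation. Names are illustrative:

```python
from opentelemetry import trace

def process_order(order_id: str) -> None:
    # API-only instrumentation: stable no matter how the SDK is configured.
    tracer = trace.get_tracer("order-service")
    with tracer.start_as_current_span("process-order") as span:
        span.set_attribute("order.id", order_id)

# --- SDK configuration, kept separate from the application code ---
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
# Swapping ConsoleSpanExporter for an OTLP exporter changes delivery,
# not the instrumentation above.
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

process_order("order-123")
```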

The OpenTelemetry Protocol (OTLP)

OTLP is an efficient, vendor-neutral protocol that sends telemetry data over common network transports like gRPC or HTTP/1.1 and uses a unified schema, so traces, metrics, and logs share the same structure. By standardizing the data format and transport method, OTLP makes the telemetry data easier for a backend to ingest, store, and correlate.
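
As an example, the sketch below wires a Python application to export spans over OTLP/gRPC; it assumes the opentelemetry-exporter-otlp package is installed and that a receiver is listening on the default gRPC port 4317 on localhost:

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Send spans to a local OTLP/gRPC endpoint (e.g., an OpenTelemetry Collector).
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

with trace.get_tracer("demo").start_as_current_span("otlp-demo"):
    pass  # serialized in the unified OTLP schema and sent over gRPC
```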

The OpenTelemetry Collector

The OpenTelemetry Collector is a vendor-neutral service that receives, processes, and forwards telemetry data from across the environment. To help teams centralize data, it supports common protocols including:

  • OTLP
  • Jaeger
  • Zipkin

With a configurable pipeline for receiving, transforming, filtering, and exporting telemetry, the Collector simplifies observability operations and reduces the need for multiple agents or custom integrations. This standardization makes routing data to any backend flexible and scalable.
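
To illustrate, a minimal Collector configuration might look like the following YAML sketch; it assumes a Collector distribution that includes the Jaeger and Zipkin receivers (such as the contrib build), and the backend endpoint is a placeholder:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
  jaeger:
    protocols:
      grpc:
  zipkin:

processors:
  batch:  # batch telemetry before export to reduce outbound requests

exporters:
  otlp:
    endpoint: backend.example.com:4317  # placeholder backend address

service:
  pipelines:
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [batch]
      exporters: [otlp]
```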


What Are the Benefits of OpenTelemetry?

For organizations that want observability across their environment, OpenTelemetry offers the following benefits:

  • Vendor-agnostic telemetry collection: Choose any tool or backend for storing, querying, visualizing, and analyzing telemetry data.
  • Unified telemetry formats: Correlate events across systems, services, and environments to reduce data silos.
  • Lower operational complexity: Use shared APIs, SDKs, and the OpenTelemetry Collector to reduce the number of agents and the amount of proprietary code, simplifying the observability pipeline.
  • Improved portability and future-proofing: Maintain flexibility as tooling evolves with an open standard supported across cloud providers, vendors, and programming languages.
  • Better cross-service visibility: Use end-to-end context propagation and trace modeling to understand request flows, identify bottlenecks, and troubleshoot distributed systems more effectively.


What Challenges Do Organizations Face Implementing OpenTelemetry?

While OpenTelemetry offers various benefits, organizations can face the following challenges when implementing it across the environment:

  • Volume and backend cost management: Without guardrails, the “collect everything” philosophy can lead to ingest volumes that increase storage and query costs.
  • Backend selection and tool sprawl: Choosing where to store telemetry can be difficult when balancing Application Performance Monitoring (APM), Security Information and Event Management (SIEM), log platforms, and self-hosted options.
  • Data governance and security controls: Protecting telemetry’s sensitive information requires implementing rules for redaction, attribute filtering, encryption, access control, and compliance alignment across teams.
  • Cross-team standardization and coordination: Organizations must define naming conventions, schema rules, sampling policies, and logging standards across various teams like application, infrastructure, Site Reliability Engineering (SRE), and security teams.


Best Practices for Using OpenTelemetry for Observability and Security

The telemetry data that infrastructures and applications generate enables observability for operations but also provides visibility for security teams. For example, traces and metrics can reveal unusual application request patterns, spikes, failures, or execution paths that may indicate an attack like credential stuffing or probing.

To use OpenTelemetry data for both operations and security, organizations should consider these best practices.

Define a Signal Strategy and Retention Model

Organizations should classify the telemetry signals that matter the most to their operations, security monitoring, and compliance objectives. Some data, like error logs and security-relevant spans, is easy to prioritize for immediate access. Lower-priority data can be forwarded to less expensive storage or archived.

When looking for a solution to manage both operations and security, organizations should consider one that provides flexible, cost-efficient data tiering that aligns with retention and compliance goals like:

  • Hot, warm, and archive tiers to store data based on real-time vs long-term needs
  • Automated movement of data between tiers to reduce storage spending without losing visibility
  • Efficient search and retrieval across all tiers for investigations and compliance audits

Use the OpenTelemetry Collector for Centralized Processing and Routing

Centralizing telemetry ingestion and transformation enables organizations to send data through a single, governed pipeline before it is stored or analyzed. With unified pipelines (sketched after this list), organizations can:

  • Normalize data
  • Attach key attributes
  • Redact sensitive fields
  • Route signals to a specific destination
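
A sketch of such a governed pipeline follows, assuming the contrib Collector distribution (where the attributes processor ships); the attribute keys and exporter endpoints are illustrative:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  attributes/enrich:
    actions:
      - key: deployment.environment  # attach a key attribute to every record
        value: production
        action: upsert
  attributes/redact:
    actions:
      - key: user.email  # drop a sensitive field before export
        action: delete
  batch:

exporters:
  otlp/ops:
    endpoint: observability-backend:4317  # placeholder destinations
    tls:
      insecure: true  # sketch only; production pipelines would use TLS
  otlp/security:
    endpoint: siem-backend:4317
    tls:
      insecure: true

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [attributes/enrich, attributes/redact, batch]
      exporters: [otlp/ops, otlp/security]
```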

When looking for a solution to manage both operations and security, organizations should consider one that provides native OTLP support for seamless ingestion from the OpenTelemetry Collector, with features like:

  • Direct OTLP/gRPC inputs that eliminate translation layers and reduce pipeline complexity
  • Automatic mapping of trace context and resource attributes into searchable fields
  • Reliable, high-throughput ingestion that supports large OTel deployments without data loss

Enrich Telemetry with Context for Better Correlation

To improve correlation across systems, organizations should enrich logs, traces, and metrics with contextual metadata like:

  • Service name
  • Environment
  • User identifier
  • Version number

OpenTelemetry natively incorporates context by attaching trace IDs, span IDs, and resource attributes across all services. This capability improves root-cause analysis and helps security teams trace suspicious activity across distributed systems.
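
As a sketch, this metadata can be attached once as resource attributes at SDK setup so that every signal from the process carries the same context; the keys follow OTel semantic conventions, and the values are illustrative:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Attributes shared by every span this process emits (and, with matching
# meter and logger providers, every metric and log as well).
resource = Resource.create({
    "service.name": "checkout-service",
    "deployment.environment": "production",
    "service.version": "1.4.2",
})

trace.set_tracer_provider(TracerProvider(resource=resource))
```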

When looking for a solution to manage both operations and security, organizations should consider one that provides built-in enrichment and correlation capabilities, such as:

  • Automatic indexing of trace IDs, span IDs, attributes, and metadata from OTel logs
  • Fast, contextual search across logs, attributes, and time ranges for investigations
  • Correlation workflows that connect anomalies, errors, user activity, and service behavior

Align Observability and Security Through Governance and Cost Control

As telemetry volume grows, SRE, DevOps, and security operations should create a shared governance model that defines:

  • Sampling rules
  • Naming conventions
  • Data schemas
  • Permissions boundaries

Using OpenTelemetry’s vendor-neutral schemas and consistent formats, organizations can more easily apply these policies to ensure they balance cost and insight while maintaining data integrity, privacy, and security.
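
For instance, an agreed sampling rule can be expressed directly in SDK configuration. The sketch below applies a parent-based, 10% head-sampling policy in Python; the ratio is illustrative and would come from the shared governance model:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample 10% of new traces; child spans follow their parent's decision
# so traces stay complete across services.
sampler = ParentBased(root=TraceIdRatioBased(0.10))
trace.set_tracer_provider(TracerProvider(sampler=sampler))
```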

When looking for a solution to manage both operations and security, organizations should consider one that provides strong governance, access control, and cost-optimized scalability with:

  • Role-based access controls and fine-grained permissions for secure multi-team use
  • Pipeline rules for filtering, redaction, and normalization to protect sensitive data
  • Scalable architecture with predictable pricing for high-volume OTel telemetry


Graylog: Enabling OpenTelemetry Use for Security and Operations

By combining OpenTelemetry’s vendor-neutral signal collection with Graylog’s scalable backend, organizations gain a cost-efficient foundation for ingesting, analyzing, and correlating high-volume telemetry without overspending. With this integrated approach, teams can investigate incidents faster, uncover operational issues earlier, and continuously strengthen both performance and security posture.

Graylog enriches OpenTelemetry data so that security, IT, operations, and SRE teams have access to contextual analysis for powerful search capabilities while leveraging tiered storage to optimize cost. With flexible routing, strong governance, and seamless integration, organizations can adopt OTel at scale while keeping costs predictable.
