Introduction to Data Routing
Overview of Data Routing
Today, we’re discussing the data routing feature, which you’re probably familiar with. Here is how Graylog works currently: data comes from the log source to an input, and either stream rules or pipeline rules send it to the appropriate stream. From there, Illuminate does its processing and the processing pipelines handle the data. The message then lands in the stream, and from there it goes to any attached outputs and to OpenSearch.
Data Routing Enhancements
Addition of Multi-Destination Routing
What we’ve added today is a new feature called “Routing.” It’s like a multi-splitter, allowing you to turn on or off any of three different destinations. We’ve also added filter rules, which act as exclusions, giving us even finer control over where messages in a stream are sent.
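The routing idea described above can be sketched in a few lines: each stream carries an on/off toggle per destination, and filter rules act as per-destination exclusions. This is a minimal illustrative sketch, not Graylog’s actual API; the destination names and the `route` function are assumptions based on the three destinations mentioned in this talk.

```python
# Illustrative sketch of multi-destination routing with filter rules
# as exclusions. Names are placeholders, not Graylog internals.
DESTINATIONS = ("index_set", "data_warehouse", "outputs")

def route(message, enabled, filters=None):
    """Return the destinations a message is sent to.

    enabled: dict mapping destination name -> bool (the on/off toggles)
    filters: dict mapping destination name -> list of predicates;
             a message matching any predicate is excluded from that
             destination (filter rules act as exclusions).
    """
    filters = filters or {}
    routed = []
    for dest in DESTINATIONS:
        if not enabled.get(dest, False):
            continue  # destination toggled off for this stream
        if any(pred(message) for pred in filters.get(dest, [])):
            continue  # a filter rule excludes this message
        routed.append(dest)
    return routed

# Example: warehouse enabled, but a filter rule excludes this source.
msg = {"source": "fw01", "action": "delete"}
dests = route(
    msg,
    {"index_set": True, "data_warehouse": True, "outputs": False},
    {"data_warehouse": [lambda m: m.get("source") == "fw01"]},
)
# -> ["index_set"]
```

The point of the sketch is the order of checks: the toggle decides whether a destination is considered at all, and the filter rules then carve exceptions out of that.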
Data Warehouse as a New Destination
You’ll notice a new destination: the Data Warehouse. This is essentially a data lake where customers can store data they don’t actively need but want to keep for compliance reasons. The data in the Data Warehouse doesn’t count against their license, making it an economical storage option for less critical data.
Routing, Filtering, and Practical Examples
Simple Overview of the Process
To summarize, we process the data, then we route it, then we filter it, and finally, it goes to one of the destinations.
Configuring Routing in Graylog
Let me show you what this looks like in practice. We have a list of streams, and for each stream, we can see whether routing is already turned on. For example, in the default stream, we see which destinations the data is being routed to. By default, data is routed to the index set, but not to the Data Warehouse or outputs. If I want to route data to the Data Warehouse, I can simply turn it on, and the data will now also be saved there.
Data Warehouse and Filtering Rules
Viewing Data in the Data Warehouse
Every five minutes or so, you can see updates in the Data Warehouse, such as the number of messages, file sizes, and timestamps. This confirms that the data is being saved properly.
Excluding Data with Filter Rules
Next, we can use the filter rule interface to exclude specific data from going to a particular destination. For example, if we want to exclude certain HTTP messages, we can set up a rule that drops messages matching a specific field value, such as the `source` field. With this filter rule applied, those messages won’t be saved in the Data Warehouse.
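Conceptually, a filter rule of this kind boils down to a predicate over message fields. The following is a hedged sketch of that idea; `make_field_filter` and the field values are illustrative, not part of Graylog:

```python
# Sketch of a filter rule as a field-matching predicate.
def make_field_filter(field: str, value: str):
    """Build a filter that matches messages whose `field` equals `value`."""
    def matches(message: dict) -> bool:
        return message.get(field) == value
    return matches

# Hypothetical rule: drop messages from a particular source.
drop_http = make_field_filter("source", "http-proxy-01")

messages = [
    {"source": "http-proxy-01", "message": "GET /index.html"},
    {"source": "db01", "message": "connection opened"},
]
# Messages matching the rule are excluded from the Data Warehouse.
to_warehouse = [m for m in messages if not drop_http(m)]
```

Note the direction of the logic: a filter rule is an exclusion, so a *match* means the message does *not* reach the destination.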
Practical Example of License Management
Helping a Customer Reduce Data Usage
Let’s imagine a customer named Trent is upset about going over his license usage. He doesn’t need all his data, especially not the delete logs, so we can help him route those delete logs to the Data Warehouse instead of OpenSearch.
Configuring a Stream for Delete Logs
To do this, we go to the stream in question, like “Splendid Stream One,” and configure the routing. We add a filter rule to send only the delete logs to the Data Warehouse. After setting this up, if we search for those delete logs in OpenSearch, they won’t appear because they’ve been routed to the Data Warehouse instead.
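The scenario above can be sketched as a single routing decision per message: delete logs are kept in the Data Warehouse but excluded from the index set (OpenSearch), so they stop counting against license usage. The field name `action` and both function names are assumptions for illustration, not Graylog’s schema:

```python
# Sketch of Trent's setup: delete logs go only to the Data Warehouse.
def is_delete_log(message: dict) -> bool:
    # Illustrative field; real delete logs would match on whatever
    # field identifies them in the customer's data.
    return message.get("action") == "delete"

def destinations_for(message: dict) -> list:
    dests = ["data_warehouse"]        # the warehouse keeps everything
    if not is_delete_log(message):
        dests.append("index_set")     # OpenSearch skips delete logs
    return dests
```

This matches what the search result shows: once the rule is in place, a search in OpenSearch no longer finds the delete logs, because they only exist in the Data Warehouse.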
Retrieving Data from the Data Warehouse
Bringing Data Back from the Data Warehouse
Now, if the customer needs to retrieve data from the Data Warehouse, for example, today’s data, we can easily do that. By clicking the “Retrieve” button for the stream in question, Graylog retrieves the data from the Data Warehouse. The retrieval process runs in the background, and within seconds, the data reappears in the stream, as if by magic.
Setting Up the Data Warehouse
Engineer’s Guide to Configuring the Data Warehouse
Finally, let’s walk through setting up the Data Warehouse. As an engineer helping a customer onboard, we go to the overview screen where we can configure the backend. Whether it’s a file system or an S3 bucket, setting it up is simple. Just provide the file path (which needs to be a shared network drive), name the Data Warehouse, and click “Activate”—and that’s it. The Data Warehouse is ready to use.
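The two backend options mentioned above (a file system path or an S3 bucket) imply a small validation step before activation. This is a hedged sketch of that check; the keys (`type`, `path`, `bucket`, `region`) are illustrative assumptions, not Graylog’s actual configuration schema:

```python
# Illustrative validation of a Data Warehouse backend configuration.
def validate_backend(config: dict) -> list:
    """Return a list of problems with a backend config (empty if OK)."""
    problems = []
    if config.get("type") == "filesystem":
        # The talk notes the path must be a shared network drive so
        # every node can reach it.
        if not config.get("path"):
            problems.append("filesystem backend needs a shared path")
    elif config.get("type") == "s3":
        for key in ("bucket", "region"):
            if not config.get(key):
                problems.append("s3 backend needs '%s'" % key)
    else:
        problems.append("type must be 'filesystem' or 's3'")
    return problems
```

With a valid config, activation is the one-click step described above: name the Data Warehouse, click “Activate,” and it is ready to use.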