Introduction to Data Routing
Overview of Data Routing
Today, we’re discussing the data routing feature, which you’re probably familiar with. Here is how Graylog works currently: data comes from the log source to an input, and either stream rules or pipeline rules send it to the appropriate stream. From there, Illuminate does its processing and the processing pipelines handle the data. The message then lands in the stream, and from there it goes to any attached outputs and to OpenSearch.
Data Routing Enhancements
Addition of Multi-Destination Routing
What we’ve added today is a new feature called “Routing.” It’s like a multi-splitter, allowing you to turn on or off any of three different destinations. We’ve also added filter rules, which act as exclusions, giving us even finer control over where messages in a stream are sent.
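The routing idea described above can be sketched in a few lines: each stream carries an on/off toggle per destination, and filter rules act as per-destination exclusions. This is a minimal illustrative sketch, not Graylog’s actual API; the destination names and the `route` function are assumptions based on the three destinations mentioned in this talk.

```python
# Illustrative sketch of multi-destination routing with filter rules
# as exclusions. Names are placeholders, not Graylog internals.
DESTINATIONS = ("index_set", "data_warehouse", "outputs")

def route(message, enabled, filters=None):
    """Return the destinations a message is sent to.

    enabled: dict mapping destination name -> bool (the on/off toggles)
    filters: dict mapping destination name -> list of predicates;
             a message matching any predicate is excluded from that
             destination (filter rules act as exclusions).
    """
    filters = filters or {}
    routed = []
    for dest in DESTINATIONS:
        if not enabled.get(dest, False):
            continue  # destination toggled off for this stream
        if any(pred(message) for pred in filters.get(dest, [])):
            continue  # a filter rule excludes this message
        routed.append(dest)
    return routed

# Example: warehouse enabled, but a filter rule excludes this source.
msg = {"source": "fw01", "action": "delete"}
dests = route(
    msg,
    {"index_set": True, "data_warehouse": True, "outputs": False},
    {"data_warehouse": [lambda m: m.get("source") == "fw01"]},
)
# -> ["index_set"]
```

The point of the sketch is the order of checks: the toggle decides whether a destination is considered at all, and the filter rules then carve exceptions out of that.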
Data Warehouse as a New Destination
You’ll notice a new destination: the Data Warehouse. This is essentially a data lake where customers can store data they don’t actively need but want to keep for compliance reasons. The data in the Data Warehouse doesn’t count against their license, making it an economical storage option for less critical data.
Routing, Filtering, and Practical Examples
Simple Overview of the Process
To summarize, we process the data, then we route it, then we filter it, and finally, it goes to one of the destinations.
Configuring Routing in Graylog
Let me show you what this looks like in practice. We have a list of streams, and for each stream, we can see whether routing is already turned on. For example, in the default stream, we see which destinations the data is being routed to. By default, data is routed to the index set, but not to the Data Warehouse or outputs. If I want to route data to the Data Warehouse, I can simply turn it on, and the data will now also be saved there.
Data Warehouse and Filtering Rules
Viewing Data in the Data Warehouse
Every five minutes or so, you can see updates in the Data Warehouse, such as the number of messages, file sizes, and timestamps. This confirms that the data is being saved properly.
Excluding Data with Filter Rules
Next, we can use the filter rule interface to exclude specific data from going to a particular destination. For example, if we want to exclude certain HTTP messages, we can set up a rule that drops messages matching a specific field value, such as the `source` field. With this filter rule applied, those messages won’t be saved in the Data Warehouse.
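Conceptually, a filter rule of this kind boils down to a predicate over message fields. The following is a hedged sketch of that idea; `make_field_filter` and the field values are illustrative, not part of Graylog:

```python
# Sketch of a filter rule as a field-matching predicate.
def make_field_filter(field: str, value: str):
    """Build a filter that matches messages whose `field` equals `value`."""
    def matches(message: dict) -> bool:
        return message.get(field) == value
    return matches

# Hypothetical rule: drop messages from a particular source.
drop_http = make_field_filter("source", "http-proxy-01")

messages = [
    {"source": "http-proxy-01", "message": "GET /index.html"},
    {"source": "db01", "message": "connection opened"},
]
# Messages matching the rule are excluded from the Data Warehouse.
to_warehouse = [m for m in messages if not drop_http(m)]
```

Note the direction of the logic: a filter rule is an exclusion, so a *match* means the message does *not* reach the destination.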
Practical Example of License Management
Helping a Customer Reduce Data Usage
Let’s imagine a customer named Trent is upset about going over his license usage. He doesn’t need all his data, especially not the delete logs, so we can help him route those delete logs to the Data Warehouse instead of OpenSearch.
Configuring a Stream for Delete Logs
To do this, we go to the stream in question, like “Splendid Stream One,” and configure the routing. We add a filter rule to send only the delete logs to the Data Warehouse. After setting this up, if we search for those delete logs in OpenSearch, they won’t appear because they’ve been routed to the Data Warehouse instead.
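The scenario above can be sketched as a single routing decision per message: delete logs are kept in the Data Warehouse but excluded from the index set (OpenSearch), so they stop counting against license usage. The field name `action` and both function names are assumptions for illustration, not Graylog’s schema:

```python
# Sketch of Trent's setup: delete logs go only to the Data Warehouse.
def is_delete_log(message: dict) -> bool:
    # Illustrative field; real delete logs would match on whatever
    # field identifies them in the customer's data.
    return message.get("action") == "delete"

def destinations_for(message: dict) -> list:
    dests = ["data_warehouse"]        # the warehouse keeps everything
    if not is_delete_log(message):
        dests.append("index_set")     # OpenSearch skips delete logs
    return dests
```

This matches what the search result shows: once the rule is in place, a search in OpenSearch no longer finds the delete logs, because they only exist in the Data Warehouse.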
Retrieving Data from the Data Warehouse
Bringing Data Back from the Data Warehouse
Now, if the customer needs to retrieve data from the Data Warehouse, for example, today’s data, we can easily do that. By clicking the “Retrieve” button for the stream in question, Graylog retrieves the data from the Data Warehouse. The retrieval process runs in the background, and within seconds, the data reappears in the stream, as if by magic.
Setting Up the Data Warehouse
Engineer’s Guide to Configuring the Data Warehouse
Finally, let’s walk through setting up the Data Warehouse. As an engineer helping a customer onboard, we go to the overview screen where we can configure the backend. Whether it’s a file system or an S3 bucket, setting it up is simple. Just provide the file path (which needs to be a shared network drive), name the Data Warehouse, and click “Activate”—and that’s it. The Data Warehouse is ready to use.
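The two backend options mentioned above (a file system path or an S3 bucket) imply a small validation step before activation. This is a hedged sketch of that check; the keys (`type`, `path`, `bucket`, `region`) are illustrative assumptions, not Graylog’s actual configuration schema:

```python
# Illustrative validation of a Data Warehouse backend configuration.
def validate_backend(config: dict) -> list:
    """Return a list of problems with a backend config (empty if OK)."""
    problems = []
    if config.get("type") == "filesystem":
        # The talk notes the path must be a shared network drive so
        # every node can reach it.
        if not config.get("path"):
            problems.append("filesystem backend needs a shared path")
    elif config.get("type") == "s3":
        for key in ("bucket", "region"):
            if not config.get(key):
                problems.append("s3 backend needs '%s'" % key)
    else:
        problems.append("type must be 'filesystem' or 's3'")
    return problems
```

With a valid config, activation is the one-click step described above: name the Data Warehouse, click “Activate,” and it is ready to use.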