Getting Started with GROK Patterns

If you’re new to logging, you might be tempted to collect all the data you possibly can. More information means more insights; at least, that’s what NBC’s “The More You Know” public service announcements promised. Unfortunately, too much logging can create new problems. To streamline your log collection, you can filter messages directly at the log source. However, to parse the data you do collect, you may need to use a Grok pattern.

If you’re just getting started with Grok patterns, you might want to know what they are, how they work, and how to use them.

What is Grok?

Used for parsing and analyzing log data, Grok is a tool in the Elasticsearch, Logstash, and Kibana (ELK) stack that extracts structured data from unstructured log messages. Grok uses regular expressions to define named, reusable patterns, enabling users to separate log messages into fields and more easily analyze the data.

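For example, here is a minimal sketch of how a Grok pattern, written with the %{PATTERN:field} syntax and built-in patterns like IP and NUMBER, turns a raw line into named fields:

  Log line: 55.3.244.1 GET /index.html 15824 0.043
  Pattern:  %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
  Fields:   client=55.3.244.1, method=GET, request=/index.html, bytes=15824, duration=0.043
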
With Grok, users can define patterns to match any type of log message data, including:

  • Email addresses
  • IP addresses
  • Positive and negative integers
  • Sets of characters

Grok ships with a regular expression library and built-in patterns to make getting started easier, and users can also create pattern files to add their own. With filter plugins, such as the Logstash grok filter, users can apply these patterns to log data in a configuration file.
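
To illustrate, entries in a pattern file map a name to a regular expression or to a combination of other patterns; the definitions below are drawn from the standard grok-patterns library:

  INT (?:[+-]?(?:[0-9]+))
  IP (?:%{IPV6}|%{IPV4})
  EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}

A configuration file then applies patterns through a filter plugin; a minimal Logstash sketch, where the field names client, method, and request are illustrative:

  filter {
    grok {
      match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request}" }
    }
  }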

Grok patterns follow the Elastic Common Schema (ECS), enabling users to normalize event data at ingest time and make querying data sources easier. Grok is particularly effective with log formats written for humans rather than computers, like:

  • Syslog logs
  • Apache and other web server logs
  • MySQL logs
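
A single composite pattern from the built-in library can parse an entire Apache access log line; in legacy pattern sets it is named COMBINEDAPACHELOG (newer, ECS-oriented sets call it HTTPD_COMBINEDLOG):

  %{COMBINEDAPACHELOG}

Applied to a combined-format access line, it yields fields such as clientip, timestamp, verb, request, response, bytes, referrer, and agent.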

How does it work?

Grok patterns use regular expressions to match patterns in log messages. When the Grok filter finds a match, it separates the matched data into fields.

Regular expressions (regex)

Regex consists of a character sequence that defines a search pattern, enabling complex search-and-replace operations. The process works similarly to the “Find and replace” function in Word and Google Docs.

You should also keep in mind that regular expressions:

  • Use a syntax of specific characters and symbols to define a pattern
  • Can match patterns in strings, making them useful for processing and filtering large amounts of data
  • Can replace parts of a string that match a pattern for advanced manipulation and editing
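
For example, the following regular expression uses character classes and quantifiers to match a simplified IPv4 address; a Grok pattern is essentially an expression like this wrapped in a reusable name:

  \b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b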

Most programming languages provide built-in regular expression libraries, and tools like Grok build their pattern matching on top of them.

Grok basics

The fundamentals of parsing log messages with Grok patterns are:

  • Defining patterns: using regex syntax for predefined or custom patterns that include alphanumeric characters, sets of characters, single characters, or UTF-8 characters
  • Matching patterns: using filter plugins to match patterns and extract the relevant fields from log messages
  • Pattern files: adding predefined or custom patterns to files for sharing across multiple projects or teams
  • Composite patterns: combining multiple predefined or custom patterns into a single pattern for more complex log parsing, which simplifies the process and reduces the overall number of patterns needed (see the sketch after this list)
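
As a sketch, a custom pattern file might build a composite pattern out of built-in ones; the names APP_USER and APP_EVENT here are hypothetical:

  APP_USER user=%{USERNAME:user}
  APP_EVENT %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{APP_USER} %{GREEDYDATA:message}

A log line can then be matched with %{APP_EVENT} instead of repeating the full sequence each time.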

Using Grok patterns

Grok patterns are essential to processing and analyzing log data because they enable you to extract and categorize the data fields within each message. Parsing data is the first step toward normalizing it, which is ultimately how you can correlate events across your environment.

Normalizing diverse log data formats

Log data comes in various formats, including:

  • CSV
  • JSON
  • XML

Further, you need visibility into diverse log types, including:

  • Access logs
  • System logs
  • Application logs
  • Security logs

With Grok patterns, you can parse these logs and extract the defined fields no matter where they sit in each technology’s format. Since you’re focusing on the type of information rather than the raw message itself, you can then correlate and analyze the data.
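
For instance, assigning the same field name in two different patterns normalizes both sources at once (the field name src_ip is an illustrative choice):

  Firewall log: %{IP:src_ip} %{WORD:action} %{GREEDYDATA:details}
  Access log:   %{IP:src_ip} %{WORD:method} %{URIPATHPARAM:request} %{GREEDYDATA:details}

A search on src_ip now returns matching events from both sources.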

Debugging Grok expressions

Getting the regular expressions for parsing your log files right can be challenging. For example, a username could be represented either directly as a regular expression or in terms of another pattern:

  USERNAME [a-zA-Z0-9._-]+

or

  USER %{USERNAME}

Debugging Grok expressions can involve some trial and error as you compare your expressions against the log files you want to parse.

However, you can find online applications that help you construct a regular expression matching your given log lines. Common approaches these tools take include:

  • Incremental construction: prompting you to select common prefixes or Grok library patterns, then running each segment against the log lines
  • Matching: testing Grok expressions against several log lines simultaneously to determine matches and highlight unmatched data
  • Automatic construction: running Grok expressions against log lines to generate candidate patterns
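
A common incremental tactic is to capture the whole remainder of the line with GREEDYDATA and then carve fields out of it one at a time, re-testing after each step:

  Step 1: %{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:rest}
  Step 2: %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:rest}
  Step 3: %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:logger}\] %{GREEDYDATA:log_message}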

Managing log data that doesn’t fit a defined pattern

Not every log message will have data that fits the defined pattern. Grok manages this in a few different ways:

  • Ignoring lines in log data outside the defined pattern to filter out irrelevant or corrupted entries
  • Adding custom tags to unmatched entries to identify and track issues with log data or categorize entries based on custom criteria
  • Using a separate log file or database table for further analyzing or troubleshooting log data
  • Creating fallback patterns that apply when the initial pattern fails to match the entry, for handling more complex log data (see the sketch after this list)
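
In Logstash, for example, the grok filter tries a list of patterns in order and tags any event that matches none of them; a minimal sketch, where the strict pattern is tried first, a looser fallback second, and the tag names are illustrative:

  filter {
    grok {
      match => {
        "message" => [
          "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:log_message}",
          "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:log_message}"
        ]
      }
      tag_on_failure => ["_grokparsefailure", "needs_review"]
    }
  }

Events carrying the failure tags can then be routed to a separate output for troubleshooting.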

Graylog: Parsing Made Simple

With Graylog, you can use Grok patterns in both extractors and processing pipelines. Additionally, our Graylog Illuminate content, included with both Graylog Security and Graylog Operations, automates the parsing process so you don’t have to build your own Grok patterns. Graylog Sidecar enables you to gather logs from all your systems with the log collection agent of your choice, while centralized, mass deployment of sidecars can support multiple configurations per collector.
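
As an illustration, a processing pipeline rule can apply a Grok pattern with Graylog’s built-in grok() function; a minimal sketch in which the rule name and the src_ip, method, and request fields are illustrative:

  rule "parse request logs"
  when
    has_field("message")
  then
    let parsed = grok(pattern: "%{IP:src_ip} %{WORD:method} %{URIPATHPARAM:request}", value: to_string($message.message));
    set_fields(parsed);
  end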

With Graylog Operations and Graylog Security, you can use pre-built content, including parsing rules and pipelines, to get immediate value from your log data. You also gain access to search templates, dashboards, correlated alerts, reports, dynamic lookup tables, and streams that give you visibility into your environment. With our lightning-fast search capabilities, you can get the answers you need when you need them and reach the root cause of incidents as quickly as possible.

To learn more about how Graylog can help you gain the full value of your log data, contact us today.
