To analyze logs efficiently, they must be structured effectively. Logs from different sources often label data fields differently and/or provide data that’s completely unstructured. The problem is that both types of data need to be structured appropriately in order to key in on particular elements within the log data, such as:
● Monitoring on source address
● Applying rules associated with user names
● Creating alerts for destination addresses
For data to be more useful (and usable), it needs to be organized so that it can be easily understood, queried, and analyzed. This need isn’t just an administrative undertaking—it’s an important log management strategy. And if done without planning, it can cause far-reaching and irreparable issues, creating more work and headaches down the road.
What’s the best way to approach structuring log data? Try these three steps.
1 – Get Management on Board
Getting management on board to help facilitate this process can be a challenge, but it’s vital to long-term success. Why? Without an understanding of how other groups will interact with the data, you’ll have difficulty identifying a schema that includes all types of information you’re likely to encounter in the future, not just today. Without management support, including a “rollout” schedule involving other groups, you may need to rework—and reapply—a new schema down the road, which means altering everything you’ve already done up to that point.
One way to get management on board is by sharing the business risks of proceeding without their support, demonstrating why it’s easier to do it correctly from the beginning. This risk plays out in two ways:
● Risk #1: By starting in isolation (i.e., doing it with only the needs of one group in mind), you pigeonhole what you’re able to do with the data and run the risk of not meeting the needs of all groups in the future.
● Risk #2: A bigger challenge is the possibility of having to redo your structuring efforts. The further along you are, the more there is to fix…and the greater the chance of not being able to fix it all.
Once management lends its support, it’s critical to approach log structuring systematically with steps 2 and 3.
2 – Parse Log Information
Your log information needs structure before you can apply any level of analysis. Give data structure by parsing messages into discrete fields. Most log management systems offer several ways to parse information, including regular expressions, Grok patterns, JSON, CEF, or Syslog.
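As a minimal sketch of what parsing into discrete fields looks like, the example below uses a regular expression with named capture groups to break a syslog-style firewall line into fields. The log format and the field names (`src_addr`, `dst_addr`, etc.) are illustrative assumptions, not any specific vendor’s format.

```python
import re

# Illustrative pattern for a syslog-style line such as:
#   "Mar  4 10:21:33 fw01 action=deny src=10.0.0.5 dst=192.168.1.20"
# Named groups become the discrete fields.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\w{3} +\d+ [\d:]+) "
    r"(?P<host>\S+) "
    r"action=(?P<action>\w+) "
    r"src=(?P<src_addr>[\d.]+) "
    r"dst=(?P<dst_addr>[\d.]+)"
)

def parse_line(line: str) -> dict:
    """Return a dict of named fields, or an empty dict if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else {}

line = "Mar  4 10:21:33 fw01 action=deny src=10.0.0.5 dst=192.168.1.20"
print(parse_line(line)["src_addr"])  # prints 10.0.0.5
```

Once fields are discrete like this, monitoring on a source address or alerting on a destination address becomes a simple query on a named field rather than a text search.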
3 – Standardize Field Names
Vendors often use different names for certain fields (e.g., “src_addr” vs. “src_address” vs. “source_addr”). It’s simply the same information labeled differently.
Leaving fields as vendors have labeled them makes it impossible to create straightforward data queries and forces you to be familiar with the nomenclature of every vendor. It also creates an unnecessary barrier between related data and leads to inefficient storage.
To avoid this, standardize your field names. Start by mapping all variations of fields (src_addr, source_ip, SourceAddress, etc.) to a single field name and create a schema. If you’re doing this manually, run a query to discover your field sets. Then decide what the standard should be and rename like fields under a consistent field name.
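The mapping step above can be sketched as a simple lookup table. This is an illustrative example, not any particular log management system’s feature: the variant names and the canonical name `source_address` are assumptions made for the sketch.

```python
# Map each vendor-specific variant to one canonical field name.
# The variants and canonical names here are illustrative assumptions.
FIELD_MAP = {
    "src_addr": "source_address",
    "src_address": "source_address",
    "source_addr": "source_address",
    "source_ip": "source_address",
    "SourceAddress": "source_address",
    "dst_addr": "destination_address",
    "dest_ip": "destination_address",
}

def standardize(event: dict) -> dict:
    """Rename known vendor fields to the canonical name; keep unknown fields as-is."""
    return {FIELD_MAP.get(key, key): value for key, value in event.items()}

vendor_a = {"src_addr": "10.0.0.5", "user": "alice"}
vendor_b = {"SourceAddress": "10.0.0.5", "user": "alice"}
print(standardize(vendor_a) == standardize(vendor_b))  # prints True
```

After standardization, events from both vendors answer the same query (“show all events where source_address is 10.0.0.5”), which is exactly the streamlined querying the schema is meant to enable.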
This standardization ensures more streamlined querying, more efficient storage, and vastly improved and actionable analysis.
Conclusion
The point of generating log data is to be able to analyze it and use that information to make smart business decisions. Structuring logs effectively is an important strategy to ensure log data is efficient and accessible. Approaching it as a strategy—from getting management on board to creating an appropriate schema—allows you to meet the needs of every group today and tomorrow. Best of all, it saves you from the time and headache of reworking it down the road…something definitely worth avoiding.