Before diving into the topic of this post, allow me to introduce myself. My name is Joel, and I work on the solution engineering team at Graylog. We work with customers and prospective clients on how to manage and get the most out of Graylog in their IT environments. One of our main tasks is to identify the log sources they should incorporate and the volumes they should anticipate, which means getting into the specifics of exactly how they can use Graylog to unlock the most value for their organizations.
Determining What to Feed into Your First SIEM
In this blog post, we aim to address what you should feed into your first SIEM. Your current SIEM may not be in the best state, or perhaps you are using Graylog as a centralized log management platform where you dump everything without extracting much value from it.
Our goal today is to explore the types of data to bring into a standard environment, and how to decide which data fits and which doesn’t as you grow beyond a minimum viable SIEM. That starts with the building blocks you need in place before you feed anything at all into the system.
Building Blocks and Rules
The building blocks to consider before you feed anything into the system include the rules you use to decide what should and shouldn’t go into Graylog. We want to be careful about what we bring in, first to avoid creating a ton of noise, and second to store the data for the required amount of time without incurring unnecessary costs.
Being selective about what we bring in makes the system both more efficient and more effective.
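To make the cost side of that decision concrete, here is a minimal sketch of a back-of-the-envelope storage estimate. The source names, daily volumes, and retention periods are made-up placeholders, not Graylog recommendations, and the calculation ignores compression, replicas, and index overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogSource:
    name: str
    gb_per_day: float    # hypothetical average daily ingest
    retention_days: int  # how long this source has to be kept


def stored_gb(source: LogSource) -> float:
    """Raw volume on disk once the retention window is full."""
    return source.gb_per_day * source.retention_days


# Placeholder numbers for illustration only.
sources = [
    LogSource("domain-controllers", 2.0, 365),
    LogSource("firewall", 20.0, 90),
    LogSource("debug-app-logs", 50.0, 30),
]

# Largest consumers first, so the noisy candidates stand out.
for s in sorted(sources, key=stored_gb, reverse=True):
    print(f"{s.name}: {stored_gb(s):.0f} GB")
print(f"total: {sum(stored_gb(s) for s in sources):.0f} GB")
```

Listing sources this way makes it easier to see when a low-value source would dominate storage, which is a good prompt to filter it or shorten its retention before it goes in.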
Key SIEM Data Sources
In a SIEM, key data sources often include authentication systems, workstations or endpoints, network infrastructure, externally facing services, cloud applications, compliance-required data sources, and data enrichment sources.
The Authentication System
Authentication data plays an essential role in SIEM, answering questions related to user identity, user access, user account changes, and user activity. This data is typically drawn from domain controllers, Azure Active Directory, and other similar sources.
Endpoints
Endpoints can be a tricky domain to manage because of their sheer number. Key questions to answer here include the health of each endpoint, what has changed on it, and whether you are keeping a full history for potential future investigations.
Network Infrastructure
Your network tells you who is communicating with whom. Key sources of network data include firewalls and other network devices. Put the emphasis on making sure the data is clean and useful.
Externally Facing Services
For publicly facing services, the key questions are who is accessing them, whether there are unusual traffic patterns, and whether any known vulnerabilities have been exploited. The data for these services typically comes from web servers, reverse proxies, or WAFs.
Cloud Applications & Compliance
Cloud applications like Office 365 and Google Workspace provide data that can greatly aid investigations. Meanwhile, data required for compliance with regulations, laws, or corporate policies also makes up a significant part of what goes into the SIEM.
Data Enrichment & DHCP
Data enrichment is the process of augmenting raw data with additional information to add context. In the SIEM sphere, most enrichment data comes from sources like DHCP logs, GeoIP lookups, and threat intelligence feeds. DHCP is particularly important because IP addresses get reassigned over time, and the lease records allow Graylog to attribute network activity to specific devices.
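As a rough illustration of the idea behind DHCP enrichment, the sketch below matches an event's source IP and timestamp against lease records to find the device that held the address at that moment. It is a standalone Python example and not how Graylog implements this; the `Lease` fields, the `LeaseIndex` class, and the event field names (`src_ip`, `timestamp`, `src_mac`, `src_hostname`) are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Lease:
    ip: str
    mac: str
    hostname: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


class LeaseIndex:
    """Groups leases by IP, sorted by start time, for point-in-time lookups."""

    def __init__(self, leases):
        self._by_ip = {}
        for lease in sorted(leases, key=lambda item: item.start):
            self._by_ip.setdefault(lease.ip, []).append(lease)

    def lookup(self, ip: str, when: datetime) -> Optional[Lease]:
        leases = self._by_ip.get(ip, [])
        starts = [lease.start for lease in leases]
        # Latest lease that started at or before the event time.
        i = bisect_right(starts, when) - 1
        if i < 0:
            return None
        lease = leases[i]
        # Only a match if the lease had not yet expired.
        return lease if when < lease.end else None


def enrich(event: dict, index: LeaseIndex) -> dict:
    """Return a copy of the event with device fields added when a lease matches."""
    enriched = dict(event)
    lease = index.lookup(event["src_ip"], event["timestamp"])
    if lease is not None:
        enriched["src_mac"] = lease.mac
        enriched["src_hostname"] = lease.hostname
    return enriched


# Hypothetical data: the same IP handed to two different laptops on one day.
index = LeaseIndex([
    Lease("10.0.0.23", "aa:bb:cc:00:00:01", "laptop-alice",
          datetime(2024, 1, 1, 8, 0), timedelta(hours=8)),
    Lease("10.0.0.23", "aa:bb:cc:00:00:02", "laptop-bob",
          datetime(2024, 1, 1, 17, 0), timedelta(hours=8)),
])
event = {"src_ip": "10.0.0.23", "timestamp": datetime(2024, 1, 1, 18, 30)}
print(enrich(event, index)["src_hostname"])  # prints "laptop-bob"
```

The same IP address resolves to different devices depending on when the event happened, which is why a firewall log on its own is often not enough to say which machine was involved.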
Finally, remember that the most crucial, and often least exciting, task is documentation. Proper documentation should not be downplayed. Also, don’t forget to create alerts around the health of your system. To watch Joel Duffield talk about this topic at length, watch this video.
If you have further questions, feel free to reach out. We are always glad to help at Graylog!