When talking about log management, search history is often overlooked. Past searches can feed both log analysis and forensic analysis, but the main challenge with this data is search speed, which degrades as data volume grows. Below, we discuss some ways to get the most out of your saved searches and to speed up the search process.
USING PAST SEARCHES TO IMPROVE LOG ANALYSIS
All searches executed against your system are stored in log files. Storing the data is useless by itself unless you take a systematic approach to extracting knowledge from it. Transaction log analysis (TLA) is the process of analyzing data containing past searches in order to obtain valuable information. According to a paper on search log analysis by Bernard J. Jansen, “The goal of TLA is to gain a clearer understanding of the interactions among searcher, content, and system or the interactions between two of these structural elements based on whatever research questions drive the study.”
The term historical log data is used to describe the collection of all past events and circumstances related to an environment. Even though the word “historical” may suggest that this data is quite old, that isn’t necessarily the case. Every organization customizes the way it stores historical log data depending heavily on its needs and storage capacity, but there are also regulations around how long one should keep old logs. The constant threat of cybercriminals and personal data theft has led to a significant increase in the restrictions imposed by regulatory authorities. For instance, the European Union introduced the GDPR in May 2018, with new rules on retaining different kinds of user data, including saved searches.
In most cases, you will want to find a middle ground between the volume of allocated storage and its cost, so that you retain enough recent and historical log data without exceeding your budget.
FORENSIC ANALYSIS AND SEARCH HISTORY
Another reason to keep past searches as long as you can is that you may need them for forensic analysis. This type of data analysis is part of the computer forensics process of investigating a digital crime such as a security breach, fraud, or information theft. The more information is available, the greater the chance that the investigation will succeed.
For one, keeping old search data can help determine the first occurrence of an event, such as a particular query or an anomaly during the search process. If you can trace back to the first such event, you can calculate how often it recurs and look for possible causes. If you want to know more about steps you can take to improve your environment’s cybersecurity, read about log forensic analysis best practices.
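As a simple illustration of this idea, the sketch below finds the first occurrence of a recurring event in historical log data and estimates its frequency. The log entries are made up for the example; in practice they would be parsed from your log archive.

```python
from datetime import datetime

# Hypothetical log entries: (timestamp, message) pairs; in practice these
# would be parsed from your historical log archive.
events = [
    ("2023-03-01T08:15:00", "failed login for user admin"),
    ("2023-03-04T22:41:00", "failed login for user admin"),
    ("2023-03-06T02:03:00", "failed login for user admin"),
]

timestamps = sorted(datetime.fromisoformat(ts) for ts, _ in events)
first_seen = timestamps[0]
# Whole days between first and last occurrence (at least 1 to avoid
# dividing by zero for same-day events).
span_days = (timestamps[-1] - first_seen).days or 1
frequency = len(timestamps) / span_days  # occurrences per day

print(first_seen.isoformat())  # 2023-03-01T08:15:00
print(round(frequency, 2))     # 0.75
```

Knowing when the event first appeared, and how often it repeats, narrows down which configuration changes or external factors could be the cause.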
HOW CAN YOU SPEED UP YOUR SEARCHES?
When talking about terabytes of data, a logical question comes to mind: how long does it take to run queries against large volumes of search history data, and is there a way to optimize and speed up your searches?
One way to make the search process faster is to use saved searches. Search parameters can be fully customized by the user and saved for future use. If you frequently run a certain type of search, instead of wasting time by repeating the whole process over and over again, you can turn this action into an automated one.
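The saved-search idea can be sketched in a few lines: store a named set of search parameters once, then reuse it instead of rebuilding the query each time. The names and fields below are illustrative, not a real Graylog API.

```python
import json

# Registry of saved searches: name -> search parameters.
saved_searches = {}

def save_search(name, **params):
    """Store a parameter set once, under a memorable name."""
    saved_searches[name] = params

def run_search(name):
    """Reuse the stored parameters; in a real system this body
    would be sent to the log search backend."""
    params = saved_searches[name]
    return json.dumps(params, sort_keys=True)

# Save once...
save_search("failed-logins",
            query="action:login AND status:failed",
            timerange_minutes=60)

# ...then rerun at any time without rebuilding the query.
print(run_search("failed-logins"))
```

The time saved is small per search but adds up quickly when the same investigation is repeated daily.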
After adding a widget that looks for certain parameters or variables (such as geolocation, hostnames, or IP addresses), you can create a custom dashboard. There you can run your investigations and save them for later use. Every time you come back, the data is immediately available, automating the whole search process. You can also set a search to run continuously, so all incoming data that meets the search criteria is filtered and saved into a specified folder.
TRACKING ROUTER HISTORY AND INTERNET ACTIVITY WITH GRAYLOG
The way Graylog handles search is optimized for speed, taking into account the amount of information involved. Graylog uses Elasticsearch as its search engine, which lets you run full-text queries across extremely large volumes of data and return results quickly. Elasticsearch also scales to hundreds of nodes without a performance hit.
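For context, Elasticsearch expresses a full-text search as a JSON request body in its query DSL. The sketch below only builds such a body; actually running it would mean POSTing it to the `_search` endpoint of a live cluster, and the field names (`message`, `timestamp`) are assumptions about the index mapping.

```python
import json

def build_query(text, minutes=60):
    """Build an Elasticsearch query DSL body: full-text match on the
    message field, restricted to a relative time window."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": [
                    {"range": {"timestamp": {"gte": f"now-{minutes}m"}}}
                ],
            }
        },
        "size": 50,  # cap the number of hits returned
    }

body = build_query("connection refused", minutes=15)
print(json.dumps(body, indent=2))
```

Because the query is plain JSON, the same body works whether it is sent from a script, a dashboard widget, or an HTTP client.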
Graylog is designed to enable multi-threaded and distributed search across the environment. This means that each search uses multiple processors and multiple buffers on a single computer, then multiplies that threaded search across the participating nodes in the cluster. Thanks to this, results are returned much faster. The user doesn’t have to schedule or save searches to continue at a later time because everything completes within a reasonable timeframe.
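The fan-out-and-merge pattern behind distributed search can be modeled in miniature: each "node" scans its own slice of the data concurrently, and a coordinator merges the partial results. The shard contents below are toy data for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands in for the data held by one node ("shard").
shards = [
    ["error: disk full", "user login ok"],
    ["error: timeout", "heartbeat ok"],
    ["user logout", "error: disk full"],
]

def search_shard(shard, term):
    """Scan one shard for matching lines (one node's share of the work)."""
    return [line for line in shard if term in line]

def distributed_search(term):
    """Fan the search out across all shards in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(search_shard, shards, [term] * len(shards))
    results = []
    for part in partials:
        results.extend(part)
    return results

hits = distributed_search("error")
print(len(hits))  # 3
```

Because each shard is scanned independently, the wall-clock time of a query approaches the time needed to scan the largest single shard rather than the whole dataset.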
GRAYLOG STREAMS VS. SAVED SEARCHES
We previously discussed saved searches as a means to improve the search speed and streamline the search process. While Graylog supports saved searches, it also comes with an innovative way to improve search.
Graylog streams are a mechanism for routing messages into categories in real time, while they are being processed. This sounds very similar to saved searches, with one major difference: streams are processed in real time. Even if you set a saved search to run continuously, it executes at fixed intervals, not in real time. Because streams are processed in real time, you gain an invaluable advantage: for instance, you can receive error alerts and address a problem as soon as it happens.
Another difference between saved searches and streams is cost. Searching against complex stream rule sets is significantly cheaper because each message is tagged with stream IDs when it is processed. No matter how many rules you add to a stream, the Graylog search internally always looks like streams:[STREAM_ID]. This means there is no additional load on the message storage that would normally come with more rules.
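The reason this is cheap can be shown with a small sketch: rules are evaluated once, at processing time, and the result is stored on the message as a tag, so a later search only has to match a single ID. The rule set and message fields below are invented for the example and do not reflect Graylog's actual rule syntax.

```python
# Hypothetical stream rules: stream ID -> predicate over a message.
RULES = {
    "auth-stream": lambda m: "login" in m["message"],
    "error-stream": lambda m: m.get("level") == "error",
}

def ingest(message):
    """Evaluate all rules once, at processing time, and store the
    matching stream IDs on the message (like Graylog's stream tags)."""
    message["streams"] = [sid for sid, rule in RULES.items() if rule(message)]
    return message

store = [
    ingest({"message": "failed login", "level": "warn"}),
    ingest({"message": "disk full", "level": "error"}),
]

# At query time the search is effectively streams:[STREAM_ID]:
# a single tag match, no matter how many rules defined the stream.
auth_hits = [m for m in store if "auth-stream" in m["streams"]]
print(len(auth_hits))  # 1
```

Adding more rules makes ingest marginally more expensive, but query cost stays constant, which is exactly the trade-off the stream design makes.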
By using stream and pipeline rules, streams can be set to capture only specific sets of data. Instead of putting all your firewall data into one stream, you can, for example, create one just for Authentication data or one for Virus Infection data, making your searches much faster.
In a nutshell, using Graylog streams instead of saved searches is an efficient way to save time and money.