Reporting Up: Recommendations for Log Analysis

What kind of log information should be reported up the chain? The best part about centralized log management is that it gives you a “two for one”: while you might normally think about it in terms of IT operations, you can also use it to track security issues. In either case, at some point you’re going to ask, “What information is important enough to share with my supervisor?” When reporting up, these recommendations for log management analysis and communication can help you make more informed decisions.

When reporting up the chain matters

No matter how you’re using your centralized log management solution, you’re using it for insights. This means that you want to detect abnormal behavior across your environment. The question you need to ask yourself is: what does that look like?

IT Operations

Maintaining services, applications, and technologies is vital to your business. To keep everything moving smoothly, you want to detect any problems before a customer or user does.

What types of logs provide the insights you need to take a proactive approach?

Load balancers
Web servers
App servers
Databases
Virtual machines (VMs)
Hosts

When you have centralized log management, you can use visualizations for dependency and service mapping. Using IT operations analytics (ITOA) can show you how a potential problem in one area will have an upstream or downstream impact.

Once you have the visibility that ITOA brings, you can report on a potential issue before it leads to service downtime.

Security

With security, you can take the same proactive reporting approach. In fact, with the right monitoring, you can detect a threat more rapidly, which makes your company more secure.

What types of logs provide the insights you need to take a proactive approach?

Failed logins
Password resets
Network traffic
Firewall
Virtual Private Network (VPN)
Malware

This is where log management gives you the “two for one” that you need. You’re already using your logs to take a proactive approach to IT operations. By adding a few more event log sources, you also get the security visibility you need.

Why report up the chain?

Reporting up the chain doesn’t have to be to senior leadership. It might be that you need to send a report to a supervisor.

On the other hand, you might have to provide regular reports to senior leadership to meet compliance requirements.

IT Operations

Your service level agreement (SLA) is part of your performance metrics. If your team is following best practices, you have an SLA guiding how to comment on and resolve tickets.

Normally, your key reporting metrics are:

Ticket closure
Time-to-resolution
Incidents by category
Customer satisfaction
SLA compliance

If you’re proactively reporting issues, then you’ll be able to reduce:

Tickets sent in
Time-to-resolution
Incidents by category

Mainly, you’re reducing these because you’re reporting things in advance. Reducing these metrics increases the customer satisfaction and SLA compliance metrics.
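
As a rough illustration of how two of these metrics could be derived from exported ticket data, here is a minimal sketch. The ticket records, field names, and the four-hour SLA target are all assumptions made for the example, not values from any particular help desk tool.

```python
from datetime import datetime, timedelta

# Hypothetical ticket records; the field names are illustrative only.
tickets = [
    {"id": 1, "opened": datetime(2024, 1, 1, 9, 0), "resolved": datetime(2024, 1, 1, 11, 0)},
    {"id": 2, "opened": datetime(2024, 1, 1, 10, 0), "resolved": datetime(2024, 1, 1, 16, 0)},
    {"id": 3, "opened": datetime(2024, 1, 2, 8, 0), "resolved": datetime(2024, 1, 2, 9, 0)},
]

SLA_TARGET = timedelta(hours=4)  # assumed resolution target


def sla_report(tickets, target):
    """Return mean time-to-resolution and the share of tickets within the target."""
    durations = [t["resolved"] - t["opened"] for t in tickets]
    mean_ttr = sum(durations, timedelta()) / len(durations)
    within = sum(1 for d in durations if d <= target)
    return {
        "mean_time_to_resolution": mean_ttr,
        "sla_compliance_pct": 100.0 * within / len(durations),
    }


report = sla_report(tickets, SLA_TARGET)
print(report["mean_time_to_resolution"], round(report["sla_compliance_pct"], 1))
```

Tracking these two numbers over time is one way to show whether proactive reporting is actually moving the metrics.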

If you’re reporting a help desk ticket up through the chain, then you still need to have a way to research the root cause quickly so that you can prove your IT team is complying with the SLA.

Security

Reporting security incidents up to your manager and senior leadership is often part of a compliance requirement.

In this case, reporting up might mean telling your manager that you detected something abnormal indicating a potential insider threat, like credential theft.

Nearly all cybersecurity compliance mandates include a section about incident response handling, and you’re going to need to report up the key metrics proving that you did your due diligence. These would include:

Mean Time to Detect (MTTD)
Mean Time to Investigate (MTTI)
Mean Time to Respond (MTTR)
Mean Time to Recover (MTTR)

The faster you can investigate the incident or escalate to someone who can investigate the incident, the better your security and compliance reporting will be.
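
The sketch below shows one way these four averages might be calculated from incident timelines. It assumes each metric is measured from the end of the previous stage (for example, time to investigate runs from detection to the start of investigation); organizations define these intervals differently, and the records and field names here are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident timeline records; the field names are illustrative.
incidents = [
    {
        "occurred": datetime(2024, 3, 1, 8, 0),
        "detected": datetime(2024, 3, 1, 8, 30),
        "investigated": datetime(2024, 3, 1, 9, 0),
        "responded": datetime(2024, 3, 1, 10, 0),
        "recovered": datetime(2024, 3, 1, 12, 0),
    },
    {
        "occurred": datetime(2024, 3, 5, 14, 0),
        "detected": datetime(2024, 3, 5, 15, 30),
        "investigated": datetime(2024, 3, 5, 15, 40),
        "responded": datetime(2024, 3, 5, 17, 40),
        "recovered": datetime(2024, 3, 5, 18, 40),
    },
]


def mean_minutes(records, start, end):
    """Average gap in minutes between two timeline stages."""
    return mean([(r[end] - r[start]).total_seconds() / 60 for r in records])


# Assumption: each metric is measured from the end of the previous stage.
metrics = {
    "MTTD": mean_minutes(incidents, "occurred", "detected"),
    "MTTI": mean_minutes(incidents, "detected", "investigated"),
    "MTTR (respond)": mean_minutes(incidents, "investigated", "responded"),
    "MTTR (recover)": mean_minutes(incidents, "responded", "recovered"),
}

for name, minutes in metrics.items():
    print(f"{name}: {minutes:.0f} min")
```

Labeling the two MTTR values explicitly, as done here, avoids confusion when both appear in the same report.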

Recommendations for Log Analysis Reporting for IT Operations

As you build out your centralized log management analysis capabilities, it’s important to keep in mind the types of issues that require escalation.

Policy Violations

To ensure continued uptime, IT teams often have policies that they need to comply with. As part of putting your centralized log management reporting practices in place, you want to consider the following ways you can ensure policy compliance:

Change Management violations: Tracking approval processes so that log data shows when changes were made without approval.
Status of IT Deployments: Monitoring equipment or new projects
Deviations from configuration standards: Ensuring all server builds and configurations comply with internal design standards.

Significant Shifts in Statistics

Monitoring trends over time gives you the information you need to detect potential problems that can impact system or network availability. Some trends to consider include:

Excessive traffic from unknown hosts on a network
Spikes in CPU or memory usage relative to the resources available
Critical processes being stopped impacting application performance
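
One simple way to flag a spike like the CPU example above is to compare each new sample against a short trailing baseline. The following is a minimal sketch, assuming CPU percentages have already been extracted from logs into a list; the window size and threshold are arbitrary illustrative values, and a z-score is only one of many possible techniques.

```python
from statistics import mean, stdev


def find_spikes(samples, window=5, threshold=3.0):
    """Return indexes of samples far above the trailing-window average."""
    spikes = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # A perfectly flat baseline (sigma == 0) is skipped in this sketch.
        if sigma > 0 and (samples[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical CPU usage percentages sampled at a fixed interval.
cpu = [20, 22, 21, 23, 22, 21, 95, 22, 21]
print(find_spikes(cpu))
```

Note that a spike inflates the baseline for the samples that follow it, so a production rule would likely need a more robust baseline than this.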

Overall Areas to Monitor and Report

Recommendations for Log Analysis Reporting for Security

When using log analysis reporting for security, knowing what needs to be escalated can help you build out the right processes.

Policy Violations

This type of information indicates someone is doing something they’re not supposed to.

To identify policy violation issues, monitor:

Non-service accounts: a user or admin account is being used for services instead of a service account
Account sharing: multiple staff members using a single account (e.g., logins from different locations or networks at a similar time)
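
The account sharing check can be approximated by looking for the same account logging in from different sources within a short window. This is a minimal sketch under stated assumptions: the login records, field names, and ten-minute window are hypothetical, and only consecutive logins per user are compared.

```python
from datetime import datetime, timedelta

# Hypothetical login events; addresses come from documentation ranges.
logins = [
    {"user": "alice", "source": "10.0.0.5", "time": datetime(2024, 5, 1, 9, 0)},
    {"user": "alice", "source": "203.0.113.7", "time": datetime(2024, 5, 1, 9, 3)},
    {"user": "bob", "source": "10.0.0.8", "time": datetime(2024, 5, 1, 9, 0)},
    {"user": "bob", "source": "10.0.0.8", "time": datetime(2024, 5, 1, 9, 2)},
    {"user": "carol", "source": "10.0.0.9", "time": datetime(2024, 5, 1, 9, 0)},
    {"user": "carol", "source": "198.51.100.4", "time": datetime(2024, 5, 1, 15, 0)},
]


def possible_shared_accounts(logins, window=timedelta(minutes=10)):
    """Flag users whose consecutive logins come from different sources within the window."""
    by_user = {}
    for event in sorted(logins, key=lambda e: e["time"]):
        by_user.setdefault(event["user"], []).append(event)

    flagged = set()
    for user, events in by_user.items():
        for prev, cur in zip(events, events[1:]):
            close_in_time = cur["time"] - prev["time"] <= window
            if close_in_time and cur["source"] != prev["source"]:
                flagged.add(user)
    return sorted(flagged)


print(possible_shared_accounts(logins))
```

A result from a check like this is a prompt for investigation rather than proof, since VPNs and mobile networks can also change a user's apparent source.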

Significant Shifts in Statistics

Monitoring a significant change in statistics means looking at the volume of particular events as an indicator that something might be wrong.

Centralized log management allows you to create dashboards showing trends that give you visibility into issues that need investigation and escalation.

Examples include:

Failed logins to critical resources (e.g., failed attempts on the finance server)
Increased communication with known malware/bad actor sites, such as identifying a list of malware domains and any users communicating with them
Significant uptick in bytes transferred, which suggests someone may be moving a lot of data in or out of the network
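
The first two examples above can be sketched as simple counting and matching over exported events. Everything in this snippet is an assumption for illustration: the event shapes, the host names, the threshold of three failures, and the one-entry blocklist.

```python
from collections import Counter

# Hypothetical failed-login events and DNS queries pulled from log data.
failed_logins = [
    {"host": "finance-server", "user": "alice"},
    {"host": "finance-server", "user": "alice"},
    {"host": "finance-server", "user": "mallory"},
    {"host": "finance-server", "user": "mallory"},
    {"host": "wiki", "user": "bob"},
]
dns_queries = [("bob", "example.org"), ("carol", "bad.example"), ("bob", "bad.example")]

BLOCKLIST = {"bad.example"}  # assumed list of known-bad domains
FAILED_LOGIN_THRESHOLD = 3   # assumed escalation threshold


def hosts_over_threshold(events, threshold):
    """Count failed logins per host and keep hosts at or above the threshold."""
    counts = Counter(e["host"] for e in events)
    return {host: n for host, n in counts.items() if n >= threshold}


def users_contacting(queries, blocklist):
    """List users who queried any domain on the blocklist."""
    return sorted({user for user, domain in queries if domain in blocklist})


print(hosts_over_threshold(failed_logins, FAILED_LOGIN_THRESHOLD))
print(users_contacting(dns_queries, BLOCKLIST))
```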

Overall Areas to Monitor and Report

There are also plenty of areas you should monitor regularly and report if unusual activity suggests there may be an issue.

Account types

Monitoring usage of account types like service accounts, privileged accounts (e.g., administrators, users with access to critical resources, executive accounts), dormant accounts, and recently terminated accounts helps identify if an account is being misused or accessed by the wrong people.

Physical security

Information about access control such as badge swipes or RFID data can help identify security issues at facilities and workplaces.

User logins by department

Report to each department where and when users are logging in to ensure personnel access assets appropriately.

Login monitoring

Login monitoring can include logins outside of business hours and remote logins vs. local logins to gauge after-hours or remote activity.
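
A minimal sketch of that classification follows. The business hours (08:00 to 18:00, Monday to Friday), the internal network range, and the login records are all assumptions chosen for the example.

```python
from collections import Counter
from datetime import datetime, time
from ipaddress import ip_address, ip_network

BUSINESS_START, BUSINESS_END = time(8, 0), time(18, 0)  # assumed working hours
INTERNAL_NET = ip_network("10.0.0.0/8")                 # assumed internal range


def is_after_hours(ts):
    """True on weekends or outside the assumed working hours."""
    weekend = ts.weekday() >= 5
    return weekend or not (BUSINESS_START <= ts.time() < BUSINESS_END)


def is_remote(source):
    """True when the source address is outside the assumed internal range."""
    return ip_address(source) not in INTERNAL_NET


# Hypothetical login events.
logins = [
    {"user": "alice", "source": "10.0.0.5", "time": datetime(2024, 5, 1, 9, 0)},
    {"user": "bob", "source": "203.0.113.7", "time": datetime(2024, 5, 1, 23, 30)},
    {"user": "carol", "source": "10.0.0.9", "time": datetime(2024, 5, 4, 11, 0)},
    {"user": "dave", "source": "203.0.113.9", "time": datetime(2024, 5, 1, 14, 0)},
]

summary = Counter()
for event in logins:
    summary["after_hours" if is_after_hours(event["time"]) else "business_hours"] += 1
    summary["remote" if is_remote(event["source"]) else "local"] += 1

print(dict(summary))
```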

Privileged account creation

Privileged accounts are rare, so several accounts being created may indicate an issue.

New software installations

If there are policies in place on what can be installed, tracking software installations shows if employees aren’t playing by the rules.

New processes in an environment

If a process is identified that was not there before, it could be an early indication of a malware compromise.
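
Comparing observed processes against a known baseline is essentially a set difference. The sketch below assumes process names per host have already been extracted from logs; the baseline, host names, and process names are hypothetical.

```python
# Assumed baseline of processes that are expected in the environment.
baseline = {"sshd", "nginx", "postgres", "cron"}

# Hypothetical process names observed per host in recent log data.
observed = {
    "host-a": {"sshd", "nginx", "cron"},
    "host-b": {"sshd", "postgres", "unknown_miner"},
}


def new_processes(baseline, observed):
    """Return, per host, any process names that are not in the baseline."""
    return {
        host: sorted(procs - baseline)
        for host, procs in observed.items()
        if procs - baseline
    }


print(new_processes(baseline, observed))
```

Matching on names alone is easy to evade, so a real check would probably also consider paths, hashes, or parent processes.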

Bandwidth usage

Tracking bandwidth usage can include examining high usage in general and/or high-bandwidth-using users or resources (which may signal an attempt to exfiltrate data).
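
One way to surface high-bandwidth users is to total bytes per user and compare each total with the typical (median) user. This is a minimal sketch; the flow records and the factor of five are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (user, bytes transferred) records taken from network logs.
flows = [
    ("alice", 1_200), ("bob", 900), ("carol", 1_100),
    ("alice", 800), ("dave", 50_000), ("bob", 1_000),
]


def heavy_users(flows, factor=5):
    """Return users whose total bytes exceed `factor` times the median total."""
    totals = defaultdict(int)
    for user, nbytes in flows:
        totals[user] += nbytes
    typical = median(totals.values())
    return {user: total for user, total in totals.items() if total > factor * typical}


print(heavy_users(flows))
```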

Assets connecting from a new location

If assets that connect to the network from a particular (or assigned) location are connecting from a new location, it might be a warning sign.

Connection attempts by stolen assets

A lost laptop or mobile device that attempts to connect to your environment may suggest a security risk.

Unusual patterns in downloads

This type of activity could indicate an issue, especially if tied to other unusual usage.

Conclusion

With the volume of log data available, monitoring data related to policy violations, a significant shift in statistics, and general network activity is a good place to start reporting up. Knowing what to share with supervisors increases both the value of log data and your value to the organization.

The Graylog Team

The Graylog Experts offering useful tips, tricks, and other important information whenever they can.
