Cyber Security Mega Breaches: Best Practices & Log Management

While the Capital One breach may have been jaw-dropping in its sheer scale, there are best practice lessons to be learned in its remediation response, says Nick Carstensen, technical product evangelist at Graylog.

In an interview with Information Security Media Group’s Nick Holland, Carstensen discusses:

  • What was overlooked in the Capital One data breach, and why it could have been much worse;
  • How to monitor for security events in a cloud solution;
  • What steps should be taken to mitigate data breach risk.

Carstensen has been in the security industry for more than 15 years with experience in sales and security engineering. He is currently a technical product evangelist for Graylog, creating content and helping with its social media presence. Carstensen developed his broad security background as a security engineer for several companies ranging from Anheuser-Busch to Scottrade. His areas of expertise include logging/SIEM solutions, host-based security, firewalls, network design/restructuring and scripting.


NICK HOLLAND: Let’s look at a big breach that’s occurred recently and what was overlooked.

NICK CARSTENSEN: I want to talk about the Capital One breach that happened in 2019. I want to give you an overview of the attacker and who actually carried out the breach, what kind of data was taken, and what steps could have helped detect it more quickly.

The Capital One breach happened on March 22 and March 23. The alleged attacker, who has been arrested and charged, is Paige Thompson. The attacker was able to access more than 100 million Capital One customer accounts, including credit card application data. The company found out through its investigation that no actual credit card numbers were taken, and no log-on information either. But all of the surrounding data of how to apply for a credit card was taken. So a lot of information about how people apply for loans was compromised.

While this attack occurred in March, it wasn’t actually discovered until July 19. So the data sat exposed for 119 days, from the time of intrusion all the way to the time of detection. Paige Thompson used to work at Amazon. She was a software engineer on the AWS team, but she didn’t have any inside knowledge of this environment, although Capital One’s data environment that did get breached was housed on Amazon infrastructure.

What she allegedly did is create a program that would scan different sites for a web application firewall, or WAF, vulnerability. She created this tool in her spare time and went off and scanned a bunch of environments. Her scans eventually hit Amazon’s infrastructure, and she was able to break into Capital One’s environment. Once she got in through this web application misconfiguration, she was able to grab some credentials – administrative credentials for that web application firewall and other systems below it. And once she had those administrative credentials, she could go ahead and do additional things.

So once she had those credentials, she was able to break into the actual systems that housed all this financial data – the loan application processing and the different accounts themselves. She was able to gather all that data and then exfiltrate it. Once she had it, she uploaded it to GitHub, which is a big repository of code, though she also just put files and data out there. She was also talking on Slack and other channels, saying, “Hey, I found this massive trove of data. I didn’t know what to do with it, so I uploaded it here. I wanted to get it off my system to free up space.”

She’s also been accused of hacking other types of systems as well, so it wasn’t just the Capital One breach that she was tied to. She’s also allegedly tied to universities outside of the country, some telecom companies and some state agencies as well. When she was allegedly breaking into those systems, she was using a technique called cryptojacking, which is installing cryptocurrency mining software on those endpoints and using their compute resources. She never did get to that with the Capital One breach, but her MO was to break in, take data and also install cryptojacking tools. And again, as I mentioned, she was on Slack talking about stealing this data that she’d uploaded to GitHub. When some of the people on that GitHub channel saw this, they realized what data it was, and they alerted the authorities. That’s the way she got caught and was brought to the FBI’s attention.

Capital One is estimating the cost of the breach, just for providing the LifeLock service and the different types of credit monitoring, is currently around $150 million. This is outside of everything else, of course, that will come because of that.


HOLLAND: You touched on a lot of the things that happened there. Just to recap, what was overlooked?

CARSTENSEN: There are four main things. The first has to do with the way she got into the system and obtained those initial credentials: by taking advantage of a misconfigured web application firewall. So obviously, configuration management is key for any organization.

You want to make sure that the configuration on all of your security devices is as tight as possible and that you’re maintaining the patches as well. Vendors regularly release security updates that fix known issues. We want to make sure those are applied so there are no back doors that are easily scriptable into that environment.

Any kind of configuration change usually creates some sort of an alert or log as well. We want to capture those so we know whether an administrator is going in and making a backdoor. Are they doing anything they shouldn’t? Let’s capture those logs just to keep them in a centralized spot.
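As a minimal sketch of the kind of centralized capture Carstensen describes, the snippet below filters configuration-change events out of a stream of parsed log records so they land in one reviewable place. The field names and event types are hypothetical, not taken from any particular product:

```python
# Hypothetical sketch: collect configuration-change events from parsed logs.
# Record fields ("event_type", "user", "change") are illustrative only.

CONFIG_EVENTS = {"firewall_rule_changed", "acl_modified", "admin_account_created"}

def collect_config_changes(records):
    """Return the subset of log records that represent configuration changes."""
    return [r for r in records if r.get("event_type") in CONFIG_EVENTS]

logs = [
    {"event_type": "web_request", "user": "anon"},
    {"event_type": "firewall_rule_changed", "user": "admin", "change": "allow 0.0.0.0/0"},
    {"event_type": "admin_account_created", "user": "admin", "change": "new-admin-user"},
]

for change in collect_config_changes(logs):
    print(f"CONFIG CHANGE by {change['user']}: {change['change']}")
```

In a real deployment this filtering would happen in the log management pipeline itself, with the matching events routed to a dedicated stream for review.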

The second one is around user monitoring. The command that she ran is not a normal command that should ever be run on that system, yet it ran without triggering an event to say, “Hey, a command like this just ran on the system. It’s something unusual, not a normal web interaction. Bring it to my attention.”

Now, the third one was monitoring user log-ons. After she gathered those credentials, she was able to log on externally, outside of the system, and get access to the raw data underneath. There should be some form of monitoring: Where are your users coming from? What kind of data are they accessing? Is it always the United States? Is it always one specific state? Or maybe it’s abroad, across a couple of different countries. Being able to do geolocation tagging and understand where your clients and users are coming from is obviously key, and it’s something that was missed in this example.

And then the fourth one is the traffic logs. She was able to break in, pull down a bunch of data, exfiltrate it and upload it somewhere. In her case, she ended up uploading it from her laptop to GitHub.

We should be monitoring traffic in general and understanding patterns of normal behavior. If a client normally uploads maybe a megabyte of traffic, but in this case is uploading 10 gigs of data, that’s obviously something that should be brought to your attention.
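The volume check described above can be sketched as a simple per-client baseline comparison. The baseline figures, client names and the 100x ratio threshold are all illustrative assumptions, not values from the transcript:

```python
# Hypothetical sketch: flag clients whose outbound volume far exceeds their
# usual baseline, as in the "1 MB normally, 10 GB today" example.

def flag_exfiltration(daily_bytes, baseline_bytes, ratio=100):
    """Return clients whose upload volume exceeds `ratio` times their baseline."""
    return [
        client for client, sent in daily_bytes.items()
        if sent > ratio * baseline_bytes.get(client, 1)
    ]

baseline = {"client-a": 1_000_000, "client-b": 1_000_000}    # ~1 MB/day each
today = {"client-a": 1_100_000, "client-b": 10_000_000_000}  # client-b sent ~10 GB

print(flag_exfiltration(today, baseline))  # only client-b trips the alert
```

A production system would learn the baseline per client from historical logs rather than hard-coding it, but the comparison itself is this simple.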


HOLLAND: This is obviously a very high profile data breach. But how representative is this for a data breach, and what else could go wrong?

CARSTENSEN: You’re right, this is a very high-profile case. Obviously, Capital One is very well known. But it is very similar to the breaches that normally happen. This misconfiguration, which was the initial breach point, is the third most common cause of hacking. If you look at the Verizon data breach report, the top cause is always stolen credentials – somebody knows your password or finds it on a sticky note. Misconfiguration is the third most common. And those are the two things that happened in this environment: she breached via misconfiguration and then stole some credentials, which became the main avenue of attack.

The attack itself followed a very common pattern: a misconfiguration leading to compromised credentials.

You’ll also find that a lot of the time these attackers are hacking for a few main reasons. One is obviously financial gain – they want to take the data, and she used cryptojacking because it’s a quick way to generate revenue. Another is bragging rights: she got the data, uploaded it to somewhere public, like GitHub, and then talked about it on Slack.

The timeline here of 119 days to detection is actually pretty short. So that is a differentiator. The average is about 200 days to detect. … Obviously, we want to get that down to a couple of hours, or a day at the most.

Capital One was able to figure this one out through a few methods. They were alerted to someone posting something, and they were able to find and back-trace everything that happened. That’s kind of an unusual case. A lot of times, smaller companies don’t have that ability to retrace the footsteps of what happened. They need a tool that can help them paint a picture of what’s happened after a breach; that’s obviously key to understanding how much data was compromised.

What we find in smaller companies is that they’ll know a breach happened because the FBI or some credit card company will reach out to them and say, “We noticed all these cards came through you and they’re all compromised now, so you have a breach somewhere.” And they just end up throwing their hands up and saying, “I don’t know how far they got. I’m assuming everything in my network’s compromised.”

But Capital One was able to say, in this case, “Yes, I know we were breached. Here’s the exact data that they took. They did not take credit card numbers, so I know that’s safe, and they did not take any type of personal log-on information.” Having that ability to trace where it happened obviously provided them a much better security blanket, so they knew they didn’t have to just throw their hands up and say “everything’s been compromised.”


HOLLAND: One of the things that has happened in the wake of this is that Capital One has had to pay a substantial cost for LifeLock coverage for people who have been victims of this particular breach. But that could be the tip of the iceberg. What are the other impacts of a breach?

CARSTENSEN: Obviously there is the whole financial aspect of it. You have brand reputation. Your stock price is going to dip. You have to provide services to the victims. But the other big one that a lot of people do not think about is third-party liability: Amazon is actually getting sued now, too. They are a third-party vendor that Capital One was using. Companies are now looking at their vendors, like Amazon, and saying, “Well, why did you provide me a WAF that was misconfigured or didn’t have the relevant patches on it?” Understanding that a lot of your service providers can actually be held liable for some of this is obviously a key point people need to take into consideration.

And if you look at the Target breach a few years ago, Target was breached through an HVAC contractor. So it was somebody who wasn’t even part of their security environment. A contractor came onsite to fix an HVAC system, had malware on his laptop, and the attack spread from there.

Key questions to ask now are: How do I vet third-party systems and contractors? How do I verify that they’re secure? Do I need assurances from them – a certificate of insurance, say – so that if something happens because they’re in my environment, I can prove to my auditors that I’m covered in case they get breached?


HOLLAND: Let me ask you one final question then. What steps should be taken by any organization given the state of cybersecurity?

CARSTENSEN: We have to realize that everyone’s being targeted now. It used to be that it was the biggest companies, such as Walmart and Amazon, that were getting targeted, but now even the small mom-and-pop shops get attacked.

And you have to also take into consideration that some of these large companies, like Capital One, obviously have a huge security budget. They have the ability to buy the best tools on the market, the best products, the best people to maintain all this – and they still got breached.

There’s no silver bullet in security. We have to understand that all organizations will eventually get breached. But how can they be the most successful when that does happen? How can they reduce their exposure?

That comes through multiple different steps. You can obviously adopt security best practices. You should be applying your patches to all your endpoints as fast as they come out. Obviously, everybody does regression testing to make sure the patch won’t break anything, but put the patches on as quickly as possible.

Make sure you have good, strong perimeter security, including firewalls, anti-virus and email scanners. Most attacks come through phishing – malware downloaded through an email – so make sure you have a strong email security posture.

Regarding endpoint protections: How do you protect against unauthorized software running? Can you use a whitelist/blacklist product on the endpoint to block all of that?

And then regarding policies and procedures: Who has access to elevated accounts? Who has domain admin privileges in your environment? Who has the ability to modify those permissions? Be very restrictive, and then periodically review whether each person still needs that permission going forward.

Another step is compliance. Are you complying with PCI or HIPAA or GLBA? Each one of those addresses keeping data for a certain amount of time. …

So you have to find a tool that can archive data. Organizations should gather the data, make it searchable through a web interface or something similar, and then archive that off for long-term storage, which is always accessible.

One of our customers told us: “I got breached recently and the data was in my logs. I just didn’t know how to find it or I didn’t get alerted to it when it happened.” So a lot of breach victims will say, “All the data, the key indicators that I did get compromised was there; I just need help finding that and alerting me in real time.”

And that’s why we really need some kind of log aggregation – centralized log management with alerting. At Graylog, we have the ability to take those logs, correlate events across multiple different data sources and alert you.

There are a few capabilities that matter here. One is geolocation tagging. If your log management system does geolocation tagging, that’s a great step in the right direction, because you can create rules and alerts around that data. Why should anybody from Russia be logging onto hospital records?
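The kind of geolocation rule described above boils down to checking each logon’s source country against an allowed set. In practice the country would come from a GeoIP database; here the lookup table, IP addresses and allowed-country set are stand-in assumptions:

```python
# Hypothetical sketch: alert on logons from countries where no legitimate
# users exist. A real deployment would resolve source IPs via a GeoIP
# database; this static table is a stand-in for that lookup.

GEO = {"203.0.113.7": "US", "198.51.100.9": "RU"}  # illustrative mappings
ALLOWED_COUNTRIES = {"US"}

def geo_alerts(logon_events):
    """Return logon events whose source country is outside the allowed set."""
    return [
        e for e in logon_events
        if GEO.get(e["src_ip"], "unknown") not in ALLOWED_COUNTRIES
    ]

events = [
    {"user": "nurse01", "src_ip": "203.0.113.7"},
    {"user": "nurse01", "src_ip": "198.51.100.9"},  # logon from RU -> alert
]
print(geo_alerts(events))
```

Unknown IPs fall outside the allowed set by design, so anything the GeoIP lookup can’t resolve also raises an alert rather than passing silently.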

The second one is threat intelligence. Everybody knows there are threat intelligence feeds out there from the FBI and the Department of Homeland Security. There are also OTX feeds and command-and-control lists that you can take in to enrich your data. So let’s take all the IP addresses from all your logs, check them against different threat feeds and ask: “Is this part of a known command-and-control network? If it is, alert me. Tell me that it’s happening.” You can also download known-bad MD5 hashes – fingerprints of files that shouldn’t be running in your environment. And with centralized logging you can key in on those and alert when you see one running, or just do broad sweeping searches to find them.
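The enrichment step described above can be sketched as a pair of set lookups per log record. The feed contents here are assumptions (the MD5 shown is the well-known hash of the EICAR anti-virus test file, used purely as a placeholder):

```python
# Hypothetical sketch: enrich log records against threat-intel feeds --
# a set of known command-and-control IPs and a set of known-bad MD5 hashes.
# Feed contents and record field names are illustrative.

C2_IPS = {"198.51.100.23"}                            # stand-in C2 feed
BAD_MD5 = {"44d88612fea8a8f36de82e1278abb02f"}        # EICAR test-file hash

def enrich(record):
    """Attach threat-intel verdicts to a single log record in place."""
    record["c2_match"] = record.get("dst_ip") in C2_IPS
    record["malware_match"] = record.get("file_md5") in BAD_MD5
    return record

rec = enrich({
    "dst_ip": "198.51.100.23",
    "file_md5": "44d88612fea8a8f36de82e1278abb02f",
})
print(rec["c2_match"], rec["malware_match"])  # True True
```

Doing this enrichment at ingest time means the verdicts are stored alongside the raw log, so the broad sweeping searches mentioned above become a simple query on the enriched fields.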

And then the last one is really around correlated alerting: alerts that talk across two different data streams. If you’re in a financial institution and you do loans, should there ever be a time when a teller logs on, changes the loan rate down, makes a quick loan and then changes the loan rate back up, all within a five-minute period? Obviously that’s some type of fraud. We want to be alerted to that. And a correlation engine can help find that type of activity and alert you in real time.
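The teller scenario above can be expressed as a sequence rule: a rate-down, loan-issued, rate-up sequence by one user within a five-minute window. The event names and tuple layout are hypothetical; a real correlation engine would express this declaratively rather than in code:

```python
# Hypothetical sketch of a correlation rule: detect a rate_down ->
# loan_issued -> rate_up sequence by the same user inside a time window.

from datetime import datetime, timedelta

def detect_rate_fraud(events, window=timedelta(minutes=5)):
    """Events are (timestamp, user, action) tuples. Return True if one user
    lowers a rate, issues a loan, then restores the rate within `window`."""
    by_user = {}
    for ts, user, action in sorted(events):
        by_user.setdefault(user, []).append((ts, action))
    for seq in by_user.values():
        actions = [a for _, a in seq]
        try:
            i = actions.index("rate_down")
            j = actions.index("loan_issued", i)
            k = actions.index("rate_up", j)
        except ValueError:
            continue  # full sequence never occurred for this user
        if seq[k][0] - seq[i][0] <= window:
            return True
    return False

t0 = datetime(2019, 7, 1, 10, 0)
events = [
    (t0, "teller7", "rate_down"),
    (t0 + timedelta(minutes=1), "teller7", "loan_issued"),
    (t0 + timedelta(minutes=2), "teller7", "rate_up"),
]
print(detect_rate_fraud(events))  # True
```

The key property is that no single event is suspicious on its own; only the ordered combination inside the window trips the rule, which is exactly what a correlation engine adds over simple per-event alerting.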

Maybe a bank has a key management system for the bank vault door with a tag or an access code. Let’s record that and bring it in. Then you can tie that to times and say, “Well, the vault should never be opened past five o’clock at night, and it should only be opened by certain people who should be in the office on certain days.” We can tie all of that together through correlation rules as well.
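The vault rule reduces to two checks on each door event: the opener and the hour. The authorized names and business hours below are made-up illustrations of that rule:

```python
# Hypothetical sketch: correlate vault-door events against business hours
# and an authorized-opener list, per the example in the text.

AUTHORIZED = {"manager1", "manager2"}     # illustrative authorized openers
OPEN_HOUR, CLOSE_HOUR = 8, 17             # vault may only open 08:00-17:00

def vault_violation(user, hour):
    """True if the vault was opened off-hours or by an unauthorized user."""
    return user not in AUTHORIZED or not (OPEN_HOUR <= hour < CLOSE_HOUR)

print(vault_violation("manager1", 10))  # False: authorized, during hours
print(vault_violation("manager1", 22))  # True: opened past five o'clock
print(vault_violation("intruder", 10))  # True: not on the authorized list
```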

The last thing here is reports. … The creation of new user accounts is always a sign that something’s happening if you have a fairly static environment, as is the appearance of backdoor processes or new software running in your environment. All of that can be tracked through the reporting mechanisms of centralized log management.
