Get One Step Closer to GDPR Compliance


Since May 2018, organizations that process the personal data of people in the EU have had to demonstrate compliance and accountability under the EU General Data Protection Regulation (GDPR). In this webinar, we’re going to discuss how Graylog can help you with GDPR compliance.


We will start by putting things in context with some background on what the GDPR is and why it should matter to you.


The first question most people ask is: why do we have GDPR in the first place? The short answer is that the old rules were simply outdated. The GDPR replaced the EU Data Protection Directive, which was adopted in 1995, long before the internet became the primary marketplace for businesses.

Today, consumers demand higher standards and more security for their personal data. When the old rules were written, the idea of privacy did not even cover many of the types of online data that we handle today. It was time to update them.


We need to understand how these regulations affect companies and what the GDPR changes. The biggest change is in the way we look at data. Before the GDPR, prevailing wisdom said that big data was an asset: the more you had, the better off you were.

With the introduction of the GDPR, big data is arguably a liability. Broadly speaking, the less customer information that an organization stores, the lower their risk of running afoul of GDPR. In a nutshell, you can’t be held responsible for what you didn’t have in your databases or data drives.


The GDPR also broadened the definition of the personal data that needs to be protected. As a result, access logs, error logs, and security audit logs frequently contain personal data within the meaning of the GDPR.

In fact, log data contains a lot of information that counts as personal, such as user names, which a company may never have thought of in those terms. Companies need to protect online identifiers such as IP addresses and cookie data, because they can identify a person just like credit card or national ID numbers can.


There are three core areas for GDPR readiness that you need to be aware of. They are Transparency, Compliance, and Accountability. Let’s have a look at them.


The first core concept of GDPR is transparency. Companies need to communicate clearly about the personal data they are going to use, and they must disclose breaches when they occur. A company must inform its customers about how their data is going to be handled, and the actual processing must match that description.

Data Protection Impact Assessments (DPIAs) are also required for high-risk processing, to determine the risks involved in collecting this data and what would happen if a breach occurred. Running breach simulations helps companies understand what the impact of such events may be and how they could affect customers.


To be compliant with the GDPR, companies should be able to document their data use fully. You need to be able to clearly describe what it is that should happen so that you can go back at any time and prove that what did happen matches what should have happened.

You also need to support data subjects’ rights, such as data portability and the “right to be forgotten,” or erasure upon request. When someone asks you to erase their data, the information you hold about them generally must be removed unless a legal exception applies. You need to build your data structures so that this is possible at any time.

You need to expect that a regulatory or supervisory authority may want to examine your processes, review them, and check whether you’re doing a good job. Your data must be in a form that third parties can review at any time.


The last core principle of the GDPR is accountability. If you fail to comply with the mandatory requirements of this law, you may face substantial fines. For infringements such as failing to implement the required technical and organizational measures, fines can reach 10 million euros or 2% of global annual turnover, whichever is higher. For infringements of the key provisions, such as the core principles and data subjects’ rights, fines can reach 20 million euros or 4% of global annual turnover, whichever is higher. Supervisory authorities can also temporarily or permanently ban you from processing data, which could effectively put you out of business.
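To make the “whichever is higher” rule concrete, here is a minimal Python sketch of the two fine ceilings. The function name `max_fine_eur` and the `upper_tier` flag are hypothetical, turnover is assumed to be given in whole euros, and the result is only the statutory maximum, not the amount a regulator would actually impose.

```python
def max_fine_eur(global_annual_turnover_eur: int, upper_tier: bool) -> int:
    """Return the GDPR fine ceiling: a fixed amount or a share of turnover, whichever is higher."""
    # Lower tier: EUR 10 million or 2%; upper tier: EUR 20 million or 4%.
    fixed_cap, percent = (20_000_000, 4) if upper_tier else (10_000_000, 2)
    # Integer arithmetic avoids floating-point rounding on large turnover figures.
    return max(fixed_cap, global_annual_turnover_eur * percent // 100)


# EUR 1 billion turnover, upper tier: 4% (EUR 40 million) exceeds EUR 20 million.
big_company = max_fine_eur(1_000_000_000, upper_tier=True)
# EUR 100 million turnover, lower tier: 2% is only EUR 2 million, so EUR 10 million applies.
small_company = max_fine_eur(100_000_000, upper_tier=False)
print(big_company, small_company)  # 40000000 10000000
```

For large companies the percentage dominates; for smaller ones the fixed amount is the binding ceiling.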


The GDPR is built on six fundamental principles. Let’s have a look at them.


The first principle is lawfulness, fairness, and transparency. Being fair with your data boils down to respecting the transparency core value: you should properly explain what you’re going to do with the data and then do exactly what you said you were going to do. You need to demonstrate that by documenting each of those steps.


The second principle is purpose limitation. If you don’t know why you’re collecting personal data, just stop! If you don’t know what that data is going to be used for, stop collecting it: it’s a liability. You must have a reason to collect that data, and you must state it explicitly.


The third principle is data minimization. Keep only the data you need to fulfill the purpose you defined in the previous step. Never keep more information about your customers than you really need, or you’re just creating liabilities for your organization.


The fourth principle is accuracy. Personal data has to be accurate and, where necessary, kept up to date, even if you plan to delete it later. If you can’t demonstrate that you keep your data accurate, you will have a hard time proving that you have handled it appropriately.


The fifth principle is storage limitation. You don’t want to keep your data any longer than you have to. For example, if your retention policy says that you keep it for 180 days, then after 180 days it should be deleted. If it keeps sitting in your archives, it becomes a liability and something that can be used against you.


The sixth principle is integrity and confidentiality. You must make sure that your data is properly secured: all information must be processed in a way that protects it against unauthorized or unlawful processing and against accidental loss, destruction, or damage.


The GDPR essentially defines three roles: the data controller, the data processor, and the data protection officer. Together, they are responsible for handling data correctly and keeping your organization compliant with the regulation.


The data controller is the organization that determines why and how personal data is processed. The controller is responsible for making sure that any third-party processors it uses comply with the GDPR. If a third-party processor, such as a business partner, is out of compliance, the primary organization can also be considered non-compliant.


The data processor is an organization, typically an outsourcing firm or service provider, that maintains or processes personal data records on behalf of the controller. In practice, any external party that handles your company’s personal data on your behalf counts as a data processor. Both the company and the processing partner can be held accountable for breaches, as in the example above.


Not every organization needs a Data Protection Officer (DPO). The GDPR requires a DPO only if the organization is a public authority, or if its core activities involve large-scale, regular and systematic monitoring of individuals or large-scale processing of special categories of data, such as health data. These criteria also apply to companies outside the EU whose processing falls under the GDPR.


Graylog can help you meet several GDPR requirements related to how your organization handles personal data. In this webinar, we will show you the Graylog features that are most relevant to GDPR compliance.


Let’s start with data flow planning. Because it happens up front, planning is key to meeting the GDPR’s requirements for handling personal data, and proper planning makes the process repeatable and transparent. Streams are a key part of Graylog: they sort incoming messages into categories and feed them to the pipelines connected to them. You define streams with rules, such as sending everything that comes from AWS CloudTrail into a specific stream. You can have multiple streams and handle each of them separately.
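Graylog configures stream rules in its web interface rather than in code, but the routing idea is easy to model. The following is a conceptual Python sketch, not Graylog’s implementation: messages are plain dictionaries, the field names (`source`, `message`) and stream names are hypothetical, and each stream here requires all of its rules to match.

```python
from dataclasses import dataclass, field
from typing import Callable

Message = dict[str, str]


@dataclass
class Stream:
    """A named stream whose rules must all match for a message to be routed into it."""
    name: str
    rules: list[Callable[[Message], bool]] = field(default_factory=list)

    def matches(self, msg: Message) -> bool:
        return all(rule(msg) for rule in self.rules)


def route(msg: Message, streams: list[Stream]) -> list[str]:
    """Return the name of every matching stream; one message can land in several streams."""
    return [s.name for s in streams if s.matches(msg)]


streams = [
    Stream("aws-cloudtrail", [lambda m: m.get("source") == "cloudtrail"]),
    Stream("auth-events", [lambda m: "login" in m.get("message", "").lower()]),
]

cloudtrail_only = route({"source": "cloudtrail", "message": "PutObject"}, streams)
both = route({"source": "cloudtrail", "message": "ConsoleLogin by alice"}, streams)
print(cloudtrail_only, both)  # ['aws-cloudtrail'] ['aws-cloudtrail', 'auth-events']
```

Because a message can match several streams, the same event can be handled one way for security monitoring and another way for retention.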


The first thing you can do with Streams is parse, modify, and enrich their data using Pipelines. You can connect a stream to one or more pipelines, which are sets of stages that modify data by applying different rules.

This way you can clean your data, enrich it, or simply get rid of what you don’t need for your processing. Pipelines make it easy to process data appropriately and minimize it as it arrives. For example, if your stream contains personal information that is not required for your processing, a pipeline rule can remove it before it is stored. You will never be out of compliance for data you never collected in the first place.
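Real Graylog pipeline rules are written in Graylog’s own rule language. As a language-neutral illustration, the Python sketch below models a pipeline as an ordered list of stages, each holding rules that either return a (possibly modified) message or `None` to drop it. The rule functions and the personal field names (`username`, `email`, `client_ip`) are hypothetical.

```python
from typing import Callable, Optional

Message = dict[str, str]
Rule = Callable[[Message], Optional[Message]]  # returning None drops the message

# Hypothetical fields that this processing purpose does not require.
UNNEEDED_PERSONAL_FIELDS = {"username", "email", "client_ip"}


def drop_debug(msg: Message) -> Optional[Message]:
    """Discard noisy debug messages entirely."""
    return None if msg.get("level") == "debug" else msg


def strip_personal_fields(msg: Message) -> Optional[Message]:
    """Remove personal fields before the message is stored (data minimization)."""
    return {k: v for k, v in msg.items() if k not in UNNEEDED_PERSONAL_FIELDS}


def run_pipeline(msg: Message, stages: list[list[Rule]]) -> Optional[Message]:
    """Apply each stage's rules in order, stopping as soon as a rule drops the message."""
    for stage in stages:
        for rule in stage:
            result = rule(msg)
            if result is None:
                return None
            msg = result
    return msg


pipeline: list[list[Rule]] = [[drop_debug], [strip_personal_fields]]
cleaned = run_pipeline({"level": "info", "action": "checkout", "email": "a@example.com"}, pipeline)
dropped = run_pipeline({"level": "debug", "action": "cache_miss"}, pipeline)
print(cleaned, dropped)  # {'level': 'info', 'action': 'checkout'} None
```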


The second thing streams can do is route messages to an Index Set. Index Sets are separate containers for data that can be treated differently. With Index Sets, you decide what happens to your data: for example, you can configure rotation strategies, replication, and retention. Data can be archived or deleted after a certain time or once it reaches a certain size.

You can route data that contains personal information into one Index Set and apply a retention setting so that it is deleted after a certain time. If that data needs to be kept longer, you can still change the setting later. This helps you comply with the GDPR’s storage limitation principle and gives you a defined process for separating personal from non-personal information. You can also easily demonstrate this process by showing the logs.
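A common retention setup rotates to a new index on a fixed schedule and keeps only a maximum number of indices, deleting the oldest. The Python sketch below models that count-based policy under the assumption of one index per day, so keeping 180 indices approximates the 180-day retention example given earlier. The function name and sample dates are hypothetical; in Graylog you set rotation and retention in the Index Set configuration rather than in code.

```python
from datetime import date, timedelta


def indices_to_delete(index_dates: list[date], max_indices: int) -> list[date]:
    """Count-based retention: keep the newest `max_indices` indices, return the rest oldest first."""
    newest_first = sorted(index_dates, reverse=True)
    return sorted(newest_first[max_indices:])


today = date(2024, 6, 30)
# Hypothetical index history: one index per day for the last 200 days.
daily_indices = [today - timedelta(days=n) for n in range(200)]
expired = indices_to_delete(daily_indices, max_indices=180)
print(len(expired), expired[0], expired[-1])  # 20 2023-12-14 2024-01-02
```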


Data protection by design is another principle that Graylog can help you with. To meet regulations, you need to treat your data as a valuable asset and apply role-based access control to it. You need to control who sees that data, what they can see, and how they get to it.

In Graylog, you do that with users and roles, such as administrator and reader roles, which control who has access to streams and dashboards. You can define your own roles that specify what each user can edit and what they cannot see at all, because content they have no access to simply doesn’t show up. This helps keep personal data private.
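Graylog manages roles and permissions through its own interface. Purely to illustrate role-based access control, here is a Python sketch in which a role grants read or edit access to named streams and anything a user cannot read is filtered out. The `Role` class and the role and stream names are hypothetical, and edit access is assumed to imply read access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    readable: frozenset[str] = frozenset()
    editable: frozenset[str] = frozenset()


def can_read(roles: list[Role], stream: str) -> bool:
    # Assumption: edit access implies read access.
    return any(stream in r.readable or stream in r.editable for r in roles)


def can_edit(roles: list[Role], stream: str) -> bool:
    return any(stream in r.editable for r in roles)


def visible_streams(roles: list[Role], all_streams: list[str]) -> list[str]:
    """Streams the user may not read are filtered out, so they never show up at all."""
    return [s for s in all_streams if can_read(roles, s)]


analyst = Role("security-analyst", readable=frozenset({"auth-events"}))
log_admin = Role("log-admin", editable=frozenset({"auth-events", "customer-pii"}))
all_streams = ["auth-events", "customer-pii", "aws-cloudtrail"]

print(visible_streams([analyst], all_streams))  # ['auth-events']
```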

In many cases, companies have already invested in a centralized directory, usually LDAP or Active Directory. Graylog can map directory groups to Graylog roles, in addition to roles you assign statically. This gives you a single, central place to control authorization and group membership, and your existing directory structure extends into Graylog so you can keep working with it.
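The effect of group mapping can be modeled as a set union: a user ends up with their static roles plus every role mapped to a directory group they belong to. The Python sketch below assumes group membership has already been read from LDAP or Active Directory (it performs no directory lookup), and the group and role names are hypothetical.

```python
def effective_roles(
    user_groups: set[str],
    group_role_map: dict[str, set[str]],
    static_roles: set[str],
) -> set[str]:
    """Union of statically assigned roles and roles granted through directory groups."""
    roles = set(static_roles)
    for group in user_groups:
        roles |= group_role_map.get(group, set())  # unmapped groups grant nothing
    return roles


# Hypothetical mapping from directory groups to Graylog-style roles.
group_role_map = {
    "secops": {"security-analyst"},
    "auditors": {"reader"},
}

alice_roles = effective_roles({"secops", "marketing"}, group_role_map, static_roles={"reader"})
print(sorted(alice_roles))  # ['reader', 'security-analyst']
```

In this model, removing someone from `secops` in the directory is enough to revoke `security-analyst`, which is the point of keeping authorization in one central place.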


Logging and auditing is another part of the GDPR that, quite obviously, comes with the territory for a log management solution such as Graylog. When it comes to logs, the new regulations mostly mandate what you should have been doing all along, such as monitoring your data. You need to monitor incoming log data, look for security issues, try to detect breaches, and make sure you know what’s going on with the data in your environment.

These are all core competencies of Graylog. Dashboards and views give you graphical representations of your data, where you can aggregate it and oversee all activity. Best of all, in case of an audit, you can demonstrate that your logs have been continuously collected over the period in question.
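One way to back up a claim of continuous collection is to look for unexpected gaps between consecutive messages. A minimal Python sketch, assuming you have the message timestamps for the period in question; the function name and the five-minute threshold are hypothetical, and a sensible threshold depends on how busy the source normally is.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps: list[datetime], max_gap: timedelta) -> list[tuple[datetime, datetime]]:
    """Return (before, after) pairs where consecutive messages are further apart than max_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


start = datetime(2024, 6, 30, 0, 0)
# Hypothetical input: one message per minute, except for a silent half hour.
minutes = list(range(0, 10)) + list(range(40, 50))
observed = [start + timedelta(minutes=m) for m in minutes]

gaps = find_gaps(observed, max_gap=timedelta(minutes=5))
for before, after in gaps:
    # prints: no messages between 00:09 and 00:40
    print(f"no messages between {before:%H:%M} and {after:%H:%M}")
```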


The search interface is useful when you need to drill down further, since it’s an easy and intuitive way to find things. You can look at a given time range (say, the last 30 minutes) and check a specific type of data (such as DNS requests). Searches make it easy to find relevant information, such as every action that involved a certain administrator user name, or the addresses that have tried to log in as administrator.
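In Graylog you type the query into the search bar and pick the time range in the interface. To show what such a search does, here is a Python sketch that filters in-memory messages by a time range and exact field values. The field names (`timestamp`, `type`, `username`) and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def search(messages: list[dict], start: datetime, end: datetime, **fields: str) -> list[dict]:
    """Return messages in [start, end) whose fields equal every given value."""
    return [
        m
        for m in messages
        if start <= m["timestamp"] < end
        and all(m.get(name) == value for name, value in fields.items())
    ]


now = datetime(2024, 6, 30, 12, 0)
messages = [
    {"timestamp": now - timedelta(minutes=5), "type": "dns_request", "username": "admin"},
    {"timestamp": now - timedelta(minutes=45), "type": "dns_request", "username": "admin"},
    {"timestamp": now - timedelta(minutes=10), "type": "login", "username": "admin"},
]

# DNS requests in the last 30 minutes, and everything "admin" did in the last hour.
last_30_min_dns = search(messages, now - timedelta(minutes=30), now, type="dns_request")
admin_activity = search(messages, now - timedelta(hours=1), now, username="admin")
print(len(last_30_min_dns), len(admin_activity))  # 1 3
```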

You can also save searches you want to run routinely, such as one for failed login attempts. Another option is to configure a dashboard that highlights anything that stands out, rather than a specific piece of information. This way you can monitor your environment and proactively spot anomalies that might indicate a problem. Whether you’re responding to an alert or not yet sure what you’re looking for, you can dig through your data and run general reviews with your own queries in searches and dashboards.
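A saved search for failed logins becomes more useful when paired with a threshold. This Python sketch counts failed login events per source address within a time window and reports addresses at or above the threshold; the event and field names, the 15-minute window, and the threshold of five are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def suspicious_sources(
    messages: list[dict], start: datetime, end: datetime, threshold: int
) -> dict[str, int]:
    """Count failed logins per source IP in [start, end) and keep those at or above threshold."""
    counts = Counter(
        m["source_ip"]
        for m in messages
        if m.get("event") == "login_failed" and start <= m["timestamp"] < end
    )
    return {ip: n for ip, n in counts.items() if n >= threshold}


now = datetime(2024, 6, 30, 12, 0)
messages = [
    {"timestamp": now - timedelta(minutes=i), "event": "login_failed", "source_ip": "203.0.113.9"}
    for i in range(1, 7)
]
messages.append(
    {"timestamp": now - timedelta(minutes=3), "event": "login_failed", "source_ip": "198.51.100.4"}
)

flagged = suspicious_sources(messages, now - timedelta(minutes=15), now, threshold=5)
print(flagged)  # {'203.0.113.9': 6}
```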


Sometimes customers may want to exercise their right to be forgotten. In that case, you need to be able to search for a specific IP address or username. Graylog’s search feature lets you do that very quickly and find out whether this data exists anywhere else. Once you have found the IP address, you can check whether any usernames are associated with it and delete them as well. You can check where those users have been, what they did, and which system logs include information about them, and then delete everything that is relevant.
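The search-then-pivot workflow described above can be sketched in Python: start from an IP address, collect the usernames seen with it, then gather every message that mentions either. The sketch works on an in-memory list with hypothetical field names (`id`, `client_ip`, `username`) and follows only one pivot step; actually deleting stored messages happens in your storage backend, which this does not model.

```python
def find_related(messages: list[dict], ip: str) -> list[dict]:
    """Messages with this IP, plus messages from any username that was seen with this IP."""
    usernames = {m["username"] for m in messages if m.get("client_ip") == ip and "username" in m}
    return [m for m in messages if m.get("client_ip") == ip or m.get("username") in usernames]


def erase_related(messages: list[dict], ip: str) -> list[dict]:
    """Return the messages that remain after removing everything related to the IP."""
    doomed = {m["id"] for m in find_related(messages, ip)}
    return [m for m in messages if m["id"] not in doomed]


logs = [
    {"id": 1, "client_ip": "203.0.113.7", "username": "alice"},
    {"id": 2, "client_ip": "198.51.100.2", "username": "alice"},  # same user, other address
    {"id": 3, "client_ip": "198.51.100.2", "username": "bob"},
    {"id": 4, "client_ip": "203.0.113.7"},
]

related_ids = [m["id"] for m in find_related(logs, "203.0.113.7")]
remaining_ids = [m["id"] for m in erase_related(logs, "203.0.113.7")]
print(related_ids, remaining_ids)  # [1, 2, 4] [3]
```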


The last area where Graylog can help you with GDPR compliance is maintaining the data itself, to show that it has been appropriately collected and handled. Log data arrives around the clock, so showing that your collection is resilient is important. You need to be able to demonstrate that data is collected in such a way that, when a spike or a system failure occurs, you keep collecting it no matter what.

One of the Graylog features that ensures its resiliency is the Journal. The Journal writes incoming data to disk in raw form as soon as it arrives, so that data does not disappear even if there is a power outage or network interruption. In addition to the Journal, Graylog also has Buffers, which protect Elasticsearch, Graylog’s storage backend. Buffers provide flow control during the insertion phase, so that Elasticsearch doesn’t get overwhelmed and doesn’t drop events. Should Elasticsearch fill up, Buffers, backed by the Journal, give the data a place to wait while you perform whatever maintenance is necessary to free up space.
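The Journal follows the write-ahead idea: persist first, process later, and replay whatever was not yet confirmed as stored. The toy Python class below illustrates that idea only; its file names, JSON-lines format, and commit-offset file are hypothetical and do not reflect how Graylog actually lays out its journal.

```python
import json
import os
import tempfile


class Journal:
    """Toy write-ahead journal: messages hit the disk before anything else happens to them."""

    def __init__(self, directory: str) -> None:
        self.log_path = os.path.join(directory, "journal.log")
        self.offset_path = os.path.join(directory, "committed.offset")

    def append(self, msg: dict) -> None:
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk so a power cut cannot lose them

    def _committed(self) -> int:
        try:
            with open(self.offset_path, encoding="utf-8") as f:
                return int(f.read())
        except FileNotFoundError:
            return 0

    def commit(self, count: int) -> None:
        """Record that `count` more messages were safely written to the storage backend."""
        new_offset = self._committed() + count  # read before "w" truncates the file
        with open(self.offset_path, "w", encoding="utf-8") as f:
            f.write(str(new_offset))

    def pending(self) -> list[dict]:
        """Journaled messages not yet committed; these are replayed after a restart."""
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as f:
            lines = f.readlines()
        return [json.loads(line) for line in lines[self._committed():]]


with tempfile.TemporaryDirectory() as workdir:
    journal = Journal(workdir)
    for seq in range(3):
        journal.append({"seq": seq})
    journal.commit(1)  # storage confirmed only the first message, then the process "crashes"
    replayed = Journal(workdir).pending()  # a fresh instance recovers its state from disk

print(replayed)  # [{'seq': 1}, {'seq': 2}]
```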


Clustering provides redundancy for both Graylog and Elasticsearch. We already saw how you can configure Index Sets for replication. If you have two servers, you can keep a copy of the data on each Elasticsearch server, so if something happens to the first, a complete copy exists on the second. You can add as many servers as you need and replicate your data sets across them to provide even more redundancy.
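Replication boils down to keeping each piece of data on more than one node, never with two copies on the same node. The Python sketch below uses simple round-robin placement to show why one replica across two servers survives the loss of either server; it is a simplification, not Elasticsearch’s actual shard allocation algorithm, and the shard and node names are hypothetical.

```python
def place_copies(shards: list[str], nodes: list[str], replicas: int) -> dict[str, list[str]]:
    """Give each shard one primary plus `replicas` copies, each on a different node."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least replicas + 1 nodes to keep copies apart")
    return {
        shard: [nodes[(i + k) % len(nodes)] for k in range(replicas + 1)]
        for i, shard in enumerate(shards)
    }


def data_survives(placement: dict[str, list[str]], failed: set[str]) -> bool:
    """True if every shard still has at least one copy on a healthy node."""
    return all(any(node not in failed for node in copies) for copies in placement.values())


placement = place_copies(["shard-0", "shard-1"], ["es-1", "es-2"], replicas=1)
print(placement)  # {'shard-0': ['es-1', 'es-2'], 'shard-1': ['es-2', 'es-1']}
print(data_survives(placement, failed={"es-1"}))          # True
print(data_survives(placement, failed={"es-1", "es-2"}))  # False
```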


Archiving allows you to retain data very cost-effectively. You control when archives are created, which data they contain, and which streams get archived. If you have data that contains personal information and you don’t want to retain it, but you do want to retain everything else, you can simply choose not to archive those streams and let them be deleted.
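Choosing which streams to archive amounts to a filter applied while writing the flat file. In this Python sketch, archives are gzip-compressed JSON-lines files, each message carries the list of streams it belongs to, and any message routed to an excluded stream is skipped; the file format, field names, and stream names are all assumptions for illustration, not Graylog’s archive format.

```python
import gzip
import json
import os
import tempfile


def write_archive(messages: list[dict], path: str, excluded_streams: set[str]) -> int:
    """Write messages to a gzip JSON-lines file, skipping excluded streams; return the count kept."""
    kept = 0
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for msg in messages:
            if excluded_streams.intersection(msg["streams"]):
                continue  # personal-data stream: not archived, left to the index retention
            f.write(json.dumps(msg) + "\n")
            kept += 1
    return kept


messages = [
    {"streams": ["aws-cloudtrail"], "message": "PutObject"},
    {"streams": ["customer-pii"], "message": "profile updated"},
    {"streams": ["auth-events", "customer-pii"], "message": "login by alice"},
]

with tempfile.TemporaryDirectory() as workdir:
    archive_path = os.path.join(workdir, "archive-2024-06.jsonl.gz")
    kept = write_archive(messages, archive_path, excluded_streams={"customer-pii"})
    with gzip.open(archive_path, "rt", encoding="utf-8") as f:
        archived = [json.loads(line)["message"] for line in f]

print(kept, archived)  # 1 ['PutObject']
```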

Archives let you keep the data you need as flat files and get rid of everything that might constitute a liability. All of this data can be accessed at any time, for example when a customer has a complaint or wants something deleted. You can look through your archives without having to restore them first. Once you find what you need, you can restore just those archives, perform whatever activity is required, and then return the modified data to the archives.
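Flat-file archives can be searched by streaming through them line by line, with no restore into an index. The Python sketch below does that and also rewrites an archive without a given user’s messages, swapping in the cleaned file with `os.replace`. The gzip JSON-lines format and the field names are assumptions for illustration, not Graylog’s archive format.

```python
import gzip
import json
import os
import tempfile
from typing import Iterator


def search_archive(path: str, field: str, value: str) -> Iterator[dict]:
    """Stream through a compressed archive and yield matching messages without restoring it."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            msg = json.loads(line)
            if msg.get(field) == value:
                yield msg


def rewrite_without(path: str, field: str, value: str) -> int:
    """Rewrite the archive minus matching messages; return how many were removed."""
    tmp_path = path + ".tmp"
    removed = 0
    with gzip.open(path, "rt", encoding="utf-8") as src, gzip.open(tmp_path, "wt", encoding="utf-8") as dst:
        for line in src:
            if json.loads(line).get(field) == value:
                removed += 1
                continue
            dst.write(line)
    os.replace(tmp_path, path)  # swap in the cleaned archive once both files are closed
    return removed


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "archive.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for n, user in enumerate(["alice", "bob", "alice"]):
            f.write(json.dumps({"n": n, "username": user}) + "\n")

    found = [m["n"] for m in search_archive(path, "username", "alice")]
    removed = rewrite_without(path, "username", "alice")
    left = [m["n"] for m in search_archive(path, "username", "bob")]

print(found, removed, left)  # [0, 2] 2 [1]
```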