Hello, my name is Marinos Yoris, and I am the SRE Team Lead at Kaizen. I’m part of the team that uses and maintains our Graylog cluster.

Today, I’m going to walk you through our journey with Graylog: the challenges we faced and how we overcame them with the help of the Graylog team.

Introduction to Kaizen

Kaizen is an online betting company operating in more than 16 countries under the brands “Stoiximan” and “Betano.”

Why We Needed Graylog

We required a centralized logging tool to:

Visualize logs from applications
Retain logs for 2 weeks to 1 month
Efficiently search logs
Monitor application health

Initial Graylog Setup

In 2012, we implemented the open-source version of Graylog with Elasticsearch. It started as a small cluster but grew rapidly along with the needs of the business.

Scaling Challenges

By 2023, during high-traffic periods, we faced:

Slow search performance
Frequent disruptions (weekly outages)
Log loss during downtime

This was critical because we rely heavily on real-time monitoring.

Transition to Graylog Enterprise

At the end of 2023, we:

Collaborated with the Graylog team
Migrated to Graylog Enterprise
Focused on cluster redesign, optimization, and adopting Enterprise features like Illuminate, audit functionality, advanced filtering, and 24/7 support.

Migration and Optimization Process

Steps taken:

Evaluated the existing cluster
Planned migration and cluster creation
Migrated non-production environments first, then production

Key optimizations included:

Removing unnecessary replicas
Adjusting shard sizes
Optimizing pipelines and stream rules
Increasing output batch sizes
Fine-tuning the number of process buffer and output buffer processors
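
To make the replica and shard-size items above more concrete, here is a minimal sizing sketch. It is not the calculation the team actually ran. The 15 TB/day figure is the daily volume reported for the cluster in this talk; the 40 GB shard target, the 10-day window, and the function names are assumptions chosen only for illustration.

```python
import math

GB_PER_TB = 1000  # decimal units, assumed for simplicity


def primary_shards_needed(daily_tb: float, target_shard_gb: float) -> int:
    """Primary shards needed so that one day's index keeps each shard near the target size."""
    return math.ceil(daily_tb * GB_PER_TB / target_shard_gb)


def stored_tb(daily_tb: float, retention_days: int, replicas: int) -> float:
    """Total disk used: every replica is one full extra copy of the primary shards."""
    return daily_tb * retention_days * (1 + replicas)


# 15 TB/day is from the talk; 40 GB per shard and 10 days are hypothetical values.
print(primary_shards_needed(15.0, 40.0))   # 375
print(stored_tb(15.0, 10, replicas=1))     # 300.0
print(stored_tb(15.0, 10, replicas=0))     # 150.0
```

Under these assumed numbers, dropping one replica halves the disk footprint, which is why replicas that are not needed for availability are an obvious place to look when a cluster is under pressure.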

These changes resulted in:

Faster processing (from 5s to 2s per log)
No outages for months
More stable cluster performance

Current Cluster Scale

Processes 15–20 TB of logs daily
Retains ~200 TB of logs at a time
Handles spikes of up to 750,000 messages per second
Upgraded cluster resources (more cores and storage)
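
As a rough cross-check of the figures above, the following sketch derives the retention window and the average ingest rate they imply. It assumes the retained 200 TB and the daily 15–20 TB measure the same thing (for example, both excluding replicas) and uses decimal units; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # decimal units, assumed for simplicity


def retention_days(retained_tb: float, daily_tb: float) -> float:
    """Days of logs that fit in the retained volume at a steady daily rate."""
    return retained_tb / daily_tb


def average_mb_per_second(daily_tb: float) -> float:
    """Average ingest rate implied by a daily volume."""
    return daily_tb * MB_PER_TB / SECONDS_PER_DAY


# Figures from the talk: ~200 TB retained, 15-20 TB ingested per day.
print(round(retention_days(200.0, 20.0), 1))   # 10.0
print(round(retention_days(200.0, 15.0), 1))   # 13.3
print(round(average_mb_per_second(15.0), 1))   # 173.6
print(round(average_mb_per_second(20.0), 1))   # 231.5
```

Under these assumptions the cluster holds roughly 10 to 13 days of logs, which sits close to the lower end of the 2-week-to-1-month retention goal stated earlier.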

Main Use Cases

Production Issue Investigation
Real-Time Monitoring
Application Debugging
Customer Activity Logging (for troubleshooting and regulatory needs)

Future Plans

Scale to handle over 1 million logs per second
Upgrade to Graylog 6.1
Implement Illuminate for network logs in production

Thank you for your attention. We look forward to continued growth with Graylog!

How Kaizen Scaled Their Logging Infrastructure with Graylog Enterprise