Monitoring with AWS CloudWatch

For organizations that run workloads on AWS, CloudWatch is a “must-have.” Operations teams use data from their AWS environments to identify service issues and maintain availability. With CloudWatch and its associated products, organizations can monitor their AWS resource usage and optimize their environments. However, CloudWatch integrates with only some third-party services and lacks robust capabilities in several areas, so many organizations look for other ways to gain comprehensive visibility.

Monitoring with AWS CloudWatch is critical to managing applications and resources, but you should understand its use cases and limitations.

What is CloudWatch in AWS?

CloudWatch is an Amazon Web Services (AWS) observability service that gives IT teams real-time insight into operational health so they can:

  • Monitor applications
  • Respond to performance changes
  • Optimize resource use

With CloudWatch, organizations can:

  • Monitor AWS resources
  • Track custom metrics
  • Aggregate system, application, and custom AWS log files
  • Create alarms based on metrics
  • Build workflows that automate responses to resource changes
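
Custom metrics are published with the CloudWatch `PutMetricData` API. As a minimal sketch, the Python below only builds the keyword arguments that boto3's `put_metric_data` call accepts, so it runs without AWS credentials; the `MyApp/Checkout` namespace, `OrdersPlaced` metric, and `Environment` dimension are hypothetical examples.

```python
from datetime import datetime, timezone


def build_metric_request(namespace, metric_name, value, dimensions, unit="Count"):
    """Build keyword arguments for boto3's CloudWatch put_metric_data call.

    `dimensions` is a plain dict such as {"Environment": "prod"}; CloudWatch
    expects dimensions as a list of {"Name": ..., "Value": ...} pairs.
    """
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                "Dimensions": [
                    {"Name": name, "Value": val} for name, val in dimensions.items()
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
                # 60 = standard resolution; 1 would request high-resolution storage.
                "StorageResolution": 60,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical namespace, metric, and dimension used for illustration only.
    request = build_metric_request(
        "MyApp/Checkout", "OrdersPlaced", 42, {"Environment": "prod"}
    )
    print(request["MetricData"][0]["Dimensions"])
    # With credentials configured, the request could be sent with:
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_data(**request)
```

Converting the dimension dict into `Name`/`Value` pairs in one place keeps every call consistent, which matters because CloudWatch treats each unique combination of metric name and dimensions as a separate metric.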

Operations and security teams use CloudWatch to identify and resolve potential issues before they impact system-wide performance.

What is the difference between CloudWatch and CloudTrail?

Although CloudWatch and CloudTrail are both AWS monitoring tools, they address different use cases and therefore provide different data.

Purpose

While both enable visibility, they serve distinct purposes:

  • CloudWatch: central logging and monitoring service that is activated by default
  • CloudTrail: auditing of resource changes made by users and applications; a trail for ongoing event delivery is not created by default

Data collected

Based on their purpose, the two services collect different data:

  • CloudWatch: metrics and log data describing the performance and activity of AWS resources and applications
  • CloudTrail: API calls made from the AWS Management Console, the AWS CLI, third-party applications, and other AWS services, with details about the request, the response, and the identity of the caller

Data delivery

Although both provide data about cloud activity, their delivery times differ significantly:

  • CloudWatch: log data updated every five seconds, with metric data delivered in 1-minute or 5-minute periods
  • CloudTrail: events delivered within fifteen minutes of the API call

CloudWatch Features and Capabilities

CloudWatch offers various features and capabilities that enable customers to gain deeper insights, automate actions, and manage resources.

CloudWatch Logs

CloudWatch Logs centralizes logs from systems, applications, and AWS services in one place, making them easy to view, search, filter, and archive. Its primary features include:

  • CloudWatch Logs Insights: interactive search and analysis of log data
  • Live Tail: a real-time stream of new log events that users can view, filter, and highlight to detect and resolve issues
  • Amazon EC2 instance monitoring: log data from applications and systems that can be turned into custom metrics
  • AWS CloudTrail event monitoring: integration with CloudTrail for notifications about defined API activity
  • Data protection policies: auditing and masking of sensitive data in logs based on defined data identifiers
  • Log retention: storing logs indefinitely or for a period that meets compliance requirements
  • Archiving: sending rotated and non-rotated log data off host and into the log service
  • Route 53 DNS queries: log information about public DNS queries that Route 53 receives
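
A Logs Insights search is a query string plus a time range. The sketch below, assuming boto3's CloudWatch Logs client and a hypothetical `/myapp/production` log group, builds the arguments for `start_query` so that it returns the 20 most recent events containing `ERROR`; it does not call AWS itself.

```python
from datetime import datetime, timedelta, timezone


def build_error_query(log_group, minutes=60, limit=20, now=None):
    """Build keyword arguments for boto3's CloudWatch Logs start_query call.

    The query string uses CloudWatch Logs Insights syntax to return the most
    recent log events whose message contains "ERROR".
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    query = (
        "fields @timestamp, @message"
        " | filter @message like /ERROR/"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )
    return {
        "logGroupName": log_group,
        # start_query expects epoch seconds, not milliseconds.
        "startTime": int(start.timestamp()),
        "endTime": int(end.timestamp()),
        "queryString": query,
    }


if __name__ == "__main__":
    # Hypothetical log group name.
    print(build_error_query("/myapp/production", minutes=30))
```

With credentials configured, the dictionary could be passed to `boto3.client("logs").start_query(**kwargs)`, and the returned `queryId` polled with `get_query_results` until the status is `Complete`.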

CloudWatch Metrics

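CloudWatch metrics are available at two monitoring levels:

  • Basic monitoring: the default setting, provided at no extra charge; for Amazon EC2, metric data is sent in 5-minute periods
  • Detailed monitoring: additional monitoring, available for some services, that incurs charges; for Amazon EC2, metric data is sent in 1-minute periods

Amazon EC2 sends the following categories of metrics to CloudWatch: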

  • Instance metrics
  • CPU credit metrics
  • Dedicated Host metrics
  • Amazon EBS metrics for Nitro-based instances
  • Status check metrics
  • Traffic mirroring metrics
  • Auto Scaling group metrics
  • Amazon EC2 usage metrics
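
To illustrate why detailed (1-minute) monitoring gives finer-grained visibility than basic (5-minute) monitoring, here is a simplified local sketch, not CloudWatch's implementation, that groups hypothetical raw samples into fixed periods and computes the standard statistics (`SampleCount`, `Sum`, `Average`, `Minimum`, `Maximum`) for each one.

```python
from collections import defaultdict


def aggregate(samples, period):
    """Group (epoch_seconds, value) samples into fixed periods and compute
    CloudWatch-style statistics for each period.

    Simplification: each bucket starts at a multiple of `period` since the epoch.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % period].append(value)
    result = {}
    for start in sorted(buckets):
        values = buckets[start]
        result[start] = {
            "SampleCount": len(values),
            "Sum": sum(values),
            "Average": sum(values) / len(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
    return result


if __name__ == "__main__":
    # Hypothetical CPU readings, one per minute for ten minutes.
    samples = list(zip(range(0, 600, 60), [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
    print(len(aggregate(samples, 60)), len(aggregate(samples, 300)))
```

With one sample per minute for ten minutes, a 60-second period yields ten data points while a 300-second period yields two, so a short CPU spike that is visible at 1-minute resolution can be averaged away at 5-minute resolution.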

Amazon EventBridge

Formerly called CloudWatch Events, Amazon EventBridge enables users to connect applications with event data from various sources, including internally built applications, Software-as-a-Service (SaaS) applications, and AWS services.

EventBridge processes events in two ways:

  • Event buses: receive events and deliver them to zero or more targets based on rules
  • Pipes: receive events from a single source and deliver them to a single target

Often, organizations combine buses and pipes.
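
On an event bus, rules use event patterns to decide which events reach which targets. The following is a simplified sketch of that matching in Python; it covers nested fields and lists of allowed exact values only, omits EventBridge's content filters (such as prefix, numeric, and anything-but matching), and uses a sample EC2 state-change event for illustration.

```python
def matches(pattern, event):
    """Return True if `event` matches an EventBridge-style `pattern`.

    Simplified: supports nested objects and lists of allowed exact values only
    (no prefix, numeric, anything-but, or exists filters).
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event field must be an object that matches too.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # A list of allowed values: match if the event value (or any element
            # of an event array) is one of them.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in candidates):
                return False
    return True


if __name__ == "__main__":
    pattern = {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped", "terminated"]},
    }
    event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
    }
    print(matches(pattern, event))
```

In this model, fields absent from the pattern are ignored, so the sample event matches even though its `instance-id` is not mentioned in the pattern.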

CloudWatch Alarms

AWS offers two types of alarms:

  • Metric alarms: watch a single CloudWatch metric, or the result of a math expression based on CloudWatch metrics, and perform one or more actions based on the value relative to a threshold over a number of time periods
  • Composite alarms: use a rule expression that takes into account the states of other metric and composite alarms

When creating an alarm, users must specify three settings:

  • Period: length of time, in seconds, over which the metric is evaluated to create each data point
  • Evaluation Periods: number of the most recent periods (data points) evaluated when determining the alarm state
  • Datapoints to Alarm: number of data points within the Evaluation Periods that must breach the threshold to change the alarm to the ALARM state
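
These settings interact as an "M out of N" rule: an alarm changes to ALARM when at least Datapoints to Alarm (M) of the last Evaluation Periods (N) data points breach the threshold. The sketch below models that rule for a "greater than" threshold, plus a composite rule equivalent to `ALARM("cpu-high") AND ALARM("latency-high")` (hypothetical alarm names). It is a simplification: missing-data handling and the INSUFFICIENT_DATA state are omitted.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return "ALARM" or "OK" for a "greater than threshold" metric alarm.

    `datapoints` holds one value per Period, oldest first. Only the most recent
    `evaluation_periods` values are considered; the alarm fires when at least
    `datapoints_to_alarm` of them breach. Missing-data handling is omitted.
    """
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


def composite_and(*states):
    """Model ALARM("a") AND ALARM("b") ...: ALARM only if every input is in ALARM."""
    return "ALARM" if all(state == "ALARM" for state in states) else "OK"


if __name__ == "__main__":
    # Hypothetical 1-minute CPU readings; threshold 80, 3 out of 5 to alarm.
    cpu_state = evaluate_alarm([50, 85, 90, 40, 88], 80, 5, 3)
    latency_state = "ALARM"  # assumed state of a second, hypothetical alarm
    print(cpu_state, composite_and(cpu_state, latency_state))
```

For example, with a threshold of 80, Evaluation Periods of 5, and Datapoints to Alarm of 3, the readings 50, 85, 90, 40, 88 contain three breaching data points, so the alarm state becomes ALARM. Requiring M of N breaches, rather than a single one, helps keep short, isolated spikes from triggering the alarm.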

CloudWatch Dashboards

Dashboards are customizable home pages in the CloudWatch console that provide visibility into:

  • Metrics and alarms to assess resources and applications
  • Operational playbooks to help teams respond to incidents
  • Critical resource and application measurements

Organizations can use dashboards for cross-account and cross-Region observability. Teams can also share these customized views, which display real-time data, to improve cross-functional communication.
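
Dashboards can also be defined as code. This sketch builds a dashboard body in the JSON format accepted by the CloudWatch `PutDashboard` API: one graph of average EC2 `CPUUtilization` and one text widget that could hold a link to an operational playbook. The instance IDs, Region, widget layout, and playbook text are hypothetical placeholders.

```python
import json


def build_dashboard_body(instance_ids, region="us-east-1"):
    """Return a CloudWatch dashboard body (JSON string) with one CPU graph
    and one text widget for an operational playbook note."""
    metrics = [["AWS/EC2", "CPUUtilization", "InstanceId", iid] for iid in instance_ids]
    body = {
        "widgets": [
            {
                "type": "metric",
                "x": 0, "y": 0, "width": 12, "height": 6,
                "properties": {
                    "metrics": metrics,
                    "view": "timeSeries",
                    "stat": "Average",
                    "period": 300,
                    "region": region,
                    "title": "EC2 CPU utilization",
                },
            },
            {
                "type": "text",
                "x": 0, "y": 6, "width": 12, "height": 3,
                # Placeholder text; replace with a link to your own playbook.
                "properties": {"markdown": "## On-call playbook\nSee the team runbook before escalating."},
            },
        ]
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Hypothetical instance ID.
    print(build_dashboard_body(["i-0123456789abcdef0"]))
```

The resulting string could be passed as `DashboardBody` to `boto3.client("cloudwatch").put_dashboard(DashboardName="ops-overview", DashboardBody=body)`, where `ops-overview` is a hypothetical name. Keeping the definition in version control makes it easier to reproduce the same dashboard across accounts.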

Challenges of Using CloudWatch

Many companies use CloudWatch because it is built into AWS and enables them to:

  • Create visualizations that make monitoring their AWS environments easier
  • Reduce total cost of ownership by automating activities
  • Optimize applications and resources
  • Gain insight into key issues like CPU, capacity, and memory utilization

However, despite these benefits, many organizations struggle to use CloudWatch effectively.

Cost

Although CloudWatch is a native tool, it becomes increasingly expensive as an organization’s environment grows, which can make large-scale monitoring and logging cost-inefficient. For example, the CloudWatch free tier is limited to:

  • Basic monitoring metrics
  • 10 detailed monitoring metrics
  • 1 million API requests
  • 10 alarm metrics
  • 3 custom dashboards
  • 5 GB of log data, covering ingestion, archive storage, and data scanned by queries

Query limitations

Although CloudWatch Metrics Insights lets you query metric data with a SQL-like language, its limits create challenges. For example, Amazon documents the following limits:

  • Queries can cover only the most recent three hours of data
  • A single query can process no more than 10,000 metrics
  • A single query can return no more than 500 time series
  • Each Region supports no more than 75 Metrics Insights alarms
  • High-resolution data is not supported
  • Each GetMetricData operation can include only one query
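
A Metrics Insights query is sent as the `Expression` of a `GetMetricData` query. The sketch below builds the boto3 `get_metric_data` arguments for a single example query that ranks EC2 instances by average CPU, and rejects time windows longer than three hours, assuming the limit listed above still applies.

```python
from datetime import datetime, timedelta, timezone

# Window limit as listed in this article; check current AWS quotas before relying on it.
MAX_WINDOW = timedelta(hours=3)


def build_top_cpu_query(hours=1, top_n=10, now=None):
    """Build keyword arguments for boto3's get_metric_data using a single
    Metrics Insights query that ranks EC2 instances by average CPU."""
    window = timedelta(hours=hours)
    # Reject windows longer than the three-hour limit before building the request.
    if window > MAX_WINDOW:
        raise ValueError("Metrics Insights queries are limited to the most recent three hours")
    end = now or datetime.now(timezone.utc)
    expression = (
        'SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId) '
        f"GROUP BY InstanceId ORDER BY AVG() DESC LIMIT {top_n}"
    )
    return {
        # Only one Metrics Insights query per GetMetricData call.
        "MetricDataQueries": [{"Id": "q1", "Expression": expression, "Period": 300}],
        "StartTime": end - window,
        "EndTime": end,
    }


if __name__ == "__main__":
    print(build_top_cpu_query(hours=2)["MetricDataQueries"][0]["Expression"])
```

Checking the window before sending the request makes the limit explicit in code rather than discovering it at query time; the dictionary could then be passed to `boto3.client("cloudwatch").get_metric_data(**kwargs)`.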

Resource intensive

To collect system-level metrics and logs from their servers, organizations need to install the CloudWatch agent. However, the agent can be resource-intensive, consuming significant CPU for several reasons, including:

  • Using wildcards to monitor a large number of files
  • Collecting too many metrics within a given time frame
  • Using the procstat plugin to collect too many metrics across many processes, filters, and patterns
  • Monitoring too many large log files without rotating them
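
One way to address the causes listed above is to keep the agent configuration lean. This is a minimal sketch, not an official tuning recommendation: it generates a CloudWatch agent configuration that names log files explicitly instead of using wildcards, collects only two measurements, and uses a 300-second collection interval. The log path, log group name, and interval are hypothetical choices.

```python
import json


def build_agent_config(log_files, interval_seconds=300):
    """Build a lean CloudWatch agent configuration.

    Choices aimed at limiting agent CPU use (illustrative, not a tuning guide):
    - explicit file paths instead of broad wildcards,
    - a small, fixed set of measurements,
    - a less frequent collection interval.
    `log_files` maps each file path to the log group it should be sent to.
    """
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "metrics_collected": {
                "cpu": {"measurement": ["cpu_usage_idle"], "totalcpu": True},
                "mem": {"measurement": ["mem_used_percent"]},
            }
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": path,
                            "log_group_name": group,
                            "log_stream_name": "{instance_id}",
                        }
                        for path, group in log_files.items()
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical application log path and log group name.
    config = build_agent_config({"/var/log/myapp/app.log": "myapp-app"})
    print(json.dumps(config, indent=2))
```

The printed JSON would be saved to a file and loaded with the agent's control script, for example `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path> -s`.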

Graylog: Reduced Cost and Time for Monitoring AWS CloudWatch Data

Graylog’s solution enables you to gain the full value of your CloudWatch logs by helping you overcome the challenges associated with them. Using the AWS Kinesis/CloudWatch input, you can stream CloudWatch Logs, CloudWatch Flow logs, and Kinesis Raw Logs to Graylog for comprehensive AWS monitoring.
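
When CloudWatch Logs are streamed through Kinesis, a subscription filter writes each batch of log events to the stream as gzip-compressed JSON. The sketch below decodes one such record to show what arrives in the stream; it illustrates the record format with a hypothetical helper name, is not Graylog's implementation, and assumes the record data has already been base64-decoded (as boto3's Kinesis client does).

```python
import gzip
import json


def decode_subscription_record(data: bytes):
    """Decode one CloudWatch Logs subscription record read from Kinesis.

    CloudWatch Logs writes gzip-compressed JSON. DATA_MESSAGE records carry
    log events; CONTROL_MESSAGE records carry no application logs and are
    skipped here.
    """
    payload = json.loads(gzip.decompress(data))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp_ms": event["timestamp"],
            "message": event["message"],
        }
        for event in payload["logEvents"]
    ]
```

Skipping `CONTROL_MESSAGE` records matters because CloudWatch Logs occasionally emits them, mainly to check that the destination is reachable, and they contain no application log events.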

By using Graylog Cloud, you can reliably handle occasional data spikes, allowing you to monitor your AWS environments consistently. Further, by aggregating all log data across your environment in Graylog, you can correlate events for more meaningful insights. For example, by correlating load balancer logs with CloudWatch application logs, you can visualize request distribution across Availability Zones and source IPs for more precise availability monitoring and faster remediation.

To learn how Graylog can help you gain greater insight into your environment, contact us today.
