The Why and What of AWS Lambda Monitoring

Serverless architectures are the rental tux of computing. If you’re using AWS to manage and scale your underlying infrastructure, you’re renting compute time or storage space. Your Lambda functions are the tie or cummerbund you purchase to customize your rental.

Using the AWS event-driven architecture improves business agility, allowing you to move quickly. Lambda is the on-demand compute service that runs the custom code driving an event’s response. Lambda functions run only when an event triggers them, and each invocation can run for at most fifteen minutes. Since a single Lambda function typically performs a single functional task, an application often consists of multiple functions.

Monitoring AWS Lambda and Lambda functions is the backbone of managing costs and ensuring the application continues to function as intended. By understanding why AWS Lambda monitoring matters, developers can start using the data and metrics available.

Why Is AWS Lambda Monitoring Important?

AWS Lambda monitoring enables developers to identify problems with Lambda functions that can hinder serverless applications’ performance. Monitoring AWS Lambda functions enables development and operations teams to identify:

Performance issues: bottlenecks, high latency, or resource constraints impacting response time
Errors: troubleshooting with error messages, exceptions, and anomalies occurring during execution
Resource usage and costs: insights into resource allocation, scaling, and optimization by monitoring memory, CPU, and execution times
Reliability and availability: understanding behavior to improve an application’s design and architecture
Compliance and security: monitoring to detect and remediate vulnerabilities, unauthorized access, or potential data breaches
Alerting and notifications: proactive alerts for addressing issues

Types of Metrics for Lambda Functions

The Lambda metrics that you use to monitor your functions typically fall into three categories:

Concurrency metrics: number of function instances running simultaneously for insights into errors or performance issues
Performance metrics: data about execution times, memory usage, and cold starts for insights into resource allocation and response time improvements
Invocation metrics: successful and failed function invocations to help troubleshoot function code issues and understand the execution environment
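
As a rough illustration of how invocation and concurrency metrics can be combined, the sketch below computes an error rate and the remaining concurrency headroom from numbers that have already been retrieved. The function names and sample values are hypothetical, and the limit of 1000 concurrent executions is an assumption that should be replaced with your account's actual limit.

```python
def error_rate(invocations: float, errors: float) -> float:
    """Return the fraction of invocations that ended in a function error."""
    if invocations <= 0:
        return 0.0
    return errors / invocations


def concurrency_headroom(concurrent_executions: float, limit: float) -> float:
    """Return the fraction of the concurrency limit that is still unused."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return max(0.0, 1.0 - concurrent_executions / limit)


# Hypothetical values for one five-minute period (invented for illustration).
sample = {"Invocations": 200, "Errors": 5, "ConcurrentExecutions": 250}

print(error_rate(sample["Invocations"], sample["Errors"]))
# The limit below is an assumed value, not one read from an account.
print(concurrency_headroom(sample["ConcurrentExecutions"], limit=1000))
```

A rising error rate alongside shrinking headroom may point to throttling rather than a bug in the function code, which is why the two categories are worth reading together.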

Additionally, the following two types of AWS Lambda extensions enable you to capture more data about your functions to augment these metrics:

External extensions: run as independent processes in the execution environment and continue to run after fully processing the function invocation
Internal extensions: run as part of the runtime process with the function accessing them using wrapper scripts or in-process mechanisms

Key Concepts of AWS Lambda Monitoring

When you begin monitoring your AWS Lambda functions, you should understand the types of data used for monitoring workloads and the services that AWS offers directly.

Data Types

Lambda workloads generate high volumes of data that you can use for monitoring and observability. Some key types of information available include:

Logs: timestamped records of activities occurring across the application
Metrics: measurements about how well the functions operate
Alerts: notifications for abnormal metrics
Visualizations: graphical representations of metrics
Distributed tracing: a request’s full path across a system, even one consisting of multiple microservices

Monitoring in AWS

If you want to monitor your Lambda functions through AWS-supplied tools, you have the following options:

CloudWatch: Capture logs for all requests handled by your function, which Lambda sends by default to a log group named “/aws/lambda/<function name>”
CloudWatch Logs Insights: Search and analyze log data with a specialized query syntax for performing queries across multiple log groups
Lambda Insights: Enable this feature on a Lambda function for collecting more metrics, including memory, network, and CPU usage
AWS X-Ray: Track performance problems or errors by activating the tracing tool in the Lambda console and granting permission using the AWSXRayDaemonWriteAccess managed policy
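
To make the Logs Insights option more concrete, here is a minimal sketch that builds a query over a function's REPORT lines and the arguments for the CloudWatch Logs StartQuery API. The helper names and the function name "my-function" are hypothetical, and the example only builds the arguments; actually running the query assumes boto3 is installed and AWS credentials are configured.

```python
from datetime import datetime, timedelta, timezone

# Logs Insights query over Lambda REPORT lines, grouped into 5-minute bins.
REPORT_QUERY = (
    'filter @type = "REPORT" '
    "| stats avg(@duration) as avg_ms, max(@duration) as max_ms, "
    "max(@maxMemoryUsed) as max_memory_used by bin(5m)"
)


def log_group_for(function_name: str) -> str:
    """Return the default CloudWatch log group name for a Lambda function."""
    return f"/aws/lambda/{function_name}"


def build_start_query_args(function_name: str, end: datetime, hours: int = 1) -> dict:
    """Build keyword arguments for a StartQuery call covering the last `hours`."""
    start = end - timedelta(hours=hours)
    return {
        "logGroupName": log_group_for(function_name),
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(end.timestamp()),
        "queryString": REPORT_QUERY,
    }


# "my-function" is a placeholder name used only for illustration.
args = build_start_query_args("my-function", datetime.now(timezone.utc))
print(args["logGroupName"])

# With credentials configured, the arguments could be passed on like this:
#   import boto3
#   query_id = boto3.client("logs").start_query(**args)["queryId"]
```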

The Anatomy of a Lambda Log

At their core, many monitoring workflows rely on the Lambda logs generated by the functions and applications, so you should understand the types of information available in them.

AWS offers several ways for configuring your Lambda logs, including:

Format: plain text or structured JSON
Level: level of detail captured when using the JSON format, like Error, Debug, or Info
Group: CloudWatch log group receiving the log
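
The three settings above can be sketched as a small configuration builder. The key names follow the `LoggingConfig` structure of the Lambda UpdateFunctionConfiguration API as the editor understands it; the helper name and the log group value are hypothetical, and nothing here calls AWS.

```python
VALID_FORMATS = {"JSON", "Text"}
VALID_APP_LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def build_logging_config(log_format: str, level: str, log_group: str) -> dict:
    """Build a LoggingConfig-style dict from format, level, and group."""
    if log_format not in VALID_FORMATS:
        raise ValueError(f"unsupported log format: {log_format}")
    config = {"LogFormat": log_format, "LogGroup": log_group}
    # Log-level filtering applies only to JSON-formatted logs,
    # so the level is omitted for plain text.
    if log_format == "JSON":
        if level not in VALID_APP_LEVELS:
            raise ValueError(f"unsupported log level: {level}")
        config["ApplicationLogLevel"] = level
    return config


# The log group name below is a placeholder for illustration.
print(build_logging_config("JSON", "INFO", "/aws/lambda/my-function"))
```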

If you configure your Lambda logs using the JSON option, each log will contain at least the following elements:

“timestamp”: when the message was generated
“level”: log level assigned to the message
“message”: log message contents
“requestId”: unique request ID for function invocation
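
A minimal sketch of reading one such JSON log record and checking for the four elements listed above follows. The sample record is invented for illustration, and the helper name is hypothetical.

```python
import json

REQUIRED_KEYS = ("timestamp", "level", "message", "requestId")


def parse_log_record(line: str) -> dict:
    """Parse one JSON log line and verify the expected keys are present."""
    record = json.loads(line)
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"log record is missing keys: {missing}")
    return record


# Hypothetical record; every value here is made up for the example.
sample_line = json.dumps({
    "timestamp": "2024-01-01T12:00:00.000Z",
    "level": "INFO",
    "message": "order processed",
    "requestId": "00000000-0000-0000-0000-000000000000",
})

record = parse_log_record(sample_line)
print(record["level"], record["message"])
```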

Depending on the programming language you use for writing your Lambda function and the format you choose, the log may contain any of the following additional data fields:

Duration: Time the function’s handler method spent processing the event
Billed Duration: Time billed for the invocation
Memory Size: Amount of memory allocated to the function
Max Memory Used: Maximum amount of memory used across all invocations of the function that share an execution environment
Init Duration: Time it took the runtime to load the function and run code outside of the handler method, reported for the first request served
XRAY TraceId: Unique ID applied to traced requests
SegmentId: X-Ray segment ID for traced requests
Sampled: Sampling result for traced requests
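
For plain-text logs, several of these fields appear together on a single REPORT line. The sketch below extracts the numeric ones, assuming the fields are tab-separated and written as "Name: value unit"; the sample line and its request ID are invented for illustration, and the helper name is hypothetical.

```python
import re

# Matches fields such as "Duration: 15.74 ms" or "Memory Size: 128 MB".
FIELD_PATTERN = re.compile(r"([A-Za-z ]+): ([\d.]+) (ms|MB)")


def parse_report_line(line: str) -> dict:
    """Return {field name: (value, unit)} for the numeric REPORT fields."""
    if not line.startswith("REPORT"):
        raise ValueError("not a REPORT line")
    fields = {}
    for part in line.split("\t"):
        match = FIELD_PATTERN.fullmatch(part.strip())
        if match:  # non-numeric parts such as the request ID are skipped
            name, value, unit = match.groups()
            fields[name] = (float(value), unit)
    return fields


# Hypothetical REPORT line; the layout is assumed and the values are made up.
sample_report = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 15.74 ms\tBilled Duration: 16 ms\tMemory Size: 128 MB\t"
    "Max Memory Used: 56 MB\tInit Duration: 130.49 ms\t"
)

report = parse_report_line(sample_report)
used, _ = report["Max Memory Used"]
allocated, _ = report["Memory Size"]
print(f"memory utilization: {used / allocated:.0%}")
```

Comparing Max Memory Used with Memory Size in this way is one simple signal for whether a function is over- or under-provisioned.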

Graylog: Security and Operations Monitoring for Serverless Architectures

Using Graylog’s CloudWatch inputs, you can integrate your AWS Lambda monitoring directly into your overarching security and operations monitoring. Graylog’s purpose-built solution provides lightning-fast search capabilities and flexible integrations that allow your team to collaborate more efficiently.

Graylog Operations provides a cost-efficient solution for IT ops so that organizations can implement robust infrastructure monitoring while staying within budget. With our solution, IT ops can analyze historical data regularly to identify potential slowdowns or system failures while creating alerts that help anticipate issues.

Since you can easily share Dashboards and searches with Graylog’s cloud platform, you have the ability to capture, manage, and share knowledge consistently across DevOps, operations, and security.

With Graylog’s security analytics and anomaly detection capabilities, you get the cybersecurity platform you need without the complexity that makes your team’s job harder. With our powerful, lightning-fast features and intuitive user interface, you can lower your labor costs while reducing alert fatigue and getting the answers you need – quickly.

Our prebuilt search templates, dashboards, alerts, and dynamic look-up tables enable you to get immediate value from your logs while empowering your security team.

Jeff Darrington

Jeff Darrington is Graylog's Director, Technical Marketing. He is a long-time Graylog OS user with extensive experience in IT Operations, IT product solutions deployment in Firewalls, Networking, VOIP, Physical security Controls, and many others.