Developing an application is like composing a song. You know your intended outcome, and the creation is what gives you the jolt of adrenaline to keep going. However, your job isn’t over once you push the application live. You need to monitor and maintain it to ensure performance and cost optimization.
AWS Lambda forwards metrics to CloudWatch once a function finishes processing an event. Through the CloudWatch console, you can set alarms and build visualizations with these metrics. By understanding the key AWS Lambda metrics available, you can more effectively monitor and maintain your applications to ensure continued cost optimization and a positive end-user experience.
What are the different types of Lambda metrics?
Since AWS manages the memory, CPU, network, and other resources that run your code, managing costs directly relates to your application’s performance and efficiency. To help you manage your Lambda functions, AWS provides basic serverless monitoring through the CloudWatch platform.
The four primary metrics that you can monitor are:
- Invocation metrics: the number of times a function or service is called for assessing usage patterns and understanding demands on the service
- Performance metrics: how well a function operates under different conditions for insights about service efficiency and reliability
- Concurrency metrics: number of simultaneous executions for understanding system capability and ability to effectively scale during peak load
- Asynchronous invocation metrics: how services process requests that happen at different times for tracking successful executions, failed executions, and processing times
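All of these metrics are published to the AWS/Lambda namespace in CloudWatch, so you can pull them programmatically as well as through the console. As a minimal sketch (assuming boto3 is available and using a hypothetical function name), the parameters below could be passed to CloudWatch’s GetMetricStatistics API:

```python
from datetime import datetime, timedelta, timezone

def lambda_metric_request(function_name, metric_name, hours=1):
    """Build GetMetricStatistics parameters for one Lambda metric.

    Pass the resulting dict to a CloudWatch client, e.g.
    boto3.client("cloudwatch").get_metric_statistics(**params).
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",      # all Lambda metrics live in this namespace
        "MetricName": metric_name,      # e.g. "Invocations", "Errors", "Throttles"
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,                  # aggregate into 5-minute buckets
        "Statistics": ["Sum"],
    }

# "my-function" is a placeholder for one of your own Lambda functions
params = lambda_metric_request("my-function", "Invocations")
```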
3 Important Function Invocation Metrics
Each function invocation represents a request to execute your code. These metrics provide insight into service reliability and operational efficiency by helping you understand resource allocation and load-balancing needs.
Invocations
Invocations is the total number of times that a function executes or “is called.” Although this metric doesn’t distinguish between successful and failed invocations, it provides insights into:
- Function performance
- Usage periods
- Traffic trends
- Resource allocation
- Cost optimization
For example, if the invocation count drops, you might have underlying system architecture or dependency issues.
Errors
Errors tracks the number of invocations that result in a function error; dividing this count by the total invocation count gives you the error rate. This metric can help identify issues like:
- Timeouts
- Configuration errors
- Coding exceptions
For example, a sudden error rate spike could indicate a Lambda function issue or a misconfigured AWS service.
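To track the error rate rather than the raw count, you can combine the Errors and Invocations metrics with CloudWatch metric math. A minimal sketch of the queries you might pass to the GetMetricData API (e.g. via boto3’s get_metric_data); the function name is hypothetical:

```python
def error_rate_queries(function_name):
    """GetMetricData queries computing error rate = Errors / Invocations * 100."""
    def metric(metric_name, query_id):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,  # only return the derived rate, not the raw series
        }
    return [
        metric("Errors", "errors"),
        metric("Invocations", "invocations"),
        {
            "Id": "error_rate",
            "Expression": "(errors / invocations) * 100",  # metric math expression
            "Label": "Error rate (%)",
            "ReturnData": True,
        },
    ]

queries = error_rate_queries("my-function")  # placeholder function name
```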
Dead-letter errors
Dead-letter errors are events that Lambda failed to send to a dead-letter queue (DLQ), the temporary storage location for messages that a system cannot process. Sending failed events to the DLQ enables you to keep and analyze failed event data to identify underlying issues. Dead-letter errors can help you identify problems that might not be easily traced.
These errors may raise concerns about potential data loss and can provide insight into issues like:
- Improper permissions
- Misconfigured resources
- Message size limits
8 Important Performance Metrics
Performance metrics help you evaluate your Lambda function’s event-processing effectiveness. With the insights these metrics generate, you can optimize your Lambda functions to enhance overall application performance.
Duration
Duration measures the time (in milliseconds) from the invocation of an AWS Lambda function until it completes execution. This metric supports percentile statistics so you can filter out outliers to better understand performance.
Monitoring this metric can help you:
- Identify code inefficiencies
- Detect latency issues with external dependencies
- Evaluate the effectiveness of memory allocation
Billed Duration
Billed duration rounds the Duration metric up to the nearest millisecond and directly impacts costs. Regularly monitoring billed duration enables you to understand how optimizing performance can improve cost management.
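To see why billed duration matters for cost, note that Lambda compute charges scale with GB-seconds: allocated memory multiplied by billed seconds. A rough sketch, using an illustrative (not quoted) per-GB-second rate:

```python
def invocation_cost(billed_ms, memory_mb, rate_per_gb_second=0.0000166667):
    """Estimate the compute cost of a single invocation.

    rate_per_gb_second is illustrative only; check current AWS pricing
    for your region and architecture.
    """
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)  # GB allocated x seconds billed
    return gb_seconds * rate_per_gb_second

# A 512 MB function billed for 120 ms consumes 0.06 GB-seconds
cost = invocation_cost(120, 512)
```

Halving either the memory allocation or the billed duration halves the estimated cost, which is why these two knobs dominate Lambda cost tuning.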
Init duration
Captured separately from the overall duration metric, init duration uses milliseconds to measure how long a function takes to initialize during a cold start, indicating how long the runtime environment takes to prepare for execution.
Monitoring init duration helps:
- Optimize function code or configurations to improve startup speed
- Identify potential bottlenecks related to function initialization
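Beyond the CloudWatch metric, each cold start’s init duration also appears in the REPORT summary line that Lambda writes to CloudWatch Logs. A small sketch that extracts it (the request ID and timing values below are hypothetical sample data):

```python
import re

def init_duration_ms(report_line):
    """Return the Init Duration from a Lambda REPORT log line.

    Returns None for warm starts, where no Init Duration is reported.
    """
    match = re.search(r"Init Duration: ([\d.]+) ms", report_line)
    return float(match.group(1)) if match else None

# Hypothetical cold-start REPORT line
line = ("REPORT RequestId: 3f8a0000-0000-0000-0000-000000000000 "
        "Duration: 102.25 ms Billed Duration: 103 ms "
        "Memory Size: 128 MB Max Memory Used: 52 MB "
        "Init Duration: 213.55 ms")
print(init_duration_ms(line))  # 213.55
```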
Memory size and max memory used
Typically, you want to use these two metrics together to understand how your function uses – or wastes – memory. Memory size uses megabytes to indicate the total memory allocated to a function. Max memory used refers to the peak memory consumption during a function’s invocation, also expressed in megabytes.
Monitoring these metrics can provide insights into:
- Excessive memory, resulting in wasted resources and increased costs
- Insufficient memory allocation, resulting in prolonged execution times
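The comparison between the two metrics can be collapsed into a single utilization percentage; a minimal sketch (the interpretation thresholds you choose are up to you, not AWS guidance):

```python
def memory_utilization(max_memory_used_mb, memory_size_mb):
    """Percentage of allocated memory the function actually used at its peak."""
    return 100.0 * max_memory_used_mb / memory_size_mb

# A 128 MB function peaking at 52 MB uses ~40% of its allocation:
# likely room to downsize and save cost.
util = memory_utilization(52, 128)
```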
Post runtime extensions duration
This metric uses milliseconds to track the time that Lambda extensions spend after the function handler finishes executing. It provides insight into the additional time extensions introduce when performing tasks like:
- Sending logs, metrics, or traces to external services
- Cleaning up resources
- Interacting with other AWS resources
Monitoring this metric enables you to optimize performance by:
- Identifying where extensions add too much time
- Fine-tuning serverless applications to enhance functionalities
Iterator age
Iterator age uses milliseconds to measure the time between when records arrive and when a function processes them, which is especially important for streaming sources like Kinesis or DynamoDB. A high iterator age indicates that incoming data volumes surpass the function’s processing capability, creating a backlog of unprocessed records.
Monitoring this metric can help identify issues like:
- Prolonged execution duration of the function
- Insufficient stream shards
- Invocation errors
Latency (P50, P90, P99)
Latency is monitored using percentile distributions because they represent the spread of response times more accurately than averages, capturing the end-user experience more effectively.
The three most common percentile metrics are:
- P50: 50th percentile, a baseline for typical performance
- P90: 90th percentile, often triggering performance issue alerts
- P99: 99th percentile, insights into extreme outliers
You can use these metrics to identify issues that might not otherwise be apparent, enabling more effective troubleshooting and optimization.
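CloudWatch computes these percentiles for you (for example, via extended statistics on the Duration metric), but the underlying idea is easy to sketch with Python’s standard library. The latency samples below are hypothetical:

```python
import statistics

def latency_percentiles(samples_ms):
    """Compute p50/p90/p99 from raw latency samples.

    statistics.quantiles with n=100 returns the 99 percentile cut points
    using the default "exclusive" method.
    """
    qs = statistics.quantiles(samples_ms, n=100)
    return {"p50": qs[49], "p90": qs[89], "p99": qs[98]}

# Hypothetical latencies of 1..100 ms, one sample each
samples = list(range(1, 101))
p = latency_percentiles(samples)
```

Note how p99 sits near the very top of the distribution: a handful of slow outliers can move it dramatically while leaving p50 untouched, which is exactly why it is used to surface extreme cases.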
Offset Lag
Offset Lag is a metric specific to Amazon MSK and self-managed Apache Kafka when they are event sources for Lambda functions. By measuring the total number of messages waiting in the queue to be sent to a target Lambda function, it provides visibility into how well polling keeps up.
Monitoring this metric can help identify issues like:
- Undesirable congestion
- Inefficiencies in data streams
8 Important Concurrency Metrics
Concurrency metrics provide insight into how many simultaneous requests a function is handling. Lambda provisions a separate execution environment instance for each concurrent request, increasing the number of execution environments until you reach your account’s concurrency limit.
Concurrent executions
This metric is the total number of function instances actively processing events. By showing how close you are to your regional or reserved concurrency limit, it can help prevent throttling that can undermine other functions’ performance.
Properly managing concurrent executions enables efficient resource allocation and improves applications’ responsiveness.
Unreserved concurrent executions
Unreserved concurrent executions tracks the total concurrency that remains unallocated, providing insight into resource distribution during peak workloads. Consistently depleting unreserved concurrency may indicate function or workload inefficiencies.
For example, if specific functions regularly exhaust the available unreserved concurrency during traffic spikes, you may need to distribute the workload more evenly across multiple functions to enhance performance.
Claimed Account Concurrency
This metric provides visibility into the total concurrent executions across all functions in your account. If traffic surpasses the established concurrency limit, some requests may be throttled, undermining reliability and efficiency.
Throttles
The Throttles metric tracks the total number of invocation requests that are rejected due to a lack of available function instances or an exceeded concurrent execution limit. Throttled invocation requests are not reflected in the standard Invocations or Errors metrics.
By monitoring this metric, you can take actions that improve application reliability and speed, like:
- Reserving concurrency
- Optimizing execution time
- Increasing the concurrent execution limit
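Because throttled requests don’t show up in the Invocations or Errors metrics, an explicit alarm is a common safeguard. As a sketch, the parameters below could be passed to CloudWatch’s PutMetricAlarm API (e.g. via boto3’s put_metric_alarm); the function name, alarm name, and threshold are illustrative:

```python
def throttle_alarm_params(function_name):
    """Build PutMetricAlarm parameters that fire when any invocation is throttled.

    Pass to boto3.client("cloudwatch").put_metric_alarm(**params).
    """
    return {
        "AlarmName": f"{function_name}-throttles",  # illustrative naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0,                              # alert on any throttle at all
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",          # no data points means no throttles
    }

params = throttle_alarm_params("my-function")  # placeholder function name
```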
Provisioned concurrent executions
If you have provisioned concurrency enabled, this metric tells you the number of function instances actively processing events for a specific version or alias. Monitoring this metric can provide insights into underlying issues with the Lambda function or its dependencies on other services.
Provisioned concurrency utilization
Provisioned concurrency utilization assesses how effectively the function uses its allocated provisioned concurrency. Monitoring this metric supports cost management with insights about:
- Low utilization that suggests reducing or disabling provisioned concurrency for cost savings
- Need for more provisioned concurrency if the function consistently reaches thresholds
- Issues with the function or upstream services
Provisioned concurrency invocations
Provisioned concurrency invocations measures the total executions of a Lambda function that runs on provisioned concurrency. This metric is distinct from standard invocation metrics in that it exclusively counts invocations operating on provisioned concurrency. Monitoring this metric can provide insights into:
- Peak demand periods
- Overall function performance to identify areas of optimization
- Potential cost savings from reducing or disabling provisioned concurrency
Provisioned concurrency spillover invocations
This metric tells you when a Lambda function exceeds its provisioned number of concurrent invocations. When a function exceeds this threshold, it runs on non-provisioned concurrency, increasing the likelihood of cold starts that impact performance and response times.
Monitoring this spillover metric can help identify:
- Ways to change configurations to better align with traffic demands
- Underlying issues with the function or an upstream service
- Opportunities for improving responsiveness and reliability
4 Important Asynchronous Invocation Metrics
In AWS Lambda, asynchronous invocation means that the invoking application can proceed without waiting for the function to finish executing, improving application performance. Some examples of asynchronous services include:
- Amazon Simple Email Service (SES)
- Amazon Simple Notification Service (SNS)
- Amazon S3
Asynchronous events received
This metric is the number of events that Lambda successfully queues for processing, giving you insight into the events that the function receives. If this metric and the invocations metric don’t match, you might want to look for issues like dropped events or potential queue backlogs.
Destination Delivery Failures
Delivery errors may occur during asynchronous invocations if Lambda cannot send events to their designated destinations, like the DLQ. These errors can occur for reasons like:
- Permission errors
- Misconfigured resources
- Size limitations
- Unsupported destination types
Asynchronous Event Age
This metric tracks the time an event spends waiting in the queue, providing insight into issues like:
- Incorrect triggers
- Function misconfigurations
- Throttling
Setting alarms for thresholds can help investigate when queue backlog occurs, especially when comparing this metric with:
- Errors: to identify function errors
- Throttles: to identify concurrency issues
Asynchronous Events Dropped
This metric tracks the total number of events that a Lambda function fails to process and ultimately discards. Some reasons a function might drop events include:
- Exceeding the maximum age
- Reaching the attempt retry limit
- Hitting a concurrency limit
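You can control two of these drop conditions directly through the function’s event invoke configuration. A sketch of parameters for Lambda’s PutFunctionEventInvokeConfig API (e.g. via boto3’s put_function_event_invoke_config); the function name and values shown are illustrative:

```python
def async_invoke_config(function_name, max_age_seconds=3600, max_retries=1):
    """Build PutFunctionEventInvokeConfig parameters for async event handling.

    Pass to boto3.client("lambda").put_function_event_invoke_config(**config).
    Lambda accepts an event age of 60-21600 seconds and 0-2 retry attempts.
    """
    return {
        "FunctionName": function_name,
        "MaximumEventAgeInSeconds": max_age_seconds,  # drop events older than this
        "MaximumRetryAttempts": max_retries,          # retries before discarding
    }

config = async_invoke_config("my-function")  # placeholder function name
```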
Graylog: Security and Operations Monitoring for Insights into AWS Environments
Using Graylog’s CloudWatch inputs, you can integrate your AWS Lambda monitoring directly into your overarching security and operations monitoring. Graylog’s purpose-built solution provides lightning-fast search capabilities and flexible integrations that allow your team to collaborate more efficiently.
Graylog Operations provides a cost-efficient solution for IT ops so that organizations can implement robust infrastructure monitoring while staying within budget. With our solution, IT ops can analyze historical data regularly to identify potential slowdowns or system failures while creating alerts that help anticipate issues.
Since you can easily share Dashboards and searches with Graylog’s cloud platform, you have the ability to capture, manage, and share knowledge consistently across DevOps, operations, and security.
With Graylog’s security analytics and anomaly detection capabilities, you get the cybersecurity platform you need without the complexity that makes your team’s job harder. With our powerful, lightning-fast features and intuitive user interface, you can lower your labor costs while reducing alert fatigue and getting the answers you need – quickly.
Our prebuilt search templates, dashboards, correlated alerts, and dynamic look-up tables enable you to get immediate value from your logs while empowering your security team.