Introduction
Amazon SQS (Simple Queue Service) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. AWS Lambda can process messages from an SQS queue, making it a powerful combination for event-driven applications.
Scenario
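Imagine an AWS architecture in which three Lambda functions are chained through SQS queues: each function consumes messages from one queue and sends messages to the next.
Now suppose a Lambda function fails to mark an SQS message as successfully processed. In the setup described here, the message is retried up to 4 times before it is dropped (the actual count depends on your queue configuration). If the handler forwards a message on every attempt, this retry behavior leads to exponential execution of your handlers: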
- First Lambda: Retries 4 times, potentially sending 4 messages to the next queue.
- Second Lambda: Each of the 4 messages triggers 4 retries, resulting in 16 executions.
- Third Lambda: Each of the 16 messages triggers 4 retries, resulting in 64 executions.
This exponential growth can significantly inflate your AWS costs by the end of the month.
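The arithmetic behind these numbers can be sketched in a few lines. This is only an illustration of the worst case, assuming every attempt fails and every attempt still forwards one message downstream; the class and method names are hypothetical.

```java
public class RetryFanOut {
    /**
     * Worst-case handler executions at a given stage (1-based), assuming every
     * attempt fails and every attempt forwards one message to the next queue.
     */
    public static long executionsAtStage(int attemptsPerMessage, int stage) {
        long executions = 1;
        for (int i = 0; i < stage; i++) {
            executions *= attemptsPerMessage;
        }
        return executions;
    }

    public static void main(String[] args) {
        int attempts = 4; // the figure used in the scenario above
        long total = 0;
        for (int stage = 1; stage <= 3; stage++) {
            long n = executionsAtStage(attempts, stage);
            total += n;
            System.out.println("Stage " + stage + ": " + n + " executions");
        }
        // 4 + 16 + 64 = 84 executions caused by a single original message
        System.out.println("Total: " + total + " executions");
    }
}
```

Under these assumptions a single poisoned message costs 84 handler executions across three stages instead of 3.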
Cause
The root cause of this issue is that AWS Lambda handlers processing SQS events with partial batch responses enabled (ReportBatchItemFailures on the event source mapping) are expected to return an SQSBatchResponse object. If your handler returns a different type, you risk losing control over which messages AWS recognizes as successfully processed.
Here’s an example of the expected handler signature:
public class SQSLambdaHandler implements RequestHandler<SQSEvent, SQSBatchResponse> {
    @Override
    public SQSBatchResponse handleRequest(SQSEvent event, Context context) {
        ...
The SQSBatchResponse should contain the IDs of the messages that failed to process. If your handler returns an SQSBatchResponse with an empty list of failures, AWS treats the whole batch as successfully processed, and none of its messages are retried.
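To make the contract concrete, here is a minimal sketch of the JSON that Lambda ends up seeing for a partial batch response. The field names batchItemFailures and itemIdentifier are the ones Lambda expects; the class and method below are hypothetical and hand-rolled purely for illustration, since in a real handler the Java runtime serializes the SQSBatchResponse for you.

```java
import java.util.List;
import java.util.stream.Collectors;

public class BatchResponseShape {
    /**
     * Renders the partial batch response JSON for the given failed message IDs.
     * No escaping is done, which is acceptable here because SQS message IDs are UUIDs.
     */
    public static String toJson(List<String> failedMessageIds) {
        String items = failedMessageIds.stream()
                .map(id -> "{\"itemIdentifier\":\"" + id + "\"}")
                .collect(Collectors.joining(","));
        return "{\"batchItemFailures\":[" + items + "]}";
    }

    public static void main(String[] args) {
        // Empty list: the whole batch counts as processed.
        System.out.println(toJson(List.of()));
        // One entry: only that message is made available for retry.
        System.out.println(toJson(List.of("id-2")));
    }
}
```

The first call prints {"batchItemFailures":[]} and the second prints {"batchItemFailures":[{"itemIdentifier":"id-2"}]}.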
Code example
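import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSBatchResponse;
import com.amazonaws.services.lambda.runtime.events.SQSBatchResponse.BatchItemFailure;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import com.amazonaws.services.lambda.runtime.events.SQSEvent.SQSMessage;
import java.util.ArrayList;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SQSLambdaHandler implements RequestHandler<SQSEvent, SQSBatchResponse> {
    // Assumption: an SLF4J logger; the original snippet does not show how logger is declared.
    private static final Logger logger = LoggerFactory.getLogger(SQSLambdaHandler.class);

    @Override
    public SQSBatchResponse handleRequest(SQSEvent event, Context context) {
        List<BatchItemFailure> batchItemFailures = new ArrayList<>();
        for (SQSMessage msg : event.getRecords()) {
            try {
                logger.info("Message received: " + msg.getBody());
                // Process the message here
            } catch (Exception e) {
                logger.error("Error processing message: " + msg.getMessageId() + " - " + e.getMessage());
                batchItemFailures.add(new BatchItemFailure(msg.getMessageId()));
            }
        }
        // An empty list tells Lambda that every message in the batch succeeded.
        return new SQSBatchResponse(batchItemFailures);
    }
}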
When an exception is caught, we give the message another chance to be processed correctly by adding its ID to the batchItemFailures list:
batchItemFailures.add(new BatchItemFailure(msg.getMessageId()));
This tells Lambda to retry the message with the specified ID, msg.getMessageId(). The number of retry attempts depends on the configuration you have set in your CDK stack.
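As a rough sketch of what that configuration might look like, the following assumes AWS CDK v2 for Java and an already defined Lambda function. The class name, construct IDs, and all numeric values are hypothetical; adjust them to your own stack.

```java
import software.amazon.awscdk.Duration;
import software.amazon.awscdk.services.lambda.Function;
import software.amazon.awscdk.services.lambda.eventsources.SqsEventSource;
import software.amazon.awscdk.services.sqs.DeadLetterQueue;
import software.amazon.awscdk.services.sqs.Queue;
import software.constructs.Construct;

public class QueueWiring {
    /** Attaches a queue with a bounded retry count to an existing function. */
    public static Queue wire(Construct scope, Function handler) {
        // Hypothetical construct IDs; rename to match your stack.
        Queue deadLetters = Queue.Builder.create(scope, "ExampleDlq").build();

        Queue queue = Queue.Builder.create(scope, "ExampleQueue")
                .visibilityTimeout(Duration.seconds(60)) // assumed value
                .deadLetterQueue(DeadLetterQueue.builder()
                        .queue(deadLetters)
                        .maxReceiveCount(4) // receives allowed before the message moves to the DLQ
                        .build())
                .build();

        handler.addEventSource(SqsEventSource.Builder.create(queue)
                .batchSize(10) // assumed value
                .reportBatchItemFailures(true) // lets Lambda honor the returned SQSBatchResponse
                .build());
        return queue;
    }
}
```

With a dead-letter queue in place, a message that keeps failing is moved aside after the configured number of receives rather than being retried until the queue's retention period runs out.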