28

This is more of a concern than a question, but still, has anyone experienced this before? Does anyone know how to prevent it?

I have a lambda function (L1) which calls a second lambda function (L2) all written in NodeJs (runtime: Node.Js 8.10, and aws-sdk should be v2.488.0 - but I'm just pasting that from the documentation). The short story is that L1 is supposed to call L2, and when it does L2 is executed twice! I discovered this by writing logs to CloudWatch and I could see one L1 log and two L2 logs.

Here's a simplified version of L1 and L2.

L1:

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

module.exports = {
    handler: async (event, context, callback) => {
        const payload: { rnd: Math.random() };
        const lambdaParams = {
            FunctionName: 'L2',
            Qualifier: `dev`,
            Payload: JSON.stringify(payload),
        };

        console.log(`L1 calling: ${JSON.stringify(payload)}`);
        return await lambda.invoke(lambdaParams).promise();
    },
};

L2:

module.exports = {
    handler: async (event, context, callback) => {
        console.log(`L2 called: ${JSON.stringify(event)}`);
    },
};

In CloudWatch I can see one L1 calling {"rnd": 0.012072353149807702} and two L2 called: {"rnd": 0.012072353149807702}!

BTW, this does not happen all the time. This is part of a step function process which was going to call L1 10k times. My code is written in a way that if L2 is executed twice (per one call), it will fail the whole process (because L2 inserts a record to DB only if it does not exist and fails if it does). So far, I managed to log this behaviour three times. All of them processing the same 10k items, facing the issue at a different iteration each time.

Does anyone have the same experience? Or even better, knows how to make sure one call leads to exactly one execution?

8 Answers 8

18

Your lambda function must be idempotent, because it can be called twice in different situations.

https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-idempotent/

https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/

Sign up to request clarification or add additional context in comments.

3 Comments

I don't think this explains what I'm seeing. I get that my Lambda might be called twice, but I get the impression that this is the exception rather than the norm. My lambda seems to ALWAYS be triggered twice. I've made it idempotent, so it all works fine, but I am still bothered by the fact that the function runs twice every time. BTW, the first invocation is neither timing out or exiting due to an error. It completes successfully with plenty of time to spare, and yet a second invocation always happens. If anyone happens to find another reason for two invocations, let me know please.
All my lambda functions gets triggered twice even though the client requests only once. I cannot find nowhere how to avoid this issue. @CryptoFool did you find a solution for this?
This could be useful for someone looking at this, I found a problem on my side, there was a bad implementation of a middleware, in my case calling next() when not need to and this was causing "double" requests for all my handlers, maybe this could help you @CryptoFool or anyone else out there.
7

One case that may cause this is that in your L2 lambda you didn't return anything, which will lead the L1 lambda (the caller) to think there is an error with L2 and so the Retry mechanism is triggered. Try to return something in L2, even simply an "OK".

Comments

4

I just run into the same problem. Calling from browser, lambda runs twice. I realized the second one was from the browser, getting /favicon.ico. I used Chrome browser. I hope it helps.

3 Comments

You saved me a lot of time there, really could not figure out what was happening. Finally tested with Postman
Yes! Thank you. This is great.
Do you know how you fix that?
3

With 10K lambda invokes it must be experiencing a failure and doing a retry.

From the documentation:

Asynchronous Invocation – Lambda retries function errors twice. If the function doesn't have enough capacity to handle all incoming requests, events may wait in the queue for hours or days to be sent to the function. You can configure a dead-letter queue on the function to capture events that were not successfully processed. For more information, see Asynchronous Invocation.

If this is what is a happening and you setup the dead letter queue you'll be able to isolate the failure event.

You can also use CloudWatch Logs Insights to easily and quickly search for errors messages of the lambda. Once you select the log group this query should help you get started. Just change the time window.

fields @timestamp, @message
| filter @message like /(?i)(Exception|error|fail|5\d\d)/
| sort @timestamp desc
| limit 20

5 Comments

Thanks for the input. It is insightful and I have to spend more time on it. But so far, I would say that in my case, the problem is that L2 is executed twice simultaneously and successfully (the second execution fails only because my code throws an exception intentionally). I (now) understand that using the InvocationType I might be able to do something about this. Anyways, thanks. I have to do more testing before I can confirm or reject your solution.
You can search for lambda START and END log records using the requestId to see the timing of them being called. Once you find log message it will have the requestId and you can update the filter in the query pattern like this "| filter @requestId == "abd16f4c-d532-4fba-b648-a8d0e1b16e7c"
Another idea, the Lambda is executing everything but getting a timeout before returning? It could the database call is taking longer than expected to return or some other network call. If this is happening you will see it in the log records for the first event and you can increase you timeout. Updating the filter int he query above with "| filter @message like /timed out/" should confirm if that is happening.
After reading the AWS documentation, I'm calling the L2 synchronously (I'm not specifying the InvocationType). And if I'm not mistaken, in this mode, there's no auto-retry. So the question is, why would aws-sdk call L2 twice?
The SDK does retry on some invocation errors such as timeouts and throttling. The only thing I can come up with is that an error was returned on invoke but the lambda actually was invoked. See the last paragraph. docs.aws.amazon.com/lambda/latest/dg/invocation-sync.html You can change the retry behavior in the Config and the Request ifhttps://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Request.html
2

Lambda can be triggered twice if your layer with dependencies is too large. If in Cloudwatch logs you see INIT_REPORT Init Duration: 10000+ ms Phase: init Status: timeout, then you probably need to reduce the size of your zip file. Or implement idempotence.

When Lambda starts execution in the runtime environment, it has 10 seconds for the INIT phase, where it loads dependencies. If it can't finish in 10 sec, then Lambda retries the Init phase, which means it triggers the function again. See the official explanation of what happens during the Init phase.

From my experiments, zip files smaller than 20Mb have Init Duration (see it in the Cloudwatch logs) less than 10 sec and don't cause Lambda to trigger twice.

Comments

1

Good morning,

I was also facing this issue. And in the case of algorithmic trading, this issue really costs a lot of money as the lambda function was sending two orders of the same script to the exchange resulting in higher broker charges.

A little background to my lambda function. It gets triggered from EventBridge at 03:43 (9:13 IST) and then sleeps till 3:45 (9:15 IST) to execute the order precisely at the opening bell. I kept this 2 min gap in trigger and execution as loading modules takes different times in each instance. Sleep code block in lambda function prior to resolution: -

# Generating sleep to execute the order exactly at 9:15:00
current_time = datetime.now()
current_time = current_time.replace(year=1900, month= 1, day=1)
print("Current time for gap calculation: ", current_time)

target_time = datetime(1900, 1, 1, 3, 44, 59, 996000)
print("Target time for gap calculation: ", target_time)

gap = (target_time - current_time ).total_seconds()
print(f'Gap: {gap}')

if gap > 0 :
    sleep(gap)

Prior to this code block, I am loading modules and after this code block, I am placing an order to exchange.

The first time when function ran it showed:- 2023-07-06T03:43:35.417+00:00 Gap: 84.578606

As per the problem the function ran automatically for 2nd time and Gap is negative as we have already passed the target time of 3:45(9:15 IST) :- 2023-07-06T03:45:09.884+00:00 Gap: -9.88857

So to stop the function to execute further in the second attempt I have inserted an else statement in the sleep code block.

else:
    return 'Second Attempt'

Now my total sleep code block looks like this:-

# Generating sleep to execute the order exactly at 3:45:00 (9:15:00 IST)

current_time = datetime.now()
current_time = current_time.replace(year=1900, month= 1, day=1)
print("Current time for gap calculation: ", current_time)

target_time = datetime(1900, 1, 1, 3, 44, 59, 996000)
print("Target time for gap calculation: ", target_time)

gap = (target_time - current_time ).total_seconds()
print(f'Gap: {gap}')

if gap > 0 :
    sleep(gap)

else:
    return 'Second Attempt'

This way when my lambda function runs 2nd time automatically it finds the gap value to be negative and exits the lambda function with the statement 'Second Attempt'. And it never reaches the real part of sending orders to the exchange after the sleep code block.

I am learning Python, Lambda, and trading. Please forgive me if there is any mistake or shortcoming in my presentation. This has helped to solve this problem in which the lambda function gets executed twice.

Comments

1

I was also facing this issue. And in the case of CODE deployment to production environment. We were checking multiple stages like jira ticket is on "ready to deploy" state or not, On GitHub tickets are mergable or not.

Solutions: For every request call, We started storing ID to s3 bucket and validate before processing request. Below is detailed code example.

Input Event Parameters:

{
    "id": 1234,
    "data": {
        "name": "sonoo",
        "salary": 56000,
        "married": true
    }
}

app.py

import boto3

def lambda_handler(event, context):
    # Extract relevant context data from the event
    event_id = event.get('id')
    event_data = event.get('data')

    # Check if the event has already been processed
    if is_event_processed(event_id):
        return {
            'statusCode': 200,
            'body': 'Event already processed'
        }

    # Perform authorization check if required
    if not is_authorized(event_data):
        return {
            'statusCode': 403,
            'body': 'Unauthorized event execution'
        }

    # Process the event
    process_event(event_data)

    # Update storage with processed event information
    mark_event_as_processed(event_id)

    return {
        'statusCode': 200,
        'body': 'Event processed successfully'
    }

def is_event_processed(event_id):
    # Check if the event ID exists as an object in the S3 bucket
    s3 = boto3.client('s3')
    bucket_name = 'YourBucketName'
    object_key = f'processed_events/{event_id}'
    
    try:
        s3.head_object(Bucket=bucket_name, Key=object_key)
        return True
    except:
        return False

def is_authorized(event_data):
    # Implement your authorization logic here
    # Return True if authorized, False otherwise
    return True

def process_event(event_data):
    # Implement your event processing logic here
    pass

def mark_event_as_processed(event_id):
    # Upload an empty object to the S3 bucket to mark the event as processed
    s3 = boto3.client('s3')
    bucket_name = 'YourBucketName'
    object_key = f'processed_events/{event_id}'

    s3.put_object(Body=b'', Bucket=bucket_name, Key=object_key)

Comments

0

In my case, it happened that when calling my second lambda, there was a try that was catching an exception with traceback, this triggered the lambda to retry the call, normally without traceback this does not happen, but when commenting the module it stopped happening.

Also within the try it had a condition that yes or yes it could fail, since it had to query for a resource with boto3, so if it existed there was no problem, but since there was no traceback it forced the general failure, not capturing it as an exception.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.