Problem in combination AWS Lambda and Lithops #1245
OK, I tried setting 8 GB of RAM and the processing went through without any problems. Do you have any idea why this happened? Could there have been any changes to the code responsible for this?
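(For context, a minimal illustration of how the Lambda memory can be raised when creating the executor; this assumes the aws_lambda backend is already configured, and the 8192 MB value simply mirrors the 8 GB mentioned above.)

```python
import lithops

# Minimal illustration: request 8 GB (8192 MB) for the Lambda runtime.
# Assumes AWS credentials and the aws_lambda backend are already configured.
fexec = lithops.FunctionExecutor(backend='aws_lambda', runtime_memory=8192)

def process(chunk):
    return sum(chunk)  # placeholder for the real dataset processing

futures = fexec.map(process, [list(range(10)), list(range(20))])
print(fexec.get_result(futures))
```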
If the problem is related to memory, can you check the function stats? Additionally, in your function's code, you can use the get_memory_usage() function to log the memory usage at the moment it is called. Edit: I now see that Lambda shows the memory usage at the end of the logs. So, the issue appears when you first invoke with 4 GB, one or more function calls crash because of an OOM, and then you re-invoke the same function with 8 GB? If you invoke directly with 8 GB, it always works fine?
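(A rough sketch of what such logging could look like inside the mapped function; log_memory_usage here is a hypothetical stand-in for the get_memory_usage() helper mentioned above, built on the standard resource module, since the helper's exact import path is not shown in this thread.)

```python
import logging
import resource

logger = logging.getLogger(__name__)

def log_memory_usage(tag):
    # On Linux (the Lambda runtime) ru_maxrss is reported in kilobytes.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logger.info('[%s] peak RSS so far: %.1f MB', tag, peak_kb / 1024)

def my_function(dataset_chunk):
    log_memory_usage('start')
    result = sum(dataset_chunk)  # placeholder for the real workload
    log_memory_usage('end')
    return result
```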
We have wrappers around the Lithops Executor class and also around the map function. In them, we implemented the possibility of choosing the type of executor according to the required amount of RAM. We also monitor MemoryError and TimeoutError and re-invoke the function with more RAM. An example of the code that implements this:
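(The original snippet did not survive here; what follows is only a rough sketch of such a retry wrapper, with hypothetical names (run_with_retry, MEMORY_STEPS) and the assumption that a failed job surfaces as a MemoryError or TimeoutError.)

```python
import lithops

# Hypothetical memory ladder in MB: retry the whole job with the next size.
MEMORY_STEPS = [4096, 8192]

def run_with_retry(func, iterdata, backend='aws_lambda'):
    last_error = None
    for memory in MEMORY_STEPS:
        fexec = lithops.FunctionExecutor(backend=backend, runtime_memory=memory)
        try:
            futures = fexec.map(func, iterdata)
            return fexec.get_result(futures)
        except (MemoryError, TimeoutError) as exc:
            # One or more workers ran out of memory or timed out:
            # re-invoke with the next (larger) memory size.
            last_error = exc
        finally:
            fexec.clean()
    raise last_error
```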
This approach worked great with IBM Cloud Functions, IBM Code Engine, and IBM VPC. After migrating to AWS, the problem appears after we re-invoke the function with more RAM. My hypothesis is that Lithops checks the status of the running executors via the JobMonitor. During the restart that we do on our side (4 GB -> 8 GB), the `Activation ID` changes, but the JobMonitor only knows about the past executors. The last line of the log that I cited led me to this hypothesis.
P.S. Yes, if I immediately allocate 8 GB of RAM, everything works without a problem.
Can you check the ephemeral storage assigned to each worker? In my experience, workers with low memory do not throw exceptions; they just take more time, since each worker has fewer CPUs assigned to it (unless you detect that via your custom executor/map). On the other hand, Lambdas with low ephemeral storage throw memory-related exceptions if the threshold is surpassed.
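(For reference, a hedged sketch of where the ephemeral storage could be raised in the Lithops configuration; the ephemeral_storage key under aws_lambda is an assumption on my part, so please verify it against the AWS Lambda backend documentation.)

```python
import lithops

# Sketch of a config dict; credentials, region and execution role are omitted.
config = {
    'lithops': {'backend': 'aws_lambda', 'storage': 'aws_s3'},
    'aws_lambda': {
        'runtime_memory': 8192,      # MB
        'ephemeral_storage': 4096,   # MB; assumed key name for the /tmp size
    },
}

fexec = lithops.FunctionExecutor(config=config)
```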
@abourramouss |
Hi @sergii-mamedov. I added this patch, which aims to fix the issue you experienced in this thread. Could you test with the master branch?
Thanks @JosepSampe . I will test it tomorrow. |
@JosepSampe Works well. Waiting for the new release :)
Hi @sergii-mamedov, I just created version 3.1.1 |
thanks a lot @JosepSampe |
Hi @JosepSampe !
I have a strange situation with a small subset of datasets for a particular step. This step is characterized by higher RAM consumption and a duration of 2-4 minutes for some datasets. The problem also occurs when there is not enough RAM for at least one executor and we start again with a larger amount of RAM. Below I provide the debug logs as well as the logs of the Lambda functions.
lithops DEBUG logs
AWS lambda logs
27c11211-4e68-4d85-a6f8-6e9701002adb
7e566c35-c1d3-4516-ab53-29dfd32b9e48
85545c28-d632-49e1-b143-8171e1c38fb5
5a00f31f-ec62-49c0-8fb1-21995c88fe85
My only hypothesis is that, because we repeatedly restart the executors with double the amount of memory, something goes wrong inside Lithops. However, we apply the same logic for other steps on AWS, and did so earlier with IBM. I'm also wondering why there are no records for the last call right away; they should appear at least once every 30 seconds, as far as I can see from the source code.