
Problem in combination AWS Lambda and Lithops #1245

Closed
sergii-mamedov opened this issue Jan 30, 2024 · 10 comments

@sergii-mamedov

Hi @JosepSampe!

I am running into a strange situation for a small subset of datasets at one particular step. This step is characterized by higher RAM consumption and a duration of 2-4 minutes for some datasets. It occurs when at least one executor runs out of RAM and we start again with a larger amount of RAM. Below are the debug logs as well as the Lambda function logs.

lithops DEBUG logs

2024-01-30 14:50:11,007 - INFO - engine.lithops-wrapper[Thread-1] - executor.py:227 - executor.map(run_coloc_job, 4 items, 4096MB, attempt 1)
2024-01-30 14:50:11,007 - DEBUG - engine.lithops-wrapper[Thread-1] - executor.py:357 - Selected executor aws_lambda
2024-01-30 14:50:11,007 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:107 - ExecutorID 290acf-0 | JobID M009 - Selected Runtime: metaspace-aws-lambda:3.1.0.d - 4096MB
2024-01-30 14:50:11,008 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:474 - Runtime metadata found in local disk cache
2024-01-30 14:50:11,008 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:242 - ExecutorID 290acf-0 | JobID M009 - Serializing function and data
2024-01-30 14:50:11,010 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:97 - Include Modules: sm
2024-01-30 14:50:11,010 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:113 - Module 'sm' found in /opt/dev/metaspace/metaspace/engine/sm
2024-01-30 14:50:11,010 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:118 - Modules to transmit: /opt/dev/metaspace/metaspace/engine/sm
2024-01-30 14:50:11,015 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:276 - ExecutorID 290acf-0 | JobID M009 - Uploading function and modules to the storage backend
2024-01-30 14:50:11,114 - DEBUG - lithops.storage.backends.aws_s3.aws_s3[Thread-1-ex] - aws_s3.py:104 - PUT Object lithops.jobs/290acf-0/4fdcace2839061ae9f84ec852b8e9277.func.pickle - Size: 760.7KiB - OK
2024-01-30 14:50:11,114 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:302 - ExecutorID 290acf-0 | JobID M009 - Uploading data to the storage backend
2024-01-30 14:50:11,161 - DEBUG - lithops.storage.backends.aws_s3.aws_s3[Thread-1-ex] - aws_s3.py:104 - PUT Object lithops.jobs/290acf-0-M009/aggdata.pickle - Size: 519.0KiB - OK
2024-01-30 14:50:11,161 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:172 - ExecutorID 290acf-0 | JobID M009 - Starting function invocation: run_coloc_job() - Total: 4 activations
2024-01-30 14:50:11,161 - DEBUG - lithops.invokers[Thread-1-ex] - invokers.py:177 - ExecutorID 290acf-0 | JobID M009 - Worker processes: 1 - Chunksize: 1
2024-01-30 14:50:11,161 - DEBUG - lithops.invokers[Thread-1-ex] - invokers.py:425 - ExecutorID 290acf-0 | JobID M009 - Free workers: 564 - Going to run 4 activations in 4 workers
2024-01-30 14:50:11,162 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:208 - ExecutorID 290acf-0 | JobID M009 - View execution logs at /tmp/lithops-ubuntu/logs/290acf-0-M009.log
2024-01-30 14:50:11,163 - DEBUG - lithops.monitor[Thread-14] - monitor.py:381 - ExecutorID 290acf-0 - Starting Storage job monitor
2024-01-30 14:50:11,164 - INFO - lithops.wait[Thread-1-ex] - wait.py:98 - ExecutorID 290acf-0 - Getting results from 4 function activations
2024-01-30 14:50:11,186 - DEBUG - lithops.invokers[ThreadPoolExecutor-5_59] - invokers.py:371 - ExecutorID 290acf-0 | JobID M009 - Calls 00000 invoked (0.024s) - Activation ID: ce8d6875-a1a7-4d75-bfee-4a1f6dd49cca
2024-01-30 14:50:11,190 - DEBUG - lithops.invokers[ThreadPoolExecutor-5_31] - invokers.py:371 - ExecutorID 290acf-0 | JobID M009 - Calls 00003 invoked (0.026s) - Activation ID: af8e38cc-048f-4600-a7a7-2ddf312526ab
2024-01-30 14:50:11,194 - DEBUG - lithops.invokers[ThreadPoolExecutor-5_15] - invokers.py:371 - ExecutorID 290acf-0 | JobID M009 - Calls 00002 invoked (0.030s) - Activation ID: dcb01eb1-4bbe-4b7a-bb5c-02e2da4f3501
2024-01-30 14:50:11,197 - DEBUG - lithops.invokers[ThreadPoolExecutor-5_25] - invokers.py:371 - ExecutorID 290acf-0 | JobID M009 - Calls 00001 invoked (0.034s) - Activation ID: 7ad2afc7-cde5-42b6-aa39-3c62bfadf171
2024-01-30 14:50:13,230 - DEBUG - lithops.monitor[Thread-14] - monitor.py:127 - ExecutorID 290acf-0 - Pending: 0 - Running: 4 - Done: 0
2024-01-30 14:50:45,769 - DEBUG - lithops.monitor[Thread-14] - monitor.py:127 - ExecutorID 290acf-0 - Pending: 0 - Running: 4 - Done: 0
2024-01-30 14:50:57,993 - DEBUG - lithops.monitor[Thread-14] - monitor.py:127 - ExecutorID 290acf-0 - Pending: 0 - Running: 2 - Done: 2
2024-01-30 14:50:58,222 - DEBUG - lithops.future[ThreadPoolExecutor-603_0] - future.py:273 - ExecutorID 290acf-0 | JobID M009 - Got status from call 00001 - Activation ID: 7ad2afc7-cde5-42b6-aa39-3c62bfadf171 - Time: 44.34 seconds
2024-01-30 14:50:58,222 - DEBUG - lithops.future[ThreadPoolExecutor-603_0] - future.py:293 - ExecutorID 290acf-0 | JobID M009 - Got output from call 00001 - Activation ID: 7ad2afc7-cde5-42b6-aa39-3c62bfadf171
2024-01-30 14:50:58,223 - DEBUG - lithops.future[ThreadPoolExecutor-603_1] - future.py:273 - ExecutorID 290acf-0 | JobID M009 - Got status from call 00002 - Activation ID: dcb01eb1-4bbe-4b7a-bb5c-02e2da4f3501 - Time: 44.92 seconds
2024-01-30 14:50:58,223 - DEBUG - lithops.future[ThreadPoolExecutor-603_1] - future.py:293 - ExecutorID 290acf-0 | JobID M009 - Got output from call 00002 - Activation ID: dcb01eb1-4bbe-4b7a-bb5c-02e2da4f3501
2024-01-30 14:51:10,717 - DEBUG - lithops.monitor[Thread-14] - monitor.py:127 - ExecutorID 290acf-0 - Pending: 0 - Running: 1 - Done: 3
2024-01-30 14:51:11,240 - DEBUG - lithops.invokers[Thread-1-ex] - invokers.py:325 - ExecutorID 290acf-0 - Stopping async invokers
2024-01-30 14:51:11,240 - INFO - lithops.executors[Thread-1-ex] - executors.py:596 - ExecutorID 290acf-0 - Cleaning temporary data
2024-01-30 14:51:11,241 - DEBUG - lithops.invokers[Thread-4] - invokers.py:311 - ExecutorID 290acf-0 - Async invoker 1 finished
2024-01-30 14:51:11,241 - DEBUG - lithops.invokers[Thread-3] - invokers.py:311 - ExecutorID 290acf-0 - Async invoker 0 finished
2024-01-30 14:51:11,290 - WARNING - engine.lithops-wrapper[Thread-1] - executor.py:278 - run_coloc_job raised <class 'MemoryError'> with 4096MB, retrying with 8192MB. Failed activation(s): ['ce8d6875-a1a7-4d75-bfee-4a1f6dd49cca']
2024-01-30 14:51:11,291 - INFO - engine.lithops-wrapper[Thread-1] - executor.py:227 - executor.map(run_coloc_job, 4 items, 8192MB, attempt 2)
2024-01-30 14:51:11,291 - DEBUG - engine.lithops-wrapper[Thread-1] - executor.py:357 - Selected executor aws_lambda
2024-01-30 14:51:11,291 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:107 - ExecutorID 290acf-0 | JobID M010 - Selected Runtime: metaspace-aws-lambda:3.1.0.d - 8192MB
2024-01-30 14:51:11,291 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:470 - Runtime metadata found in local memory cache
2024-01-30 14:51:11,291 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:242 - ExecutorID 290acf-0 | JobID M010 - Serializing function and data
2024-01-30 14:51:11,293 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:97 - Include Modules: sm
2024-01-30 14:51:11,294 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:113 - Module 'sm' found in /opt/dev/metaspace/metaspace/engine/sm
2024-01-30 14:51:11,294 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:118 - Modules to transmit: /opt/dev/metaspace/metaspace/engine/sm
2024-01-30 14:51:11,298 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:284 - ExecutorID 290acf-0 | JobID M010 - Function and modules found in local cache
2024-01-30 14:51:11,298 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:302 - ExecutorID 290acf-0 | JobID M010 - Uploading data to the storage backend
2024-01-30 14:51:11,359 - DEBUG - lithops.storage.backends.aws_s3.aws_s3[Thread-1-ex] - aws_s3.py:104 - PUT Object lithops.jobs/290acf-0-M010/aggdata.pickle - Size: 519.0KiB - OK
2024-01-30 14:51:11,359 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:172 - ExecutorID 290acf-0 | JobID M010 - Starting function invocation: run_coloc_job() - Total: 4 activations
2024-01-30 14:51:11,359 - DEBUG - lithops.invokers[Thread-1-ex] - invokers.py:177 - ExecutorID 290acf-0 | JobID M010 - Worker processes: 1 - Chunksize: 1
2024-01-30 14:51:11,360 - DEBUG - lithops.invokers[Thread-15] - invokers.py:296 - ExecutorID 290acf-0 - Async invoker 0 started
2024-01-30 14:51:11,360 - DEBUG - lithops.invokers[Thread-16] - invokers.py:296 - ExecutorID 290acf-0 - Async invoker 1 started
2024-01-30 14:51:11,360 - DEBUG - lithops.invokers[Thread-1-ex] - invokers.py:425 - ExecutorID 290acf-0 | JobID M010 - Free workers: 1000 - Going to run 4 activations in 4 workers
2024-01-30 14:51:11,360 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:208 - ExecutorID 290acf-0 | JobID M010 - View execution logs at /tmp/lithops-ubuntu/logs/290acf-0-M010.log
2024-01-30 14:51:11,361 - INFO - lithops.wait[Thread-1-ex] - wait.py:98 - ExecutorID 290acf-0 - Getting results from 4 function activations
2024-01-30 14:51:11,386 - DEBUG - lithops.invokers[ThreadPoolExecutor-5_50] - invokers.py:371 - ExecutorID 290acf-0 | JobID M010 - Calls 00001 invoked (0.024s) - Activation ID: 27c11211-4e68-4d85-a6f8-6e9701002adb
2024-01-30 14:51:11,388 - DEBUG - lithops.invokers[ThreadPoolExecutor-5_60] - invokers.py:371 - ExecutorID 290acf-0 | JobID M010 - Calls 00003 invoked (0.025s) - Activation ID: 7e566c35-c1d3-4516-ab53-29dfd32b9e48
2024-01-30 14:51:11,388 - DEBUG - lithops.invokers[ThreadPoolExecutor-5_32] - invokers.py:371 - ExecutorID 290acf-0 | JobID M010 - Calls 00002 invoked (0.026s) - Activation ID: 85545c28-d632-49e1-b143-8171e1c38fb5
2024-01-30 14:51:11,394 - DEBUG - lithops.invokers[ThreadPoolExecutor-5_17] - invokers.py:371 - ExecutorID 290acf-0 | JobID M010 - Calls 00000 invoked (0.034s) - Activation ID: 5a00f31f-ec62-49c0-8fb1-21995c88fe85
2024-01-30 14:51:13,269 - DEBUG - lithops.monitor[Thread-14] - monitor.py:409 - ExecutorID 290acf-0 - Storage job monitor finished

AWS Lambda logs

27c11211-4e68-4d85-a6f8-6e9701002adb

START RequestId: 27c11211-4e68-4d85-a6f8-6e9701002adb Version: $LATEST
2024-01-30 13:51:11,437 [INFO] entry_point.py:41 -- Lithops v3.1.0 - Starting AWS Lambda execution
2024-01-30 13:51:11,437 [DEBUG] aws_s3.py:37 -- Creating S3 client
2024-01-30 13:51:11,444 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-01-30 13:51:11,506 [DEBUG] utils.py:43 -- Getting function and modules
2024-01-30 13:51:11,545 [DEBUG] utils.py:58 -- Writing function dependencies to /tmp/lithops-root/modules/290acf-0-M010
2024-01-30 13:51:11,553 [DEBUG] utils.py:88 -- Getting function data
2024-01-30 13:51:11,579 [INFO] handler.py:74 -- Tasks received: 1 - Worker processes: 1
2024-01-30 13:51:11,579 [INFO] handler.py:116 -- Worker process 0 started
2024-01-30 13:51:11,580 [INFO] handler.py:176 -- Lithops v3.1.0 - Starting AWS Lambda execution
2024-01-30 13:51:11,580 [INFO] handler.py:177 -- Execution ID: 290acf-0-M010/00001
2024-01-30 13:51:11,580 [DEBUG] aws_s3.py:37 -- Creating S3 client
2024-01-30 13:51:11,587 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-01-30 13:51:11,662 [DEBUG] handler.py:193 -- Runtime: metaspace-aws-lambda:3.1.0.d - Memory: 8192MB - Timeout: 895 seconds
2024-01-30 13:51:11,693 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/290acf-0-M010/00001/27c11211-4e68-4d85-a6f8-6e9701002adb.init - Size: 0.0B - OK
2024-01-30 13:51:11,693 [DEBUG] handler.py:205 -- Starting JobRunner process
2024-01-30 13:51:11,697 [DEBUG] jobrunner.py:203 -- Process started
/usr/local/lib/python3.8/site-packages/joblib/_multiprocessing_helpers.py:46: UserWarning: [Errno 2] No such file or directory.  joblib will operate in serial mode
warnings.warn('%s.  joblib will operate in serial mode' % (e,))
2024-01-30 13:51:12,674 [INFO] jobrunner.py:233 -- Going to execute 'run_coloc_job()'
---------------------- FUNCTION LOG ----------------------
2024-01-30 13:51:35,974 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00001/cloudobject_0 - Size: 40.2KiB - OK
2024-01-30 13:51:36,047 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00001/cloudobject_1 - Size: 45.2KiB - OK
2024-01-30 13:51:44,520 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00001/cloudobject_2 - Size: 63.3KiB - OK
2024-01-30 13:51:44,580 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00001/cloudobject_3 - Size: 76.7KiB - OK
2024-01-30 13:51:51,219 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00001/cloudobject_4 - Size: 168.5KiB - OK
2024-01-30 13:51:51,311 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00001/cloudobject_5 - Size: 175.5KiB - OK
2024-01-30 13:51:52,882 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00001/cloudobject_6 - Size: 475.3KiB - OK
2024-01-30 13:51:53,049 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00001/cloudobject_7 - Size: 499.5KiB - OK
----------------------------------------------------------
2024-01-30 13:51:53,050 [INFO] jobrunner.py:239 -- Success function execution
2024-01-30 13:51:53,051 [DEBUG] jobrunner.py:253 -- Pickling result
2024-01-30 13:51:53,051 [INFO] jobrunner.py:311 -- Process finished
2024-01-30 13:51:53,060 [DEBUG] handler.py:209 -- JobRunner process finished
2024-01-30 13:51:53,061 [INFO] status.py:88 -- Storing execution stats - Size: 3.1KiB
2024-01-30 13:51:53,144 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/290acf-0-M010/00001/status.json - Size: 3.1KiB - OK
2024-01-30 13:51:53,144 [INFO] handler.py:270 -- Finished
2024-01-30 13:51:53,147 [INFO] handler.py:138 -- Worker process 0 finished
END RequestId: 27c11211-4e68-4d85-a6f8-6e9701002adb
REPORT RequestId: 27c11211-4e68-4d85-a6f8-6e9701002adb	Duration: 41723.82 ms	Billed Duration: 41724 ms	Memory Size: 8192 MB	Max Memory Used: 6788 MB	

7e566c35-c1d3-4516-ab53-29dfd32b9e48

START RequestId: 7e566c35-c1d3-4516-ab53-29dfd32b9e48 Version: $LATEST
2024-01-30 13:51:12,611 [INFO] entry_point.py:41 -- Lithops v3.1.0 - Starting AWS Lambda execution
2024-01-30 13:51:12,744 [DEBUG] aws_s3.py:37 -- Creating S3 client
2024-01-30 13:51:12,863 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-01-30 13:51:12,917 [DEBUG] utils.py:43 -- Getting function and modules
2024-01-30 13:51:12,948 [DEBUG] utils.py:58 -- Writing function dependencies to /tmp/lithops-root/modules/290acf-0-M010
2024-01-30 13:51:12,957 [DEBUG] utils.py:88 -- Getting function data
2024-01-30 13:51:12,986 [INFO] handler.py:74 -- Tasks received: 1 - Worker processes: 1
2024-01-30 13:51:12,986 [INFO] handler.py:116 -- Worker process 0 started
2024-01-30 13:51:12,987 [INFO] handler.py:176 -- Lithops v3.1.0 - Starting AWS Lambda execution
2024-01-30 13:51:12,988 [INFO] handler.py:177 -- Execution ID: 290acf-0-M010/00003
2024-01-30 13:51:12,988 [DEBUG] aws_s3.py:37 -- Creating S3 client
2024-01-30 13:51:12,993 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-01-30 13:51:13,033 [DEBUG] handler.py:193 -- Runtime: metaspace-aws-lambda:3.1.0.d - Memory: 8192MB - Timeout: 895 seconds
2024-01-30 13:51:13,057 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/290acf-0-M010/00003/7e566c35-c1d3-4516-ab53-29dfd32b9e48.init - Size: 0.0B - OK
2024-01-30 13:51:13,058 [DEBUG] handler.py:205 -- Starting JobRunner process
2024-01-30 13:51:13,065 [DEBUG] jobrunner.py:203 -- Process started
/usr/local/lib/python3.8/site-packages/joblib/_multiprocessing_helpers.py:46: UserWarning: [Errno 2] No such file or directory.  joblib will operate in serial mode
warnings.warn('%s.  joblib will operate in serial mode' % (e,))
2024-01-30 13:51:14,097 [INFO] jobrunner.py:233 -- Going to execute 'run_coloc_job()'
---------------------- FUNCTION LOG ----------------------
2024-01-30 13:54:10,738 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00003/cloudobject_0 - Size: 6.6KiB - OK
2024-01-30 13:54:10,789 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00003/cloudobject_1 - Size: 6.7KiB - OK
2024-01-30 13:54:18,467 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00003/cloudobject_2 - Size: 41.0KiB - OK
2024-01-30 13:54:18,527 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00003/cloudobject_3 - Size: 43.3KiB - OK
2024-01-30 13:54:24,323 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00003/cloudobject_4 - Size: 218.5KiB - OK
2024-01-30 13:54:24,424 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00003/cloudobject_5 - Size: 224.4KiB - OK
2024-01-30 13:54:31,643 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00003/cloudobject_6 - Size: 3.8MiB - OK
2024-01-30 13:54:32,432 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00003/cloudobject_7 - Size: 3.2MiB - OK
----------------------------------------------------------
2024-01-30 13:54:32,441 [INFO] jobrunner.py:239 -- Success function execution
2024-01-30 13:54:32,441 [DEBUG] jobrunner.py:253 -- Pickling result
2024-01-30 13:54:32,441 [INFO] jobrunner.py:311 -- Process finished
2024-01-30 13:54:32,449 [DEBUG] handler.py:209 -- JobRunner process finished
2024-01-30 13:54:32,450 [INFO] status.py:88 -- Storing execution stats - Size: 3.1KiB
2024-01-30 13:54:32,537 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/290acf-0-M010/00003/status.json - Size: 3.1KiB - OK
2024-01-30 13:54:32,537 [INFO] handler.py:270 -- Finished
2024-01-30 13:54:32,539 [INFO] handler.py:138 -- Worker process 0 finished
END RequestId: 7e566c35-c1d3-4516-ab53-29dfd32b9e48
REPORT RequestId: 7e566c35-c1d3-4516-ab53-29dfd32b9e48	Duration: 199932.33 ms	Billed Duration: 200732 ms	Memory Size: 8192 MB	Max Memory Used: 6169 MB	Init Duration: 798.82 ms	

85545c28-d632-49e1-b143-8171e1c38fb5

START RequestId: 85545c28-d632-49e1-b143-8171e1c38fb5 Version: $LATEST
2024-01-30 13:51:12,431 [INFO] entry_point.py:41 -- Lithops v3.1.0 - Starting AWS Lambda execution
2024-01-30 13:51:12,571 [DEBUG] aws_s3.py:37 -- Creating S3 client
2024-01-30 13:51:12,697 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-01-30 13:51:12,764 [DEBUG] utils.py:43 -- Getting function and modules
2024-01-30 13:51:12,796 [DEBUG] utils.py:58 -- Writing function dependencies to /tmp/lithops-root/modules/290acf-0-M010
2024-01-30 13:51:12,805 [DEBUG] utils.py:88 -- Getting function data
2024-01-30 13:51:12,827 [INFO] handler.py:74 -- Tasks received: 1 - Worker processes: 1
2024-01-30 13:51:12,827 [INFO] handler.py:116 -- Worker process 0 started
2024-01-30 13:51:12,829 [INFO] handler.py:176 -- Lithops v3.1.0 - Starting AWS Lambda execution
2024-01-30 13:51:12,829 [INFO] handler.py:177 -- Execution ID: 290acf-0-M010/00002
2024-01-30 13:51:12,829 [DEBUG] aws_s3.py:37 -- Creating S3 client
2024-01-30 13:51:12,835 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-01-30 13:51:12,897 [DEBUG] handler.py:193 -- Runtime: metaspace-aws-lambda:3.1.0.d - Memory: 8192MB - Timeout: 895 seconds
2024-01-30 13:51:12,921 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/290acf-0-M010/00002/85545c28-d632-49e1-b143-8171e1c38fb5.init - Size: 0.0B - OK
2024-01-30 13:51:12,921 [DEBUG] handler.py:205 -- Starting JobRunner process
2024-01-30 13:51:12,927 [DEBUG] jobrunner.py:203 -- Process started
/usr/local/lib/python3.8/site-packages/joblib/_multiprocessing_helpers.py:46: UserWarning: [Errno 2] No such file or directory.  joblib will operate in serial mode
warnings.warn('%s.  joblib will operate in serial mode' % (e,))
2024-01-30 13:51:14,098 [INFO] jobrunner.py:233 -- Going to execute 'run_coloc_job()'
---------------------- FUNCTION LOG ----------------------
/usr/local/lib/python3.8/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py:1593: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
warnings.warn("k >= N for N * N square matrix. "
2024-01-30 13:51:36,509 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00002/cloudobject_0 - Size: 3.5KiB - OK
2024-01-30 13:51:36,538 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00002/cloudobject_1 - Size: 3.7KiB - OK
2024-01-30 13:51:45,758 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00002/cloudobject_2 - Size: 10.7KiB - OK
2024-01-30 13:51:45,797 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00002/cloudobject_3 - Size: 12.0KiB - OK
2024-01-30 13:51:54,528 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00002/cloudobject_4 - Size: 29.5KiB - OK
2024-01-30 13:51:54,595 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00002/cloudobject_5 - Size: 33.2KiB - OK
2024-01-30 13:51:56,224 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00002/cloudobject_6 - Size: 298.1KiB - OK
2024-01-30 13:51:56,352 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00002/cloudobject_7 - Size: 324.6KiB - OK
----------------------------------------------------------
2024-01-30 13:51:56,353 [INFO] jobrunner.py:239 -- Success function execution
2024-01-30 13:51:56,353 [DEBUG] jobrunner.py:253 -- Pickling result
2024-01-30 13:51:56,354 [INFO] jobrunner.py:311 -- Process finished
2024-01-30 13:51:56,361 [DEBUG] handler.py:209 -- JobRunner process finished
2024-01-30 13:51:56,362 [INFO] status.py:88 -- Storing execution stats - Size: 3.2KiB
2024-01-30 13:51:56,486 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/290acf-0-M010/00002/status.json - Size: 3.2KiB - OK
2024-01-30 13:51:56,487 [INFO] handler.py:270 -- Finished
2024-01-30 13:51:56,489 [INFO] handler.py:138 -- Worker process 0 finished
END RequestId: 85545c28-d632-49e1-b143-8171e1c38fb5
REPORT RequestId: 85545c28-d632-49e1-b143-8171e1c38fb5	Duration: 44060.35 ms	Billed Duration: 44712 ms	Memory Size: 8192 MB	Max Memory Used: 1587 MB	Init Duration: 651.44 ms

5a00f31f-ec62-49c0-8fb1-21995c88fe85

START RequestId: 5a00f31f-ec62-49c0-8fb1-21995c88fe85 Version: $LATEST
2024-01-30 13:51:12,552 [INFO] entry_point.py:41 -- Lithops v3.1.0 - Starting AWS Lambda execution
2024-01-30 13:51:12,697 [DEBUG] aws_s3.py:37 -- Creating S3 client
2024-01-30 13:51:12,827 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-01-30 13:51:12,880 [DEBUG] utils.py:43 -- Getting function and modules
2024-01-30 13:51:12,919 [DEBUG] utils.py:58 -- Writing function dependencies to /tmp/lithops-root/modules/290acf-0-M010
2024-01-30 13:51:12,929 [DEBUG] utils.py:88 -- Getting function data
2024-01-30 13:51:12,952 [INFO] handler.py:74 -- Tasks received: 1 - Worker processes: 1
2024-01-30 13:51:12,953 [INFO] handler.py:116 -- Worker process 0 started
2024-01-30 13:51:12,954 [INFO] handler.py:176 -- Lithops v3.1.0 - Starting AWS Lambda execution
2024-01-30 13:51:12,954 [INFO] handler.py:177 -- Execution ID: 290acf-0-M010/00000
2024-01-30 13:51:12,954 [DEBUG] aws_s3.py:37 -- Creating S3 client
2024-01-30 13:51:12,961 [INFO] aws_s3.py:68 -- S3 client created - Region: eu-west-1
2024-01-30 13:51:13,028 [DEBUG] handler.py:193 -- Runtime: metaspace-aws-lambda:3.1.0.d - Memory: 8192MB - Timeout: 895 seconds
2024-01-30 13:51:13,054 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/290acf-0-M010/00000/5a00f31f-ec62-49c0-8fb1-21995c88fe85.init - Size: 0.0B - OK
2024-01-30 13:51:13,054 [DEBUG] handler.py:205 -- Starting JobRunner process
2024-01-30 13:51:13,060 [DEBUG] jobrunner.py:203 -- Process started
/usr/local/lib/python3.8/site-packages/joblib/_multiprocessing_helpers.py:46: UserWarning: [Errno 2] No such file or directory.  joblib will operate in serial mode
warnings.warn('%s.  joblib will operate in serial mode' % (e,))
2024-01-30 13:51:14,313 [INFO] jobrunner.py:233 -- Going to execute 'run_coloc_job()'
---------------------- FUNCTION LOG ----------------------
2024-01-30 13:54:45,502 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00000/cloudobject_0 - Size: 6.8KiB - OK
2024-01-30 13:54:45,530 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00000/cloudobject_1 - Size: 7.4KiB - OK
2024-01-30 13:54:54,348 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00000/cloudobject_2 - Size: 44.6KiB - OK
2024-01-30 13:54:54,397 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00000/cloudobject_3 - Size: 48.9KiB - OK
2024-01-30 13:54:58,975 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00000/cloudobject_4 - Size: 353.8KiB - OK
2024-01-30 13:54:59,100 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00000/cloudobject_5 - Size: 345.3KiB - OK
2024-01-30 13:55:06,711 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00000/cloudobject_6 - Size: 3.7MiB - OK
2024-01-30 13:55:07,668 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/tmp/290acf-0-M010-00000/cloudobject_7 - Size: 3.2MiB - OK
----------------------------------------------------------
2024-01-30 13:55:07,677 [INFO] jobrunner.py:239 -- Success function execution
2024-01-30 13:55:07,677 [DEBUG] jobrunner.py:253 -- Pickling result
2024-01-30 13:55:07,678 [INFO] jobrunner.py:311 -- Process finished
2024-01-30 13:55:07,687 [DEBUG] handler.py:209 -- JobRunner process finished
2024-01-30 13:55:07,688 [INFO] status.py:88 -- Storing execution stats - Size: 3.1KiB
2024-01-30 13:55:07,773 [DEBUG] aws_s3.py:104 -- PUT Object lithops.jobs/290acf-0-M010/00000/status.json - Size: 3.1KiB - OK
2024-01-30 13:55:07,773 [INFO] handler.py:270 -- Finished
2024-01-30 13:55:07,776 [INFO] handler.py:138 -- Worker process 0 finished
END RequestId: 5a00f31f-ec62-49c0-8fb1-21995c88fe85
REPORT RequestId: 5a00f31f-ec62-49c0-8fb1-21995c88fe85	Duration: 235225.87 ms	Billed Duration: 236031 ms	Memory Size: 8192 MB	Max Memory Used: 6227 MB	Init Duration: 804.27 ms

My only hypothesis is that, because we repeatedly restart the executors with double the amount of memory, something goes wrong inside Lithops. We use the same logic for other steps on AWS, though, and did the same earlier with IBM. I'm also wondering why there are no records for the last call right away:

2024-01-30 14:50:13,230 - DEBUG - lithops.monitor[Thread-14] - monitor.py:127 - ExecutorID 290acf-0 - Pending: 0 - Running: 4 - Done: 0
2024-01-30 14:50:45,769 - DEBUG - lithops.monitor[Thread-14] - monitor.py:127 - ExecutorID 290acf-0 - Pending: 0 - Running: 4 - Done: 0
2024-01-30 14:50:57,993 - DEBUG - lithops.monitor[Thread-14] - monitor.py:127 - ExecutorID 290acf-0 - Pending: 0 - Running: 2 - Done: 2

They should appear at least once every 30 seconds, as far as I can see from the source code.

@sergii-mamedov
Author

OK, I tried setting 8 GB of RAM and the processing went through without any problems. Do you have any idea why this happened? Could there have been any changes to the code responsible for this?

2024-01-30 15:48:24,851 - INFO - engine.lithops-wrapper[Thread-1] - executor.py:227 - executor.map(run_coloc_job, 4 items, 8192MB, attempt 1)
2024-01-30 15:48:24,851 - DEBUG - engine.lithops-wrapper[Thread-1] - executor.py:357 - Selected executor aws_lambda
2024-01-30 15:48:24,851 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:107 - ExecutorID 66ff3d-5 | JobID M009 - Selected Runtime: metaspace-aws-lambda:3.1.0.d - 8192MB
2024-01-30 15:48:24,851 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:470 - Runtime metadata found in local memory cache
2024-01-30 15:48:24,851 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:242 - ExecutorID 66ff3d-5 | JobID M009 - Serializing function and data
2024-01-30 15:48:24,853 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:97 - Include Modules: sm
2024-01-30 15:48:24,853 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:113 - Module 'sm' found in /opt/dev/metaspace/metaspace/engine/sm
2024-01-30 15:48:24,853 - DEBUG - lithops.job.serialize[Thread-1-ex] - serialize.py:118 - Modules to transmit: /opt/dev/metaspace/metaspace/engine/sm
2024-01-30 15:48:24,858 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:276 - ExecutorID 66ff3d-5 | JobID M009 - Uploading function and modules to the storage backend
2024-01-30 15:48:24,952 - DEBUG - lithops.storage.backends.aws_s3.aws_s3[Thread-1-ex] - aws_s3.py:104 - PUT Object lithops.jobs/66ff3d-5/6433e09d0a451a4926949a32125022f6.func.pickle - Size: 760.7KiB - OK
2024-01-30 15:48:24,952 - DEBUG - lithops.job.job[Thread-1-ex] - job.py:302 - ExecutorID 66ff3d-5 | JobID M009 - Uploading data to the storage backend
2024-01-30 15:48:25,012 - DEBUG - lithops.storage.backends.aws_s3.aws_s3[Thread-1-ex] - aws_s3.py:104 - PUT Object lithops.jobs/66ff3d-5-M009/aggdata.pickle - Size: 519.0KiB - OK
2024-01-30 15:48:25,012 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:172 - ExecutorID 66ff3d-5 | JobID M009 - Starting function invocation: run_coloc_job() - Total: 4 activations
2024-01-30 15:48:25,012 - DEBUG - lithops.invokers[Thread-1-ex] - invokers.py:177 - ExecutorID 66ff3d-5 | JobID M009 - Worker processes: 1 - Chunksize: 1
2024-01-30 15:48:25,012 - DEBUG - lithops.invokers[Thread-1-ex] - invokers.py:425 - ExecutorID 66ff3d-5 | JobID M009 - Free workers: 564 - Going to run 4 activations in 4 workers
2024-01-30 15:48:25,012 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:208 - ExecutorID 66ff3d-5 | JobID M009 - View execution logs at /tmp/lithops-ubuntu/logs/66ff3d-5-M009.log
2024-01-30 15:48:25,012 - DEBUG - lithops.monitor[Thread-27] - monitor.py:381 - ExecutorID 66ff3d-5 - Starting Storage job monitor
2024-01-30 15:48:25,012 - INFO - lithops.wait[Thread-1-ex] - wait.py:98 - ExecutorID 66ff3d-5 - Getting results from 4 function activations
2024-01-30 15:48:25,035 - DEBUG - lithops.invokers[ThreadPoolExecutor-231_48] - invokers.py:371 - ExecutorID 66ff3d-5 | JobID M009 - Calls 00001 invoked (0.022s) - Activation ID: 5924283e-3222-4f2a-8199-abdcc18081ab
2024-01-30 15:48:25,037 - DEBUG - lithops.invokers[ThreadPoolExecutor-231_33] - invokers.py:371 - ExecutorID 66ff3d-5 | JobID M009 - Calls 00000 invoked (0.025s) - Activation ID: cf1b05e8-91e9-40d6-bada-7fddfa8ea38e
2024-01-30 15:48:25,041 - DEBUG - lithops.invokers[ThreadPoolExecutor-231_25] - invokers.py:371 - ExecutorID 66ff3d-5 | JobID M009 - Calls 00003 invoked (0.026s) - Activation ID: d422eda5-73fe-4241-b705-612898a7b115
2024-01-30 15:48:25,045 - DEBUG - lithops.invokers[ThreadPoolExecutor-231_55] - invokers.py:371 - ExecutorID 66ff3d-5 | JobID M009 - Calls 00002 invoked (0.031s) - Activation ID: 734ddf49-75c0-4ba6-a7be-fd8db5fbfa66
2024-01-30 15:48:27,074 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 1 - Running: 3 - Done: 0
2024-01-30 15:48:29,113 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 4 - Done: 0
2024-01-30 15:49:01,760 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 4 - Done: 0
2024-01-30 15:49:09,945 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 3 - Done: 1
2024-01-30 15:49:10,068 - DEBUG - lithops.future[ThreadPoolExecutor-813_0] - future.py:273 - ExecutorID 66ff3d-5 | JobID M009 - Got status from call 00001 - Activation ID: 5924283e-3222-4f2a-8199-abdcc18081ab - Time: 42.89 seconds
2024-01-30 15:49:10,068 - DEBUG - lithops.future[ThreadPoolExecutor-813_0] - future.py:293 - ExecutorID 66ff3d-5 | JobID M009 - Got output from call 00001 - Activation ID: 5924283e-3222-4f2a-8199-abdcc18081ab
2024-01-30 15:49:14,562 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 2 - Done: 2
2024-01-30 15:49:15,075 - DEBUG - lithops.future[ThreadPoolExecutor-822_0] - future.py:273 - ExecutorID 66ff3d-5 | JobID M009 - Got status from call 00002 - Activation ID: 734ddf49-75c0-4ba6-a7be-fd8db5fbfa66 - Time: 45.39 seconds
2024-01-30 15:49:15,075 - DEBUG - lithops.future[ThreadPoolExecutor-822_0] - future.py:293 - ExecutorID 66ff3d-5 | JobID M009 - Got output from call 00002 - Activation ID: 734ddf49-75c0-4ba6-a7be-fd8db5fbfa66
2024-01-30 15:49:45,651 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 2 - Done: 2
2024-01-30 15:50:18,219 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 2 - Done: 2
2024-01-30 15:50:50,804 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 2 - Done: 2
2024-01-30 15:51:23,344 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 2 - Done: 2
2024-01-30 15:51:55,855 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 2 - Done: 2
2024-01-30 15:52:08,207 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 1 - Done: 3
2024-01-30 15:52:08,279 - DEBUG - lithops.future[ThreadPoolExecutor-1082_0] - future.py:273 - ExecutorID 66ff3d-5 | JobID M009 - Got status from call 00003 - Activation ID: d422eda5-73fe-4241-b705-612898a7b115 - Time: 221.39 seconds
2024-01-30 15:52:08,279 - DEBUG - lithops.future[ThreadPoolExecutor-1082_0] - future.py:293 - ExecutorID 66ff3d-5 | JobID M009 - Got output from call 00003 - Activation ID: d422eda5-73fe-4241-b705-612898a7b115
2024-01-30 15:52:20,908 - DEBUG - lithops.monitor[Thread-27] - monitor.py:127 - ExecutorID 66ff3d-5 - Pending: 0 - Running: 0 - Done: 4
2024-01-30 15:52:20,908 - DEBUG - lithops.monitor[Thread-27] - monitor.py:409 - ExecutorID 66ff3d-5 - Storage job monitor finished
2024-01-30 15:52:21,296 - DEBUG - lithops.future[ThreadPoolExecutor-1103_0] - future.py:273 - ExecutorID 66ff3d-5 | JobID M009 - Got status from call 00000 - Activation ID: cf1b05e8-91e9-40d6-bada-7fddfa8ea38e - Time: 232.55 seconds
2024-01-30 15:52:21,296 - DEBUG - lithops.future[ThreadPoolExecutor-1103_0] - future.py:293 - ExecutorID 66ff3d-5 | JobID M009 - Got output from call 00000 - Activation ID: cf1b05e8-91e9-40d6-bada-7fddfa8ea38e
2024-01-30 15:52:21,297 - INFO - lithops.executors[Thread-1-ex] - executors.py:596 - ExecutorID 66ff3d-5 - Cleaning temporary data
2024-01-30 15:52:21,345 - DEBUG - lithops.executors[Thread-1-ex] - executors.py:506 - ExecutorID 66ff3d-5 - Finished getting results

@JosepSampe
Member

JosepSampe commented Feb 1, 2024

If the problem is related to memory, can you check the worker_peak_memory_start and worker_peak_memory_end values in the function stats to see whether the function that fails has a "weird" memory consumption?
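
For reference, a minimal sketch of how those stats could be inspected after a map call (this assumes the standard Lithops futures API; the stat key names are the ones mentioned above and may differ between versions):

import lithops

fexec = lithops.FunctionExecutor()          # uses the configured backend, e.g. aws_lambda
futures = fexec.map(run_coloc_job, items)   # run_coloc_job / items as in this thread
fexec.wait(futures)

for f in futures:
    stats = f.stats or {}
    print(f.call_id,
          stats.get('worker_peak_memory_start'),
          stats.get('worker_peak_memory_end'))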

Additionally, in your function's code, you can use the get_memory_usage() function to log the memory usage at the moment it is called. You can print(get_memory_usage()) multiple times inside your code and see in the logs how memory usage evolves and which instruction produces the highest memory consumption.
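
For example, something along these lines inside the mapped function (a rough sketch; the import path of get_memory_usage is an assumption and may differ in your Lithops version, and heavy_step is a hypothetical placeholder):

from lithops.utils import get_memory_usage  # assumption: import path may differ

def run_coloc_job(item):
    print(get_memory_usage())   # memory usage before the heavy step
    result = heavy_step(item)   # hypothetical placeholder for the real computation
    print(get_memory_usage())   # memory usage after it, to spot which step peaks
    return result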

Edit: I now see that Lambda shows the memory usage at the end of the logs.

So, the issue appears when you first invoke with 4GB, one or more function calls crash because of an OOM, and then you reinvoke the same function with 8GB? If you directly invoke with 8GB it always works fine?

@sergii-mamedov
Author

sergii-mamedov commented Feb 2, 2024

We have wrappers around the Lithops Executor class and also around the map function.

In them we implement choosing the executor type according to the required amount of RAM. We also catch MemoryError and TimeoutError and re-invoke the function with more RAM.

An example of the code that implements this:

            futures, return_vals, exc = self._dispatch_map(
                wrapper_func, func_args, runtime_memory, debug_run_locally, lithops_kwargs
            )

           ................

            if (
                isinstance(exc, (MemoryError, TimeoutError, OSError))
                and runtime_memory <= max(MEM_LIMITS.values())
                and (max_memory is None or runtime_memory < max_memory)
            ):
                attempt += 1
                old_memory = runtime_memory
                runtime_memory *= 2

           ................

            futures, return_vals, exc = self._dispatch_map(
                wrapper_func, func_args, runtime_memory, debug_run_locally, lithops_kwargs
            )

This approach worked great with IBM Cloud Functions, IBM Code Engine and IBM VPC. After migrating to AWS, the approach still works in the case where the Lambda invocation raises a MemoryError (8 GB of RAM is not enough in our case) and we re-invoke on EC2.
In the case where the first invocation (AWS Lambda) is given 4 GB and crashes due to OOM, and we then run the same function with 8 GB of RAM, we hit the problem described above.
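
For context, a self-contained sketch of that retry pattern (a simplified stand-in for our wrapper, not the actual code; MEM_LIMITS and the backend choice are illustrative only):

import lithops

MEM_LIMITS = {'aws_lambda': 8192}  # illustrative limit only

def map_with_retries(func, func_args, runtime_memory=4096):
    while True:
        fexec = lithops.FunctionExecutor(backend='aws_lambda',
                                         runtime_memory=runtime_memory)
        futures = fexec.map(func, func_args)
        try:
            return fexec.get_result(futures)
        except (MemoryError, TimeoutError, OSError):
            if runtime_memory >= max(MEM_LIMITS.values()):
                raise  # out of headroom (in our real code we fall back to EC2 here)
            runtime_memory *= 2  # re-run the whole map with double the memory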

My hypothesis is that Lithops checks the status of running executors via the JobMonitor. During the restart that we do on our side (4 GB -> 8 GB) the `Activation ID` changes, but the JobMonitor only knows about the previous activations. The last line of the log I quoted led me to this hypothesis:

2024-01-30 14:51:13,269 - DEBUG - lithops.monitor[Thread-14] - monitor.py:409 - ExecutorID 290acf-0 - Storage job monitor finished

P.S. Yes, if I immediately allocate 8 GB of RAM, everything works without a problem.
P.P.S. This problem occurs no matter what initial amount of RAM we use.

@abourramouss
Contributor

abourramouss commented Feb 5, 2024

Can you check the ephemeral storage assigned to each worker?

From my experience, workers with low memory do not throw exceptions; they just take more time, since each worker has fewer CPUs assigned to it (unless you detect that via your custom executor/map).

On the other hand, Lambdas with low ephemeral storage throw memory-related exceptions if the threshold is surpassed.

@sergii-mamedov
Author

@abourramouss
We do not use ephemeral storage at all.
In addition, this problem appears for different amounts of RAM, from 1 to 4 GB inclusive.

@JosepSampe
Member

Hi @sergii-mamedov. I added this patch, which aims to fix the issue you experienced in this thread. Could you test with the master branch?

@sergii-mamedov
Author

Thanks @JosepSampe. I will test it tomorrow.

@sergii-mamedov
Author

@JosepSampe Works well. Waiting for new release :)

@JosepSampe
Member

Hi @sergii-mamedov, I just created version 3.1.1

@sergii-mamedov
Author

Thanks a lot @JosepSampe
