Error when running distributed processes #3
Comments
This seems to be due to IPOPT or other packages in use not being thread-safe. Currently, the test suite uses Sys.CPU_THREADS - 1 to set the number of worker processes to create and use during distributed processing. Testing on a PC with an AMD FX(tm)-8350 Eight-Core Processor (4000 MHz, 4 cores, 8 logical processors), I can reliably run the distributed sample creation functions with nproc set to 5 (which results in 4 workers being used). Increasing this number by one results in a ReadOnlyMemoryError(). On a laptop with an Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz (2112 MHz, 4 cores, 8 logical processors), nproc=8 works consistently, even though the processor has the same number of cores. Note that limiting the process count to at most the number of physical cores does not appear to prevent the same error from occurring when dist_create_samples is run in the test suite, as opposed to running the commands manually.
Possibly relevant when trying to automatically set up distributed processes: https://github.com/lanl-ansi/PowerModelsSecurityConstrained.jl/blob/master/src/scripts/distributed.jl
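For reference, the process-count arithmetic discussed in the comments above (nproc counts the main process, so nproc = 5 gives 4 workers) can be sketched with Julia's standard Distributed library. This is only a rough sketch, not OPFLearn's actual setup code: `choose_nproc` and the limit of 5 are hypothetical, and the package-loading step is left as a comment because it assumes OPFLearn is installed.

```julia
using Distributed

# Hypothetical helper (not part of OPFLearn): pick a total process count,
# counting the main process, that never exceeds a user-supplied limit.
# The default candidate mirrors the test suite's Sys.CPU_THREADS - 1.
choose_nproc(limit::Int) = min(limit, max(2, Sys.CPU_THREADS - 1))

# Assumed limit: 5 was the largest value that ran reliably on the AMD machine.
nproc = choose_nproc(5)

# Add only as many workers as needed to reach nproc processes in total.
if nproc > nprocs()
    addprocs(nproc - nprocs())
end

println("total processes: ", nprocs(), ", workers: ", nworkers())

# Assumption: loading the package on every process before the first
# distributed call ensures each worker has it initialized.
# @everywhere using OPFLearn
```

Capping the count this way would not address the underlying thread-safety question; it only makes the number of workers explicit and reproducible across machines.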
When running distributed functions on certain computers, an EXCEPTION_ACCESS_VIOLATION or ReadOnlyMemoryError occurs on one or more worker processes. This seems to occur when running the line
with multiple workers that have not already had OPFLearn initialized.
Example Error: