OLS_C Ramp Fitting #8707

Closed
Chrrrrris opened this issue Aug 14, 2024 · 10 comments

@Chrrrrris
Chrrrrris commented Aug 14, 2024

Hi! When I ran the ramp fitting step, the program got stuck in an infinite loop and raised the same error over and over again. I suspect it has something to do with multiprocessing: when I turn multiprocessing off, the step runs perfectly. It could also be related to the memory leak issue mentioned here (#8680).

@braingram
Collaborator

Thanks for opening the issue.

What's the error message?

Also, would you provide:

  • what command you're running that causes the error
  • a few versions for:
    • jwst
    • python
    • stcal
    • operating system

A full pip freeze output for the python environment would also be appreciated. Let me know if you have any questions.
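
The requested details can be gathered in one go with a short script. This is only a minimal sketch using the standard library; the helper name `collect_versions` is made up for illustration, and a package that is not installed is reported as such rather than raising an error.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("jwst", "stcal")):
    """Return a dict with the Python, OS, and package versions."""
    info = {
        "python": platform.python_version(),
        "operating system": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            # Report missing packages instead of failing the whole report.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting this output alongside `pip freeze` should cover everything asked for above.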

@Chrrrrris
Author

Chrrrrris commented Aug 14, 2024

Hi! Yes, here is the error message:


Traceback (most recent call last):
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/runpy.py", line 269, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/lwang178/Firefly/Firefly_dev/FIREFly_v3.6_stage0_to_stage1.py", line 286, in <module>
    delta_sb_each_group_all_segments = calibrate_stage_0(uncal_path, save_path, planet, output_file, jump_rejection_threshold, n_pix_grow_sat, maximum_cores,stage_2_step = 'assignwcs')
  File "/home/lwang178/Firefly/Firefly_dev/FIREFly_v3.6_stage0_to_stage1.py", line 227, in calibrate_stage_0
    cal_input = firefly_roof_temporal_step(cal_input,out,products_dir,ncores,temp_smooth_width)
  File "/home/lwang178/Firefly/Firefly_dev/firefly_v3_6.py", line 291, in firefly_roof_temporal_step
    clean_dict = collate_output(multithreaded_process(running_temporal_median, ncores, arange(len(cal_sig)), \
  File "/home/lwang178/Firefly/Firefly_dev/firefly_v3_6.py", line 2998, in multithreaded_process
    outputs.append(Parallel(n_jobs = nthreads)(delayed(function)(i, *parameters) for i in inputs))
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/parallel.py", line 2005, in __call__
    next(output)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/parallel.py", line 1643, in _get_outputs
    self._start(iterator, pre_dispatch)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/parallel.py", line 1626, in _start
    if self.dispatch_one_batch(iterator):
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/parallel.py", line 1517, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/parallel.py", line 1418, in _dispatch
    job = self._backend.apply_async(batch, callback=batch_tracker)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 588, in apply_async
    future = self._workers.submit(func)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/reusable_executor.py", line 225, in submit
    return super().submit(fn, *args, **kwargs)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 1248, in submit
    self._ensure_executor_running()
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 1220, in _ensure_executor_running
    self._adjust_process_count()
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 1209, in _adjust_process_count
    p.start()
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/backend/process.py", line 45, in _Popen
    return Popen(process_obj)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 48, in __init__
    self._launch(process_obj)
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 99, in _launch
    prep_data = spawn.get_preparation_data(
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/backend/spawn.py", line 61, in get_preparation_data
    _check_not_importing_main()
  File "/home/local/WIN/lwang178/anaconda3/envs/jwst_reduction/lib/python3.10/site-packages/joblib/externals/loky/backend/spawn.py", line 39, in _check_not_importing_main
    raise RuntimeError(
RuntimeError: An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program

This error message was printed out indefinitely.

The command I ran was cal_input = calwebb_detector1.ramp_fit_step.RampFitStep.call(products_dir + out + '_linearitystep.fits',maximum_cores = maximum_cores, output_dir = products_dir, output_file = out). I can't provide the full script here, but we can chat privately if possible.

Oddly, when I ran the exact same code in a Jupyter notebook, it ran perfectly.

The full environment my code was running in is here:

anyio==4.4.0 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asdf==3.4.0 asdf-astropy==0.6.1 asdf_coordinates_schemas==0.3.0 asdf_standard==1.1.1 asdf_transform_schemas==0.5.0 asdf_wcs_schemas==0.4.0 asteval==1.0.2 astropy==6.1.2 astropy-iers-data==0.2024.8.12.0.32.58 asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work async-lru==2.0.4 attrs==24.2.0 babel==2.16.0 backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work batman-package==2.4.9 BayesicFitting==3.2.1 beautifulsoup4==4.12.3 bleach==6.1.0 certifi==2024.7.4 cffi==1.17.0 charset-normalizer==3.3.2 comm @ file:///croot/comm_1709322850197/work contourpy==1.2.1 crds==11.18.1 cycler==0.12.1 debugpy @ file:///home/builder/ci_310/debugpy_1640789504635/work decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work defusedxml==0.7.1 dill==0.3.8 drizzle==1.15.2 entrypoints @ file:///tmp/build/80754af9/entrypoints_1649908313000/work exceptiongroup @ file:///croot/exceptiongroup_1706031385326/work executing @ file:///opt/conda/conda-bld/executing_1646925071911/work fastjsonschema==2.20.0 filelock==3.15.4 fonttools==4.53.1 fqdn==1.5.1 future==1.0.0 gwcs==0.21.0 h11==0.14.0 httpcore==1.0.5 httpx==0.27.0 idna==3.7 imageio==2.35.0 importlib_metadata==8.2.0 ipykernel @ file:///croot/ipykernel_1671488378391/work ipynbname==2024.1.0.0 ipython @ file:///croot/ipython_1718287989724/work isoduration==20.11.0 jedi @ file:///croot/jedi_1721058342488/work Jinja2==3.1.4 jmespath==1.0.1 joblib==1.4.2 json5==0.9.25 jsonpointer==3.0.0 jsonschema==4.23.0 jsonschema-specifications==2023.12.1 jupyter-events==0.10.0 jupyter-lsp==2.2.5 jupyter_client==8.6.2 jupyter_core @ file:///croot/jupyter_core_1718818295206/work jupyter_server==2.14.2 jupyter_server_terminals==0.5.3 jupyterlab==4.2.4 jupyterlab_pygments==0.3.0 jupyterlab_server==2.27.3 jwst==1.15.1 kiwisolver==1.4.5 lacosmic==1.1.0 lazy_loader==0.4 lmfit==1.3.2 MarkupSafe==2.1.5 matplotlib==3.9.2 matplotlib-inline @ 
file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work mistune==3.0.2 nbclient==0.10.0 nbconvert==7.16.4 nbformat==5.10.4 nest-asyncio @ file:///croot/nest-asyncio_1708532673751/work networkx==3.3 notebook_shim==0.2.4 numpy==1.26.4 opencv-python-headless==4.10.0.84 overrides==7.7.0 packaging @ file:///croot/packaging_1720101850331/work pandas==2.2.2 pandocfilters==1.5.1 Parsley==1.3 parso @ file:///opt/conda/conda-bld/parso_1641458642106/work patsy==0.5.6 pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work photutils==1.13.0 pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work pillow==10.4.0 platformdirs @ file:///croot/platformdirs_1692205439124/work poppy==1.1.1 prometheus_client==0.20.0 prompt-toolkit @ file:///croot/prompt-toolkit_1704404351921/work psutil @ file:///home/builder/ci_310/psutil_1640792629460/work ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work pycparser==2.22 pyerfa==2.0.1.4 Pygments @ file:///croot/pygments_1684279966437/work pyparsing==3.1.2 python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work python-json-logger==2.0.7 pytz==2024.1 PyYAML==6.0.2 pyzmq==26.1.0 referencing==0.35.1 requests==2.32.3 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rpds-py==0.20.0 scikit-image==0.24.0 scikit-learn==1.5.1 scipy==1.14.0 semantic-version==2.10.0 Send2Trash==1.8.3 six @ file:///tmp/build/80754af9/six_1644875935023/work sniffio==1.3.1 soupsieve==2.6 spherical_geometry==1.3.2 stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work statsmodels==0.14.2 stcal==1.7.3 stdatamodels==2.0.0 stpipe==0.6.0 stsci.image==2.3.9 stsci.imagestats==1.8.3 stsci.stimage==0.2.9 synphot==1.4.0 terminado==0.18.1 threadpoolctl==3.5.0 tifffile==2024.8.10 tinycss2==1.3.0 tomli==2.0.1 tornado==6.4.1 tqdm==4.66.5 traitlets @ file:///croot/traitlets_1718227057033/work 
tweakwcs==0.8.8 types-python-dateutil==2.9.0.20240316 typing_extensions @ file:///croot/typing_extensions_1715268824938/work tzdata==2024.1 uncertainties==3.2.2 uri-template==1.3.0 urllib3==2.2.2 wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work webcolors==24.8.0 webencodings==0.5.1 websocket-client==1.8.0 wiimatch==0.3.2 zipp==3.20.0

Thanks!

@braingram
Collaborator

Thanks!

Is this a Windows install (or using the Windows Subsystem for Linux)?

@Chrrrrris
Author

Hi! I'm running the code on a remote server, and I think it is Linux.

@braingram
Collaborator

Thanks again!

Of note, there is a section in the docs about multiprocessing:
https://jwst-pipeline.readthedocs.io/en/latest/jwst/user_documentation/running_pipeline_python.html#multiprocessing

It looks like you're using joblib to parallelize some code. Is it possible to share the script? The pipeline uses the forkserver multiprocessing start method and, looking at the error message, I wonder if joblib is trying to use spawn, which is leading to an issue.
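
One quick way to see which start methods the standard library offers on a given machine, and which one (if any) has already been selected, is sketched below. The helper name `describe_start_methods` is hypothetical. Note that this only reports the state of Python's own `multiprocessing` module; it does not show what joblib's backend does internally.

```python
import multiprocessing


def describe_start_methods():
    """Report the start methods this platform supports and the one in use."""
    return {
        "available": multiprocessing.get_all_start_methods(),
        # allow_none=True avoids locking in a default as a side effect;
        # the result is None if no start method has been chosen yet.
        "current": multiprocessing.get_start_method(allow_none=True),
    }


if __name__ == "__main__":
    report = describe_start_methods()
    print("available start methods:", report["available"])
    print("current start method:", report["current"])
```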

@Chrrrrris
Author

Chrrrrris commented Aug 14, 2024

Yes! I'm using joblib.Parallel to parallelize the code:
Parallel(n_jobs = nthreads)(delayed(function)(i, *parameters) for i in inputs)
Also, I want to note that when I ran the same code with previous versions of jwst, there was no problem. Has anything changed in the way the jwst pipeline parallelizes things in the latest version?

@braingram
Collaborator

Would you try your script with the environment variable JOBLIB_START_METHOD=forkserver? joblib uses a custom multiprocessing backend which might be interfering with the backend used by the pipeline (see this related note in the joblib docs).

@braingram
Collaborator

Also, as noted in the linked jwst docs and the Python docs, the parallel calls will need to be guarded in an if __name__ == "__main__" block. Is this already the case?
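
For reference, the guard looks roughly like the sketch below. It uses the standard library's `ProcessPoolExecutor` with the forkserver start method as a stand-in for `joblib.Parallel` and the pipeline step, so the names `square` and `run_parallel` are purely illustrative and not taken from the script in this issue. The forkserver start method is only available on POSIX systems.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def square(i):
    """Toy worker; defined at module level so child processes can import it."""
    return i * i


def run_parallel(inputs, n_workers=2):
    """Map `square` over `inputs` in worker processes started via forkserver."""
    context = multiprocessing.get_context("forkserver")
    with ProcessPoolExecutor(max_workers=n_workers, mp_context=context) as pool:
        return list(pool.map(square, inputs))


if __name__ == "__main__":
    # Only the process launched by the user reaches this block. Child
    # processes re-import this file under a different module name, so
    # they skip it and do not try to start workers of their own.
    print(run_parallel(range(5)))
```

Without the guard, any parallel call at module level runs again each time a child process re-imports the main script, which matches the `_fixup_main_from_path` frames and the repeated RuntimeError in the traceback above.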

@Chrrrrris
Author

Thanks Brett, it is working now! It turns out I didn't have the if __name__ == "__main__" guard before.

@braingram
Collaborator

Nice! Thanks for helping sort out the issue. Now we have a searchable solution for other users. Good luck!
