
shared python object access across threads and lazy_static deadlocks #973

Closed · clegaard opened this issue Jun 13, 2020 · 10 comments

@clegaard

When wrapping Python libraries in Rust it may be useful to share a static instance of an object across threads. For example, Python's datetime library has a one-time initialization of an API object. Using lazy_static to access the object can cause a deadlock, as described below:
rust-lang-nursery/lazy-static.rs#116
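
For reference, a rough sketch of the kind of pattern that can deadlock (a minimal illustration rather than the exact code; the deadlock ordering is the one described in the linked issue):

use lazy_static::lazy_static;
use pyo3::prelude::*;

lazy_static! {
    // The initializer needs the GIL. If thread A reaches this lazy_static first
    // (without holding the GIL) and blocks here waiting for the GIL, while thread B
    // already holds the GIL and then touches the same lazy_static, B waits on the
    // lazy_static's internal Once and A waits on the GIL held by B: a deadlock.
    static ref SLAVE_MANAGER: PyObject = {
        let gil = Python::acquire_gil();
        let py = gil.python();
        py.import("pyfmu.fmi2.slaveContext")
            .expect("unable to import module")
            .get("Fmi2SlaveContext")
            .expect("class not found")
            .call0()
            .expect("constructor failed")
            .extract()
            .expect("not a Python object")
    };
}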

@davidhewitt mentioned that he has put some thought into general solutions for internal use.

@kngwyu
Member

kngwyu commented Jun 13, 2020

Honestly, I have no good solution for this right now.
E.g., to avoid this kind of deadlock, rust-numpy locks the thread after importing the module, which I really don't like.

Possible solutions in user code are:

  • Do not use a lock, e.g. use once_cell::unsync.
  • Do not import any module while holding the lock.

@davidhewitt
Member

I'll try to sketch out a draft PR with my idea tomorrow. I think there can be a pretty elegant solution using the Python token...

@davidhewitt
Member

See #975

@clegaard
Author

Amazing, thank you. I will try to check the branch and see if it resolves my specific issue.

@davidhewitt
Member

👍 note that we could probably wrap it with a macro similar to lazy_static too!
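
Something along these lines, perhaps (a very rough sketch, assuming the OnceCell type from #975 is in scope; the macro name and shape are purely illustrative, not an actual PyO3 API):

macro_rules! py_lazy {
    ($getter:ident, $ty:ty, $init:expr) => {
        fn $getter(py: Python<'_>) -> &$ty {
            // One cell per generated getter; initialized at most once under the GIL.
            static CELL: OnceCell<$ty> = OnceCell::new();
            CELL.get_or_init(py, || $init(py))
        }
    };
}

// Hypothetical usage:
// py_lazy!(get_slave_manager, PyObject, |py: Python| -> PyObject {
//     py.import("pyfmu.fmi2.slaveContext").unwrap()
//         .get("Fmi2SlaveContext").unwrap()
//         .call0().unwrap()
//         .extract().unwrap()
// });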

@clegaard
Author

Hi, I finally had the time to try it out. Using the OnceCell seems to do the trick; my tests no longer freeze when running in parallel :)

Being the Rust noob that I am, I tried to instantiate the object in a similar way to the datetime object:

static SLAVE_MANAGER_ONCE_CELL: OnceCell<PyObject> = OnceCell::new();

struct SlaveManagerApi {}

static SLAVE_MANAGER: SlaveManagerApi = SlaveManagerApi {};

impl Deref for SlaveManagerApi {
    type Target = PyObject;

    fn deref(&self) -> &'static PyObject {
        // Assume the caller already holds the GIL.
        let py = unsafe { Python::assume_gil_acquired() };

        SLAVE_MANAGER_ONCE_CELL.get_or_init(py, || {
            let cls = || -> PyResult<PyObject> {
                let ctx: pyo3::PyObject = py
                    .import("pyfmu.fmi2.slaveContext")
                    .expect("Unable to import module declaring slave manager. Ensure that PyFMU is installed inside your current environment.")
                    .get("Fmi2SlaveContext")?
                    .call0()?
                    .extract()?;
                println!("{:?}", ctx);
                Ok(ctx)
            };

            match cls() {
                Err(e) => {
                    e.print_and_set_sys_last_vars(py);
                    panic!("Unable to instantiate slave manager");
                }
                Ok(o) => o,
            }
        })
    }
}

One thing that is bothering me is that it seems that multiple threads are accessing the python code, despite having acquired the GIL. Should this be possible?

@davidhewitt
Member

One thing that is bothering me is that it seems that multiple threads are accessing the python code, despite having acquired the GIL. Should this be possible?

Mmm, the use of Python::assume_gil_acquired is usually not a good idea unless you're really sure that the callers will always have the GIL acquired.

Also, it's not unexpected that even with the GIL acquired properly, multiple threads might start the initialization, because py.import temporarily releases the GIL, which can allow another thread to proceed. Once one thread finishes initializing the once cell, all threads will return that same value.

I suggest modifying your code to be this:

static SLAVE_MANAGER: OnceCell<PyObject> = OnceCell::new();

// Callers pass in the GIL token explicitly instead of assuming it is held.
pub fn get_slave_manager(py: Python) -> &PyAny {
    SLAVE_MANAGER
        .get_or_init(py, || {
            let cls = || -> PyResult<PyObject> {
                let ctx: pyo3::PyObject = py
                    .import("pyfmu.fmi2.slaveContext")
                    .expect("Unable to import module declaring slave manager. Ensure that PyFMU is installed inside your current environment.")
                    .get("Fmi2SlaveContext")?
                    .call0()?
                    .extract()?;
                println!("{:?}", ctx);
                Ok(ctx)
            };

            cls().unwrap_or_else(|e| {
                e.print_and_set_sys_last_vars(py);
                panic!("Unable to instantiate slave manager");
            })
        })
        .as_ref(py)
}

If you prefer &'static PyObject, you could drop the .as_ref() on the last line.

@clegaard
Author

Thanks for the feedback, I really appreciate it :)

Also, it's not unexpected that even with the GIL acquired properly, multiple threads might start the initialization, because py.import temporarily releases the GIL, which can allow another thread to proceed. Once one thread finishes initializing the once cell, all threads will return that same value.

I guess I overlooked the fact that the GIL may be released by the Python code being executed.
So acquiring the GIL is not sufficient to ensure that a function is not preempted, like below?

fn foo() {
    let gil = Python::acquire_gil();
    let py = gil.python();
    get_slave_manager(py).call_method0("function_which_should_not_be_interrupted");
}

fn bar() {
    let gil = Python::acquire_gil();
    let py = gil.python();
    get_slave_manager(py).call_method0("function_which_should_not_be_interrupted");
}

So the options would be rewriting the Python code being called to ensure that it does not release the GIL at an inopportune time, or using a mutex in Rust?

@davidhewitt
Member

acquiring the GIL is not sufficient to ensure that a function is not preempted

I'm not sure of the exact mechanics of which bytecodes the Python interpreter may choose to switch threads on.

So the options would be rewriting the Python code being called to ensure that it does not release the GIL at an inopportune time, or using a mutex in Rust?

It depends exactly why you don't want the interpreter to switch threads. If you need to guarantee only one thread runs the whole critical section, you should use a Mutex, yes. If you just need to guarantee ordering, a channel might be enough?
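
For the Mutex route, a minimal sketch (the lock name and ordering here are assumptions, not code from this thread; the key point is to take the Rust-side lock before acquiring the GIL, so a thread that grabs the GIL while Python releases it mid-call still cannot enter the critical section):

use once_cell::sync::Lazy;
use pyo3::prelude::*;
use std::sync::Mutex;

// Guards the critical section across threads, independently of the GIL.
static CRITICAL_SECTION: Lazy<Mutex<()>> = Lazy::new(|| Mutex::new(()));

fn foo() {
    // Take the Rust lock first, then the GIL. Taking them in the opposite order
    // can deadlock if the Python call releases the GIL while the lock is held.
    let _guard = CRITICAL_SECTION.lock().unwrap();
    let gil = Python::acquire_gil();
    let py = gil.python();
    get_slave_manager(py)
        .call_method0("function_which_should_not_be_interrupted")
        .unwrap();
}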

@clegaard
Author

Rewriting the Python code such that the GIL is not released in a critical section solved the problem.
I think the OnceCell combined with a getter function provides a good solution.

Again thank you for all the help you have provided.
