
shared python object access across threads and lazy_static deadlocks #973

Closed · clegaard opened this issue Jun 13, 2020 · 10 comments

@clegaard

When wrapping Python libraries in Rust it may be useful to share a static instance of an object across threads. For example, Python's datetime library has a one-time initialization of an API object. Using lazy_static to access the object can cause a deadlock, as described below:
rust-lang-nursery/lazy-static.rs#116
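
For reference, a rough sketch of the kind of pattern that can deadlock (a minimal illustration rather than the exact code; the deadlock ordering is the one described in the linked issue):

use lazy_static::lazy_static;
use pyo3::prelude::*;

lazy_static! {
    // The initializer needs the GIL. If thread A reaches this lazy_static first
    // (without holding the GIL) and blocks here waiting for the GIL, while thread B
    // already holds the GIL and then touches the same lazy_static, B waits on the
    // lazy_static's internal Once and A waits on the GIL held by B: a deadlock.
    static ref SLAVE_MANAGER: PyObject = {
        let gil = Python::acquire_gil();
        let py = gil.python();
        py.import("pyfmu.fmi2.slaveContext")
            .expect("unable to import module")
            .get("Fmi2SlaveContext")
            .expect("class not found")
            .call0()
            .expect("constructor failed")
            .extract()
            .expect("not a Python object")
    };
}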

@davidhewitt mentioned that he has put some thought into general solutions for internal use.

@kngwyu
Member

kngwyu commented Jun 13, 2020

Honestly, I have no good solution for this right now.
E.g., to avoid this kind of deadlock, rust-numpy locks the thread after importing the module, which I really don't like.

Possible solutions in user code are:

  • Do not use a lock, e.g. use once_cell::unsync.
  • Do not import any module while holding the lock.

@davidhewitt
Member

I'll try to sketch out a draft PR with my idea tomorrow. I think there can be a pretty elegant solution using the Python token...

@davidhewitt
Member

See #975

@clegaard
Author

Amazing, thank you. I will try to check the branch and see if it resolves my specific issue.

@davidhewitt
Member

👍 note that we could probably wrap it with a macro similar to lazy_static too!
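
Something along these lines, perhaps (a very rough sketch, assuming the OnceCell type from #975 is in scope; the macro name and shape are purely illustrative, not an actual PyO3 API):

macro_rules! py_lazy {
    ($getter:ident, $ty:ty, $init:expr) => {
        fn $getter(py: Python<'_>) -> &$ty {
            // One cell per generated getter; initialized at most once under the GIL.
            static CELL: OnceCell<$ty> = OnceCell::new();
            CELL.get_or_init(py, || $init(py))
        }
    };
}

// Hypothetical usage:
// py_lazy!(get_slave_manager, PyObject, |py: Python| -> PyObject {
//     py.import("pyfmu.fmi2.slaveContext").unwrap()
//         .get("Fmi2SlaveContext").unwrap()
//         .call0().unwrap()
//         .extract().unwrap()
// });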

@clegaard
Author

Hi, I finally had the time to try it out. Using the OnceCell seems to do the trick; my tests no longer freeze when running in parallel :)

Being the Rust noob that I am, I tried to instantiate the object in a similar way to the datetime object:

static SLAVE_MANAGER_ONCE_CELL: OnceCell<PyObject> = OnceCell::new();

struct SlaveManagerApi {}

static SLAVE_MANAGER: SlaveManagerApi = SlaveManagerApi {};

impl Deref for SlaveManagerApi {
    type Target = PyObject;

    fn deref(&self) -> &'static PyObject {
        // Assume the caller already holds the GIL.
        let py = unsafe { Python::assume_gil_acquired() };

        SLAVE_MANAGER_ONCE_CELL.get_or_init(py, || {
            let cls = || -> PyResult<PyObject> {
                let ctx: pyo3::PyObject = py
                    .import("pyfmu.fmi2.slaveContext")
                    .expect("Unable to import module declaring slave manager. Ensure that PyFMU is installed inside your current environment.")
                    .get("Fmi2SlaveContext")?
                    .call0()?
                    .extract()?;
                println!("{:?}", ctx);
                Ok(ctx)
            };

            match cls() {
                Err(e) => {
                    e.print_and_set_sys_last_vars(py);
                    panic!("Unable to instantiate slave manager");
                }
                Ok(o) => o,
            }
        })
    }
}

One thing that is bothering me is that it seems that multiple threads are accessing the python code, despite having acquired the GIL. Should this be possible?

@davidhewitt
Member

One thing that is bothering me is that it seems that multiple threads are accessing the python code, despite having acquired the GIL. Should this be possible?

Mmm, the use of Python::assume_gil_acquired is usually not a good idea unless you're really sure that the callers will always have the GIL acquired.

Also, it's not unexpected that even with the GIL acquired properly, multiple threads might start the initialization, because py.import temporarily releases the GIL, which can allow another thread to proceed. Once one thread finishes initializing the once cell, all threads will return that same value.

I suggest modifying your code to be this:

static SLAVE_MANAGER: OnceCell<PyObject> = OnceCell::new();

// Callers pass in the GIL token explicitly instead of assuming it is held.
pub fn get_slave_manager(py: Python) -> &PyAny {
    SLAVE_MANAGER
        .get_or_init(py, || {
            let cls = || -> PyResult<PyObject> {
                let ctx: pyo3::PyObject = py
                    .import("pyfmu.fmi2.slaveContext")
                    .expect("Unable to import module declaring slave manager. Ensure that PyFMU is installed inside your current environment.")
                    .get("Fmi2SlaveContext")?
                    .call0()?
                    .extract()?;
                println!("{:?}", ctx);
                Ok(ctx)
            };

            cls().unwrap_or_else(|e| {
                e.print_and_set_sys_last_vars(py);
                panic!("Unable to instantiate slave manager");
            })
        })
        .as_ref(py)
}

If you prefer &'static PyObject, you could drop the .as_ref() on the last line.

@clegaard
Author

Thanks for the feedback, I really appreciate it :)

Also, it's not unexpected that even with the GIL acquired properly, multiple threads might start the initialization, because py.import temporarily releases the GIL, which can allow another thread to proceed. Once one thread finishes initializing the once cell, all threads will return that same value.

I guess I overlooked the fact that the GIL may be released by the Python code being executed.
So acquiring the GIL is not sufficient to ensure that a function is not preempted, like below?

fn foo() {
    let gil = Python::acquire_gil();
    let py = gil.python();
    get_slave_manager(py).call_method0("function_which_should_not_be_interrupted");
}

fn bar() {
    let gil = Python::acquire_gil();
    let py = gil.python();
    get_slave_manager(py).call_method0("function_which_should_not_be_interrupted");
}

So the options would be rewriting the Python code being called to ensure that it does not release the GIL at an inopportune time, or using a mutex in Rust?

@davidhewitt
Member

acquiring the GIL is not sufficient to ensure that a function is not preempted

I'm not sure of the exact mechanics of which bytecodes the Python interpreter may choose to switch threads on.

So the options would be rewriting the Python code being called to ensure that it does not release the GIL at an inopportune time, or using a mutex in Rust?

It depends exactly why you don't want the interpreter to switch threads. If you need to guarantee only one thread runs the whole critical section, you should use a Mutex, yes. If you just need to guarantee ordering, a channel might be enough?
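
For the Mutex route, a minimal sketch (the lock name and ordering here are assumptions, not code from this thread; the key point is to take the Rust-side lock before acquiring the GIL, so a thread that grabs the GIL while Python releases it mid-call still cannot enter the critical section):

use once_cell::sync::Lazy;
use pyo3::prelude::*;
use std::sync::Mutex;

// Guards the critical section across threads, independently of the GIL.
static CRITICAL_SECTION: Lazy<Mutex<()>> = Lazy::new(|| Mutex::new(()));

fn foo() {
    // Take the Rust lock first, then the GIL. Taking them in the opposite order
    // can deadlock if the Python call releases the GIL while the lock is held.
    let _guard = CRITICAL_SECTION.lock().unwrap();
    let gil = Python::acquire_gil();
    let py = gil.python();
    get_slave_manager(py)
        .call_method0("function_which_should_not_be_interrupted")
        .unwrap();
}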

@clegaard
Author

Rewriting the Python code such that the GIL is not released in a critical section solved the problem.
I think the OnceCell combined with a getter function provides a good solution.

Again thank you for all the help you have provided.
