Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

virtualenv.cli_run is not thread-safe #2516

Closed
radoering opened this issue Mar 12, 2023 · 0 comments · Fixed by #2517
Closed

virtualenv.cli_run is not thread-safe #2516

radoering opened this issue Mar 12, 2023 · 0 comments · Fixed by #2517
Assignees
Labels

Comments

@radoering
Copy link
Contributor

Issue

I noticed that creating multiple virtual environments in parallel via virtualenv.cli_run() is not thread-safe. When running the following code snippet (see details), you can experience a FileNotFoundError for a json file at pypa/virtualenv/py_info/1 if the underlying interpreter has been updated so that the json file is removed and written again.

Environment

Provide at least:

  • OS: Windows 10

  • pip list of the host python where virtualenv is installed:

    packages
    Package                       Version
    ----------------------------- -------------------------------
    alabaster                     0.7.13
    attrs                         22.2.0
    Babel                         2.12.1
    beautifulsoup4                4.11.2
    certifi                       2022.12.7
    charset-normalizer            3.1.0
    click                         8.1.3
    click-default-group           1.2.2
    colorama                      0.4.6
    covdefaults                   2.3.0
    coverage                      7.2.1
    coverage-enable-subprocess    1.0
    distlib                       0.3.6
    docutils                      0.19
    exceptiongroup                1.1.0
    filelock                      3.9.0
    flaky                         3.7.0
    freezegun                     1.2.2
    furo                          2022.12.7
    future                        0.18.3
    idna                          3.4
    imagesize                     1.4.1
    importlib-metadata            6.0.0
    incremental                   22.10.0
    iniconfig                     2.0.0
    Jinja2                        3.1.2
    MarkupSafe                    2.1.2
    packaging                     23.0
    pip                           23.0.1
    platformdirs                  3.1.1
    pluggy                        1.0.0
    proselint                     0.13.0
    Pygments                      2.14.0
    pytest                        7.2.2
    pytest-env                    0.8.1
    pytest-freezegun              0.4.2
    pytest-mock                   3.10.0
    pytest-randomly               3.12.0
    pytest-timeout                2.1.0
    python-dateutil               2.8.2
    requests                      2.28.2
    setuptools                    67.4.0
    six                           1.16.0
    snowballstemmer               2.2.0
    soupsieve                     2.4
    Sphinx                        6.1.3
    sphinx-argparse               0.4.0
    sphinx-basic-ng               1.0.0b1
    sphinxcontrib-applehelp       1.0.4
    sphinxcontrib-devhelp         1.0.2
    sphinxcontrib-htmlhelp        2.0.1
    sphinxcontrib-jsmath          1.0.1
    sphinxcontrib-qthelp          1.0.3
    sphinxcontrib-serializinghtml 1.1.5
    sphinxcontrib-towncrier       0.3.2a0
    tomli                         2.0.1
    towncrier                     22.12.0
    urllib3                       1.26.15
    virtualenv                    20.20.1.dev3+g0398e5b.d20230312
    wheel                         0.38.4
    zipp                          3.15.0

Details

test script
import concurrent.futures
import shutil
import tempfile
import traceback
from pathlib import Path

import virtualenv


def create_venv():
    with tempfile.TemporaryDirectory() as tmp_dir:
        virtualenv.cli_run(["--python", str(env_exe), tmp_dir])


base_venv = Path("base_venv")
if base_venv.exists():
    shutil.rmtree(base_venv)
virtualenv.cli_run([str(base_venv)])
env_exe = base_venv / "Scripts" / "python.exe"

with concurrent.futures.ThreadPoolExecutor() as executor:
    tasks = []
    for _ in range(4):
        tasks.append(executor.submit(create_venv))
    concurrent.futures.wait(tasks)
    for task in tasks:
        try:
            task.result()
        except Exception:
            traceback.print_exc()

The failure is at

which is called by

It made me wonder that two threads can be in this section because it is protected by a lock at

with py_info_store.locked():

After some digging, I found an interesting statement considering filelock in tox-dev/filelock#22 (comment):

Since each filelock instance defines a reentrant lock, two different instances (whether they are in the same process/thread or not) will block during locking when the other instance already holds the lock. But locking the same instance twice while not block.

As you may have already guessed, the same instance is used for each thread in virtualenv due to the _lock_store.

I also verified the cited statement:

One `FileLock` per thread (thread-safe):
import concurrent.futures
import threading
import time
import traceback

from filelock import FileLock


def acquire_lock():
    lock = FileLock("_lock")
    thread_id = threading.current_thread().ident
    with lock:
        print(f'Thread {thread_id} acquired {lock.lock_file}')
        time.sleep(1)
    print(f'Thread {thread_id} released {lock.lock_file}')


with concurrent.futures.ThreadPoolExecutor() as executor:
    tasks = []
    for _ in range(2):
        tasks.append(executor.submit(acquire_lock))
    concurrent.futures.wait(tasks)
    for task in tasks:
        try:
            task.result()
        except Exception:
            traceback.print_exc()

Output (one thread per time in critical section):

Thread 17520 acquired _lock
Thread 17520 released _lock
Thread 20224 acquired _lock
Thread 20224 released _lock
One `FileLock` for all threads (not thread-safe):
import concurrent.futures
import threading
import time
import traceback

from filelock import FileLock


def acquire_lock(lock):
    thread_id = threading.current_thread().ident
    with lock:
        print(f'Thread {thread_id} acquired {lock.lock_file}')
        time.sleep(1)
    print(f'Thread {thread_id} released {lock.lock_file}')


with concurrent.futures.ThreadPoolExecutor() as executor:
    tasks = []
    lock = FileLock("_lock")
    for _ in range(2):
        tasks.append(executor.submit(acquire_lock, lock))
    concurrent.futures.wait(tasks)
    for task in tasks:
        try:
            task.result()
        except Exception:
            traceback.print_exc()

Output (both threads at the same time in critical section):

Thread 20496 acquired _lock
Thread 14296 acquired _lock
Thread 14296 released _lock
Thread 20496 released _lock
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
1 participant