-
Notifications
You must be signed in to change notification settings - Fork 105
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
86 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
## Installation | ||
|
||
The proof-of-concept works best on Linux x86-64. It also builds on macOS, Linux ARM64, and Windows (64-bit), but you will have to recompile extension modules yourself for these platforms. | ||
|
||
- [Install with pyenv](#install-with-pyenv) | ||
- [Run with Docker](#docker) | ||
- [Build from source](#build-from-source) | ||
|
||
## Install with pyenv | ||
|
||
First you need to install pyenv [on Linux](https://github.com/colesbury/nogil/wiki/Install-nogil-with-pyenv-on-Linux) or [on macOS](https://github.com/colesbury/nogil/wiki/Install-nogil-with-pyenv-on-macOS) . To install the "nogil-3.9.10-1" version: | ||
|
||
pyenv install nogil-3.9.10-1 | ||
|
||
You can make the nogil installation the default Python installation with: | ||
|
||
pyenv global nogil-3.9.10-1 | ||
|
||
## Docker | ||
|
||
A pre-built Docker image [nogil/python](https://hub.docker.com/r/nogil/python) is available on Docker Hub. For CUDA support, use [nogil/python-cuda](https://hub.docker.com/r/nogil/python-cuda). | ||
For arm64 devices, use [nogil/python-arm64](https://hub.docker.com/r/nogil/python-arm64). | ||
|
||
For example: | ||
|
||
docker run -it nogil/python | ||
|
||
## Build from source | ||
|
||
The build process has not changed from upstream CPython. See https://devguide.python.org/ for instructions on how to build from source, or follow the steps below. | ||
|
||
Install: | ||
|
||
./configure [--prefix=PREFIX] [--enable-optimizations] | ||
make -j | ||
make install | ||
|
||
The optional ``--prefix=PREFIX`` specifies the destination directory for the Python installation. The optional ``--enable-optimizations`` enables profile guided optimizations (PGO). This slows down the build process, but makes the compiled Python a bit faster. | ||
|
||
## Packages | ||
|
||
Use ``pip install <package>`` as usual to install packages. Please file an issue if you are unable to install a pip package you would like to use. | ||
|
||
The proof-of-concept comes with a modified bundled "pip" that includes an [alternative package index](https://d1yxz45j0ypngg.cloudfront.net/). The alternative package index includes C extensions that are either slow to build from source or require some modifications for compatibility. | ||
|
||
|
||
## GIL control | ||
|
||
The GIL is disabled by default, but if you wish, you can enable it at runtime using the environment variable ``PYTHONGIL=1``. You can check if the GIL is disabled from Python by accessing ``sys.flags.nogil``: | ||
|
||
python3 -c "import sys; print(sys.flags.nogil)" # True | ||
PYTHONGIL=1 python3 -c "import sys; print(sys.flags.nogil)" # False | ||
|
||
## Example | ||
|
||
You can use the existing Python APIs, such as the [threading](https://docs.python.org/3/library/threading.html) module and the [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) class. | ||
|
||
Here is an example based on Larry Hastings's [Gilectomy benchmark](https://github.com/larryhastings/gilectomy/blob/gilectomy/x.py): | ||
|
||
```python | ||
import sys | ||
from concurrent.futures import ThreadPoolExecutor | ||
|
||
print(f"nogil={getattr(sys.flags, 'nogil', False)}") | ||
|
||
def fib(n): | ||
if n < 2: return 1 | ||
return fib(n-1) + fib(n-2) | ||
|
||
threads = 8 | ||
if len(sys.argv) > 1: | ||
threads = int(sys.argv[1]) | ||
|
||
with ThreadPoolExecutor(max_workers=threads) as executor: | ||
for _ in range(threads): | ||
executor.submit(lambda: print(fib(34))) | ||
``` | ||
|
||
Run it with, e.g.: | ||
|
||
time python3 fib.py 1 # 1 thread, 1x work | ||
time python3 fib.py 20 # 20 threads, 20x work | ||
|
||
The program parallelizes well up to the number of available cores. On a 20 core Intel Xeon E5-2698 v4 one thread takes 1.50 seconds and 20 threads take 1.52 seconds [1]. | ||
|
||
[1] Turbo boost was [disabled](https://askubuntu.com/questions/619875/disabling-intel-turbo-boost-in-ubuntu) to measure the scaling of the program without the effects of CPU frequency scaling. Additionally, you may get more reliable measurements by using [taskset](https://man7.org/linux/man-pages/man1/taskset.1.html) to avoid virtual "hyperthreading" cores. |