
Fix bug in example causing config.py to crash on computers with no CUDA-enabled GPUs #55

Merged
merged 1 commit into NVIDIA:main on Jul 11, 2023

Conversation

serhatgktp
Contributor

@serhatgktp serhatgktp commented Jul 3, 2023

When running NeMo Guardrails with the example configuration found in examples/llm/hf_pipeline_dolly, the following error occurs if the host machine does not have any CUDA-enabled GPUs:

    /Users/username/anaconda3/lib/python3.10/site-packages/langchain/llms/huggingface_pipeline.py:106 in from_model_id

        103
        104         cuda_device_count = torch.cuda.device_count()
        105         if device < -1 or (device >= cuda_device_count):
      ❱ 106             raise ValueError(
        107                 f"Got device=={device}, "
        108                 f"device is required to be within [-1, {cuda_device_count})"
        109             )

    ValueError: Got device==0, device is required to be within [-1, 0)

This happens because the following lines in config.py attempt to select the "first" CUDA-enabled GPU on the host machine, when in reality there aren't any:

    # Using the first GPU
    device = 0

    llm = HuggingFacePipeline.from_model_id(
        model_id=repo_id, device=device, task="text-generation", model_kwargs=params
    )

However, the script runs fine if we simply don't specify which GPU the initializer should use, in which case it falls back to the default value (device=-1, i.e. CPU). Thus, we can modify the example slightly so that it runs without crashing regardless of whether the host machine has a CUDA device:

    device = 0 if torch.cuda.device_count() else -1

The caveat is that the example looks slightly more complex, but I think this is a better approach than unconditionally setting device = -1 without checking for CUDA devices, since the model runs significantly slower on CPU when a GPU is actually available.
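For reference, the relevant section of config.py would then read roughly as follows. The repo_id and params values below are placeholders standing in for the ones defined earlier in the example:

    import torch
    from langchain.llms import HuggingFacePipeline

    # Placeholder values; the actual example defines its own repo_id and params
    repo_id = "databricks/dolly-v2-3b"
    params = {"temperature": 0.1, "max_length": 1024}

    # Use the first GPU if one is available, otherwise fall back to CPU (-1)
    device = 0 if torch.cuda.device_count() else -1

    llm = HuggingFacePipeline.from_model_id(
        model_id=repo_id, device=device, task="text-generation", model_kwargs=params
    )

This keeps the fast path when CUDA is available while degrading gracefully to CPU on machines without it.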

@serhatgktp serhatgktp changed the title Fix bug causing config.py to crash on computers with no CUDA-enabled GPUs Fix bug in example causing config.py to crash on computers with no CUDA-enabled GPUs Jul 5, 2023
@drazvan
Collaborator

drazvan commented Jul 11, 2023

Thanks for this @serhatgktp! Can you make sure the commit is signed (same as #51)? Feel free to force-push and then I can merge.

@drazvan drazvan self-assigned this Jul 11, 2023
@serhatgktp
Contributor Author

Hi @drazvan, the commit looks to be already signed on my end. Could you please check and let me know if you'd still like me to overwrite the commit?

Thanks!

[screenshot: the commit showing a GPG signature on the author's machine]

@drazvan
Collaborator

drazvan commented Jul 11, 2023

Your previous commit also had the GPG signature and showed as "Verified".

[screenshot: the previous commit with its GPG signature]

Notice the "Verified" badge to the right of the commit:

[screenshot: commit list showing the "Verified" badge]

@serhatgktp
Contributor Author

Today I learned. I'll take care of this and drop a ping - thanks!

@serhatgktp serhatgktp reopened this Jul 11, 2023
@drazvan drazvan merged commit 237c067 into NVIDIA:main Jul 11, 2023
3 checks passed