
OpenVLA fails to start on NVIDIA AGX Orin #874

Open
1 task done
amirtaherin opened this issue Feb 24, 2025 · 0 comments
Labels
question Further information is requested

Comments

@amirtaherin

Search before asking

  • I have searched the jetson-containers issues and found no similar feature requests.

Question

I am working on https://www.jetson-ai-lab.com/openvla.html#__tabbed_1_3 and tried to run it on my NVIDIA AGX Orin board.
I am facing an error that I have not been able to resolve. I found a previous issue, #634, that reports a similar error, but the solution there did not work for me. I would appreciate help resolving this.

My log is:
jetson-containers run $(autotag nano_llm)
python3 -m nano_llm.vision.vla --api hf \
    --model openvla/openvla-7b \
    --dataset dusty-nv/bridge_orig_ep100 \
    --dataset-type rlds \
    --max-episodes 10 \
    --save-stats /data/benchmarks/openvla_bridge_fp16.json
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.3 JETPACK_VERSION=6.2 CUDA_VERSION=12.6
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.4.0
V4L2_DEVICES:

DISPLAY environmental variable is already set: "localhost:10.0"

localuser:root being added to access control list
xhost: must be on local machine to add or remove hosts.

  • docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/amir/projects/openVLA/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=localhost:10.0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 --name jetson_container_20250223_210323 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.vision.vla --api hf --model openvla/openvla-7b --dataset dusty-nv/bridge_orig_ep100 --dataset-type rlds --max-episodes 10 --save-stats /data/benchmarks/openvla_bridge_fp16.json
    /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
    warnings.warn(
    /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
    warnings.warn(
    Fetching 6 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 38716.65it/s]
    21:03:37 | INFO | Load dataset info from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
    21:03:37 | INFO | Creating a tf.data.Dataset reading 1 files located in folders: /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0.
    21:03:37 | INFO | Constructing tf.data.Dataset bridge_orig_ep100 for split train[:11], from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
    21:03:37 | SUCCESS | TFDSDataset | loaded bridge_orig_ep100 from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04 (records=11)
    2025-02-23 21:03:37.565945: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
    2025-02-23 21:03:37.695238: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
    21:03:37 | SUCCESS | RLDSDataset | loaded bridge_orig_ep100 - episode format:
    { 'action': [7],
      'cameras': ['image'],
      'image_size': (224, 224, 3),
      'observation': { 'image': ((224, 224, 3), dtype('uint8')),
                       'state': ((7,), dtype('float32'))},
      'step': [ 'action',
                'is_first',
                'is_last',
                'language_instruction',
                'observation']}
    Fetching 15 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 11503.85it/s]
    Fetching 18 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 37635.83it/s]
    21:03:37 | INFO | loading /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0 with HF
    21:03:37 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
    Traceback (most recent call last):
      File "/opt/NanoLLM/nano_llm/nano_llm.py", line 337, in __init__
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, use_fast=True, trust_remote_code=True)
      File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 773, in from_pretrained
        config = AutoConfig.from_pretrained(
      File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1100, in from_pretrained
        config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 634, in get_config_dict
        config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
        resolved_config_file = cached_file(
      File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 356, in cached_file
        raise EnvironmentError(
    OSError: /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm/None' for available files.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 446, in <module>
    vla_process_dataset(**{**vars(args), 'dataset': dataset})
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 296, in vla_process_dataset
    model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 94, in from_pretrained
    model = HFModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/hf.py", line 27, in __init__
    super(HFModel, self).__init__(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 339, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, use_fast=False, trust_remote_code=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 773, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1100, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 634, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 356, in cached_file
    raise EnvironmentError(
OSError: /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm/None' for available files.
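The OSError indicates the loader is looking for an llm/ subdirectory inside the downloaded snapshot and not finding a config.json there. A quick way to confirm what the loader is complaining about is a sketch like the one below; the snapshot path is copied verbatim from the log above and may differ on another machine:

```shell
# Sketch: check whether the snapshot actually contains an llm/ subdirectory
# with a config.json, which is what the traceback says is missing.
# The hash below is taken from the log above; adjust to your own snapshot.
SNAP=/data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0

if [ -f "$SNAP/llm/config.json" ]; then
    echo "llm/config.json present"
else
    echo "llm/config.json missing"
fi
```

If the file is missing, that points to an incomplete or partially cached download rather than a code bug, in which case deleting the snapshot directory and letting it re-download is a common first step.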

Additional

No response

@amirtaherin amirtaherin added the question Further information is requested label Feb 24, 2025
1 participant