
OpenVLA fails to start on NVIDIA AGX Orin #874

Open
1 task done
amirtaherin opened this issue Feb 24, 2025 · 0 comments
Labels
question Further information is requested

Comments

@amirtaherin

Search before asking

  • I have searched the jetson-containers issues and found no similar feature requests.

Question

I am working on https://www.jetson-ai-lab.com/openvla.html#__tabbed_1_3 and tried to run it on my NVIDIA AGX Orin board.
I am facing an error that I have not been able to resolve. I found a previous issue, #634, that reports a similar error, but the solution there did not work for me. I would appreciate help resolving this.

My log is:
jetson-containers run $(autotag nano_llm)
python3 -m nano_llm.vision.vla --api hf \
    --model openvla/openvla-7b \
    --dataset dusty-nv/bridge_orig_ep100 \
    --dataset-type rlds \
    --max-episodes 10 \
    --save-stats /data/benchmarks/openvla_bridge_fp16.json
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.3 JETPACK_VERSION=6.2 CUDA_VERSION=12.6
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.4.0
V4L2_DEVICES:

DISPLAY environmental variable is already set: "localhost:10.0"

localuser:root being added to access control list
xhost: must be on local machine to add or remove hosts.

  • docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/amir/projects/openVLA/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=localhost:10.0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 --name jetson_container_20250223_210323 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.vision.vla --api hf --model openvla/openvla-7b --dataset dusty-nv/bridge_orig_ep100 --dataset-type rlds --max-episodes 10 --save-stats /data/benchmarks/openvla_bridge_fp16.json
    /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
    warnings.warn(
    /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
    warnings.warn(
    Fetching 6 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 38716.65it/s]
    21:03:37 | INFO | Load dataset info from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
    21:03:37 | INFO | Creating a tf.data.Dataset reading 1 files located in folders: /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0.
    21:03:37 | INFO | Constructing tf.data.Dataset bridge_orig_ep100 for split train[:11], from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
    21:03:37 | SUCCESS | TFDSDataset | loaded bridge_orig_ep100 from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04 (records=11)
    2025-02-23 21:03:37.565945: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
    2025-02-23 21:03:37.695238: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
    21:03:37 | SUCCESS | RLDSDataset | loaded bridge_orig_ep100 - episode format:
    { 'action': [7],
      'cameras': ['image'],
      'image_size': (224, 224, 3),
      'observation': { 'image': ((224, 224, 3), dtype('uint8')),
                       'state': ((7,), dtype('float32'))},
      'step': [ 'action',
                'is_first',
                'is_last',
                'language_instruction',
                'observation']}
    Fetching 15 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 11503.85it/s]
    Fetching 18 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 37635.83it/s]
    21:03:37 | INFO | loading /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0 with HF
    21:03:37 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
    Traceback (most recent call last):
      File "/opt/NanoLLM/nano_llm/nano_llm.py", line 337, in __init__
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, use_fast=True, trust_remote_code=True)
      File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 773, in from_pretrained
        config = AutoConfig.from_pretrained(
      File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1100, in from_pretrained
        config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 634, in get_config_dict
        config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
        resolved_config_file = cached_file(
      File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 356, in cached_file
        raise EnvironmentError(
    OSError: /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm/None' for available files.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 446, in <module>
    vla_process_dataset(**{**vars(args), 'dataset': dataset})
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 296, in vla_process_dataset
    model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 94, in from_pretrained
    model = HFModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/hf.py", line 27, in __init__
    super(HFModel, self).__init__(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 339, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, use_fast=False, trust_remote_code=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 773, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1100, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 634, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 356, in cached_file
    raise EnvironmentError(
OSError: /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm/None' for available files.
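The OSError indicates the loader is looking for an llm/ subdirectory inside the downloaded snapshot and not finding a config.json there. A quick way to confirm what the loader is complaining about is a sketch like the one below; the snapshot path is copied verbatim from the log above and may differ on another machine:

```shell
# Sketch: check whether the snapshot actually contains an llm/ subdirectory
# with a config.json, which is what the traceback says is missing.
# The hash below is taken from the log above; adjust to your own snapshot.
SNAP=/data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0

if [ -f "$SNAP/llm/config.json" ]; then
    echo "llm/config.json present"
else
    echo "llm/config.json missing"
fi
```

If the file is missing, that points to an incomplete or partially cached download rather than a code bug, in which case deleting the snapshot directory and letting it re-download is a common first step.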

Additional

No response

@amirtaherin amirtaherin added the question Further information is requested label Feb 24, 2025
1 participant