
OpenVLA does not work on Jetson AGX orin #634

Closed
nirty-fai opened this issue Sep 17, 2024 · 3 comments

@nirty-fai

I am trying to follow the tutorial here - https://www.jetson-ai-lab.com/openvla.html - and I get the error shown below.

I am able to run the nano_llm demo here - https://www.jetson-ai-lab.com/tutorial_nano-llm.html - and I have access to the Llama models (2 and 3) on Hugging Face.

jetson-containers run $(autotag nano_llm)   python3 -m nano_llm.vision.vla --api mlc     --model openvla/openvla-7b     --quantization q4f16_ft     --dataset dusty-nv/bridge_orig_ep100     --dataset-type rlds     --max-episodes 10     --save-stats /data/benchmarks/openvla_bridge_int4.json
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0  JETPACK_VERSION=6.0  CUDA_VERSION=12.2
-- Finding compatible container image for ['nano_llm']
[sudo] password for <user>: 
dustynv/nano_llm:r36.3.0
localuser:root being added to access control list
+ sudo docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/<user>/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 dustynv/nano_llm:r36.3.0 python3 -m nano_llm.vision.vla --api mlc --model openvla/openvla-7b --quantization q4f16_ft --dataset dusty-nv/bridge_orig_ep100 --dataset-type rlds --max-episodes 10 --save-stats /data/benchmarks/openvla_bridge_int4.json
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Fetching 6 files:   0%|                                                                                                                                                                                                                                                                                                                                       | 0/6 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Fetching 6 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 6772.29it/s]
23:07:21 | INFO | Load dataset info from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
23:07:21 | INFO | Creating a tf.data.Dataset reading 1 files located in folders: /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0.
23:07:21 | INFO | Constructing tf.data.Dataset bridge_orig_ep100 for split train[:11], from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
23:07:21 | SUCCESS | TFDSDataset | loaded bridge_orig_ep100 from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04 (records=11)
2024-09-17 23:07:21.335497: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
2024-09-17 23:07:21.472187: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
23:07:21 | SUCCESS | RLDSDataset | loaded bridge_orig_ep100 - episode format:
{ 'action': [7],
  'cameras': ['image'],
  'image_size': (224, 224, 3),
  'observation': { 'image': ((224, 224, 3), dtype('uint8')),
                   'state': ((7,), dtype('float32'))},
  'step': [ 'action',
            'is_first',
            'is_last',
            'language_instruction',
            'observation']}
Fetching 15 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 8298.98it/s]
Fetching 18 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 20333.28it/s]
23:07:21 | INFO | loading /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0 with MLC
Traceback (most recent call last):
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 337, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, use_fast=True, trust_remote_code=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 854, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 976, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 373, in cached_file
    raise EnvironmentError(
OSError: /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm/tree/None' for available files.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 446, in <module>
    vla_process_dataset(**{**vars(args), 'dataset': dataset})
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 296, in vla_process_dataset
    model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 50, in __init__
    super(MLCModel, self).__init__(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 339, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, use_fast=False, trust_remote_code=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 854, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 976, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 373, in cached_file
    raise EnvironmentError(
OSError: /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm/tree/None' for available files.




@dusty-nv
Owner

Hi @nirty-fai ! Can you try deleting your /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm folder? And can you check that the model was fully downloaded? You can try running it first with --api=hf (unquantized).
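A minimal sketch of that cleanup step. The commands below simulate the cache layout in a temp dir so they are safe to dry-run; on the device you would target the real snapshot path from the log instead.

```shell
# Simulate the HF cache layout from the log in a temp dir (safe dry-run).
# On the Jetson, the real target is:
#   /data/models/huggingface/models--openvla--openvla-7b/snapshots/<hash>/llm
ROOT=$(mktemp -d)
SNAP="$ROOT/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0"
mkdir -p "$SNAP/llm"
touch "$SNAP/llm/stale.bin"   # stands in for a partial/stale extraction

rm -rf "$SNAP/llm"            # the actual fix: delete the llm folder
[ ! -d "$SNAP/llm" ] && echo "llm folder removed; re-run nano_llm to regenerate it"
```

On the next run, NanoLLM should re-create the llm/ subfolder from the downloaded snapshot instead of tripping over the stale one.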

@Gandinator123

> Hi @nirty-fai ! Can you try deleting your /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/llm folder? And can you check that the model got downloaded. You can try running it first with --api=hf (unquantized)

Deleting the folder worked for me! Thanks both for pointing this out.

@BigJohnn

BigJohnn commented Nov 27, 2024

Jetson Orin NX 16GB here - a similar problem, but with some differences ...

python3 -m nano_llm.vision.vla --api mlc     --model openvla/openvla-7b     --quantization q8f16_ft     --dataset dusty-nv/bridge_orig_ep100     --dataset-type rlds     --max-episodes 10     --save-stats /data/benchmarks/openvla_bridge_fp8.json
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
19:32:15 | INFO | Load dataset info from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
19:32:15 | INFO | Creating a tf.data.Dataset reading 1 files located in folders: /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0.
19:32:15 | INFO | Constructing tf.data.Dataset bridge_orig_ep100 for split train[:11], from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
19:32:15 | SUCCESS | TFDSDataset | loaded bridge_orig_ep100 from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04 (records=11)
2024-11-27 19:32:15.624473: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
2024-11-27 19:32:15.780591: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
19:32:15 | SUCCESS | RLDSDataset | loaded bridge_orig_ep100 - episode format:
{ 'action': [7],
  'cameras': ['image'],
  'image_size': (224, 224, 3),
  'observation': { 'image': ((224, 224, 3), dtype('uint8')),
                   'state': ((7,), dtype('float32'))},
  'step': [ 'action',
            'is_first',
            'is_last',
            'language_instruction',
            'observation']}


19:36:37 | INFO | loading /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0 with MLC
['/data/models/mlc/dist/openvla-7b/ctx4096/openvla-7b-q8f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/openvla-7b/ctx4096/openvla-7b-q8f16_ft/params/mlc-chat-config.json']
19:36:38 | INFO | running MLC quantization:

python3 -m mlc_llm.build --model /data/models/mlc/dist/models/openvla-7b --quantization q8f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist/openvla-7b/ctx4096 --use-safetensors 


Using path "/data/models/mlc/dist/models/openvla-7b" for model "openvla-7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
bin_idx_path==/data/models/mlc/dist/models/openvla-7b/model.safetensors.index.json
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 47, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 43, in main
    core.build_model_from_args(parsed_args)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/core.py", line 876, in build_model_from_args
    param_manager.init_torch_pname_to_bin_name(args.use_safetensors)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/relax_model/param_manager.py", line 292, in init_torch_pname_to_bin_name
    mapping = load_torch_pname2binname_map(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/relax_model/param_manager.py", line 984, in load_torch_pname2binname_map
    raise ValueError("Multiple weight shard files without json map is not supported")
ValueError: Multiple weight shard files without json map is not supported
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 446, in <module>
    vla_process_dataset(**{**vars(args), 'dataset': dataset})
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 296, in vla_process_dataset
    model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 276, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)  
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/openvla-7b --quantization q8f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist/openvla-7b/ctx4096 --use-safetensors ' returned non-zero exit status 1

As the error says, there is no matching .json file in /data/models/mlc/dist/models/openvla-7b, so I copied some over from the Hugging Face snapshot, like this:

cp -L /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0/*.json /data/models/mlc/dist/models/openvla-7b

But then a new error occurred:

python3 -m nano_llm.vision.vla --api mlc     --model openvla/openvla-7b     --quantization q8f16_ft     --dataset dusty-nv/bridge_orig_ep100     --dataset-type rlds     --max-episodes 10     --save-stats /data/benchmarks/openvla_bridge_fp8.json
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
20:04:49 | INFO | Load dataset info from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
20:04:49 | INFO | Creating a tf.data.Dataset reading 1 files located in folders: /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0.
20:04:49 | INFO | Constructing tf.data.Dataset bridge_orig_ep100 for split train[:11], from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
20:04:49 | SUCCESS | TFDSDataset | loaded bridge_orig_ep100 from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04 (records=11)
2024-11-27 20:04:49.533554: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
2024-11-27 20:04:49.737170: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
20:04:49 | SUCCESS | RLDSDataset | loaded bridge_orig_ep100 - episode format:
{ 'action': [7],
  'cameras': ['image'],
  'image_size': (224, 224, 3),
  'observation': { 'image': ((224, 224, 3), dtype('uint8')),
                   'state': ((7,), dtype('float32'))},
  'step': [ 'action',
            'is_first',
            'is_last',
            'language_instruction',
            'observation']}

20:09:10 | INFO | loading /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0 with MLC
['/data/models/mlc/dist/openvla-7b/ctx4096/openvla-7b-q8f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/openvla-7b/ctx4096/openvla-7b-q8f16_ft/params/mlc-chat-config.json']
20:09:12 | INFO | running MLC quantization:

python3 -m mlc_llm.build --model /data/models/mlc/dist/models/openvla-7b --quantization q8f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist/openvla-7b/ctx4096 --use-safetensors 


Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 47, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 41, in main
    parsed_args = core._parse_args(parsed_args)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/core.py", line 444, in _parse_args
    parsed = _setup_model_path(parsed)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/core.py", line 494, in _setup_model_path
    validate_config(args.model_path)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/core.py", line 538, in validate_config
    config["model_type"] in utils.supported_model_types
AssertionError: Model type openvla not supported.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 446, in <module>
    vla_process_dataset(**{**vars(args), 'dataset': dataset})
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 296, in vla_process_dataset
    model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 276, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)  
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/openvla-7b --quantization q8f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist/openvla-7b/ctx4096 --use-safetensors ' returned non-zero exit status 1.

Maybe I shouldn't have copied those files from /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0?
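The AssertionError above is mechanical: the config.json copied from the HF snapshot declares `"model_type": "openvla"`, and mlc_llm's builder only accepts model types on its supported list, so copying the top-level config can never pass validation. A hypothetical pre-check sketch (the `supported` tuple below is an illustrative subset, not mlc_llm's real list):

```python
import json
import os
import tempfile

def mlc_would_accept(config_path, supported=("llama", "mistral", "gpt_neox")):
    """Return True if config.json names a model_type the builder knows.

    'supported' is an illustrative subset for this sketch, not the
    actual list inside mlc_llm.
    """
    with open(config_path) as f:
        return json.load(f).get("model_type") in supported

# Demo: a config.json copied from the OpenVLA snapshot fails this check.
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "config.json")
with open(path, "w") as f:
    json.dump({"model_type": "openvla"}, f)
print(mlc_would_accept(path))  # → False
```

Running such a check before invoking `mlc_llm.build` would surface the incompatibility immediately instead of after the (slow) dataset-loading phase.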


When I tried running without MLC (--api hf instead), another error occurred:

python3 -m nano_llm.vision.vla --api hf \
    --model openvla/openvla-7b \
    --dataset dusty-nv/bridge_orig_ep100 \
    --dataset-type rlds \
    --max-episodes 10 \
    --save-stats /data/benchmarks/openvla_bridge_fp16.json
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
21:06:19 | INFO | Load dataset info from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
21:06:19 | INFO | Creating a tf.data.Dataset reading 1 files located in folders: /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0.
21:06:19 | INFO | Constructing tf.data.Dataset bridge_orig_ep100 for split train[:11], from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04/1.0.0
21:06:19 | SUCCESS | TFDSDataset | loaded bridge_orig_ep100 from /data/datasets/huggingface/datasets--dusty-nv--bridge_orig_ep100/snapshots/f2b661dd5d67de43b7368a4018e549a7d8893d04 (records=11)
2024-11-27 21:06:19.986484: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
2024-11-27 21:06:20.236710: W tensorflow/core/kernels/data/cache_dataset_ops.cc:913] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
21:06:20 | SUCCESS | RLDSDataset | loaded bridge_orig_ep100 - episode format:
{ 'action': [7],
  'cameras': ['image'],
  'image_size': (224, 224, 3),
  'observation': { 'image': ((224, 224, 3), dtype('uint8')),
                   'state': ((7,), dtype('float32'))},
  'step': [ 'action',
            'is_first',
            'is_last',
            'language_instruction',
            'observation']}
21:06:20 | INFO | loading /data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0 with HF
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 446, in <module>
    vla_process_dataset(**{**vars(args), 'dataset': dataset})
  File "/opt/NanoLLM/nano_llm/vision/vla.py", line 296, in vla_process_dataset
    model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 94, in from_pretrained
    model = HFModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/hf.py", line 47, in __init__
    self.model = AutoModelForCausalLM.from_pretrained(self.model_path,
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 567, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.llm.configuration_prismatic.OpenVLAConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, ElectraConfig, ErnieConfig, FalconConfig, FuyuConfig, GemmaConfig, Gemma2Config, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, JambaConfig, JetMoeConfig, LlamaConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig
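This last failure is that AutoModelForCausalLM has no mapping for OpenVLAConfig. OpenVLA's model card loads the checkpoint through AutoModelForVision2Seq with trust_remote_code=True instead; a hedged sketch of that direct Transformers route, outside NanoLLM (the helper function is hypothetical, and its imports are deferred so defining it does not itself require transformers):

```python
def load_openvla(model_id="openvla/openvla-7b"):
    """Sketch: load OpenVLA the way its model card suggests.

    AutoModelForVision2Seq (not AutoModelForCausalLM) dispatches on
    OpenVLAConfig when trust_remote_code=True. bfloat16 and
    low_cpu_mem_usage are the usual memory-saving knobs on Jetson-class
    hardware. Requires network access to huggingface.co on first run.
    """
    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
    )
    return processor, model
```

Whether NanoLLM's `--api hf` path can be pointed at this class is a separate question, but the error above is specifically about the AutoModel class chosen, not about the download.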
