1. HF -> nemo

Convert Qwen2.5-7B to `Qwen2.5-7B.nemo`. After `tar xvf`, the contents of `Qwen2.5-7B.nemo` are as follows:

```
root@inp16075348439349544319-3-1:/mnt/tenant-home_speed/nemo_models# tar xvf Qwen2.5-7B.nemo
./
./model_config.yaml
./model_weights/
./model_weights/.metadata
./model_weights/__0_0.distcp
./model_weights/__0_1.distcp
./model_weights/common.pt
./model_weights/metadata.json
```
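(For reference, step 1 was done with NeMo's HF-to-NeMo checkpoint converter. A minimal sketch of the invocation is below; the script path and flag names are assumptions based on the converters shipped under `scripts/checkpoint_converters/` in recent NeMo releases, so adjust them to your install.)

```bash
# Sketch of the HF -> .nemo conversion step. The script location and input
# path are assumptions; check scripts/checkpoint_converters/ in your NeMo tree.
python3 /opt/NeMo/scripts/checkpoint_converters/convert_qwen2_hf_to_nemo.py \
    --input_name_or_path /path/to/Qwen2.5-7B \
    --output_path /mnt/tenant-home_speed/nemo_models/Qwen2.5-7B.nemo
```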
2. Inference using Qwen2.5-7B.nemo

Then I tried to run inference with `Qwen2.5-7B.nemo`:

```
python3 megatron_gpt_eval.py \
    gpt_model_file=Qwen2.5-7B.nemo \
    inference.greedy=True \
    inference.add_BOS=True \
    trainer.devices=1 \
    trainer.num_nodes=1 \
    tensor_model_parallel_size=1 \
    pipeline_model_parallel_size=1 \
    prompts='["who are you?", "What is the captial of China?"]'
```
The response seems wrong. Part of the output:
```
[NeMo I 2024-12-04 21:57:07 nlp_overrides:1386] Model MegatronGPTModel was successfully restored from Qwen2.5-7B.nemo.
prompt=========:['who are you?', 'What is the captial of China?']
setting number of microbatches to constant 1
***************************
{'sentences': ['who are you?1000000000000000000000000000000000', 'What is the captial of China?100000000000000000000000000000'], 'tokens': [['<|im_start|>', 'who', 'Ġare', 'Ġyou', '?', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0'], ['<|im_start|>', 'What', 'Ġis', 'Ġthe', 'Ġcapt', 'ial', 'Ġof', 'ĠChina', '?', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']], 'logprob': None, 'full_logprob': None, 'token_ids': [[151644, 14623, 525, 498, 30, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15], [151644, 3838, 374, 279, 6427, 530, 315, 5616, 30, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]], 'offsets': [[0, 0, 3, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45], [0, 0, 4, 7, 11, 16, 19, 22, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]]}
***************************
[NeMo I 2024-12-04 21:57:14 megatron_gpt_model:1717] Pipeline model parallel rank: 0, Tensor model parallel rank: 0, Number of model parameters on device: 7.62e+09. Number of precise model parameters on device: 7615616512.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]setting number of microbatches to constant 1
Predicting DataLoader 0: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 0.51it/s]
***************************
[{'sentences': ['who are you?1000000000000000000000000000000000', 'What is the captial of China?100000000000000000000000000000'], 'tokens': [['<|im_start|>', 'who', 'Ġare', 'Ġyou', '?', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0'], ['<|im_start|>', 'What', 'Ġis', 'Ġthe', 'Ġcapt', 'ial', 'Ġof', 'ĠChina', '?', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']], 'logprob': None, 'full_logprob': None, 'token_ids': [[151644, 14623, 525, 498, 30, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15], [151644, 3838, 374, 279, 6427, 530, 315, 5616, 30, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]], 'offsets': [[0, 0, 3, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45], [0, 0, 4, 7, 11, 16, 19, 22, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]]}]
***************************
```
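One detail the token dump makes visible: with `inference.add_BOS=True`, every generation starts from `<|im_start|>` (token id 151644), a chat-template token that the base Qwen2.5 model was presumably never trained to see at position 0. A quick way to check this locally is sketched below; the HF model name is an assumption, so point it at whatever checkpoint you converted.

```python
# Sketch: check whether the Qwen2.5 tokenizer actually defines a BOS token.
# The log above shows '<|im_start|>' (id 151644) prepended to both prompts,
# which is what inference.add_BOS=True injects here.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")  # assumed model name
print(tok.bos_token)                        # Qwen2.5 base: expected None
print(tok.convert_ids_to_tokens([151644]))  # ['<|im_start|>'], matching the log
```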
same problem with llama3.1
same problem here.
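For anyone hitting this across models, a quick way to tell whether the conversion or the eval flags are at fault is to run the same prompts through the original HF checkpoint with greedy decoding and compare. A minimal sketch follows; the model name, dtype, and `max_new_tokens` are illustrative, not taken from the issue.

```python
# Sketch: HF baseline with greedy decoding (do_sample=False), mirroring
# inference.greedy=True above. If this output is sane, suspect the .nemo
# conversion or the eval flags (e.g. add_BOS) rather than the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-7B"  # assumed; use the checkpoint you converted
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto"
)
for prompt in ["who are you?", "What is the captial of China?"]:
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
```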