How to load model and inference CViT #1035

Open
ramdhan1989 opened this issue Dec 4, 2024 · 9 comments

Comments

@ramdhan1989

ramdhan1989 commented Dec 4, 2024

Hi,
I am trying to run inference with CViT. After training, I only have files with the extensions pdopt, pdparams, and pdstates. How do I load these files, since the inference method seems to need a JSON file?
thanks

@HydrogenSulfate
Collaborator

HydrogenSulfate commented Dec 6, 2024

Let me briefly introduce the purpose of these suffixes:

  1. pdopt: Optimizer parameter file, stores optimizer.state_dict().
  2. pdparams: Model parameter file, stores model.state_dict().
  3. pdstates: Metric file, stores test metrics at a certain point in time, such as l2_rel, just a simple [str, float] dict.
  4. json and pdiparams: Inference model files (somewhat like .pb in TensorFlow), where the JSON file stores the computational graph structure, and the pdiparams file stores the weight tensors used by computations within the graph.
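
As a rough illustration of the list above, the sketch below checks which of these files exist next to a given path prefix and labels each one with its role. The helper names `describe_checkpoint_files` and `ready_for_inference` are hypothetical and not part of PaddleScience; the roles are simply copied from the list above, and it assumes all files share one prefix.

```python
from pathlib import Path

# Roles as described in the list above (suffix -> purpose).
SUFFIX_ROLES = {
    ".pdopt": "optimizer state (optimizer.state_dict())",
    ".pdparams": "model weights (model.state_dict())",
    ".pdstates": "metrics recorded at save time",
    ".json": "inference graph structure",
    ".pdiparams": "inference weight tensors",
}


def describe_checkpoint_files(prefix: str) -> dict:
    """Return {file name: role} for every file that exists at prefix + suffix."""
    found = {}
    for suffix, role in SUFFIX_ROLES.items():
        path = Path(prefix + suffix)
        if path.is_file():
            found[path.name] = role
    return found


def ready_for_inference(prefix: str) -> bool:
    """True only if both exported inference files are present."""
    return all(Path(prefix + s).is_file() for s in (".json", ".pdiparams"))
```

If `ready_for_inference` returns False for a prefix that only has the three training files, the model still needs to be exported before the inference engine can load it.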

So to run inference, you first need to export the CViT model to inference model files, i.e. .json and .pdiparams, which can be done with python ns_cvit.py mode=export. Then load these two files into Paddle Inference (Python) and run prediction with the Paddle Inference engine via python ns_cvit.py mode=infer.


https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/ns_cvit/#__tabbed_1_4

If you need to export a model you trained yourself, you can add extra CLI args:

# load your pretrained model and export to json+pdiparams
python ns_cvit.py mode=export INFER.pretrained_model_path=/path/to/your_model.pdparams

# do inference
python ns_cvit.py mode=infer

Note that we only provide the ns_cvit_small_8x8 pretrained model, so if you use another model config, please train it first and specify INFER.pretrained_model_path when exporting, as described above.

@ramdhan1989
Author

ramdhan1989 commented Dec 15, 2024

Hi,
I got the following error when exporting. Please advise.
"Error executing job with overrides: ['mode=export', 'INFER.pretrained_model_path=C:/Users/Wibawa/DS/Transformers_for_modeling_physical_systems/PaddleScience/examples/ns/ns_cvit_pretrained.pdparams']
Traceback (most recent call last):
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\examples\ns\ns_cvit.py", line 532, in main
export(cfg)
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\examples\ns\ns_cvit.py", line 373, in export
solver.export(
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\ppsci\utils\misc.py", line 542, in function_with_eval_state
result = func(self, *args, **kwargs)
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\ppsci\solver\solver.py", line 928, in export
raise e
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\ppsci\solver\solver.py", line 926, in export
jit.save(static_model, export_path, skip_prune_program=skip_prune_program)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\base\wrapped_decorator.py", line 40, in impl
return wrapped_func(*args, **kwargs)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\jit\api.py", line 895, in wrapper
func(layer, path, input_spec, **configs)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\base\wrapped_decorator.py", line 40, in impl
return wrapped_func(*args, **kwargs)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\base\dygraph\base.py", line 101, in impl
return func(*args, **kwargs)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\jit\api.py", line 1209, in save
static_func.concrete_program_specify_input_spec(
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\jit\dy2static\program_translator.py", line 1026, in concrete_program_specify_input_spec
concrete_program, _ = self.get_concrete_program(
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\jit\dy2static\program_translator.py", line 914, in get_concrete_program
concrete_program, partial_program_layer = self._program_cache[
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\jit\dy2static\program_translator.py", line 1665, in __getitem__
self._caches[item_id] = self._build_once(item)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\jit\dy2static\program_translator.py", line 1603, in _build_once
concrete_program = ConcreteProgram.pir_from_func_spec(
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\base\wrapped_decorator.py", line 40, in impl
return wrapped_func(*args, **kwargs)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\base\dygraph\base.py", line 101, in impl
return func(*args, **kwargs)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\jit\dy2static\program_translator.py", line 1276, in pir_from_func_spec
error_data.raise_new_exception()
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\paddle\jit\dy2static\error.py", line 454, in raise_new_exception
raise new_exception from None
RuntimeError: In transformed code:

File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\ppsci\arch\cvit.py", line 1093, in forward
    y = self.forward_tensor(x, coords)
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\ppsci\arch\cvit.py", line 1062, in forward_tensor
        coords = self.norm(coords)
    #print(coords.dtype)
    coords = einops.repeat(coords, "n d -> b n d", b=b)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

    # process input function(encoder)

File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\einops\einops.py", line 641, in repeat
    return reduce(tensor, pattern, reduction="repeat", **axes_lengths)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\einops\einops.py", line 518, in reduce
    backend = get_backend(tensor)
File "C:\Users\Wibawa\anaconda3\envs\py39\lib\site-packages\einops\_backends.py", line 59, in get_backend
    raise RuntimeError("Tensor type unknown to einops {}".format(type(tensor)))

RuntimeError: Tensor type unknown to einops <class 'paddle.base.libpaddle.pir.Value'>

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace."

@HydrogenSulfate
Collaborator


Check whether your paddlepaddle-gpu version is develop (nightly build) or 3.0.0-b2.
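
One way to check this without importing Paddle is to query the installed package metadata. This is a minimal sketch using only the standard library; the function name `installed_paddle_version` is hypothetical, and "paddlepaddle" is included as a second candidate on the assumption that a CPU build may be installed instead of paddlepaddle-gpu.

```python
from importlib import metadata
from typing import Optional, Sequence

# Candidate distribution names: "paddlepaddle-gpu" is the one named above,
# "paddlepaddle" is the CPU build (assumed as a fallback).
CANDIDATES = ("paddlepaddle-gpu", "paddlepaddle")


def installed_paddle_version(candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the version string of the first installed candidate, or None."""
    for name in candidates:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    print(installed_paddle_version() or "PaddlePaddle is not installed")
```

The reported string may appear in a normalized form (for example 3.0.0b2 rather than 3.0.0-b2), so compare it by eye rather than by exact string match.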

@ramdhan1989
Author

Hi,
My version is 3.0.0-b2. Please advise.

thank you

@HydrogenSulfate
Collaborator


I have reproduced the error you hit. To work around it:

  1. Add self.paddle.base.libpaddle.pir.Value to /path/to/site-packages/einops/_backends.py so that einops accepts this tensor type.
  2. Comment out all assert statements (8 in total) in ppsci/arch/cvit.py.

Then cvit_ns can be exported successfully.

We will fix these bugs in PaddleScience and einops soon.

@ramdhan1989
Author

ramdhan1989 commented Dec 16, 2024

Hi,
Thanks, it's working now. I am now using my own dataset and got the error below, which I believe comes from my dataloader. I would like to use a two-channel input (the second channel holds external parameters) and a one-channel output. What is the difference between pred_steps and rollout? Can I use one initial step to predict multiple steps ahead, or do I need several time steps as input? Can I set pred_steps > 1 in the config file? I got an error when setting it to a value greater than 1. Please advise.

`test data (1035, 10, 128, 128, 2) (1035, 4, 128, 128, 1)
[2024/12/15 23:51:10] ppsci INFO: Predicting batch 1/1
I1215 23:51:11.126005 33516 pir_interpreter.cc:1588] pir interpreter is running by trace mode ...
Error executing job with overrides: ['mode=infer']
Traceback (most recent call last):
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\examples\ns\ns_cvit.py", line 534, in main
inference(cfg)
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\examples\ns\ns_cvit.py", line 433, in inference
pred = rollout(
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\examples\ns\ns_cvit.py", line 420, in rollout
pred = pred.reshape(b, pred_steps, h, w, c)
ValueError: cannot reshape array of size 131072 into shape (8,1,128,128,2)

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.`

Thank you
regards

@HydrogenSulfate
Collaborator

pred_steps

  1. pred_steps is the number of time steps output by one prediction. For example, if your model is designed to predict [Tn, Tn+1] from [Tn-1, Tn-2, Tn-3], then pred_steps is 2 and prev_steps is 3. So you need to prepare prev_steps frames as input, and you will get the subsequent pred_steps frames as output.
  2. The rollout function runs inference repeatedly in a sliding-window fashion.
  3. 131072 equals 8x1x128x128x1, so check the number of input and output channels and add the necessary slice/concat operations during inference.
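
To make the relation between prev_steps, pred_steps and rollout concrete, here is a toy sliding-window loop. It is only a sketch of the idea and not the rollout function in ns_cvit.py: frames are plain floats instead of arrays, and `toy_predict` is a made-up stand-in for the model that returns pred_steps = 2 frames per call.

```python
from typing import Callable, List, Sequence

Frame = float  # stand-in for one time step; a real frame would be an array


def rollout(
    predict: Callable[[Sequence[Frame]], List[Frame]],
    history: Sequence[Frame],
    prev_steps: int,
    total_steps: int,
) -> List[Frame]:
    """Predict total_steps frames by repeatedly sliding the input window."""
    if len(history) < prev_steps:
        raise ValueError("need at least prev_steps frames of history")
    window = list(history[-prev_steps:])
    outputs: List[Frame] = []
    while len(outputs) < total_steps:
        preds = list(predict(window))  # one call yields pred_steps new frames
        if not preds:
            raise ValueError("predict returned no frames")
        outputs.extend(preds)
        # Slide the window: keep only the most recent prev_steps frames.
        window = (window + preds)[-prev_steps:]
    return outputs[:total_steps]


def toy_predict(window: Sequence[Frame]) -> List[Frame]:
    """Toy model with pred_steps = 2: continues the sequence in steps of 1."""
    last = window[-1]
    return [last + 1, last + 2]


print(rollout(toy_predict, [0.0, 1.0, 2.0], prev_steps=3, total_steps=5))
# → [3.0, 4.0, 5.0, 6.0, 7.0]
```

With real data, each predicted frame must have the same channel layout as the input frames before it is appended to the window, which is where the slice/concat step from item 3 would go.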

@ramdhan1989
Author

ramdhan1989 commented Dec 17, 2024

Hi, I am now running inference on my own dataset. The problem is that the input has 2 channels and the output has 1 channel. I have a problem in this part:
std = torch.tensor([torch.std(data[:,:,0]), torch.std(torch.tensor(visc))])
test data (1035, 10, 128, 128, 2) (1035, 4, 128, 128, 1)
[2024/12/17 12:53:16] ppsci INFO: Predicting batch 1/1
I1217 12:53:17.007297 31780 pir_interpreter.cc:1588] pir interpreter is running by trace mode ...
(8, 10, 128, 128, 2) (8, 1, 128, 128, 1) 10
[2024/12/17 12:53:17] ppsci INFO: Predicting batch 1/1
W1217 12:53:17.975394 31780 pir_interpreter.cc:1980] Instruction OP id: 19, Ir OP id: 3802, pd_op.conv3d raises an EnforceNotMet exception struct common::enforce::EnforceNotMet
Error executing job with overrides: ['mode=infer']
Traceback (most recent call last):
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\examples\ns\ns_cvit.py", line 537, in main
inference(cfg)
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\examples\ns\ns_cvit.py", line 436, in inference
pred = rollout(
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\examples\ns\ns_cvit.py", line 414, in rollout
pred = predictor.predict(input_dict, batch_size=None)
File "C:\Users\Wibawa\DS\Transformers_for_modeling_physical_systems\PaddleScience\deploy\python_infer\pinn_predictor.py", line 171, in predict
self.predictor.run()
ValueError: In user code:
InvalidArgumentError: The number of input's channels should be equal to filter's channels * groups for Op(Conv). But received: the input's channels is 1, the input's shape is [8, 10, 128, 128, 1]; the filter's channels is 2, the filter's shape is [768, 2, 1, 32, 32]; the groups is 1, the data_format is NDHWC. The error may come from wrong data_format setting.
[Hint: Expected input_channels == filter_channels * groups, but received input_channels:1 != filter_channels * groups:2.] (at C:\home\workspace\Paddle\paddle\phi\infermeta\binary.cc:667)
[operator < pd_kernel.phi_kernel > error]

I suspect the model somehow takes 2 channels when it is created. This line might be the reason:
predictor = paddle_inference.create_predictor(config)
In the config file in_dim is 2, so maybe the line above picks up this argument. For training, I used a trick to set up my problem, because I can change the ppsci file:
model = ppsci.arch.CVit(**cfg.MODEL)
Unfortunately, I can't find where paddle_inference.create_predictor is defined. Is there any trick I can use to solve this problem? My goal is just to get the results. Is it possible to do inference by adapting the code from the eval function (using ppsci)?

@HydrogenSulfate
Collaborator


You can do inference by adapting the eval code: just replace the solver with a predictor. See https://github.com/PaddlePaddle/PaddleScience/blob/develop/examples/aneurysm/aneurysm.py#L354
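
Whichever route is used, both shape errors reported in this thread can be checked with plain arithmetic before calling the predictor. The sketch below uses only the numbers from the two error messages; the helper names are hypothetical. One plausible reading of the second log is that a 1-channel prediction was fed back as input to a model whose first convolution expects 2 channels, but that is an assumption based on the printed shapes.

```python
import math
from typing import Sequence


def reshape_is_valid(num_elements: int, shape: Sequence[int]) -> bool:
    """A reshape only works if the element count matches the target shape."""
    return num_elements == math.prod(shape)


def conv_channels_match(
    input_shape: Sequence[int], filter_shape: Sequence[int], groups: int = 1
) -> bool:
    """Channel check for a channels-last (NDHWC) input.

    filter_shape is read as [out_channels, in_channels_per_group, *kernel],
    matching the layout printed in the error message.
    """
    return input_shape[-1] == filter_shape[1] * groups


# Reshape error: 131072 elements cannot fill (8, 1, 128, 128, 2) ...
print(reshape_is_valid(131072, (8, 1, 128, 128, 2)))  # False
# ... but they exactly fill a 1-channel output.
print(reshape_is_valid(131072, (8, 1, 128, 128, 1)))  # True

# Conv error: 1 input channel against a filter expecting 2.
print(conv_channels_match([8, 10, 128, 128, 1], [768, 2, 1, 32, 32]))  # False
print(conv_channels_match([8, 10, 128, 128, 2], [768, 2, 1, 32, 32]))  # True
```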
