[BUG] TypeError from training NvNMD QNN model (-s s2) with float precision #3961

Closed
jiongwalai opened this issue Jul 10, 2024 · 0 comments · Fixed by #3978

Bug summary

When training an NvNMD QNN model (-s s2) with DeePMD-kit 2.2.11 in float precision (export DP_INTERFACE_PREC=low), the debug log shows g_t with dtype float64, while the other logged tensors are float32:

DEEPMD DEBUG #u: Tensor("u/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #s_s: Tensor("s_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h_s: Tensor("h_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #s: Tensor("s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h: Tensor("h/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #Rxyz: Tensor("Rxyz/FltNvnmd:0", dtype=float32)
DEEPMD INFO use the compressible model with stripped type embedding
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)

It seems that g_t is never converted to float32, unlike the other tensors.
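To spot this kind of mismatch quickly in a long training log, a small script can scan the DEBUG lines and report every tensor whose dtype differs from the expected one. This is only a sketch: the function name `find_dtype_mismatches` and the regular expression are illustrative and assume the `DEEPMD DEBUG #name: Tensor(..., dtype=...)` line format shown above.

```python
import re

# Matches lines such as:
#   DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
# Group 1 is the tensor label, group 2 is the dtype.
DTYPE_LINE = re.compile(r"#(\w+): Tensor\(.*dtype=(\w+)\)")


def find_dtype_mismatches(log_text, expected="float32"):
    """Return (label, dtype) pairs for logged tensors whose dtype is not `expected`."""
    mismatches = []
    for line in log_text.splitlines():
        match = DTYPE_LINE.search(line)
        if match and match.group(2) != expected:
            mismatches.append((match.group(1), match.group(2)))
    return mismatches


# Three lines copied from the log in this report.
sample_log = '''
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
'''

mismatches = find_dtype_mismatches(sample_log)
print(mismatches)
```

On the excerpt above, only g_t is reported.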

DeePMD-kit Version

v2.2.11

Backend and its version

TensorFlow v2.14.0

How did you download the software?

docker

Input Files, Running Commands, Error Log, etc.

DEEPMD INFO training without frame parameter
DEEPMD INFO data stating... (this step may take long time)
2024-07-10 02:45:59.699397: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
DEEPMD INFO built lr
DEEPMD INFO the range of s is [-0.0, 6.388733386993408]
DEEPMD DEBUG #u: Tensor("u/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #s_s: Tensor("s_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h_s: Tensor("h_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #s: Tensor("s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h: Tensor("h/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #Rxyz: Tensor("Rxyz/FltNvnmd:0", dtype=float32)
DEEPMD INFO use the compressible model with stripped type embedding
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
Traceback (most recent call last):
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 551, in _ExtractInputsAndAttrs
values = ops.convert_to_tensor(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 698, in convert_to_tensor
return tensor_conversion_registry.convert(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 209, in convert
return overload(dtype, name) # pylint: disable=not-callable
^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/tensor.py", line 762, in tf_tensor
raise ValueError(
ValueError: w: Tensor conversion requested dtype float32 for Tensor with dtype float64: <tf.Tensor 'filter_type_all/g_t/EnsureShape:0' shape=(?, 32) dtype=float64>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/deepmd-kit/bin/dp", line 10, in <module>
sys.exit(main())
^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd_utils/main.py", line 657, in main
deepmd_main(args)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/main.py", line 92, in main
train_nvnmd(**dict_args)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/nvnmd/entrypoints/train.py", line 187, in train_nvnmd
train(**jdata)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 168, in train
_do_work(jdata, run_opt, is_compress)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 280, in _do_work
model.build(train_data, stop_batch, origin_type_map=origin_type_map)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/train/trainer.py", line 308, in build
self._build_network(data, suffix)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/train/trainer.py", line 385, in _build_network
self.model_pred = self.model.build(
^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/model/ener.py", line 222, in build
dout = self.build_descrpt(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/model/model.py", line 290, in build_descrpt
dout = self.descrpt.build(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 626, in build
self.dout, self.qmat = self._pass_filter(
^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 685, in _pass_filter
layer, qmat = self._filter(
^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/common.py", line 258, in wrapper
returned_tensor = func(
^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 1269, in _filter
xyz_scatter_1 = self._filter_lower(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 1104, in _filter_lower
return filter_lower_R42GR(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/nvnmd/descriptor/se_atten.py", line 217, in filter_lower_R42GR
G = op_module.mul_flt_nvnmd(G, two_embd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 2276, in mul_flt_nvnmd
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 778, in _apply_op_helper
_ExtractInputsAndAttrs(op_type_name, op_def, allowed_list_attr_map,
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 589, in _ExtractInputsAndAttrs
raise TypeError(
TypeError: Input 'w' of 'MulFltNvnmd' Op has type float64 that does not match type float32 of argument 'x'.
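The failure mode can be illustrated without TensorFlow. The sketch below uses the standard-library `array` module, where typecode `"f"` is single precision and `"d"` is double precision, as a stand-in for the two tensors. The helpers `mul_checked` and `cast_like` are hypothetical and are not the code changed in #3978; they only show why a strict same-precision multiply rejects the inputs and how casting one operand first resolves it.

```python
from array import array


def mul_checked(x, w):
    """Element-wise product that refuses mixed precisions, like the op in the traceback."""
    if x.typecode != w.typecode:
        raise TypeError(
            f"Input 'w' has typecode {w.typecode!r} that does not match "
            f"typecode {x.typecode!r} of argument 'x'."
        )
    return array(x.typecode, [a * b for a, b in zip(x, w)])


def cast_like(w, ref):
    """Return a copy of `w` stored with the same precision as `ref`."""
    return array(ref.typecode, list(w))


G = array("f", [1.0, 2.0, 3.0])         # single precision, stands in for G
two_embd = array("d", [0.5, 0.5, 0.5])  # double precision, stands in for two_embd (g_t)

try:
    mul_checked(G, two_embd)
    mismatch_error = None
except TypeError as exc:
    mismatch_error = str(exc)

# Casting the double-precision operand first makes the multiply succeed.
product = mul_checked(G, cast_like(two_embd, G))
print(mismatch_error)
print(product.typecode, list(product))
```

In TensorFlow terms, the analogous step would be casting the float64 tensor to the dtype of the other operand before the multiply; whether the actual fix does exactly that is not shown in this thread.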

Steps to Reproduce

export DP_INTERFACE_PREC=low; export OMP_NUM_THREADS=8; dp train-nvnmd cnn.json --skip-neighbor-stat -s s1 >> train.log 2>&1 ; dp train-nvnmd qnn.json --skip-neighbor-stat -s s2 >> train.log 2>&1
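For context on the first variable in that command, here is a rough sketch of how an interface-precision setting like `DP_INTERFACE_PREC` can be resolved to a dtype name. The function `resolve_interface_precision` is hypothetical, and the mapping (`high` as the default giving float64, `low` giving float32) is an assumption based on the behavior described in this report, not a copy of DeePMD-kit's code.

```python
import os


def resolve_interface_precision(environ=None):
    """Map DP_INTERFACE_PREC to a dtype name (assumed mapping, see lead-in)."""
    if environ is None:
        environ = os.environ
    value = environ.get("DP_INTERFACE_PREC", "high").lower()
    if value in ("high", ""):
        return "float64"  # assumed default: double precision
    if value == "low":
        return "float32"  # the setting used in this report
    raise RuntimeError(f"Unsupported DP_INTERFACE_PREC value: {value!r}")


print(resolve_interface_precision({"DP_INTERFACE_PREC": "low"}))
```

Under this assumption, a tensor that skips the precision cast would keep the float64 default, which is consistent with the g_t line in the log.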

Further Information, Files, and Links

No response

@jiongwalai jiongwalai added the bug label Jul 10, 2024
LiuGroupHNU pushed a commit to LiuGroupHNU/deepmd-kit that referenced this issue Jul 12, 2024
LiuGroupHNU pushed a commit to LiuGroupHNU/deepmd-kit that referenced this issue Jul 14, 2024
github-merge-queue bot pushed a commit that referenced this issue Jul 18, 2024
Fix the float precision problem in se_atten (line 217).
Fix the bug that caused different energies between QNN and LAMMPS.

## Summary by CodeRabbit

- **New Features**
  - Improved energy calculation methods for more accurate results in the `wrap` module.
  - Introduced new parameters for enhanced configurability in energy-related computations.

- **Improvements**
  - Enhanced handling and processing of energy shift arrays for better performance and accuracy.
  - Updated array manipulation and calculation methods for various wrapping functionalities.

---------

Co-authored-by: LiuGroupHNU <liujie123@HNU>
Co-authored-by: MoPinghui <mopinghui1020@gmail.com>
Co-authored-by: Han Wang <92130845+wanghan-iapcm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pinghui Mo <pinghui_mo@outlook.com>
@njzjz njzjz closed this as completed Jul 18, 2024
mtaillefumier pushed a commit to mtaillefumier/deepmd-kit that referenced this issue Sep 18, 2024
… (deepmodeling#3978)
