Bug: Error when loading a model with the se_a_mask descriptor #3928

Closed
wanghan-iapcm opened this issue Jun 29, 2024 Discussed in #3924 · 1 comment · Fixed by #3930
Labels: bug, reproduced (This bug has been reproduced by developers)

Comments

@wanghan-iapcm
Collaborator

Discussed in #3924

Originally posted by lukasbaldauf June 28, 2024
Hi All,
I want to evaluate a trained model that uses the se_a_mask descriptor, but I'm encountering an error. Training goes smoothly and I get good accuracy for my system; the problem is only in loading the model. I get the same error when evaluating the trained zinc_protein example system (see the error message below). It seems to be related to "dfparam" and "daparam", whose tensors are missing from the graph.

I get the same error with deepmd versions 2.2.7 and 2.2.10, and with TensorFlow versions 2.9.0 and 2.15.0.

For the zinc example, I train and freeze the model as follows:

dp train zinc_se_a_mask.json --skip-neighbor-stat
dp freeze -o graph.mask.pb

The problem occurs when I want to load the model:

from deepmd.infer import DeepPot
model = DeepPot("graph.mask.pb")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lukasb/miniforge3/envs/deepmd_gpu_2.2.7/lib/python3.10/site-packages/deepmd/infer/deep_pot.py", line 156, in __init__
    self._get_tensor(tensor_name, attr_name)
  File "/home/lukasb/miniforge3/envs/deepmd_gpu_2.2.7/lib/python3.10/site-packages/deepmd/infer/deep_eval.py", line 165, in _get_tensor
    tensor = self.graph.get_tensor_by_name(tensor_path)
  File "/home/lukasb/miniforge3/envs/deepmd_gpu_2.2.7/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 4128, in get_tensor_by_name
    return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
  File "/home/lukasb/miniforge3/envs/deepmd_gpu_2.2.7/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 3952, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/lukasb/miniforge3/envs/deepmd_gpu_2.2.7/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 3992, in _as_graph_element_locked
    raise KeyError("The name %s refers to a Tensor which does not "
KeyError: "The name 'load/fitting_attr/dfparam:0' refers to a Tensor which does not exist. The operation, 'load/fitting_attr/dfparam', does not exist in the graph."
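
A KeyError like this can be narrowed down by checking whether the expected scope exists in the graph under a slightly different name. The fix commit referenced in this thread mentions `fitting_attr` becoming `fitting_attr_1`, so a small helper that looks for numerically suffixed scope names may be useful. This is only a sketch: the function name `find_scope_variants` and the example node names are hypothetical, and with a real frozen model the names would have to be collected from the graph itself.

```python
import re


def find_scope_variants(node_names, scope):
    """Return node names whose path contains `scope` with a numeric suffix.

    A suffix such as `scope_1` is what graph-mode TensorFlow produces when
    a name scope with the same name is opened more than once.
    """
    pattern = re.compile(rf"(^|/){re.escape(scope)}_\d+(/|$)")
    return [name for name in node_names if pattern.search(name)]


# Hypothetical node names for illustration only. With a real frozen model
# they could be gathered from the GraphDef, e.g. [n.name for n in graph_def.node].
example_nodes = [
    "descrpt_attr/rcut",
    "fitting_attr_1/dfparam",
    "fitting_attr_1/daparam",
    "o_energy",
]

variants = find_scope_variants(example_nodes, "fitting_attr")
print(variants)
```

If the helper returns anything for `fitting_attr`, the tensors were probably written under a renamed scope rather than being absent altogether.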

wanghan-iapcm added the reproduced label Jun 29, 2024
njzjz added the failed to reproduce, bug, and reproduced labels and removed the reproduced and failed to reproduce labels Jun 29, 2024
@njzjz
Member

njzjz commented Jun 29, 2024

Reproduced in v2.2.10 but failed to reproduce in devel.

njzjz added a commit to njzjz/deepmd-kit that referenced this issue Jun 29, 2024
Fix deepmodeling#3928. Prevent `fitting_attr` from becoming `fitting_attr_1`.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
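
For readers unfamiliar with why `fitting_attr` could become `fitting_attr_1`: in graph-mode TensorFlow, opening a name scope a second time under the same name yields a uniquified name with a numeric suffix. The following is a toy, pure-Python model of that behavior, not TensorFlow's implementation and not the DeePMD-kit code. `ToyGraph` and its methods are hypothetical names, and placing `dfparam` in the second scope is an assumption made for illustration.

```python
class ToyGraph:
    """Toy stand-in for a graph that hands out unique scope names."""

    def __init__(self):
        self._counts = {}
        self.tensors = set()

    def open_scope(self, name):
        # The first use keeps the plain name; later uses get _1, _2, ...
        n = self._counts.get(name, 0)
        self._counts[name] = n + 1
        return name if n == 0 else f"{name}_{n}"

    def add_tensor(self, scope, name):
        path = f"{scope}/{name}:0"
        self.tensors.add(path)
        return path

    def get_tensor_by_name(self, path):
        if path not in self.tensors:
            raise KeyError(path)
        return path


graph = ToyGraph()
first = graph.open_scope("fitting_attr")   # plain name on first use
second = graph.open_scope("fitting_attr")  # suffixed name on second use
created = graph.add_tensor(second, "dfparam")

# A loader that only knows the plain scope name cannot find the tensor.
try:
    graph.get_tensor_by_name("fitting_attr/dfparam:0")
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(first, second, created, lookup_failed)
```

In this toy model the lookup under the original prefix fails in the same way as the KeyError reported above, which is consistent with a fix that keeps the scope name from being suffixed.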
@njzjz njzjz self-assigned this Jun 29, 2024
github-merge-queue bot pushed a commit that referenced this issue Jul 2, 2024
…_attr_1` (#3930)

Fix #3928. Prevent `fitting_attr` from becoming `fitting_attr_1`.

## Summary by CodeRabbit

- **Refactor**
  - Improved TensorFlow variable scope management by switching to `tf.AUTO_REUSE` to streamline code and reduce the likelihood of variable reuse conflicts.

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
njzjz added a commit to njzjz/deepmd-kit that referenced this issue Jul 2, 2024
…_attr_1` (deepmodeling#3930)

(cherry picked from commit e809e64)
@njzjz njzjz closed this as completed Jul 2, 2024
njzjz added a commit that referenced this issue Jul 3, 2024
…_attr_1` (#3930)

(cherry picked from commit e809e64)
mtaillefumier pushed a commit to mtaillefumier/deepmd-kit that referenced this issue Sep 18, 2024
…_attr_1` (deepmodeling#3930)
