[CodeStyle][NPU] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) #44988

SigureMo · 2022-08-08T12:21:38Z

PR types

Others

PR changes

Others

Describe

使用 np.testing.assert_allclose(...) 替换 self.assertTrue(np.allclose(...))，以优化单测报错信息

Changes

self.assertTrue(np.allclose(...)) -> np.testing.assert_allclose(...) ^autofix（自动跳过因为 shape 问题失败的 case）
等价情况 self.assertEqual(np.allclose(...), True) -> np.testing.assert_allclose(...) ^autofix
等价情况 self.assertTrue(numpy.allclose(...)) -> np.testing.assert_allclose(...) ^autofix
部分原来代码里的错误使用 self.assertTrue(np.allclose(...), True) 移除 True ^autofix
等价情况 self.assertTrue(np.isclose(...).all()) -> np.testing.assert_allclose(...) ^autofix
语法树替换后有些 1e-4 变成 0.0001 了，明显 1e-4 更好一些（包含比 1e-4 小的，比 1e-4 大的暂不替换）
清理冗余的 err_msg，部分 err_msg 只是打印了 a 和 b，np.testing.allclose 本就会打印它们，因此可以清理
部分在 rtol=0.5 的情况下 CI 仍然不过时，手动对这些测试增加 atol=1e-8（仍然是保持默认行为），未设过 1e-8 之外的数值
少许没能成功跳过的 shape error，如果确定是静态图返回值问题直接加 [0]，如果暂时不确定是不是静态图返回值问题则暂时回退修改，留到下个 PR 处理

目前直接替换后发现大量测试无法通过（375），这基本是没有办法手动一个个修改的……问题主要在以下几个方面：

精度误差（本 PR 将集中解决本问题）

这是由于两者默认值不同
- np.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
- np.testing.assert_allclose(actual, desired, rtol=1e-07, atol=0, equal_nan=True, err_msg='', verbose=True)
明显 np.testing.assert_allclose 精度要求更高，所以这里暂时 np.allclose 无 rtol（即设为默认值）时将值设为 1e-5，使其与原来行为一致，atol 暂时不改看看情况

修改后测试失败大大降低（375 -> 195），基本降了一半，余量已经很少了，经检查，余量大多都是些误差非常小的（1e-10 以下），针对这些手动加上 atol=1e-8 ~~应可解决~~即可解决

shape 不对齐

本 PR 暂时不关注本问题，将在下一个 PR 详细描述～

这是由于 np.allclose 在比较时会自动 broadcast，而 np.testing.allclose 不会，因此需要手动对这些数据进行检查及修改

修改精度问题后，本问题占了 90% 以上，也就是将近 200 个需要考虑本问题，逐个手动修复，需要进一步评估成本

目前发现的问题：

静态图返回是一个 list，但有的开发者直接将返回值进行比较，这会在比较时认为静态图的 shape 比预期值多一维度，如下面这个（找个问题好麻烦呀……），该问题占了很大一部分

Paddle/python/paddle/fluid/tests/unittests/test_softmax2d.py

Lines 39 to 41 in c91aace

    
               res = exe.run(feed={'X': self.x_np}, fetch_list=[out]) 
        
           out_ref = ref_softmax(self.x_np, self.axis) 
        
           self.assertTrue(np.allclose(out_ref, res))

误操作，这里的 expected_dx 不需要添加 [0]

Paddle/python/paddle/fluid/tests/unittests/test_imperative_double_grad.py

Lines 83 to 86 in c91aace

    
           dx = fluid.dygraph.grad(out, [x, z], allow_unused=True) 
        
           dout = np.ones_like(np_y) 
        
           expected_dx = np.matmul(dout, np.transpose(np_y)) 
        
           self.assertTrue(np.allclose(dx[0].numpy(), expected_dx[0]))

静态图代码报错（shape 不对齐问题导致测试不通过而无法执行后面的转为静态图模式的代码，进而后续静态图测试全部无法通过，本质上是问题二）

剩余大概 48 个文件是还有需要修改的（存在大量 shape 问题和可能存在少量精度问题和其他问题），将在下个 PR 进行修复。

paddle-bot · 2022-08-08T12:21:42Z

你的PR提交成功，感谢你对开源项目的贡献!
请关注后续CI自动化测试结果，详情请参考Paddle-CI手册。
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

luotao1 · 2022-08-09T07:46:44Z

静态图代码报错。应当可用测试代码开始时 paddle.enable_static()，测试代码结束时 paddle.disable_static() 包裹该段测试代码（还没测试）

可以贴一下哪些单测的静态图代码报错么？

python/paddle/fluid/tests/unittests/test_gather_nd_op.py

paddle-bot · 2022-08-10T14:50:29Z

很抱歉，经过我们的反复讨论，你的PR暂未达到合入标准，请阅读飞桨原生算子开发规范，你可以重新提交新的PR，我们先将此PR关闭，感谢你的贡献。
Sorry to inform you that through our discussion, your PR fails to meet the merging standard (Reference: Paddle Custom Operator Design Doc). You can also submit an new one. Thank you.

Yulv-git

LGTM

luotao1 · 2022-08-15T01:36:55Z

@SigureMo @qili93 需要在IPU/XPU的机器上跑一下看看，跑完后回复这个PR

SigureMo · 2022-08-15T03:51:01Z

@SigureMo @qili93 需要在IPU/XPU的机器上跑一下看看，跑完后回复这个PR

好的，如果有问题我恢复相关修改

qili93 · 2022-08-16T02:05:56Z

@SigureMo

由于对于XPU相关修改比较多，和昆仑同学确认，请提交一条以“xxxx, test=kunlun” 结尾的的commit msg，会自动触发昆仑流水线进行单测检查。

qili93 · 2022-08-16T05:51:16Z

CI passed for MLU and IPU.

import PaddlePaddle#45079 changes, try fix 'PreconditionNotMetError: the memcpy_h2d op has no valid phi kernel.'

SigureMo · 2022-08-16T19:55:46Z

python/paddle/fluid/tests/unittests/test_nearest_interp_v2_op.py

-from paddle._C_ops import final_state_nearest_interp
+from paddle import _C_ops


这里是因为原来 NPU 下单测 test_nearest_interp_v2_op_npu 在从 test_nearest_interp_v2_op import numpy 的参考实现（nearest_neighbor_interp_np）时候会报错，报错如下：

2022-08-17 00:32:35 1/1 Test #93: test_nearest_interp_v2_op_npu ....***Failed 3.95 sec 2022-08-17 00:32:35 W0817 00:32:32.922722 813 init.cc:288] AVX is available, Please re-compile on local machine 2022-08-17 00:32:35 Hint: Your machine support AVX, but the installed paddlepaddle doesn't have avx core. Hence, no-avx core with worse preformance will be imported. 2022-08-17 00:32:35 If you like, you could reinstall paddlepaddle by 'python -m pip install --force-reinstall paddlepaddle-gpu[==version]' to get better performance. 2022-08-17 00:32:35 The original error is: cannot import name 'core_avx' from 'paddle.fluid' (/workspace/npu_dev/Paddle/build/python/paddle/fluid/__init__.py) 2022-08-17 00:32:35 /workspace/npu_dev/Paddle/build/python/paddle/fluid/framework.py:189: UserWarning: We will fallback into legacy dygraph on NPU/XPU/MLU/IPU/ROCM devices. Because we only support new eager dygraph mode on CPU/GPU currently. 2022-08-17 00:32:35 "We will fallback into legacy dygraph on NPU/XPU/MLU/IPU/ROCM devices. Because we only support new eager dygraph mode on CPU/GPU currently. " 2022-08-17 00:32:35 Traceback (most recent call last): 2022-08-17 00:32:35 File "/workspace/npu_dev/Paddle/tools/test_runner.py", line 69, in <module> 2022-08-17 00:32:35 main() 2022-08-17 00:32:35 File "/workspace/npu_dev/Paddle/tools/test_runner.py", line 51, in main 2022-08-17 00:32:35 module = importlib.import_module(module_name) 2022-08-17 00:32:35 File "/opt/conda/lib/python3.7/importlib/__init__.py", line 127, in import_module 2022-08-17 00:32:35 return _bootstrap._gcd_import(name[level:], package, level) 2022-08-17 00:32:35 File "<frozen importlib._bootstrap>", line 1006, in _gcd_import 2022-08-17 00:32:35 File "<frozen importlib._bootstrap>", line 983, in _find_and_load 2022-08-17 00:32:35 File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked 2022-08-17 00:32:35 File "<frozen importlib._bootstrap>", line 677, in _load_unlocked 2022-08-17 00:32:35 File "<frozen importlib._bootstrap_external>", line 728, in exec_module 2022-08-17 00:32:35 File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed 2022-08-17 00:32:35 File "/workspace/npu_dev/Paddle/build/python/paddle/fluid/tests/unittests/npu/test_nearest_interp_v2_op_npu.py", line 29, in <module> 2022-08-17 00:32:35 from test_nearest_interp_v2_op import nearest_neighbor_interp_np 2022-08-17 00:32:35 File "../test_nearest_interp_v2_op.py", line 25, in <module> 2022-08-17 00:32:35 from paddle._C_ops import final_state_nearest_interp 2022-08-17 00:32:35 ImportError: cannot import name 'final_state_nearest_interp' from 'paddle.fluid.core_noavx.ops' (/workspace/npu_dev/Paddle/build/python/paddle/_C_ops.py) 2022-08-17 00:32:35 Errors while running CTest 2022-08-17 00:32:35 0% tests passed, 1 tests failed out of 1 2022-08-17 00:32:35 Total Test time (real) = 3.96 sec 2022-08-17 00:32:35 The following tests FAILED: 2022-08-17 00:32:35 93 - test_nearest_interp_v2_op_npu (Failed)

看起来是刚刚 merge 的 #45148 新增的这一行（这里的 25 行）直接从 _C_ops import 了 final_state_nearest_interp，但在 NPU 下貌似是访问不到的。为了避免这个问题，修改为该文件全局仅 import 到 _C_ops，修改后 NPU 测试全部通过。

SigureMo · 2022-08-16T20:02:29Z

由于对于XPU相关修改比较多，和昆仑同学确认，请提交一条以“xxxx, test=kunlun” 结尾的的commit msg，会自动触发昆仑流水线进行单测检查。

最新一次 commit 包含了 test=kunlun，目前除了两个非 Required 其余全部都通过了～

qili93

LGTM

XiaoguangHu01

LGTM

paddle-bot bot added contributor External developers status: proposed labels Aug 8, 2022

SigureMo force-pushed the assert-allclose branch from 7797308 to cd0a2e6 Compare August 8, 2022 16:33

SigureMo mentioned this pull request Aug 8, 2022

[CodeStyle] 单测报错信息优化 RFC PaddlePaddle/community#198

Merged

luotao1 self-assigned this Aug 9, 2022

luotao1 added the PFCC Paddle Framework Contributor Club，https://github.com/PaddlePaddle/community/tree/master/pfcc label Aug 9, 2022

paddle-bot bot removed the status: proposed label Aug 9, 2022

SigureMo commented Aug 9, 2022

View reviewed changes

python/paddle/fluid/tests/unittests/test_gather_nd_op.py Outdated Show resolved Hide resolved

SigureMo changed the title ~~[WIP][CodeStyle] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...))~~ [WIP][CodeStyle] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) Aug 10, 2022

SigureMo force-pushed the assert-allclose branch from fb7670f to f5d67ab Compare August 10, 2022 09:52

SigureMo closed this Aug 10, 2022

SigureMo force-pushed the assert-allclose branch from b0ce5e5 to 4805da5 Compare August 10, 2022 14:50

paddle-bot bot added the status: not progressed label Aug 10, 2022

autofix

5ea9430

SigureMo reopened this Aug 10, 2022

try resolve precision issues

abd3f09

SigureMo force-pushed the assert-allclose branch 2 times, most recently from ad8a787 to 179c9b6 Compare August 10, 2022 18:38

revert some changes

8b3520a

SigureMo force-pushed the assert-allclose branch from 179c9b6 to 8b3520a Compare August 10, 2022 19:25

SigureMo added 3 commits August 11, 2022 03:52

clean some err_msg

5f6093b

0.0001 -> 1e-4

54e5e8d

update commented assert code

7c48fbd

SigureMo changed the title ~~[WIP][CodeStyle] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1)~~ [CodeStyle] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) Aug 10, 2022

SigureMo changed the title ~~[CodeStyle] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1)~~ [WIP][CodeStyle] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) Aug 11, 2022

SigureMo force-pushed the assert-allclose branch from 7702627 to 70484b6 Compare August 11, 2022 10:02

try to fix some shape errors

6954f99

Yulv-git previously approved these changes Aug 13, 2022

View reviewed changes

luotao1 mentioned this pull request Aug 15, 2022

Fix UT failures #45099

Merged

qili93 changed the title ~~[CodeStyle] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1)~~ [CodeStyle][NPU] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) Aug 15, 2022

SigureMo added 2 commits August 16, 2022 12:47

empty commit, trigger kunlun ci, test=kunlun

8748a13

Merge branch 'develop' into assert-allclose

8af214a

SigureMo added 4 commits August 16, 2022 14:10

empty commit, retrigger kunlun ci, test=kunlun

2f44c59

Merge branch 'develop' into assert-allclose

c0fb5a8

import PaddlePaddle#45079 changes, try fix 'PreconditionNotMetError: the memcpy_h2d op has no valid phi kernel.'

empty commit, trigger kunlun ci, try fix npu memcpy_h2d, test=kunlun

904ae6d

try fix npu import error, test=kunlun

cb0e196

SigureMo dismissed stale reviews from Yulv-git and luotao1 via cb0e196 August 16, 2022 17:55

SigureMo commented Aug 16, 2022

View reviewed changes

qili93 approved these changes Aug 17, 2022

View reviewed changes

SigureMo mentioned this pull request Aug 17, 2022

[phi] Transfer fluid bilinear_interp_v2 to phi bilinear_interp (add yaml) #45140

Merged

SigureMo requested a review from luotao1 August 17, 2022 06:13

luotao1 approved these changes Aug 17, 2022

View reviewed changes

XiaoguangHu01 approved these changes Aug 17, 2022

View reviewed changes

luotao1 merged commit 2de0d67 into PaddlePaddle:develop Aug 17, 2022

SigureMo deleted the assert-allclose branch August 17, 2022 08:58

This was referenced Aug 17, 2022

[CodeStyle] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 2) #45213

Merged

[CodeStyle] use np.testing.assert_array_equal(...) instead of self.assertTrue(np.array_equal(...)) (part 3) #45251

Merged

SigureMo mentioned this pull request Sep 7, 2022

【社区治理】SigureMo 发起 Committer 身份申请 #45841

Closed

This was referenced Oct 8, 2022

[CodeStyle][F401] remove unused import in unittests/xpu #46783

Merged

[CodeStyle][F401][NPU] remove unused imports in unittests/npu #46782

Closed

SigureMo mentioned this pull request Jul 13, 2023

[CodeStyle] replace assert np.allclose with np.testing.assert_allclose and assert np.array_equal with np.testing.assert_array_equal #55385

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CodeStyle][NPU] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) #44988

[CodeStyle][NPU] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) #44988

SigureMo commented Aug 8, 2022 •

edited

Loading

paddle-bot bot commented Aug 8, 2022

luotao1 commented Aug 9, 2022

paddle-bot bot commented Aug 10, 2022

Yulv-git left a comment

luotao1 commented Aug 15, 2022

SigureMo commented Aug 15, 2022

qili93 commented Aug 16, 2022 •

edited

Loading

qili93 commented Aug 16, 2022 •

edited

Loading

SigureMo Aug 16, 2022 •

edited

Loading

SigureMo commented Aug 16, 2022

qili93 left a comment

XiaoguangHu01 left a comment

	res = exe.run(feed={'X': self.x_np}, fetch_list=[out])
	out_ref = ref_softmax(self.x_np, self.axis)
	self.assertTrue(np.allclose(out_ref, res))

	dx = fluid.dygraph.grad(out, [x, z], allow_unused=True)
	dout = np.ones_like(np_y)
	expected_dx = np.matmul(dout, np.transpose(np_y))
	self.assertTrue(np.allclose(dx[0].numpy(), expected_dx[0]))

		from paddle._C_ops import final_state_nearest_interp
		from paddle import _C_ops

[CodeStyle][NPU] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) #44988

[CodeStyle][NPU] use np.testing.assert_allclose instead of self.assertTrue(np.allclose(...)) (part 1) #44988

Conversation

SigureMo commented Aug 8, 2022 • edited Loading

PR types

PR changes

Describe

Changes

paddle-bot bot commented Aug 8, 2022

luotao1 commented Aug 9, 2022

paddle-bot bot commented Aug 10, 2022

Yulv-git left a comment

Choose a reason for hiding this comment

luotao1 commented Aug 15, 2022

SigureMo commented Aug 15, 2022

qili93 commented Aug 16, 2022 • edited Loading

qili93 commented Aug 16, 2022 • edited Loading

SigureMo Aug 16, 2022 • edited Loading

Choose a reason for hiding this comment

SigureMo commented Aug 16, 2022

qili93 left a comment

Choose a reason for hiding this comment

XiaoguangHu01 left a comment

Choose a reason for hiding this comment

SigureMo commented Aug 8, 2022 •

edited

Loading

qili93 commented Aug 16, 2022 •

edited

Loading

qili93 commented Aug 16, 2022 •

edited

Loading

SigureMo Aug 16, 2022 •

edited

Loading