using float32 in gradient checking may be not appropriate due to large rounding error #4283

pengli09 · 2017-09-21T03:28:51Z

Major problem: float32's rounding error is large enough that it may dominate the difference between the numerical gradients and the analytical gradients, which causes a relatively large relative error in gradient checking. As a consequence, the gradient checker used in unit tests may be unreliable.

Potential solutions:

Choose epsilon carefully so that the rounding error stays reasonable. However, this is a challenging task. See https://en.wikipedia.org/wiki/Numerical_differentiation and the experiments at the end of this issue.
Use float64 instead of float32. Reference: http://cs231n.github.io/neural-networks-3/
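To make the two options concrete, here is a minimal sketch (not the checker used in this repository). The helper names `central_diff`, `suggested_eps` and `abs_error` are made up for illustration, and the cube-root-of-machine-epsilon step is only a common rule of thumb for central differences, not a guarantee. It compares a small fixed epsilon in float32, the rule-of-thumb epsilon in float32, and the same small epsilon in float64 on a dot product.

```python
import numpy as np

def central_diff(f, x, i, eps):
    """Central-difference estimate of df/dx[i]; f is evaluated in x's own dtype."""
    step = np.zeros_like(x)
    step[i] = eps
    return float(f(x + step) - f(x - step)) / (2.0 * eps)

def suggested_eps(x):
    """Rule-of-thumb step (an assumption): cbrt(machine epsilon) * scale of x."""
    scale = max(1.0, float(np.abs(x).max()))
    return float(np.finfo(x.dtype).eps) ** (1.0 / 3.0) * scale

rng = np.random.RandomState(1)
x64 = rng.random_sample(200)
y64 = rng.random_sample(200)
x32, y32 = x64.astype(np.float32), y64.astype(np.float32)

def abs_error(x, y, eps):
    # For f(x) = x . y the analytical gradient with respect to x[0] is y[0].
    grad = central_diff(lambda v: v.dot(y), x, 0, eps)
    return abs(grad - float(y[0]))

err32_small = abs_error(x32, y32, 1e-5)               # float32, small fixed epsilon
err32_rule = abs_error(x32, y32, suggested_eps(x32))  # float32, rule-of-thumb epsilon
err64_small = abs_error(x64, y64, 1e-5)               # float64, small fixed epsilon
print(err32_small, err32_rule, err64_small)
```

For a nonlinear function the rule-of-thumb step also has to balance truncation error, so it should be treated as a starting point rather than a safe default.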

Experiments
The differences between the numerical and analytical gradients of the linear function f(x, y) = x^T * y are shown below. We can conclude that:

Although the linear function is very simple, the absolute error and relative error are unacceptably large if float32 is used with a small epsilon (for example, 1e-5).
The errors are very small if float64 is used.
If the scale of epsilon is comparable with that of x and y, the errors will be small. But I'm not sure whether this conclusion generalizes to more complicated functions.
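A back-of-the-envelope model seems consistent with these observations. Each evaluation of f carries a rounding error of roughly (machine epsilon) * |f|, and the central difference divides that by epsilon, so the error grows as epsilon shrinks. Because f is linear, there is no truncation error to penalize a large epsilon. The sketch below only evaluates this rough estimate. The function name `rounding_error_estimate` is hypothetical, and approximating |f| by n * avg_abs_x * avg_abs_y from the (1, 200) table is an assumption.

```python
import numpy as np

def rounding_error_estimate(dtype, f_magnitude, eps):
    """Rough size of the rounding error in a central difference: u * |f| / eps."""
    return float(np.finfo(dtype).eps) * f_magnitude / eps

# Assumption: |f| is approximated by n * avg_abs_x * avg_abs_y for the (1, 200) case.
f_magnitude = 200 * 0.485359 * 0.532412

estimates = {}
for dtype in (np.float32, np.float64):
    for eps in (1e-1, 1e-3, 1e-5):
        est = rounding_error_estimate(dtype, f_magnitude, eps)
        estimates[(dtype.__name__, eps)] = est
        print('%-8s eps=%-8g estimated rounding error ~ %g' % (dtype.__name__, eps, est))
```

For float32 at epsilon = 1e-5 the estimate is roughly 0.6, the same order of magnitude as the 0.19 reported in the table. It is only an order-of-magnitude guide, not a bound.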

x_shape (1, 200) y_shape (200, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	5.96046e-08          	6.27301e-08          	5.96046e-08          	0.485359             	0.532412             
 1000.000000000000000	0                    	0                    	0                    	0.485359             	0.532412             
  100.000000000000000	5.96046e-08          	6.27301e-08          	5.96046e-08          	0.485359             	0.532412             
   10.000000000000000	2.98023e-07          	3.13651e-07          	2.98023e-07          	0.485359             	0.532412             
    1.000000000000000	1.19209e-07          	1.2546e-07           	1.19209e-07          	0.485359             	0.532412             
    0.100000000000000	7.7486e-06           	8.15491e-06          	7.7486e-06           	0.485359             	0.532412             
    0.010000000000000	6.49691e-05          	6.83758e-05          	6.49691e-05          	0.485359             	0.532412             
    0.050000000000000	2.68221e-05          	2.82285e-05          	2.68221e-05          	0.485359             	0.532412             
    0.001000000000000	0.00031656           	0.00033316           	0.00031656           	0.485359             	0.532412             
    0.000100000000000	0.0034982            	0.00368163           	0.0034982            	0.485359             	0.532412             
    0.000010000000000	0.194233             	0.204418             	0.194233             	0.485359             	0.532412             

x_shape (1, 200) y_shape (200, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
 1000.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
  100.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
   10.000000000000000	2.22045e-16          	2.33688e-16          	2.22045e-16          	0.485358             	0.532412             
    1.000000000000000	2.66454e-15          	2.80425e-15          	2.66454e-15          	0.485358             	0.532412             
    0.100000000000000	2.39808e-14          	2.52383e-14          	2.39808e-14          	0.485358             	0.532412             
    0.010000000000000	3.79252e-13          	3.99139e-13          	3.79252e-13          	0.485358             	0.532412             
    0.050000000000000	9.50351e-14          	1.00018e-13          	9.50351e-14          	0.485358             	0.532412             
    0.001000000000000	3.31291e-13          	3.48662e-13          	3.31291e-13          	0.485358             	0.532412             
    0.000100000000000	1.38796e-11          	1.46074e-11          	1.38796e-11          	0.485358             	0.532412             
    0.000010000000000	1.28229e-10          	1.34953e-10          	1.28229e-10          	0.485358             	0.532412             

---------------------------------------------------------------
x_shape (1, 84) y_shape (84, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
 1000.000000000000000	2.98023e-08          	1.10408e-07          	2.98023e-08          	0.475109             	0.482829             
  100.000000000000000	2.98023e-08          	1.10408e-07          	2.98023e-08          	0.475109             	0.482829             
   10.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
    1.000000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.100000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.010000000000000	3.80576e-05          	0.000140992          	3.80576e-05          	0.475109             	0.482829             
    0.050000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.001000000000000	3.80576e-05          	0.000140992          	3.80576e-05          	0.475109             	0.482829             
    0.000100000000000	0.00289908           	0.0107402            	0.00289908           	0.475109             	0.482829             
    0.000010000000000	0.079193             	0.293386             	0.079193             	0.475109             	0.482829             

x_shape (1, 84) y_shape (84, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
 1000.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
  100.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
   10.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
    1.000000000000000	6.66134e-16          	2.46782e-15          	6.66134e-16          	0.475109             	0.482829             
    0.100000000000000	7.77156e-15          	2.87912e-14          	7.77156e-15          	0.475109             	0.482829             
    0.010000000000000	6.10623e-14          	2.26217e-13          	6.10623e-14          	0.475109             	0.482829             
    0.050000000000000	9.99201e-15          	3.70173e-14          	9.99201e-15          	0.475109             	0.482829             
    0.001000000000000	4.16334e-13          	1.54239e-12          	4.16334e-13          	0.475109             	0.482829             
    0.000100000000000	6.68909e-12          	2.4781e-11           	6.68909e-12          	0.475109             	0.482829             
    0.000010000000000	1.84325e-10          	6.82867e-10          	1.84325e-10          	0.475109             	0.482829             

---------------------------------------------------------------
x_shape (1, 10) y_shape (10, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
 1000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
  100.000000000000000	2.98023e-08          	7.10943e-08          	2.98023e-08          	0.314629             	0.419932             
   10.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    1.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    0.100000000000000	1.78814e-07          	4.26566e-07          	1.78814e-07          	0.314629             	0.419932             
    0.010000000000000	1.01328e-06          	2.4172e-06           	1.01328e-06          	0.314629             	0.419932             
    0.050000000000000	1.78814e-07          	4.26566e-07          	1.78814e-07          	0.314629             	0.419932             
    0.001000000000000	4.91738e-06          	1.17306e-05          	4.91738e-06          	0.314629             	0.419932             
    0.000100000000000	0.000173867          	0.000414764          	0.000173867          	0.314629             	0.419932             
    0.000010000000000	0.00399846           	0.00953843           	0.00399846           	0.314629             	0.419932             

x_shape (1, 10) y_shape (10, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	5.55112e-17          	1.32423e-16          	5.55112e-17          	0.314629             	0.419932             
 1000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
  100.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
   10.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    1.000000000000000	1.11022e-16          	2.64847e-16          	1.11022e-16          	0.314629             	0.419932             
    0.100000000000000	9.99201e-16          	2.38362e-15          	9.99201e-16          	0.314629             	0.419932             
    0.010000000000000	7.66054e-15          	1.82744e-14          	7.66054e-15          	0.314629             	0.419932             
    0.050000000000000	1.22125e-15          	2.91331e-15          	1.22125e-15          	0.314629             	0.419932             
    0.001000000000000	9.64784e-14          	2.30152e-13          	9.64784e-14          	0.314629             	0.419932             
    0.000100000000000	5.40568e-13          	1.28954e-12          	5.40568e-13          	0.314629             	0.419932             
    0.000010000000000	8.34116e-12          	1.98981e-11          	8.34116e-12          	0.314629             	0.419932             

---------------------------------------------------------------
x_shape (1, 1) y_shape (1, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
 1000.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
  100.000000000000000	5.96046e-08          	8.27469e-08          	5.96046e-08          	0.417022             	0.720325             
   10.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    1.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.100000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.010000000000000	8.9407e-07           	1.2412e-06           	8.9407e-07           	0.417022             	0.720325             
    0.050000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.001000000000000	2.44379e-06          	3.39262e-06          	2.44379e-06          	0.417022             	0.720325             
    0.000100000000000	2.38419e-06          	3.30988e-06          	2.38419e-06          	0.417022             	0.720325             
    0.000010000000000	0.000891685          	0.00123789           	0.000891685          	0.417022             	0.720325             

x_shape (1, 1) y_shape (1, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
 1000.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
  100.000000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
   10.000000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    1.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
    0.100000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    0.010000000000000	1.22125e-15          	1.69541e-15          	1.22125e-15          	0.417022             	0.720324             
    0.050000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    0.001000000000000	1.23235e-14          	1.71082e-14          	1.23235e-14          	0.417022             	0.720324             
    0.000100000000000	1.23346e-13          	1.71236e-13          	1.23346e-13          	0.417022             	0.720324             
    0.000010000000000	4.31877e-13          	5.99559e-13          	4.31877e-13          	0.417022             	0.720324             

---------------------------------------------------------------

Code

import numpy as np

def print_diff(dtype, x_shape, y_shape):
    # Fixed seed so that every run draws the same x and y.
    np.random.seed(1)

    x = np.random.random(x_shape).astype(dtype)
    y = np.random.random(y_shape).astype(dtype)

    def f(e):
        # f(x, y) = x * y (matrix product), evaluated at x + e.
        return np.matmul(x + e, y)

    e = np.zeros(x_shape).astype(dtype)

    # Analytical gradient of f with respect to x[0, 0], which is y[0, 0].
    one = e.copy()
    one[0, 0] = 1
    target = np.dot(one, y)

    print('%-21s\t%-21s\t%-21s\t%-21s\t%-21s\t%-21s'
          % ('epsilon', 'max diff', 'max relative diff',
             'avg_abs_diff', 'avg_abs_x', 'avg_abs_y'))
    # All eleven epsilon values listed in the tables above.
    for delta in [10000, 1000, 100, 10, 1, 0.1, 0.01, 0.05, 0.001, 0.0001, 0.00001]:
        #delta = np.abs(x).sum() / x.size
        e[0, 0] = delta
        # Central difference with respect to x[0, 0].
        grad = (f(e) - f(-e)) / 2 / delta
        #grad = np.matmul(e, y) / delta

        diff = grad - target

        # Avoid dividing by a near-zero analytical gradient.
        target_ = target.copy()
        target_[target_ < 1e-3] = 1
        relative_diff = np.abs(diff) / target_

        print('%21.15f\t%-21g\t%-21g\t%-21g\t%-21g\t%-21g'
              % (delta,
                 np.abs(diff).max(),
                 np.abs(relative_diff).max(),
                 np.abs(diff).mean(),
                 np.abs(x).mean(),
                 np.abs(y).mean()))

# The four shape pairs shown in the tables above.
for x_shape, y_shape in [((1, 200), (200, 1)), ((1, 84), (84, 1)),
                         ((1, 10), (10, 1)), ((1, 1), (1, 1))]:
    for dtype in (np.float32, np.float64):
        print('x_shape', x_shape, 'y_shape', y_shape)
        print(dtype)
        print_diff(dtype, x_shape, y_shape)
        print('')

    print('-' * 63)


pengli09 · 2017-09-22T09:01:13Z

Conclusion

float64 should be used in gradient checking unless it is not supported.
Computing the Jacobian matrix is a safer choice.
A smaller but still reasonably sized tensor can be used to reduce testing time.
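As a rough illustration of these three points together, the sketch below builds a numerical Jacobian in float64 with central differences and compares it to an analytical Jacobian on a small tensor. It is not the checker in this repository, nor the PyTorch or TensorFlow implementation. `numeric_jacobian`, the test function tanh(W x) and the default epsilon of 1e-6 are all assumptions chosen for the example.

```python
import numpy as np

def numeric_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x, computed entirely in float64."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(f(x), dtype=np.float64)
    jac = np.zeros((y.size, x.size), dtype=np.float64)
    for j in range(x.size):
        step = np.zeros_like(x)
        step.flat[j] = eps
        # Column j holds the derivative of every output with respect to x[j].
        jac[:, j] = (f(x + step) - f(x - step)).ravel() / (2.0 * eps)
    return jac

# A deliberately small problem (3 outputs, 4 inputs) to keep the test cheap.
rng = np.random.RandomState(0)
W = rng.standard_normal((3, 4))
x0 = rng.standard_normal(4)

def f(x):
    return np.tanh(W.dot(x))

# Analytical Jacobian of tanh(W x): diag(1 - tanh(W x)^2) * W.
analytic = (1.0 - np.tanh(W.dot(x0)) ** 2)[:, None] * W
numeric = numeric_jacobian(f, x0)
max_abs_err = np.abs(numeric - analytic).max()
print(max_abs_err)
```

Comparing the full Jacobian checks every input-output pair, whereas checking the gradient of a single scalar output can let errors in individual entries cancel.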

Evidence
PyTorch

TensorFlow (code: tensorflow/python/ops)

float64 is widely used.
The Jacobian matrix is used.

lcy-seso mentioned this issue Sep 22, 2017

Current gradient checking logic is too brutal #4134

Closed

shanyi15 closed this as completed Aug 15, 2018
