Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saner error bounds on linear algebra tests #67

Closed
jiahao opened this issue Jan 29, 2014 · 28 comments
Closed

Saner error bounds on linear algebra tests #67

jiahao opened this issue Jan 29, 2014 · 28 comments
Labels
test This change adds or pertains to unit tests

Comments

@jiahao
Copy link
Member

jiahao commented Jan 29, 2014

In the current linear algebra test suite, many of the linear algebra tests have error bounds that are heuristically implemented, e.g. something like N^2*eps(). This occasionally causes brittleness in that the linear algebra tests fail for no apparent reason.

The purpose of this issue is to track

  • Specific linear algebra tests which are brittle and have to had their tolerances adjusted manually for no apparent reason
  • The brittleness of the magic seed that is currently used in the linear algebra test suite.
  • Thoughts on an appropriate fuzz factor for linear algebraic tests
  • The implementation of proper error bounds on linear algebra tests.

The long term hope is that the last action item will close this issue once and for all, but in the meantime we should keep track of any heuristic adjustments to the test suite.

Cross-reference: JuliaLang/julia#5578 JuliaLang/julia#5472 #33

@staticfloat
Copy link
Member

Shouldn't all error bounds be in terms of eps and the number of instructions that could cause an error of eps in the output? I can't think of another way to determine the error bounds, other than just running our tests with a large sample pool and measuring them. But that seems rather self-defeating, as we want to ensure our tests are accurate, and logically our operations should have some finite, calculable bound. Perhaps our reasoning is wrong on some of these bounds?

@jiahao
Copy link
Member Author

jiahao commented Jan 29, 2014

Our reasoning is wrong on almost all of these bounds; they are either too strict or too lenient (and sometimes both), and the ones for which the error bounds are too strict occasionally break when the randomly sampled matrices change, which causes the test suite to die.

@rickhg12hs
Copy link

Error update (XRef #5472):

$ ./julia -e 'versioninfo()'
Julia Version 0.3.0-prerelease+1347
Commit 896dcc9* (2014-01-31 12:34 UTC)
Platform Info:
  System: Linux (i686-redhat-linux)
  CPU: Genuine Intel(R) CPU           T2250  @ 1.73GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

$ make test-linalg
    JULIA test/linalg
     * linalg
exception on 1: ERROR: assertion failed: |:(*(triu(a),x)) - b| <= 0.0017881393432617188
  :(*(triu(a),x)) = 4.000061    .99801636
4.9999237       3.0001984
3.0004425       5.000412
5.000763        2.9999542
1.0000076       3.0000153
5.0000305       3.0000916
1.9999999       3.9999993
2       4
3       4
3       4

  b = 4 1
5       3
3       5
5       3
1       3
5       3
2       4
2       4
3       4
3       4

  difference = 0.0019836426 > 0.0017881393432617188
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:183
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
ERROR: assertion failed: |:(*(triu(a),x)) - b| <= 0.0017881393432617188
  :(*(triu(a),x)) = 4.000061    .99801636
4.9999237       3.0001984
3.0004425       5.000412
5.000763        2.9999542
1.0000076       3.0000153
5.0000305       3.0000916
1.9999999       3.9999993
2       4
3       4
3       4

  b = 4 1
5       3
3       5
5       3
1       3
5       3
2       4
2       4
3       4
3       4

  difference = 0.0019836426 > 0.0017881393432617188
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:183
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
while loading /usr/local/src/julia/julia/test/runtests.jl, in expression starting on line 23

make[1]: *** [linalg] Error 1
make: *** [test-linalg] Error 2


@mschauer
Copy link
Contributor

Thanks, exactly what I thought. This is JuliaLang/julia#5526 , just made a comment there.

@andreasnoack
Copy link
Member

Okay. Short term fix is that I install a 32 bit VB and adjust these.

@mschauer
Copy link
Contributor

VB will be overkill, see my comment there.

@andreasnoack
Copy link
Member

I'll continue the conversation from #70 here.

Most of the linear algebra tests look superficially very similar and fail similarly, which is a problem.

Can you elaborate on that?

Better error bounds would be better, but the tests have actually worked pretty well for a long time. The Recently, I included Float16 in the tests but because they get promoted before the testing, the criterion in @test_approx_eq calculated too small bounds. Therefore, I had to make the ad hoc bounds that are causing all the problems now.

There are many tests and it will take some time to write correct error bounds for all of them. Hence, as a short term solution I don't think it is a problem just to widen the error bounds and get the tests to pass again. The primary reason for the tests is to detect when we break something.

@stevengj
Copy link
Member

stevengj commented Feb 7, 2014

As I commented in JuliaLang/julia#5705, the root problem here is not the error bounds, but rather the fact that the tests use a random matrix. Eventually, you will always hit a badly conditioned matrix by accident.

I would just use a deterministic matrix for the tests. Then you know how accurately the solution should be computed, and can set error bounds relative to that once and for all.

@mschauer
Copy link
Contributor

mschauer commented Feb 7, 2014

I am not sure I understand you, the rng is seeded and will produce always the same matrices in the test. srand(32);rand(Float64,2,2) is a deterministic matrix barring changes in the random number generator.

@stevengj
Copy link
Member

stevengj commented Feb 7, 2014

@mschauer, I didn't notice that the RNG has a deterministic seed in the tests. Nevermind then, that shouldn't be a problem.

@jiahao
Copy link
Member Author

jiahao commented Feb 7, 2014

I really shouldn't have spent so much time on this, but I've just uploaded an IJulia notebook JIN5075 (nbviewer link) which does some numerical testing to find out how brittle the test case is when applied to many random matrices, and I've also done some testing to adapt the textbook forward and error bounds into tests that can be used to replace the magic number test in JuliaLang/julia#5075.

I hope the notebook will explain clearly why I am not comfortable with continually adjusting the magic numbers. It is just pushing the issue down the road: what if we one day reorder the tests, or someone inserts a test before it, causing the test to sample a completely new random matrix?

cc: @andreasnoackjensen

@staticfloat
Copy link
Member

Thanks for the writeup, @jiahao. I think I understand better now. Also, I put in an nbviewer link for those of us too lazy to copy the notebook into our own IJulia setup.

@jiahao
Copy link
Member Author

jiahao commented Feb 7, 2014

Thanks, I should remember to do that.

@StefanKarpinski
Copy link
Member

Well, I'm glad we have a random matrix expert or two around.

@jiahao
Copy link
Member Author

jiahao commented Feb 7, 2014

I'm glad I'm useful for something other than supplying snacks for @JeffBezanson and @loladiro .

@mschauer
Copy link
Contributor

mschauer commented Feb 7, 2014

Thank you, for the notebook jiahao. You note that changing a magic number like in 5705 "decreases the probability of failure by an essentially negligible amount". This was a bit the point of the change from my point of view, it doesn't make the tests more or less brittle, it was rather the smallest change to make the test in 32 bit systems to behave similar to the 64 systems until a real solution is found and such that a variant of JuliaLang/julia#5578 can be merged without breaking everything.

@mschauer
Copy link
Contributor

mschauer commented Feb 7, 2014

One idea: Instead of plain random matrices one can use for testing fixed diagonal matrices with a random orthogonal basis transform that preserves the eigenvalues and condition numbers so it is easier to reason about reasonable error bounds.

@rickhg12hs
Copy link

Error update (Xref JuliaLang/julia#5472 ):

$ ./julia -E 'versioninfo()'
Julia Version 0.3.0-prerelease+1419
Commit a673e4c* (2014-02-06 22:55 UTC)
Platform Info:
  System: Linux (i686-redhat-linux)
  CPU: Genuine Intel(R) CPU           T2250  @ 1.73GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm
nothing
$ make test-linalg   # debug=true in test/linalg.jl
    JULIA test/linalg
     * linalg

[lots of passed tests ... and then ...]

type of a: Int32 type of b: Float16

(Automatic) upper Cholesky factor
exception on 1: ERROR: assertion failed: |:(det(capd)) - :(det(apd))| <= 0.002384185791015625
  :(det(capd)) = 1.626428240984628e9
  :(det(apd)) = 1.6264282409980435e9
  difference = 0.01341557502746582 > 0.002384185791015625
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:54
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
ERROR: assertion failed: |:(det(capd)) - :(det(apd))| <= 0.002384185791015625
  :(det(capd)) = 1.626428240984628e9
  :(det(apd)) = 1.6264282409980435e9
  difference = 0.01341557502746582 > 0.002384185791015625
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:54
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
while loading /usr/local/src/julia/julia/test/runtests.jl, in expression starting on line 23

make[1]: *** [linalg] Error 1
make: *** [test-linalg] Error 2
$ 

@jiahao
Copy link
Member Author

jiahao commented Feb 7, 2014

@mschauer I didn't mean to criticize the intention of JuliaLang/julia#5705, but rather to point out that the smallest change to make the test pass on your system didn't fundamentally change the state of affairs. I think those of us actively discussing these issues understand this quite well, but the point of the numerical experiment was to quantify exactly what the situation looked like and hopefully give others a sense of why this is important.

@jiahao
Copy link
Member Author

jiahao commented Feb 7, 2014

@mschauer we could generate matrices from random orthogonal/unitary similarity transformations of diagonal matrices, that is true, but those matrix multiplies don't come for free, and the the tests would still have to account for the roundoff induced by those multiplications. I haven't thought through this case yet, but it is certainly plausible that in this particular situation the numerical stability of the rotation matrices would lead to nice behavior.

@mschauer
Copy link
Contributor

mschauer commented Feb 7, 2014

The idea would be to take the result of the similarity transform "as is", that is as the random matrix we are testing with and basically forget about the way it was constructed. That doesn't change much about the integer cases though.

@andreasnoack
Copy link
Member

@jiahao I have followed your example and spent too much time on making a notebook with a reply.

http://nbviewer.ipython.org/gist/andreasnoackjensen/8882742

The answer is also related to JuliaLang/julia#5726

@jiahao
Copy link
Member Author

jiahao commented Feb 8, 2014

Thanks for the reply @andreasnoackjensen . A couple of points:

  1. I'm proposing to consider the test matrix as arbitrary rather than random. (pet peeve alert) There are actually quite a lot of tighter error bounds and results concerning specifically the Gaussian random matrices (quite a few from @alanedelman), which are our current test subjects, but I do not consider them to be relevant to the issue of testing linear algebra.
  2. The condition number is not known at the time of the computation, but it of course can be computed, and in fact can be done so fairly accurately for Float64. I've found that at lower precisions, this gets a little dicey. I used BigFloat but it is definitely overkill; explicitly casting to Float64 or Complex128 would almost certainly suffice.
  3. Note that the error bound involving Skeel condition numbers is structurally that of the constant γn = n*ε/(1-n*ε). In the derivation, Higham assumes that is always positive, which was not a bad assumption when only single and double precision were worth considering. I rederived this error bound last night (so take late night mathematics for what it is worth) and it looks like it is sufficient to just take the absolute value of the error bound (although technically this involves loosening the inequality bound yet further). This is implemented in Replace magic constant test for linear solve on triangular matrices julia#5726.
  4. I left this out of the version of the notebook I uploaded, but I can't find any way to justify an error bound that looks like the 'old' version being tested. The lhs of the current test is abs(norm(A*x)-norm(b)), which is exactly the lower bound of the triangle inequality as applied to the residual vector r=A*x-b, i.e. abs(norm(A*x)-norm(b)) <= norm(A*x-b) <= norm(A*x) + norm(b). Maybe it's possible to bound the leftmost size of the inequality, but to me it feels a lot more natural to just bound the residual itself.
  5. The reason why the error bounds on the forward error bound are loose is actually known; it comes from trying to abstract away the order in which the substitutions are done. (Higham, §§ 8.1-2 sketches out the derivation in considerable detail.) I was surprised, though, at how much slack this introduced into the error bound in the end. We could tighten the bounds considerably, at the cost of tying the test to a specific implementation of the linear solver.
  6. Your observation that sampling the solution vector and reconstructing the rhs a posteriori is quite interesting. I don't think I've seen anyone propose that before. The sampling does look much more reasonable doing it this way. (The gap between the data and the bound is to be expected, though, since the bound demarcates the region of zero probability and the gap is probably the region expected from possible, but very unlikely, samples.)

Right now I still think that where possible, we should use theoretically justified error bounds. But adapting the textbook results has turned out to be somewhat tricker than I had expected. The test derived from the backward error bound is easy to reason about and I think it's correct. Conversely, the forward error test is tricky; the correct way to implement my version of the forward error bound test ought to account also for the error of the more precise estimator for the solution. For BigFloats I currently assume that the contribution to the error is exactly zero, when for BigFloat, γn~2e-76. There has been a lot of literature on mixed precision calculations, which could be worth digging into, but I haven't yet unearthed results specific to the forward error bound between two estimated solutions of different precisions.

I smell an excellent paper about linear algebra testing if we want to pursue this further.

@alanedelman
Copy link
Contributor

Reading real quickly now.

Lots of linear algebra error bounds:
http://www.netlib.org/lapack/lug/node72.html

Probably many are pessimistic

I have a preference for eps * (the right condition number) * (# ops/number) * (3,4,or 10)

lapack has lots of discussion about testing, good test matrices, etc
not that things can't be improved, but decades of good thinking has happened already

random matrices, i feel, are good for a few tests, but not overused

something i mentioned and often gets quoted:

We want to convey that random matrices are very special matrices. It is a mistake to link psychologically a random matrix with the intuitive notion of a ‘typical’ matrix or the vague concept of ‘any old matrix’. In fact, the larger the size of the matrix the more predictable it becomes.

@andreasnoack
Copy link
Member

@jiahao

  1. If the distribution has unbounded support and any test failure is problematic then I think arbitrariness and randomness can be considered the same. However, I see your point, we don't want to exploit probability results in these tests.
  2. I guess you only know an estimate of the condition number, but with a negligible error, right? In theory, for a very tight bound the inequality could be violated because of the estimation error in the condition number. This is not important, but I just wanted to differentiate knowns and unknowns.
  3. Causing big frustration to me, my library account doesn't allow me to read SIAM material, so I haven't read Higham. With that reservation, I don't think γ is the problem in itself. Rather it is the 1-cond(A)γ denominator and I don't think the absolute value will suffice as a solution when the matrices are very ill conditioned. Please have a look here.
  4. I completely agree. I don't remember why it was like that before. The revisions of test_approx_eq I have done, have been completely theory ignorant.
  5. This has been a concern to me. There is a size/power trade off. If the bounds get too loose, we avoid false positives at the cost of not detecting true errors.
  6. I am a little surprised as I thought it was the most obvious thing to do if you wanted to measure the error. I agree that a gap is to be expected. The important thing is the correlation ensuring that the error and bound moves together when the condition number gets big. You don't have that effect in the BigFloat solution and I think that is problematic.

Lets read more of the LAPACK manual and the LAWNs and take whats there.

@jiahao
Copy link
Member Author

jiahao commented Feb 11, 2014

I guess you only know an estimate of the condition number, but with a negligible error, right?

At some level, the distinction between that and the best possible answer that can be computed within floating-point roundoff becomes meaningless. For the dense matrix case, I think cond computes the full SVD and takes the ratio of the largest to smallest condition number, which is by definition the "classical" condition number. SVD itself isn't always that stable numerically...

@KristofferC
Copy link
Member

With the guardsrand changes and the inactivity here, I think this can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test This change adds or pertains to unit tests
Projects
None yet
Development

No branches or pull requests

9 participants