Decaying() Returns Unexpected Values #3447

Closed
SamuelLKane opened this issue Mar 20, 2019 · 5 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@SamuelLKane
Contributor

How to reproduce the behavior

Run the following code in any project that includes spaCy:

from spacy import util

# Example taken from the spaCy docs for util.decaying
sizes = util.decaying(1., 10., 0.001)

size = next(sizes)
print(size)
assert size == 1.  # fails here: the first value is not 1.
size = next(sizes)
print(size)
assert size == 1. - 0.001
size = next(sizes)
print(size)
assert size == 1. - 0.001 - 0.001

This is a direct test of the example provided in the spaCy docs for util.decaying. It will fail on the first assertion.

Additionally, the example shows an impossible sequence: this is a decaying series, yet the start value (1) is smaller than the end value (10). If you swap the start and end values, you do get a sequence that never decays below the end value.
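To see why the first assertion fails, here is a minimal sketch of the pre-fix behavior. It is reconstructed from the numbers quoted in this thread (every "old" value matches start / (1 + decay * t) for t = 1, 2, 3, ..., clipped at stop), so `old_decaying` is a hypothetical stand-in, not spaCy's actual source.

```python
from itertools import islice
from typing import Iterator


def old_decaying(start: float, stop: float, decay: float) -> Iterator[float]:
    """Hypothetical reconstruction of the pre-fix util.decaying.

    Each value is start / (1 + decay * t) for t = 1, 2, 3, ..., clipped so
    it never crosses `stop`. Because t starts at 1, even the very first
    value is already below `start`.
    """
    def clip(value: float) -> float:
        # Clamp towards `stop` from whichever side `start` begins on.
        return max(value, stop) if start > stop else min(value, stop)

    t = 1
    while True:
        yield clip(start / (1.0 + decay * t))
        t += 1


first_three = list(islice(old_decaying(1.0, 10.0, 0.001), 3))
print(first_three)
```

Under this reconstruction the first value is 1 / 1.001 ≈ 0.999 rather than 1.0, which is exactly what `assert size == 1.` trips over.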

Looking at the actual series, you can see that it does not decay by 0.001 per step but by some other, smaller amount, which I suspect is lost to floating point math. This eventually leads to nearly duplicate values in the series:

These two values are adjacent in the series defined by
decaying(1., 10., 0.001):

0.8257638315441783
0.8250825082508251
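A quick arithmetic check, assuming the pre-fix values follow start / (1 + decay * t) (a formula inferred from the numbers in this thread, not quoted from spaCy's source): the two values above correspond to t = 211 and t = 212, and the gap between them is about 0.00068 rather than 0.001.

```python
# Hypothetical check, assuming the pre-fix values follow
# start / (1 + decay * t) with start=1.0 and decay=0.001.
start, decay = 1.0, 0.001


def value_at(t: int) -> float:
    """Value at (1-based) step t under the assumed formula."""
    return start / (1.0 + decay * t)


a, b = value_at(211), value_at(212)
print(a, b)   # the two adjacent values quoted in the issue, up to rounding
print(a - b)  # gap between them: roughly 0.00068, not 0.001

# Under this formula the step size is roughly start * decay / (1 + decay * t)**2,
# so it keeps shrinking as t grows.
```

If this formula is right, the shrinking gap comes from the shape of the schedule itself rather than from rounding.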

There is also a problem with how the decay factor is applied. With a larger factor, the results are nonsensical: the first value is already about 6.90 instead of 10, and the steps are nowhere near 0.45:

dropout = decaying(10., 1., 0.45)
6.8965517241379315
5.2631578947368425
4.25531914893617
3.5714285714285716
3.076923076923077
2.7027027027027026
2.4096385542168672
2.173913043478261
1.9801980198019802
1.8181818181818181
1.680672268907563
1.5625
1.4598540145985401
1.36986301369863
1.2903225806451613
1.2195121951219514
1.1560693641618496
1.098901098901099
1.0471204188481675
1.0
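The listed numbers fit a hyperbolic rather than a linear schedule. Assuming each value is x_t = x_0 / (1 + d t) with t counting from 1 (a formula inferred from the output above, not quoted from spaCy's source), the first value and the step at which the series reaches the end value work out as follows, compared with what a linear schedule would give:

```latex
\begin{aligned}
x_t &= \frac{x_0}{1 + d\,t}, \qquad t = 1, 2, 3, \dots \\
x_1 &= \frac{10}{1 + 0.45} = \frac{10}{1.45} \approx 6.8966 \\
x_{20} &= \frac{10}{1 + 0.45 \cdot 20} = \frac{10}{10} = 1.0 = x_{\mathrm{stop}} \\
x_t^{\mathrm{linear}} &= x_0 - d\,(t - 1) = 10,\ 9.55,\ 9.10,\ \dots
\end{aligned}
```

So the schedule takes 20 steps to get from about 6.9 down to 1.0, with the step size shrinking from about 1.63 to about 0.05.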

I raised this issue on Twitter in this thread. I am mainly opening this issue so I can submit the PR, per the contribution guidelines.

Your Environment

  • spaCy version: 2.0.18
  • Platform: Darwin-18.2.0-x86_64-i386-64bit (macOS Mojave 10.14.3)
  • Python version: 3.7.1
  • Models: en
@ines ines added the bug Bugs and behaviour differing from documentation label Mar 20, 2019
@honnibal
Member

Thanks for this!

I was sure I replied to that comment on Twitter, but I don't see it there. I guess I must not have. Maybe I lost connection after typing the tweet. Sorry!

Anyway, I would've encouraged you to open an issue, and noted that this type of thing is one of my weaker points.

@ines
Member

ines commented Mar 21, 2019

@honnibal Here it is btw – Twitter just makes it difficult to find nested replies: https://twitter.com/honnibal/status/1100848503759216640

@SamuelLKane
Contributor Author

I have pushed a branch and opened a PR (slk/issue#3447); you can see the change I made to util.py. It brings the calculation of the linear series more in line with how the other methods calculate and return a series (e.g. compounding, stepping), and it simplifies the logic so that the result is a truly linear series.
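The linear series described here can be sketched as a generator that starts at `start`, subtracts `decay` on every step, and is clamped so it never goes below `stop`. `linear_decaying` is an illustrative name, and this is a sketch based on that description rather than a copy of the merged util.py change.

```python
from itertools import islice
from typing import Iterator


def linear_decaying(start: float, stop: float, decay: float) -> Iterator[float]:
    """Illustrative linear schedule: start, start - decay, start - 2*decay, ...

    Values are clamped so they never fall below `stop` (assumes start >= stop).
    """
    curr = float(start)
    while True:
        yield max(curr, stop)
        curr -= decay  # running total: rounding errors can build up over steps


print(list(islice(linear_decaying(10.0, 1.0, 0.45), 3)))
```

Because each value comes from repeatedly subtracting `decay` from a running total, small rounding errors can accumulate from one step to the next.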

I think this addresses the core problem with how the method was originally defined, but it doesn't address the floating point inaccuracy I mentioned above: the new values drift slightly (e.g. 9.998000000000001, 9.980000000000011). Here are the outputs of my version of decaying (new) and the current version (old).

decaying(10., 1., .001)

old new
9.990009990009991 10.0
9.980039920159681 9.999
9.970089730807578 9.998000000000001
9.9601593625498 9.997000000000002
9.950248756218906 9.996000000000002
9.940357852882704 9.995000000000003
9.9304865938431 9.994000000000003
9.920634920634921 9.993000000000004
9.910802775024777 9.992000000000004
9.900990099009901 9.991000000000005
9.891196834817015 9.990000000000006
9.881422924901186 9.989000000000006
9.87166831194472 9.988000000000007
9.861932938856016 9.987000000000007
9.852216748768473 9.986000000000008
9.84251968503937 9.985000000000008
9.832841691248772 9.984000000000009
9.823182711198427 9.98300000000001
9.813542688910697 9.98200000000001
9.803921568627452 9.98100000000001
9.794319294809013 9.980000000000011
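One way to avoid the accumulated rounding visible in the "new" column is to compute each value directly from the step index, start - decay * t, instead of subtracting from a running total. This is a hypothetical alternative (`indexed_decaying` is an illustrative name), not the change that was merged; each value still carries the rounding of one multiply and one subtract, so it is not exact either, but the error no longer grows with t.

```python
from itertools import count, islice
from typing import Iterator


def indexed_decaying(start: float, stop: float, decay: float) -> Iterator[float]:
    """Linear decay computed from the step index instead of a running total.

    Each value is start - decay * t for t = 0, 1, 2, ..., clamped at `stop`
    (assumes start >= stop), so rounding error does not build up across steps.
    """
    for t in count():
        yield max(start - decay * t, stop)


values = list(islice(indexed_decaying(10.0, 1.0, 0.001), 21))
print(values[:3])
print(values[-1])  # 21st value, nominally 9.98
```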

@honnibal
Member

Merged, thanks!

@lock

lock bot commented Apr 29, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Apr 29, 2019