Doc2Vec.infer_vector learning rate decays extremely fast (non-linearly) #2061

Closed
umangv opened this issue May 25, 2018 · 2 comments
Labels
bug Issue described a bug

Comments

@umangv
Contributor

umangv commented May 25, 2018

I am working with a corpus of very short documents and noticed that the inferred vectors for the same document were very different.

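from scipy.spatial.distance import pdist, squareform

# d2vmod is an already-trained gensim Doc2Vec model
testdoc = "This is a small sample document."
# Infer a vector for the same document five times
vectors = [d2vmod.infer_vector(testdoc) for _ in range(5)]
# Pairwise cosine distances between the five inferred vectors
squareform(pdist(vectors, "cosine"))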
array([[0.        , 0.05987812, 0.06183155, 0.06931093, 0.05466599],
       [0.05987812, 0.        , 0.03724874, 0.05006329, 0.04789369],
       [0.06183155, 0.03724874, 0.        , 0.04771786, 0.05983109],
       [0.06931093, 0.05006329, 0.04771786, 0.        , 0.0367826 ],
       [0.05466599, 0.04789369, 0.05983109, 0.0367826 , 0.        ]])
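
To compare runs with a single number rather than a full matrix, the pairwise distances can be averaged. The following is a minimal sketch in plain Python, not part of gensim or SciPy; `cosine_distance` and `mean_pairwise_distance` are hypothetical helper names, and the vectors are assumed to be non-zero sequences of floats.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v), the same quantity pdist(..., "cosine") reports."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_distance(vectors):
    """Average cosine distance over all unordered pairs of vectors (hypothetical helper)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)
```

For five inferred vectors this averages the ten distinct off-diagonal entries of the matrix; lower values mean more consistent inference.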

More training steps make things worse in this case:

vectors = [d2vmod.infer_vector(testdoc, 10000) for _ in range(5)]
squareform(pdist(vectors, "cosine"))
array([[0.        , 0.27392197, 0.308742  , 0.51374501, 0.45744246],
       [0.27392197, 0.        , 0.14912033, 0.32902151, 0.1822687 ],
       [0.308742  , 0.14912033, 0.        , 0.2895444 , 0.27019636],
       [0.51374501, 0.32902151, 0.2895444 , 0.        , 0.38096254],
       [0.45744246, 0.1822687 , 0.27019636, 0.38096254, 0.        ]])

Note: This is more extreme than what I'm seeing with more domain-specific sample documents, where the vectors start to become more consistent after about 5000 steps.

I believe this is happening because the learning rate decays extremely rapidly:
https://github.com/RaRe-Technologies/gensim/blob/8b810918d59781116794a6679999afdc76b857ef/gensim/models/doc2vec.py#L565

alpha = 0.025
min_alpha = 0.001
steps = 100
for i in range(steps):
    print(alpha)
    alpha = ((alpha - min_alpha) / (steps - i)) + min_alpha
0.025
0.00124
0.0010024242424242424
0.0010000247371675943
...

Notice that alpha is very close to min_alpha after the first step, and the effect becomes even more pronounced as the number of steps grows.
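
Unrolling the recurrence shows why the collapse is so fast: each step divides the remaining gap to min_alpha by (steps - i), so after k steps the gap has shrunk by a factor of steps * (steps - 1) * ... * (steps - k + 1). The sketch below checks this numerically. `buggy_schedule` and `closed_form` are hypothetical names used only for illustration, and the recurrence is the one quoted above, not a copy of gensim's source.

```python
import math


def buggy_schedule(alpha, min_alpha, steps):
    """Learning rate used at each step under the recurrence quoted in the issue."""
    rates = []
    for i in range(steps):
        rates.append(alpha)
        alpha = ((alpha - min_alpha) / (steps - i)) + min_alpha
    return rates


def closed_form(alpha0, min_alpha, steps, k):
    """Rate at step k: the initial gap divided by steps * (steps - 1) * ... * (steps - k + 1)."""
    shrink = math.factorial(steps) // math.factorial(steps - k)
    return min_alpha + (alpha0 - min_alpha) / shrink


rates = buggy_schedule(0.025, 0.001, 100)
# The unrolled formula agrees with the step-by-step recurrence
agrees = all(
    abs(rates[k] - closed_form(0.025, 0.001, 100, k)) < 1e-12 for k in range(100)
)
```

With steps = 100, the gap is divided by 100 after one step and by 9900 after two, which matches the printed values 0.00124 and 0.0010024...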

When I change Doc2Vec to have a linear decay in learning rate

alpha_delta = (alpha-min_alpha)/(steps-1)
for i in range(steps):
    # ...
    alpha -= alpha_delta

I get much better results. With 20 steps, the pairwise cosine distances are:

array([[0.        , 0.01617053, 0.02467067, 0.01828433, 0.01834735],
       [0.01617053, 0.        , 0.01879757, 0.00910884, 0.01358116],
       [0.02467067, 0.01879757, 0.        , 0.01521225, 0.01392789],
       [0.01828433, 0.00910884, 0.01521225, 0.        , 0.01121792],
       [0.01834735, 0.01358116, 0.01392789, 0.01121792, 0.        ]])

With 100 steps:

array([[0.        , 0.00282428, 0.00373375, 0.00331408, 0.00362875],
       [0.00282428, 0.        , 0.0036147 , 0.0028999 , 0.00210812],
       [0.00373375, 0.0036147 , 0.        , 0.0032986 , 0.00361321],
       [0.00331408, 0.0028999 , 0.0032986 , 0.        , 0.00318849],
       [0.00362875, 0.00210812, 0.00361321, 0.00318849, 0.        ]])

And with 1000 steps:

array([[0.        , 0.00055459, 0.000633  , 0.00074271, 0.00036596],
       [0.00055459, 0.        , 0.00067211, 0.00075522, 0.00058975],
       [0.000633  , 0.00067211, 0.        , 0.00109709, 0.00049239],
       [0.00074271, 0.00075522, 0.00109709, 0.        , 0.00072527],
       [0.00036596, 0.00058975, 0.00049239, 0.00072527, 0.        ]])
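
The linear schedule described above can also be written as a standalone function for experimentation. This is a sketch under assumptions, not the patch applied to gensim: `linear_schedule` is a hypothetical name, the rates are computed by multiplication instead of repeated subtraction to limit rounding drift, and the guard for steps < 2 is an addition that avoids dividing by zero.

```python
def linear_schedule(alpha, min_alpha, steps):
    """Rates that fall by a constant amount per step, from alpha down to min_alpha."""
    if steps < 2:
        # Hypothetical guard: with a single step there is nothing to decay
        return [alpha] * steps
    alpha_delta = (alpha - min_alpha) / (steps - 1)
    return [alpha - i * alpha_delta for i in range(steps)]


linear_rates = linear_schedule(0.025, 0.001, 100)
```

The first rate equals the starting alpha and the last one reaches min_alpha, so every step contributes a meaningful update instead of only the first.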
@gojomo
Collaborator

gojomo commented Jun 11, 2018

Wow, that's a humongous bug going back to my initial implementation of this 3+ years ago!

It should have been linear from the start, and I'm surprised inference has worked as well as it has, with this error.

Thanks for finding this!

@umangv
Contributor Author

umangv commented Jun 11, 2018

No problem! I stumbled on it by accident and I'm glad I caught it. My guess is that this problem is far more pronounced for shorter documents.

gojomo added the bug label Jun 13, 2018