
LDA hyperparameter fix: eta dimensionality and optimization #1024

Merged: 18 commits into piskvorky:develop, Nov 29, 2016

Conversation

@olavurmortensen (Contributor) commented Nov 18, 2016

I'm working on a problem with the parameter eta and the automatic learning of eta. For now, the unchanged version is kept in ldamodelold.py so that I can compare both versions. There is a notebook in docs/notebooks/lda_tests.ipynb with the tests I have been running.

The changes I've made improve convergence (the bound increases over iterations) when learning eta.

The problem

Basically, eta is supposed to be either

  • a scalar
  • a V vector

where V is the size of the vocabulary. In ldamodel, we instead have the options that eta is

  • a scalar
  • a K vector
  • a K x V matrix

where K is the number of topics. As we will see in a little bit, this causes some problems.
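
For concreteness, here is a minimal sketch of the shapes involved (`num_topics` and `num_terms` are illustrative values; the variable names mirror LdaModel's parameters):

```python
import numpy as np

num_topics = 10   # K, number of topics
num_terms = 5000  # V, vocabulary size

# What eta is supposed to be: one smoothing value per vocabulary word,
# shared across topics.
eta_scalar = 0.01                      # symmetric prior
eta_vector = np.full(num_terms, 0.01)  # shape (V,)

# What ldamodel accepted instead: a value per *topic*, which does not
# match the topic-word Dirichlet that eta parameterizes.
eta_per_topic = np.full(num_topics, 0.01)  # shape (K,) -- problematic
```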

Results so far

The figure below shows the bound (per-word bound) as a function of the number of passes over the data. It shows the bound both before and after the changes I made, for eta='symmetric' and for eta='auto'.

[Figure: per-word bound vs. number of passes, before and after the fix, for eta='symmetric' and eta='auto'.]

The figure shows that using an automatically learned eta actually decreased the bound before this fix.

I also noticed that if we set iterations=1 and eta='auto', the algorithm diverges. This is no longer a problem with this commit.

Asymmetric priors

I added a check that raises a ValueError if eta is equal to 'asymmetric'. While an explicit asymmetric prior supplied by the user makes sense (e.g. if the user wants to boost certain words, or has estimated the prior from data), there is no reason to use an arbitrarily initialized asymmetric prior. Furthermore, with the current method of initializing asymmetric priors, when eta is the same size as the vocabulary, I noticed some convergence problems.

As a side note on the topic of asymmetric priors, I'm not sure the current method is ideal. One could, for example, initialize from a gamma distribution, the same way lambda and gamma are initialized.
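
For illustration, a minimal sketch of what such an initialization could look like, assuming the Gamma(100, 1/100) parameters gensim uses when drawing the initial lambda (applying them to eta here is an assumption, not code from the PR):

```python
import numpy as np

rng = np.random.RandomState(0)
num_terms = 5000  # illustrative vocabulary size

# Hypothetical alternative: draw each entry of an asymmetric eta from
# Gamma(100, 1/100), as gensim does for its initial lambda. Entries are
# then close to 1 with small random variation.
eta_init = rng.gamma(100., 1. / 100., (num_terms,))
```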

@olavurmortensen (Contributor Author) commented Nov 18, 2016

There is one problem that remains. The symmetric priors are initialized as

```python
init_prior = np.asarray([1.0 / self.num_topics for i in xrange(prior_shape)])
```

instead of

```python
init_prior = np.asarray([1.0 / prior_shape for i in xrange(prior_shape)])
```

because if 1.0 / prior_shape is used, the optimization does not converge: when the prior is a small number (e.g. eta = 0.001), it diverges. This does not happen with e.g. alpha = 0.001, however.

While this works fine, I'm not sure if this way of initializing the priors is sound in general.

EDIT: Since Matt Hoffman's code (which Gensim's code is based on) uses eta = 1 / num_topics and alpha = 1 / num_topics, I gather we might as well do the same here (see Matt's code here).
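
For reference, a small sketch of the resulting defaults (illustrative sizes; this just restates the 1 / num_topics convention discussed above):

```python
import numpy as np

num_topics, num_terms = 10, 5000  # illustrative

# Following Matt Hoffman's onlineldavb: both priors default to
# 1 / num_topics, even though eta has length num_terms.
alpha_init = np.full(num_topics, 1.0 / num_topics)
eta_init = np.full(num_terms, 1.0 / num_topics)
```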

Commit (title truncated): …s,)'. Removed tests of asymmetric eta, and where eta has shape '(num_topics, num_terms)'.
@olavurmortensen (Contributor Author):

PR ready for review.

@tmylk @piskvorky

@hazelybell commented:

I agree that this should be changed. When I wrote the original auto_eta code, I didn't realize that gensim was using the smoothed LDA model, so I misinterpreted the variables, and my code doesn't make sense in the context of the smoothed LDA model.

@olavurmortensen (Contributor Author):

The failing unit tests are not caused by my code. There is more on that in issue #971.

@tmylk (Contributor) left a comment:


Please re-do the merge with develop to make sure keyedvectors get properly merged in.

```diff
@@ -0,0 +1,762 @@
+{
```
Contributor:

Please give a better name to the ipynb

Contributor Author:

Once we are ready to merge, I'll remove this file.

```diff
@@ -7,6 +7,7 @@
 from .coherencemodel import CoherenceModel
 from .hdpmodel import HdpModel
 from .ldamodel import LdaModel
+from .ldamodelold import LdaModelOld
```
Contributor:

when we merge, there is no need to keep the old code.

Contributor Author:

Once we are ready to merge, I'll remove this file.

```diff
@@ -84,7 +84,7 @@ def update_dir_prior(prior, N, logphat, rho):

     dprior = -(gradf - b) / q

-    if all(rho * dprior + prior > 0):
+    if (rho * dprior + prior > 0).all():
```
Contributor:

why this change?

Contributor Author:

The old way has never worked for me. Whenever I install Gensim from source I have to change this line. Can we just keep it this way?

Contributor:

what is the error that you get? which python version?

Contributor Author:

I just tested it, and I don't experience this problem any more. Something I've done must have fixed it for me. So I just changed it back.
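
As background on why the two spellings differ at all (an illustration, not part of the PR): Python's builtin all() iterates over the array, which happens to work for a 1-D boolean array but raises for 2-D arrays, whereas ndarray.all() reduces over every element regardless of shape:

```python
import numpy as np

cond_1d = np.array([0.1, 0.2]) > 0
print(all(cond_1d))    # True: builtin all() iterates over boolean scalars
print(cond_1d.all())   # True: numpy reduction, same result for 1-D input

cond_2d = np.ones((2, 3)) > 0
print(cond_2d.all())   # True: reduces over all six elements
# all(cond_2d) would raise ValueError: iteration yields whole rows, and
# the truth value of a multi-element row is ambiguous.
```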

```diff
@@ -301,17 +303,19 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
         self.minimum_phi_value = minimum_phi_value
         self.per_word_topics = per_word_topics

+        self.random_state = get_random_state(random_state)
```
Contributor:

why move this?

Contributor Author:

Oh, there was a reason, but it's not relevant any more. I'll just move it back to its old spot.

```diff
-        assert (self.eta.shape == (self.num_topics, 1) or self.eta.shape == (self.num_topics, self.num_terms)), (
-            "Invalid eta shape. Got shape %s, but expected (%d, 1) or (%d, %d)" %
-            (str(self.eta.shape), self.num_topics, self.num_topics, self.num_terms))
+        assert self.eta.shape == (self.num_terms,), "Invalid eta shape. Got shape %s, but expected (%d, )" % (str(self.eta.shape), self.num_terms)
```
Contributor:

the old format was ok

Contributor Author:

Ok, re-re-formatted :)

```diff
@@ -0,0 +1,1049 @@
+#!/usr/bin/env python
```
Contributor:

no need for this file in the merge.

Contributor Author:

Once we are ready to merge, I'll remove this file.

```diff
         model = self.class_(**kwargs)
         self.assertEqual(model.eta.shape, expected_shape)
-        self.assertTrue(np.allclose(model.eta, [[0.630602], [0.369398]]))
+        self.assertTrue(all(model.eta == np.array([0.5] * num_terms)))
```
Contributor:

where is the asymmetric eta exception test?

Contributor Author:

The "asymmetric" option should not be used for eta. It makes arbitrary words more likely apriori, which does not make sense. I have added an exception in the ldamodel.py file to make sure that eta is not equal to "asymmetric".

Contributor:

where do you test that the exception happens?
Please add an assertRaises test case.

Contributor Author:

It is tested here. I added an assertRaises in the unit tests.
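
A minimal sketch of what that test looks like, following the conventions of the surrounding test code (self.class_ is the model class under test, kwargs its constructor arguments; the exact test body may differ):

```python
kwargs['eta'] = 'asymmetric'
self.assertRaises(ValueError, self.class_, **kwargs)
```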

Contributor:

that is not a test, it is a check :) just kidding.


```python
        # all should raise an exception for being wrong shape
        kwargs['eta'] = testeta.reshape(tuple(reversed(testeta.shape)))
        self.assertRaises(AssertionError, self.class_, **kwargs)

        kwargs['eta'] = [0.3, 0.3, 0.3]
```
Contributor:

use num_terms+1 instead of just 3 elements

Contributor Author:

Done. Although just num_terms, not num_terms+1.

Contributor:

where is this change? line 213/196 is still the same


@olavurmortensen (Contributor Author) commented Nov 28, 2016

I re-introduced the K x V asymmetric prior. While a K vector eta is not correct, there is a special use case for the K x V prior.

As stated in the docstring for eta, with a K x V asymmetric prior the user may impose specific priors for each topic (e.g. to coerce a specific topic, or to get a "garbage collection" topic).

@tmylk can you read the new docstring, just to check that it's ok?

EDIT: @tmylk I made some unit tests for K x V eta (same as were there before).
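
For illustration, a minimal sketch of building such a K x V prior (the sizes and boosted word ids are made up; corpus and id2word are assumed to exist):

```python
import numpy as np
from gensim.models import LdaModel

num_topics, num_terms = 10, 5000  # illustrative sizes

# Start from a flat prior, then boost a few hand-picked word ids in
# topic 0, coercing that topic toward those words a priori.
eta = np.full((num_topics, num_terms), 0.01)
eta[0, [42, 137, 2048]] = 10.0  # hypothetical word ids to boost

# model = LdaModel(corpus=corpus, id2word=id2word,
#                  num_topics=num_topics, eta=eta)
```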

```diff
         self.assertRaises(AssertionError, self.class_, **kwargs)

-        kwargs['eta'] = [0.3, 0.3, 0.3]
+        kwargs['eta'] = [0.3] * num_terms
```
Contributor:

this is the same as line 188 above

Contributor Author:

Removed it. Don't know why I put that there.

```diff
@@ -210,15 +201,15 @@ def testEta(self):
         kwargs['eta'] = testeta.reshape(tuple(reversed(testeta.shape)))
         self.assertRaises(AssertionError, self.class_, **kwargs)

-        kwargs['eta'] = [0.3, 0.3, 0.3]
-        self.assertRaises(AssertionError, self.class_, **kwargs)
-
         kwargs['eta'] = [0.3]
         self.assertRaises(AssertionError, self.class_, **kwargs)
```
Contributor:

where did the assertRaises for a longer eta shape go?

Contributor Author:

I'll add an assertRaises where the shape of eta is num_terms+1.

@tmylk (Contributor) commented Nov 28, 2016

Let's clean it up and then it's ready to merge.
Please add a good explanation to CHANGELOG.md. I will publish your explanation of this breaking change together with the release notes.

@olavurmortensen (Contributor Author):

@tmylk should the explanation be under "0.13.5, 2016-11-12" in the changelog? Or where should I put it?

@olavurmortensen (Contributor Author):

@tmylk you said:

> Please re-do the merge with develop to make sure keyedvectors get properly merged in.

What do you mean? What happened?

@tmylk (Contributor) commented Nov 28, 2016

The explanation should be at the top of CHANGELOG.md.

@olavurmortensen (Contributor Author):

@tmylk CHANGELOG.md updated.

@tmylk tmylk merged commit 54871ba into piskvorky:develop Nov 29, 2016