Update WMD documentation #3067
Comments
@gojomo do you remember the motivation for forcing normalized vectors in WMD? I vaguely remember some discussion.
There's discussion in #1094, but there's no clear case either way there. (Non-normalized seems closer to the original implementation; some secondhand testimony suggests norming might give better results.) It's hard to think of why you'd want to use euclidean distance on non-normalized vectors, as their magnitudes could then contribute a lot of "difference" that isn't correlated with the more typical cosine-distance comparison. So I suspect defaulting to unit-normed vectors is better-grounded & better-performing, but I don't have an empirical case either way. The discussion as part of the
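To make the magnitude point concrete, here is a minimal pure-Python sketch. The three toy vectors are made up for illustration and this is not Gensim's WMD code; it only compares plain euclidean distance with and without unit-norming. It also checks the standard identity that, for unit vectors, squared euclidean distance equals twice the cosine distance.

```python
import math

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def unit(v):
    """Scale a vector to unit length."""
    n = norm(v)
    return [x / n for x in v]

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm(a) * norm(b))

# Hypothetical toy "word vectors" (not taken from any real model).
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]  # same direction as a, 10x the magnitude
c = [3.0, 2.0, 1.0]     # similar magnitude to a, different direction

# Raw vectors: magnitude dominates, so a looks far from b.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Unit-normed vectors: only direction matters, so a and b coincide.
unit_ab = euclidean(unit(a), unit(b))
unit_ac = euclidean(unit(a), unit(c))

print(f"raw:  d(a,b)={raw_ab:.3f}  d(a,c)={raw_ac:.3f}")
print(f"unit: d(a,b)={unit_ab:.3f}  d(a,c)={unit_ac:.3f}")
print(f"cosine distance a,c: {cosine_distance(a, c):.3f}")
```

On raw vectors, a is "closer" to c than to b even though a and b point in exactly the same direction; after norming, the ordering agrees with cosine distance.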
Okay, thanks @gojomo. So basically we don't have a strong intuition / evidence one way or the other. @mattkoehne since you're the user, what makes more sense to you, or gives better results? I'm thinking we could add a
Well, if there's any reason to think different users will have good reasons to prefer different norming-or-not, an optional parameter is very easy to add & defer to, just in case anyone needs it. (Or even just to support future experiments either way, when someone has proper time to research.) Though, there's still the question of what the default should be. I've seen slightly more testimony for using normed vectors – including via the tutorial, which includes a whole section claiming it's a good step & advocating (destructive via
Can we trust that the author of those docs (@olavurmortensen? someone else?) knew what they were talking about, and so prefer the norm default (even as we delete the steps/explanations that no longer apply in 4.0, and provide an option to not normalize)?
A user reported a documentation issue on the mailing list: https://groups.google.com/g/gensim/c/8nobtm9tu-g.
The report shows two problems:
1. A change in wmdistance between 3.8 and 4.0 that is not properly captured in the Migration notes.
2. The WMD tutorial contains instructions that are now outdated in 4.0:
…And then our own tutorial logs a
WARNING : destructive init_sims(replace=True) deprecated & no longer required for space-efficiency
, which looks silly.