Improve/prune docs/tutorial of TranslationMatrix functionality #2977
Labels: bug, documentation, testing
The concerning test failure at #2944 now seems to me to be a false alarm. With more testing across many seeds, it appears the extremely flimsy `BackMappingTranslationTest.test_infer_vector()` was only passing in the base case (`float64` randoms downcast to `float32`s) due to a lucky seeding, and only failing in the changed case due to unlucky seeding of the slightly different stream of (`float32` from the start) random numbers.

I've disabled the flimsy test, and it's questionable whether the
`BackMappingTranslationMatrix` should even exist. It's perhaps 10 lines of using (not specializing-via-subclass) the actual `TranslationMatrix` class, and over-specialized on `Doc2Vec` models, whereas the `TranslationMatrix` functionality could and should be general to any vector-set, requiring just a few lines to apply to word-vectors, doc-vectors, or others. (And, calling the translation/projection `infer_vector` is unnecessarily prone to confusion with the different 'inference' that's native to `Doc2Vec`.)

I still think the
`TranslationMatrix` itself is an under-appreciated bit of functionality, and I even strongly suspect, subject to experimentation, that it could be part of a recommended solution for evolving a model to include more words that's far more robust/theoretically-defensible/performant than the `build_vocab(..., update=True)` & then incrementally `.train()` approach.

But, it'll need at the very least better docs/tutorial examples. The existing
`docs/notebook/tranlsation_matrix.ipynb` is muddled & hard to run. (The test data it's using links to an all-in-Chinese Baidu download page that seems to require a login before `raw.txt` download.) It demos the `BackMappingTranslationMatrix` class in a later 'experimental' area I have trouble following even though it reuses some of the IMDB-dataset `Doc2Vec` tutorial I wrote.

I only have time to disable the
`BackMappingTranslationTest.test_infer_vector` test right now, and this is pretty fringe functionality, so there's no urgency to clean it up. But this issue is to keep it under consideration for when the right person comes along.
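
To make concrete the point above that a translation/projection can apply to any vector-set in a few lines, here is a minimal sketch of the general idea: fit a linear map from anchor pairs by least squares, then project new vectors through it. It uses plain NumPy rather than gensim's `TranslationMatrix` API, the helper names `fit_translation` and `project` are hypothetical, and the toy data is synthetic, so treat it as an illustration under those assumptions rather than as the library's implementation.

```python
import numpy as np


def fit_translation(source_vecs, target_vecs):
    """Return W minimizing ||source_vecs @ W - target_vecs|| (least squares).

    Hypothetical helper for illustration; not part of gensim.
    """
    W, *_ = np.linalg.lstsq(source_vecs, target_vecs, rcond=None)
    return W


def project(vecs, W):
    """Map vectors from the source space into the target space."""
    return vecs @ W


# Synthetic stand-in for two vector sets that share 50 "anchor" items:
# an 8-dimensional source space and a 6-dimensional target space,
# related here by a hidden linear map so the fit can be checked.
rng = np.random.default_rng(0)
source = rng.normal(size=(50, 8))
true_map = rng.normal(size=(8, 6))
target = source @ true_map

# Fit on the anchor pairs, then project a vector that was not an anchor.
W = fit_translation(source, target)
new_item = rng.normal(size=(1, 8))
projected = project(new_item, W)
```

Nothing in the sketch depends on whether the rows are word-vectors, doc-vectors, or something else; only the paired anchor rows matter. With real models the relationship is not exactly linear, so the projection would be approximate and would need evaluating against held-out pairs.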