Add Sent2Vec model. Fix #1376 #1619

Closed
wants to merge 143 commits into from
Changes from 134 commits
023c141
Fixes a part of #1192
prerna135 Jun 22, 2017
ad33484
Removing additional whitespaces
prerna135 Jun 23, 2017
4a5143a
Removing additional whitespaces from utils.py
prerna135 Jun 23, 2017
9c31c01
Removing trailing/leading whitespaces from
prerna135 Jun 23, 2017
91980d6
Making changes according to Google Code Style
prerna135 Jun 26, 2017
5eb9008
Removing trailing spaces after Travis build
prerna135 Jun 26, 2017
ecfd353
Removing duplication citation, toctree and non-local image uri warnings
prerna135 Jul 2, 2017
f62e113
Adding .inc files to flake8 ignore list
prerna135 Jul 2, 2017
c7ffec3
Merge branch 'develop' into develop
prerna135 Jul 2, 2017
f4b4ce6
Fixing more identation errors
prerna135 Jul 3, 2017
c37e2e7
Merge branch 'develop' of https://github.com/prerna135/gensim into de…
prerna135 Jul 3, 2017
02fb823
Removing the last few warnings
prerna135 Jul 5, 2017
ade83fc
[WIP] Address Detection Evaluation on various NER libraries
prerna135 Aug 19, 2017
8eb3666
[WIP] Address Detection Evaluation on various NER libraries
prerna135 Aug 19, 2017
318ffe7
Merge branch 'develop' of https://github.com/prerna135/gensim into de…
prerna135 Aug 19, 2017
06cefaa
[WIP] Native implementation of sent2vec in gensim
prerna135 Oct 10, 2017
e1078e5
Revamping sent2vec class
prerna135 Oct 17, 2017
66b8bca
Bug fixes and resolving travis build errors
prerna135 Oct 18, 2017
1a18310
Fixing pep8 issues
prerna135 Oct 18, 2017
9e213b4
Resolving bugs, adding elementary tests to jupyter notebook
prerna135 Oct 18, 2017
c066b53
Adding tests, comparison to c++ implementation
prerna135 Oct 21, 2017
e40e3f3
Adding tests from paper, comparison to doc2vec and fasttext, logger, …
prerna135 Nov 1, 2017
43f4baf
Adding docstrings
prerna135 Nov 8, 2017
bc73f7e
Notebook and code edits
prerna135 Nov 11, 2017
993f0d8
Adding missing imports
prerna135 Nov 14, 2017
1e7de52
Notebook updates
prerna135 Nov 19, 2017
c99d18f
Beginning cythonization
prerna135 Nov 23, 2017
af8b3a0
Notebook updates
prerna135 Nov 26, 2017
ef45548
Cythonizing hotspots
prerna135 Nov 28, 2017
1468b91
Changes to setup.py
prerna135 Nov 28, 2017
2c4a25f
Fixing bugs
prerna135 Nov 30, 2017
990ee8e
Bug fixes
prerna135 Dec 4, 2017
8c7279d
Removing dynamic allocation
prerna135 Dec 5, 2017
c60cfbd
Adding Doc2vec evaluation on sample toronto corpus
prerna135 Dec 5, 2017
50049bf
Removing travis errors
prerna135 Dec 5, 2017
1a9b2ac
Beginning multithreading
prerna135 Dec 7, 2017
7e8e520
Merge branch 'develop' into sent2vec
prerna135 Dec 7, 2017
acef372
Merge branch 'develop' into sent2vec
prerna135 Dec 7, 2017
757704c
Beginning multithreading
prerna135 Dec 7, 2017
62b7bdc
Updating sent2vec.py
prerna135 Dec 20, 2017
59c1cc6
Updating sent2vec_inner.pyx
prerna135 Dec 20, 2017
65d64ae
Updating setup.py
prerna135 Dec 20, 2017
d453ffd
Merge branch 'develop' into sent2vec
prerna135 Dec 22, 2017
94176cd
Updating setup.py
prerna135 Dec 22, 2017
d2754ca
Fixing bugs
prerna135 Dec 23, 2017
3d902d2
Add max_vocab_size as parameter, remove flake8 errors
prerna135 Dec 25, 2017
6203489
Notebook updates
prerna135 Jan 1, 2018
ff13007
Edits based on reviews
prerna135 Jan 1, 2018
0f4bc01
Adding sent2vec to init.py
prerna135 Jan 1, 2018
4e1b68a
Adding TODO, missing parentheses
prerna135 Jan 1, 2018
5139a3a
Delete sent2vec_inner.c
prerna135 Jan 1, 2018
ab365f4
Reverting word2vec_inner.c to original
prerna135 Jan 1, 2018
080e99e
Merge branch 'develop' into sent2vec
prerna135 Jan 1, 2018
4f9fafc
Merge branch 'develop' into sent2vec
menshikh-iv Jan 8, 2018
44f87fc
Changes to word2vec_inner.c
prerna135 Jan 9, 2018
b4ac60c
Adding sent2vec_inner.cpp
prerna135 Jan 9, 2018
4c409ee
Merge remote-tracking branch 'upstream/develop' into develop
menshikh-iv Jan 10, 2018
8549548
Merge branch 'develop' into sent2vec
menshikh-iv Jan 10, 2018
10783e5
add s2v to apiref
menshikh-iv Jan 10, 2018
a5fb365
initial cleanup
menshikh-iv Jan 10, 2018
35e7fc9
fix bug with sys.exit + parameter name + doc fix[1]
menshikh-iv Jan 10, 2018
439fe35
doc fix[2]
menshikh-iv Jan 10, 2018
9227f31
doc fix[3]
menshikh-iv Jan 10, 2018
58fcf81
Final edits
prerna135 Jan 10, 2018
4a03936
drop unnecessary change
prerna135 Jan 10, 2018
fc1c396
Adding online training, tests
prerna135 Jan 14, 2018
18e2862
Remove flake8 errors
prerna135 Jan 14, 2018
2631e5c
skip test for Appveyor
prerna135 Jan 14, 2018
e3ac088
try to debug unexpected fail in Travis
menshikh-iv Jan 15, 2018
4e28dfc
disable buggy test, continue investigation
menshikh-iv Jan 15, 2018
0a240ed
Return buggy test & enable logging here
menshikh-iv Jan 16, 2018
41ebb7a
add more verbosity
menshikh-iv Jan 16, 2018
9a2768d
moar asserts
menshikh-iv Jan 16, 2018
8f3c0ce
Remove slow version, try to fix test
prerna135 Jan 16, 2018
ac75523
fix flake8 errors
prerna135 Jan 16, 2018
3979afb
fix
menshikh-iv Jan 16, 2018
b662335
Add sentence vector test, remove debugging statements, notebook updates
prerna135 Jan 16, 2018
2314d0f
Revert tox changes
prerna135 Jan 16, 2018
b09d3db
remove flake8 errors
prerna135 Jan 16, 2018
c6cfa1b
review based edits
prerna135 Jan 20, 2018
0f921a5
Rename dropoutK to dropout_k
prerna135 Jan 20, 2018
d107762
Update sent2vec_inner.cpp
prerna135 Jan 20, 2018
1ea4bbc
Notebook updates
prerna135 Jan 30, 2018
a9db84f
cleanup tutorial notebook
menshikh-iv Jan 31, 2018
23b9d49
Remove segfault due to datatype mismatch
prerna135 Feb 6, 2018
2da7d58
Thread safe random number generation
prerna135 Feb 10, 2018
8d066f2
Use minstd_rand instead of mt19937
prerna135 Feb 10, 2018
16e6d15
Revert setup.py changes
prerna135 Feb 10, 2018
fd68a3a
Merge branch 'develop' into sent2vec
prerna135 Feb 22, 2018
00daf79
Edits
prerna135 Feb 22, 2018
282a7fe
Add sent2vec to MANIFEST.in, fix setup.py
prerna135 Feb 22, 2018
d328fff
Fix tests, docstring
prerna135 Feb 22, 2018
3e6ed48
fix compiler options
menshikh-iv Feb 25, 2018
6fec21a
try to reduce memory usage
menshikh-iv Feb 25, 2018
9bff735
Merge branch 'develop' into sent2vec
menshikh-iv Feb 25, 2018
c3ae833
enable C++11
menshikh-iv Feb 25, 2018
dd02f24
add conditional for nt/posix compiler
menshikh-iv Feb 25, 2018
743c657
try to fix memory error, windows compilation error
prerna135 Feb 25, 2018
47a2b90
fix test
prerna135 Feb 25, 2018
f2cac01
add '/EHsc' to extra_compile_args
prerna135 Feb 25, 2018
8f09726
Add extra compile args for windows, try failed test with one worker
prerna135 Feb 26, 2018
9d45856
Revert to rand(), use noblas
prerna135 Mar 12, 2018
12ded39
Restore blas calls, reduce default alpha
prerna135 Apr 3, 2018
f58d19c
rerun tutorial notebook, revert tox.ini
prerna135 Apr 17, 2018
09b1281
Add docstrings for sent2vec_inner.pyx
prerna135 Apr 22, 2018
7bc16c8
delete unnecessary binary file
prerna135 Apr 23, 2018
0159917
Changes after review
prerna135 Apr 25, 2018
898b23e
Fix docs
prerna135 Apr 28, 2018
7ac7866
Remove char ngrams from docstrings
prerna135 Jun 20, 2018
ed6d468
Fix documentation
prerna135 Jun 29, 2018
68ddb42
Restore word2vec_inner.c
prerna135 Jun 29, 2018
4e24a20
fix sent2vec.py, rerun notebook
prerna135 Jun 29, 2018
799360d
fix test,flake8 errors
prerna135 Jun 29, 2018
63e6332
Fix test_training error
prerna135 Jun 29, 2018
daf9ae3
fix word2vec files, python version of sent2vec
prerna135 Jun 29, 2018
5c4547e
resolve merge conflict
prerna135 Jun 29, 2018
d575f81
resolve merge conflict
prerna135 Jun 29, 2018
7136df9
Merge branch 'master' of https://github.com/RaRe-Technologies/gensim …
prerna135 Jun 29, 2018
b9577bd
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
prerna135 Jun 29, 2018
361eb9f
Merge remote-tracking branch 'upstream/develop' into develop
menshikh-iv Aug 12, 2018
121936a
Merge branch 'develop' into sent2vec
menshikh-iv Aug 12, 2018
d8a3a5f
add dummy "input_streams" parameter
menshikh-iv Aug 12, 2018
a5fa735
update word2vec_inner.c
Sep 18, 2018
693614b
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
Sep 18, 2018
4e2944c
Delete useless binary files
Sep 18, 2018
0e36c43
Add corpus_file argument
Sep 18, 2018
aba1255
Update word2vec_inner.c
Dec 13, 2018
4bac49e
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
Dec 13, 2018
fa109a1
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
Dec 28, 2018
8e77f06
Merge remote-tracking branch 'upstream/develop' into sent2vec
menshikh-iv Jan 8, 2019
f0a0bd5
cleanup[1]
menshikh-iv Jan 8, 2019
67f6499
cleanup[2]
menshikh-iv Jan 8, 2019
d4a7228
regenerate s2v & w2v
menshikh-iv Jan 8, 2019
991b8d6
more cleanup
menshikh-iv Jan 8, 2019
d5f37d1
Merge remote-tracking branch 'upstream/develop' into sent2vec
menshikh-iv Jan 15, 2019
e045ace
use correct hash function
menshikh-iv Jan 15, 2019
f6a821b
unicode all things
menshikh-iv Jan 15, 2019
0d5b7ef
make sure than build_vocab isn't mandatory
menshikh-iv Jan 15, 2019
b777ea5
cleanup
menshikh-iv Jan 15, 2019
e5a7531
upd
menshikh-iv Jan 24, 2019
9443375
Merge remote-tracking branch 'upstream/develop' into sent2vec
menshikh-iv Jan 24, 2019
ccb0678
use optimized hash function & type fixes
menshikh-iv Jan 24, 2019
2cfb5be
add encoding before hash
menshikh-iv Jan 24, 2019
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -31,5 +31,7 @@ include gensim/models/_utils_any2vec.c
 include gensim/models/_utils_any2vec.pyx
 include gensim/corpora/_mmreader.c
 include gensim/corpora/_mmreader.pyx
+include gensim/models/sent2vec_inner.cpp
+include gensim/models/sent2vec_inner.pyx
 include gensim/_matutils.c
 include gensim/_matutils.pyx
449 changes: 449 additions & 0 deletions docs/notebooks/sent2vec_tutorial.ipynb

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions docs/src/apiref.rst
@@ -45,6 +45,8 @@ Modules:
     models/word2vec
     models/keyedvectors
     models/doc2vec
+    models/sent2vec
+    models/sent2vec_inner
     models/fasttext
     models/phrases
     models/poincare
8 changes: 8 additions & 0 deletions docs/src/models/sent2vec.rst
@@ -0,0 +1,8 @@
+:mod:`models.sent2vec` -- Sent2Vec model
+========================================
+
+.. automodule:: gensim.models.sent2vec
+    :synopsis: Sent2Vec model
+    :members:
+    :inherited-members:
+    :show-inheritance:
8 changes: 8 additions & 0 deletions docs/src/models/sent2vec_inner.rst
@@ -0,0 +1,8 @@
+:mod:`models.sent2vec_inner` -- Sent2Vec cython model
+=====================================================
+
+.. automodule:: gensim.models.sent2vec_inner
+    :synopsis: Sent2Vec Cython model
+    :members:
+    :inherited-members:
+    :show-inheritance:
1 change: 1 addition & 0 deletions gensim/models/__init__.py
@@ -21,6 +21,7 @@
 from .ldaseqmodel import LdaSeqModel  # noqa:F401
 from .fasttext import FastText  # noqa:F401
 from .translation_matrix import TranslationMatrix, BackMappingTranslationMatrix  # noqa:F401
+from .sent2vec import Sent2Vec  # noqa:F401

 from . import wrappers  # noqa:F401
 from . import deprecated  # noqa:F401