Parallelize nltk CoreNLP parser in simple way #2
See branch multicore
Why not use Python's multiprocessing or something? Also, we could potentially put in a small Hadoop/Spark connector (if people wanted to hit AWS?) @netj @raphaelhoffmann @alldefector
With the current NN-SR model and the notebook docs, the Java process seems to take 200–900 MB of memory during parsing (as opposed to 4 GB with an old model), so spawning one process per core should be fine for a typical laptop. A few months ago we also had a simple HTTP service wrapper for the parser:
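The "one process per core" idea could be sketched with Python's `multiprocessing.Pool`. This is only an illustration, not DeepDive's actual code: `parse_sentence` is a hypothetical stand-in for the real CoreNLP call, reduced to tokenization so the sketch is self-contained.

```python
import multiprocessing as mp

def parse_sentence(sentence):
    # Hypothetical stand-in for the real CoreNLP parse call;
    # here we just tokenize so the sketch runs without Java.
    return sentence.split()

def parse_corpus(sentences, workers=None):
    # One worker process per core by default; chunksize batches
    # sentences so per-task IPC overhead is amortized.
    workers = workers or mp.cpu_count()
    with mp.Pool(processes=workers) as pool:
        return pool.map(parse_sentence, sentences, chunksize=16)
```

In a real setup each worker would hold its own long-lived parser process (paying the 200–900 MB per worker noted above) rather than re-invoking it per sentence.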
http://www.nltk.org/_modules/nltk/parse/stanford.html
Looks like the nltk wrapper is actually using some old model that probably has a throughput of 1 sentence/sec, as opposed to the SR / NN models that do 100 sentences/sec... The StanfordNeuralDependencyParser class ought to address that.
Yeah, I actually did set up a simple queue-based multiprocessing version of this. The switch to SR / NN was what made the huge difference.
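A queue-based variant like the one mentioned here might look roughly like the following sketch. The worker's `split()` is again a hypothetical stub for the actual parse, and `n_workers` would normally be the core count.

```python
import multiprocessing as mp

def worker(in_q, out_q):
    # Consume sentences until a None sentinel arrives; split() is a
    # stand-in for the actual CoreNLP parse call.
    for sentence in iter(in_q.get, None):
        out_q.put((sentence, sentence.split()))

def parse_all(sentences, n_workers=2):
    in_q, out_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(in_q, out_q))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for s in sentences:
        in_q.put(s)
    for _ in procs:
        in_q.put(None)  # one sentinel per worker so all of them exit
    # Collect exactly one result per input sentence, then reap workers.
    results = dict(out_q.get() for _ in sentences)
    for p in procs:
        p.join()
    return results
```

Compared to `Pool.map`, the explicit queue makes it easier to keep one persistent parser per worker and to feed sentences as they stream in.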
Let's definitely make that change! On Sun, Feb 28, 2016 at 9:36 AM, Alex Ratner (notifications@github.com) wrote:
Subset of #228
Emphasis on simple: this is not going to be an optimal preprocessing setup either way; we just want to make it a bit better through simple means that don't require any additional installs, configs, etc.