Parallelize nltk CoreNLP parser in simple way #2

Closed
ajratner opened this issue Feb 26, 2016 · 7 comments

@ajratner
Contributor

Emphasis on simple: this is not going to be an optimal preprocessing setup either way; we just want to make it a bit better through simple means that don't require any additional installs, configs, etc.

@ajratner
Contributor Author

See branch multicore

@chrismre

Why not use Python's multiprocessing or something? Also, we could potentially put in a small Hadoop/Spark connector (if people wanted to hit AWS?). @netj @raphaelhoffmann @alldefector
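
For reference, a minimal sketch of what the multiprocessing route could look like: one NLTK Stanford parser (and its Java subprocess) per worker, fanned out with a Pool. The `parse_doc` helper and the no-argument `StanfordParser()` construction (jar locations assumed to come from the usual environment variables) are illustrative assumptions, not code from this repo.

```python
import multiprocessing as mp

from nltk.parse.stanford import StanfordParser

def init_worker():
    # Each worker holds its own parser, so each gets its own Java process.
    # Jar/model locations are assumed to be set via CLASSPATH/STANFORD_PARSER.
    global _parser
    _parser = StanfordParser()

def parse_doc(text):
    # Hypothetical helper: parse one document (one sentence per line)
    # and return the list of parse trees for each sentence.
    return [list(trees) for trees in _parser.raw_parse_sents(text.splitlines())]

if __name__ == '__main__':
    docs = ["A first document.\nIt has two sentences.", "A second document."]
    pool = mp.Pool(processes=mp.cpu_count(), initializer=init_worker)
    parsed = pool.map(parse_doc, docs)
    pool.close()
    pool.join()
```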

@alldefector
Contributor

With the current NN/SR model and the notebook docs, the Java process seems to take 200-900 MB of memory during parsing (as opposed to 4 GB with an old model), so spawning one process per core should be fine for a typical laptop.

A few months ago we also had a simple HTTP service wrapper for the parser:
https://github.com/HazyResearch/bazaar/blob/master/parser/src/main/scala/com/clearcut/nlp/Server.scala
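
(If that service route were revived, the client side would stay tiny. The port, path, and response format below are guesses for illustration, not the actual Server.scala API.)

```python
import requests  # third-party HTTP client

def parse_via_http(text, url="http://localhost:8080/parse"):
    # Hypothetical endpoint and payload: POST raw document text,
    # get back the parser's per-sentence output as JSON.
    resp = requests.post(url, data=text.encode("utf-8"))
    resp.raise_for_status()
    return resp.json()
```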

@alldefector
Contributor

http://www.nltk.org/_modules/nltk/parse/stanford.html

Looks like the NLTK wrapper is actually using some old model that probably has a throughput of about 1 sentence/sec, as opposed to the SR/NN models that do around 100 sentences/sec...

The StanfordNeuralDependencyParser class ought to address that.
http://nlp.stanford.edu/software/nndep.shtml
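
If someone wants to try it, here is a sketch against an NLTK release from that era; the jar names, memory setting, and example sentence are placeholders, not a tested configuration.

```python
from nltk.parse.stanford import StanfordNeuralDependencyParser

# Placeholder paths; NLTK can also locate the jars via the CLASSPATH env var.
parser = StanfordNeuralDependencyParser(
    path_to_jar='stanford-corenlp-3.6.0.jar',
    path_to_models_jar='stanford-corenlp-3.6.0-models.jar',
    java_options='-mx2g',
)

# raw_parse yields DependencyGraph objects for the input sentence.
graph = next(parser.raw_parse('The switch to the NN model should speed things up.'))
print(graph.to_conll(4))
```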

@ajratner
Contributor Author

Yeah, I actually did set up a simple queue-based multiprocessing version on the 'multicore' branch (see comment on the issue), but I think there's still a bug; either way, someone could work from that.

The switch to SR/NN was what made the huge difference in our normal pipeline when we did that, so this is probably the easiest gain to get...
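
(Not the multicore branch code itself, just a sketch of the queue-based pattern being described, with hypothetical helper names: each worker owns one parser/Java process and pulls documents off a shared queue.)

```python
import multiprocessing as mp

from nltk.parse.stanford import StanfordParser

SENTINEL = None  # shutdown marker

def worker(in_q, out_q):
    parser = StanfordParser()  # one parser (and Java process) per worker
    for doc_id, text in iter(in_q.get, SENTINEL):
        trees = [list(t) for t in parser.raw_parse_sents(text.splitlines())]
        out_q.put((doc_id, trees))

def parse_corpus(docs, n_procs=None):
    n_procs = n_procs or mp.cpu_count()
    in_q, out_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(in_q, out_q)) for _ in range(n_procs)]
    for w in workers:
        w.start()
    for item in enumerate(docs):
        in_q.put(item)
    for _ in workers:
        in_q.put(SENTINEL)  # one shutdown marker per worker
    results = dict(out_q.get() for _ in docs)
    for w in workers:
        w.join()
    return results
```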

@chrismre

Let's definitely make that change!

@henryre henryre modified the milestones: DeepDive Lite 0.3, DeepDive Lite 0.1 Mar 29, 2016
@ajratner ajratner removed this from the DeepDive Lite 0.3 milestone Jun 6, 2016
@ajratner
Contributor Author

Subset of #228
