This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Add symbolic beam search #233

Merged
merged 10 commits into dmlc:master from the symbolic-beam-search branch
Aug 15, 2018

Conversation

junrushao
Member

@junrushao junrushao commented Jul 27, 2018

Description

Symbolic beam search is made possible after enabling control flow operators mx.sym.contrib.while_loop (apache/mxnet#11566) and mx.sym.contrib.cond (apache/mxnet#11760). In this PR, we create a class HybridBeamSearchSampler, which could be hybridized to perform beam search.
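For readers unfamiliar with the new control flow operators, the toy HybridBlock below is a minimal, hedged sketch of the while_loop interface (it is not the sampler's actual loop body): the loop condition, body, and loop variables are ordinary functions and arrays, and because F.contrib.while_loop exists in both the NDArray and Symbol namespaces, the same code runs imperatively or inside a hybridized graph.

import mxnet as mx
from mxnet.gluon import HybridBlock

class LoopSum(HybridBlock):
    """Toy block: sum the integers 0..4 with a while_loop."""
    def hybrid_forward(self, F, i, s):
        # cond keeps looping while the counter is below 5; func returns
        # (per-step outputs, updated loop variables).
        _, states = F.contrib.while_loop(
            cond=lambda i, s: i < 5,
            func=lambda i, s: ([s + i], [i + 1, s + i]),
            loop_vars=[i, s],
            max_iterations=10)
        return states[1]              # the final running sum

block = LoopSum()
block.hybridize()                     # the loop now lives inside a symbolic graph
print(block(mx.nd.array([0]), mx.nd.array([0])).asnumpy())   # expected: [10.]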

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Add class model.beam_search.HybridBeamSearchSampler
  • Allow method model.beam_search._expand_to_beam_size to accept Symbols
  • Add _extract_and_flatten_nested_structure and _reconstruct_flattened_structure to flatten and unflatten structure used in decoders
  • Slightly modify the unit tests for BeamSearchSampler and HybridBeamSearchSampler to work around failing test cases caused by topk

TODO

  • Add unittests for model.beam_search.HybridBeamSearchSampler
  • Rename vocab_num to vocab_size
  • Review the docstrings again to make sure nothing is wrong

Comments

  • HybridBeamSearchSampler requires two extra arguments, batch_size and vocab_size, compared with BeamSearchSampler (see the sketch below)
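For illustration, here is a hedged sketch of constructing the two samplers side by side; the keyword names follow the beam_search module's docstrings, but the toy decoder and the concrete values are assumptions, not code from this PR.

import mxnet as mx
from mxnet.gluon import HybridBlock, nn
from gluonnlp.model import BeamSearchScorer, BeamSearchSampler, HybridBeamSearchSampler

class ToyDecoder(HybridBlock):
    """Hypothetical one-step decoder: (previous token, state) -> (log-probs, new state)."""
    def __init__(self, vocab_size, hidden=16, **kwargs):
        super(ToyDecoder, self).__init__(**kwargs)
        with self.name_scope():
            self.embed = nn.Embedding(vocab_size, hidden)
            self.proj = nn.Dense(vocab_size, flatten=False)

    def hybrid_forward(self, F, inputs, states):
        h = self.embed(inputs) + states
        return F.log_softmax(self.proj(h)), h

vocab_size, batch_size = 10, 2
decoder = ToyDecoder(vocab_size)
decoder.initialize()
scorer = BeamSearchScorer(alpha=0.0, K=5)

# The imperative sampler discovers shapes at run time.
sampler = BeamSearchSampler(beam_size=4, decoder=decoder, eos_id=3,
                            scorer=scorer, max_length=20)

# The hybridizable sampler additionally needs batch_size and vocab_size,
# because static shape inference cannot discover them on its own.
hybrid_sampler = HybridBeamSearchSampler(batch_size=batch_size, beam_size=4,
                                         decoder=decoder, eos_id=3,
                                         scorer=scorer, max_length=20,
                                         vocab_size=vocab_size)
hybrid_sampler.hybridize()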

@junrushao junrushao requested a review from szha as a code owner July 27, 2018 00:53
@junrushao junrushao force-pushed the symbolic-beam-search branch 6 times, most recently from b9b3c90 to 1e1fdd3 July 27, 2018 01:12
@mli
Member

mli commented Jul 27, 2018

Job PR-233/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-233/7/index.html

@leezu
Contributor

leezu commented Jul 27, 2018

Great! Can you make sure the HybridBeamSearchSampler is tested in test_beam_search.py too?

@szha szha requested review from sxjscience and leezu July 27, 2018 17:26
@junrushao
Member Author

@leezu Hey, thank you for the suggestions! So would you prefer that I create a new test method test_hybrid_beam_search, or add test code to the existing test_beam_search?

Thanks!

@leezu
Contributor

leezu commented Jul 27, 2018

Would it make sense to make BeamSearchSampler a Gluon Block? In that case it may be possible to use the same test by parametrizing it with arguments that specify the sampler implementation and whether to call .hybridize():

@pytest.mark.parametrize('hybridize', [True, False])
@pytest.mark.parametrize('sampler', [BeamSearchSampler, HybridBeamSearchSampler])
def test_beam_search(hybridize, sampler):
    [...]

Also, is there any advantage of BeamSearchSampler over HybridBeamSearchSampler when using the non-hybrid version?

@junrushao
Member Author

junrushao commented Jul 27, 2018

@leezu HybridBeamSearchSampler inherits from HybridBlock, but I am not sure whether I can change BeamSearchSampler into a Gluon Block, because it originally inherits from object. Could you help confirm?

HybridBeamSearchSampler requires two extra parameters, batch_size and vocab_size, which are needed for static shape inference, while BeamSearchSampler takes advantage of imperative execution and is therefore more flexible.

@szhengac
Member

Good job. But for translation tasks, the batch_size can change across iterations, so we cannot simply treat it as an extra parameter.

@junrushao
Member Author

@szhengac It is mandatory for now because static shape inference is required. This issue could be alleviated once symbolic shape inference is supported.

@codecov

codecov bot commented Jul 29, 2018

Codecov Report

Merging #233 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #233   +/-   ##
=======================================
  Coverage   74.69%   74.69%           
=======================================
  Files          83       83           
  Lines        7682     7682           
  Branches     1315     1315           
=======================================
  Hits         5738     5738           
  Misses       1675     1675           
  Partials      269      269

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0316eef...016acac.

@mli
Member

mli commented Jul 29, 2018

Job PR-233/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-233/8/index.html

@junrushao
Member Author

Question: do you guys prefer the name vocab_num or vocab_size?

@szha
Member

szha commented Jul 30, 2018

vocab_size, which is a term already used elsewhere in gluonnlp.

@leezu
Contributor

leezu commented Jul 31, 2018

@junrushao1994 Yes, you can change it to inherit from Block. Just change def __call__(self, inputs, states): to def forward(self, inputs, states):
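In other words, the change is mostly a rename of the entry point, since Block.__call__ dispatches to forward. A minimal sketch (constructor and loop body elided, not the PR's actual diff):

from mxnet.gluon import Block

class BeamSearchSampler(Block):              # was: class BeamSearchSampler(object)

    def forward(self, inputs, states):       # was: def __call__(self, inputs, states)
        # Block.__call__ forwards to this method, so callers can keep
        # writing sampler(inputs, states) exactly as before.
        ...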

@junrushao
Member Author

@szha It seems that there are already many occurrences of vocab_num in beam_search.py; should I change them to vocab_size as well?

@junrushao
Member Author

It seems that numerical instability and tie-breaking uncertainty can cause the unit test to fail. For example, when there are close (or equal) values, topk may not produce a result consistent with numpy. So I slightly modified the unit test to let it pass.
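A hedged illustration of the tie-breaking issue (toy data, not the test's actual scores): when several scores are exactly equal, the indices picked by mx.nd.topk need not match a numpy reference, even though the selected values do.

import mxnet as mx
import numpy as np

# Scores with exact ties among the best entries.
scores = mx.nd.array([0.5, 0.1, 0.5, 0.5])

# Top-3 indices according to MXNet's topk ...
mx_idx = mx.nd.topk(scores, k=3, ret_typ='indices').asnumpy()

# ... and according to a numpy reference that uses a stable sort.
np_idx = np.argsort(-scores.asnumpy(), kind='mergesort')[:3]

# The selected values agree, but the ordering of the tied indices may not,
# which is enough to make an exact index comparison in a unit test flaky.
print(mx_idx, np_idx)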

@szha
Member

szha commented Aug 8, 2018

@junrushao1994 our test environment depends on a specific nightly version. Check under env/ and see if you need to update the date of the nightly build.

@junrushao
Member Author

@zheng-da found the bug, and we have just submitted the fix here: apache/mxnet#12078

@junrushao
Member Author

junrushao commented Aug 11, 2018

The most recent commit fails the CI test for the following reason:

AttributeError: 'LSTM' object has no attribute 'h2h_weight'

Is this an incompatibility issue with the MXNet nightly build, cuDNN, or something else? @szha

@szha
Member

szha commented Aug 11, 2018

@junrushao1994 yeah, I'm working on it as part of #264

@mli
Member

mli commented Aug 14, 2018

Job PR-233/27 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-233/27/index.html

@mli
Member

mli commented Aug 14, 2018

Job PR-233/28 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-233/28/index.html

@junrushao
Member Author

junrushao commented Aug 14, 2018

@szha This PR has passed the CI tests. Would you like to help review the code, especially the docstrings (my English is pretty bad)? Thank you!

@@ -196,13 +198,13 @@ def hybrid_forward(self, F, inputs, states):
        return log_probs, states

class RNNLayerDecoder(Block):
Member

This can be a hybrid block now

for beam_size, bos_id, eos_id, alpha, K in [(2, 1, 3, 0, 1.0), (4, 2, 3, 1.0, 5.0)]:
    if sampler_cls is HybridBeamSearchSampler and decoder_fn is RNNLayerDecoder:
        # Hybrid beam search does not work on non-hybridizable object
        continue
Member

no need to skip because RNNLayerDecoder can be a hybrid block.
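A hedged sketch of the reviewer's point (the real RNNLayerDecoder in the test wraps an actual recurrent layer; this only shows the Block-to-HybridBlock shape of the change, which the final commits apply):

from mxnet.gluon import HybridBlock

class RNNLayerDecoder(HybridBlock):               # was: class RNNLayerDecoder(Block)

    def hybrid_forward(self, F, inputs, states):  # was: def forward(self, inputs, states)
        # Use the F namespace (mx.nd or mx.sym) instead of calling mx.nd
        # directly, so the same decoder works for both sampler variants.
        ...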

@mli
Member

mli commented Aug 15, 2018

Job PR-233/32 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-233/32/index.html

@szha szha merged commit 2c6fbb9 into dmlc:master Aug 15, 2018
@junrushao
Member Author

Thank you all for the help and suggestions!

paperplanet pushed a commit to paperplanet/gluon-nlp that referenced this pull request Jun 9, 2019
* Add symbolic beam search

* Update _BeamSearchStepUpdate

* [WIP] Fix a lot of stuff

Here is one thing I could not address.
The `samples` are `taken` at each time stamp,
it could not be expressed in `while_loop`.
I totally have no idea how to deal with this.

* [WIP] fix typo

* [WIP] Submit fixes for debugging cut subgraph for Da

* Reduce search length to prevent numeral instability propagates

* Make linter happy

* Rename all vocab_num => vocab_size

* Change RNNLayerDecoder to HybridBlock

* Symbol[0] => Symbol.squeeze(axis=0)