Add layout + compute_layout support: TransformerNMT, BERT, ALBERT, ELECTRA, MobileBERT, RoBERTA, XLMR #1258

sxjscience · 2020-07-08T17:01:15Z

@MoisesHer @dmlc/gluon-nlp-team

This PR adds two flags to the backbone models to improve computational speed and usability.

layout
The layout of the inputs and outputs, where
- NT: (batch_size, sequence_length, ...)
- TN: (sequence_length, batch_size, ...)
compute_layout
The layout used for the inner computation. The default is "auto", in which case GluonNLP picks the best layout based on heuristics.
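To make the two input/output layouts concrete, here is a minimal NumPy sketch of converting between NT and TN. The helper `to_layout` is hypothetical and is not part of GluonNLP; it only illustrates that the two layouts differ by a swap of the first two axes.

```python
import numpy as np

VALID_LAYOUTS = ("NT", "TN")


def to_layout(x, src, dst):
    """Hypothetical helper (not a GluonNLP API): convert x between NT and TN."""
    if src not in VALID_LAYOUTS or dst not in VALID_LAYOUTS:
        raise ValueError(f"layouts must be one of {VALID_LAYOUTS}")
    # NT and TN differ only in the order of the first two axes.
    return x if src == dst else np.swapaxes(x, 0, 1)


batch_size, seq_len, units = 2, 5, 8
# NT: (batch_size, sequence_length, units)
x_nt = np.arange(batch_size * seq_len * units, dtype=np.float32)
x_nt = x_nt.reshape(batch_size, seq_len, units)

# TN: (sequence_length, batch_size, units)
x_tn = to_layout(x_nt, "NT", "TN")
x_back = to_layout(x_tn, "TN", "NT")
```

Element `x_tn[t, b]` is the same vector as `x_nt[b, t]`, so converting there and back recovers the original array.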

The reason the layout can matter is explained below (also documented in gluon-nlp/src/gluonnlp/attention_cell.py, lines 540 to 546 in cd48efd):

    # 2. Calculate the attention weights
    #   Score: (L_query, B, N, C_Q) X (L_mem, B, N, C_Q) --> (B, N, L_query, L_mem)
    #   This layout structure can be implemented very efficiently because B, N are consecutive
    #   to each other. To have a clear picture of what's happening, we may consider the
    #   (i, j)th element of the output
    #       out[i, j, :, :] = query[:, i, j, :] X key[:, i, j, :].T, which is just one GEMM call
    #   We can thus implement the whole kernel via a single call of batched GEMM with stride.

When the layout of the query and the memory is "TNC", they have the following shapes:

query: (L_query, B, N, C_Q)
memory: (L_mem, B, N, C_Q)

One step in the AttentionCell is to obtain the multi-head attention scores, which can be written as follows:

(L_query, B, N, C_Q) X (L_mem, B, N, C_Q) -> (B, N, L_query, L_mem)

This layout can be implemented very efficiently because the B and N axes are adjacent to each other.

To see why, consider the (i, j)-th element of the output, i.e.,

out[i, j, :, :] = query[:, i, j, :] X key[:, i, j, :].T

Each such product is just one GEMM call. We can thus implement the whole kernel as a single batched GEMM call by specifying the strides correctly.
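The identity above can be checked with a small NumPy sketch. It is only an illustration under assumed toy sizes, not the kernel GluonNLP uses: `np.einsum` stands in for the strided batched GEMM, and the loop computes the per-(i, j) GEMMs as a reference.

```python
import numpy as np

# Toy sizes chosen for illustration only.
L_query, L_mem, B, N, C_Q = 5, 7, 2, 3, 4
rng = np.random.default_rng(0)
query = rng.standard_normal((L_query, B, N, C_Q))
key = rng.standard_normal((L_mem, B, N, C_Q))

# One contraction over C_Q yields the scores for every (batch, head) pair.
scores = np.einsum("qbnc,kbnc->bnqk", query, key)

# Reference: one ordinary matrix product (GEMM) per (i, j) pair.
reference = np.empty((B, N, L_query, L_mem))
for i in range(B):
    for j in range(N):
        reference[i, j] = query[:, i, j, :] @ key[:, i, j, :].T

scores_match = np.allclose(scores, reference)

# Stride view of the same fact (in elements, for a C-contiguous array):
# each (i, j) slice is an (L_query, C_Q) matrix whose rows are B*N*C_Q apart,
# and consecutive (i, j) slices start C_Q elements apart.
row_stride = query[:, 0, 0, :].strides[0] // query.itemsize
head_stride = query.strides[2] // query.itemsize
batch_stride = query.strides[1] // query.itemsize
```

Because `batch_stride` equals `N * head_stride`, the B and N axes collapse into a single batch axis with a uniform stride of `C_Q`, which is the shape of problem a strided batched GEMM expects.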

Also, fairseq uses the TN layout for the inner computation of its TransformerEncoder:

https://github.com/pytorch/fairseq/blob/108bb2560b1ec01524ba723bc7c69186875afa0a/fairseq/models/transformer.py#L399-L407

After this PR, these models will have the layout flag:

codecov · 2020-07-13T02:33:18Z

Codecov Report

Merging #1258 into numpy will increase coverage by 1.35%.
The diff coverage is 89.17%.

@@            Coverage Diff             @@
##            numpy    #1258      +/-   ##
==========================================
+ Coverage   82.56%   83.92%   +1.35%     
==========================================
  Files          38       41       +3     
  Lines        5534     6157     +623     
==========================================
+ Hits         4569     5167     +598     
- Misses        965      990      +25

| Impacted Files | Coverage Δ | |
|---|---|---|
| setup.py | `0.00% <ø> (ø)` | |
| src/gluonnlp/models/transformer_xl.py | `82.52% <66.66%> (-0.20%)` | ⬇️ |
| src/gluonnlp/models/xlmr.py | `88.23% <70.00%> (+1.35%)` | ⬆️ |
| src/gluonnlp/models/mobilebert.py | `87.72% <81.67%> (+6.37%)` | ⬆️ |
| src/gluonnlp/models/electra.py | `80.78% <84.68%> (+1.83%)` | ⬆️ |
| src/gluonnlp/attention_cell.py | `79.52% <86.66%> (-0.39%)` | ⬇️ |
| src/gluonnlp/models/roberta.py | `93.67% <87.50%> (+4.89%)` | ⬆️ |
| src/gluonnlp/models/bert.py | `94.86% <92.96%> (+9.93%)` | ⬆️ |
| src/gluonnlp/utils/testing.py | `94.11% <93.47%> (-3.03%)` | ⬇️ |
| src/gluonnlp/models/albert.py | `95.47% <93.80%> (-1.22%)` | ⬇️ |
| ... and 17 more | | |

sxjscience · 2020-07-13T17:42:03Z

I'm now waiting for #1261. We added two flags:

layout
The layout of the inputs and outputs.
compute_layout
The layout of the inner computation. The default is compute_layout=auto, which automatically determines the best compute layout based on our heuristic.

sxjscience · 2020-07-13T17:59:48Z

src/gluonnlp/models/albert.py

+    # Also, we can not use string here due to https://github.com/rbgirshick/yacs/issues/26
+    cfg.VERSION = 1
+    cfg.freeze()
+    return cfg


@zheyuye I'm moving the cfgs into the file to make it consistent with Transformer + RoBERTA.

zheyuye (Jul 17, 2020): OK, I am going to revise the other models.

sxjscience · 2020-07-28T07:42:48Z

@dmlc/gluon-nlp-committers @MoisesHer @zheyuye Should be ready for review.

commit 232e0b6
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date: Thu Jul 30 01:05:17 2020 +0800

    update

commit 995e5d7
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date: Thu Jul 30 01:01:56 2020 +0800

    fix

commit 9623240
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date: Thu Jul 30 00:52:17 2020 +0800

    fix

commit d9c4140
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date: Wed Jul 29 23:07:10 2020 +0800

    fix transformer

commit e49fbe1
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date: Wed Jul 29 22:18:12 2020 +0800

    update

commit 1f75b26
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date: Wed Jul 29 22:04:08 2020 +0800

    test bart

commit 5bab516
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date: Wed Jul 29 21:34:47 2020 +0800

    fix cfg

commit 6c62a29
Merge: 3366cf3 033214e
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date: Wed Jul 29 21:33:10 2020 +0800

    Merge remote-tracking branch 'upstream/numpy' into bart

commit 033214e
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date: Wed Jul 29 00:36:57 2020 -0700

    [Numpy] Fix SQuAD + Fix GLUE downloading (dmlc#1280)

    * Update run_squad.py
    * Update run_squad.py
    * Update prepare_glue.py

commit 3c87457
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date: Tue Jul 28 18:03:21 2020 -0700

    Add layout + compute_layout support: TransformerNMT, BERT, ALBERT, ELECTRA, MobileBERT, RoBERTA, XLMR (dmlc#1258)

    * Add layout support
    * fix test
    * Update transformer.py
    * Update transformer.py
    * Update README.md
    * try to add set_layout
    * update test case
    * fix
    * update
    * update
    * update
    * Update bert.py
    * fix bug
    * update
    * Update test_models_bert.py
    * Update tokenizers.py
    * add compute layout
    * Update xlmr.py
    * Update test_models_bert.py
    * revise test cases
    * Update layers.py
    * move jieba to try import
    * fix
    * Update transformer.py
    * fix
    * Update bert.py
    * Update setup.py
    * Update test_models_bert.py
    * Update test_models_bert.py
    * fix
    * update
    * Revise
    * Update electra.py
    * Update electra.py
    * Update test_models_electra.py
    * fix
    * fix bug
    * Update test_models_albert.py
    * add more testcases
    * fix
    * Update albert.py
    * Update albert.py
    * fix bug
    * fix testcase
    * Update test_models_electra.py
    * Update bert.py
    * update
    * Update test_models_electra.py
    * Update mobilebert.py
    * Update mobilebert.py
    * update mobilebert
    * Update test_models_mobilebert.py
    * Update mobilebert.py
    * fix bug
    * Update roberta.py
    * fix roberta
    * update
    * update
    * fix import
    * fix bug
    * update
    * reduce test workloads
    * address comment
    * address comment

commit 4d43f82
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Mon Jul 27 20:21:00 2020 -0700

    add subversion/wget to docker, add readme (dmlc#1279)

commit d76897b
Author: phile <phile_999@126.com>
Date: Tue Jul 28 10:10:13 2020 +0800

    Add embedding related methods in numpy version (dmlc#1263)

    * A draft for embedding
    * fix embed_loader
    * add hyperbolic space and some updates
    * revise evaluation
    * fix
    * simple fixes
    * move l2norm to op.py
    * new features
    * fix
    * update
    * add tests, update
    * newline

sxjscience added 13 commits July 8, 2020 10:00

Add layout support

6ccad39

fix test

2a5bae7

Update transformer.py

326630e

Update transformer.py

7a61c23

Update README.md

e94d6e9

Merge remote-tracking branch 'upstream/numpy' into backbone_layout

3caba81

try to add set_layout

1801807

update test case

55885bb

fix

e3dce2f

update

03d6fd7

update

d030d34

update

695c1dc

Update bert.py

1f252a5

sxjscience added 6 commits July 12, 2020 23:11

fix bug

334db84

update

84008a4

Update test_models_bert.py

c66c82c

Merge remote-tracking branch 'upstream/numpy' into backbone_layout

531f4cd

Update tokenizers.py

8d89c65

add compute layout

f25708a

sxjscience changed the title ~~[WIP] Add layout support of all backbones + Transformer~~ [WIP] Add layout support of all backbone models + Transformer for NMT Jul 13, 2020

sxjscience added 3 commits July 13, 2020 11:27

Update xlmr.py

b75ff17

Update test_models_bert.py

5c5a278

revise test cases

6b5814a

sxjscience requested review from MoisesHer and zheyuye July 13, 2020 21:00

sxjscience added 2 commits July 13, 2020 22:19

Update layers.py

d4f0a1f

move jieba to try import

add993c

sxjscience changed the title ~~Add layout support of Transformer for NMT, BERT, ALBERT, ELECTRA~~ Add layout + compute_layout support. First round models: Transformer for NMT, BERT, ALBERT, ELECTRA Jul 27, 2020

sxjscience added 13 commits July 27, 2020 01:12

Update bert.py

cb38d75

update

9ddec18

Update test_models_electra.py

fc458df

Update mobilebert.py

68e50eb

Update mobilebert.py

a23a17d

update mobilebert

b8f0e0d

Update test_models_mobilebert.py

8e2a253

Update mobilebert.py

16156ed

fix bug

a2c36bc

Update roberta.py

f3ca027

fix roberta

612ef47

update

e330b11

update

bf9074c

sxjscience changed the title ~~Add layout + compute_layout support. First round models: Transformer for NMT, BERT, ALBERT, ELECTRA~~ Add layout + compute_layout support: TransformerNMT, BERT, ALBERT, ELECTRA, MobileBERT, RoBERTA, XLMR Jul 28, 2020

fix import

c79b53d

sxjscience added 3 commits July 28, 2020 01:23

fix bug

6f6fb46

update

d681f5c

reduce test workloads

ad22e0d

szha requested review from eric-haibin-lin and szhengac July 28, 2020 19:32

sxjscience added 2 commits July 28, 2020 13:38

address comment

8d9059a

address comment

e4f8f10

szhengac approved these changes Jul 29, 2020

sxjscience merged commit 3c87457 into dmlc:numpy Jul 29, 2020
