Training transformer model goes from score 0.97 to ZERO #12301
-
Hi @mbrunecky! Thanks for the report, that's definitely unexpected and not ideal :/ One thing I noticed looking at the training log is that there seems to be quite a bit of fluctuation/variation in your data - e.g. the loss also varies a lot between iterations 1600 and 4800, for instance. Nevertheless, the training score increases nicely until the sudden jump at 11200. Out of curiosity, what do the model predictions of the

Some more poking we could do to try to understand what is happening: you could run this with a random subsample of your data (maybe a few times with different sets) to see whether this keeps occurring or not. This is just to verify whether there might be a few "bad samples" messing things up. Alternatively, a lower learning rate might make this more stable. Either way - we'd love to get to the bottom of this!
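The subsampling experiment suggested above can be sketched as follows. This is a minimal illustration, not spaCy API: `make_subsamples` is a hypothetical helper, and the integer list stands in for real annotated training examples (which would typically be `Doc` objects loaded from a `DocBin`).

```python
import random

def make_subsamples(examples, fraction=0.5, n_runs=3, seed=0):
    """Draw several independent random subsets of the training data.

    Training a model on each subset separately shows whether the score
    collapse reproduces regardless of which samples are present - if it
    only happens for some subsets, a few specific "bad samples" are a
    likely culprit.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    k = max(1, int(len(examples) * fraction))
    # Each subset is drawn independently without replacement
    return [rng.sample(examples, k) for _ in range(n_runs)]

# Stand-in for real annotated examples (hypothetical data):
examples = list(range(100))
subsets = make_subsamples(examples, fraction=0.5, n_runs=3)
```

Each returned subset would then be written out as its own training corpus and trained on separately, keeping everything else in the config identical.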
-
I'm transferring this to the issue tracker because it does feel like a bug.
-
I am training an NER component using a transformer model.
On one of my data sets, during epoch 2, the score reaches 0.97 and then (after a huge loss) drops to ZERO, where it stays until the process dies with an out-of-memory error.
What should I be looking for as the reason for this behavior?
Configuration:
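(The configuration attached to the original report is not reproduced here. For reference only - this is not the reporter's actual config - the learning rate that the reply above suggests lowering lives in the optimizer block of a typical spaCy transformer training config:)

```ini
[training.optimizer]
@optimizers = "Adam.v1"

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
```

Lowering `initial_rate` (e.g. to 0.00002) is the kind of change the reply has in mind when suggesting a more stable training run.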