Describe the bug
The MoverScore calculation fails when using the DistilBert tokenizer: the `moverscore_v2` module accesses `tokenizer.max_len`, an attribute that no longer exists on recent `transformers` tokenizers.
To Reproduce
```python
>>> import nlgmetricverse
>>> scorer = nlgmetricverse.NLGMetricverse(metrics=['moverscore'])
>>> scorer(predictions=['foo'], references=['bar'])  # Expected score value
```
Exception Traceback (if available)
```
>>> import nlgmetricverse
2023-09-19 13:19:51.367801: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> scorer = nlgmetricverse.NLGMetricverse(metrics=['moverscore'])
/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/transformers/generation_utils.py:24: FutureWarning: Importing `GenerationMixin` from `src/transformers/generation_utils.py` is deprecated and will be removed in Transformers v5. Import as `from transformers import GenerationMixin` instead.
  warnings.warn(
/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/transformers/generation_tf_utils.py:24: FutureWarning: Importing `TFGenerationMixin` from `src/transformers/generation_tf_utils.py` is deprecated and will be removed in Transformers v5. Import as `from transformers import TFGenerationMixin` instead.
  warnings.warn(
loading configuration file config.json from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/config.json
Model config DistilBertConfig {
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "output_attentions": true,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.33.2",
  "vocab_size": 30522
}

loading file vocab.txt from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/vocab.txt
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/tokenizer_config.json
loading configuration file config.json from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/config.json
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.33.2",
  "vocab_size": 30522
}

loading weights file model.safetensors from cache at /home/mmior/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411/model.safetensors
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of DistilBertModel were initialized from the model checkpoint at distilbert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertModel for predictions without further training.
>>> scorer(predictions=['foo'], references=['bar'])
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/mmior/.pyenv/versions/3.8.16/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/mmior/.pyenv/versions/3.8.16/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/moverscore_v2.py", line 30, in process
    a = ["[CLS]"]+truncate(tokenizer.tokenize(a))+["[SEP]"]
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/moverscore_v2.py", line 25, in truncate
    if len(tokens) > tokenizer.max_len - 2:
AttributeError: 'DistilBertTokenizer' object has no attribute 'max_len'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/core.py", line 86, in __call__
    score = self._compute_single_score(inputs)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/core.py", line 208, in _compute_single_score
    score = metric.compute(predictions=predictions, references=references, reduce_fn=reduce_fn, **kwargs)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/evaluate/module.py", line 444, in compute
    output = self._compute(**inputs, **compute_kwargs)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/metrics/_core/base.py", line 362, in _compute
    result = self.evaluate(predictions=predictions, references=references, reduce_fn=reduce_fn, **eval_params)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/metrics/_core/base.py", line 302, in evaluate
    return eval_fn(predictions=predictions, references=references, **kwargs)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/nlgmetricverse/metrics/moverscore/moverscore_planet.py", line 132, in _compute_single_pred_single_ref
    idf_dict_ref = moverscore_v2.get_idf_dict(references)  # idf_dict_ref = defaultdict(lambda: 1.)
  File "/home/mmior/.local/share/virtualenvs/annotate-schema-yEyO5xw6/lib/python3.8/site-packages/moverscore_v2.py", line 42, in get_idf_dict
    idf_count.update(chain.from_iterable(p.map(process_partial, arr)))
  File "/home/mmior/.pyenv/versions/3.8.16/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/mmior/.pyenv/versions/3.8.16/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
AttributeError: 'DistilBertTokenizer' object has no attribute 'max_len'
```
Environment Information:
You could modify the failing check in `moverscore_v2.py` to use `model_max_length`, which replaced the deprecated `max_len` attribute in `transformers`:

```python
if len(tokens) > tokenizer.model_max_length - 2:
```

Make sure that any other occurrences of `tokenizer.max_len` in the script are updated the same way.
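For reference, here is roughly what the fixed helper would look like. This is a sketch reconstructed from the traceback above: only the `if` line (line 25 of `moverscore_v2.py`) is visible there, so the truncation body is an assumption about how the function handles overlong inputs.

```python
# Sketch of a patched `truncate` in moverscore_v2.py, updated for
# transformers 4.x, where the deprecated `tokenizer.max_len` attribute
# was removed in favor of `tokenizer.model_max_length`. Only the `if`
# line appears in the traceback; the rest of the body is assumed.
def truncate(tokens):
    # Reserve two slots for the [CLS] and [SEP] markers the caller adds.
    if len(tokens) > tokenizer.model_max_length - 2:
        tokens = tokens[: tokenizer.model_max_length - 2]
    return tokens
```

If you would rather not edit an installed package, a runtime workaround is to restore the removed alias before scoring. This assumes `moverscore_v2` exposes its tokenizer as a module-level global, as the traceback suggests:

```python
import moverscore_v2

# Hypothetical workaround: re-create the removed `max_len` alias so the
# unpatched library code keeps working. For distilbert-base-uncased,
# model_max_length is 512.
moverscore_v2.tokenizer.max_len = moverscore_v2.tokenizer.model_max_length
```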