PasswordModel tokenizer error #214

Open
marcorosa opened this issue Nov 2, 2021 · 2 comments
Labels: bug (Something isn't working), WIP (work in progress)

Comments

@marcorosa
Member

Sometimes the scan fails because the PasswordModel raises a tokenizer error.

For example, scanning the repository https://github.com/wuest-amiconsult/BTP-Day2-Bookshop-Exercise fails with:

Exception in thread credentialdigger@https://github.com/wuest-amiconsult/BTP-Day2-Bookshop-Exercise:
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 793, in scan
    return self._scan(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 1142, in _scan
    self._analyze_discoveries(mm, password_discoveries, debug)
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 1225, in _analyze_discoveries
    model_manager.launch_model_batch(discoveries)
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/model_manager.py", line 66, in launch_model_batch
    return self.model.analyze_batch(discoveries)
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/password_model.py", line 50, in analyze_batch
    data = self._pre_process([d['snippet'] for d in new_discoveries])
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/password_model.py", line 105, in _pre_process
    encodings = self.tokenizer(snippet,
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2404, in __call__
    return self.batch_encode_plus(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2589, in batch_encode_plus
    return self._batch_encode_plus(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 720, in _batch_encode_plus
    batch_outputs = self._batch_prepare_for_model(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 792, in _batch_prepare_for_model
    batch_outputs = self.pad(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2714, in pad
    raise ValueError(
ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided []
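The last line of the traceback shows that the tokenizer's pad step received an empty list, which suggests that new_discoveries was empty when _pre_process built the list of snippets. Below is a minimal sketch of a guard, assuming the goal is simply never to call the tokenizer with an empty batch. pre_process_or_skip is a hypothetical helper (not part of credentialdigger), and tokenizer stands for any callable that accepts a list of strings, such as a Hugging Face tokenizer.

```python
from typing import Any, Callable, Dict, List, Optional


def pre_process_or_skip(
    tokenizer: Callable[..., Any],
    discoveries: List[Dict[str, Any]],
    **tokenizer_kwargs: Any,
) -> Optional[Any]:
    """Tokenize the snippets of `discoveries`, or return None if there are none.

    Hypothetical helper (not part of credentialdigger). In the transformers
    version from this report, tokenizing an empty list raised the ValueError
    above, so this helper never calls the tokenizer with [].
    """
    snippets = [d["snippet"] for d in discoveries]
    if not snippets:
        # Empty batch: nothing to classify, so skip tokenization entirely
        return None
    # Forward any extra options (e.g. padding, truncation) unchanged
    return tokenizer(snippets, **tokenizer_kwargs)
```

A caller would treat a None result as "nothing to classify" and return the discoveries unchanged instead of running the model.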
marcorosa added the bug label (Something isn't working) on Nov 2, 2021
marcorosa added the WIP label (work in progress) on Mar 28, 2022
marcorosa added a commit that referenced this issue on Mar 30, 2022:
password_model batch mode classify only new discoveries
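The commit message above describes classifying only new discoveries in batch mode. Here is a sketch of that idea under stated assumptions: each discovery is a dict with a "snippet" key (as in the traceback) and a "state" key whose value is "new" for unreviewed findings, and is_false_positive is a hypothetical classifier returning one boolean per snippet. Neither these field values nor the function names are taken from credentialdigger's code.

```python
from typing import Any, Callable, Dict, List

Discovery = Dict[str, Any]


def analyze_new_only(
    discoveries: List[Discovery],
    is_false_positive: Callable[[List[str]], List[bool]],
) -> List[Discovery]:
    """Classify only discoveries whose state is "new"; leave the rest as-is.

    Hypothetical sketch: the "state" values and the classifier interface
    are assumptions, not credentialdigger's actual API.
    """
    new_discoveries = [d for d in discoveries if d.get("state") == "new"]
    if not new_discoveries:
        # Guard: an empty batch must never reach the tokenizer/model
        return discoveries
    predictions = is_false_positive([d["snippet"] for d in new_discoveries])
    # new_discoveries holds references to the same dicts, so updating
    # them here also updates the corresponding entries in `discoveries`
    for d, fp in zip(new_discoveries, predictions):
        if fp:
            d["state"] = "false_positive"
    return discoveries
```

Filtering alone still produces an empty batch when every discovery has already been reviewed, which is why the early return is kept alongside the filter.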
@marcorosa
Member Author

Fix released in #228

@marcorosa
Member Author

This error was raised again, so it was not properly fixed.

marcorosa reopened this issue on Apr 12, 2022