Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix encode_string function #927

Closed
2 tasks
Tracked by #967 ...
felixdittrich92 opened this issue May 23, 2022 · 1 comment · Fixed by #929
Closed
2 tasks
Tracked by #967 ...

Fix encode_string function #927

felixdittrich92 opened this issue May 23, 2022 · 1 comment · Fixed by #929
Labels
good first issue Good for newcomers module: datasets Related to doctr.datasets type: bug Something isn't working

Comments

@felixdittrich92
Copy link
Contributor

Bug description

Currently there is no check if the single characters are also available in the given vocabulary.
We need a check for this :)

TODO's:

  • check that in the function and throw a meaningful exception
  • improve the corresponding test

discussion:
#926

Code snippet to reproduce the bug

from doctr.datasets.utils import encode_string
from doctr.datasets import VOCABS

x = encode_string(input_string='abcDÄÜ', vocab=VOCABS['english'])   # Ä and Ü does not exist in vocab
# raises ValueError: substring not found

Error traceback

Traceback (most recent call last):
  File "/home/felix/Desktop/doctr/test.py", line 7, in <module>
    x = encode_string(input_string='abcDÄÜ', vocab=VOCABS['english'])   # Ä and Ü does not exist in vocab
  File "/home/felix/Desktop/doctr/doctr/datasets/utils.py", line 75, in encode_string
    return list(map(vocab.index, input_string))  # type: ignore[arg-type]
ValueError: substring not found

Environment

not need :)

Deep Learning backend

same

@felixdittrich92 felixdittrich92 added type: bug Something isn't working good first issue Good for newcomers module: datasets Related to doctr.datasets labels May 23, 2022
@felixdittrich92 felixdittrich92 added this to the 0.6.0 milestone May 23, 2022
@frgfm
Copy link
Collaborator

frgfm commented May 25, 2022

Thanks for the issue 🙏
I'd argue it's a documentation/error message clarity issue rather than a bug (meaning that we don't want the code to stop failing in this snippet)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers module: datasets Related to doctr.datasets type: bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants