Deserialization is inconsistent for empty documents #599

Closed
zifeishan opened this issue Nov 2, 2016 · 2 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

zifeishan commented Nov 2, 2016

Issue body

import spacy.en
from spacy.tokens.doc import Doc
nlp = spacy.en.English()
doc = nlp('', tag=True, parse=True)
data = doc.to_bytes()  # named `data` to avoid shadowing the built-in bytes type
doc2 = Doc(nlp.vocab)
doc2.from_bytes(data)

Result:

>>> doc.is_parsed
True
>>> doc2.is_parsed
False
>>> [_ for _ in doc.sents]
[]
>>> [_ for _ in doc2.sents]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "spacy/tokens/doc.pyx", line 395, in __get__ (spacy/tokens/doc.cpp:9506)
ValueError: sentence boundary detection requires the dependency parse, which requires data to be installed. If you haven't done so, run:
python -m spacy.en.download all
to install the data

Your Environment

  • Operating System: Linux
  • Python Version Used: 3.5
  • spaCy Version Used: latest pip release
  • Environment Information:
@zifeishan zifeishan changed the title Serialization is inconsistent for empty documents Deserialization is inconsistent for empty documents Nov 2, 2016
honnibal added a commit that referenced this issue Nov 2, 2016
honnibal added a commit that referenced this issue Nov 2, 2016
@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Nov 2, 2016
@honnibal honnibal closed this as completed Nov 2, 2016
honnibal (Member) commented Nov 2, 2016

Thanks! Mixed feelings about my solution to this. I'm now considering empty docs to be parsed and tagged, because there's no information for a tagger or parser to add.
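
The rule described above (an empty document counts as parsed because there is nothing left to annotate) can be illustrated with a small toy model. This is only a sketch and not spaCy's implementation: `ToyDoc`, its `heads` list, and the JSON-based serialization are hypothetical names made up for the example, and it assumes the parsed flag is derived from per-token annotations rather than stored separately.

```python
import json


class ToyDoc:
    """Hypothetical stand-in for a document; this is NOT spaCy's Doc."""

    def __init__(self, heads=None):
        # One entry per token: the index of its head token, or None if unparsed.
        self.heads = list(heads or [])

    @property
    def is_parsed(self):
        # all() over an empty sequence is True, so an empty document
        # is treated as parsed: there is nothing for a parser to add.
        return all(head is not None for head in self.heads)

    def to_bytes(self):
        # Assumption for the sketch: serialize the annotations as JSON.
        return json.dumps(self.heads).encode("utf8")

    def from_bytes(self, data):
        self.heads = json.loads(data.decode("utf8"))
        return self


empty = ToyDoc()
restored = ToyDoc().from_bytes(empty.to_bytes())
print(empty.is_parsed, restored.is_parsed)
```

Because the flag is computed the same way before and after the round trip, the original and the deserialized empty document agree, which is the consistency the report asks for.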

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018