Dependencies not deprojectivized in spaCy 1.7 #898
Comments
Literally just pushed a fix to this. Could you try redownloading? It should give you |
Wow, what a coincidence :) |
1.2.0 behaves as the old one (perhaps it is the old one?) while:
|
1.2.0 is the old one -- 1.2.1 is the fix. Sorry, missed an entry in our compatibility table (for future reference: https://github.com/explosion/spacy-models/blob/master/compatibility.json ) Try now? |
It took me some time to realise that I need to try en_depent_web_md without the version suffix. The main issue, superfluous proper noun tags, seems to be gone now! Thanks for the quick reaction. There is, however, still a bug: a broken pronoun lemma.
Also, here are some unexpected dependency links, but perhaps these are just within the expected margin of error. In “pain in lower back”, “back” is treated as an adverbial particle and “lower” as the phrase head :(
|
The |
This seems quite a controversial decision to me… I understand at least some of the reasons (for instance, there's no obvious base form for 3rd person personal pronouns), but otherwise one would expect a lemma to be a form belonging to the language's vocabulary (unlike stems). Also, this collapses personal, possessive and other pronouns into the same lemma, which is not always a good thing. Out of curiosity: does this decision stem from OntoNotes or is it your idea? I've checked the CLEAR guidelines and it's not part of them. Also, it would be nice if this was added to the annotation docs. EDIT: sorry, either I missed the part in the docs or you just added it :) Are there any other special lemmas like this? |
Sorry for flooding with syntactic details, but chances are the following behaviour was not intended.
What got me thinking is both the ‘relcl’ label itself (I'd expect ‘rcmod’ if anything) and the x||y syntax. Also, the new model seems to like appositions a lot more than the old one (some “NP, NP” constructs are labelled as appositions rather than coordinations, but I guess this distinction is semantic/pragmatic, so it is hard to expect a supervised parser to perform well on this task). |
The -PRON- lemma was my idea, and yes I agree that it's controversial. I see your argument about it not being within the language, but it seemed to me to be the best solution for pronouns. I should check again what the Universal Dependencies project does. Thanks for the note about the dependency labels. I think the de-projectiviser isn't running after the parser. That explains the I'm not sure what the situation is with the appositions. |
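For anyone who wants to check whether parser output still carries undecoded labels of the x||y kind mentioned above, here is a minimal sketch. It assumes the decorated labels use `||` as the separator, as in the labels reported in this thread; the function names and the sample tokens are hypothetical and written by hand, not taken from real model output.

```python
DELIMITER = "||"  # assumption: the separator seen in the x||y labels reported above


def find_decorated(tokens):
    """Return (index, word, label) for tokens whose label still contains the delimiter."""
    return [(i, word, dep) for i, (word, dep) in enumerate(tokens) if DELIMITER in dep]


def split_decorated(dep):
    """Split 'x||y' into ('x', 'y'); return (dep, None) for a plain label."""
    if DELIMITER in dep:
        own, head = dep.split(DELIMITER, 1)
        return own, head
    return dep, None


# Hypothetical (word, dependency label) pairs; not real parser output.
sample = [("book", "dobj"), ("that", "nsubj"), ("arrived", "relcl||dobj"), (".", "punct")]
leftovers = find_decorated(sample)
```

If `leftovers` is non-empty for a parsed sentence, the decoding step has probably not been applied to that output.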
Thanks for the explanations! I understand the reasoning, and for practical purposes this artificial lemma is actually quite convenient (including for my use case). Anyway, I've got a comment loosely related to personal pronoun lemmas. My first language is Polish, a morphologically rich language. Polish adjectives inflect for number, gender and case. In noun phrases, the adjective's gender depends on the gender of the noun that is the syntactic head of the phrase (strictly speaking, adjectives and nouns agree in number, gender and grammatical case). So, aside from picking the nominative case, there is no proper way to decide which adjective form should be used as the lemma. Tradition has it as masculine, singular, nominative, both in old dictionaries and in modern tagsets. A similar situation arises in other Slavic languages (e.g. Slovene, Czech, Croatian). BTW, in the tagset of the National Corpus of Polish, 3rd person pronouns are also lemmatised to the masculine form (http://nkjp.pl/poliqarp/help/ense2.html#x3-20002, see ppron3). In the case of the MULTEXT-East tagset, the decisions made differ (http://nl.ijs.si/ME/V4/msd/html/index.html). There is also an interesting discussion on the Universal Dependencies project you mentioned: UniversalDependencies/docs#276 |
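On the practical side of the -PRON- lemma discussed above: code that needs a form from the language's vocabulary can fall back to the lowercased token text whenever it meets the placeholder. This is only a sketch of one possible workaround; the helper name and the (text, lemma) pairs are hypothetical and written by hand rather than produced by a model.

```python
PRON_LEMMA = "-PRON-"  # the placeholder lemma named in this thread


def surface_lemma(text, lemma):
    """Return the lemma, falling back to the lowercased token text for pronouns."""
    return text.lower() if lemma == PRON_LEMMA else lemma


# Hypothetical (text, lemma) pairs; not real model output.
pairs = [("I", "-PRON-"), ("feel", "feel"), ("my", "-PRON-"), ("sores", "sore")]
lemmas = [surface_lemma(text, lemma) for text, lemma in pairs]
```

Note that this keeps personal and possessive pronouns apart, at the cost of not normalising case forms such as "me" and "I".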
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
I've noticed a suspiciously large number of evident parser errors after migrating from spaCy 1.6.0 and the generic ‘en’ model to 1.7.2 + ‘en_depent_web_md’.
Environment: Python 3.4.3 / 3.5.1 on 64-bit Linux (Ubuntu).
Some examples below (please note the abundance of proper noun tags).
1.6.0: pain in lower back
1.7.2: pain in lower back
1.6.0: I feel pain in lower back
1.7.2: I feel pain in lower back
1.6.0: sores on my dick
1.7.2: sores on my dick