coref_draft

Description

Implementation of the Stanford multi-sieve coreference resolution approach for Dutch. This is a draft version of the code. An official first release will be made available on github/cltl once the first version of the system is complete and has passed basic testing.

Current implementation

The current implementation works on NAF input files parsed by Alpino (i.e. it works for Dutch).

Future plans:

separate Alpino-specific functions from general NAF-extraction functions (to allow extension to other languages)
create a library for English

Usage

From command line:

$ python multisieve_coreference < inputfile.naf

From python:

from multisieve_coreference import process_coreference
process_coreference(naf_object)

Calling process_coreference modifies naf_object in place by adding coref nodes (if any coreference clusters are found).

Gaps in mention spans (mostly left-out punctuation marks) are not filled by default. To make sure mentions only refer to consecutive spans, pass -f or --fill-gaps on the command line or call process_coreference(naf_object, fill_gaps=True).
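
The effect of gap filling can be pictured with a small, self-contained sketch. It assumes that a mention span is a collection of integer token offsets; the helper name fill_span_gaps is hypothetical and is not part of multisieve_coreference.

```python
def fill_span_gaps(span):
    """Return every offset from the first to the last token of a span.

    `span` is assumed to be a (possibly empty) iterable of integer
    token offsets, which is an assumption made for this illustration.
    """
    offsets = sorted(span)
    if not offsets:
        return []
    # Include the skipped offsets (e.g. left-out punctuation marks).
    return list(range(offsets[0], offsets[-1] + 1))


# A mention whose span skips the token at offset 3.
print(fill_span_gaps([1, 2, 4, 5]))  # prints [1, 2, 3, 4, 5]
```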

!! NB !! Singleton clusters are left out by default. To include singleton clusters, pass -s or --include_singletons on the command line or call process_coreference(naf_object, include_singletons=True).
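
As an illustration of what leaving out singletons means, the following minimal sketch filters a list of clusters. It assumes that a cluster is a set of mention identifiers; the function name drop_singletons and the identifiers are hypothetical and do not reflect the package's internals.

```python
def drop_singletons(clusters):
    """Keep only the clusters that contain at least two mentions."""
    return [cluster for cluster in clusters if len(cluster) > 1]


# Three clusters, one of which holds a single mention.
clusters = [{"m1", "m4"}, {"m2"}, {"m3", "m5", "m6"}]
kept = drop_singletons(clusters)
print(len(kept))  # prints 2
```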

Issues

Design ideal

Instead of passing around a bare dictionary of mentions, an iterable MentionCollection object would be a better idea because it would be able to take care of mention ordering and filtering.
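
One possible shape for such a collection is sketched below. This is not existing code: the Mention stand-in, its attribute names (id, begin_offset, end_offset) and the filter method are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical stand-in for the project's mention objects."""
    id: str
    begin_offset: int
    end_offset: int


class MentionCollection:
    """Iterable container that yields mentions in document order."""

    def __init__(self, mentions=()):
        # Mentions are still stored by ID, as in the bare dictionary.
        self._mentions = {m.id: m for m in mentions}

    def __iter__(self):
        # Ordering is handled here, so callers never sort themselves.
        return iter(sorted(
            self._mentions.values(),
            key=lambda m: (m.begin_offset, m.end_offset),
        ))

    def __len__(self):
        return len(self._mentions)

    def __getitem__(self, mention_id):
        return self._mentions[mention_id]

    def filter(self, predicate):
        """Return a new collection with the mentions satisfying predicate."""
        return MentionCollection(m for m in self if predicate(m))


mentions = MentionCollection([
    Mention("m2", 10, 12),
    Mention("m1", 0, 3),
    Mention("m3", 10, 11),
])
print([m.id for m in mentions])  # prints ['m1', 'm3', 'm2']
```

Sieves could then iterate over the collection directly and ask it for filtered views, for example mentions.filter(lambda m: m.begin_offset >= 10), instead of re-implementing ordering and filtering on a dictionary.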

It would be great if all changing information (mostly related to which mentions should or should not be in the same coreference class) were kept in one object instead of being spread out among the mention objects themselves. (The latter is currently the case with Cmention.coreference_prohibited.)
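
A minimal sketch of such a central object follows. The class name CoreferenceConstraints and its methods are hypothetical; the sketch only assumes that mentions can be referred to by an ID.

```python
class CoreferenceConstraints:
    """Records which pairs of mentions must never end up in one class."""

    def __init__(self):
        self._prohibited = set()

    def prohibit(self, mention_id, other_id):
        # A frozenset makes the pair unordered, so the order in which
        # the two IDs are given does not matter.
        self._prohibited.add(frozenset((mention_id, other_id)))

    def is_prohibited(self, mention_id, other_id):
        return frozenset((mention_id, other_id)) in self._prohibited


constraints = CoreferenceConstraints()
constraints.prohibit("m1", "m7")
print(constraints.is_prohibited("m7", "m1"))  # prints True
print(constraints.is_prohibited("m1", "m2"))  # prints False
```

With this design the mention objects stay immutable descriptions of the text, and all state that changes while the sieves run lives in one place.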

It would be even nicer if every sieve would inherit from some Sieve abstraction and these sieves could be easily plugged and played by moving them around in some input sequence.
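
The idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the Sieve base class, the toy StringMatchSieve, the run_sieves helper, and the simplification that a cluster is a set of mention strings. None of it is the package's current API.

```python
from abc import ABC, abstractmethod


class Sieve(ABC):
    """Common interface: take a list of clusters, return an updated list."""

    @abstractmethod
    def apply(self, clusters):
        ...


class StringMatchSieve(Sieve):
    """Toy sieve that merges clusters sharing a (normalised) mention string."""

    def __init__(self, normalise=str):
        self.normalise = normalise

    def apply(self, clusters):
        merged = []
        for cluster in clusters:
            keys = {self.normalise(text) for text in cluster}
            for existing in merged:
                # Merge into the first earlier cluster with a shared string.
                if keys & {self.normalise(text) for text in existing}:
                    existing |= cluster
                    break
            else:
                merged.append(set(cluster))
        return merged


def run_sieves(sieves, clusters):
    """Apply the sieves one after another, in the order given."""
    for sieve in sieves:
        clusters = sieve.apply(clusters)
    return clusters


# The most precise sieve comes first; reordering the list reorders the run.
pipeline = [StringMatchSieve(), StringMatchSieve(normalise=str.lower)]
clusters = [{"Amsterdam"}, {"de stad"}, {"amsterdam"}, {"Amsterdam"}]
result = run_sieves(pipeline, clusters)
print(len(result))  # prints 2
```

Because every sieve exposes the same apply method, adding, removing or reordering sieves only requires editing the pipeline list.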

Contact

Antske Fokkens
antske.fokkens@vu.nl
antske@gmail.com
http://antskefokkens.info
Vrije Universiteit Amsterdam

Contributors

Martin van Harmelen
m.p.van.harmelen@vu.nl
martin@vanharmelen.com
Vrije Universiteit Amsterdam

License

Software distributed under the Apache License v2.0; see the LICENSE file for details.

References

Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2013. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Comput. Linguist. 39, 4 (December 2013), 885-916. DOI=http://dx.doi.org/10.1162/COLI_a_00152
