Releases: AylaRT/ACTER
v1.5
Now includes sequential and tokenised annotations.
Few changes to the annotations themselves, but a major update to how they are presented:
- Removed a few very long Named Entity (NE) annotations (from wind-en and htfl-en; counts updated) whose status as genuine NEs was doubtful.
- Updated normalisation:
  - Replaced "İ" with "I" in the annotations to avoid problems when lowercasing (mainly affects wind_en_01)
  - Removed rare but problematic characters: ["", "", "", "", "�"] (not handled well by some transformers)
- Major update of README.md
- Restructured all data:
  - added sequential annotations
  - added tokenised versions of the annotations
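The "İ" replacement matters because Python's `str.lower()` maps U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) to two code points, "i" followed by U+0307 COMBINING DOT ABOVE, so lowercasing silently changes string length and the lowercased form no longer matches a plain "i". Below is a minimal sketch of the two v1.5 normalisation steps. The helper name `normalise_v15` and the strip set are hypothetical: the exact characters removed in this release are not legible in these notes, so only U+FFFD is used as a placeholder.

```python
# Hypothetical placeholder set: the exact characters stripped in v1.5 are not
# legible in these notes, so only U+FFFD (replacement character) is used here.
PROBLEMATIC_CHARS = frozenset({"\ufffd"})


def normalise_v15(text: str, strip_chars: frozenset = PROBLEMATIC_CHARS) -> str:
    """Sketch of the v1.5 steps: replace dotted capital I, then drop unwanted characters."""
    # U+0130 -> plain ASCII "I", so that a later .lower() keeps the string length.
    text = text.replace("\u0130", "I")
    # Keep only characters that are not in the strip set.
    return "".join(ch for ch in text if ch not in strip_chars)


# Python lowercases U+0130 to "i" + U+0307: one character becomes two.
print(len("\u0130"), len("\u0130".lower()))  # 1 2
print(normalise_v15("\u0130stanbul\ufffd").lower())  # istanbul
```

Doing the replacement before any lowercasing is one plausible way to keep lowercased annotations and lowercased text comparable character for character.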
v1.4 normalised
Identical to version 1.3, except for minor normalisation of both the text files and the annotations:
- `unicodedata.normalize("NFC", text)`
- normalising all dashes to "-", all single quotes to "'" and all double quotes to '"'
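The two v1.4 steps can be sketched as one function: NFC composition first, then a single `str.translate` pass that unifies dashes and quotes. The helper name `normalise_v14` is hypothetical, and so are the character tables: the notes do not say which dash and quote variants were mapped, so the sets below (common typographic dashes, curly and prime-style quotes) are an assumption.

```python
import unicodedata

# Assumed character sets (illustrative; not listed in the release notes).
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"  # hyphen .. horizontal bar, minus sign
SINGLE_QUOTES = "\u2018\u2019\u201a\u201b\u2032"  # curly/low single quotes, prime
DOUBLE_QUOTES = "\u201c\u201d\u201e\u201f\u2033"  # curly/low double quotes, double prime

# One translation table mapping every listed character to its ASCII form.
_TABLE = str.maketrans(
    {
        **{c: "-" for c in DASHES},
        **{c: "'" for c in SINGLE_QUOTES},
        **{c: '"' for c in DOUBLE_QUOTES},
    }
)


def normalise_v14(text: str) -> str:
    """Sketch of the v1.4 steps: NFC composition, then dash/quote unification."""
    # NFC composes e.g. "e" + U+0301 (combining acute) into the single code point U+00E9.
    text = unicodedata.normalize("NFC", text)
    return text.translate(_TABLE)


print(normalise_v14("e\u0301t\u00e9 \u2013 \u201cwind\u201d \u2018turbine\u2019"))
```

Applying the same function to both the text files and the annotations is what keeps them matchable; normalising only one side would reintroduce mismatches.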
v1.3 GitHub release
First release on GitHub, after completion of the TermEval shared task.