Releases: LanguageMachines/foliautils
v0.23
v0.22
v0.21
- a lot of code changes, many regarding hyphens
- added an extract_final_hyphen function, used by several programs: FoLiA-abby, FoLiA-page and FoLiA-txt
- FoLiA-txt: filter out ZWNJ characters. Avoid spurious LineBreaks
- fix for #68
- FoLiA-idf was not quite working. Fixed
- FoLiA-page: removed --sent option, updated man page, lots of other fixes too
- added experimental FoLiA-merge program: merging lemma/pos information into FoLiA files
- more and better tests added
- updates in README.MD
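As a rough illustration of the hyphen and ZWNJ handling mentioned above, here is a minimal C++ sketch. It is not the foliautils code: the names split_final_hyphen and strip_zwnj, the choice of std::u32string, and the set of code points treated as hyphens are all assumptions made for this example.

```cpp
#include <string>
#include <utility>

// Code points used in this sketch (assumed set, not taken from foliautils).
constexpr char32_t kHyphenMinus = 0x002D; // HYPHEN-MINUS
constexpr char32_t kSoftHyphen  = 0x00AD; // SOFT HYPHEN
constexpr char32_t kHyphen      = 0x2010; // HYPHEN
constexpr char32_t kZwnj        = 0x200C; // ZERO WIDTH NON-JOINER

// Hypothetical helper: split a line-final hyphen off a line.
// Returns {text without the hyphen, the hyphen that was found or empty}.
std::pair<std::u32string, std::u32string>
split_final_hyphen(const std::u32string& line) {
    if (!line.empty()) {
        const char32_t last = line.back();
        if (last == kHyphenMinus || last == kSoftHyphen || last == kHyphen) {
            return {line.substr(0, line.size() - 1), std::u32string(1, last)};
        }
    }
    return {line, std::u32string()};
}

// Hypothetical helper: drop every ZWNJ character from a text.
std::u32string strip_zwnj(const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        if (c != kZwnj) {
            out.push_back(c);
        }
    }
    return out;
}
```

Returning the removed hyphen next to the remaining text lets a caller decide whether to keep it, drop it, or record it as markup.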
v0.20
v0.19
[Ko van der Sloot]
- general C++ cleanup and refactoring
- Some fixes for building on Mac OSX
- FoLiA-txt:
- now we handle soft-hyphens
- modifications to solve #67: --remove-end-hyphens is the default now. We create <t-hbr> nodes
- modifications for proycon/foliapy#25
- Unicode awareness
- FoLiA-2text:
- added a --restore-formatting option, which outputs the text inside <t-hspace> and <t-hbr> nodes
- FoLiA-abby:
- handling of soft-hyphens
- fixes for <br/> and <t-hbr>
- preserve original spaces in <t-hspace>'s text
- FoLiA-correct: small fix in program logic.
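To show what removing end-of-line hyphens does to running text, here is a small C++ sketch on plain UTF-8 strings. It is only an illustration under stated assumptions: join_dehyphenated is a hypothetical name, and the real programs work on FoLiA documents and record the break in a <t-hbr> node, as noted above, rather than simply discarding it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of foliautils: join text lines with spaces,
// but glue a line ending in '-' or a soft hyphen directly to the next line.
std::string join_dehyphenated(const std::vector<std::string>& lines) {
    const std::string soft_hyphen = "\xC2\xAD"; // U+00AD encoded as UTF-8
    std::string result;
    bool glue = true; // no separator before the first line
    for (const std::string& raw : lines) {
        std::string line = raw;
        bool hyphenated = false;
        if (!line.empty() && line.back() == '-') {
            line.pop_back();
            hyphenated = true;
        } else if (line.size() >= soft_hyphen.size() &&
                   line.compare(line.size() - soft_hyphen.size(),
                                soft_hyphen.size(), soft_hyphen) == 0) {
            line.erase(line.size() - soft_hyphen.size());
            hyphenated = true;
        }
        if (!glue) {
            result += ' ';
        }
        result += line;
        glue = hyphenated; // glue the next line on if this one was hyphenated
    }
    return result;
}
```

A naive join like this also merges genuine compounds that happen to break at a hyphen (for example "well-" followed by "known"), which is one reason to keep the original hyphen available in the markup.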
v0.18
[Ko van der Sloot]
- FoLiA-page: only add LineBreak annotation when needed
- added more tests to make check
- adapted and fixed tests
- fixed the ugly problem of temporarily disabling text checking.
- start using the "system" foliadiff
- fix declarations
[Maarten van Gompel]
- FoLiA-page: added a --nomarkup parameter to revert to the old behaviour, and an extra --nostrings parameter to omit the strings #65
- added a note for the --sent option #65
- Added some comments for the ugly disable set_checktext patch, I don't like this but it seems needed (underlying libfolia issue?) #65
- Add linebreaks and t-str to the paragraph text (currently fails text validation)
- added Dockerfile and instructions
- codemeta.json: updated according to (proposed) CLARIAH requirements (CLARIAH/clariah-plus#38)
v0.17
- needs libfolia 2.9 or above
- replaced TravisCI by GitHub actions
- FoLiA-correct:
- fixed a problem with correcting FoLiA with both p and s nodes
- added support for the FoLiA 'tag' feature
- clearer error messages
- fixed bugs in HEMP handling
- better handling of Ucto's ABBREVIATION* tokens
- fixed corrections when a word has 'space="no"'.
- some smaller fixes
- added more tests
- FoLiA-clean:
- improved, using new features from libfolia 2.9
- FoLiA-2text:
- replaced '--original' parameter by a '--correction-handling' parameter
- implemented a --honour-tags option, to interpret tag="token" tags
- some improvement in output-file naming
- FoLiA-abby:
- completely reworked the code
- added '-S' and '-C' as alternatives for '--setname' and '--classname'
- added a --keephyphens option
- added a --addbreaks option
- added option --addmetrics to optionally add positional info to the paragraphs
- improved handling of '-' (Hyphen)
- add 'font_properties', 'font_id' and 'font_style' as a feature node
- improved handling of text with spaces at 'unexpected' locations
- all modules:
- Code refactoring and cleaning
- added and improved tests
- adapted man pages
v0.16
[Ko vd Sloot]
- requires libfolia 2.7 or above
- provenance data is better for a lot of modules
- added better checking on invalid NCnames in some modules.
- FoLiA-abby:
- a lot of refactoring and additions to handle font/style information
- FoLiA-pm:
- Notes are handled correctly now
- fixed error in xlink attributes
- FoLiA-page:
- more types of Page files are handled now
- fixed annotation declarations
- fixed offset calculation (due to change in FoLiA's opinion on those)
- page number is added as a node and in the metadata
- added a --trusttokens option. This means that Word items in the Page file are added as Words in the FoLiA, embedded in Sentences.
- added a --norefs option to avoid adding references to the original texts
- FoLiA-correct:
- make sure that the default is to run on 1 thread
- added a --rebase-inputclass option
- FoLiA-alto:
- the -t option was not always handled correctly
[Maarten van Gompel]
- FoLiA-benchmark: guard against compiler optimisation #48
v0.15
v0.14
[Martin Reynaert]
- updated man pages
[Ko vd Sloot]
- added man pages
- revised usage() in many modules
- the default separator in FoLiA-stats is '_' now
- fix for: #37
- fix for: #41
- adapted to changes in libfolia
- many small code refactorings
- FoLiA-correct is improved a lot, allowing ngram corrections in FoLiA
- FoLiA-stats accepts a 'word_in_doc' mode now
- FoLiA-alto by default created nodes now. use --oldstring to get
- improved a lot in tests/
- many small fixes