Changelog for jusText

3.0.1 (2024-05-09)

BUG FIX: Fix issue with new version of lxml #48.

3.0.0 (2021-10-21)

INCOMPATIBLE CHANGE: Dropped support for Python 3.4 and below.
BUG FIX: Don't join words separated only by <br> tag.
BUG FIX: List available stop-lists alphabetically.

2.2.0 (2016-03-06)

INCOMPATIBLE CHANGE: Stop words are case insensitive.
INCOMPATIBLE CHANGE: Dropped support for Python 3.2.
BUG FIX: Preserve new lines from original text in paragraphs.

2.1.1 (2014-05-27)

BUG FIX: Function decode_html now respects the errors parameter when falling back to default_encoding #9.

2.1.0 (2014-01-25)

FEATURE: Added XPath selector to the paragraphs. The XPath selector is also available in detailed output as the xpath attribute of the <p> tag #5.

2.0.0 (2013-08-26)

FEATURE: Added pluggable DOM preprocessor.
FEATURE: Added support for Python 3.2+.
INCOMPATIBLE CHANGE: Paragraphs are instances of justext.paragraph.Paragraph.
INCOMPATIBLE CHANGE: Script 'justext' removed in favour of command python -m justext.
FEATURE: It is possible to enter a URI as the input document in the CLI.
FEATURE: It is possible to pass unicode string directly.

1.2.0 (2011-08-08)

FEATURE: Character counts are used instead of word counts where possible, so that the algorithm works well in language-independent mode (without a stoplist) for languages where counting words is not easy (Japanese, Chinese, Thai, etc.).
BUG FIX: More robust parsing of meta tags containing information about the charset used.
BUG FIX: Corrected decoding of HTML entities  to

1.1.0 (2011-03-09)

First public release.