Changelog

v0.0.23

Improved interrogative type detection (fixed some test cases, added new ones)
Numeric tokens now provide a value attribute holding the actual value of the number, typed as a JavaScript number
Fix singular attribute not being set on noun tokens
Detectors can now be executed before dependency parsing
- compendium.detectors.add is deprecated in favor of compendium.detectors.after and will be removed in v1.0.0
- compendium.detectors.after registers detectors that will be executed after dependency parsing
- compendium.detectors.before registers detectors that will be executed before dependency parsing
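
The ordering of the two hooks can be pictured with a small stand-alone sketch. It does not call compendium itself: createDetectorRegistry, run and the shape of the detector callbacks are hypothetical names chosen for illustration, and the real signatures of compendium.detectors.before and compendium.detectors.after may differ.

```javascript
// Hypothetical stand-alone model of the before/after detector hooks.
// None of these names come from compendium; they only illustrate ordering.
function createDetectorRegistry() {
  const beforeParsing = [];
  const afterParsing = [];
  return {
    // Register a detector that runs before dependency parsing.
    before(fn) { beforeParsing.push(fn); },
    // Register a detector that runs after dependency parsing.
    after(fn) { afterParsing.push(fn); },
    // Deprecated alias: behaves exactly like after().
    add(fn) { afterParsing.push(fn); },
    // Run "before" detectors, then the parser, then "after" detectors.
    run(sentence, parseDependencies) {
      beforeParsing.forEach((fn) => fn(sentence));
      parseDependencies(sentence);
      afterParsing.forEach((fn) => fn(sentence));
      return sentence;
    },
  };
}

const registry = createDetectorRegistry();
const trace = [];
registry.after(() => trace.push('after'));
registry.before(() => trace.push('before'));
registry.add(() => trace.push('add (deprecated)'));
registry.run({ tokens: [] }, () => trace.push('dependency parsing'));

console.log(trace.join(' -> '));
// before -> dependency parsing -> after -> add (deprecated)
```

In this model the registration order does not matter across the two groups: a detector registered with before always runs ahead of the parser, whatever was registered with after or add.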

v0.0.22

Better handling of the "like" token: transformed into a preposition when possible (I like that vs It's like that)
Better handling of the "have" token: rarely a noun
Roman numerals handling (Chapter IV, Henri III)
Improved Named Entity Recognition (more patterns such as IO2009, CamelCased Inc., Henry III...)
Bug fixes
- Avoid duplicate items in lexicon (leading to wrong PoS tagging and sentiment analysis)
- Avoid raw tokens being normalized

v0.0.21

Fix missing infinitives for some verb tokens
Add tense attribute to verb tokens

v0.0.20

Fix #3: the raw field was a reconstruction of the sentence, not the actual raw string. Fixed by providing the real raw string.
Scaffold some code for multilingual use of compendium (for now, one build per language)
- Add post processors to lexer for language-specific tokens handling
- Reorganize sources to have a clean multilingual directory structure (to be continued)
- Create gulp build tasks for the French language
- Add initial tests for the French language

v0.0.19

Minor improvements to English dependency parsing, with new tests
Minor improvements to profiling
NER fixes

v0.0.18

Fix infinite loop in dependency parsing

v0.0.17

New dependency parsing rules and tests
New PoS rules
Fix some token sentiment scores being skipped when building lexicon
Add experimental dependency-based sentiment score propagation
Allow lexicon sentiment scores to be floats

v0.0.16

Sentiment analysis: better "mixed" tagging by comparing amplitude to score in the case of low score + medium amplitude
Better handling of quotes (lexer, PoS)
Slight cleanup of some lexicon symbols

v0.0.15

Sentence types: add refusal type
Slight refactoring of negation detection (negation is extended to the master verb of the negation mark)

v0.0.14

Remove cleaner step (replaced by synonyms handler)
Sentence types: add approval type
Dependency parsing: add new governors ranks
Token attributes: add is_punc attribute
Add new Brill rules (+0.1% on Penn Treebank)
Statistics: add words stat (number of actual words in a sentence: token count minus punctuation, emoticons...)
New tests + some tests refactoring
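
As a rough illustration of the words stat described in this release, the count can be thought of as a filter over the sentence tokens. This is a minimal sketch, not compendium's code: the token shape is hypothetical, is_punc is the attribute named in this release, and is_emoticon is an assumed flag standing in for the other non-word categories the entry leaves open.

```javascript
// Hypothetical token shape: { raw, is_punc, is_emoticon }.
// is_emoticon is an assumption made for this sketch.
function countWords(tokens) {
  // Keep only tokens that are neither punctuation nor emoticons.
  return tokens.filter((t) => !t.is_punc && !t.is_emoticon).length;
}

const sampleTokens = [
  { raw: 'Hello', is_punc: false, is_emoticon: false },
  { raw: ',', is_punc: true, is_emoticon: false },
  { raw: 'world', is_punc: false, is_emoticon: false },
  { raw: ':)', is_punc: false, is_emoticon: true },
  { raw: '!', is_punc: true, is_emoticon: false },
];

console.log(countWords(sampleTokens)); // 2
```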

v0.0.13

Fix issues
- Missing 're contraction
- Lexer a bit too greedy with emoticons (was catching -s in inter-sport)
Improved dependency parsing
- Third rank of governors
- More governor tag candidates
Imperative sentence type detected by looking for VB governors
New Brill rules

v0.0.12

Remove annoying console.log
A few new Brill rules
Better looking example page + readme screenshot
Fix bug that skipped a lot of emoticons when building lexicons

v0.0.11

Verbs
- Irregular verbs conjugation + integration in lexicon
- Regular verbs in Lexicon
- Basic tense detection (for simple sentences, based on dependency parsing)
Numerous new Brill rules for PoS tagging (92.519% on Penn Treebank)
Improved dependency parsing
Trie class interface
A bit of code documentation
Sentence detectors are now applied directly in the analysis sentence loop (no longer in a dedicated second loop)
New attributes for tokens (is_verb, infinitive, is_noun, plural, singular)
*in > *ing inference (if a word ends with in, is not in the lexicon, and the same word plus g exists in the lexicon, then infer it as VBG)
New tests
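
The *in > *ing inference listed above can be sketched in a few lines. This is an illustrative reading of the rule as stated, not the library's implementation: inferIngTag is a hypothetical helper, and the lexicon is modelled as a plain Map of lowercase word to PoS tag.

```javascript
// Minimal sketch of the "*in > *ing" inference.
// The lexicon is a hypothetical Map of word -> PoS tag, not compendium's.
function inferIngTag(word, lexicon) {
  const lower = word.toLowerCase();
  // Known word: keep the tag the lexicon already gives it.
  if (lexicon.has(lower)) return lexicon.get(lower);
  // Unknown word ending in "in" whose "+g" form is known: infer VBG.
  if (lower.endsWith('in') && lexicon.has(lower + 'g')) return 'VBG';
  // Otherwise this rule has nothing to say.
  return null;
}

const sampleLexicon = new Map([
  ['walking', 'VBG'],
  ['cabin', 'NN'],
]);

console.log(inferIngTag('walkin', sampleLexicon)); // VBG
console.log(inferIngTag('cabin', sampleLexicon));  // NN
console.log(inferIngTag('robin', sampleLexicon));  // null
```

Checking the lexicon first matters: a genuine word such as cabin keeps its own tag, and only out-of-lexicon words are candidates for the inference.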

v0.0.10

Basically working dependency parsing
Bug fixes/improvements
- Fix critical issue where the stem was used when the singularised form should have been
A bit of project cleanup
- Move benchmark folder to test/
- Remove find utility (use grep!)

v0.0.9

Improved token PoS tagging (+0.8% on Penn Treebank!):
- Order of detectors changed
- Better management of composed words
First step of scaffolding for dependency parsing feature

v0.0.8

All regular verbs now conjugated (and/or conjugable)
PoS tagging for verbs greatly improved
Better packing of verbs and nationalities (-2 kB)
Better filtering of lexicon (-1 kB)
Reorganised the project a bit
- Lexicon data files moved to src/lexicon
- Compendium data files moved to src/dictionaries
Lots of new tests (isSingular, verbs, lexicon...)
Refactored detectors API so it's a bit less verbose

v0.0.7

Better sentiment profiling for mixed sentiment, in particular when using multiple adverbs
Politeness, dirtiness scores
Synonyms feature for tokens normalization
- Used by PoS tagger in case no other method returned a tag

v0.0.6

Add interrogative and exclamatory sentence types
Fix low confidence for obvious PoS tagging (CD, SYM...)
[Gulpfile] Add test run on live rebuild

v0.0.5

Statistics now skip punctuation tokens
Improve verb inflector
Better sentiment profiling
Better breakpoint detection

Compendium-js, English NLP for Node.js and the browser, MIT Licensed
