Xav edited this page May 16, 2015 · 25 revisions

v0.0.12

  • Remove annoying console.log
  • A few new Brill rules
  • Better-looking example page + README screenshot
  • Fix bug that skipped a lot of emoticons when building lexicons

v0.0.11

  • Verbs
    • Irregular verb conjugation + integration into the lexicon
    • Regular verbs in the lexicon
    • Basic tense detection (for simple sentences, based on dependency parsing)
  • Numerous new Brill rules for PoS tagging (92.519% accuracy on Penn Treebank)
  • Improved dependency parsing
  • Trie class interface
  • Some code documentation
  • Sentence detectors are now applied directly in the analysis sentence loop (no longer in a dedicated second loop)
  • New attributes for tokens (is_verb, infinitive, is_noun, plural, singular)
  • *in > *ing inference (if a word ends with "in", is not in the lexicon, and the same word plus "g" exists in the lexicon, then tag it as VBG)
  • New tests
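The *in > *ing inference listed under v0.0.11 can be sketched as below. This is a minimal illustration, not Compendium's actual code: the `Lexicon` type (a plain word-to-tag `Map`) and the `inferInToIng` function are hypothetical, and the real lexicon format differs.

```typescript
// Hypothetical minimal lexicon: lowercase word -> PoS tag.
// Compendium's real lexicon structure is different; this is only for illustration.
type Lexicon = Map<string, string>;

/**
 * Sketch of the "*in > *ing" rule: if a token ends with "in",
 * is absent from the lexicon, and token + "g" is present,
 * tag it as VBG. Returns null when the rule does not apply.
 */
function inferInToIng(token: string, lexicon: Lexicon): string | null {
  const word = token.toLowerCase();
  if (!word.endsWith("in")) return null; // rule only targets *in words
  if (lexicon.has(word)) return null; // known words keep their lexicon tag
  return lexicon.has(word + "g") ? "VBG" : null; // "runnin" -> "running"
}

// Hypothetical sample lexicon.
const lexicon: Lexicon = new Map<string, string>([
  ["running", "VBG"],
  ["going", "VBG"],
  ["cabin", "NN"],
  ["in", "IN"],
]);

console.log(inferInToIng("runnin", lexicon)); // VBG
console.log(inferInToIng("cabin", lexicon)); // null (already in the lexicon)
```

Checking the lexicon for the word itself first keeps genuine *in words such as "cabin" from being retagged.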

v0.0.10

v0.0.9

  • Improved token PoS tagging (+0.8% on Penn Treebank!):
    • Order of detectors changed
    • Better management of compound words
  • First step of scaffolding for dependency parsing feature

v0.0.8

  • All regular verbs now conjugated (and/or conjugable)
  • PoS tagging for verbs greatly improved
  • Better packing of verbs and nationalities (-2 KB)
  • Better filtering of lexicon (-1 KB)
  • Slightly reorganised the project
    • Lexicon data files moved to src/lexicon
    • Compendium data files moved to src/dictionaries
  • Lots of new tests (isSingular, verbs, lexicon...)
  • Refactored detectors API so it's a bit less verbose

v0.0.7

  • Better sentiment profiling for mixed sentiment, in particular when using multiple adverbs
  • Politeness and dirtiness scores
  • Synonyms feature for token normalization
    • Used by the PoS tagger when no other method returned a tag

v0.0.6

  • Add interrogative and exclamatory sentence types
  • Fix low confidence for obvious PoS tagging (CD, SYM...)
  • [Gulpfile] Add test run on live rebuild

v0.0.5

  • Statistics now skip punctuation tokens
  • Improve verb inflector
  • Better sentiment profiling
  • Better breakpoint detection