npm install it-compromise
it-compromise is a port of compromise to Italian.
The goal of this project is to provide a small, basic, rules-based POS tagger.
import nlp from 'it-compromise'
let doc = nlp(`con l'autoradio sempre nella mano destra`)
doc.match('#Noun').json()
// [{text:'autoradio'}, {text:'mano'}]
or client-side:
<script src="https://unpkg.com/it-compromise"></script>
<script>
let txt = 'un canarino sopra la finestra'
let doc = itCompromise(txt) // window.itCompromise
console.log(doc.json())
// { text:'un canarino...', terms:[ ... ] }
</script>
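To make the "rules-based POS tagger" idea concrete, here is a minimal sketch in plain JavaScript. It is a toy, not it-compromise's actual lexicon or rule set: the names `LEXICON`, `SUFFIX_RULES`, `tagWord` and `tagSentence` are hypothetical, and the handful of words and suffixes are assumptions chosen only for illustration.

```javascript
// Toy rules-based tagger. NOT it-compromise's real data or logic.
// All names here are hypothetical and exist only for this sketch.
const LEXICON = new Map([
  ['un', 'Determiner'],
  ['la', 'Determiner'],
  ['il', 'Determiner'],
  ['con', 'Preposition'],
  ['sopra', 'Preposition'],
])

// Suffix rules are tried in order; the first match wins.
const SUFFIX_RULES = [
  [/mente$/, 'Adverb'], // lentamente
  [/(are|ere|ire)$/, 'Verb'], // guidare, vedere, dormire
]

function tagWord(word) {
  const w = word.toLowerCase()
  // 1. exact lexicon lookup
  if (LEXICON.has(w)) return LEXICON.get(w)
  // 2. suffix-based guess
  for (const [pattern, tag] of SUFFIX_RULES) {
    if (pattern.test(w)) return tag
  }
  // 3. fallback: assume a noun
  return 'Noun'
}

function tagSentence(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .map((w) => ({ text: w, tag: tagWord(w) }))
}

console.log(tagSentence('un canarino sopra la finestra'))
```

A lookup-then-guess cascade like this stays small and predictable, but it mis-tags plenty of words (a noun such as 'mare' would match the verb suffix rule), which is why a real tagger needs a much larger lexicon and contextual rules.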
it-compromise includes all methods of compromise/one:
- .text() - return the document as text
- .json() - return the document as data
- .debug() - pretty-print the interpreted document
- .out() - a named or custom output
- .html({}) - output custom html tags for matches
- .wrap({}) - produce custom output for document matches
- .found [getter] - is this document empty?
- .docs [getter] - get term objects as json
- .length [getter] - count the # of characters in the document (string length)
- .isView [getter] - identify a compromise object
- .compute() - run a named analysis on the document
- .clone() - deep-copy the document, so that no references remain
- .termList() - return a flat list of all Term objects in match
- .cache({}) - freeze the current state of the document, for speed-purposes
- .uncache() - un-freezes the current state of the document, so it may be transformed
- .all() - return the whole original document ('zoom out')
- .terms() - split-up results by each individual term
- .first(n) - use only the first result(s)
- .last(n) - use only the last result(s)
- .slice(n,n) - grab a subset of the results
- .eq(n) - use only the nth result
- .firstTerms() - get the first word in each match
- .lastTerms() - get the end word in each match
- .fullSentences() - get the whole sentence for each match
- .groups() - grab any named capture-groups from a match
- .wordCount() - count the # of terms in the document
- .confidence() - an average score for pos tag interpretations
(match methods use the match-syntax.)
- .match('') - return a new Doc, with this one as a parent
- .not('') - return all results except for this
- .matchOne('') - return only the first match
- .if('') - return each current phrase, only if it contains this match ('only')
- .ifNo('') - filter-out any current phrases that have this match ('notIf')
- .has('') - return a boolean if this match exists
- .before('') - return all terms before a match, in each phrase
- .after('') - return all terms after a match, in each phrase
- .union() - return combined matches without duplicates
- .intersection() - return only duplicate matches
- .complement() - get everything not in another match
- .settle() - remove overlaps from matches
- .growRight('') - add any matching terms immediately after each match
- .growLeft('') - add any matching terms immediately before each match
- .grow('') - add any matching terms before or after each match
- .sweep(net) - apply a series of match objects to the document
- .splitOn('') - return a Document with three parts for every match ('splitOn')
- .splitBefore('') - partition a phrase before each matching segment
- .splitAfter('') - partition a phrase after each matching segment
- .lookup([]) - quick find for an array of string matches
- .autoFill() - create type-ahead assumptions on the document
- .tag('') - give all terms the given tag
- .tagSafe('') - only apply tag to terms if it is consistent with current tags
- .unTag('') - remove this tag from the given terms
- .canBe('') - return only the terms that can be this tag
- .toLowerCase() - turn every letter of every term to lower-case
- .toUpperCase() - turn every letter of every term to upper case
- .toTitleCase() - upper-case the first letter of each term
- .toCamelCase() - remove whitespace and title-case each term
- .pre('') - add this punctuation or whitespace before each match
- .post('') - add this punctuation or whitespace after each match
- .trim() - remove start and end whitespace
- .hyphenate() - connect words with hyphen, and remove whitespace
- .dehyphenate() - remove hyphens between words, and set whitespace
- .toQuotations() - add quotation marks around these matches
- .toParentheses() - add brackets around these matches
- .map(fn) - run each phrase through a function, and create a new document
- .forEach(fn) - run a function on each phrase, as an individual document
- .filter(fn) - return only the phrases that return true
- .find(fn) - return a document with only the first phrase that matches
- .some(fn) - return true or false if there is one matching phrase
- .random(n) - sample a subset of the results
- .replace(match, replace) - search and replace match with new content
- .replaceWith(replace) - substitute-in new text
- .remove() - fully remove these terms from the document
- .insertBefore(str) - add these new terms to the front of each match (prepend)
- .insertAfter(str) - add these new terms to the end of each match (append)
- .concat() - add these new things to the end
- .swap(fromLemma, toLemma) - smart replace of root-words, using proper conjugation
- .sort('method') - re-arrange the order of the matches (in place)
- .reverse() - reverse the order of the matches, but not the words
- .normalize({}) - clean-up the text in various ways
- .unique() - remove any duplicate matches
(these methods are on the main nlp object)
- nlp.tokenize(str) - parse text without running POS-tagging
- nlp.lazy(str, match) - scan through a text with minimal analysis
- nlp.plugin({}) - mix in a compromise-plugin
- nlp.parseMatch(str) - pre-parse any match statements into json
- nlp.world() - grab or change library internals
- nlp.model() - grab all current linguistic data
- nlp.methods() - grab or change internal methods
- nlp.hooks() - see which compute methods run automatically
- nlp.verbose(mode) - log our decision-making for debugging
- nlp.version - current semver version of the library
- nlp.addWords(obj) - add new words to the lexicon
- nlp.addTags(obj) - add new tags to the tagSet
- nlp.typeahead(arr) - add words to the auto-fill dictionary
- nlp.buildTrie(arr) - compile a list of words into a fast lookup form
- nlp.buildNet(arr) - compile a list of matches into a fast match form
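For readers wondering what "compile a list of words into a fast lookup form" can mean, the sketch below builds a tiny character trie in plain JavaScript. It only illustrates the general idea; it is not the structure that nlp.buildTrie() returns, and `buildToyTrie` and `hasWord` are hypothetical names used only here.

```javascript
// Toy character trie. A conceptual sketch only, NOT the library's internal format.
function makeNode() {
  return { children: new Map(), isWord: false }
}

// Compile the word list once, up front.
function buildToyTrie(words) {
  const root = makeNode()
  for (const word of words) {
    let node = root
    for (const ch of word.toLowerCase()) {
      if (!node.children.has(ch)) node.children.set(ch, makeNode())
      node = node.children.get(ch)
    }
    node.isWord = true // mark the end of a complete word
  }
  return root
}

// Each lookup walks at most word.length nodes, regardless of list size.
function hasWord(trie, word) {
  let node = trie
  for (const ch of word.toLowerCase()) {
    node = node.children.get(ch)
    if (!node) return false
  }
  return node.isWord
}

const trie = buildToyTrie(['canarino', 'finestra', 'mano'])
console.log(hasWord(trie, 'mano')) // true
console.log(hasWord(trie, 'man')) // false (only a prefix of a listed word)
```

The cost of building the structure is paid once, so repeated lookups against a long word list stay cheap.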
it can parse and generate written numbers
let doc = nlp('ne ho milleduecentosessantasette euro')
doc.numbers().minus(15)
doc.text()
// 'ne ho milleduecentocinquantadue euro'
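As a rough illustration of what generating a written Italian number involves, here is a self-contained sketch covering 0 to 999999. It is not how it-compromise implements .numbers(); `toItalian` and its helpers are hypothetical names, and the sketch makes one simplifying assumption: 'cento' is never elided before 'ottanta' (it produces 'centoottanta', although 'centottanta' is also common).

```javascript
// Toy number-to-words converter for Italian, valid for 0..999999.
// Hypothetical helpers; NOT the library's implementation.
const UNITS = [
  'zero', 'uno', 'due', 'tre', 'quattro', 'cinque', 'sei', 'sette', 'otto', 'nove',
  'dieci', 'undici', 'dodici', 'tredici', 'quattordici', 'quindici', 'sedici',
  'diciassette', 'diciotto', 'diciannove',
]
const TENS = ['', '', 'venti', 'trenta', 'quaranta', 'cinquanta', 'sessanta', 'settanta', 'ottanta', 'novanta']

function under100(n) {
  if (n < 20) return UNITS[n]
  let tens = TENS[Math.floor(n / 10)]
  const unit = n % 10
  if (unit === 0) return tens
  // the tens word drops its final vowel before 'uno' and 'otto' (ventuno, ventotto)
  if (unit === 1 || unit === 8) tens = tens.slice(0, -1)
  return tens + UNITS[unit]
}

function under1000(n) {
  if (n < 100) return under100(n)
  const hundreds = Math.floor(n / 100)
  const rest = n % 100
  const prefix = hundreds === 1 ? 'cento' : UNITS[hundreds] + 'cento'
  return rest === 0 ? prefix : prefix + under100(rest)
}

function toItalian(n) {
  if (!Number.isInteger(n) || n < 0 || n > 999999) {
    throw new RangeError('toItalian only handles integers from 0 to 999999')
  }
  let word
  if (n < 1000) {
    word = under1000(n)
  } else {
    const thousands = Math.floor(n / 1000)
    const rest = n % 1000
    const prefix = thousands === 1 ? 'mille' : under1000(thousands) + 'mila'
    word = rest === 0 ? prefix : prefix + under1000(rest)
  }
  // a compound ending in 'tre' takes an accent (ventitré, centotré)
  if (word.length > 3 && word.endsWith('tre')) word = word.slice(0, -1) + 'é'
  return word
}

console.log(toItalian(1267)) // 'milleduecentosessantasette'
console.log(toItalian(1267 - 15)) // 'milleduecentocinquantadue'
```

The two printed values match the before and after text of the .minus(15) example above.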
it can conjugate root words
let doc = nlp('Ho guidato al negozio')
doc.compute('root')
doc.has('{guidare} al #Noun')
//true
Please join to help! Help with your first PR is available.
git clone https://github.com/nlp-compromise/it-compromise.git
cd it-compromise
npm install
npm test
npm run watch
- Morph-it - by Marco Baroni and Eros Zanchetta
- PoSTWITA-UD Italian tweets dataset - by Manuela Sanguinetti et al.
- ian-hamlin/verb-data - Italian verb conjugations scraped from Wiktionary
- RDRPOSTagger - rule-based tagger in Python & Java w/ Italian model
- opennlp-italian - Java tagger w/ Italian model
- TreeTagger - Perl tagger w/ Italian model
MIT