it-compromise

modest natural language processing

npm install it-compromise

work-in-progress!

see: french • german • spanish • english

it-compromise is a port of compromise to Italian.

The goal of this project is to provide a small, basic, rules-based POS tagger.
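To give a rough idea of what "rules-based" means here, the sketch below tags words with a tiny lexicon lookup followed by ordered suffix rules. It is only an illustration, not the it-compromise implementation: the lexicon entries, the suffix rules, the fallback tag and the function names `tagWord` and `tagSentence` are all assumptions made up for this example.

```javascript
// Toy rule-based tagger: NOT the it-compromise implementation.
// The lexicon, suffix rules and fallback tag are hypothetical.
const lexicon = new Map([
  ['un', 'Determiner'],
  ['la', 'Determiner'],
  ['sopra', 'Preposition'],
])

// ordered suffix rules: the first rule that matches wins
const suffixRules = [
  { suffix: 'mente', tag: 'Adverb' },
  { suffix: 'are', tag: 'Verb' },
]

// used when neither the lexicon nor a suffix rule applies
const fallbackTag = 'Noun'

function tagWord(word) {
  const w = word.toLowerCase()
  // 1. exact lexicon lookup
  if (lexicon.has(w)) {
    return lexicon.get(w)
  }
  // 2. suffix rules, in order
  const rule = suffixRules.find((r) => w.endsWith(r.suffix))
  // 3. fallback
  return rule ? rule.tag : fallbackTag
}

function tagSentence(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => ({ text: word, tag: tagWord(word) }))
}

console.log(tagSentence('un canarino sopra la finestra'))
```

A real tagger needs far more rules and some context-sensitivity; this sketch decides each word in isolation.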

import nlp from 'it-compromise'

let doc = nlp(`con l'autoradio sempre nella mano destra`)
doc.match('#Noun').json()
// [{text:'autoradio'}, {text:'mano'}]

or client-side:

<script src="https://unpkg.com/it-compromise"></script>
<script>
  let txt = 'un canarino sopra la finestra'
  let doc = itCompromise(txt) // window.itCompromise
  console.log(doc.json())
  // { text:'un canarino...', terms:[ ... ] }
</script>

API

it-compromise includes all methods of compromise/one:

Output

.text() - return the document as text
.json() - return the document as data
.debug() - pretty-print the interpreted document
.out() - a named or custom output
.html({}) - output custom html tags for matches
.wrap({}) - produce custom output for document matches

Utils

.found [getter] - does this document contain any matches? (false when empty)
.docs [getter] - get term objects as json
.length [getter] - count the # of matches (phrases) in the document
.isView [getter] - identify a compromise object
.compute() - run a named analysis on the document
.clone() - deep-copy the document, so that no references remain
.termList() - return a flat list of all Term objects in match
.cache({}) - freeze the current state of the document, for speed-purposes
.uncache() - un-freezes the current state of the document, so it may be transformed

Accessors

.all() - return the whole original document ('zoom out')
.terms() - split-up results by each individual term
.first(n) - use only the first result(s)
.last(n) - use only the last result(s)
.slice(n,n) - grab a subset of the results
.eq(n) - use only the nth result
.firstTerms() - get the first word in each match
.lastTerms() - get the end word in each match
.fullSentences() - get the whole sentence for each match
.groups() - grab any named capture-groups from a match
.wordCount() - count the # of terms in the document
.confidence() - an average score for pos tag interpretations

Match

(match methods use the match-syntax.)

.match('') - return a new Doc, with this one as a parent
.not('') - return all results except for this
.matchOne('') - return only the first match
.if('') - return each current phrase, only if it contains this match ('only')
.ifNo('') - filter-out any current phrases that have this match ('notIf')
.has('') - return a boolean if this match exists
.before('') - return all terms before a match, in each phrase
.after('') - return all terms after a match, in each phrase
.union() - return combined matches without duplicates
.intersection() - return only duplicate matches
.complement() - get everything not in another match
.settle() - remove overlaps from matches
.growRight('') - add any matching terms immediately after each match
.growLeft('') - add any matching terms immediately before each match
.grow('') - add any matching terms before or after each match
.sweep(net) - apply a series of match objects to the document
.splitOn('') - return a Document with three parts for every match ('splitOn')
.splitBefore('') - partition a phrase before each matching segment
.splitAfter('') - partition a phrase after each matching segment
.lookup([]) - quick find for an array of string matches
.autoFill() - create type-ahead assumptions on the document

Tag

.tag('') - give all terms the given tag
.tagSafe('') - only apply tag to terms if it is consistent with current tags
.unTag('') - remove this tag from the given terms
.canBe('') - return only the terms that can be this tag

Case

.toLowerCase() - turn every letter of every term to lower case
.toUpperCase() - turn every letter of every term to upper case
.toTitleCase() - upper-case the first letter of each term
.toCamelCase() - remove whitespace and title-case each term

Whitespace

.pre('') - add this punctuation or whitespace before each match
.post('') - add this punctuation or whitespace after each match
.trim() - remove start and end whitespace
.hyphenate() - connect words with hyphen, and remove whitespace
.dehyphenate() - remove hyphens between words, and set whitespace
.toQuotations() - add quotation marks around these matches
.toParentheses() - add brackets around these matches

Loops

.map(fn) - run each phrase through a function, and create a new document
.forEach(fn) - run a function on each phrase, as an individual document
.filter(fn) - return only the phrases that return true
.find(fn) - return a document with only the first phrase that matches
.some(fn) - return true or false if there is one matching phrase
.random(n) - sample a subset of the results

Insert

.replace(match, replace) - search and replace match with new content
.replaceWith(replace) - substitute-in new text
.remove() - fully remove these terms from the document
.insertBefore(str) - add these new terms to the front of each match (prepend)
.insertAfter(str) - add these new terms to the end of each match (append)
.concat() - add these new things to the end
.swap(fromLemma, toLemma) - smart replace of root-words, using proper conjugation

Transform

.sort('method') - re-arrange the order of the matches (in place)
.reverse() - reverse the order of the matches, but not the words
.normalize({}) - clean-up the text in various ways
.unique() - remove any duplicate matches

Lib

(these methods are on the main nlp object)

nlp.tokenize(str) - parse text without running POS-tagging
nlp.lazy(str, match) - scan through a text with minimal analysis
nlp.plugin({}) - mix in a compromise-plugin
nlp.parseMatch(str) - pre-parse any match statements into json
nlp.world() - grab or change library internals
nlp.model() - grab all current linguistic data
nlp.methods() - grab or change internal methods
nlp.hooks() - see which compute methods run automatically
nlp.verbose(mode) - log our decision-making for debugging
nlp.version - current semver version of the library
nlp.addWords(obj) - add new words to the lexicon
nlp.addTags(obj) - add new tags to the tagSet
nlp.typeahead(arr) - add words to the auto-fill dictionary
nlp.buildTrie(arr) - compile a list of words into a fast lookup form
nlp.buildNet(arr) - compile a list of matches into a fast match form

docs

Numbers

it can parse and generate written numbers

let doc = nlp('ne ho milleduecentosessantasette euro')
doc.numbers().minus(15)
doc.text()
// 'ne ho milleduecentocinquantadue euro'

number docs
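As a rough illustration of what parsing a written number involves, the sketch below splits a compound Italian number word into known pieces with a greedy longest-match scan, then sums them. It is not how it-compromise does it: the function name `parseItalianNumber` and the vocabulary are assumptions for this example, and it skips elided forms such as 'ventuno' and anything at or above a million.

```javascript
// Hypothetical sketch: NOT how it-compromise parses numbers.
// Covers only the small vocabulary below (no elisions like 'ventuno').
const numberWords = {
  uno: 1, due: 2, tre: 3, quattro: 4, cinque: 5,
  sei: 6, sette: 7, otto: 8, nove: 9, dieci: 10,
  venti: 20, trenta: 30, quaranta: 40, cinquanta: 50,
  sessanta: 60, settanta: 70, ottanta: 80, novanta: 90,
}

// longest words first, so the greedy scan prefers 'trenta' over 'tre'
const vocabulary = [...Object.keys(numberWords), 'cento', 'mille', 'mila'].sort(
  (a, b) => b.length - a.length
)

function parseItalianNumber(str) {
  let rest = str.toLowerCase()
  let total = 0 // completed thousands
  let current = 0 // value being built below one thousand
  while (rest.length > 0) {
    const word = vocabulary.find((w) => rest.startsWith(w))
    if (!word) {
      return null // unknown chunk: give up rather than guess
    }
    if (word === 'cento') {
      current = (current || 1) * 100 // 'cento' alone means 100
    } else if (word === 'mille') {
      total += 1000
    } else if (word === 'mila') {
      total += current * 1000
      current = 0
    } else {
      current += numberWords[word]
    }
    rest = rest.slice(word.length)
  }
  return total + current
}

const before = parseItalianNumber('milleduecentosessantasette')
const after = parseItalianNumber('milleduecentocinquantadue')
console.log(before, after, before - after)
```

The two inputs are the number words from the example above, so the difference between them should come out as the 15 that was subtracted.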

Lemmatization

it can conjugate root words

let doc = nlp('Ho guidato al negozio')
doc.compute('root')
doc.has('{guidare} al #Noun')
//true

root docs
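To show the kind of mapping a root lookup performs (such as 'guidato' to 'guidare' above), here is a minimal sketch that turns a regular Italian past participle back into its infinitive using suffix rules. It is an assumption-laden toy, not the library's root computation: `toInfinitive` and the rule table are made up for this example, irregular participles are not handled, and any other word that happens to end in one of these suffixes would be converted too.

```javascript
// Hypothetical sketch: NOT the it-compromise root/lemma logic.
// Regular past participles only; irregular forms (e.g. 'fatto') return null.
const participleRules = [
  { suffix: 'ato', infinitive: 'are' },
  { suffix: 'uto', infinitive: 'ere' },
  { suffix: 'ito', infinitive: 'ire' },
]

function toInfinitive(participle) {
  const w = participle.toLowerCase()
  const rule = participleRules.find((r) => w.endsWith(r.suffix))
  if (!rule) {
    return null // no regular rule applies
  }
  // swap the participle ending for the infinitive ending
  return w.slice(0, -rule.suffix.length) + rule.infinitive
}

console.log(toInfinitive('guidato'))
```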

Contributing

please join to help! help with first PR

git clone https://github.com/nlp-compromise/it-compromise.git
cd it-compromise
npm install
npm test
npm run watch

Sources

Morph-it - by Marco Baroni and Eros Zanchetta
PoSTWITA-UD italian tweets dataset - by Manuela Sanguinetti et al
ian-hamlin/verb-data - italian verb conjugations scraped from wiktionary

See also:

RDRPOSTagger - rule-based tagger in python & java w/ italian model
opennlp-italian - Java tagger w/ italian model
TreeTagger - Perl tagger w/ italian model

MIT
