GitHub - nlp-compromise/ja-compromise: modest natural language processing for Japanese

ja-compromise

simple natural language processing in the browser

npm install ja-compromise

work-in-progress!

see: French • Spanish • German • English

ja-compromise (妥協) is a Japanese port of compromise, the English-language JavaScript library (nlp-compromise).

The goal of this project is to provide a small, basic, rule-based POS-tagger.
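To make "rule-based POS-tagger" concrete, here is a minimal sketch of the general idea: look each token up in a small lexicon, then fall back to suffix rules. It is not ja-compromise's implementation. The lexicon entries, the suffix rules, the `Particle` tag name, and the helper names `tagToken` and `tagTokens` are all assumptions made for illustration, and the input is assumed to be segmented into words already.

```javascript
// Toy rule-based POS tagger. This is NOT ja-compromise's implementation:
// the lexicon, the suffix rules, and the 'Particle' tag name are hypothetical.
const LEXICON = new Map([
  ['子供', 'Noun'],
  ['食料品店', 'Noun'],
  ['は', 'Particle'],
  ['に', 'Particle'],
])

// Fallback rules, tried in order when a token is not in the lexicon.
const SUFFIX_RULES = [
  { suffix: 'いた', tag: 'Verb' }, // plain past forms such as 歩いた
  { suffix: 'な', tag: 'Adjective' }, // prenominal forms such as 小さな
]

// Tag one token: lexicon lookup first, then suffix rules, then a default.
function tagToken(token) {
  if (LEXICON.has(token)) {
    return LEXICON.get(token)
  }
  const rule = SUFFIX_RULES.find((r) => token.endsWith(r.suffix))
  return rule ? rule.tag : 'Noun' // unknown words default to Noun
}

// Tag a list of tokens. Word segmentation is assumed to be done already.
function tagTokens(tokens) {
  return tokens.map((text) => ({ text, tag: tagToken(text) }))
}

const tagged = tagTokens(['小さな', '子供', 'は', '食料品店', 'に', '歩いた'])
const nouns = tagged.filter((t) => t.tag === 'Noun').map((t) => t.text)
console.log(nouns) // [ '子供', '食料品店' ]
```

A real tagger also has to segment unspaced Japanese text and resolve ambiguous words, which this sketch leaves out.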

import nlp from 'ja-compromise'

let doc = nlp('小さな子供は食料品店に歩いた')
doc.match('#Noun').out('array')
// [ '子', '食料品店']

or in the browser:

<script src="https://unpkg.com/ja-compromise"></script>
<script>
  let txt = '小さな子供が食料品を買いました。 彼はとても怖がっていた'
  let doc = jaCompromise(txt)
  console.log(doc.sentences(1).json())
  // { text:'小さな子供が食...', terms:[ ... ] }
</script>

See en-compromise/api for the full API documentation.

API

ja-compromise includes all of the methods in compromise/one:

Output

.text() - return the document as text
.json() - return the document as data
.debug() - pretty-print the interpreted document
.out() - a named or custom output
.html({}) - output custom html tags for matches
.wrap({}) - produce custom output for document matches

Utils

.found [getter] - is this document empty?
.docs [getter] - get term objects as json
.length [getter] - count the # of characters in the document (string length)
.isView [getter] - identify a compromise object
.compute() - run a named analysis on the document
.clone() - deep-copy the document, so that no references remain
.termList() - return a flat list of all Term objects in match
.cache({}) - freeze the current state of the document, for speed-purposes
.uncache() - un-freezes the current state of the document, so it may be transformed

Accessors

.all() - return the whole original document ('zoom out')
.terms() - split-up results by each individual term
.first(n) - use only the first result(s)
.last(n) - use only the last result(s)
.slice(n,n) - grab a subset of the results
.eq(n) - use only the nth result
.firstTerms() - get the first word in each match
.lastTerms() - get the end word in each match
.fullSentences() - get the whole sentence for each match
.groups() - grab any named capture-groups from a match
.wordCount() - count the # of terms in the document
.confidence() - an average score for pos tag interpretations

Match

(match methods use the match-syntax.)

.match('') - return a new Doc, with this one as a parent
.not('') - return all results except for this
.matchOne('') - return only the first match
.if('') - return each current phrase, only if it contains this match ('only')
.ifNo('') - Filter-out any current phrases that have this match ('notIf')
.has('') - Return a boolean if this match exists
.before('') - return all terms before a match, in each phrase
.after('') - return all terms after a match, in each phrase
.union() - return combined matches without duplicates
.intersection() - return only duplicate matches
.complement() - get everything not in another match
.settle() - remove overlaps from matches
.growRight('') - add any matching terms immediately after each match
.growLeft('') - add any matching terms immediately before each match
.grow('') - add any matching terms before or after each match
.sweep(net) - apply a series of match objects to the document
.splitOn('') - return a Document with three parts for every match ('splitOn')
.splitBefore('') - partition a phrase before each matching segment
.splitAfter('') - partition a phrase after each matching segment
.lookup([]) - quick find for an array of string matches
.autoFill() - create type-ahead assumptions on the document

Tag

.tag('') - Give all terms the given tag
.tagSafe('') - Only apply tag to terms if it is consistent with current tags
.unTag('') - Remove this tag from the given terms
.canBe('') - return only the terms that can be this tag

Case

.toLowerCase() - turn every letter of every term to lower-case
.toUpperCase() - turn every letter of every term to upper case
.toTitleCase() - upper-case the first letter of each term
.toCamelCase() - remove whitespace and title-case each term

Whitespace

.pre('') - add this punctuation or whitespace before each match
.post('') - add this punctuation or whitespace after each match
.trim() - remove start and end whitespace
.hyphenate() - connect words with hyphen, and remove whitespace
.dehyphenate() - remove hyphens between words, and set whitespace
.toQuotations() - add quotation marks around these matches
.toParentheses() - add brackets around these matches

Loops

.map(fn) - run each phrase through a function, and create a new document
.forEach(fn) - run a function on each phrase, as an individual document
.filter(fn) - return only the phrases that return true
.find(fn) - return a document with only the first phrase that matches
.some(fn) - return true or false if there is one matching phrase
.random(n) - sample a subset of the results

Insert

.replace(match, replace) - search and replace match with new content
.replaceWith(replace) - substitute-in new text
.remove() - fully remove these terms from the document
.insertBefore(str) - add these new terms to the front of each match (prepend)
.insertAfter(str) - add these new terms to the end of each match (append)
.concat() - add these new things to the end
.swap(fromLemma, toLemma) - smart replace of root-words, using proper conjugation

Transform

.sort('method') - re-arrange the order of the matches (in place)
.reverse() - reverse the order of the matches, but not the words
.normalize({}) - clean-up the text in various ways
.unique() - remove any duplicate matches

Lib

(these methods are on the main nlp object)

nlp.tokenize(str) - parse text without running POS-tagging
nlp.lazy(str, match) - scan through a text with minimal analysis
nlp.plugin({}) - mix in a compromise-plugin
nlp.parseMatch(str) - pre-parse any match statements into json
nlp.world() - grab or change library internals
nlp.model() - grab all current linguistic data
nlp.methods() - grab or change internal methods
nlp.hooks() - see which compute methods run automatically
nlp.verbose(mode) - log our decision-making for debugging
nlp.version - current semver version of the library
nlp.addWords(obj) - add new words to the lexicon
nlp.addTags(obj) - add new tags to the tagSet
nlp.typeahead(arr) - add words to the auto-fill dictionary
nlp.buildTrie(arr) - compile a list of words into a fast lookup form
nlp.buildNet(arr) - compile a list of matches into a fast match form
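The list above describes nlp.buildTrie(arr) only as compiling words into "a fast lookup form". As a library-independent illustration of why a trie suits this kind of lookup, the sketch below builds a character trie and scans a text for the longest dictionary word at each position. The helper names `buildToyTrie` and `scanWithTrie` are hypothetical, and nothing here is meant to reproduce how nlp.buildTrie() or .lookup() work internally.

```javascript
// Toy character trie for dictionary lookup. Hypothetical helper names;
// this does not reproduce how nlp.buildTrie() works internally.
function buildToyTrie(words) {
  const root = { children: new Map(), isWord: false }
  for (const word of words) {
    let node = root
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isWord: false })
      }
      node = node.children.get(ch)
    }
    node.isWord = true // marks the end of a complete dictionary word
  }
  return root
}

// Scan the text left to right, keeping the longest dictionary word
// that starts at each position.
function scanWithTrie(text, trie) {
  const chars = Array.from(text)
  const found = []
  let i = 0
  while (i < chars.length) {
    let node = trie
    let longest = 0
    for (let j = i; j < chars.length; j += 1) {
      node = node.children.get(chars[j])
      if (!node) break
      if (node.isWord) longest = j - i + 1
    }
    if (longest > 0) {
      found.push(chars.slice(i, i + longest).join(''))
      i += longest // continue after the matched word
    } else {
      i += 1
    }
  }
  return found
}

const trie = buildToyTrie(['食料品', '食料品店', '子供'])
console.log(scanWithTrie('小さな子供は食料品店に歩いた', trie))
// [ '子供', '食料品店' ]
```

Each position is checked against every dictionary word in a single walk down the trie, so the cost of a scan does not grow with the number of words in the list.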

Please join to help!

Contributing

git clone https://github.com/nlp-compromise/ja-compromise.git
cd ja-compromise
npm install
npm test
npm run watch
