N-grams can be generated from an Array of Strings or from a single String, which is tokenized first.
- Add the dependency to your shard.yml:

      dependencies:
        cadmium_ngrams:
          github: cadmiumcr/ngrams

- Run shards install

    require "cadmium_ngrams"

    ngrams = Cadmium.ngrams.new
    ngrams.bigrams("these are some words")
    # => [["these", "are"], ["are", "some"], ["some", "words"]]

    ngrams = Cadmium.ngrams.new
    ngrams.trigrams("these are some words")
    # => [["these", "are", "some"], ["are", "some", "words"]]

    ngrams = Cadmium.ngrams.new
    ngrams.ngrams("some other words here for you", 4)
    # => [["some", "other", "words", "here"], ["other", "words", "here", "for"], ["words", "here", "for", "you"]]

N-grams can also be returned with left and/or right padding by passing a start and/or end symbol to bigrams, trigrams, or ngrams.

    ngrams = Cadmium.ngrams.new
    ngrams.ngrams("these are some words", 4, "[start]", "[end]")
    # => [
    #   ["[start]", "[start]", "[start]", "these"],
    #   ["[start]", "[start]", "these", "are"],
    #   ["[start]", "these", "are", "some"],
    #   ["these", "are", "some", "words"],
    #   ["are", "some", "words", "[end]"],
    #   ["some", "words", "[end]", "[end]"],
    #   ["words", "[end]", "[end]", "[end]"]
    # ]

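
To make the padding behaviour concrete, here is a minimal standalone sketch of the sliding-window idea using only the Crystal standard library. It is not the shard's actual implementation: `sketch_ngrams` is a hypothetical helper, and splitting on whitespace is an assumption standing in for whatever tokenizer the library uses. Padding is assumed to add n - 1 copies of each symbol, which is consistent with the output shown above.

```crystal
# Hypothetical helper for illustration only; not part of cadmium_ngrams.
def sketch_ngrams(tokens : Array(String), n : Int32,
                  start_symbol : String? = nil,
                  end_symbol : String? = nil) : Array(Array(String))
  raise ArgumentError.new("n must be positive") unless n > 0

  padded = tokens.dup

  # Left padding: prepend n - 1 copies of the start symbol, if one was given.
  if s = start_symbol
    padded = Array.new(n - 1, s) + padded
  end

  # Right padding: append n - 1 copies of the end symbol, if one was given.
  if e = end_symbol
    padded = padded + Array.new(n - 1, e)
  end

  # each_cons yields every window of n consecutive elements.
  padded.each_cons(n).to_a
end

# Whitespace split is an assumed stand-in for the library's tokenizer.
tokens = "these are some words".split

sketch_ngrams(tokens, 2)
# => [["these", "are"], ["are", "some"], ["some", "words"]]

sketch_ngrams(tokens, 4, "[start]", "[end]").size
# => 7
```

With four tokens and n = 4, three padding symbols on each side give ten elements and therefore seven windows, matching the seven rows in the padded example.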
- Fork it (https://github.com/cadmiumcr/ngrams/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request

- Chris Watson - creator and maintainer