Skip to content

Calculate the number of occurrences of each word in a text. Words smaller than two letters will be ignored.

License

Notifications You must be signed in to change notification settings

proustibat/occurences

Repository files navigation

Occurrences GitHub license

Calculate the number of occurrences of each word in a text. Get varisous stats: smallest, longest words, etc.

NPM
GitHub package version npm Npm downloads
Maintenance
GitHub last commit
Open issues
Build Status
Sonar quality gate
Code Climate
Coverage Status
Greenkeeper badge
Dependencies Status
DevDependencies Status

Installation

npm i -S occurences

Usage

Javascript

const Occurrences = require('occurences');
let occ = new Occurrences(data, [options])

Where data is a string. Options object isn't required.

Running example on Runkit.com: https://runkit.com/proustibat/occurences-example-request

Typescript

Wanna use it with Angular 2? For example in an Ionic application. Import as follows:

import * as Occurences from 'Occurences';

Note that stats of an instance is an object. So to list the words in an ionic template with *ngFor, proceed as follows to transform it in an array:

Typescript file:

    this.textOccurrences = new Occurences(this.text);
    this.statsArray = Object.keys(this.textOccurrences.stats).map( key => {
        return { word: key, number: this.textOccurrences.stats[key] };
    });

HTML :

<table>
    <tr *ngFor="let item of statsArray">
        <td>{{ item.word }}</td>
        <td>{{ item.number }}</td>
    </tr>
</table>

Options

Option Type Default Description
sensitiveCase Boolean false If defined to true, counts as 2 different words same word with uppercases
ignored String or Array - One or several words to ignore when counting occurrences
biggerThan int 2 Considers only words larger than this number of letters

Properties

Property Type Description
stats Object Each words occurrences: word as key, occurence number as value (read-only)
meta Object Global stats about the data: total number of words, number of different words, total number of characters with spaces (charsWS) or no (charsNS) Returns an object as follows: {totalWords:int, differentWords:int, charsWS:int, charsNS:int}
lessUsed Array The less used word of the data (read-only)
mostUsed Array The most used word of the data (read-only)
smallest Array The smallest used word (read-only)
longest Array The longest used word (read-only)
options Object Settings of the instance (read-only)

Methods

Property Parameters Default Description
getSorted String: 'desc', 'asc' 'desc' Returns an array with objects sorted by order descendant or ascendant, each index of the array is an object as follows : {word:'three', number: '3'}

Examples

Example with latin alphabet

Simple stats

const Occurrences = require('occurences'); // note the lib is named with only one R
const latinText = "Not connected to power. Power is it good or bad. What is power? Dunno what power is but I know what it's not.";
let occurrencesLatin = new Occurrences(latinText);
console.log(occurrencesLatin.stats);

Output:

{ 
    not: 2,
    connected: 1,
    power: 4,
    good: 1,
    bad: 1,
    what: 3,
    dunno: 1,
    but: 1,
    know: 1,
    'it\'s': 1 
}

Other properties

console.log("longest: ", occurrencesLatin.longest);
console.log("smallest: ", occurrencesLatin.smallest);
console.log("lessUsed: ", occurrencesLatin.lessUsed);
console.log("mostUsed: ", occurrencesLatin.mostUsed);
console.log("getSorted: ", occurrencesLatin.getSorted());

Output:

longest:  ['connected']
smallest:  [ 'not', 'bad', 'but' ]
lessUsed:  [ 'connected', 'good', 'bad', 'dunno', 'but', 'know', 'it\'s' ]
mostUsed:  ['power']
getSorted:  [ { value: 'power', number: 4 },
  { value: 'what', number: 3 },
  { value: 'not', number: 2 },
  { value: 'connected', number: 1 },
  { value: 'good', number: 1 },
  { value: 'bad', number: 1 },
  { value: 'dunno', number: 1 },
  { value: 'but', number: 1 },
  { value: 'know', number: 1 },
  { value: 'it\'s', number: 1 } ]

Example with hebrew alphabet

const Occurrences = require('occurences'); // note the lib is named with only one R
const hebrewText = "שלום! חג פסח שמח ו שבת שלום לכולם!";
let occurrencesHebrew = new Occurrences(hebrewText);
console.log(occurrencesHebrew.stats);

Output:

{ 
	'שלום': 2, 
	'פסח': 1, 
	'שמח': 1, 
	'שבת': 1, 
	'לכולם': 1 
}

Note that text editor don't outputs from left to right but the object is ok in real life

Example with async data

const Occurrences = require('occurences');  // note the lib is named with only one R
const request = require('request');         // note you have to install request lib
const url = "http://faker.hook.io/?property=lorem.sentences";
request({
    url: url,
    json: true
}, function (error, response, data) {
    if (!error && response.statusCode === 200) {
        let myResult = new Occurrences(data);
        console.log(myResult.stats);
    }
    else {
        console.log("It seems an error occured when requesting ", url);
    }
});

Output:

{ 
    nobis: 1,
    quam: 1,
    sapiente: 1,
    fugiat: 1,
    cumque: 2,
    nisi: 1,
    voluptatem: 1,
    sint: 1,
    quibusdam: 1,
    impedit: 1,
    modi: 2,
    expedita: 1,
    deserunt: 1,
    non: 1 
}

Tests

npm test

Coverage

npm run cover

Continuous Code Quality

I use Sonarqube on Sonarcloud.io to maintain clean code. Public dashboard is here: https://sonarcloud.io/dashboard?id=proustibat_occurences

Some results:

Comments (%) Open issues Code smells Technical debt Bugs Reliability remediation effort Coverage

Using Sonar Scanner

Be sure you have downloaded and installed the Sonarqube Scanner. You need to add sonar-project.properties to the root of the project as folllows:

sonar.projectName=Occurences
sonar.projectKey=proustibat_occurences
sonar.host.url=https://sonarcloud.io
sonar.organization=proustibat-github
sonar.login=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
sonar.sources=.
sonar.exclusions=node_modules/**/*,coverage/**/*,example.js,test/**/*
sonar.javascript.lcov.reportPath=coverage/lcov.info
sonar.java.source=1.8
sonar.java.binaries=.

And then run sonar scanner as follows:

sonar-scanner -X -Dsonar.projectVersion=x.x.x

More information on Sonarcloud.io

Contributing