Skip to content

Application that split longer words into meaningful constituents.

License

Notifications You must be signed in to change notification settings

borbiuk/word-breaker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Word Breaker

Module for get meaningful parts of compound words.

Example

example

Algorithm

The algorithm is based on the comparison of words with the dictionary. A dictionary is a large set of words, it used for comparison. The minimum possible length for a subword is equal 3, to exclude extra iterations and non-valued subwords.

Simple:

for each compound {
    for each possible_word {
        if (compound.Contains(possible_word)) {
            results.Add(possible_word)
        }
    }
}

Advanced:

for each compound {
    for each character except the first one {
        get possible_words starting with this character
        for each possible_word {
            if (compound contains the word at the character’s position) {
                results.Add(possible_word)
            }
        }
    }
}

Data structure

The resulting data structure consists of word lists indexed by two properties:

  • Length of the words
  • Two characters of the words
Dictionary<int, Dictionary<string, List<string>>>

where the int is the word length, the string is the first two characters and the lists contain the suitable words.

Search by length and first two letters in HashTable minimize word list for Contains() method. This significantly speeds up the program.

Notes

  • The algorithm can be modified to get all subwords
  • The algorithm can also be modified to receive first shorter or longer subword
  • Be careful to work with umlaut
  • If you have any remarks, let me know

Releases

No releases published

Packages

No packages published

Languages