GitHub - victorman/ib4erule: Doing some word analysis concerning the 'I before E except after C' rule


Files:
.words
.wordsNoCaps
.wordsNoDupe
.wordsNoHyphen
README
ib4erule


Summary of the word files:

.words was generated using:
grep -iP -e '(ie|ei)' /usr/share/dict/words > .words

.wordsNoDupe was then generated from .words:
grep -ivP -e '(ier|iest|ed|ing|s|tion)$' .words > .wordsNoDupe

.wordsNoCaps was then generated:
grep -vP -e '^[A-Z]' .wordsNoDupe > .wordsNoCaps

.wordsNoHyphen:
grep -vP -e '-' .wordsNoCaps > .wordsNoHyphen
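
The four steps above can be chained into one function, so the files can be rebuilt in one go and the size of each stage compared. This is only a sketch, not a script from this repo: the name build_word_files and its two arguments are made up for illustration. It uses grep -E (and -F for the hyphen) instead of -P, on the assumption that these simple patterns behave the same way, and it runs the capital-letter step under LC_ALL=C so that [A-Z] means ASCII capitals only.

```shell
# build_word_files DICT OUTDIR  (hypothetical helper, not part of the repo)
# Rebuilds .words, .wordsNoDupe, .wordsNoCaps and .wordsNoHyphen in OUTDIR
# from the dictionary file DICT, then prints the line count of each file.
build_word_files() {
    dict=$1
    out=$2

    # Step 1: every word containing 'ie' or 'ei', case-insensitive.
    grep -iE -e '(ie|ei)' "$dict" > "$out/.words"

    # Step 2: drop words ending in common suffixes (the "dupe" filter).
    grep -ivE -e '(ier|iest|ed|ing|s|tion)$' "$out/.words" > "$out/.wordsNoDupe"

    # Step 3: drop words that start with an ASCII capital letter.
    LC_ALL=C grep -v -e '^[A-Z]' "$out/.wordsNoDupe" > "$out/.wordsNoCaps"

    # Step 4: drop words containing a hyphen (fixed-string match).
    grep -vF -e '-' "$out/.wordsNoCaps" > "$out/.wordsNoHyphen"

    # Report how many words survive each stage.
    for f in .words .wordsNoDupe .wordsNoCaps .wordsNoHyphen; do
        printf '%s %s\n' "$f" "$(wc -l < "$out/$f" | tr -d ' ')"
    done
}
```

Something like `build_word_files /usr/share/dict/words .` would then regenerate the files in the current directory.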

Known issues:
Some deeper analysis needs to happen.
I realise that the dupe omissions remove some legitimate words that are not actually duplicates of anything.
I'm not sure how big that number is, though.
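
One rough way to put a number on this is to list the removed words whose stem does not appear in the dictionary. The sketch below is a heuristic, not something that exists in this repo: the name list_non_dupes is hypothetical, and it assumes a removed word counts as a "duplicate" only if its stem, the stem plus 'e', or the stem plus 'y' is itself a line in the dictionary. It is case-sensitive, unlike the -i filter above, and it will misjudge some words (for example 'carried', whose stem 'carri' matches nothing).

```shell
# list_non_dupes WORDS DICT  (hypothetical helper, not part of the repo)
# Prints each word in WORDS that the suffix filter would remove but whose
# stem (with the suffix stripped) is not found in DICT as stem, stem+e or stem+y.
list_non_dupes() {
    awk '
        # First file (DICT): remember every dictionary word.
        NR == FNR { dict[$0] = 1; next }

        # Second file (WORDS): look only at words the suffix filter removes.
        match($0, /(ier|iest|ed|ing|s|tion)$/) {
            stem = substr($0, 1, RSTART - 1)
            if (!(stem in dict) && !((stem "e") in dict) && !((stem "y") in dict))
                print
        }
    ' "$2" "$1"
}
```

Piping the result through `wc -l`, as in `list_non_dupes .words /usr/share/dict/words | wc -l`, would give an estimate of how many removed words are not duplicates under this assumption.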

A large chunk of words that legitimately follow 'i before e' are excluded by omitting 'ier' and 'iest'.
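
To see how large that chunk is, the 'ier' and 'iest' words can be counted separately. This is a sketch under one assumption about what "following the rule" means: the word contains no 'cie' and no 'ei' that is not directly after a 'c'. The function name count_ier_iest_followers is made up for illustration.

```shell
# count_ier_iest_followers WORDS  (hypothetical helper, not part of the repo)
# Counts words in WORDS that end in 'ier' or 'iest' and follow the rule,
# i.e. contain neither 'cie' nor an 'ei' that is not preceded by 'c'.
count_ier_iest_followers() {
    grep -iE -e '(ier|iest)$' "$1" |
        grep -icvE -e 'cie' -e '(^|[^c])ei'
}
```

For example, `count_ier_iest_followers .words` would print the number of rule-following words that the 'ier' and 'iest' omissions throw away.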