Full unicode support #4

richard-willis · 2012-10-22T09:02:39Z

We need to support Unicode character sets. Unfortunately JavaScript doesn't make it easy to work with them, so we should consider using XRegExp (http://xregexp.com/).

After some initial testing, it appears the XRegExp library fits the job, but there are certain decisions we need to make before properly integrating it.

The spellchecker plugin currently relies on sending a string of words to the back-end service(s). This means we need to strip punctuation from the text before sending it. Determining what counts as punctuation is the difficult part.

The XRegExp library gives us \p{P} (the Unicode punctuation category), but we don't want to strip punctuation that forms part of a word, e.g. the apostrophe in "Here's". I can't even begin to decide how to handle this for languages other than English.
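
To make the problem concrete, here is a rough sketch of one possible approach: instead of stripping everything in \p{P}, match runs of letters, marks and digits, and allow a short whitelist of punctuation only when it sits between two such runs. This is an assumption-laden illustration, not a proposal for the final design. It uses native Unicode property escapes (the `u` flag in newer JavaScript engines) rather than XRegExp, `extractWords` is a made-up helper name, and the whitelist of inner punctuation is only a guess that will not be right for every language.

```javascript
// Sketch only. Assumes an engine with native \p{...} support (the `u` flag).
// `extractWords` is a hypothetical helper, not part of the plugin.

// A word is a run of letters/marks/digits, optionally continued by one
// "inner" punctuation character (ASCII apostrophe, U+2019, or hyphen)
// that is immediately followed by another run.
// The inner-punctuation whitelist is an assumption.
const WORD = /[\p{L}\p{M}\p{N}]+(?:['\u2019-][\p{L}\p{M}\p{N}]+)*/gu;

function extractWords(text) {
  // match() returns null when nothing matches, so fall back to [].
  return text.match(WORD) || [];
}

// The string that would be sent to the back-end service:
console.log(extractWords("Here's a test, isn't it?").join(" "));
// prints: Here's a test isn't it
console.log(extractWords("Привет, мир!").join(" "));
// prints: Привет мир
```

Punctuation at the edge of a word (quotes, commas, trailing apostrophes) is dropped because it is never followed by another letter run, while "Here's" and "well-known" survive intact. Whether a whitelist like this generalises is exactly the open question.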

I suggest we look at how existing spellcheckers handle this before coming to a decision. The solution has to be generic for all languages.

I'm making changes related to Unicode support in the unicode-support branch.

Any advice or suggestions would be greatly appreciated.

badsyntax · 2012-10-25T16:28:44Z

I've had to make some changes to the findAndReplaceDOMText library to support Unicode find and replace.
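
The comment doesn't say what the changes were. One obstacle that commonly comes up with whole-word find and replace in JavaScript is that `\b` only treats ASCII letters, digits and underscore as word characters, so it cannot anchor a word written in Cyrillic, for example. The sketch below shows one way around that, assuming an engine with native Unicode property escapes and lookbehind support. `wholeWordRegExp` is a hypothetical helper and is not taken from findAndReplaceDOMText or from the plugin.

```javascript
// Sketch only. `wholeWordRegExp` is a hypothetical helper.
// Assumes native \p{...} escapes and lookbehind support in the engine.

function wholeWordRegExp(word) {
  // Escape regex syntax characters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Unicode-aware stand-in for \b: the match must not be preceded or
  // followed by a letter, combining mark, or digit.
  const wordChar = "[\\p{L}\\p{M}\\p{N}]";
  return new RegExp(`(?<!${wordChar})${escaped}(?!${wordChar})`, "gu");
}

// \b fails on non-ASCII words:
console.log(/\bмир\b/.test("мир")); // prints: false

// The lookaround version matches the standalone word only:
console.log("мир и мирный".replace(wholeWordRegExp("мир"), "[$&]"));
// prints: [мир] и мирный
```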

badsyntax · 2012-11-01T20:13:19Z

I've merged the 'unicode-support' branch into develop, as this feature seems to be working pretty well now, and I have written some tests for it.

ghost assigned badsyntax Oct 22, 2012

badsyntax added a commit that referenced this issue Oct 25, 2012

Added partial Unicode support using XRegExp. Lots to clean up, but things are looking good. Refs #4

72cd06d

badsyntax added a commit that referenced this issue Oct 25, 2012

Updated examples: added russian examples; fixed google driver for unicode character sets. Refs #4

d774130

badsyntax closed this as completed Nov 1, 2012
