Hey Xav,
In the object that gets returned, the raw text seems to have some extra spaces between tokens. For example, I would expect that running compendium.analyse('My name is Dr. Jekyll.'); would return the original text as the 'raw' property, as follows:
```
[ { time: 9, // Time of processing, in ms
    length: 6, // Count of tokens
    raw: 'My name is Dr. Jekyll.', // Raw string
    stats: ...
```
However, it actually returns the following:
```
[ { time: 9, // Time of processing, in ms
    length: 6, // Count of tokens
    raw: 'My name is Dr. Jekyll .', // Raw string
    stats: ...
```
It's a bit more pronounced with extra punctuation, since those are tokenized separately:
```
compendium.analyse('Today is 4/2/2015, or 2/4/2015- depending on where in the world you live!');

[ { time: 6, // Time of processing, in ms
    length: 23, // Count of tokens
    raw: 'Today is 4 / 2 / 2015 , or 2 / 4 / 2015- depending on where in the world you live !', // Raw string
    stats: ...
```
It's a minor issue, but it can cause some presentation weirdness.
A quick look at the code makes me think it's happening on line 41 of detector.s.1.entities.js, but that's only a first glance.
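The symptom is consistent with raw being rebuilt by joining the tokens with single spaces. The sketch below reproduces the first example under that assumption. The tokenize and naiveRaw helpers and the abbreviation list are hypothetical illustrations, not Compendium's actual code (which tokenizes differently in places, e.g. it keeps 2015- as one token).

```javascript
// Hypothetical, simplified tokenizer (not Compendium's): split on whitespace,
// keep known abbreviations whole, and split every other chunk into runs of
// word characters and single punctuation characters.
const ABBREVIATIONS = new Set(['Dr.', 'Mr.', 'Mrs.']);

function tokenize(text) {
  const tokens = [];
  for (const chunk of text.split(/\s+/).filter(Boolean)) {
    if (ABBREVIATIONS.has(chunk)) {
      tokens.push(chunk);
    } else {
      // chunk is non-empty and has no whitespace, so match() never returns null
      tokens.push(...chunk.match(/\w+|\W/g));
    }
  }
  return tokens;
}

// Naive reconstruction: after tokenizing, the original spacing is gone, so
// joining with ' ' puts a space before every punctuation token.
function naiveRaw(tokens) {
  return tokens.join(' ');
}

const tokens = tokenize('My name is Dr. Jekyll.');
console.log(tokens.length);    // 6
console.log(naiveRaw(tokens)); // My name is Dr. Jekyll .
```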
BTW, great work on this package!
Thanks for the report, and sorry for the late reply.
The issue was caused by an incomplete implementation: raw was in fact an inaccurate reconstruction of the original string. I plan to implement an accurate reconstruction later on, since the tokens can sometimes differ from the original string (e.g. 2day becoming today).
In the meantime, release v0.0.20 solves the issue by providing the real original string of the sentence in the raw field.
Since you seem to be actively using the library, I wanted to let you know that I'm working on support for multiple languages, the second one after English being French. This release scaffolds some code for that, so I hope you won't run into any surprises. All tests are passing, but who knows?
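The accurate reconstruction mentioned above could be approached by recording each token's character offsets in the original string. This is a minimal sketch, assuming a hypothetical tokenizeWithOffsets helper and token shape ({ text, norm, start, end }) that are not part of Compendium: copying the original gaps between tokens recovers the exact input, and emitting the normalized form instead (2day becoming today) still keeps the original spacing.

```javascript
// Hypothetical token shape: { text, norm, start, end }. start/end are character
// offsets into the original string; norm is the (possibly different) normalized form.
function tokenizeWithOffsets(source, normalize = (t) => t) {
  const tokens = [];
  const re = /\w+|[^\w\s]/g; // word runs or single punctuation characters
  let m;
  while ((m = re.exec(source)) !== null) {
    tokens.push({
      text: m[0],
      norm: normalize(m[0]),
      start: m.index,
      end: m.index + m[0].length,
    });
  }
  return tokens;
}

// Rebuild a string by copying the original gaps (spaces, or nothing at all)
// around each token. `pick` selects which form of the token to emit.
function rebuild(source, tokens, pick = (tok) => tok.text) {
  let out = '';
  let cursor = 0;
  for (const tok of tokens) {
    out += source.slice(cursor, tok.start) + pick(tok);
    cursor = tok.end;
  }
  return out + source.slice(cursor); // keep any trailing whitespace
}

const source = 'See you 2day, Dr. Jekyll!';
const tokens = tokenizeWithOffsets(source, (t) => (t === '2day' ? 'today' : t));
console.log(rebuild(source, tokens));                    // See you 2day, Dr. Jekyll!
console.log(rebuild(source, tokens, (tok) => tok.norm)); // See you today, Dr. Jekyll!
```

Because the gaps are sliced from the source string rather than guessed, this stays correct even when a normalized token has a different length than the original one.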
Let me know if you have any more feedback or suggestion!