Add different builds: Full and Basic #385
Comments
Added in Full explanation of build files. cc: @cshoredaniel, @sidvishnoi, @ndelangen - let me know your thoughts. I figured this would be a good option to have. Rationale:
I like the idea. In the future, I wonder if it'd be possible to make the Byteap hack I pointed you at sufficiently performant to handle all lengths of string that Fuse wishes to support (I haven't looked deeply, so it's just a thought for now). It might not be possible from an algorithmic point of view, but I'd rather like the ability to have fuzzy search on both short and long strings without extended search included.
@cshoredaniel indeed, I tested the Byteap solution you provided (which, btw - I forgot to thank you for - so, thank you!) and I'm still trying to get adequate performance out of it. The 2D array computation is the bottleneck. Curious to know your thoughts: For very long patterns, it would seem that there would be a far smaller error-to-pattern-length ratio than for smaller patterns. That is, for longer patterns, there is "more to work with that is spelt accurately". Because of this, I went the ngram route, and it does seem to provide pretty good results.
Agreed. Goal is to get Byteap to be close to Bitap on performance.
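To illustrate the n-gram idea mentioned above, here is a minimal sketch of n-gram similarity between two strings. It is not Fuse's implementation: the function names `ngrams` and `ngramSimilarity`, the default of trigrams, and the choice of the Jaccard coefficient are all assumptions made for this example.

```javascript
// Hypothetical helper: collect the distinct n-grams of a string.
function ngrams(str, n = 3) {
  const grams = new Set();
  for (let i = 0; i + n <= str.length; i++) {
    grams.add(str.slice(i, i + n));
  }
  return grams;
}

// Hypothetical helper: Jaccard similarity of the two n-gram sets,
// ranging from 0 (no shared n-grams) to 1 (identical sets).
function ngramSimilarity(a, b, n = 3) {
  const gramsA = ngrams(a, n);
  const gramsB = ngrams(b, n);
  if (gramsA.size === 0 && gramsB.size === 0) return 1;

  let shared = 0;
  for (const gram of gramsA) {
    if (gramsB.has(gram)) shared++;
  }
  return shared / (gramsA.size + gramsB.size - shared);
}
```

A long pattern with a single typo still shares most of its n-grams with the correct text, which matches the observation that longer patterns leave "more to work with that is spelt accurately".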
Hi, sorry for the long delay. For longer patterns against the same text, I think your observation holds. I suspected the 2D array would be an issue, but hopefully not an insurmountable one. I see you've been keeping busy, and I have some reading to do.
To save on size, it would be great to have different builds:
Full: Builds that contain standard fuzzy searching, extended searching, and long pattern searching. These builds are larger.
Basic: Builds that contain only standard fuzzy searching. Use these if you're not going to enable extended searching and if the patterns that are searched are typically less than 32 characters in length.
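As background for the 32-character figure, here is a minimal sketch of an exact-match Bitap (Shift-And) search that keeps its state in a single integer. It is an illustration only, not Fuse's code: the name `bitapIndexOf` is hypothetical, and the link between the 32-character figure and JavaScript's 32-bit bitwise operators is an assumption about the motivation rather than something stated in this thread.

```javascript
// Illustrative exact-match Bitap (Shift-And); not Fuse's actual implementation.
// Returns the index of the first occurrence of `pattern` in `text`, or -1.
function bitapIndexOf(text, pattern) {
  const m = pattern.length;
  if (m === 0) return 0;
  // JavaScript bitwise operators work on 32-bit integers, so one mask
  // can track at most 32 pattern positions (1 << 32 wraps around to 1).
  if (m > 32) throw new RangeError("pattern too long for a single 32-bit mask");

  // For each character, set bit i when pattern[i] is that character.
  const masks = new Map();
  for (let i = 0; i < m; i++) {
    const ch = pattern[i];
    masks.set(ch, (masks.get(ch) || 0) | (1 << i));
  }

  const accept = 1 << (m - 1); // bit that marks a full match
  let state = 0;
  for (let j = 0; j < text.length; j++) {
    // Extend every partial match by one character and start a new one.
    state = ((state << 1) | 1) & (masks.get(text[j]) || 0);
    if (state & accept) return j - m + 1;
  }
  return -1;
}
```

Patterns beyond that length need either multiple machine words or a different strategy, which is the extra code a Basic build could leave out.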