Commit
Refactor hntrie to avoid the need for boundary cells
Whereas before the string segment was encoded as:

    LL OOOOOOOOOOOO

where L are the upper 8 bits, used to encode the length of the segment, and O are the lower 24 bits, used to encode the offset of the string data in the character buffer, the new code encodes it as follows:

    OOOOOOOOOOOO LL

Furthermore, the most significant bit of the length LL is now used to mark whether the current string segment is a label boundary. This means a cell can't reference a segment longer than 127 characters. To work around this limitation when a segment is longer than 127 characters (a rare occurrence), the algorithm simply splits the segment into multiple adjacent cells.

As a result, there is no longer a need to encode "boundariness" into special cells, which simplifies both the storing and matching algorithms.

Additionally, added minimal documentation for the NPM package on how to import and use HNTrieContainer as a standalone API.
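To make the new cell layout concrete, here is a minimal sketch of the packing described in the commit message above: offset in the upper 24 bits, boundary flag in the most significant bit of the low byte, and length in the remaining 7 bits. The function names (`encodeSegment`, `decodeSegment`, `splitSegment`) are hypothetical and are not taken from the hntrie source. Putting the boundary flag only on the last cell of a split segment is also an assumption made for illustration.

```javascript
// Illustrative only: names and the split policy are assumptions,
// not the actual hntrie implementation.
const MAX_SEGMENT_LENGTH = 0x7F; // 7 bits of length => at most 127 characters
const BOUNDARY_BIT = 0x80;       // most significant bit of the low byte

// Pack one segment into a 32-bit cell: OOOOOO LL
function encodeSegment(offset, length, isBoundary) {
    if (length < 0 || length > MAX_SEGMENT_LENGTH) {
        throw new RangeError('segment length must fit in 7 bits');
    }
    if (offset < 0 || offset > 0xFFFFFF) {
        throw new RangeError('offset must fit in 24 bits');
    }
    // `>>> 0` converts the result back to an unsigned 32-bit value.
    return ((offset << 8) | (isBoundary ? BOUNDARY_BIT : 0) | length) >>> 0;
}

// Unpack a 32-bit cell into its three fields.
function decodeSegment(cell) {
    return {
        offset: cell >>> 8,
        length: cell & MAX_SEGMENT_LENGTH,
        isBoundary: (cell & BOUNDARY_BIT) !== 0,
    };
}

// Split a segment longer than 127 characters into adjacent cells.
// Assumption: only the final cell carries the boundary flag.
function splitSegment(offset, length, isBoundary) {
    const cells = [];
    let pos = offset;
    let remaining = length;
    while (remaining > MAX_SEGMENT_LENGTH) {
        cells.push(encodeSegment(pos, MAX_SEGMENT_LENGTH, false));
        pos += MAX_SEGMENT_LENGTH;
        remaining -= MAX_SEGMENT_LENGTH;
    }
    cells.push(encodeSegment(pos, remaining, isBoundary));
    return cells;
}
```

Under these assumptions, a 300-character segment starting at offset 10 would occupy three adjacent cells of lengths 127, 127 and 46, with no separate boundary cell needed.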
Showing 8 changed files with 370 additions and 249 deletions.