-
Notifications
You must be signed in to change notification settings - Fork 682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bleve v2.0.0 proposal PR #1494
bleve v2.0.0 proposal PR #1494
Conversation
Keep in mind this is a large change, so please review carefully. I attempted to clean it up as best I could, but in a change this large sometimes things are missed. |
Links to new repos:
NOTE: we should finalize v1.0.0 of the api interfaces, and update accordingly before merging... |
So, one of the wrinkles you'll see upon closer review is that we have a few places where we type assert that a However, one of my additional refactors I'd like to consider is moving |
OK, now looks like moving Instead I'll focus on trying to remove any |
review comments from Sreekanth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly OK to me. Couple things though -
- I'm guessing you'll set up release tag(s) for bleve_index_api and scorch_segment_api - think we should keep their go versions at 1.13 as well (rather than 1.15).
- Is there a reason why you included go.sum into this change, I don't remember your reason for not committing it in earlier?
review comments
I expect once these interfaces packages are finalized, they will be tagged v1.0.0. What reason is there for them to be something else?
Yes this is intentional. Best practices is to include go.sum. In the past we did not because our go.mod was committed after manual edits (forward declarations of module versions that did not yet exist). Therefore everyone's go.sum would change after doing a basic build locally, which meant every PR was going to propose a change to go.sum. I wished to avoid that. However, now that that problem has gone away, we should adopt the best practices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me.
Logic inside segment implementations has been moved up the stack into scorch, so that fewer methods are required in the API. - Iterator() becomes AutomatonIterator(nil, nil, nil) - RangeIterator(start, end) now computes an end key compatible with traditional inclusive start, exclusive end keys. Then becomes: AutomatonIterator(nil, start, endExclusive) - PrefixIterator(prefix) now computes an end key and becomes: AutomatonIterator(nil, prefix, prefixEnd) NumericRangeSearcher updated to no longer require the IndexReaderOnly interface. This interface was only used for non-scorch indexes, but only scorch actually implemented it. Removing this also meant we can remove the OnlyIterator method from the segment dictionary API.
this code was originally placed in the segment package, however it is only used by scorch, and makes more sense to be placed at this level
the empty postings iterator is now only used by scorch's optimize.go, so it is relocated here the uvarint ascending encode/decode is only used by scorch, so it too is moved inside this package
the unadorned postings lists are only used by scorch so they are relocated to this package
note: while doing this it was observed that 'go mod tidy' was failing with an error. this relates to the kvstores in the blevex package which will needed to be updated separately. this change addresses this by removing the config_* and upsidedown/benchmark_* files which caused the problem. this does not remove support for these kvstores, just the build tag support.
also fixup some places we missed during refactor
* move the plugin interface into scorch * rename scorch Plugin to SegmentPlugin and related functions * update to make all analysis take place in func() (#1508) * update to make all analysis take place in func() by making analysis take place inside a func() a closure can be used to encapsulate all the index specific details. this allows the index API to not know about scorch vs upsidedown analysis structures * rewrite variable declaration for clarity also update comment, per review feedback * update to latest versions selects latest versions compatible with these changes
* update Advanced() method at the top-level of bleve, Advanced no longer knows about K/V stores, instead it only returns reference to the internal index implementation. this will allow us to remove the Advanced() method from the internal index implementation. NOTE: users who previously needed to access the underlying K/V store, will still be able to do so, but must first type assert that the internal index impl is *UpsideDownCouch and if it is, they can use that Advanced() method directly * relocate store pkg inside upsidedown (#1510) * relocate store pkg inside upsidedown since kvstores are a concept unique to the upsidedown index implementation, the store folder should be relocated inside of that package * remove Advanced() method from scorch (#1511) * remove Advanced() method from scorch after the API is updated, scorch no longer requires this useless method to satisfy the interface * update to latest apis
also, place shared kvstore test code in this module, not in upsidedown (avoids circular deps)
previous the FieldCache type was in the index package, but it is only used by upsidedown, so we relocate it there.
this removes the top-level bleve err ErrUnknownIndexType instead, now the upsidedown specific error is returned. users who actually care and are checking for this error type still can, only now it is defined in the upsidedown package.
this interface is only used by the searcher package, thus it makes sense to define it there
the dump methods were only ever supposed by upsidedown now, users of this method must type-assert that their reader is from the upsidedown package, where these methods remain, unchanged.
these cause circular dependencies and are easy for applications to do themselves
* use the moved indexing options * update analysis.TokenFrequency to use options (#1522) * update analysis.TokenFrequency to use options previously it used boolean, and future changes would require additional booleans. using options directly keeps the function signatures cleaner. * update to latest apis
* update to work with removed items * update to latest api
instead use the interface exposed by index api
* update for latest renaming * update to latest apis
one example required updating as it's custom sort contained a tie (both fields with age 0). upsidedown always returned the results in a particular order, which was not guaranteed by scorch. to address this, the exmaple has been updated to include the doc id as an additional sort criteria (ensuring the original order expected is kept by scorch)
* Adding configurable freq/norm processing Useful for use cases where score relevancy isn't needed for the search requests. A special frequency value of zero/0 is used for skipping the freq/norm. * update to work with removed items (#1524) * update to work with removed items * update to latest api * do not type assert on document or field (#1525) instead use the interface exposed by index api * update for latest renaming (#1527) * update for latest renaming * update to latest apis * switch default index to scorch with latest zap (#1528) one example required updating as it's custom sort contained a tie (both fields with age 0). upsidedown always returned the results in a particular order, which was not guaranteed by scorch. to address this, the exmaple has been updated to include the doc id as an additional sort criteria (ensuring the original order expected is kept by scorch) * update to release candidate apis (#1531) Co-authored-by: Marty Schoch <marty.schoch@gmail.com>
it appears that while moving files around I mistakenly changed a copyright header, which was not my intention
Right now `MatchOperator` constants are ints and when you're trying to assign them to a variable you get an int. If you try to use those variables as operator you get the following error: ``` cannot use operator (type int) as type MatchQueryOperator in argument to q.SetOperator ``` So, to fix this problem we just need to create `MatchOperator` constants
* update to v1 apis and zapx versions * update to v1.0.1 upsidedown_store_api * make sort integration tests more precise the upsidedown implementation would break tie sort order in a predictable way, and some tests inadvertely relied on this behavior. these tests have been adjusted to use a more specific sort order, breaking all ties, such that the expected results no longer depend on implementation details.
@abhinavdangeti @sreekanth-cb last call before we merge to master, requesting final re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, rebase.squash.merge away.
This PR currently includes the bare minimum set of changes to avoid circular dependencies with zap.
This is accomplished by refactoring the
blevesearch/bleve/index
andblevesearch/index/scorch/segment
packages out into a separate modulesblevesearch/bleve_index_api
andblevesearch/scorch_segment_api
respectively.Further, a new module
blevesearch/zapx
is introduced. This repository is a duplicate ofblevesearch/zap
up to today, with all major version branches (v11.x, v12.x, v13.x, v14.x and master) updated to no longer depend on bleve. This solution was chose we intend for the major version number to mostly strongly indicate file format compatibility. Compatibility with major version of bleve (or lack of such a dependency) is therefore another dimension, and we felt most clearly represented with a different package name.Along the way a few other minor changes were required, in all such cases the focus was to keep these as small as possible.