Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to lucene-9.5.0-snapshot-d19c3e2e0ed #92957

Merged
merged 20 commits into from
Jan 19, 2023

Conversation

javanna
Copy link
Member

@javanna javanna commented Jan 16, 2023

9.5 will include several changes related to vector search. An extensive list is available at https://github.com/apache/lucene/milestone/4 .

9.5 will include several changes related to vector search. An extensive list is available at https://github.com/apache/lucene/milestone/4 .
@javanna javanna added :Search/Search Search-related issues that do not fall into other categories >upgrade v8.7.0 labels Jan 16, 2023
@github-actions
Copy link
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jan 16, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@javanna
Copy link
Member Author

javanna commented Jan 16, 2023

@romseygeek there's a couple of changes that you may be familiar with: newSynonymQuery signature changed to take a String field argument, as well as visitDocument removed and replaced by document. Could you double check that my changes make sense?

@benwtrent could you double check the vector search related changes? Is there more?

@iverase
Copy link
Contributor

iverase commented Jan 16, 2023

Hello @javanna,

Thanks for working on this. With this new snapshot you will need to remove the following test:

and adjust the loop in the test above. Thanks!

@jpountz
Copy link
Contributor

jpountz commented Jan 16, 2023

In order to reduce noise on nightly investigations, I'd be tempted to disable the Panama-based MMapDirectory at first and enable it in a few days in order to be able to distinguish the impact from this directory (which could affect every single query since it touches the Directory) from other changes by adding -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false to the list of JVM options (see apache/lucene#12062). CC @ChrisHegarty

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We will want to move things over to using the non-deprecated storedFields() methods at some point but I don't think that's necessary immediately, and it may require a bit of re-working in any case.

@ChrisHegarty
Copy link
Contributor

In order to reduce noise on nightly investigations, I'd be tempted to disable the Panama-based MMapDirectory at first and enable it in a few days in order to be able to distinguish the impact from this directory (which could affect every single query since it touches the Directory) from other changes by adding -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false to the list of JVM options (see apache/lucene#12062). CC @ChrisHegarty

@jpountz That seems reasonable. We want to eventually use the new Panama implementation, but as a first step and to better isolate the changes in the upgrade, disabling through the system property initially seems reasonable. We can then remove the system property a few days later (so as to then enable the Panama implementation).

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vector changes are much more broad than this. Everywhere we call getVectorValues needs to handle the getByteVectorValues case in addition to the changes I mentioned in the review.

public void testUnknownField() throws IOException {
// Test isn't relevant, since query is never parsed from xContent
DenseVectorFieldMapper.ElementType elementType() {
return DenseVectorFieldMapper.ElementType.FLOAT;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@javanna
Copy link
Member Author

javanna commented Jan 17, 2023

@romseygeek could you check the HighlighterWithAnalyzersTests » testPhrasePrefix failure? Does it look familiar?

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the changes on the H3 library

@benwtrent benwtrent self-assigned this Jan 18, 2023
Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@romseygeek
Copy link
Contributor

The highlighting failure is due to the fact that we don't rewrite queries in the unified highlighter any more, and MatchPhrasePrefixQuery in ES doesn't implement QueryVisitor and was relying on the rewrite to work properly. It's only an issue in ES. I'm working on a fix.

@javanna
Copy link
Member Author

javanna commented Jan 19, 2023

Thanks all for the collaboration, tests are green, we are good to go!

@javanna javanna merged commit edd7749 into elastic:main Jan 19, 2023
@javanna javanna deleted the enhancement/lucene_9_5_snapshot branch January 19, 2023 13:07
javanna added a commit to javanna/elasticsearch that referenced this pull request Jan 25, 2023
When we upgraded to lucene 9.5 (snapshot) with elastic#92957 we initially disable panama-based
mmap directory through a system property. With this commit we remove the system property
and enable java 19 memory segments by default (based on apache/lucene#12033)
javanna added a commit that referenced this pull request Jan 25, 2023
When we upgraded to lucene 9.5 (snapshot) with #92957 we initially disable panama-based
mmap directory through a system property. With this commit we remove the system property
and enable java 19 memory segments by default (based on apache/lucene#12033)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team >upgrade v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants