
vector API integration, plan B #12302

Closed
rmuir opened this issue May 17, 2023 · 31 comments · Fixed by #12311

Comments

@rmuir
Member

rmuir commented May 17, 2023

Description

For years we have explored using the vector api to actually take advantage of SIMD units on the hardware. A couple of approaches have been attempted so far:

  1. try to coerce the hotspot superword autovectorization into giving better results: I think @jpountz may know the details; it is currently limited to postings-list decode, attempting to use 64-bit longs. It is better than nothing, but not good enough, especially for stuff like vector encoding and other bottlenecks.
  2. vectorize code and hope that the openjdk feature will "graduate" soon. This is not happening. Java is dying and becoming the next COBOL, in my opinion, as a result of ignoring the problem and delaying until "perfection". For evidence, simply look at the vector api being in its 6th incubation with really no api changes happening, just waiting on other features (some wet dream Project Valhalla or whatever) which will probably never land before we retire.
  3. hackedy-häcks: stuff to bypass the problem, such as the prototype I wrote in frustration two years ago: LUCENE-9838: simd version of VectorUtil.dotProduct #18. It is dirty and insecure, but it demonstrates we can potentially make stuff easier on the user and take advantage of the hardware.

Currently, unless the user is extremely technical, they can't make use of the vector support their hardware has, which is terribly sad. If they have immense resources/funding/etc, they can fork lucene and patch the source code, and maintain a fork, hooking in incubator openjdk stuff, but that's too hard on users.

I think we have to draw a line in the sand: basically, we cannot rely upon openjdk to be managed as a performant project; their decisions make no sense. We have to play a little less nice and apply some hacks! Otherwise, give up and switch to a different programming language with better perf!

So I'd suggest to look at the work @uschindler has done with mmap and the preview apis, and let's carve out a path where we use the vector api IFF the user opts in via the command-line.

Proposal (depends entirely upon user's jdk version and supported flags):

  1. user does nothing and runs lucene without special flags: they get a warning message logged to the console (once!) telling them they need to add some stuff to the command line for best performance, something such as: "vector falling back to scalar implementation: please add --add-modules .x.y.z...."
  2. user supplies that command-line argument, and lucene is faster and uses the correct incubating vector api associated with their jdk version.
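The opt-in detection in step 1 could be sketched roughly like this. The class name and warning text here are hypothetical, not Lucene code; the only real API used is `ModuleLayer`, which lets a library detect whether `jdk.incubator.vector` was resolved into the boot layer via `--add-modules`:

```java
public class VectorModuleCheck {
    // Hypothetical helper, not actual Lucene code: incubator modules are not
    // resolved into the boot layer by default, so presence of the module is a
    // reliable signal that the user passed --add-modules jdk.incubator.vector.
    public static boolean vectorModulePresent() {
        return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    }

    public static void main(String[] args) {
        if (vectorModulePresent()) {
            System.out.println("vector module present: vectorized implementation can be used");
        } else {
            // in the real proposal this would be logged once; message text is illustrative
            System.out.println(
                "vector falling back to scalar implementation: "
                    + "add --add-modules jdk.incubator.vector for best performance");
        }
    }
}
```

Run without flags, this prints the fallback warning; run with `java --add-modules jdk.incubator.vector ...`, it reports the module as present.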

Actually, I think the system @uschindler developed is the correct design for this; the only trick is that the incubating api is more difficult than the preview api. So we need more build-system support; it could require more stuff to be downloaded or make the build slower. But I think it's the right decision?

We don't want to have base64-encoded hackiness that is hard to maintain; at the same time, we need to give users the option to opt in to actually making use of their hardware. I think we should suffer the complexity to make this easy on them. It fucking sucks that openjdk makes this almost impossible, but we need to do it for our users. That's what being a library is all about.

@rmuir
Member Author

rmuir commented May 17, 2023

it goes without saying, but I think there are a couple of obvious hotspots where we'd want to integrate. I realize containing the complexity may be difficult:

  1. the VectorUtil methods (dotProduct etc). These are the simplest and most contained, and the obvious way to prove the approach.
  2. the postings list FOR/PFOR decode.
  3. possibly the docvalues decode.
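For context on item 1, the sketch below shows the scalar dot-product shape (illustrative, not the actual VectorUtil source) that a Panama implementation would replace; the commented lines sketch the equivalent `jdk.incubator.vector` loop using the real `FloatVector` API, left as comments since it only compiles with `--add-modules jdk.incubator.vector`:

```java
public class ScalarDot {
    // Illustrative scalar kernel, not the actual Lucene VectorUtil source.
    // A Panama version would process SPECIES.length() lanes per iteration,
    // roughly (jdk.incubator.vector, requires --add-modules):
    //   static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
    //   FloatVector acc = FloatVector.zero(SPECIES);
    //   int i = 0;
    //   for (; i < SPECIES.loopBound(a.length); i += SPECIES.length()) {
    //     acc = FloatVector.fromArray(SPECIES, a, i)
    //               .fma(FloatVector.fromArray(SPECIES, b, i), acc);
    //   }
    //   float res = acc.reduceLanes(VectorOperators.ADD);
    //   for (; i < a.length; i++) res += a[i] * b[i];  // scalar tail
    public static float dotProduct(float[] a, float[] b) {
        float res = 0f;
        for (int i = 0; i < a.length; i++) {
            res += a[i] * b[i];
        }
        return res;
    }
}
```

The scalar tail loop in the commented sketch handles lengths that are not a multiple of the vector width, which is why this kernel is so self-contained and a good first target.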

@uschindler
Contributor

Hi @rmuir,
I agree with your proposals. #12219 looks like a first start: making the implementations "pluggable" allows users to have a custom implementation as a separate JAR file / module. This could also be a solution to add an implementation JAR for each Java version as a separate Lucene module. For compiling them, we can use exactly the same approach as for MMapDirectory. Of course, we can also include the implementation classes in the main JAR file.

About the compilation:

  • yes, we can extract APIJAR stubs for the vector API. I would use separate APIJAR stubs for this, especially as it is a different Java module. The reason is how the compilation with module patching works in my code.
  • actually, compiling might be simpler with the APIJAR stubs for incubator modules: they live in a private module hidden by default in the JDK, so the signatures are not visible unless you explicitly add the module to the compiler. Here we can use a very simple approach: just compile the vector classes against the incubator classes without any module patching, adding the APIJAR as a normal dependency. As the package names are unique, there's no problem! This is the main reason why these APIJAR files need to be separate from the foreign ones.
  • we can use the same approach as for MMAP at runtime: use the MR-JAR mechanism to separate the classes and hide them from stupid IDEs (actually the current approach would not need an MR-JAR at all; it is just used to "hide" the Java 19, 20, 21 classes from stupid IDEs). Of course the user has to pass --add-modules explicitly. I'd not add the module to module-info, as this would cause a warning for ALL end users.
  • at a later stage when the vector API goes into preview phase, we can merge all this with the current MMAP code.

@rmuir
Member Author

rmuir commented May 17, 2023

Of course we can also include the implementation classes into the main JAR file.

To be clear this is what I propose, not making things pluggable. That's not related and not what I want. I want it to just work, similar to mmap.

Also we could use the approach for the postings and possibly docvalues decode.

@msokolov
Contributor

+1 to this - I would be grateful to you all if you are able to get this working. Don't really want to have to die with a sword in my hand to get to the promised land of vector API.

@rmuir
Member Author

rmuir commented May 17, 2023

I am at a conference this week, but I will try to make a prototype building upon the work Uwe has done, for example using the APIJAR technique to prevent having to juggle N JVMs on every build/developer machine. This means we can maintain .java files, hopefully semi-normally, not base64'd bytecode.

I realize if we do this, it will make the gradle build uglier and less elegant. I think there's no way around it; we are basically re-implementing MR-JAR. We just have to try to contain the complexity as much as we can... but I really feel this is just something we need to do on behalf of our users.

@uschindler
Contributor

I realize if we do this, it will make the gradle build uglier and less elegant. I think there's no way around it; we are basically re-implementing MR-JAR. We just have to try to contain the complexity as much as we can... but I really feel this is just something we need to do on behalf of our users.

Actually, when I did this I made it very self-contained. It is just a loop over Java versions and a modified "jar" task copying gradle outputs to a different prefix dir in the JAR and setting the manifest bit.

When we add vectors into the game, we may need a bit more separation with build dirs / source sets. We currently have "main19", "main20", "main21", but maybe the names should in future include the type (panama, vector) so we can compile separately.

I am ready to help you, just share when you have a branch.

@uschindler
Contributor

This does not look too complex: https://github.com/apache/lucene/blob/main/gradle/java/memorysegment-mrjar.gradle

@ChrisHegarty
Contributor

ChrisHegarty commented May 17, 2023

I agree with the high-level idea - use the Incubating Vector API internally in Lucene, in a way that is opt-in and also constrained by module/private-SPI/loading techniques (so as to reduce the static usage to only a small defined set of code). We would very much like to use this in Elasticsearch.

I am ready to help you, just share when you have a branch.

Same.

Related, but not directly proposed. Unfortunately (or not), the incubating Vector API must be loaded as part of the boot layer - it must be added by the command line --add-modules flag. Loading into a separate loader is not possible because of the qualified exports from java.base.

@ChrisHegarty
Contributor

ChrisHegarty commented May 17, 2023

Correction.

The incubating Vector API must be loaded as part of the boot layer - it must be added by the command line --add-modules flag OR explicitly required by the consuming module. Loading into a separate loader is not possible because of the qualified exports from java.base.

@dweiss
Contributor

dweiss commented May 17, 2023

I like this too, especially the part that it basically auto-wires based on whether you enable the corresponding jvm feature on cmd line.

@nknize
Member

nknize commented May 17, 2023

2. Java is dying and becoming the next COBOL

😆

let's carve out a path where we use the vector api IFF the user opts in via the command-line.

YES!!! This is timely and the heart of my comment on the proposal to increase KNN dimensionality (is it time we explore a new optional Lucene Vector module that supports cutting edge JDK features through gradle tooling for optimizing the vector use case?).

Actually the system @uschindler developed I think is the correct design for this, .... we need more build system support, it could require more stuff to be downloaded or build to be slower. But I think its the right decision?

💯 agree @uschindler low friction to enable preview is exactly the mechanism I was thinking about in my post. I'm not a gradle expert (ping @markrmiller ) but I think gradle's java toolchain allows for this type of scenario? We did it for enabling preview features on OpenSearch, as did Elasticsearch so why can't we use this path in the upstream lucene project? Thank you for raising this proposal @rmuir

...and lucene is faster and uses correct incubating vector api

One quick question, though. What does this look like in practice? Do we create separate classes in sandbox to keep this isolated? Are you suggesting introducing a new experimental vector module with encapsulated vector logic? Or are you suggesting sprinkling if (runtimeVersion >= 19 && runtimeVersion <= 21) logic around the existing vector implementations (e.g., VectorUtil and KNNVector stuff)? Or a mix of the above? Or is that what we're here to brainstorm?

@uschindler
Contributor

Correction.

The incubating Vector API must be loaded as part of the boot layer - it must be added by the command line --add-modules flag OR explicitly required by the consuming module. Loading into a separate loader is not possible because of the qualified exports from java.base.

My idea was just about compiling. As said before you have to enable it via command line.

@uschindler
Contributor

  1. Java is dying and becoming the next COBOL

😆

let's carve out a path where we use the vector api IFF the user opts in via the command-line.

YES!!! This is timely and the heart of my comment on the proposal to increase KNN dimensionality (is it time we explore a new optional Lucene Vector module that supports cutting edge JDK features through gradle tooling for optimizing the vector use case?).

Actually the system @uschindler developed I think is the correct design for this, .... we need more build system support, it could require more stuff to be downloaded or build to be slower. But I think its the right decision?

💯 agree @uschindler low friction to enable preview is exactly the mechanism I was thinking about in my post. I'm not a gradle expert (ping @markrmiller ) but I think gradle's java toolchain allows for this type of scenario? We did it for enabling preview features on OpenSearch, as did Elasticsearch so why can't we use this path in the upstream lucene project? Thank you for raising this proposal @rmuir

...and lucene is faster and uses correct incubating vector api

One quick question, though. What does this look like in practice? Do we create separate classes in sandbox to keep this isolated? Are you suggesting introducing a new experimental vector module with encapsulated vector logic? Or are you suggesting sprinkling if (runtimeVersion >= 19 && runtimeVersion <= 21) logic around the existing vector implementations (e.g., VectorUtil and KNNVector stuff)? Or a mix of the above? Or is that what we're here to brainstorm?

The compilation problem is already solved. No need for toolchains; we tried that already, and it's a mess, especially because it does not support EA releases. Read the code referred to above for memory mapping. It works identically for vectors and the incubator. We compile against stubs.

@contrebande-labs

Is there room for a helping hand here? Tests, benchmarks, continuous integration, documentation, anything?

@rmuir
Member Author

rmuir commented May 18, 2023

yes, as nothing has been done yet :) I personally plan to start a branch if nobody beats me to it, but I'm stuck at a conference and without bandwidth this week.

You can get an idea of what it takes to add support for every JDK release here using Uwe's mmap setup: #12294

It is for a preview api, but I think most of the logic applies; maybe the only difference here is at "read time", where we may have to load the class differently and use a MethodHandle guard (like the crappy prototype I posted in the description). But we can maintain .java files and not use base64! :)
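The "MethodHandle guard" idea could look roughly like this sketch: resolve a vectorized implementation class reflectively only when the incubator module is present, otherwise bind a scalar fallback, so the main jar has no static dependency on jdk.incubator.vector. The implementation class name below is hypothetical; only the java.lang.invoke and ModuleLayer machinery is real:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DotProductDispatch {
    // Resolved once at class load: either the Panama implementation or the scalar fallback.
    private static final MethodHandle DOT = lookupImpl();

    private static MethodHandle lookupImpl() {
        MethodType type = MethodType.methodType(float.class, float[].class, float[].class);
        if (ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent()) {
            try {
                // Hypothetical class name for illustration; compiled separately
                // against the incubator stubs and shipped via the MR-JAR trick.
                Class<?> impl = Class.forName("org.apache.lucene.internal.PanamaVectorUtil");
                return MethodHandles.publicLookup().findStatic(impl, "dotProduct", type);
            } catch (ReflectiveOperationException e) {
                // implementation missing or incompatible: fall through to scalar
            }
        }
        try {
            return MethodHandles.lookup().findStatic(DotProductDispatch.class, "scalarDot", type);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Scalar fallback, always available.
    static float scalarDot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    public static float dotProduct(float[] a, float[] b) {
        try {
            // exact invocation: both handles share the (float[], float[]) -> float type
            return (float) DOT.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Because the handle is a static final, hotspot can inline through it, so the guard itself costs nothing per call; that is the appeal over plain reflection.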

There are also some relevant special jenkins build jobs and stuff and yeah, we should be testing against EA releases if possible to stay current, just like Uwe does for mmap.

@ChrisHegarty
Contributor

To better understand the existing "non-final JDK API stub" mechanism, I quickly put together the small set of changes that we need to get started, which generates the Vector API stubs: #12311. We can use this, or not, but it's just a start. We'll need the same for JDK 21 EA, but that is trivial, and it's needed for Foreign too.

@uschindler The mechanism looks good. As you said, it's compile-time only. If we felt the need, we could choose to move the implementations (in org.apache.lucene.store) to a non-exported package, but that is not strictly needed - I'm just pondering if any runtime restrictions would be beneficial.

@ChrisHegarty
Contributor

ChrisHegarty commented May 18, 2023

Next I might look at refactoring the internals of VectorUtil, and have a different implementation for JDK 20, using a similar guard mechanism as is done for mmap.

@ChrisHegarty
Contributor

ChrisHegarty commented May 18, 2023

I updated #12311, with a basic dotProduct(float[], float[]). Maybe this is a reasonable place to start.

@ChrisHegarty
Contributor

ChrisHegarty commented May 18, 2023

BTW, and not remotely suggesting otherwise - I'm not trying to take over this work, just sketching out a few concrete things to help get it started. Whatever collaboration mechanism is typically used when developing such changes for Lucene should also be used here. How should we best work together on this? Where will the code/branch live?

@rmuir
Member Author

rmuir commented May 18, 2023

thank you for getting it started: it is great. Let's just iterate forward with your branch?

personally, I've never had an issue collaborating on a fork like this (such as ChrisHegarty:panama_vector). In the worst case, anyone can send PRs to you to make changes. I think if you are a committer to lucene you can probably push directly to the branch without a PR.

you can always create a panama_vector branch under github.com/apache/lucene if you prefer that.

@uschindler
Contributor

By default, all Lucene committers can commit to your branch. This is enabled by default, and you had to agree to it when creating the PR.

@ChrisHegarty
Contributor

Ok, cool. Committers please commit directly. Consider the branch in my personal fork as our shared place for collaboration. Let me know if you encounter any issues.

@contrebande-labs

@ChrisHegarty said Ok, cool. Committers please commit directly. Consider the branch in my personal fork as our shared place for collaboration. Let me know if you encounter any issues.

I'm not a committer. Should I fork your fork and do PRs on it? And is this the branch I should base my work on? I also think that PRs are good for collaboration. But if someone can maintain a small, workable bounty list somewhere, I think that'd be great! Maybe we can do a voice call to gather and break down all the ideas? Or you can submit your ideas here and I can try to maintain the bounty list myself in a GitHub Kanban/Project on my own fork....

@rmuir said: You can get an idea of what it takes to add support for every JDK release here using Uwe's mmap setup

So, it seems like official support for preview/incubating/experimental/jvm-sensitive features will go back to two LTSes ago? Therefore, when Java 21 is released, you will maintain experimental extensions available since Java 17. Is this correct? If not, when are you planning to drop support for 19 and 20 ? And what will be the policy in the future?

@rmuir
Member Author

rmuir commented May 18, 2023

I don't think there is any official plan or anything here. Sorry I don't really have any answers.

I dont think there is any rule on which jdk versions are supported. For mmap, the ones supported are the ones that @uschindler supports, he is shouldering all of the maintenance work.

Personally I think, for this one as a start, since it is an incubating module, it is acceptable to only support the very latest jdk release... If someone wants to use the bleeding edge, then they must use the bleeding edge. Let's start minimal and try to reduce our maintenance costs.

@rmuir
Member Author

rmuir commented May 18, 2023

I'm not a committer. Should I fork your fork and do PRs on it? And is this the branch I should base my work on?

It is the correct branch. I would just send @ChrisHegarty a pull request if you have contributions. I am a committer but might do the same thing myself if I have some time to hack on it soon. Pull requests are nice since we can review each other's changes.

@rmuir
Member Author

rmuir commented May 18, 2023

Sorry for the chaotic typos/answers, on a phone.

@contrebande-labs

From what I can see, the classes/files modified so far are:

Would it make sense for me to port and maintain these changes to Java 17, Java 18, Java 19 .. and Java 21, based on @ChrisHegarty's original Java 20 code? Also, should I start a panama-specific CI stack on my GitHub account with unit tests and benchmarks (I could write these too)? Whatever you guys think is the priority...

@uschindler
Contributor

uschindler commented May 19, 2023

I don't understand what your intention is. The general setup is there; no infrastructure work is needed anymore. The decision about which variants of the code live in parallel is unrelated to this PR.

Please keep in mind that not all code in Lucene can/may use those APIs at the moment, because vector-specific APIs can't be part of Lucene's public API. So the pattern for other places would be similar to this one: implement all algorithms behind a simple API in the provider and call it from VectorUtil. More complex integrations like postings lists may need more thought. Some of them may only be embedded in Lucene 11 (things like directly multiplying vectors residing in off-heap memory segments need API changes using not-yet-public APIs).

The amount of work for keeping support for various java versions depends on the number of implementations in the provider and how complex the glue code to lucene is. The current vector dot product is so simple, you could duplicate the code to support various versions easily within minutes.

But for now, let's start with Java 20. The vector code in Java 17 is outdated and has several performance traps. Memory mapping using segments was added to Lucene in Java 19, so let's not go back before that point in time.

As Robert said: in contrast to mmap, which is a preview API and FULLY supported by Lucene, the incubator code requires users to add command line flags and the APIs are still in flux, so I would only keep rather recent versions.

We do not need to make any guarantee to which versions are supported. If a specific version falls out of our support matrix, the code will fall back to the implementation that's already there. You can be sure that we will make no official guarantees. It may change with every Lucene release. It will not break for anybody, just get slower, so no support guarantees are needed. We are an open source project, not a commercial project!

@ChrisHegarty
Contributor

++ to all of what @uschindler and @rmuir said relating to which JDK versions we add support for. Specifically, let's start with JDK 20 only. After which we can prepare for, and test with, JDK 21 EA.

@contrebande-labs

Guys, relax. My intention is to HELP. I was trying to find something that can be done in parallel to what @ChrisHegarty is doing so I don't step on his toes. If you want help, let me know. And be specific about it.

@kwatters
Contributor

For what it's worth, I did an alternative impl of vector utils using nd4j to compute cosine similarity... and, shockingly, it was not any faster. I am very much in favor of having interfaces here so we can experiment with native implementations...
