
Integrate the Incubating Panama Vector API #12311

Merged
59 commits merged on May 25, 2023

Conversation

ChrisHegarty
Contributor

@ChrisHegarty ChrisHegarty commented May 18, 2023

Leverage accelerated vector hardware instructions in Vector Search.

Lucene already has a mechanism that enables the use of non-final JDK APIs, currently used for the Preview Panama Foreign API. This change expands this mechanism to include the Incubating Panama Vector API. When the jdk.incubator.vector module is present at run time, the Panama-ized version of the low-level primitives used by Vector Search is enabled. If not present, the default scalar version of these low-level primitives is used (as it was previously).

Currently, we're only targeting support for JDK 20. A subsequent PR should evaluate JDK 21, which is still in development.
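Roughly, the lookup works like the following sketch (illustrative only, not the exact code in this PR; the provider interface and the Panama class name here are assumptions): a Panama-backed implementation is chosen only when running on JDK 20 with jdk.incubator.vector present, otherwise the scalar default is used.

```java
public final class VectorProviderLookupSketch {

  /** Minimal provider interface used only for this illustration. */
  interface VectorUtilProvider {
    float dotProduct(float[] a, float[] b);
  }

  /** Pick the Panama-backed provider only when it can actually be used. */
  static VectorUtilProvider lookup() {
    if (Runtime.version().feature() == 20
        && ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent()) {
      // Loaded reflectively so this class has no compile-time dependency on the
      // incubating module; the class name here is hypothetical.
      try {
        return (VectorUtilProvider)
            Class.forName("org.example.PanamaVectorUtilProvider")
                .getConstructor()
                .newInstance();
      } catch (ReflectiveOperationException e) {
        // fall through to the scalar implementation
      }
    }
    // Default scalar implementation, as before.
    return (a, b) -> {
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
      }
      return sum;
    };
  }
}
```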

Member

@rmuir rmuir left a comment


When I looked at this before, this was my impl:
https://issues.apache.org/jira/secure/attachment/13022378/LUCENE-9838_standalone.patch

I did a lot of JMH benchmarking at varying vector sizes/hardware to make sure there aren't crazy regressions; here is my benchmark in case it is useful: DotProduct.java.txt

Basically, I can explain a few differences:

  • Let's avoid fma() completely. It can make the whole thing dog slow even in "normal"/"typical" environments. fma is frequently unavailable in production-type environments (e.g. VMs) and OpenJDK DOES NOT fall back gracefully; it falls back horribly slowly. See Ban use of Math.fma across the entire codebase #12014 for more details. And it's definitely not good to use fma() in the vector impl but then inconsistently use */+ in the tail.
  • We have to manually unroll at least a little for performance; Java doesn't do it. I recommend just doing it twice like my patch.

edit: attached my old jmh benchmark as a text file here
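For context, a minimal sketch of a Panama dot product along the lines of the two points above: no fma(), the main loop unrolled twice by hand, and a scalar tail that sticks to plain * and +. This is only an illustration of the approach (assuming the incubating API and --add-modules jdk.incubator.vector), not the attached patch itself.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Illustration only; requires: --add-modules jdk.incubator.vector
public final class DotProductSketch {

  private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

  public static float dotProduct(float[] a, float[] b) {
    final int laneCount = SPECIES.length();
    float result = 0f;
    int i = 0;
    if (a.length >= 2 * laneCount) {
      // Two independent accumulators: the manual 2x unroll discussed above.
      FloatVector acc1 = FloatVector.zero(SPECIES);
      FloatVector acc2 = FloatVector.zero(SPECIES);
      final int bound = SPECIES.loopBound(a.length) - laneCount;
      for (; i < bound; i += 2 * laneCount) {
        FloatVector va1 = FloatVector.fromArray(SPECIES, a, i);
        FloatVector vb1 = FloatVector.fromArray(SPECIES, b, i);
        acc1 = acc1.add(va1.mul(vb1)); // mul + add, deliberately not fma
        FloatVector va2 = FloatVector.fromArray(SPECIES, a, i + laneCount);
        FloatVector vb2 = FloatVector.fromArray(SPECIES, b, i + laneCount);
        acc2 = acc2.add(va2.mul(vb2));
      }
      result = acc1.add(acc2).reduceLanes(VectorOperators.ADD);
    }
    // Scalar tail, consistent with the vector body: plain * and +.
    for (; i < a.length; i++) {
      result += a[i] * b[i];
    }
    return result;
  }
}
```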

@ChrisHegarty
Contributor Author

Thanks @rmuir - I just merged in your implementation. I think it's a much better starting place (if not the final one), and a reasonable minimal point to start from.

Before digging too deeply into the performance and optimising the code, I guess I just want to understand if this is the right level to be plugging in at.

@uschindler
Contributor

I'd prefer to have separate apijars, because the current code compiles by patching the base module.

I'd like to separate this. But as a start it is ok.

@uschindler
Contributor

On the other hand: it just works! 😉

@uschindler
Contributor

Is the Vector API's FMA also always slow (does it use BigDecimal, too)?

@ChrisHegarty
Contributor Author

I'd prefer to have separate apijars, because the current code compiles by patching the base module.
On the other hand: it just works! 😉

Yeah, this is a bit of a hack! It would be better to separate these out, but then what would we do: still patch it into java.base, or build a slim module around it, or something? It doesn't feel any better than just patching it all into java.base!

@ChrisHegarty
Contributor Author

Is the Vector API's FMA also always slow (does it use BigDecimal, too)?

I dunno what it does - I haven't looked - but I doubt it falls back to BigDecimal. I'll take a look and do some experiments.

@ChrisHegarty ChrisHegarty changed the title Include the Panama Vector API stubs in the generated 19/20 api jars Integrate the Incubating Panama Vector API May 18, 2023
@rmuir
Member

rmuir commented May 18, 2023

I really wish Math.fma fell back to sane behavior such as */+ and only StrictMath.fma did the slow BigDecimal stuff! Not good decision-making on these APIs.

@uschindler
Contributor

I'd prefer to have separate apijars, because the current code compiles by patching the base module.
On the other hand: it just works! 😉

Yeah, this is a bit of a hack! It would be better to separate these out, but then what would we do: still patch it into java.base, or build a slim module around it, or something? It doesn't feel any better than just patching it all into java.base!

Let's keep it as is. The whole compile is a hack; modules do not matter.

With a separate apijar we could also add it to the classpath, as the package names are not special at all.

@ChrisHegarty
Contributor Author

I refactored the provider and impls:

  1. To separate them out from VectorUtil - this should improve readability, etc., as we move beyond dotProduct.
  2. I also moved them into their own non-exported package.

I'm less sure about no. 2. The general thought was that the code might be more reusable from there, but now that I think about it, it might be better as package-private where it was, since the "interface" is through VectorUtil, not directly to the impl. Thoughts?

javanna added a commit to elastic/elasticsearch that referenced this pull request May 31, 2023
Most relevant changes:

- add api to allow concurrent query rewrite (GITHUB-11838 Add api to allow concurrent query rewrite apache/lucene#11840)
- knn query rewrite (Concurrent rewrite for KnnVectorQuery apache/lucene#12160)
- Integrate the incubating Panama Vector API (Integrate the Incubating Panama Vector API  apache/lucene#12311)

As part of this commit I moved the ES codebase off of overriding or relying on the deprecated rewrite(IndexReader) method in favour of using rewrite(IndexSearcher) instead. For score functions, I went for not breaking existing plugins and creating a new IndexSearcher whenever we rewrite a filter; otherwise we'd need to change the ScoreFunction#rewrite signature to take a searcher instead of a reader.

Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>
@mikemccand
Member

I ran @rmuir's vectorbench on a new Raptor Lake build (i9-13900K).

Note that this CPU does NOT seem to support AVX-512:

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs ept_mode_based_exec tsc_scaling usr_wait_pause

Benchy results:

Benchmark                                (size)   Mode  Cnt    Score    Error   Units
BinaryCosineBenchmark.cosineDistanceNew       1  thrpt    5  128.894 ±  0.021  ops/us
BinaryCosineBenchmark.cosineDistanceNew     128  thrpt    5   64.367 ±  0.062  ops/us
BinaryCosineBenchmark.cosineDistanceNew     207  thrpt    5   37.563 ±  0.030  ops/us
BinaryCosineBenchmark.cosineDistanceNew     256  thrpt    5   36.636 ±  0.004  ops/us
BinaryCosineBenchmark.cosineDistanceNew     300  thrpt    5   31.305 ±  0.012  ops/us
BinaryCosineBenchmark.cosineDistanceNew     512  thrpt    5   20.769 ±  0.008  ops/us
BinaryCosineBenchmark.cosineDistanceNew     702  thrpt    5   13.757 ±  0.017  ops/us
BinaryCosineBenchmark.cosineDistanceNew    1024  thrpt    5   10.201 ±  0.008  ops/us
BinaryCosineBenchmark.cosineDistanceOld       1  thrpt    5  128.889 ±  0.095  ops/us
BinaryCosineBenchmark.cosineDistanceOld     128  thrpt    5   14.177 ±  0.064  ops/us
BinaryCosineBenchmark.cosineDistanceOld     207  thrpt    5    8.967 ±  0.023  ops/us
BinaryCosineBenchmark.cosineDistanceOld     256  thrpt    5    7.295 ±  0.014  ops/us
BinaryCosineBenchmark.cosineDistanceOld     300  thrpt    5    6.215 ±  0.034  ops/us
BinaryCosineBenchmark.cosineDistanceOld     512  thrpt    5    3.709 ±  0.003  ops/us
BinaryCosineBenchmark.cosineDistanceOld     702  thrpt    5    2.714 ±  0.005  ops/us
BinaryCosineBenchmark.cosineDistanceOld    1024  thrpt    5    1.872 ±  0.001  ops/us
BinaryDotProductBenchmark.dotProductNew    1024  thrpt    5   23.562 ±  0.025  ops/us
BinaryDotProductBenchmark.dotProductOld    1024  thrpt    5    3.885 ±  0.082  ops/us
BinarySquareBenchmark.squareDistanceNew       1  thrpt    5  522.630 ±  4.081  ops/us
BinarySquareBenchmark.squareDistanceNew     128  thrpt    5  101.951 ±  4.427  ops/us
BinarySquareBenchmark.squareDistanceNew     207  thrpt    5   65.050 ±  0.254  ops/us
BinarySquareBenchmark.squareDistanceNew     256  thrpt    5   60.495 ±  0.922  ops/us
BinarySquareBenchmark.squareDistanceNew     300  thrpt    5   51.767 ±  0.042  ops/us
BinarySquareBenchmark.squareDistanceNew     512  thrpt    5   32.832 ±  0.022  ops/us
BinarySquareBenchmark.squareDistanceNew     702  thrpt    5   23.786 ±  0.018  ops/us
BinarySquareBenchmark.squareDistanceNew    1024  thrpt    5   17.062 ±  0.145  ops/us
BinarySquareBenchmark.squareDistanceOld       1  thrpt    5  529.882 ±  1.503  ops/us
BinarySquareBenchmark.squareDistanceOld     128  thrpt    5   32.478 ±  0.037  ops/us
BinarySquareBenchmark.squareDistanceOld     207  thrpt    5   20.901 ±  0.023  ops/us
BinarySquareBenchmark.squareDistanceOld     256  thrpt    5   16.644 ±  0.070  ops/us
BinarySquareBenchmark.squareDistanceOld     300  thrpt    5   14.502 ±  0.111  ops/us
BinarySquareBenchmark.squareDistanceOld     512  thrpt    5    8.703 ±  0.050  ops/us
BinarySquareBenchmark.squareDistanceOld     702  thrpt    5    6.473 ±  0.013  ops/us
BinarySquareBenchmark.squareDistanceOld    1024  thrpt    5    4.454 ±  0.016  ops/us
FloatCosineBenchmark.cosineNew                1  thrpt    5  395.222 ±  3.364  ops/us
FloatCosineBenchmark.cosineNew                4  thrpt    5  275.572 ±  2.528  ops/us
FloatCosineBenchmark.cosineNew                6  thrpt    5  217.377 ±  0.561  ops/us
FloatCosineBenchmark.cosineNew                8  thrpt    5  192.311 ±  1.492  ops/us
FloatCosineBenchmark.cosineNew               13  thrpt    5  143.959 ±  0.061  ops/us
FloatCosineBenchmark.cosineNew               16  thrpt    5  127.340 ±  0.181  ops/us
FloatCosineBenchmark.cosineNew               25  thrpt    5  116.471 ±  0.219  ops/us
FloatCosineBenchmark.cosineNew               32  thrpt    5  117.458 ±  0.031  ops/us
FloatCosineBenchmark.cosineNew               64  thrpt    5  100.845 ±  0.016  ops/us
FloatCosineBenchmark.cosineNew              100  thrpt    5   73.392 ±  0.129  ops/us
FloatCosineBenchmark.cosineNew              128  thrpt    5   79.363 ±  0.012  ops/us
FloatCosineBenchmark.cosineNew              207  thrpt    5   48.170 ±  0.027  ops/us
FloatCosineBenchmark.cosineNew              256  thrpt    5   43.298 ±  0.103  ops/us
FloatCosineBenchmark.cosineNew              300  thrpt    5   36.302 ±  0.017  ops/us
FloatCosineBenchmark.cosineNew              512  thrpt    5   26.113 ±  0.076  ops/us
FloatCosineBenchmark.cosineNew              702  thrpt    5   19.034 ±  0.008  ops/us
FloatCosineBenchmark.cosineNew             1024  thrpt    5   16.945 ±  0.052  ops/us
FloatCosineBenchmark.cosineOld                1  thrpt    5  398.987 ±  0.675  ops/us
FloatCosineBenchmark.cosineOld                4  thrpt    5  279.282 ±  4.223  ops/us
FloatCosineBenchmark.cosineOld                6  thrpt    5  220.884 ±  5.144  ops/us
FloatCosineBenchmark.cosineOld                8  thrpt    5  196.722 ±  0.389  ops/us
FloatCosineBenchmark.cosineOld               13  thrpt    5  146.701 ±  0.832  ops/us
FloatCosineBenchmark.cosineOld               16  thrpt    5  130.186 ±  0.280  ops/us
FloatCosineBenchmark.cosineOld               25  thrpt    5   87.526 ±  0.083  ops/us
FloatCosineBenchmark.cosineOld               32  thrpt    5   70.398 ±  0.124  ops/us
FloatCosineBenchmark.cosineOld               64  thrpt    5   35.020 ±  0.007  ops/us
FloatCosineBenchmark.cosineOld              100  thrpt    5   21.121 ±  0.009  ops/us
FloatCosineBenchmark.cosineOld              128  thrpt    5   16.276 ±  0.008  ops/us
FloatCosineBenchmark.cosineOld              207  thrpt    5   10.017 ±  0.002  ops/us
FloatCosineBenchmark.cosineOld              256  thrpt    5    8.085 ±  0.002  ops/us
FloatCosineBenchmark.cosineOld              300  thrpt    5    6.882 ±  0.001  ops/us
FloatCosineBenchmark.cosineOld              512  thrpt    5    3.981 ±  0.009  ops/us
FloatCosineBenchmark.cosineOld              702  thrpt    5    2.900 ±  0.008  ops/us
FloatCosineBenchmark.cosineOld             1024  thrpt    5    1.990 ±  0.001  ops/us
FloatDotProductBenchmark.dotProductNew        1  thrpt    5  482.634 ±  0.308  ops/us
FloatDotProductBenchmark.dotProductNew        4  thrpt    5  358.350 ±  0.814  ops/us
FloatDotProductBenchmark.dotProductNew        6  thrpt    5  299.456 ±  9.216  ops/us
FloatDotProductBenchmark.dotProductNew        8  thrpt    5  282.228 ±  0.560  ops/us
FloatDotProductBenchmark.dotProductNew       13  thrpt    5  237.520 ±  0.758  ops/us
FloatDotProductBenchmark.dotProductNew       16  thrpt    5  226.653 ±  0.598  ops/us
FloatDotProductBenchmark.dotProductNew       25  thrpt    5  203.128 ±  0.136  ops/us
FloatDotProductBenchmark.dotProductNew       32  thrpt    5  234.430 ±  2.885  ops/us
FloatDotProductBenchmark.dotProductNew       64  thrpt    5  199.576 ±  1.049  ops/us
FloatDotProductBenchmark.dotProductNew      100  thrpt    5  143.859 ±  0.351  ops/us
FloatDotProductBenchmark.dotProductNew      128  thrpt    5  163.681 ±  1.253  ops/us
FloatDotProductBenchmark.dotProductNew      207  thrpt    5   98.173 ±  1.241  ops/us
FloatDotProductBenchmark.dotProductNew      256  thrpt    5   95.487 ±  0.053  ops/us
FloatDotProductBenchmark.dotProductNew      300  thrpt    5   64.907 ±  0.056  ops/us
FloatDotProductBenchmark.dotProductNew      512  thrpt    5   62.076 ±  0.192  ops/us
FloatDotProductBenchmark.dotProductNew      702  thrpt    5   33.659 ±  0.053  ops/us
FloatDotProductBenchmark.dotProductNew     1024  thrpt    5   29.892 ±  0.188  ops/us
FloatDotProductBenchmark.dotProductOld        1  thrpt    5  558.982 ± 14.484  ops/us
FloatDotProductBenchmark.dotProductOld        4  thrpt    5  425.068 ±  3.916  ops/us
FloatDotProductBenchmark.dotProductOld        6  thrpt    5  398.557 ±  1.348  ops/us
FloatDotProductBenchmark.dotProductOld        8  thrpt    5  359.623 ±  0.203  ops/us
FloatDotProductBenchmark.dotProductOld       13  thrpt    5  262.966 ±  0.098  ops/us
FloatDotProductBenchmark.dotProductOld       16  thrpt    5  229.867 ±  0.083  ops/us
FloatDotProductBenchmark.dotProductOld       25  thrpt    5  165.441 ±  0.115  ops/us
FloatDotProductBenchmark.dotProductOld       32  thrpt    5  152.221 ±  0.138  ops/us
FloatDotProductBenchmark.dotProductOld       64  thrpt    5   85.443 ±  0.270  ops/us
FloatDotProductBenchmark.dotProductOld      100  thrpt    5   53.636 ±  0.020  ops/us
FloatDotProductBenchmark.dotProductOld      128  thrpt    5   42.828 ±  0.023  ops/us
FloatDotProductBenchmark.dotProductOld      207  thrpt    5   26.981 ±  0.055  ops/us
FloatDotProductBenchmark.dotProductOld      256  thrpt    5   21.944 ±  0.157  ops/us
FloatDotProductBenchmark.dotProductOld      300  thrpt    5   18.856 ±  0.011  ops/us
FloatDotProductBenchmark.dotProductOld      512  thrpt    5   11.330 ±  0.011  ops/us
FloatDotProductBenchmark.dotProductOld      702  thrpt    5    8.150 ±  0.006  ops/us
FloatDotProductBenchmark.dotProductOld     1024  thrpt    5    5.814 ±  0.006  ops/us
FloatSquareBenchmark.squareNew                1  thrpt    5  479.897 ±  0.414  ops/us
FloatSquareBenchmark.squareNew                4  thrpt    5  347.548 ±  5.824  ops/us
FloatSquareBenchmark.squareNew                6  thrpt    5  320.104 ±  3.070  ops/us
FloatSquareBenchmark.squareNew                8  thrpt    5  272.376 ±  2.516  ops/us
FloatSquareBenchmark.squareNew               13  thrpt    5  236.600 ±  1.357  ops/us
FloatSquareBenchmark.squareNew               16  thrpt    5  225.289 ±  0.361  ops/us
FloatSquareBenchmark.squareNew               25  thrpt    5  201.074 ±  0.363  ops/us
FloatSquareBenchmark.squareNew               32  thrpt    5  222.044 ±  1.173  ops/us
FloatSquareBenchmark.squareNew               64  thrpt    5  192.298 ±  2.776  ops/us
FloatSquareBenchmark.squareNew              100  thrpt    5  131.676 ±  0.082  ops/us
FloatSquareBenchmark.squareNew              128  thrpt    5  144.401 ±  1.032  ops/us
FloatSquareBenchmark.squareNew              207  thrpt    5   85.532 ±  0.490  ops/us
FloatSquareBenchmark.squareNew              256  thrpt    5   79.800 ±  0.023  ops/us
FloatSquareBenchmark.squareNew              300  thrpt    5   67.323 ±  0.287  ops/us
FloatSquareBenchmark.squareNew              512  thrpt    5   43.781 ±  0.029  ops/us
FloatSquareBenchmark.squareNew              702  thrpt    5   27.687 ±  0.008  ops/us
FloatSquareBenchmark.squareNew             1024  thrpt    5   20.829 ±  0.131  ops/us
FloatSquareBenchmark.squareOld                1  thrpt    5  479.053 ±  0.635  ops/us
FloatSquareBenchmark.squareOld                4  thrpt    5  345.476 ±  3.422  ops/us
FloatSquareBenchmark.squareOld                6  thrpt    5  320.319 ±  0.476  ops/us
FloatSquareBenchmark.squareOld                8  thrpt    5  347.901 ±  0.762  ops/us
FloatSquareBenchmark.squareOld               13  thrpt    5  223.134 ±  0.950  ops/us
FloatSquareBenchmark.squareOld               16  thrpt    5  213.299 ±  0.439  ops/us
FloatSquareBenchmark.squareOld               25  thrpt    5  141.332 ±  1.297  ops/us
FloatSquareBenchmark.squareOld               32  thrpt    5  117.027 ±  0.792  ops/us
FloatSquareBenchmark.squareOld               64  thrpt    5   63.886 ±  0.034  ops/us
FloatSquareBenchmark.squareOld              100  thrpt    5   42.088 ±  0.016  ops/us
FloatSquareBenchmark.squareOld              128  thrpt    5   33.163 ±  0.007  ops/us
FloatSquareBenchmark.squareOld              207  thrpt    5   19.855 ±  0.098  ops/us
FloatSquareBenchmark.squareOld              256  thrpt    5   16.870 ±  0.043  ops/us
FloatSquareBenchmark.squareOld              300  thrpt    5   14.326 ±  0.973  ops/us
FloatSquareBenchmark.squareOld              512  thrpt    5    8.837 ±  0.047  ops/us
FloatSquareBenchmark.squareOld              702  thrpt    5    6.299 ±  0.042  ops/us
FloatSquareBenchmark.squareOld             1024  thrpt    5    4.351 ±  0.062  ops/us

@markrmiller
Member

markrmiller commented Jun 3, 2023 via email

@mikemccand
Member

They killed AVX-512 on consumer hardware a while back.

Boo. Should have gone for an AMD Ryzen build ...

@markrmiller
Member

markrmiller commented Jun 3, 2023 via email

@mikemccand
Member

Thanks @markrmiller.

Does the Panama API give us any insight into the underlying capabilities of the CPU? Not just which versions/widths of SIMD instructions are supported (in a general cross-platform sort of way I guess), but also how much "true" concurrency of these SIMD instructions is supported? It seems like an important detail of the bare metal that should somehow be accessible up in the clouds of javaland... e.g. it would inform how much intra-query concurrency an app should try to use with KNN queries.

@uschindler
Contributor

uschindler commented Jun 4, 2023

Does the Panama API give us any insight into the underlying capabilities of the CPU? Not just which versions/widths of SIMD instructions are supported (in a general cross-platform sort of way I guess), but also how much "true" concurrency of these SIMD instructions is supported?

Unfortunately, no. We get the preferred species size and number of lanes.

But everything that does not work on hardware is emulated in damn slow interpreted code.

This is why there's so much hacky code trying to figure out if C2 is enabled, and why we stop using Panama when the reported species size is too small.
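Concretely, about the only hardware signal the incubating API exposes is the preferred species, queried roughly like this (a sketch; assumes --add-modules jdk.incubator.vector):

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorShape;
import jdk.incubator.vector.VectorSpecies;

public final class PreferredSpeciesInfo {
  public static void main(String[] args) {
    VectorSpecies<Float> species = FloatVector.SPECIES_PREFERRED;
    // e.g. 256 bits / 8 float lanes on AVX2, 128 bits / 4 lanes on NEON
    System.out.println("preferred shape:    " + VectorShape.preferredShape());
    System.out.println("preferred bit size: " + species.vectorBitSize());
    System.out.println("float lane count:   " + species.length());
  }
}
```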

@dweiss
Contributor

dweiss commented Jun 4, 2023

It's the same situation with Math.fma. The inability to tell whether it's accelerated or not makes the thing unusable; the fallback is so bad that the risk is rarely worth taking...

@ChrisHegarty
Contributor Author

@rmuir @uschindler I just want to draw your attention to a potential issue that may arise from this change - there is a property permission check leaking out from the JDK - https://bugs.openjdk.org/browse/JDK-8309727

It may be that users of Lucene need to grant the necessary property read permission. Some more details in the Elasticsearch issue elastic/elasticsearch#96715

Note: ES has two potential issues, the aforementioned property read, and also a problem with logging. The logging is ES specific.

@ChrisHegarty
Contributor Author

ChrisHegarty commented Jun 9, 2023

Currently in ES we have two potential ways to work around this - they are both crummy. :-(

  1. Trigger initialisation of VectorUtil, and do a small operation, before executing a painless script, or
  2. Add a couple of permissions to the painless environment.

Not sure if Lucene could/should have a workaround hack put in place. It would look something like: do some dummy vector operation in the Panama VectorUtil clinit, while asserting permissions (in a doPriv block). That would force the JDK's vector implementation to initialise the class where the JDK bug lives, while asserting the property permission. This assumes that Lucene is granted such a permission - it's a property read, which is not all that interesting.

The above is kinda similar to the workaround, point no. 1 above, that we're considering for ES (and yes, we can do it in ES, but it could bite other Lucene consumers if not done in Lucene).

@ChrisHegarty ChrisHegarty deleted the panama_vector branch June 9, 2023 16:27
@uschindler
Contributor

I left a comment in ES.

We could wrap the provider initialization with doPrivileged in Lucene, similar to MMapDirectory. Maybe that way we don't run into the logging issue seen with MMapDirectory.

But actually, the logging problem is not under Lucene's control. ES should initialize all logging early on startup, including the JUL logger wrapper. There are other places that log outside doPrivileged.

@uschindler
Contributor

uschindler commented Jun 9, 2023

Ah, I see the other comment. We would also need to calculate a dot product. We could do that at the end of the provider for JDK 20 and possibly for 21.

The dummy vector calc should be in the provider class only (in its ctor).
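Roughly what that could look like (a sketch only; the actual fix landed in #12362, and the class and method names here are illustrative): the provider constructor runs a throwaway dot product inside doPrivileged so that the JDK class affected by JDK-8309727 initializes while Lucene's permissions are asserted.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

final class PanamaVectorUtilProvider {

  PanamaVectorUtilProvider() {
    // Dummy calculation in the ctor, as suggested above: forces the JDK's
    // vector classes (and their property read, JDK-8309727) to initialize
    // while our own permissions are asserted.
    @SuppressWarnings("removal")
    float unused =
        AccessController.doPrivileged(
            (PrivilegedAction<Float>) () -> dotProduct(new float[8], new float[8]));
  }

  float dotProduct(float[] a, float[] b) {
    // Panama-based implementation elided in this sketch; scalar stand-in only.
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```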

@uschindler
Contributor

I implemented a workaround for the initialization failure, PR coming in a moment.

@uschindler
Contributor

See the PR: #12362

We could also catch the SecurityException, log a warning, and throw UnsupportedOperationException, so it falls back to Lucene's VectorUtilProvider.
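A sketch of that fallback idea (illustrative, not the merged code; PanamaVectorUtilProvider is the hypothetical provider from the earlier sketches):

```java
import java.util.logging.Logger;

final class PanamaProviderLookup {

  private static final Logger LOG = Logger.getLogger(PanamaProviderLookup.class.getName());

  /** Returns the Panama provider, or throws UOE so the caller falls back to the scalar one. */
  static PanamaVectorUtilProvider tryPanama() {
    try {
      return new PanamaVectorUtilProvider();
    } catch (SecurityException se) {
      LOG.warning("Panama vector provider disabled by security policy: " + se.getMessage());
      throw new UnsupportedOperationException("falling back to scalar VectorUtil provider", se);
    }
  }
}
```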

@benwtrent
Member

Thanks @uschindler for the work here! I am ignorant of the security manager and all its woes. Spent the last 2 days trying to get around this!

hiteshk25 pushed a commit to cowpaths/lucene that referenced this pull request Jul 18, 2023
…dc8ca633e8bcf`) (#20)

* Add next minor version 9.7.0

* Fix SynonymQuery equals implementation (apache#12260)

The term member of TermAndBoost used to be a Term instance and became a
BytesRef with apache#11941, which means its equals impl won't take the field
name into account. The SynonymQuery equals impl needs to be updated
accordingly to take the field into account as well, otherwise synonym
queries with same term and boost across different fields are equal which
is a bug.

* Fix MMapDirectory documentation for Java 20 (apache#12265)

* Don't generate stacktrace in CollectionTerminatedException (apache#12270)

CollectionTerminatedException is always caught and never exposed to users so there's no point in filling
in a stack-trace for it.

* add missing changelog entry for apache#12260

* Add missing author to changelog entry for apache#12220

* Make query timeout members final in ExitableDirectoryReader (apache#12274)

There's a couple of places in the Exitable wrapper classes where
queryTimeout is set within the constructor and never modified. This
commit makes such members final.

* Update javadocs for QueryTimeout (apache#12272)

QueryTimeout was introduced together with ExitableDirectoryReader but is
now also optionally set to the IndexSearcher to wrap the bulk scorer
with a TimeLimitingBulkScorer. Its javadocs needs updating.

* Make TimeExceededException members final (apache#12271)

TimeExceededException has three members that are set within its constructor and never modified. They can be made final.

* DOAP changes for release 9.6.0

* Add back-compat indices for 9.6.0

* `ToParentBlockJoinQuery` Explain Support Score Mode (apache#12245) (apache#12283)

* `ToParentBlockJoinQuery` Explain Support Score Mode

---------

Co-authored-by: Marcus <marcuseagan@gmail.com>

* Simplify SliceExecutor and QueueSizeBasedExecutor (apache#12285)

The only behaviour that QueueSizeBasedExecutor overrides from SliceExecutor is when to execute on the caller thread. There is no need to override the whole invokeAll method for that. Instead, this commit introduces a shouldExecuteOnCallerThread method that can be overridden.

* [Backport] GITHUB-11838 Add api to allow concurrent query rewrite (apache#12197)

* GITHUB-11838 Change API to allow concurrent query rewrite (apache#11840)

Replace Query#rewrite(IndexReader) with Query#rewrite(IndexSearcher)

Co-authored-by: Patrick Zhai <zhaih@users.noreply.github.com>
Co-authored-by: Adrien Grand <jpountz@gmail.com>

Backport of apache#11840

Changes from original:
 - Query keeps `rewrite(IndexReader)`, but it is now deprecated
 - VirtualMethod is used to correctly delegate to the overridden methods
 - The changes to `RewriteMethod` type classes are reverted; this increased the backwards compatibility impact.

------------------------------

### Description
Issue: apache#11838 

#### Updated Proposal
 * Change signature of rewrite to `rewrite(IndexSearcher)`
 * How did I migrate the usage:
   * Use Intellij to do preliminary refactoring for me
   * For test usage, use searcher whenever is available, otherwise create one using `newSearcher(reader)`
   * For the very few non-test classes which don't have an IndexSearcher available but call rewrite, create a searcher using `new IndexSearcher(reader)`; tried my best to avoid creating it repeatedly (especially in `FieldQuery`)
   * For queries who have implemented the rewrite and uses some part of reader's functionality, use shortcut method when possible, otherwise pull out the reader from indexSearcher.

* Backport: Concurrent rewrite for KnnVectorQuery (apache#12160) (apache#12288)

* Concurrent rewrite for KnnVectorQuery (apache#12160)


- Reduce overhead of non-concurrent search by preserving original execution
- Improve readability by factoring into separate functions

---------

Co-authored-by: Kaival Parikh <kaivalp2000@gmail.com>

* adjusting for backport

---------

Co-authored-by: Kaival Parikh <46070017+kaivalnp@users.noreply.github.com>
Co-authored-by: Kaival Parikh <kaivalp2000@gmail.com>

* toposort use iterator to avoid stackoverflow (apache#12286)

Co-authored-by: tangdonghai <tangdonghai@meituan.com>
# Conflicts:
#	lucene/CHANGES.txt

* Fix test to compile with Java 11 after backport of apache#12286

* Update Javadoc for topoSortStates method after apache#12286 (apache#12292)

* Optimize HNSW diversity calculation (apache#12235)

* Minor cleanup and improvements to DaciukMihovAutomatonBuilder (apache#12305)

* GITHUB-12291: Skip blank lines from stopwords list. (apache#12299)

* Wrap Query rewrite backwards layer with AccessController (apache#12308)

* Make sure APIJAR reproduces with different timezone (unfortunately java encodes the date using local timezone) (apache#12315)

* Add multi-thread searchability to OnHeapHnswGraph (apache#12257)

* Fix backport error

* [MINOR] Update javadoc in Query class (apache#12233)

- add a few missing full stops
- update wording in the description of Query#equals method

* [Backport] Integrate the Incubating Panama Vector API apache#12311 (apache#12327)

Leverage accelerated vector hardware instructions in Vector Search.

Lucene already has a mechanism that enables the use of non-final JDK APIs, currently used for the Preview Panama Foreign API. This change expands this mechanism to include the Incubating Panama Vector API. When the jdk.incubator.vector module is present at run time, the Panama-ized version of the low-level primitives used by Vector Search is enabled. If not present, the default scalar version of these low-level primitives is used (as it was previously).

Currently, we're only targeting support for JDK 20. A subsequent PR should evaluate JDK 21.
---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>
Co-authored-by: Robert Muir <rmuir@apache.org>

* Parallelize knn query rewrite across slices rather than segments (apache#12325)

The concurrent query rewrite for knn vector query introduced with apache#12160
requests one thread per segment to the executor. To align this with the
IndexSearcher parallel behaviour, we should rather parallelize across
slices. Also, we can reuse the same slice executor instance that the
index searcher already holds, in that way we are using a
QueueSizeBasedExecutor when a thread pool executor is provided.

* Optimize ConjunctionDISI.createConjunction (apache#12328)

This method is showing up as a little hot when profiling some queries.
Almost all the time spent in this method is just burnt on ceremony
around stream indirections that don't inline.
Moving this to iterators, simplifying the check for same doc id and also saving one iteration (for the min
cost) makes this method far cheaper and easier to read.

* Update changes to be correct with ARM (it is called NEON there)

* GH#12321: Marked DaciukMihovAutomatonBuilder as deprecated (apache#12332)

Preparing to reduce visibility of this class in a future release

* add BitSet.clear() (apache#12268)

# Conflicts:
#	lucene/CHANGES.txt

* Cleanup and update changes and synchronize with 9.x

* Update TestVectorUtilProviders.java (apache#12338)

* Don't generate stacktrace for TimeExceededException (apache#12335)

The exception is package private and never rethrown, we can avoid
generating a stacktrace for it.

* Introduced the Word2VecSynonymFilter (apache#12169)

Co-authored-by: Alessandro Benedetti <a.benedetti@sease.io>

* Word2VecSynonymFilter constructor null check (apache#12169)

* Use thread-safe search version of HnswGraphSearcher (apache#12246)

Addressing comment received in the PR apache#12246

* Word2VecSynonymProvider to use standard Integer max value for hnsw searches (apache#12235)
We observed this change was not ported previously from main in an old cherry-pick

* Fix searchafter high latency when after value is out of range for segment (apache#12334)

* Make memory fence in `ByteBufferGuard` explicit (apache#12290)

* Add "direct to binary" option for DaciukMihovAutomatonBuilder and use it in TermInSetQuery#visit (apache#12320)

* Add updateDocuments API which accept a query (reopen) (apache#12346)

* GITHUB#11350: Handle backward compatibility when merging segments with different FieldInfo

This commit restores Lucene 9's ability to handle indices created with Lucene 8 where there are discrepancies in FieldInfos, such as different IndexOptions

* [Tessellator] Improve the checks that validate the diagonal between two polygon nodes (apache#12353)

# Conflicts:
#	lucene/CHANGES.txt

* feat: soft delete optimize (apache#12339)

* Better paging when random reads go backwards (apache#12357)

When reading data from outside the buffer, BufferedIndexInput always resets
its buffer to start at the new read position. If we are reading backwards (for example,
using an OffHeapFSTStore for a terms dictionary) then this can have the effect of
re-reading the same data over and over again.

This commit changes BufferedIndexInput to use paging when reading backwards,
so that if we ask for a byte that is before the current buffer, we read a block of data
of bufferSize that ends at the previous buffer start.

Fixes apache#12356

* Work around SecurityManager issues during initialization of vector api (JDK-8309727) (apache#12362)

* Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth (apache#12249)

* Implement MMapDirectory with Java 21 Project Panama Preview API (apache#12294)
Backport incl JDK21 apijar file with java.util.Objects regenerated

* remove relic in apijar folder caused by vector additions

* Speed up IndexedDISI Sparse #AdvanceExactWithinBlock for tiny step advance (apache#12324)

* Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors (apache#12281)


---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>

* Implement VectorUtilProvider with Java 21 Project Panama Vector API (apache#12363) (apache#12365)

This commit enables the Panama Vector API for Java 21. The version of
VectorUtilPanamaProvider for Java 21 is identical to that of Java 20.
As such, there is no specific 21 version - the Java 20 version will be
loaded from the MRJAR.

* Add CHANGES.txt for apache#12334 Honor after value for skipping documents even if queue is not full for PagingFieldCollector (apache#12368)

Signed-off-by: gashutos <gashutos@amazon.com>

* Move TermAndBoost back to its original location. (apache#12366)

PR apache#12169 accidentally moved the `TermAndBoost` class to a different location,
which would break custom sub-classes of `QueryBuilder`. This commit moves it
back to its original location.

* GITHUB-12252: Add function queries for computing similarity scores between knn vectors (apache#12253)

Co-authored-by: Alessandro Benedetti <a.benedetti@sease.io>

* hunspell (minor): reduce allocations when processing compound rules (apache#12316)

(cherry picked from commit a454388)

* hunspell (minor): reduce allocations when reading the dictionary's morphological data (apache#12323)

there can be many entries with morph data, so we'd better avoid compiling and matching regexes and even stream allocation

(cherry picked from commit 4bf1b94)

* TestHunspell: reduce the flakiness probability (apache#12351)

* TestHunspell: reduce the flakiness probability

We need to check how the timeout interacts with custom exception-throwing checkCanceled.
The default timeout seems not enough for some CI agents, so let's increase it.

Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
(cherry picked from commit 5b63a18)

* This allows VectorUtilProvider tests to be executed although hardware may not fully support vectorization or if C2 is not enabled (apache#12376)

---------

Signed-off-by: gashutos <gashutos@amazon.com>
Co-authored-by: Alan Woodward <romseygeek@apache.org>
Co-authored-by: Luca Cavanna <javanna@apache.org>
Co-authored-by: Uwe Schindler <uschindler@apache.org>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Mikhail Khludnev <mkhludnev@users.noreply.github.com>
Co-authored-by: Marcus <marcuseagan@gmail.com>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Kaival Parikh <46070017+kaivalnp@users.noreply.github.com>
Co-authored-by: Kaival Parikh <kaivalp2000@gmail.com>
Co-authored-by: tang donghai <tangdhcs@gmail.com>
Co-authored-by: Patrick Zhai <zhaih@users.noreply.github.com>
Co-authored-by: Greg Miller <gsmiller@gmail.com>
Co-authored-by: Jerry Chin <metrxqin@gmail.com>
Co-authored-by: Patrick Zhai <zhai7631@gmail.com>
Co-authored-by: Andrey Bozhko <andybozhko@gmail.com>
Co-authored-by: Chris Hegarty <62058229+ChrisHegarty@users.noreply.github.com>
Co-authored-by: Robert Muir <rmuir@apache.org>
Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
Co-authored-by: Daniele Antuzi <daniele.antuzi@gmail.com>
Co-authored-by: Alessandro Benedetti <a.benedetti@sease.io>
Co-authored-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
Co-authored-by: Petr Portnov | PROgrm_JARvis <pportnov@ozon.ru>
Co-authored-by: Tomas Eduardo Fernandez Lobbe <tflobbe@apache.org>
Co-authored-by: Ignacio Vera <ivera@apache.org>
Co-authored-by: fudongying <30896830+fudongyingluck@users.noreply.github.com>
Co-authored-by: Chris Fournier <chris.fournier@shopify.com>
Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>
Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Elia Porciani <e.porciani@sease.io>
Co-authored-by: Peter Gromov <peter@jetbrains.com>

Successfully merging this pull request may close these issues.

vector API integration, plan B
10 participants