
[RFC] Faiss Scalar Quantization FP16 (SQfp16) and enabling SIMD (AVX2 and NEON) #1138

Closed
naveentatikonda opened this issue Sep 15, 2023 · 30 comments


@naveentatikonda
Member

naveentatikonda commented Sep 15, 2023

Problem Statement

In the k-NN plugin we mainly support vectors of type float, where each dimension is 32 bits. This is expensive for use cases that require large-scale ingestion, where constructing, loading, saving, and searching graphs (for the native engines nmslib and faiss) becomes even more costly. Even though we have byte vector support, it works only with the Lucene engine, and there is a considerable reduction in recall compared to float32.

Adding support for Faiss SQfp16 helps reduce the memory and storage footprints without compromising recall: when the user provides 32-bit float vectors, the Faiss engine quantizes each vector into FP16 using its scalar quantizer (users don't need to do any quantization on their end), stores it, and decodes it back to FP32 while returning results during search operations.
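To illustrate the round trip (a minimal numpy sketch of the concept only; the real encode/decode happens inside Faiss's scalar quantizer, not in user or plugin code):

```python
import numpy as np

# Conceptual sketch of the fp16 round trip performed by the quantizer.
rng = np.random.default_rng(42)
vec_fp32 = rng.random(128, dtype=np.float32)   # user-supplied fp32 vector

encoded = vec_fp32.astype(np.float16)          # stored form: 2 bytes per dimension
decoded = encoded.astype(np.float32)           # decoded back to fp32 at search time

print(vec_fp32.nbytes, encoded.nbytes)         # 512 vs 256 bytes: half the vector storage
print(np.abs(vec_fp32 - decoded).max())        # small round-off error (order 1e-4 for values in [0, 1))
```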

Faiss GitHub Issue - facebookresearch/faiss#3014
Faiss PR - facebookresearch/faiss#3166
Development Branch - https://github.com/naveentatikonda/k-NN/tree/add_sqfp16

What Does the User Experience Look Like?

In terms of UX, users just need to set the encoder to SQfp16 while creating the index with the Faiss engine (this works only with the Faiss engine); ingestion and search queries are unchanged. An example is shown below:

PUT test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
        "my_vector1": {
          "type": "knn_vector",
          "dimension": 16,
          "method": {
            "name": "hnsw",
            "space_type": "l2",
            "engine": "faiss",
            "parameters": {
              "encoder": {
                 "name": "sq",
                 "parameters":{
                    "type": "fp16",
                    "clip": false
                  }
              },
              "ef_construction": 128,
              "m": 24
            }
          }
        }
    }
  }
} 
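Since ingestion and search are unchanged, requests like the following work exactly as they do without the encoder (a sketch using plain HTTP from Python; the index and field names come from the mapping above, and a local endpoint is assumed):

```python
import requests

HOST = "http://localhost:9200"  # assumed local OpenSearch endpoint

# Ingest plain fp32 floats; no client-side quantization is needed.
vector = [1.5, 2.5] + [0.0] * 14  # dimension 16, matching the mapping above
requests.post(f"{HOST}/test-index/_doc/1", json={"my_vector1": vector})

# Search with the standard k-NN query; vectors are decoded back to fp32 internally.
resp = requests.post(
    f"{HOST}/test-index/_search",
    json={"size": 2, "query": {"knn": {"my_vector1": {"vector": vector, "k": 2}}}},
)
print(resp.json()["hits"]["hits"])
```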

Benchmarking Results

For the latest results comparing Faiss HNSW with fp32 vectors against Faiss HNSW SQfp16, see link1 and link2.

Similarly, the latest comparison results for IVF are mentioned here.

Benchmarking on POC (Old Results and Old Observations)

Setup Configuration

Implemented a POC using the Faiss SQfp16 encoder and ran benchmarks against some of the datasets. The cluster configuration and index mapping are shown in the table below.

| Setting | Value |
|---|---|
| Instance Type | r5.16xlarge |
| Nodes | 1 |
| Primary Shards | 24, 8 |
| Replicas | 0 |
| Disk Space | 1000 GB |
| ef_search | 256 |
| ef_construction | 256 |
| m | 16 |
| engine | faiss |
| method | hnsw |

Note - These results are without a warmup operation.

Recall and Storage Results

| S. No. | Dataset | Dimension | Train Data Size | Query Data Size | Training Data Range | Query Data Range | Space Type | faiss-hnsw Recall | faiss sqfp16 Recall | faiss-hnsw Index Size | faiss sqfp16 Index Size | % Reduction in Storage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | 1,000,000 | 1,000 | [0.0, 1.4772] | [0.0, 0.7285] | L2 | | | | | |
| 2 | sift-128-euclidean | 128 | 1,000,000 | 10,000 | [0.0, 218.0] | [0.0, 184.0] | L2 | 0.99989 | 0.99989 | 1.4 GB | 1.1 GB | 21.4% |
| 3 | mnist-784-euclidean | 784 | 60,000 | 10,000 | [0.0, 255.0] | [0.0, 255.0] | L2 | 1 | 1 | 400.2 MB | 310.4 MB | 22.43% |
| 4 | fashion-mnist-784-euclidean | 784 | 60,000 | 10,000 | [0.0, 255.0] | [0.0, 255.0] | L2 | 0.99999 | | 447.6 MB | | |

Indexing and Querying Results

  • Indexing Throughput (docs/s) - the number of documents per second that can be ingested, computed as `document_count / (total_index_time_s + total_refresh_time_s)`, i.e. `document_count / (ingest_took_total_s + refresh_index_took_total_s)` (see the quick example after this list)

  • Query Time (ms) - Time per query, summarized as mean, p50, p90, p99
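For example, the throughput arithmetic works out as follows (hypothetical ingest/refresh split, chosen so the result matches the sift-128 faiss-hnsw row below):

```python
# Hypothetical timing split for 1M documents; only the formula is from the text above.
document_count = 1_000_000
ingest_took_total_s = 60.0
refresh_index_took_total_s = 7.4

throughput = document_count / (ingest_took_total_s + refresh_index_took_total_s)
print(round(throughput))  # ~14837 docs/s
```
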
| S. No. | Dataset | Dimension | Space Type | Primary Shards | faiss hnsw Indexing Throughput (docs/s) | faiss sqfp16 Indexing Throughput (docs/s) | faiss hnsw p50 (ms) | faiss sqfp16 p50 (ms) | faiss hnsw p90 (ms) | faiss sqfp16 p90 (ms) | faiss hnsw p99 (ms) | faiss sqfp16 p99 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | L2 | 24 | | | | | | | | |
| 2 | sift-128-euclidean | 128 | L2 | 24 | 14837 | 1495 | 10 | 105.6 | 11 | 124.1 | 12.1 | 136 |
| 3 | mnist-784-euclidean | 784 | L2 | 24 | 5657 | 545 | 9.8 | 204.7 | 10.2 | 226.2 | 12.9 | 240.2 |
| 4 | mnist-784-euclidean | 784 | L2 | 8 | 5065 | 243 | 6 | 110 | 7 | 127.9 | 9.7 | 140.6 |
| 5 | fashion-mnist-784-euclidean | 784 | L2 | 24 | 5175 | | 8 | | 9 | | 11.7 | |

Memory Results

| S. No. | Dataset | Dimension | Space Type | faiss hnsw Graph Memory (MB) | faiss sqfp16 Graph Memory (MB) | % Reduction in Native Memory |
|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | L2 | | | |
| 2 | sift-128-euclidean | 128 | L2 | 633.51367 | 389.37109 | 38.5% |
| 3 | mnist-784-euclidean | 784 | L2 | 188.14258 | 98.42188 | 47.7% |
| 4 | fashion-mnist-784-euclidean | 784 | L2 | 188.14258 | | |
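These measurements track the rule-of-thumb native-memory estimate from the k-NN documentation, roughly `1.1 * (4 * dimension + 8 * M)` bytes per vector for fp32 HNSW; with fp16 the vector portion should halve to `2 * dimension` bytes. A sketch under that assumption (it is an estimate, so it will not match the measured numbers exactly):

```python
def hnsw_graph_memory_mb(num_vectors: int, dim: int, m: int, bytes_per_dim: int = 4) -> float:
    # k-NN docs estimate: 1.1 * (bytes_per_dim * dim + 8 * M) bytes per vector.
    return 1.1 * (bytes_per_dim * dim + 8 * m) * num_vectors / (1024 ** 2)

# sift-128-euclidean: 1M vectors, m=16 (setup above)
print(hnsw_graph_memory_mb(1_000_000, 128, 16))                   # ~671 MB fp32 (measured: 633.5 MB)
print(hnsw_graph_memory_mb(1_000_000, 128, 16, bytes_per_dim=2))  # ~403 MB fp16 (measured: 389.4 MB)
```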

Observations (Old and outdated)

  • Even after quantizing fp32 vectors to fp16 with the Faiss scalar quantizer, we get the same recall.
  • We save more than 21% in storage.
  • Memory usage drops substantially, by 38% to 48%, with SQfp16 compared to plain Faiss HNSW.
  • On the downside, indexing throughput drops sharply: Faiss HNSW SQfp16 achieves only about 1/10th of the throughput of Faiss HNSW.
  • Query processing times are also disappointing: SQfp16 takes 10 to 20 times longer. Reducing the number of primary shards from 24 to 8 lowered latencies somewhat, but they still need improvement.

We are still working to identify the causes, reduce query latency, and increase indexing throughput.

With Warmup and AVX2

These are the metrics after adding the warmup operation and Faiss AVX2 support to k-NN.
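The warmup here refers to the k-NN warmup API, which preloads the native graphs into off-heap memory so first-query latency is not dominated by graph loading. A sketch of the call (assuming a local endpoint and the example index above):

```python
import requests

# k-NN warmup API: loads the index's native graphs into memory before benchmarking.
resp = requests.get("http://localhost:9200/_plugins/_knn/warmup/test-index")
print(resp.json())
```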

Indexing and Querying Results

| S. No. | Dataset | Dimension | Space Type | Primary Shards | faiss hnsw AVX2 Indexing Throughput (docs/s) | faiss hnsw sqfp16 AVX2 Indexing Throughput (docs/s) | hnsw AVX2 p50 (ms) | sqfp16 AVX2 p50 (ms) | hnsw AVX2 p90 (ms) | sqfp16 AVX2 p90 (ms) | hnsw AVX2 p99 (ms) | sqfp16 AVX2 p99 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | L2 | 24 | 1016 | 1059 | 26 | 25.6 | 29.3 | 28.9 | 32 | 31.1 |
| 2 | sift-128-euclidean | 128 | L2 | 24 | 13178 | 13664 | 9.1 | 9 | 10.4 | 10 | 11.1 | 11 |
| 3 | mnist-784-euclidean | 784 | L2 | 24 | 5711 | 5734 | 10 | 9 | 11 | 10 | 11.3 | 10.3 |
| 4 | fashion-mnist-784-euclidean | 784 | L2 | 24 | 5325 | 5296 | 9 | 8 | 9.1 | 9 | 10.1 | 9.6 |

Observations (with AVX2)

  • Before enabling AVX2, we ran a few tests with the warmup operation but still did not see a meaningful improvement in query latencies.
  • With AVX2, there is a significant improvement in query latencies and indexing throughput for Faiss HNSW with the SQfp16 encoder, while preserving the same memory and storage savings seen in our previous test results. The new metrics are on par with faiss-hnsw results; some are even better.
  • However, AVX2 did not have much impact on plain faiss-hnsw performance (little improvement, but no downsides).
  • With AVX2, recall is unchanged.

Benchmarking Results on ARM

As seen above, adding the Faiss AVX2 optimization improved overall performance. However, AVX2 is only supported on x86; Faiss currently has no similar SQ optimization for ARM. We ran a few tests on ARM instances and share the performance results below:

Recall and Storage Results

| S. No. | Dataset | Dimension | Train Data Size | Query Data Size | Training Data Range | Query Data Range | Space Type | faiss-hnsw Recall | faiss sqfp16 Recall | faiss-hnsw Index Size | faiss sqfp16 Index Size | % Reduction in Storage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | 1,000,000 | 1,000 | [0.0, 1.4772] | [0.0, 0.7285] | L2 | 0.99783 | 0.99831 | 16.6 GB | 14.8 GB | 10.84% |
| 2 | sift-128-euclidean | 128 | 1,000,000 | 10,000 | [0.0, 218.0] | [0.0, 184.0] | L2 | 0.99989 | 0.99989 | 1.4 GB | 1.1 GB | 21.4% |
| 3 | mnist-784-euclidean | 784 | 60,000 | 10,000 | [0.0, 255.0] | [0.0, 255.0] | L2 | 1 | 1 | 400.3 MB | 310.4 MB | 22.43% |
| 4 | last-fm | 65 | 292,385 | 50,000 | | | innerproduct | 0.99897 | 0.99857 | 487.8 MB | 451.5 MB | 7.4% |

Indexing and Querying Results

| S. No. | Dataset | Dimension | Space Type | Primary Shards | faiss hnsw Indexing Throughput (docs/s) | faiss sqfp16 Indexing Throughput (docs/s) | faiss sqfp16 NEON Indexing Throughput (docs/s) | hnsw p50 (ms) | sqfp16 p50 (ms) | sqfp16 NEON p50 (ms) | hnsw p90 (ms) | sqfp16 p90 (ms) | sqfp16 NEON p90 (ms) | hnsw p99 (ms) | sqfp16 p99 (ms) | sqfp16 NEON p99 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | L2 | 24 | 983 | 952 | 997 | 22 | 73.2 | 34 | 24.1 | 84 | 38.1 | 25.6 | 88.2 | 40.1 |
| 2 | sift-128-euclidean | 128 | L2 | 24 | 12869 | 6169 | 11126 | 8 | 18 | 10 | 8 | 20 | 11 | 9 | 21.9 | 11.1 |
| 3 | sift-128-euclidean | 128 | L2 | 8 | 12182 | 4750 | 8992 | 4 | 10 | 5.4 | 4.9 | 11 | 6 | 5 | 12 | 7 |
| 4 | mnist-784-euclidean | 784 | L2 | 24 | 4962 | 2910 | 4832 | 9 | 25 | 11 | 9.9 | 27 | 11.1 | 10.1 | 28 | 12.1 |
| 5 | last-fm | 65 | innerproduct | 24 | 9759 | 9845 | | 6 | 6 | | 6 | 6 | | 6 | 7 | |
| 6 | last-fm | 65 | innerproduct | 8 | 9515 | 9151 | | 4 | 4 | | 4 | 4 | | 4 | 4 | |

(The NEON columns are left empty for last-fm: its dimension, 65, is not a multiple of 8, so it does not hit the SIMD path; see the discussion below.)

Memory Results

| S. No. | Dataset | Dimension | Space Type | faiss hnsw Graph Memory (MB) | faiss sqfp16 Graph Memory (MB) | % Reduction in Native Memory |
|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | L2 | 3807.34 | 1976.29 | 48% |
| 2 | sift-128-euclidean | 128 | L2 | 633.51 | 389.37 | 38.5% |
| 3 | mnist-784-euclidean | 784 | L2 | 188.14 | 98.42 | 47.7% |
| 4 | last-fm | 65 | innerproduct | 114.97 | 78.72 | 31.53% |

Observations

  • Recall is the same with SQ.
  • Query latencies with SQ are worse, taking 2 to 3.5 times longer than without SQ, but far better than on x86 without the AVX2 optimization.
  • Ingestion time is almost the same, but refresh and force merge are worse with SQ.
  • As on x86, there is a significant drop in memory and storage utilization with SQ.
  • Adding SIMD support using NEON on ARM improved query latencies and indexing throughput to levels close to faiss-hnsw without an encoder (except for large datasets like gist-960-euclidean, where there is still a noticeable increase in query latencies).
@naveentatikonda naveentatikonda self-assigned this Sep 15, 2023
@naveentatikonda naveentatikonda changed the title Add Support for Faiss Scalar Quantization FP16 (SQfp16) [RFC] Add Support for Faiss Scalar Quantization FP16 (SQfp16) Sep 15, 2023
@naveentatikonda naveentatikonda changed the title [RFC] Add Support for Faiss Scalar Quantization FP16 (SQfp16) [RFC] Faiss Scalar Quantization FP16 (SQfp16) Sep 15, 2023
@jmazanec15
Member

Awesome to see recall is not noticeably impacted. This provides a lot of potential for cost reduction. A couple notes:

For consistency with the product quantizer, I think it might make more sense for the interface to look like:

    "encoder":{
      "name":"sq",
      "parameters":{
        "nbits": 8
      }
    }

The performance implications are unfortunate. I'm guessing they have to do with repeated conversion between fp16 and fp32. It might be worth investigating whether it is possible to do flops directly on fp16 data (might be compiler/processor specific).
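As a rough illustration of that idea (numpy standing in for Faiss's C++ kernels; whether this maps to native fp16 instructions is indeed compiler/processor specific):

```python
import numpy as np

# Compute an L2 distance directly on fp16 arrays, without materializing fp32 copies.
a = np.random.rand(128).astype(np.float16)
b = np.random.rand(128).astype(np.float16)

diff = a - b               # stays in float16
dist = np.dot(diff, diff)  # float16 result (numpy may still upcast internally)
print(dist, dist.dtype)
```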

@navneet1v
Collaborator

@naveentatikonda thanks for sharing the results. I can see good improvements in memory, but the latency is a big concern for me. 10x higher search latency and 1/10th the indexing throughput for 1M documents are big red flags.

Can we do the following:

  1. Work with the Faiss maintainers to see why there is so much extra latency; maybe some compiler optimizations are missing.
  2. Test with, say, a 10M dataset of dimension 128 and see whether the latency follows the same trend.

@naveentatikonda
Member Author

> Awesome to see recall is not noticeably impacted. This provides a lot of potential for cost reduction. A couple notes:
>
> For consistency with the product quantizer, I think it might make more sense for the interface to look like:
>
>     "encoder":{
>       "name":"sq",
>       "parameters":{
>         "nbits": 8
>       }
>     }
>
> The performance implications are unfortunate. I'm guessing they have to do with repeated conversion between fp16 and fp32. It might be worth investigating whether it is possible to do flops directly on fp16 data (might be compiler/processor specific).

@jmazanec15 In terms of UX, we will add the encoder-related parameters inside the encoder object, as in your example. But the generic HNSW parameters like "ef_construction" will stay outside the encoder object. Please correct me if I'm wrong.

@naveentatikonda
Member Author

naveentatikonda commented Sep 18, 2023

> @naveentatikonda thanks for sharing the results. I can see good improvements in memory, but the latency is a big concern for me. 10x higher search latency and 1/10th the indexing throughput for 1M documents are big red flags.
>
> Can we do the following:
>
>   1. Work with the Faiss maintainers to see why there is so much extra latency; maybe some compiler optimizations are missing.
>   2. Test with, say, a 10M dataset of dimension 128 and see whether the latency follows the same trend.

@navneet1v After enabling the Faiss AVX2 feature, query latencies and indexing throughput improved while keeping the same reduction in memory and storage utilization. I have shared the results above; please take a look.

@luyuncheng
Collaborator

Hi, is there any progress on this? We want to introduce fp16 as a new codec, and we want to keep the protocol consistent with this.

@naveentatikonda
Member Author

naveentatikonda commented Nov 20, 2023

> Hi, is there any progress on this? We want to introduce fp16 as a new codec, and we want to keep the protocol consistent with this.

@luyuncheng We have AVX2 optimization support for the x86 architecture, but Faiss has no similar SQ optimization for ARM. So we are working on adding NEON support to SQfp16 in Faiss. We did a POC for the L2 space type using NEON, which improved overall performance, and I'm working on adding NEON support for inner product.

Can you share more details about adding fp16 as a new codec? Also, do you have the code ready?

@luyuncheng
Collaborator

luyuncheng commented Nov 20, 2023

@naveentatikonda We did not test fp16 on the ARM architecture, but we verified fp16 on x86, so we want to introduce this into our services. Also, I want to keep the protocol the same as yours, so it stays backward compatible in the future.

So I am wondering whether to:

  1. introduce a method named "HNSWFP16", or
  2. introduce an encoder parameter like you do:
            "parameters": {
              "encoder": {
                 "name": "SQfp16"
              },
              "ef_construction": 128,
              "m": 24
            }

@luyuncheng
Collaborator

luyuncheng commented Nov 20, 2023

In #1139 I introduced a memory test. With the sift-128 1M dataset and 8 threads, the benchmark shows:

| Type | Index Time | Build Index Memory | File Size | Query Index Memory | Query Time |
|---|---|---|---|---|---|
| HNSWSQfp16 | 276939 ms | 1.529 GB | 512 MB (-32%) | 533 MB | 52553 ms (+54%) |
| HNSWFLAT | 110546 ms | 1.768 GB | 756 MB | 776 MB | 33948 ms |

@naveentatikonda is it possible to add this feature now and optimize the ARM architecture in a later version?

@naveentatikonda
Member Author

No @luyuncheng. Can you please explain how you added fp16 support when you said you would be adding it as a codec (in terms of implementation)? Also, based on the results you shared, performance seems to have dropped in terms of query time and indexing time. Besides that, memory does not look as optimized compared to the metrics obtained with AVX2 optimization on x86 (results I shared above).

We plan to complete the NEON optimization for ARM and release this feature in OpenSearch 2.12 (in January 2024). So even if we add SQfp16 support for x86 first, it will not be released right away.

@luyuncheng
Collaborator

> Besides that, memory does not look as optimized compared to the metrics obtained with AVX2 optimization on x86.

@naveentatikonda
I think the memory reduction is almost the same as in your results shared above: 32% (at query time).

The index performance I shared above does not use SIMD. I'll run a benchmark with AVX2 and the fp16 optimization.

@luyuncheng
Collaborator

luyuncheng commented Nov 21, 2023

@naveentatikonda

Hi, I ran more tests with AVX2 on Intel(R) Xeon(R), with 8 threads and the sift-128 1M dataset. I think HNSWSQfp16 is pretty good in both time and memory:

| Type | Index Time | Build Index Memory | File Size | Query Index Memory | Query Time |
|---|---|---|---|---|---|
| HNSWSQfp16 (AVX2) | 28305 ms | 1.535 GB | 512 MB (-32%) | 648 MB (-27%) | 4958 ms (-9.8%) |
| HNSWFLAT (AVX2) | 35543 ms | 1.772 GB | 756 MB | 892 MB | 5498 ms |
| HNSWSQfp16 | 166505 ms | 1.535 GB | 512 MB (-32%) | 647 MB | 20578 ms (+54%) |
| HNSWFLAT | 46688 ms | 1.768 GB | 756 MB | 891 MB | 9012 ms |

@luyuncheng
Collaborator

luyuncheng commented Nov 21, 2023

> For consistency with the product quantizer, I think it might make more sense for the interface to look like:

@jmazanec15 @naveentatikonda
SQfp16 does not need training data. Is it possible to treat it as an HNSW encoder parameter, so we can keep the SQfp16 config protocol consistent?

@naveentatikonda
Member Author

@luyuncheng I gave some thought to all your questions and tried to answer them here. Please take a look and let me know if you have any other questions.

  • Protocol and Index Mapping
    In terms of protocol and index mapping, we will accept input from the user as a new encoder, as shown below, and process the request. You can keep the same convention in your services to avoid any backward incompatibility.
            "parameters": {
              "encoder": {
                 "name": "SQfp16"
              },
              "ef_construction": 128,
              "m": 24
            }
  • Index Description or Implementation
    SQfp16 does not need any training. In terms of implementation, we will set it as an encoder with the index_description HNSW16,SQfp16 (for HNSW) when passing the request to Faiss (see the sketch after this list).

  • Release Timeline and ARM Support
    As maintainers of the k-NN repository, we cannot release partial features. So we also want to add optimized ARM support and release the complete SQfp16 feature (covering both x86 and ARM) in OpenSearch 2.12.
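For reference, that index description is a standard Faiss factory string, so the behavior can be sketched with standalone Faiss in Python (an illustration of the factory string only, not the plugin's actual JNI code):

```python
import faiss
import numpy as np

d = 128
# "HNSW16,SQfp16": HNSW graph with M=16 over fp16 scalar-quantized storage.
index = faiss.index_factory(d, "HNSW16,SQfp16", faiss.METRIC_L2)

xb = np.random.rand(1000, d).astype(np.float32)
index.train(xb)  # effectively a no-op: fp16 SQ has no data-dependent parameters
index.add(xb)    # vectors are encoded to fp16 on add

dists, ids = index.search(xb[:5], 10)
print(ids[:, 0])  # each query vector should be its own nearest neighbor
```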

cc: @vamshin @navneet1v @jmazanec15

@naveentatikonda
Member Author

> @naveentatikonda
>
> Hi, I ran more tests with AVX2 on Intel(R) Xeon(R), with 8 threads and the sift-128 1M dataset. I think HNSWSQfp16 is pretty good in both time and memory:
>
> | Type | Index Time | Build Index Memory | File Size | Query Index Memory | Query Time |
> |---|---|---|---|---|---|
> | HNSWSQfp16 (AVX2) | 28305 ms | 1.535 GB | 512 MB (-32%) | 648 MB (-27%) | 4958 ms (-9.8%) |
> | HNSWFLAT (AVX2) | 35543 ms | 1.772 GB | 756 MB | 892 MB | 5498 ms |
> | HNSWSQfp16 | 166505 ms | 1.535 GB | 512 MB (-32%) | 647 MB | 20578 ms (+54%) |
> | HNSWFLAT | 46688 ms | 1.768 GB | 756 MB | 891 MB | 9012 ms |

@luyuncheng I'm curious how you got these results. Did you run these tests directly on Faiss or through OpenSearch? Can you share more details?

@luyuncheng
Collaborator

@naveentatikonda Thanks for your reply!!

> Protocol and Index Mapping
> Index Description or Implementation
>
>             "parameters": {
>               "encoder": {
>                  "name": "SQfp16"
>               },
>               "ef_construction": 128,
>               "m": 24
>             }

LGTM for the protocol, thanks!!

@luyuncheng
Collaborator

@naveentatikonda

> I'm curious to know how you got these results. Did you run these tests directly on Faiss or through OpenSearch?

I did it on OpenSearch.

> Can you share more details?

In #1139 I added a memory test, so it can easily use faiss_wrapper to create a new Faiss index the way the OpenSearch JNI does. Once all the native tests ran OK with the JNI code, I added some encoder parameters in the OpenSearch Java code.

@naveentatikonda
Member Author

Updated the issue description with the benchmarking results on ARM; see the "Benchmarking Results on ARM" section above.


@navneet1v
Collaborator

@naveentatikonda can we run the benchmarks with 1 shard? I see that all the experiments were done with 8 or 24 shards. Also, can you state the number of shards for the results you posted?

@naveentatikonda
Member Author

> @naveentatikonda can we run the benchmarks with 1 shard? I see that all the experiments were done with 8 or 24 shards. Also, can you state the number of shards for the results you posted?

@navneet1v The test results posted above were run with 24 shards (and 8 shards for some datasets) and 0 replicas. I'm now rerunning the benchmarks with 1, 8, and 24 shards (to compare against the existing results) using the updated SQ changes in PR facebookresearch/faiss#3141.

@jmazanec15
Member

> last-fm

With dimension 65, I'm not sure the SIMD code path will be hit. Could you run one sift test with IP and see what the metric diffs are?

Ref:

  1. https://github.com/facebookresearch/faiss/blob/main/faiss/impl/ScalarQuantizer.cpp#L1164
  2. https://github.com/facebookresearch/faiss/blob/main/faiss/impl/ScalarQuantizer.cpp#L1190
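In other words (a hedged reading of the kernels linked above): the specialized SIMD similarity/decode paths are compiled for dimensions that are a multiple of 8, and other dimensions fall back to the scalar implementation. A sketch of the selection rule:

```python
def hits_simd_path(dimension: int) -> bool:
    # Simplified reading of the dispatch in ScalarQuantizer.cpp: specialized
    # SIMD kernels are selected only when the dimension is a multiple of 8.
    return dimension % 8 == 0

print(hits_simd_path(65))   # False -> last-fm (d=65) misses the SIMD path
print(hits_simd_path(128))  # True  -> sift-128 uses it
```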

@jmazanec15
Member

Also @naveentatikonda, can you add the tracking Faiss issue to the description, along with the branch you are developing your changes on?

@naveentatikonda
Member Author

> last-fm
>
> With dimension 65, I'm not sure the SIMD code path will be hit. Could you run one sift test with IP and see what the metric diffs are?
>
> Ref:
>
>   1. https://github.com/facebookresearch/faiss/blob/main/faiss/impl/ScalarQuantizer.cpp#L1164
>   2. https://github.com/facebookresearch/faiss/blob/main/faiss/impl/ScalarQuantizer.cpp#L1190

Good catch, Jack. It won't hit the SIMD logic since the dimension isn't a multiple of 8. I'll run a sift dataset to validate the query latencies.

@naveentatikonda
Member Author

naveentatikonda commented Dec 6, 2023

Benchmarking results on x86 architecture with AVX2 using the new logic in SQ

These are the benchmarking results on x86 architecture with AVX2 using the new logic in SQ (with inline macros and other changes using SHUFFLE) included in PR facebookresearch/faiss#3141.

Recall and Storage Results

Note - We see lower recall below for faiss-hnsw compared with faiss-hnsw sqfp16 because the hyperparameter efConstruction used for sqfp16 was 512, whereas for faiss-hnsw it was 256. A higher efConstruction value leads to a more accurate graph and better recall.

| S. No. | Dataset | Dimension | Primary Shards | Train Data Size | Query Data Size | Training Data Range | Query Data Range | Space Type | faiss-hnsw Recall | faiss-hnsw sqfp16 Recall | faiss-hnsw Index Size | faiss-hnsw sqfp16 Index Size | % Reduction in Storage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | 1 | 1,000,000 | 1,000 | [0.0, 1.4772] | [0.0, 0.7285] | L2 | 0.93751 | 0.95368 | 16.6 GB | 14.8 GB | 10.84% |
| 2 | gist-960-euclidean | 960 | 24 | 1,000,000 | 1,000 | [0.0, 1.4772] | [0.0, 0.7285] | L2 | 0.99779 | 0.99831 | 16.6 GB | 14.8 GB | 10.84% |
| 3 | sift-128-euclidean | 128 | 1 | 1,000,000 | 10,000 | [0.0, 218.0] | [0.0, 184.0] | L2 | 0.99826 | 0.99867 | 1.4 GB | 1.1 GB | 21.4% |
| 4 | sift-128-euclidean | 128 | 24 | 1,000,000 | 10,000 | [0.0, 218.0] | [0.0, 184.0] | L2 | 0.99989 | 0.99989 | 1.4 GB | 1.1 GB | 21.4% |
| 5 | mnist-784-euclidean | 784 | 1 | 60,000 | 10,000 | [0.0, 255.0] | [0.0, 255.0] | L2 | 0.99987 | 0.99993 | 400.1 MB | 310.4 MB | 22.43% |
| 6 | mnist-784-euclidean | 784 | 24 | 60,000 | 10,000 | [0.0, 255.0] | [0.0, 255.0] | L2 | 1 | 1 | 400.1 MB | 310.4 MB | 22.43% |

Indexing and Querying Results

| S. No. | Dataset | Dimension | Primary Shards | Space Type | faiss-hnsw AVX2 Indexing Throughput (docs/s) | faiss sqfp16 AVX2 Indexing Throughput (docs/s) | hnsw AVX2 p50 (ms) | sqfp16 AVX2 p50 (ms) | hnsw AVX2 p90 (ms) | sqfp16 AVX2 p90 (ms) | hnsw AVX2 p99 (ms) | sqfp16 AVX2 p99 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | 1 | L2 | 683 | 686 | 34 | 35.7 | 35 | 36.7 | 37 | 39.7 |
| 2 | gist-960-euclidean | 960 | 24 | L2 | 1035 | 1007 | 24.5 | 25.1 | 28 | 28.3 | 30.6 | 30.5 |
| 3 | sift-128-euclidean | 128 | 1 | L2 | 4722 | 3269 | 7.1 | 7 | 8 | 8 | 8.5 | 8.1 |
| 4 | sift-128-euclidean | 128 | 24 | L2 | 13128 | 12690 | 8 | 7.2 | 9 | 8.2 | 9.9 | 9.1 |
| 5 | mnist-784-euclidean | 784 | 1 | L2 | 2921 | 2767 | 12.1 | 12 | 13 | 13 | 13.2 | 13.1 |
| 6 | mnist-784-euclidean | 784 | 24 | L2 | 5759 | 5796 | 8 | 8 | 9 | 8 | 10 | 9.1 |

Also, the old benchmarking results are posted below for comparison:

Indexing and Querying Results (Old Results)

| S. No. | Dataset | Dimension | Space Type | Primary Shards | faiss hnsw AVX2 Indexing Throughput (docs/s) | faiss hnsw sqfp16 AVX2 Indexing Throughput (docs/s) | hnsw AVX2 p50 (ms) | sqfp16 AVX2 p50 (ms) | hnsw AVX2 p90 (ms) | sqfp16 AVX2 p90 (ms) | hnsw AVX2 p99 (ms) | sqfp16 AVX2 p99 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | L2 | 24 | 1016 | 1059 | 26 | 25.6 | 29.3 | 28.9 | 32 | 31.1 |
| 2 | sift-128-euclidean | 128 | L2 | 24 | 13178 | 13664 | 9.1 | 9 | 10.4 | 10 | 11.1 | 11 |
| 3 | mnist-784-euclidean | 784 | L2 | 24 | 5711 | 5734 | 10 | 9 | 11 | 10 | 11.3 | 10.3 |
| 4 | fashion-mnist-784-euclidean | 784 | L2 | 24 | 5325 | 5296 | 9 | 8 | 9.1 | 9 | 10.1 | 9.6 |

Observations

  • Query latencies improved slightly, by 1 to 2 ms, compared with the old benchmarking results.

cc: @navneet1v @jmazanec15

@navneet1v
Collaborator

Thanks @naveentatikonda. Seems like the new changes helped reduce the latency further.

@jmazanec15
Member

This looks good to me. Let's go ahead and start trying to get this change into Faiss! Nice work!

@naveentatikonda
Member Author

Created a draft PR to the Faiss repo; please take a look.
facebookresearch/faiss#3166

cc: @vamshin @navneet1v @jmazanec15

@naveentatikonda naveentatikonda changed the title [RFC] Faiss Scalar Quantization FP16 (SQfp16) [RFC] Faiss Scalar Quantization FP16 (SQfp16) and enabling SIMD (AVX2 and NEON) Feb 5, 2024
@naveentatikonda naveentatikonda moved this from 2.12.0 to 2.13.0 in Vector Search RoadMap Feb 6, 2024
@naveentatikonda
Member Author

> Benchmarking results on x86 architecture with AVX2 using the new logic in SQ
>
> These are the benchmarking results on x86 architecture with AVX2 using the new logic in SQ (with inline macros and other changes using SHUFFLE) included in PR facebookresearch/faiss#3141.
>
> Recall and Storage Results
>
> Note - We see lower recall for faiss-hnsw compared with faiss-hnsw sqfp16 because the hyperparameter efConstruction used for sqfp16 was 512, whereas for faiss-hnsw it was 256. A higher efConstruction value leads to a more accurate graph and better recall.

As mentioned in the note, we saw lower recall for faiss-hnsw due to the efConstruction hyperparameter. The updated recall values below use the same value (512) for both. These results also include a metrics comparison for the cohere-wiki-simple-embeddings-768-l2 dataset.

| S. No. | Dataset | Dimension | Primary Shards | Train Data Size | Query Data Size | Training Data Range | Query Data Range | Space Type | faiss-hnsw Recall | faiss-hnsw sqfp16 Recall | faiss-hnsw Index Size | faiss-hnsw sqfp16 Index Size | % Reduction in Storage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gist-960-euclidean | 960 | 1 | 1,000,000 | 1,000 | [0.0, 1.4772] | [0.0, 0.7285] | L2 | 0.95378 | 0.95368 | 16.6 GB | 14.8 GB | 10.84% |
| 2 | gist-960-euclidean | 960 | 24 | 1,000,000 | 1,000 | [0.0, 1.4772] | [0.0, 0.7285] | L2 | 0.99863 | 0.99831 | 16.6 GB | 14.8 GB | 10.84% |
| 3 | sift-128-euclidean | 128 | 1 | 1,000,000 | 10,000 | [0.0, 218.0] | [0.0, 184.0] | L2 | 0.9987 | 0.99867 | 1.4 GB | 1.1 GB | 21.4% |
| 4 | sift-128-euclidean | 128 | 24 | 1,000,000 | 10,000 | [0.0, 218.0] | [0.0, 184.0] | L2 | 0.99989 | 0.99989 | 1.4 GB | 1.1 GB | 21.4% |
| 5 | mnist-784-euclidean | 784 | 1 | 60,000 | 10,000 | [0.0, 255.0] | [0.0, 255.0] | L2 | 0.99993 | 0.99993 | 400.1 MB | 310.4 MB | 22.43% |
| 6 | mnist-784-euclidean | 784 | 24 | 60,000 | 10,000 | [0.0, 255.0] | [0.0, 255.0] | L2 | 1 | 1 | 400.1 MB | 310.4 MB | 22.43% |
| 7 | cohere-wiki-simple-embeddings-768 | 768 | 1 | 475,858 | 10,000 | [-4.1561704, 5.5478516] | [-4.065383, 5.4902344] | L2 | 0.98625 | 0.98616 | 8 GB | 7.4 GB | 7.5% |
| 8 | cohere-wiki-simple-embeddings-768 | 768 | 24 | 475,858 | 10,000 | [-4.1561704, 5.5478516] | [-4.065383, 5.4902344] | L2 | 0.99641 | 0.99616 | 8 GB | 7.4 GB | 7.5% |

Indexing, Querying and Memory Results of Cohere Dataset

| S. No. | Dataset | Dimension | Space Type | Primary Shards | faiss hnsw AVX2 Indexing Throughput (docs/s) | faiss hnsw sqfp16 AVX2 Indexing Throughput (docs/s) | hnsw AVX2 p50 (ms) | sqfp16 AVX2 p50 (ms) | hnsw AVX2 p90 (ms) | sqfp16 AVX2 p90 (ms) | hnsw AVX2 p99 (ms) | sqfp16 AVX2 p99 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 1 | 782 | 792 | 33 | 33 | 34 | 34.4 | 35 | 35.8 |
| 2 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 24 | 1175 | 1190 | 21 | 20 | 23 | 22 | 24.2 | 24 |

| S. No. | Dataset | Dimension | Space Type | faiss hnsw Graph Memory (MB) | faiss-hnsw sqfp16 Graph Memory (MB) | % Reduction in Native Memory |
|---|---|---|---|---|---|---|
| 1 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 1463.1992 | 766.1436 | 47.6% |

@naveentatikonda
Member Author

Benchmarking Results Comparison Between Faiss HNSW with AVX2 and Faiss HNSW SQfp16 with AVX2 for the ms_marco-1m and cohere-768-1m-IP datasets

Recall Results

| S. No. | Dataset | Dimension | Primary Shards | Train Data Size | Query Data Size | Training Data Range | Query Data Range | Space Type | faiss-hnsw Recall | faiss-hnsw sqfp16 Recall |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ms_marco | 384 | 1 | 1,000,000 | 50,000 | [-0.31676134, 0.31685343] | [-0.2875412, 0.30342942] | L2 | 0.93597 | 0.93597 |
| 2 | ms_marco | 384 | 24 | 1,000,000 | 50,000 | [-0.31676134, 0.31685343] | [-0.2875412, 0.30342942] | L2 | 0.98998 | 0.98979 |
| 3 | cohere-768-IP | 768 | 1 | 1,000,000 | 10,000 | [-4.1073565, 5.504557] | [-4.109505, 5.4809895] | InnerProduct | 0.76391 | 0.76324 |
| 4 | cohere-768-IP | 768 | 8 | 1,000,000 | 10,000 | [-4.1073565, 5.504557] | [-4.109505, 5.4809895] | InnerProduct | 0.94299 | 0.94214 |
| 5 | cohere-768-IP | 768 | 24 | 1,000,000 | 10,000 | [-4.1073565, 5.504557] | [-4.109505, 5.4809895] | InnerProduct | 0.97981 | 0.97863 |

Indexing and Querying Results

| S. No. | Dataset | Dimension | Space Type | Primary Shards | faiss hnsw AVX2 Indexing Throughput (docs/s) | faiss hnsw sqfp16 AVX2 Indexing Throughput (docs/s) | hnsw AVX2 p50 (ms) | sqfp16 AVX2 p50 (ms) | hnsw AVX2 p90 (ms) | sqfp16 AVX2 p90 (ms) | hnsw AVX2 p99 (ms) | sqfp16 AVX2 p99 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ms_marco | 384 | L2 | 1 | 1571 | 1556 | 15 | 15 | 15 | 15 | 16.5 | 16 |
| 2 | ms_marco | 384 | L2 | 24 | 2772 | 2618 | 7 | 7 | 7 | 8 | 10.5 | 8 |
| 3 | cohere-768-IP | 768 | InnerProduct | 1 | 844 | 850 | 28 | 28 | 28 | 29 | 30.5 | 30 |
| 4 | cohere-768-IP | 768 | InnerProduct | 8 | 1271 | 1294 | 8 | 8 | 10 | 10 | 12 | 12 |
| 5 | cohere-768-IP | 768 | InnerProduct | 24 | 1437 | 1446 | 10 | 9 | 11 | 11 | 12 | 12 |

Memory Results

| S. No. | Dataset | Dimension | Space Type | faiss hnsw Graph Memory (MB) | faiss-hnsw sqfp16 Graph Memory (MB) | % Reduction in Native Memory |
|---|---|---|---|---|---|---|
| 1 | ms_marco | 384 | L2 | 1610 | 877.65 | 45.5% |
| 2 | cohere-768-IP | 768 | InnerProduct | 3074.98 | 1610 | 47.64% |

@jmazanec15
Member

@naveentatikonda those numbers look really good

@naveentatikonda
Member Author

Benchmarking Results Comparison Between Faiss IVF with AVX2 and Faiss IVF SQfp16 with AVX2 for the cohere-wiki-simple-embeddings-768, ms_marco-1m, and cohere-768-1m-IP datasets

Recall Results

| S. No. | Dataset | Dimension | Primary Shards | Train Data Size | Query Data Size | Training Data Range | Query Data Range | Space Type | nlist | nprobes | faiss-ivf Recall | faiss-ivf sqfp16 Recall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | cohere-wiki-simple-embeddings-768 | 768 | 1 | 475,858 | 10,000 | [-4.1561704, 5.5478516] | [-4.065383, 5.4902344] | L2 | 4 | 2 | 0.93446 | 0.93847 |
| 2 | cohere-wiki-simple-embeddings-768 | 768 | 1 | 475,858 | 10,000 | [-4.1561704, 5.5478516] | [-4.065383, 5.4902344] | L2 | 128 | 4 | 0.77699 | 0.7956 |
| 3 | cohere-wiki-simple-embeddings-768 | 768 | 1 | 475,858 | 10,000 | [-4.1561704, 5.5478516] | [-4.065383, 5.4902344] | L2 | 128 | 8 | 0.88069 | 0.88745 |
| 4 | cohere-wiki-simple-embeddings-768 | 768 | 1 | 475,858 | 10,000 | [-4.1561704, 5.5478516] | [-4.065383, 5.4902344] | L2 | 512 | 8 | 0.81003 | 0.80722 |
| 5 | cohere-wiki-simple-embeddings-768 | 768 | 8 | 475,858 | 10,000 | [-4.1561704, 5.5478516] | [-4.065383, 5.4902344] | L2 | 128 | 8 | 0.88069 | 0.88745 |
| 6 | cohere-wiki-simple-embeddings-768 | 768 | 24 | 475,858 | 10,000 | [-4.1561704, 5.5478516] | [-4.065383, 5.4902344] | L2 | 4 | 2 | 0.93588 | 0.91818 |
| 7 | ms_marco | 384 | 1 | 1,000,000 | 50,000 | [-0.31676134, 0.31685343] | [-0.2875412, 0.30342942] | L2 | 64 | 4 | 0.79061 | 0.77586 |
| 8 | ms_marco | 384 | 1 | 1,000,000 | 50,000 | [-0.31676134, 0.31685343] | [-0.2875412, 0.30342942] | L2 | 128 | 4 | 0.74512 | 0.74619 |
| 9 | ms_marco | 384 | 1 | 1,000,000 | 50,000 | [-0.31676134, 0.31685343] | [-0.2875412, 0.30342942] | L2 | 128 | 8 | 0.83593 | 0.83357 |
| 10 | ms_marco | 384 | 24 | 1,000,000 | 50,000 | [-0.31676134, 0.31685343] | [-0.2875412, 0.30342942] | L2 | 128 | 8 | 0.83727 | 0.83758 |
| 11 | cohere-768-IP | 768 | 1 | 1,000,000 | 10,000 | [-4.1073565, 5.504557] | [-4.109505, 5.4809895] | InnerProduct | 128 | 8 | 0.79611 | 0.79028 |
| 12 | cohere-768-IP | 768 | 8 | 1,000,000 | 10,000 | [-4.1073565, 5.504557] | [-4.109505, 5.4809895] | InnerProduct | 128 | 8 | 0.78425 | 0.79671 |
| 13 | cohere-768-IP | 768 | 24 | 1,000,000 | 10,000 | [-4.1073565, 5.504557] | [-4.109505, 5.4809895] | InnerProduct | 128 | 8 | 0.80341 | 0.80161 |

Indexing and Querying Results

| S. No. | Dataset | Dimension | Space Type | Primary Shards | nlist | nprobes | faiss ivf AVX2 Indexing Throughput (docs/s) | faiss ivf sqfp16 AVX2 Indexing Throughput (docs/s) | ivf AVX2 p50 (ms) | sqfp16 AVX2 p50 (ms) | ivf AVX2 p90 (ms) | sqfp16 AVX2 p90 (ms) | ivf AVX2 p99 (ms) | sqfp16 AVX2 p99 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 1 | 4 | 2 | 676 | 701 | 155.5 | 113 | 175.5 | 125 | 188.5 | 131.5 |
| 2 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 1 | 128 | 4 | 692 | 494 | 42 | 36 | 50 | 40 | 54 | 43 |
| 3 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 1 | 128 | 8 | 691 | 493 | 52 | 40 | 62 | 45 | 68.5 | 49 |
| 4 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 1 | 512 | 8 | 684 | 514 | 40 | 33 | 46.5 | 37.5 | 51.5 | 39.5 |
| 5 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 8 | 128 | 8 | 962 | 1081 | 13 | 10 | 16.5 | 12 | 19 | 14 |
| 6 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 24 | 4 | 2 | 1031 | 1048 | 33.5 | 21 | 37 | 22.5 | 40 | 24 |
| 7 | ms_marco | 384 | L2 | 1 | 64 | 4 | 1000 | 1042 | 30 | 23 | 35 | 25.5 | 39 | 28 |
| 8 | ms_marco | 384 | L2 | 1 | 128 | 4 | 1014 | 1045 | 23 | 19 | 26 | 20 | 28 | 21 |
| 9 | ms_marco | 384 | L2 | 1 | 128 | 8 | 1015 | 1041 | 29.5 | 22.5 | 33 | 25 | 36 | 26.5 |
| 10 | ms_marco | 384 | L2 | 24 | 128 | 8 | 1981 | 2266 | 6 | 5 | 7 | 6 | 8 | 6 |
| 11 | cohere-768-IP | 768 | InnerProduct | 1 | 128 | 8 | 599 | 680 | 72 | 52.5 | 97.5 | 64.5 | 113.5 | 74.5 |
| 12 | cohere-768-IP | 768 | InnerProduct | 8 | 128 | 8 | 969 | 1096 | 19 | 13 | 25 | 16 | 29 | 19 |
| 13 | cohere-768-IP | 768 | InnerProduct | 24 | 128 | 8 | 1036 | 1182 | 16 | 11 | 23 | 14 | 26 | 17 |

Memory Results

| S. No. | Dataset | Dimension | Space Type | faiss ivf Graph Memory (MB) | faiss-ivf sqfp16 Graph Memory (MB) | % Reduction in Native Memory |
|---|---|---|---|---|---|---|
| 1 | cohere-wiki-simple-embeddings-768 | 768 | L2 | 1408.02 | 710.96 | 49.51% |
| 2 | ms_marco | 384 | L2 | 1492.27 | 759.85 | 49.08% |
| 3 | cohere-768-IP | 768 | InnerProduct | 2961.62 | 1496.77 | 49.46% |
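For context, a hedged sketch of the method block behind the IVF + SQfp16 runs above (parameter names follow the k-NN method spec; the values mirror row 13: nlist=128, nprobes=8, inner product). Note that, unlike HNSW, Faiss IVF requires a training step through the k-NN model-training APIs before vectors can be indexed, so this fragment alone is not a complete mapping:

```python
# Illustration only: the IVF method definition with the SQfp16 encoder.
ivf_sq_method = {
    "name": "ivf",
    "space_type": "innerproduct",
    "engine": "faiss",
    "parameters": {
        "nlist": 128,
        "nprobes": 8,
        "encoder": {
            "name": "sq",
            "parameters": {"type": "fp16"},
        },
    },
}
```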

@github-project-automation github-project-automation bot moved this from 2.13.0 to ✅ Done in Vector Search RoadMap Mar 20, 2024