Add Memory Evaluation For different algorithm #1139

luyuncheng · 2023-09-15T09:57:15Z

Description

When we introduce a new algorithm or engine, we want to evaluate its performance, memory usage, and disk size. In benchmark tests, we can evaluate performance on a single node.

But we cannot evaluate how much memory an engine/algorithm takes, because in a benchmark the JVM makes it hard to measure the memory used in the JNI layer alone.

In #946 we tried to assess memory usage with different algorithms, but when using benchmark/integration tests, Java heap memory and other memory usage make it hard to evaluate the algorithm's real memory usage.

So I wrote memory tests that use only faiss_wrapper/nmslib_wrapper. They can evaluate memory usage and file size.
I use the vector file format from http://corpus-texmex.irisa.fr/, and added test_util::load_data to read sift.fvecs from the SIFT1M dataset.
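For readers unfamiliar with the .fvecs layout, here is a minimal sketch of a reader. It is not the PR's test_util::load_data; the names FvecsData and read_fvecs are hypothetical and only illustrate the format, in which each vector is stored as a 4-byte integer dimension d followed by d 4-byte floats. The sketch assumes a little-endian host and that all vectors in a file share the same dimension.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for illustration; not part of the PR.
struct FvecsData {
    std::vector<float> values;  // row-major: num * dim floats
    unsigned num = 0;           // number of vectors read
    unsigned dim = 0;           // dimension shared by all vectors
};

// Reads a whole .fvecs file into memory.
// Record layout: int32 dimension d, then d float32 components.
// Assumes the host is little-endian, like the file.
FvecsData read_fvecs(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    FvecsData out;
    std::int32_t d = 0;
    // Each loop iteration consumes one record; the loop ends at EOF.
    while (in.read(reinterpret_cast<char*>(&d), sizeof(d))) {
        if (d <= 0) {
            throw std::runtime_error("invalid dimension in " + path);
        }
        if (out.num == 0) {
            out.dim = static_cast<unsigned>(d);
        } else if (static_cast<unsigned>(d) != out.dim) {
            throw std::runtime_error("mixed dimensions in " + path);
        }

        const std::size_t old_size = out.values.size();
        out.values.resize(old_size + static_cast<std::size_t>(d));
        const std::streamsize bytes =
            static_cast<std::streamsize>(d) * static_cast<std::streamsize>(sizeof(float));
        if (!in.read(reinterpret_cast<char*>(out.values.data() + old_size), bytes)) {
            throw std::runtime_error("truncated record in " + path);
        }
        ++out.num;
    }
    return out;
}
```

Calling read_fvecs("dataset/sift/sift_base.fvecs") on the SIFT1M base file should yield dim == 128; the resulting flat float buffer is the shape of input that index-building wrappers typically expect.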

I ran some tests; the results are reported below:

SIFT1M

Algorithm	Index RES	FileSize	Query RES
HNSW32	1.8GB	634MB	769MB
NSG64	3.1GB	586MB	752MB
NSG32	3.1GB	577MB	639MB

GIST960

Algorithm	Index RES	FileSize	Query RES
HNSW32	11.2GB	3.9GB	3944MB
NSG64	12GB	3.7GB	3923MB
NSG32	12GB	3.7GB	3806MB

Usage:

Go to http://corpus-texmex.irisa.fr/, download the SIFT1M dataset, and unzip it so that the base vectors are at a path like 'dataset/sift/sift_base.fvecs'.
Then run the tests you are interested in, selecting them with --gtest_filter, for example:

./bin/jni_memory_test --gtest_filter=FaissNSGQueryMemoryTest.*

Issues Resolved

#946

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jni/CMakeLists.txt

jni/tests/faiss_memory_test.cpp

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

Update and rename faiss_memory_test.cpp to memory_test.cpp Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

jni/CMakeLists.txt

Update CMake file Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

navneet1v · 2023-09-21T05:30:35Z

@luyuncheng can you add details around what is Index RES and Query RES?

navneet1v · 2023-09-21T05:33:38Z

jni/tests/memory_test.cpp

+using ::testing::Return;
+#define GTEST_COUT std::cerr << "[          ] [ INFO ]"
+
+TEST(FaissHNSWIndexMemoryTest, BasicAssertions) {


Can we add comments on top of every test function describing what is being tested and what the expected outcome is?

Also, I want to understand a little more about the failure scenarios for these tests. Maybe you explained this in older comments; if so, can you point me there?

@navneet1v

These tests only evaluate memory usage through the engine wrappers; they are not unit tests.

When we introduce a new algorithm or engine, we want to evaluate its performance, memory usage, and disk size. In benchmark tests, we can evaluate performance on a single node.

But we cannot evaluate how much memory an engine/algorithm takes, because in a benchmark the JVM makes it hard to measure the memory used in the JNI layer alone.

So I added this code to evaluate different algorithms/engines with different parameters, measuring memory usage and elapsed time at index time and at query time.

luyuncheng · 2023-09-21T07:41:18Z

@luyuncheng can you add details around what is Index RES and Query RES?

@navneet1v
Index RES: building the graph index takes a long time, so I use a monitor to check the average resident memory while the index is being built.
Query RES: I query 1000 vectors sequentially and use a monitor to check the average resident memory while the queries run.
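The comment does not say which monitor was used, so the following is only a sketch of one way "average resident memory" could be sampled, assuming Linux and its /proc/self/status file. The names parse_vm_rss_kb, current_rss_kb, and RssAverager are hypothetical and do not come from the PR.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Extracts the VmRSS value (in kB) from a /proc/<pid>/status style stream.
// Returns -1 if there is no VmRSS line or its value cannot be parsed.
long parse_vm_rss_kb(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        // Lines look like "VmRSS:\t    1234 kB".
        if (line.rfind("VmRSS:", 0) == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb) {
                return kb;
            }
            return -1;
        }
    }
    return -1;
}

// Linux-only assumption: reads the resident set size of the current process.
long current_rss_kb() {
    std::ifstream status("/proc/self/status");
    return parse_vm_rss_kb(status);
}

// Accumulates RSS samples and reports their arithmetic mean.
class RssAverager {
public:
    // Negative values (failed reads) are ignored.
    void add_sample(long kb) {
        if (kb >= 0) {
            total_kb_ += kb;
            ++count_;
        }
    }

    double average_kb() const {
        return count_ == 0 ? 0.0 : static_cast<double>(total_kb_) / static_cast<double>(count_);
    }

private:
    long long total_kb_ = 0;
    std::size_t count_ = 0;
};
```

A caller would invoke add_sample(current_rss_kb()) at a fixed interval while the index build or the query loop runs, then read average_kb() at the end. Sampling from a separate process instead would keep the measurement code out of the tests.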

jmazanec15 · 2023-09-26T22:12:09Z

In general, I really like having the ability to use the jni_wrapper in order to test our code with real data sets (not just random data). This has a lot of potential to help us debug memory problems as well as performance problems.

That being said, I think that the memory monitoring should be done outside of the tests. Adding memory monitoring inside the test may make them difficult to work across platforms. I see the tests themselves more like JNI integration tests or end to end tests or microbenchmarks. We should remove all calls to get specific memory information from faiss and change the name from memory_test to integ_test or e2e_test or microbenchmarks. Instead, to check memory, I think that they can be run with an external monitor. For instance, I believe you used gperftools at some point.

jmazanec15 · 2023-09-26T22:03:03Z

jni/tests/test_util.h

@@ -150,6 +150,16 @@ namespace test_util {

    float RandomFloat(float min, float max);

+    // Read vector file formats


Add comment about the format the data is expected to be in

jmazanec15 · 2023-09-26T22:03:40Z

jni/tests/test_util.h

+    // Read vector file formats
+    void load_data(char* filename, float*& data, unsigned& num, unsigned& dim);
+
+    // asign data into vector


nit: asign -> assign. Also, can we add more detail about how this function should be used in the comment?

navneet1v · 2024-01-31T06:06:33Z

@luyuncheng are you still working on this PR?

luyuncheng requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei and martin-gaievski as code owners September 15, 2023 09:57

luyuncheng force-pushed the UnitTestMemory branch from bfc211b to b3e1fff Compare September 15, 2023 10:01

luyuncheng mentioned this pull request Sep 15, 2023

[Feature] Introduce New NSG Graph into KNN for faiss Engine #946

Open

luyuncheng changed the title ~~Add Memory Tests For different algorightm~~ Add Memory Evaluation For different algorightm Sep 15, 2023

jmazanec15 reviewed Sep 18, 2023


luyuncheng added 2 commits September 19, 2023 15:01

Add Memory Unit Tests

85b37c1

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

Update and rename faiss_memory_test.cpp to memory_test.cpp (#2)

6c43fa9

Update and rename faiss_memory_test.cpp to memory_test.cpp Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

luyuncheng force-pushed the UnitTestMemory branch from 897fa07 to 6c43fa9 Compare September 19, 2023 07:01

jmazanec15 reviewed Sep 19, 2023


luyuncheng added 2 commits September 20, 2023 14:46

Update CMakeLists.txt

5cf07c6

Update CMake file Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

Update memory_test.cpp

9d41972

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

navneet1v reviewed Sep 21, 2023


luyuncheng changed the title ~~Add Memory Evaluation For different algorightm~~ Add Memory Evaluation For different algorithm Sep 21, 2023

luyuncheng requested a review from jmazanec15 September 26, 2023 11:23

jmazanec15 reviewed Sep 26, 2023


luyuncheng mentioned this pull request Nov 20, 2023

[RFC] Faiss Scalar Quantization FP16 (SQfp16) and enabling SIMD (AVX2 and NEON) #1138

Closed

jmazanec15 closed this Oct 2, 2024
