Update TileDB benchmarks #2

jparismorgan · 2024-04-05T08:09:53Z

What

Updates the TileDB benchmarks.

Results

How to run yourself

Launch an Amazon Linux x86 r6i.16xlarge EC2 instance.
SSH in.
Install dependencies:

sudo yum update -y

sudo yum install git -y

(maybe - check if you already have this) sudo yum install python3.11 -y
sudo yum install python3.11-pip -y
If you get mismatched Python versions, point the default commands at 3.11:
	sudo ln -sf /usr/bin/python3.11 /usr/bin/python
	sudo ln -sf /usr/bin/pip3.11 /usr/bin/pip
	sudo ln -sf /usr/bin/python3.11 /usr/bin/python3
	sudo ln -sf /usr/bin/pip3.11 /usr/bin/pip3
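One way to confirm the symlinks above took effect is to compare the Python version each command reports. This is a minimal sketch, not part of the PR: the helper names are hypothetical, and it assumes `python --version` and `pip --version` print in their usual formats ("Python 3.11.x" and "pip ... (python 3.11)").

```python
import re
import shutil
import subprocess


def parse_python_version(text):
    """Extract (major, minor) from `python --version` or `pip --version` output."""
    match = re.search(r"[Pp]ython (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def reported_versions(commands=("python", "python3", "pip", "pip3")):
    """Map each command name to the Python version it reports, or None."""
    versions = {}
    for name in commands:
        path = shutil.which(name)
        if path is None:
            versions[name] = None  # command is not on PATH
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        # Older interpreters print the version on stderr, so check both streams.
        versions[name] = parse_python_version(result.stdout + result.stderr)
    return versions


if __name__ == "__main__":
    found = reported_versions()
    for name, version in found.items():
        print(f"{name}: {version}")
    if len(set(found.values())) > 1:
        print("Versions differ; the symlinks may need fixing.")
```

If all four commands report the same version, the mismatch is resolved.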

(maybe - check if you already have this) sudo yum install docker -y
sudo service docker start
sudo usermod -a -G docker ec2-user
(Log out and log back in so the group change takes effect)
(Run this and make sure you see "docker") groups
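The same group check can be done from Python, which is handy if you script the setup. This is a minimal sketch using only the standard library (`os.getgroups` and the Unix-only `grp` module); the function names are hypothetical.

```python
import grp
import os


def current_group_names():
    """Return the names of the groups the current process belongs to."""
    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A group id with no entry in the group database: keep the number.
            names.append(str(gid))
    return names


def in_group(group, names):
    """True if `group` appears in the list of group names."""
    return group in names


if __name__ == "__main__":
    names = current_group_names()
    print("groups:", " ".join(names))
    if not in_group("docker", names):
        print("Not in the docker group yet; log out and back in, then retry.")
```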

mkdir repo
cd repo
git clone https://github.com/TileDB-Inc/ann-benchmarks.git
cd ann-benchmarks
git checkout npapa/tiledb
pip3 install -r requirements.txt

Then to run TileDB:

python install.py --algorithm tiledb
pip3 install requests==2.28.1
- Fix for: urllib3 v2 incompatibility docker/docker-py#3113 (comment)
python run.py --dataset sift-128-euclidean --algorithm tiledb-ivf-flat --force --batch
- Note that only --batch mode works well currently. Query mode will run, but our performance in it is quite bad and we still need to investigate why.
sudo chmod -R 777 results/sift-128-euclidean/10/tiledb-ivf-flat
- Fix for a permissions error in results/sift-128-euclidean/10/tiledb-ivf-flat
python create_website.py
Download sift-128-euclidean_10_euclidean-batch.html and sift-128-euclidean_10_euclidean-batch.png
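The TileDB steps above could be chained in a small driver script. This is a sketch only, not part of the PR: the commands are copied from the steps above, while the function names, the `count` parameter (taken from the `10` segment of the results path), and the dry-run default are assumptions made for illustration.

```python
import subprocess


def tiledb_benchmark_commands(dataset="sift-128-euclidean",
                              algorithm="tiledb-ivf-flat",
                              count=10):
    """Build the command list for the TileDB batch benchmark run."""
    results_dir = f"results/{dataset}/{count}/{algorithm}"
    return [
        ["python", "install.py", "--algorithm", "tiledb"],
        # Pin requests to work around the urllib3 v2 / docker-py incompatibility.
        ["pip3", "install", "requests==2.28.1"],
        ["python", "run.py", "--dataset", dataset,
         "--algorithm", algorithm, "--force", "--batch"],
        # Work around the permissions error on the results directory.
        ["sudo", "chmod", "-R", "777", results_dir],
        ["python", "create_website.py"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(tiledb_benchmark_commands())
```

Run it from the `ann-benchmarks` checkout and pass `dry_run=False` to `run_all` to actually execute the steps.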

Then to run all algorithms:

python install.py --algorithm tiledb
(note: this will take a while; you can leave it running overnight) python run.py --dataset sift-128-euclidean --batch

NikolaosPapailiou

Thanks for updating this!

NikolaosPapailiou · 2024-04-05T09:20:27Z

ann_benchmarks/algorithms/tiledb/module.py

@@ -47,15 +66,18 @@ def fit(self, X):
        elif X.dtype == "float32":


We can update this to use the numpy arrays directly for ingestion

jparismorgan · 2024-05-13

Thanks, added this as a TODO; I will fix it the next time I set up an EC2 instance and run this. I'm doing it this way because it seems better to get this PR merged than to leave it in flux until then.
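For reference, the suggested change might look roughly like the sketch below. It is not the code in this PR. It assumes the ingestion function accepts in-memory vectors through an `input_vectors` keyword and an `index_type` of `"IVF_FLAT"`; both should be checked against the installed TileDB-Vector-Search version. The ingestion function is passed in as `ingest_fn` so the sketch runs without the library, and `fake_ingest` is a hypothetical stand-in.

```python
def ingest_in_memory(X, index_uri, ingest_fn, index_type="IVF_FLAT"):
    """Ingest vectors that are already in memory.

    No source file is written first, so no source_uri/source_type is needed.
    """
    return ingest_fn(index_type=index_type, index_uri=index_uri, input_vectors=X)


def fake_ingest(**kwargs):
    """Hypothetical stand-in for the real ingestion function: echoes its kwargs."""
    return kwargs


if __name__ == "__main__":
    vectors = [[0.0, 1.0], [1.0, 0.0]]  # would be the numpy array X in fit()
    call = ingest_in_memory(vectors, "/tmp/example_index", fake_ingest)
    print(sorted(call))
```

With the library installed, `ingest_fn` would presumably be `tiledb.vector_search.ingestion.ingest`, and `X` the array passed to `fit`.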

NikolaosPapailiou · 2024-04-05T09:21:19Z

ann_benchmarks/algorithms/tiledb/module.py

+            index_uri=array_uri,
+            source_uri=source_uri,
+            source_type=source_type,
+            size=X.shape[0],


I think we can just use the defaults for most params in ingestion

jparismorgan · 2024-05-13

Thanks, added this as a TODO; I will fix it the next time I set up an EC2 instance and run this. I'm doing it this way because it seems better to get this PR merged than to leave it in flux until then.

NikolaosPapailiou · 2024-04-05T09:22:06Z

ann_benchmarks/algorithms/tiledb/module.py

+            input_vectors_per_work_item=100000000,
+            mode=Mode.LOCAL
+        )
+        # memory_budget=-1 will load the data into main memory.
        self.index = IVFFlatIndex(uri=array_uri, dtype=X.dtype, memory_budget=-1)


dtype, memory_budget args could be removed as this is the default behavior.

jparismorgan · 2024-05-13

Thanks, added this as a TODO; I will fix it the next time I set up an EC2 instance and run this. I'm doing it this way because it seems better to get this PR merged than to leave it in flux until then.
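When working through these TODOs, one way to spot arguments that merely restate a default is to compare the call's keyword arguments against the function signature. This is a generic sketch using the standard `inspect` module. `open_index` is a hypothetical stand-in, not the real `IVFFlatIndex` signature; only `memory_budget=-1` being the default comes from the review comment above.

```python
import inspect


def drop_default_kwargs(func, kwargs):
    """Return a copy of kwargs without entries equal to func's declared defaults."""
    params = inspect.signature(func).parameters
    kept = {}
    for name, value in kwargs.items():
        param = params.get(name)
        has_default = param is not None and param.default is not inspect.Parameter.empty
        if has_default and param.default == value:
            continue  # redundant: same as the declared default
        kept[name] = value
    return kept


def open_index(uri, dtype=None, memory_budget=-1):
    """Hypothetical stand-in for an index constructor."""
    return uri, dtype, memory_budget


if __name__ == "__main__":
    explicit = {"uri": "/tmp/index", "memory_budget": -1}
    print(drop_default_kwargs(open_index, explicit))  # {'uri': '/tmp/index'}
```

This only catches arguments that literally equal a declared default. A value such as `dtype=X.dtype`, which the library may infer on its own, still has to be reviewed by hand.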

jparismorgan added 2 commits April 5, 2024 10:01

fix benchmarks

d4069c2

cleanup code

2065290

jparismorgan requested review from ihnorton, dudoslav and NikolaosPapailiou April 5, 2024 08:20

NikolaosPapailiou approved these changes Apr 5, 2024

add todos

0502021

jparismorgan merged commit 6e35ef6 into npapa/tiledb May 13, 2024

jparismorgan deleted the jparismorgan/tiledb-benchmarks branch May 13, 2024 20:24
