Update TileDB benchmarks #2

Merged: jparismorgan merged 3 commits into npapa/tiledb from jparismorgan/tiledb-benchmarks on May 13, 2024

@jparismorgan jparismorgan commented Apr 5, 2024

What

Updates the TileDB benchmarks.

Results

sift-128-euclidean_10_euclidean-batch

How to run yourself

  • Launch Amazon Linux x86 r6i.16xlarge EC2 instance.
  • SSH in
  • Install things:
sudo yum update -y

sudo yum install git -y

(maybe - check if you already have this) sudo yum install python3.11 -y
sudo yum install python3.11-pip -y
If you get mismatched Python versions:
	sudo ln -sf /usr/bin/python3.11 /usr/bin/python
	sudo ln -sf /usr/bin/pip3.11 /usr/bin/pip
	sudo ln -sf /usr/bin/python3.11 /usr/bin/python3
	sudo ln -sf /usr/bin/pip3.11 /usr/bin/pip3

(maybe - check if you already have this) sudo yum install docker -y
sudo service docker start
sudo usermod -a -G docker ec2-user
(Log out and log back in so the new group membership takes effect)
(Run this and make sure you see "docker") groups

mkdir repo
cd repo
git clone https://github.com/TileDB-Inc/ann-benchmarks.git
cd ann-benchmarks
git checkout npapa/tiledb
pip3 install -r requirements.txt
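
Before moving on, the manual checks above (Python version, Docker installed, `docker` showing up in `groups`) can be sketched as a small preflight script. This is only an illustration and not part of the repo: the function names are hypothetical, and the 3.11 minimum is assumed from the install steps above.

```python
import grp
import os
import shutil
import sys


def python_ok(version_info=sys.version_info, minimum=(3, 11)):
    """True if the interpreter is at least `minimum` (assumed from the steps above)."""
    return tuple(version_info[:2]) >= minimum


def docker_on_path():
    """True if a `docker` executable can be found on PATH."""
    return shutil.which("docker") is not None


def in_group(name="docker"):
    """True if the current process belongs to the named Unix group."""
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:
        # The group does not exist on this machine.
        return False
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    print("python >= 3.11:", python_ok())
    print("docker on PATH:", docker_on_path())
    print("in docker group:", in_group())
```

If `in docker group` is False right after running `usermod`, logging out and back in is the likely fix.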

Then to run TileDB:

  • python install.py --algorithm tiledb
  • pip3 install requests==2.28.1
  • python run.py --dataset sift-128-euclidean --algorithm tiledb-ivf-flat --force --batch
    • Note that only --batch mode works well currently. Query mode runs, but its performance is quite bad and we still need to investigate it.
  • sudo chmod -R 777 results/sift-128-euclidean/10/tiledb-ivf-flat
    • Fixes a permissions error in results/sift-128-euclidean/10/tiledb-ivf-flat
  • python create_website.py
  • Download sift-128-euclidean_10_euclidean-batch.html and sift-128-euclidean_10_euclidean-batch.png
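
For context on the --batch note above: in ann-benchmarks, batch mode hands the whole query set to the algorithm's `batch_query` in a single call, while the default mode calls `query` once per vector. The toy class below illustrates that interface shape only. It is a brute-force stand-in with a hypothetical name, not the TileDB wrapper, and it says nothing about why TileDB's query mode is slow.

```python
import math


class ToyBruteForce:
    """Hypothetical stand-in for an ann-benchmarks algorithm wrapper."""

    def fit(self, X):
        # Keep a plain copy of the training vectors.
        self.data = [list(v) for v in X]

    def query(self, v, n):
        # Query mode: the harness calls this once per query vector.
        order = sorted(range(len(self.data)),
                       key=lambda i: math.dist(self.data[i], v))
        return order[:n]

    def batch_query(self, X, n):
        # Batch mode: the harness passes every query vector in one call,
        # so the implementation is free to process them together.
        self.res = [self.query(v, n) for v in X]

    def get_batch_results(self):
        return self.res


algo = ToyBruteForce()
algo.fit([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
algo.batch_query([[0.0, 0.0], [5.0, 5.0]], 1)
print(algo.get_batch_results())
```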

Then to run all algorithms:

  • python install.py --algorithm tiledb
  • (note: this takes a while, so you can leave it running overnight) python run.py --dataset sift-128-euclidean --batch
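
If you want to script the two run.py invocations above instead of typing them, a minimal sketch follows. It uses only the flags already shown in this description (--dataset, --algorithm, --force, --batch). The helper names are hypothetical, and it assumes you run it from the ann-benchmarks checkout.

```python
import subprocess
import sys


def run_py_args(dataset, algorithm=None, batch=True, force=False):
    """Build the argv for run.py; algorithm=None means run every algorithm."""
    args = [sys.executable, "run.py", "--dataset", dataset]
    if algorithm is not None:
        args += ["--algorithm", algorithm]
    if force:
        args.append("--force")
    if batch:
        args.append("--batch")
    return args


def run_benchmark(**kwargs):
    # Must be called from the root of the ann-benchmarks checkout.
    # check=True raises CalledProcessError if run.py exits non-zero.
    subprocess.run(run_py_args(**kwargs), check=True)


# TileDB only:     run_benchmark(dataset="sift-128-euclidean",
#                                algorithm="tiledb-ivf-flat", force=True)
# All algorithms:  run_benchmark(dataset="sift-128-euclidean")
```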

@NikolaosPapailiou NikolaosPapailiou left a comment

Thanks for updating this!

@@ -47,15 +66,18 @@ def fit(self, X):
elif X.dtype == "float32":

We can update this to use the numpy arrays directly for ingestion

Author

Thanks, added this as a TODO and will fix this the next time I set up an EC2 instance and run this. Doing this b/c it seems nice to get this PR merged versus leaving in flux until then.

index_uri=array_uri,
source_uri=source_uri,
source_type=source_type,
size=X.shape[0],

I think we can just use the defaults for most params in ingestion

Author

Thanks, added this as a TODO and will fix this the next time I set up an EC2 instance and run this. Doing this b/c it seems nice to get this PR merged versus leaving in flux until then.

input_vectors_per_work_item=100000000,
mode=Mode.LOCAL
)
# memory_budget=-1 will load the data into main memory.
self.index = IVFFlatIndex(uri=array_uri, dtype=X.dtype, memory_budget=-1)

dtype, memory_budget args could be removed as this is the default behavior.

Author

Thanks, added this as a TODO and will fix this the next time I set up an EC2 instance and run this. Doing this b/c it seems nice to get this PR merged versus leaving in flux until then.
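
For whoever picks up the three TODOs, here is a rough sketch of what the simplified flow might look like. It is untested against this branch and rests on several assumptions: that the installed tiledb.vector_search `ingest` accepts an `input_vectors` NumPy array, that `index_type="IVF_FLAT"` is the right value, and that the import paths below match the installed version. `build_index`, `ingest_fn`, and `index_cls` are hypothetical names; the last two exist only so the flow can be exercised with stubs.

```python
def build_index(vectors, array_uri, ingest_fn=None, index_cls=None):
    """Ingest `vectors` into `array_uri` and open the resulting index."""
    if ingest_fn is None or index_cls is None:
        # Assumed import paths; verify against the installed version.
        from tiledb.vector_search import IVFFlatIndex
        from tiledb.vector_search.ingestion import ingest
        ingest_fn = ingest_fn or ingest
        index_cls = index_cls or IVFFlatIndex

    # Pass the in-memory array directly instead of writing it to a file and
    # pointing source_uri/source_type at it. All other ingestion parameters
    # are left at their defaults, as suggested in the review.
    ingest_fn(index_type="IVF_FLAT", index_uri=array_uri, input_vectors=vectors)

    # Per the review, dtype and memory_budget=-1 are the default behavior,
    # so they are omitted here.
    return index_cls(uri=array_uri)
```

Inside `fit(self, X)` this would reduce to something like `self.index = build_index(X, array_uri)`.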

@jparismorgan jparismorgan merged commit 6e35ef6 into npapa/tiledb May 13, 2024
@jparismorgan jparismorgan deleted the jparismorgan/tiledb-benchmarks branch May 13, 2024 20:24