Update TileDB benchmarks #2
Thanks for updating this!
@@ -47,15 +66,18 @@ def fit(self, X):
elif X.dtype == "float32": |
We can update this to ingest directly from the numpy arrays.
Thanks, added this as a TODO and will fix this the next time I set up an EC2 instance and run this. Doing this b/c it seems nice to get this PR merged versus leaving in flux until then.
index_uri=array_uri,
source_uri=source_uri,
source_type=source_type,
size=X.shape[0], |
I think we can just use the defaults for most of the ingestion params.
Thanks, added this as a TODO and will fix this the next time I set up an EC2 instance and run this. Doing this b/c it seems nice to get this PR merged versus leaving in flux until then.
input_vectors_per_work_item=100000000,
mode=Mode.LOCAL
)
# memory_budget=-1 will load the data into main memory.
self.index = IVFFlatIndex(uri=array_uri, dtype=X.dtype, memory_budget=-1) |
The dtype and memory_budget args could be removed as this is the default behavior.
Thanks, added this as a TODO and will fix this the next time I set up an EC2 instance and run this. Doing this b/c it seems nice to get this PR merged versus leaving in flux until then.
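For reference, the three review suggestions above (ingest from the numpy array directly, rely on ingestion defaults, drop the dtype and memory_budget args) could look roughly like the sketch below. This is not the code in this PR. The `input_vectors` and `index_type` keywords are assumptions about the TileDB vector search ingestion API, and `fit_with_defaults`, `ingest_fn` and `index_cls` are hypothetical names, injected so the sketch runs without TileDB installed.

```python
from typing import Any, Callable


def fit_with_defaults(
    X: Any,
    array_uri: str,
    ingest_fn: Callable[..., Any],
    index_cls: Callable[..., Any],
) -> Any:
    """Ingest X from memory and open the index, leaving other params at defaults.

    ingest_fn and index_cls are hypothetical injection points; in the benchmark
    they would be the real ingestion function and the IVFFlatIndex class.
    """
    # Assumption: in-memory vectors are passed via an `input_vectors` keyword,
    # replacing the source_uri / source_type / size arguments from the diff.
    ingest_fn(index_type="IVF_FLAT", index_uri=array_uri, input_vectors=X)
    # Per the review comment, dtype and memory_budget are omitted because the
    # defaults are said to give the same behavior.
    return index_cls(uri=array_uri)
```

Whether the defaults really match the explicit values used today should be checked against the installed library version before the TODO is closed.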
What
Updates the TileDB benchmarks.
Results
How to run yourself
Then to run TileDB:
python install.py --algorithm tiledb
pip3 install requests==2.28.1
python run.py --dataset sift-128-euclidean --algorithm tiledb-ivf-flat --force --batch
--batch mode works well currently, but we still need to investigate query mode. It will run, but our performance is quite bad.
sudo chmod -R 777 results/sift-128-euclidean/10/tiledb-ivf-flat
results/sift-128-euclidean/10/tiledb-ivf-flat
python create_website.py
sift-128-euclidean_10_euclidean-batch.html and sift-128-euclidean_10_euclidean-batch.png
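The TileDB steps above could also be collected in a small driver script, sketched here. The commands are copied from the steps as listed. `tiledb_benchmark_commands` and `run_all` are hypothetical helper names, and the `results/<dataset>/<count>/<algorithm>` layout is an assumption inferred from the single path shown above.

```python
import subprocess
from typing import List


def tiledb_benchmark_commands(
    dataset: str = "sift-128-euclidean",
    algorithm: str = "tiledb-ivf-flat",
    count: int = 10,
) -> List[List[str]]:
    """Return the TileDB benchmark steps as argv lists, in the order listed."""
    # Assumed layout: results/<dataset>/<count>/<algorithm>
    results_dir = f"results/{dataset}/{count}/{algorithm}"
    return [
        ["python", "install.py", "--algorithm", "tiledb"],
        ["pip3", "install", "requests==2.28.1"],
        ["python", "run.py", "--dataset", dataset,
         "--algorithm", algorithm, "--force", "--batch"],
        ["sudo", "chmod", "-R", "777", results_dir],
        ["python", "create_website.py"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step
            subprocess.run(cmd, check=True)
```

Calling `run_all(tiledb_benchmark_commands())` only prints the commands. Pass `dry_run=False` from the repository root to execute them.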
Then to run all algorithms:
python install.py --algorithm tiledb
python run.py --dataset sift-128-euclidean --batch