Integrating SBQ #72

Merged 44 commits on May 22, 2024
Commits
8955ced
Refactor to isolate quantization code
cevian Dec 13, 2023
041db69
Optimize: don't reread node for neighbor list
cevian Dec 14, 2023
0daa77b
Optimization: memory allocations
cevian Dec 14, 2023
dc39a0f
Optimization: optimize lsr with separate storage
cevian Dec 14, 2023
6e6ccd1
preallocate vec capacity
cevian Dec 11, 2023
fe8a516
optimize pq load
cevian Dec 14, 2023
8f001d9
Make changes to distance functions
cevian Dec 15, 2023
2ff844a
Make cosine distance always positive
cevian Dec 18, 2023
30e8077
Switch to using euclidean distance for PQ always
cevian Dec 19, 2023
08df297
Optimize distance calc for PQ
cevian Dec 20, 2023
3e793a8
Initial bq implementation
cevian Dec 21, 2023
5c30fcf
Big refactor to break out storage properly and fix architecture
cevian Jan 4, 2024
ca385d5
Optimize lsr
cevian Jan 23, 2024
db9ad1d
add first xor benchmarks
cevian Jan 23, 2024
3af296c
write optimized xor func
cevian Jan 24, 2024
f9cb577
Change bq to use u64
cevian Jan 24, 2024
df4ca43
cleanup
cevian Jan 24, 2024
48da6db
Building in quantized distances instead of full distances
cevian Feb 13, 2024
f3dc51b
basic resort
cevian Mar 26, 2024
54e8000
cleanup+pq fix
cevian Mar 26, 2024
7cd2310
Make meta page backwards compatible from v1
cevian Mar 27, 2024
5089a95
Switch to versioned .so and support upgrades
cevian Mar 29, 2024
6e205b5
Adjust index options
cevian Apr 4, 2024
ca569b5
Remove PQ
cevian Apr 4, 2024
b98d741
Meta page cleanup
cevian Apr 4, 2024
904d192
change resort->rescore
cevian Apr 4, 2024
f7d72df
Support matryoshka embeddings with the num_dimensions option
cevian Apr 5, 2024
3ff4dcc
Add update test
cevian Apr 9, 2024
24fecf1
Bug fix: fix locking of buffer in the resort case
cevian Apr 22, 2024
3ae4b36
Optimize prune by using the cache in BqNodeDistanceMeasure
cevian Apr 19, 2024
926f1d5
Optimize prune by getting rid of another read
cevian Apr 19, 2024
f61d20b
Optimize write in finalize_node_at_end_of_build to be ordered
cevian Apr 19, 2024
89c19e9
cleanup resort to use explicit parameter, not capacity
cevian Apr 23, 2024
ea74738
Make rescore parameter work off of the std dev of the distances
cevian Apr 23, 2024
5cdcd43
Bug fix for num_dimensions
cevian Apr 24, 2024
cdf0719
Implement bq_compressed
cevian Apr 24, 2024
7a4d5fd
Better progress tracking and statistics
cevian Apr 24, 2024
35f0a79
Allow using multiple bits in bq compression
cevian Apr 27, 2024
b5f5b34
change default num_bits to 2 for dim=768
cevian May 3, 2024
edb94fe
cleanup tests + nits
cevian May 7, 2024
4334c6e
Adjust debug level down during index builds
cevian May 13, 2024
d6f485e
cleanup
cevian May 15, 2024
5479b32
Remove docker infra for now
cevian May 21, 2024
54fe7fc
Rename BQ->SBQ
cevian May 22, 2024
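
Several commits above ("Initial bq implementation", "write optimized xor func", "Change bq to use u64") refer to binary quantization with an xor-based distance. The following is a minimal sketch of that general technique, not the code from this PR: the function names `quantize` and `xor_distance` and the per-dimension `thresholds` slice are assumptions made for illustration. Each dimension becomes one bit packed into `u64` words, and the distance is the Hamming distance computed with xor and popcount.

```rust
/// Quantize a vector to one bit per dimension, packed into u64 words.
/// Bit i is set when v[i] is strictly greater than thresholds[i].
/// (Hypothetical helper; the thresholding rule is an assumption.)
fn quantize(v: &[f32], thresholds: &[f32]) -> Vec<u64> {
    assert_eq!(v.len(), thresholds.len());
    // Round up so a partially filled last word still gets allocated.
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, (x, t)) in v.iter().zip(thresholds).enumerate() {
        if x > t {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two packed bit vectors:
/// xor each pair of words, then count the set bits.
fn xor_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

With all thresholds at zero, `[0.5, -1.0, 2.0, -0.1]` and `[0.4, 1.0, -2.0, -0.3]` differ in sign at two positions, so `xor_distance` on their quantized forms yields 2. Packing into `u64` words lets each xor and popcount cover 64 dimensions at once, which is presumably the motivation for the u64 change, though the PR's actual layout may differ.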
109 changes: 0 additions & 109 deletions .github/workflows/docker.yaml

This file was deleted.

105 changes: 0 additions & 105 deletions Dockerfile

This file was deleted.

25 changes: 19 additions & 6 deletions README.md
@@ -1,13 +1,14 @@
Timescale Vector

Say something chat gpt.
A vector index for speeding up ANN search in `pgvector`.

🔧 Tools Setup
Building the extension requires valid rust (we build and test on 1.65), rustfmt, and clang installs, along with the postgres headers for whichever version of postgres you are running, and pgx. We recommend installing rust using the official instructions:

Building the extension requires valid rust, rustfmt, and clang installs, along with the postgres headers for whichever version of postgres you are running, and pgx. We recommend installing rust using the official instructions:
```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
and build tools, the postgres headers, in the preferred manner for your system. You may also need to install OpenSSl. For Ubuntu you can follow the postgres install instructions then run
and build tools, the postgres headers, in the preferred manner for your system. You may also need to install OpenSSL. For Ubuntu you can follow the postgres install instructions then run

```shell
sudo apt-get install make gcc pkg-config clang postgresql-server-dev-16 libssl-dev
@@ -18,7 +19,7 @@ Next you need cargo-pgx, which can be installed with
cargo install --locked cargo-pgrx
```

You must reinstall cargo-pgx whenever you update your Rust compiler, since cargo-pgx needs to be built with the same compiler as Toolkit.
You must reinstall cargo-pgx whenever you update your Rust compiler, since cargo-pgx needs to be built with the same compiler as Timescale Vector.

Finally, setup the pgx development environment with
```shell
@@ -28,10 +29,11 @@ cargo pgrx init --pg16 pg_config
Installing from source is also available on macOS and requires the same set of prerequisites and set up commands listed above.

💾 Building and Installing the extension

Download or clone this repository, and switch to the extension subdirectory, e.g.
```shell
git clone https://github.com/timescale/timescale-vector && \
cd timescale-vector/extension
cd timescale-vector/timescale_vector
```

Then run
@@ -41,9 +43,13 @@ cargo pgrx install --release

To initialize the extension after installation, enter the following into psql:

```sql
CREATE EXTENSION timescale_vector;
```

✏️ Get Involved
The Timescale Vecotr project is still in the initial planning stage as we decide our priorities and what to implement first. As such, now is a great time to help shape the project's direction! Have a look at the list of features we're thinking of working on and feel free to comment on the features, expand the list, or hop on the Discussions forum for more in-depth discussions.

The Timescale Vector project is still in its early stages as we decide our priorities and what to implement. As such, now is a great time to help shape the project's direction! Have a look at the list of features we're thinking of working on and feel free to comment on the features, expand the list, or hop on the Discussions forum for more in-depth discussions.

🔨 Testing
See above for prerequisites and installation instructions.
@@ -52,7 +58,14 @@ You can run tests against a postgres version pg16 using
```shell
cargo pgrx test ${postgres_version}
```

To run all tests run:
```shell
cargo test -- --ignored && cargo pgrx test ${postgres_version}
```

🐯 About Timescale

TimescaleDB is a distributed time-series database built on PostgreSQL that scales to over 10 million metrics per second, supports native compression, handles high cardinality, and offers native time-series capabilities, such as data retention policies, continuous aggregate views, downsampling, data gap-filling and interpolation.

TimescaleDB also supports full SQL, a variety of data types (numerics, text, arrays, JSON, booleans), and ACID semantics. Operationally mature capabilities include high availability, streaming backups, upgrades over time, roles and permissions, and security.
25 changes: 19 additions & 6 deletions timescale_vector/Cargo.toml
@@ -1,10 +1,10 @@
[package]
name = "timescale_vector"
version = "0.0.2"
version = "0.0.3-dev"
edition = "2021"

[lib]
crate-type = ["cdylib"]
crate-type = ["cdylib", "rlib"]

[features]
default = ["pg16"]
@@ -14,9 +14,9 @@ pg_test = []

[dependencies]
memoffset = "0.9.0"
pgrx = "=0.11.1"
pgrx = "=0.11.4"
rkyv = { version="0.7.42", features=["validation"]}
simdeez = {version = "1.0"}
simdeez = {version = "1.0.8"}
reductive = { version = "0.9.0"}
ndarray = { version = "0.15.0", features = ["blas"] }
blas-src = { version = "0.8", features = ["openblas"] }
@@ -26,10 +26,14 @@ rand_chacha = "0.3"
rand_core = "0.6"
rand_xorshift = "0.3"
rayon = "1"

timescale_vector_derive = { path = "timescale_vector_derive" }
semver = "1.0.22"

[dev-dependencies]
pgrx-tests = "=0.11.1"
pgrx-tests = "=0.11.4"
pgrx-pg-config = "=0.11.4"
criterion = "0.5.1"
tempfile = "3.3.0"

[profile.dev]
panic = "unwind"
@@ -39,3 +43,12 @@ panic = "unwind"
opt-level = 3
lto = "fat"
codegen-units = 1
#debug = true

[[bench]]
name = "distance"
harness = false

[[bench]]
name = "lsr"
harness = false