README.md
+131-44
@@ -16,84 +16,130 @@ pgvecto.rs is a Postgres extension that provides vector similarity search functi
16
16
- 🦀 **Rewrite in Rust**: Rewriting in Rust offers benefits such as improved memory safety, better performance, and reduced **maintenance costs** over time.
17
17
- 🙋 **Community**: People love Rust. We are happy to help you with any questions you may have. You can join our [Discord](https://discord.gg/KqswhpVgdU) to get in touch with us.
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "vectors"'
40
53
```
54
+
41
55
You need to restart your PostgreSQL server for the changes to take effect, for example with `systemctl restart postgresql.service`.
56
+
42
57
</details>
43
58
59
+
<details>
60
+
<summary>Install from release</summary>
61
+
62
+
Download the deb package from the release page, and run `sudo apt install vectors-pg15-*.deb` to install it.
63
+
64
+
</details>
65
+
66
+
Configure PostgreSQL by modifying `shared_preload_libraries` to include `vectors.so`.
67
+
68
+
```sh
69
+
psql -U postgres -c 'ALTER SYSTEM SET shared_preload_libraries = "vectors.so"'
70
+
```
71
+
72
+
You need to restart the PostgreSQL cluster.
73
+
74
+
```sh
75
+
sudo systemctl restart postgresql.service
76
+
```
44
77
45
-
## Install the extension in postgres
78
+
Connect to the database and enable the extension.
46
79
47
80
```sql
48
-
-- install the extension
49
81
DROP EXTENSION IF EXISTS vectors;
50
82
CREATE EXTENSION vectors;
51
-
-- check the extension related functions
52
-
\df+
53
83
```
54
84
55
-
## Get started with pgvecto.rs
85
+
## Get started
86
+
87
+
pgvecto.rs allows columns of a table to be defined as vectors.
88
+
89
+
The data type `vector(n)` denotes an n-dimensional vector. The `n` within the brackets signifies the dimensions of the vector. For instance, `vector(1000)` would represent a vector with 1000 dimensions, so you could create a table like this.
90
+
91
+
```sql
92
+
-- create table with a vector column
93
+
94
+
CREATE TABLE items (
95
+
id bigserial PRIMARY KEY,
96
+
embedding vector(3) NOT NULL
97
+
);
98
+
```
99
+
100
+
You can then populate the table with vector data as follows.
101
+
102
+
```sql
103
+
-- insert values
104
+
105
+
INSERT INTO items (embedding)
106
+
VALUES ('[1,2,3]'), ('[4,5,6]');
107
+
```
56
108
57
-
We support three operators to calculate the distance between two vectors:
109
+
We support three operators to calculate the distance between two vectors.
58
110
59
-
- `<->`: square Euclidean distance
60
-
- `<#>`: negative dot product distance
61
-
- `<=>`: negative square cosine distance
111
+
- `<->`: squared Euclidean distance, defined as $\Sigma (x_i - y_i) ^ 2$.
112
+
- `<#>`: negative dot product distance, defined as $- \Sigma x_iy_i$.
113
+
- `<=>`: negative squared cosine distance, defined as $- \frac{(\Sigma x_iy_i)^2}{\Sigma x_i^2 \Sigma y_i^2}$.
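
As a worked check of these three definitions, the sketch below evaluates them in plain Python for the vectors `[1, 2, 3]` and `[3, 2, 1]`. The function names are illustrative only and are not part of pgvecto.rs; the code mirrors the formulas, not the extension's implementation.

```python
# Plain-Python versions of the three distance definitions.
# The function names are hypothetical, chosen for this illustration.

def squared_euclidean(x, y):
    """Sum of squared component differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def negative_dot_product(x, y):
    """Negated sum of component products."""
    return -sum(a * b for a, b in zip(x, y))

def negative_squared_cosine(x, y):
    """Negated squared dot product divided by the product of squared norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x_sq = sum(a * a for a in x)
    norm_y_sq = sum(b * b for b in y)
    return -(dot * dot) / (norm_x_sq * norm_y_sq)

x, y = [1, 2, 3], [3, 2, 1]
print(squared_euclidean(x, y))        # 8
print(negative_dot_product(x, y))     # -10
print(negative_squared_cosine(x, y))  # -100/196, roughly -0.51
```

All three values shrink as the vectors become more similar, which is why ascending order ranks the closest vectors first.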
62
114
63
115
```sql
64
116
-- call the distance function through operators
65
117
66
-
-- square Euclidean distance
118
+
-- squared Euclidean distance
67
119
SELECT '[1, 2, 3]' <-> '[3, 2, 1]';
68
-
-- dot product distance
120
+
-- negative dot product distance
69
121
SELECT '[1, 2, 3]' <#> '[3, 2, 1]';
70
-
-- cosine distance
122
+
-- negative squared cosine distance
71
123
SELECT '[1, 2, 3]' <=> '[3, 2, 1]';
72
124
```
73
125
74
-
Note that, "square Euclidean distance" is defined as $ \Sigma (x_i - y_i) ^ 2 $, "negative dot product distance" is defined as $ - \Sigma x_iy_i $, and "negative square cosine distance" is defined as $ - \frac{(\Sigma x_iy_i)^2}{\Sigma x_i^2 \Sigma y_i^2} $, so that you can use `ORDER BY` to perform a KNN search directly without a `DESC` keyword.
75
-
76
-
### Create a table
77
-
78
-
You could use the `CREATE TABLE` statement to create a table with a vector column.
126
+
You can search for a vector simply like this.
79
127
80
128
```sql
81
-
-- create table
82
-
CREATE TABLE items (id bigserial PRIMARY KEY, emb vector(3));
83
-
-- insert values
84
-
INSERT INTO items (emb) VALUES ('[1,2,3]'), ('[4,5,6]');
85
129
-- query the similar embeddings
86
-
SELECT * FROM items ORDER BY emb <-> '[3,2,1]' LIMIT 5;
130
+
SELECT * FROM items ORDER BY embedding <-> '[3,2,1]' LIMIT 5;
87
131
-- query the neighbors within a certain distance
88
-
SELECT * FROM items WHERE emb <-> '[3,2,1]' < 5;
132
+
SELECT * FROM items WHERE embedding <-> '[3,2,1]' < 5;
89
133
```
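
To make the semantics of these two queries concrete, here is a minimal brute-force sketch in Python. It assumes a fresh `items` table holding only the two rows `[1,2,3]` and `[4,5,6]` with ids 1 and 2; the helpers `knn` and `within` are hypothetical names for this illustration and say nothing about how pgvecto.rs executes the queries internally.

```python
def squared_euclidean(x, y):
    """The distance computed by the <-> operator, per its definition."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Assumed table contents: (id, embedding) for the two example rows.
items = [(1, [1, 2, 3]), (2, [4, 5, 6])]
query = [3, 2, 1]

def knn(rows, q, k):
    """Equivalent of: ORDER BY embedding <-> q LIMIT k."""
    return sorted(rows, key=lambda row: squared_euclidean(row[1], q))[:k]

def within(rows, q, radius):
    """Equivalent of: WHERE embedding <-> q < radius."""
    return [row for row in rows if squared_euclidean(row[1], q) < radius]

print(knn(items, query, 5))     # [(1, [1, 2, 3]), (2, [4, 5, 6])]
print(within(items, query, 5))  # []
```

The squared distances from the query are 8 and 35, so the first query returns both rows in that order, while the second returns nothing because neither distance is below 5.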
90
134
91
-
### Create an index
135
+
### Indexing
92
136
93
-
You can create an index, using HNSW algorithm and square Euclidean distance with the following SQL.
137
+
You can create an index using squared Euclidean distance with the following SQL.
94
138
95
139
```sql
96
-
CREATE INDEX ON train USING vectors (emb l2_ops)
140
+
-- Using HNSW algorithm.
141
+
142
+
CREATE INDEX ON items USING vectors (embedding l2_ops)
97
143
WITH (options = $$
98
144
capacity = 2097152
99
145
size_ram = 4294967296
@@ -103,12 +149,10 @@ storage = "ram"
103
149
m = 32
104
150
ef = 256
105
151
$$);
106
-
```
107
152
108
-
Or using IVFFlat algorithm.
153
+
-- Or using the IVFFlat algorithm.
109
154
110
-
```sql
111
-
CREATE INDEX ON train USING vectors (emb l2_ops)
155
+
CREATE INDEX ON items USING vectors (embedding l2_ops)
112
156
WITH (options = $$
113
157
capacity = 2097152
114
158
size_ram = 2147483648
@@ -120,22 +164,56 @@ nprobe = 10
120
164
$$);
121
165
```
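
The numeric options above are easier to read once converted. The sketch below is a hypothetical sanity check, not part of pgvecto.rs: it assumes `size_ram` is expressed in bytes and applies the documented rule that `capacity` should be greater than the number of rows in the table.

```python
GIB = 1024 ** 3  # bytes per gibibyte

def summarize_options(capacity, size_ram, row_count):
    """Hypothetical helper: check capacity and report size_ram in GiB.

    Assumes size_ram is a byte count, which the README does not state.
    """
    if capacity <= row_count:
        raise ValueError("capacity should be greater than the number of rows")
    return {"capacity": capacity, "size_ram_gib": size_ram / GIB}

# Values from the HNSW and IVFFlat examples, with the two-row items table.
print(summarize_options(2097152, 4294967296, row_count=2))
# {'capacity': 2097152, 'size_ram_gib': 4.0}
print(summarize_options(2097152, 2147483648, row_count=2))
# {'capacity': 2097152, 'size_ram_gib': 2.0}
```

Under that assumption the two examples reserve 4 GiB and 2 GiB respectively, and a capacity of 2097152 (2^21) leaves ample headroom for a small table.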
122
166
123
-
The index must be built on a vector column. Failure to match the actual vector dimension with the dimension type modifier may result in an unsuccessful index building.
167
+
Now you can perform a KNN search with the following SQL.
124
168
125
-
The operator class determines the type of distance measurement to be used. At present, `l2_ops`, `dot_ops`, and `cosine_ops` are supported.
169
+
```sql
170
+
SELECT *, embedding <-> '[0, 0, 0]' AS score
171
+
FROM items
172
+
ORDER BY embedding <-> '[0, 0, 0]' LIMIT 10;
173
+
```
126
174
127
-
You can specify the indexing and the vectors to be stored in the disk by setting `storage_vectors = "disk"`, and `storage = "disk"`. On this condition, `size_disk` must be specified.
175
+
Please note, vector indexes are not loaded by default when PostgreSQL restarts. To load or unload the index, you can use `vectors_load` and `vectors_unload`.
128
176
129
-
Now you can perform a KNN search with the following SQL simply.
177
+
```sql
178
+
-- get the index name
179
+
\d items
130
180
131
-
```SQL
132
-
SELECT *, emb <-> '[0, 0, 0, 0]' AS score FROM items ORDER BY embedding <-> '[0, 0, 0, 0]' LIMIT 10;
| capacity | integer | The index's capacity. The value should be greater than the number of rows in your table. |
206
+
| size_ram | integer | (Optional) The maximum amount of memory the persistent part of the index can occupy. |
207
+
| size_disk | integer | (Optional) The maximum size of the disk-backed memory-mapped file that the persistent part of the index can occupy. |
208
+
| storage_vectors | string | `ram` ensures that the vectors always stay in memory while `disk` suggests otherwise. |
209
+
| algorithm.ivf | table | If this table is set, the IVF algorithm will be used for the index. |
210
+
| algorithm.ivf.storage | string | (Optional) `ram` ensures that the persistent part of the algorithm always stays in memory while `disk` suggests otherwise. |
211
+
| algorithm.ivf.nlist | integer | (Optional) Number of cluster units. |
212
+
| algorithm.ivf.nprobe | integer | (Optional) Number of units to query. |
213
+
| algorithm.hnsw | table | If this table is set, the HNSW algorithm will be used for the index. |
214
+
| algorithm.hnsw.storage | string | (Optional) `ram` ensures that the persistent part of the algorithm always stays in memory while `disk` suggests otherwise. |
215
+
| algorithm.hnsw.m | integer | (Optional) Maximum degree of the node. |