1. Introduction to PGVector

PGVector is an extension for PostgreSQL that adds vector storage and vector querying to the database. It supports:

- Exact and approximate nearest-neighbor search
- Single-precision, half-precision, binary, and sparse vectors
- L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
- ACID transactions, point-in-time recovery, JOINs, and all the other features of Postgres

2. Installing PGVector
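
2.1 Installing PostgreSQL

PGVector is a PostgreSQL extension, so PostgreSQL must be installed before PGVector can be used (Postgres 12 and later are supported). For detailed installation steps, see the article "PostgreSQL Basic Operations".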
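
2.2 Installing PGVector

    # 1. Download the source
    git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git
    # 2. Enter the source directory
    cd pgvector
    # 3. Compile and install
    make
    make install

2.3 Enabling PGVector

Log in to the PostgreSQL database and run the following command to enable PGVector:

    CREATE EXTENSION IF NOT EXISTS vector;

3. Everyday Use of PGVector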
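
3.1 Storing data

Create a vector column:

    -- Create the vector column together with the table
    CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
    -- Add a vector column to an existing table
    ALTER TABLE items ADD COLUMN embedding vector(3);

Insert vectors (vector literals are written as quoted strings):

    INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

Update a vector:

    UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;

Delete a vector:

    DELETE FROM items WHERE id = 1;

3.2 Querying data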
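
Distance functions

    Operator   Function                        Distance
    <->        l2_distance                     Length of the vector obtained by subtracting one vector from the other
    <#>        vector_negative_inner_product   Negative of the inner product of the two vectors
    <=>        cosine_distance                 1 minus the cosine of the angle between the two vectors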
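
To make the three distance functions listed above concrete, here is a minimal pure-Python sketch of the arithmetic they describe. It does not call PGVector; the function names simply mirror the names in the list, and the sample vectors are the ones used in this article's SQL examples.

```python
import math


def l2_distance(a, b):
    # Length of the difference vector a - b.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    # Inner product with the sign flipped, so smaller means "more similar".
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 minus the cosine of the angle between a and b.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


stored = [1, 2, 3]   # a row's embedding
query = [3, 1, 2]    # the query vector

print(l2_distance(stored, query))             # sqrt(6), about 2.449
print(negative_inner_product(stored, query))  # -11
print(cosine_distance(stored, query))         # 1 - 11/14, about 0.214
```

All three values shrink as the vectors become more similar, which is why each can be used directly in an ascending ORDER BY.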

Get the nearest neighbors to a vector:

    SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

Get the nearest neighbors to a row:

    SELECT * FROM items WHERE id != 1
    ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;

Get rows within a certain distance:

    SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;

Get the distance:

    SELECT embedding <-> '[3,1,2]' AS distance FROM items;

For inner product, multiply by -1 (since <#> returns the negative inner product):

    SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;

For cosine similarity, use 1 - cosine distance:

    SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;

Average vectors:

    SELECT AVG(embedding) FROM items;

Average groups of vectors (this assumes the table also has a category_id column):

    SELECT category_id, AVG(embedding) FROM items GROUP BY category_id;

3.3 HNSW index

An HNSW index builds a multilayer graph. In terms of the speed-recall tradeoff its query performance is better than IVFFlat, but it is slower to build and uses more memory. Because it has no training step like IVFFlat does, the index can be created while the table is still empty.

Supported types are:

- vector - up to 2,000 dimensions
- halfvec - up to 4,000 dimensions (added in 0.7.0)
- bit - up to 64,000 dimensions (added in 0.7.0)
- sparsevec - up to 1,000 non-zero elements (added in 0.7.0)

L2 distance:

    CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

Inner product:

    CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);

Cosine distance:

    CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

L1 distance (added in 0.7.0):

    CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);

Hamming distance (added in 0.7.0):

    CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);

Jaccard distance (added in 0.7.0):

    CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);

3.4 IVFFlat index

An IVFFlat index divides the vectors into lists and then searches only the subset of lists closest to the query vector. It builds faster than HNSW and uses less memory, but its query performance is lower in terms of the speed-recall tradeoff.

Supported types are:

- vector - up to 2,000 dimensions
- halfvec - up to 4,000 dimensions (added in 0.7.0)
- bit - up to 64,000 dimensions (added in 0.7.0)

L2 distance:

    CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

Inner product:

    CREATE INDEX ON items USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);

Cosine distance:

    CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

Hamming distance (added in 0.7.0):

    CREATE INDEX ON items USING ivfflat (embedding bit_hamming_ops) WITH (lists = 100);
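
The IVFFlat idea described above (divide the vectors into lists, then search only the lists closest to the query) can be illustrated with a small pure-Python sketch. This is a toy model, not PGVector's implementation: the two centroids are hard-coded here instead of being learned in a training step, and the helper names `build_lists` and `search` and the `probes` argument are hypothetical names chosen for this example.

```python
import math


def l2(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_lists(vectors, centroids):
    # Put each vector into the list of its nearest centroid.
    lists = [[] for _ in centroids]
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        lists[nearest].append(v)
    return lists


def search(query, centroids, lists, probes=1, k=1):
    # Scan only the `probes` lists whose centroids are closest to the query,
    # then rank the vectors found there by their distance to the query.
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in lists[i]]
    return sorted(candidates, key=lambda v: l2(query, v))[:k]


# Assumed toy data: two fixed centroids and four 2-dimensional vectors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[1.0, 1.0], [2.0, 0.0], [9.0, 9.0], [4.9, 4.9]]
lists = build_lists(vectors, centroids)

query = [5.5, 5.5]
print(search(query, centroids, lists, probes=1))  # [[9.0, 9.0]]
print(search(query, centroids, lists, probes=2))  # [[4.9, 4.9]]
```

With one list probed, the search misses the true nearest neighbor [4.9, 4.9] because that vector was assigned to the other list; probing both lists finds it at the cost of scanning more vectors. This is the speed-recall tradeoff mentioned in the description.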