这是本节的多页打印视图。点击此处打印.

快速开始

介绍如何快速开始使用 Proxima Bilin Engine，包括 Proxima Bilin Engine 的安装、使用样例等。

1: 安装指南
2: 使用样例

1 - 安装指南

ProximaBE 具备极简依赖、快速部署、多平台支持等特点。其目前支持主流的 RedHat 系列 Linux 发行版本，包括 RHEL、Centos、taobao7u 等，支持 x86/x64 和 aarch64 等体系架构。

1. Docker 镜像安装

1.1. 安装前提

我们推荐安装 1.12.6 以上版本 Docker。用户可以在安装镜像前确认自己的 Docker 版本以及状态

$ sudo docker info

1.2. 拉取Docker镜像

$ docker pull ghcr.io/proximabilin/proxima-be

1.3. 启动容器

1.3.1. 快速启动容器

使用容器内置的配置文件来启动服务，便于快速上手。但是容器销毁时数据也会丢失，生产环境部署请用映射文件启动的模式。

$ sudo docker run -d --name proxima_be -p 16000:16000 -p 16001:16001 \
      ghcr.io/proximabilin/proxima-be

NOTE: 低版本 docker 需要在上述命令中增加--net=host选项

1.3.2. 映射文件启动容器

通过将本机目录映射到容器，在容器删除时数据文件仍然可用。

$ mkdir -p $HOME/proxima-be/{conf,data,log}
$ vim /$HOME/proxima-be/conf/proxima_be.conf # 镜像中有默认配置 /var/lib/proxima-be/conf/proxima_be.conf

执行下面的命令启动容器

$ sudo docker run -d --name proxima_be -p 16000:16000 -p 16001:16001 \
      -v $HOME/proxima-be/conf:/var/lib/proxima-be/conf \
      -v $HOME/proxima-be/data:/var/lib/proxima-be/data \
      -v $HOME/proxima-be/log:/var/lib/proxima-be/log \
      ghcr.io/proximabilin/proxima-be

1.3.3. 其他容器操作

停止服务

$ docker stop proxima_be

WARNING: kill 容器或者强制删除容器有可能导致数据丢失，慎用下面的命令。
$ docker kill proxima_be
$ docker rm -f proxima_be

2 - 使用样例

在上文安装指南中，我们详细介绍了 ProximaBE 的安装和启动方法，接下来我们将使用一个样例来介绍 ProximaBE 的服务使用方法。在样例程序中，我们将使用两种示例代码演示创建一个名为 “Plants” 的集合，并尝试插入一批数据，然后查询比对的过程。

1. Http 示例

示例中我们选用 curl 工具来演示 http 协议的访问 ProximaBE 过程，用户也可以使用其它的 http 发送工具来替代。

1.1. 创建Collection

下列示例脚本将创建一个名为 Plants 的集合，有一个名为 “ImageVector” 的向量索引列，以及 “Price” 和 “Description” 两个正排列。

其中索引列的具体承载数据类型为 8 维 float 类型的向量，索引类型为向量图索引。

$ curl -X POST http://127.0.0.1:16001/v1/collection/Plants \
       -d '{"collection_name":"Plants",
          	"forward_column_names":["Price", "Description"],
          	"index_column_params":[{
            	"column_name":"ImageVector",
            	"index_type":"IT_PROXIMA_GRAPH_INDEX",
            	"data_type":"DT_VECTOR_FP32",
            	"dimension": 8}]
           }'

Exec output:

{"code":0,"reason":"Success"}

1.2 查看Collection

创建成功的集合，我们可以直接使用访问查看。

$ curl -X GET http://127.0.0.1:16001/v1/collection/Plants

Exec output:

{"status":{"code":0,"reason":"Success"},"collection":{"uuid":"8e5b79bbdc198cc2dccdde0710f86f45","config":
{"collection_name":"Plants","index_column_params":
[{"data_type":"DT_VECTOR_FP32","dimension":8,"index_type":"IT_PROXIMA_GRAPH_INDEX","column_name":"ImageVector","
extra_params":[]}],"forward_column_names":
["Price","Description"],"max_docs_per_segment":"18446744073709551615"},"status":"CS_SERVING","magic_number":"0"}
}

1.3. 写入文档

下面示例中向 Plants 集合插入了 2 条数据(受限于展示，用户可以自由扩展 rows 字段)，注意rows中的字段值顺序应与 row_meta 中的描述保持一致。

$ curl -X POST http://127.0.0.1:16001/v1/collection/Plants/index \
       -d '{"collection_name":"Plants",
            "row_meta": {
              "forward_column_names":["Price", "Description"],
              "index_column_metas": [{
                "column_name":"ImageVector",
                "data_type":"DT_VECTOR_FP32",
                "dimension":8}]
            },
            "rows":[
              {
                "primary_key":0,
                "operation_type":"OP_INSERT",
                "forward_column_values":{
                  "values":[{"float_value":0.1}, {"string_value":"ginkgo tree with number 0"}]},
                "index_column_values":{
                  "values":[{"string_value":"[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]"}]}
              },
              {
                "primary_key":1,
                "operation_type":"OP_INSERT",
                "forward_column_values":{
                  "values":[{"float_value":1.1}, {"string_value":"ginkgo tree with number 1"}]},
                "index_column_values":{
                  "values":[{"string_value":"[1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]"}]}
              }
            ]
          }'

Exec output:

{"code":0,"reason":"Success"}

1.4. 查询文档

下面示例代码中展示了我们Knn 查找top1图像的过程，结果应该准确命中第一条文档，用户可以自由设置topk。

$ curl -X POST http://127.0.0.1:16001/v1/collection/Plants/query \
       -d '{"collection_name":"Plants",
            "query_type":"QT_KNN",
            "knn_param": {
              "column_name":"ImageVector",
              "topk":1,
              "matrix": "[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]",
              "batch_count":1,
              "dimension":8,
              "data_type":"DT_VECTOR_FP32"}
           }'

Exec output:

{"status":{"code":0,"reason":"Success"},"results":[{"documents":
[{"score":0,"primary_key":"0","forward_column_values":[{"key":"Price","value":{"float_value":0.1}},
{"key":"Description","value":{"string_value":"ginkgo tree with number
0"}}]}]}],"debug_info":"","latency_us":"636"}

1.5 Collection统计

经过一段时间累积，我们希望能够获取某个集合的详细信息，则可以通过下列示例访问:

$ curl -X GET http://127.0.0.1:16001/v1/collection/Plants/stats

Exec output:

{"status":{"code":0,"reason":"Success"},"collection_stats":{"segment_stats":
[{"state":"SS_WRITING","max_lsn":"0","min_lsn":"0","doc_count":"2","max_doc_id":"1","min_doc_id":"0","segment_id
":0,"segment_path":"","max_timestamp":"0","min_timestamp":"0","index_file_size":"2142208","max_primary_key":"1",
"min_primary_key":"0","index_file_count":"2"}],"collection_name":"Plants","collection_path":"/home/x x x/c
ode/proxima-
se/build/test_dir/Plants","total_doc_count":"2","total_segment_count":"1","total_index_file_size":"22241280","to
tal_index_file_count":"6"}}

1.6 删除Collection

$ curl -X DELETE http://127.0.0.1:16001/v1/collection/Plants

Exec output:

{"code":0,"reason":"Success"}

2. Python 示例

2.1. Python SDK 安装

在安装python sdk之前请确保系统中已安装python3.6以及pip程序。

$ pip install -i https://pypi.antfin-inc.com/simple/ -U pyproximabe==0.1.2

2.2. 连接Client

在具体操作之前，我们需要创建client连接ProximaBE，当前支持http和grpc协议。Client默认为同步请求，同时我们也支持AsyncClient做异步请求。参考代码如下:

from pyproximabe import *

# Init client
client = Client('127.0.0.1', 16000)

2.3. 创建Collection

在下列代码中，我们创建了一个名为Plants的集合。其中一个索引列名为"ImageVector"，顾名思义主要是存放图片向量，以及"Price"和"Description"两个正排列。

创建Collection的参数主要是需要描述索引列的结构，比如索引类型、数据类型、维度等信息。正排列则仅描述列名即可。

# Init index column
index_column = IndexColumnParam(name='ImageVector',
                                dimension=8,
                                data_type=DataType.VECTOR_FP32,
                                index_type=IndexType.PROXIMA_GRAPH_INDEX)
# Init collection config
collection_config = CollectionConfig('Plants',
                                     index_column_params=[index_column],
                                     max_docs_per_segment=0,
                                     forward_column_names=['Price','Description'])
# Create collection
status = client.create_collection(collection_config)

# Check Return
print(status)

Exec output:

success

2.3 查看Collection

# Get collection info
status = client.describe_collection('Plants')

print(status)

Exec output:

(<pyproximabe.core.types.ProximaBeStatus object at 0x7f63b0a41ac8>, CollectionInfo{'collection_config':
CollectionConfig{'collection_name': 'Plants', 'index_column_params': [IndexColumnParam{'name': 'ImageVector',
'dimension': 8, 'index_type': <IndexType.PROXIMA_GRAPH_INDEX: 1>, 'data_type': <DataType.VECTOR_FP32: 23>,
'extra_params': {}}], 'max_docs_per_segment': 18446744073709551615, 'forward_column_names': ['Price',
'Description'], 'repository_config': None}, 'status': <Status.SERVING: 1>, 'uuid':
'dbd57611e812164b7e7b697ea0af684c', 'latest_lsn_context': LsnContext{'lsn': 0, 'context': ''}, 'magic_number':
0})

2.4. 写入文档

名为"plants"的Collection创建完成之后，我们向其中写入100条文档。

发送数据前，我们需要描述发送的数据格式RowMeta，主要供后端校验使用。至于写入模式分两种，可以支持批量，也可以支持单条发送，而写入请求类型支持OperationType.INSERT，OperationType.DELETE，OperationType.UPDATE 三种。

# Set record data format
index_column_meta = WriteRequest.IndexColumnMeta(name='ImageVector',
                                                 data_type=DataType.VECTOR_FP32,
                                                 dimension=8)
row_meta = WriteRequest.RowMeta(index_column_metas=[index_column_meta],
                                forward_column_names=['Price','Description'],
                                forward_column_types=[DataType.FLOAT, DataType.STRING])
# Send 100 records
rows = []
for i in range(0, 100):
  vector = [i+0.1, i+0.2, i+0.3, i+0.4, i+0.5, i+0.6, i+0.7, i+0.8]
  price = i + 0.1
  description = "ginkgo tree with number " + str(i)
  row = WriteRequest.Row(primary_key=i,
                         operation_type=WriteRequest.OperationType.INSERT,
                         index_column_values=[vector],
                         forward_column_values=[price, description])
  rows.append(row)

write_request = WriteRequest(collection_name='Plants',
                             rows=rows,
                             row_meta = row_meta)
status = client.write(write_request)
print(status)

Exec Output:

success

2.5. 查询文档

插入完100条文档到Plants之后，我们挑选5条查询看看，是否能够准确返回对应的图片。

# Query
query_vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
status, knn_res = client.query(collection_name='Plants',
                               column_name='ImageVector',
                               features=query_vector,
                               data_type=DataType.VECTOR_FP32,
                               topk = 5)
print(status)
print(knn_res)

Exec ouput:

success

QueryResponse{'results': [[Document{'primary_key': 0, 'score': 0.0, 'forward_column_values': {'Price':
0.10000000149011612, 'Description': 'ginkgo tree with number 0'}}, Document{'primary_key': 1, 'score': 8.0,
'forward_column_values': {'Price': 1.100000023841858, 'Description': 'ginkgo tree with number 1'}},
Document{'primary_key': 2, 'score': 32.0, 'forward_column_values': {'Price': 2.0999999046325684, 'Description':
'ginkgo tree with number 2'}}, Document{'primary_key': 3, 'score': 72.0, 'forward_column_values': {'Price':
3.0999999046325684, 'Description': 'ginkgo tree with number 3'}}, Document{'primary_key': 4, 'score': 128.0,
'forward_column_values': {'Price': 4.099999904632568, 'Description': 'ginkgo tree with number 4'}}]],
'debug_info': '', 'latency_us': 269}

2.6. Collection统计

我们也可以通过统计接口查看Collection的内部数据和状态。

# Get collection statistics
status, collection_stats = client.stats_collection('Plants')

print(status)
print(collection_stats)

Exec ouput:

success
CollectionStats{'collection_name': 'Plants', 'collection_path': '/home/xxx/code/proxima-
be/build/test_dir/Plants', 'total_doc_count': 100, 'total_segment_count': 1, 'total_index_file_count': 6,
'total_index_file_size': 22241280, 'segment_stats': [SegmentStats{'segment_id': 0, 'state':
<SegmentState.WRITING: 1>, 'doc_count': 100, 'index_file_count': 2, 'index_file_size': 2142208, 'min_doc_id': 0,
'max_doc_id': 99, 'min_primary_key': 0, 'max_primary_key': 99, 'min_timestamp': 0, 'max_timestamp': 0, 'min_lsn': 0, 'max_lsn': 0, 'segment_path': ''}]}

2.7. 删除Collection

实验完关于"plants"的一切之后，我们希望能够删除Collection。

# Drop collection
status = client.drop_collection('Plants')

print(status)

Exec output:

success