From 622132ecbf6fa617e76cf960488193686e5c30d3 Mon Sep 17 00:00:00 2001 From: writinwaters <93570324+writinwaters@users.noreply.github.com> Date: Tue, 22 Oct 2024 17:10:23 +0800 Subject: [PATCH] DRAFT: Updated python and http api references (#2973) ### What problem does this PR solve? ### Type of change - [x] Documentation Update --- api/http_api_reference.md | 102 ++++++++++++++++++++---------------- api/python_api_reference.md | 90 +++++++++++++++---------------- 2 files changed, 103 insertions(+), 89 deletions(-) diff --git a/api/http_api_reference.md b/api/http_api_reference.md index 2c11fb55075..bba39fa811b 100644 --- a/api/http_api_reference.md +++ b/api/http_api_reference.md @@ -20,7 +20,7 @@ Creates a dataset. ### Request - Method: POST -- URL: `http://{address}/api/v1/dataset` +- URL: `/api/v1/dataset` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -163,7 +163,7 @@ Deletes datasets by ID. ### Request - Method: DELETE -- URL: `http://{address}/api/v1/dataset` +- URL: `/api/v1/dataset` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -219,7 +219,7 @@ Updates configurations for a specified dataset. ### Request - Method: PUT -- URL: `http://{address}/api/v1/dataset/{dataset_id}` +- URL: `/api/v1/dataset/{dataset_id}` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -243,8 +243,6 @@ curl --request PUT \ --data '{ "name": "test", "embedding_model": "BAAI/bge-zh-v1.5", - "chunk_count": 0, - "document_count": 0, "parse_method": "naive" }' ``` @@ -293,14 +291,12 @@ An error response includes a JSON object like the following: **GET** `/api/v1/dataset?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}` -Lists all datasets????? - -Retrieves a list of datasets. +Lists datasets. 
### Request - Method: GET -- URL: `http://{address}/api/v1/dataset?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}` +- URL: `/api/v1/dataset?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}` - Headers: - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -407,10 +403,10 @@ Uploads documents to a specified dataset. - Method: POST - URL: `/api/v1/dataset/{dataset_id}/document` - Headers: - - 'Content-Type: multipart/form-data' + - `'Content-Type: multipart/form-data'` - `'Authorization: Bearer {YOUR_API_KEY}'` - Form: - - 'file=@{FILE_PATH}' + - `'file=@{FILE_PATH}'` #### Request example @@ -425,9 +421,9 @@ curl --request POST \ #### Request parameters - `"dataset_id"`: (*Path parameter*) - The dataset ID. + The ID of the dataset to which the documents will be uploaded. - `"file"`: (*Body parameter*) - The file to upload. + The document???? to upload. ### Response @@ -459,25 +455,25 @@ Updates configurations for a specified document. 
### Request - Method: PUT -- URL: `http://{address}/api/v1/dataset/{dataset_id}/document/{document_id}` +- URL: `/api/v1/dataset/{dataset_id}/document/{document_id}` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` - Body: - - `name`:`string` - - `parser_method`:`string` - - `parser_config`:`dict` + - `"name"`:`string` + - `"chunk_method"`:`string` + - `"parser_config"`:`dict` #### Request example ```bash curl --request PUT \ --url http://{address}/api/v1/dataset/{dataset_id}/info/{document_id} \ - --header 'Authorization: Bearer {YOUR_ACCESS TOKEN}' \ + --header 'Authorization: Bearer {YOUR_API_KEY}' \ --header 'Content-Type: application/json' \ --data '{ "name": "manual.txt", - "parser_method": "manual", + "chunk_method": "manual", "parser_config": {"chunk_token_count": 128, "delimiter": "\n!?。;!?", "layout_recognize": true, "task_page_size": 12} }' @@ -485,8 +481,24 @@ curl --request PUT \ #### Request parameters -- `"parser_method"`: (*Body parameter*) - Method used to parse the document. +- `"name"`: (*Body parameter*), `string` +- `"chunk_method"`: (*Body parameter*), `string` + The parsing method to apply to the document. + - `"naive"`: General + - `"manual"`: Manual + - `"qa"`: Q&A + - `"table"`: Table + - `"paper"`: Paper + - `"book"`: Book + - `"laws"`: Laws + - `"presentation"`: Presentation + - `"picture"`: Picture + - `"one"`: One + - `"knowledge_graph"`: Knowledge Graph + - `"email"`: Email +- + +### Returns - `"parser_config"`: (*Body parameter*) Configuration object for the parser. @@ -525,7 +537,7 @@ Downloads a document from a specified dataset. 
### Request - Method: GET -- URL: `http://{address}/api/v1/dataset/{dataset_id}/document/{document_id}` +- URL: `/api/v1/dataset/{dataset_id}/document/{document_id}` - Headers: - `'Authorization: Bearer {YOUR_API_KEY}'` - Output: @@ -570,7 +582,7 @@ An error response includes a JSON object like the following: **GET** `/api/v1/dataset/{dataset_id}/info?offset={offset}&limit={limit}&orderby={orderby}&desc={desc}&keywords={keywords}&id={document_id}` -Retrieves a list of documents from a specified dataset. +Lists documents in a specified dataset. ### Request @@ -670,7 +682,7 @@ Deletes documents by ID. ### Request - Method: DELETE -- URL: `http://{address}/api/v1/dataset/{dataset_id}/document` +- URL: `/api/v1/dataset/{dataset_id}/document` - Headers: - `'Content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -724,7 +736,7 @@ Parses documents in a specified dataset. ### Request - Method: POST -- URL: `http://{address}/api/v1/dataset/{dataset_id}/chunk ` +- URL: `/api/v1/dataset/{dataset_id}/chunk ` - Headers: - `'content-Type: application/json'` - 'Authorization: Bearer {YOUR_API_KEY}' @@ -777,7 +789,7 @@ Stops parsing specified documents. ### Request - Method: DELETE -- URL: `http://{address}/api/v1/dataset/{dataset_id}/chunk` +- URL: `/api/v1/dataset/{dataset_id}/chunk` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -831,7 +843,7 @@ Adds a chunk to a specified document in a specified dataset. 
### Request - Method: POST -- URL: `http://{address}/api/v1/dataset/{dataset_id}/document/{document_id}/chunk` +- URL: `/api/v1/dataset/{dataset_id}/document/{document_id}/chunk` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -896,12 +908,12 @@ An error response includes a JSON object like the following: **GET** `/api/v1/dataset/{dataset_id}/document/{document_id}/chunk?keywords={keywords}&offset={offset}&limit={limit}&id={id}` -Retrieves a list of chunks from a specified document in a specified dataset. +Lists chunks in a specified document. ### Request - Method: GET -- URL: `http://{address}/api/v1/dataset/{dataset_id}/document/{document_id}/chunk?keywords={keywords}&offset={offset}&limit={limit}&id={id}` +- URL: `/api/v1/dataset/{dataset_id}/document/{document_id}/chunk?keywords={keywords}&offset={offset}&limit={limit}&id={id}` - Headers: - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -992,7 +1004,7 @@ Deletes chunks by ID. ### Request - Method: DELETE -- URL: `http://{address}/api/v1/dataset/{dataset_id}/document/{document_id}/chunk` +- URL: `/api/v1/dataset/{dataset_id}/document/{document_id}/chunk` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1046,7 +1058,7 @@ Updates content or configurations for a specified chunk. ### Request - Method: PUT -- URL: `http://{address}/api/v1/dataset/{dataset_id}/document/{document_id}/chunk/{chunk_id}` +- URL: `/api/v1/dataset/{dataset_id}/document/{document_id}/chunk/{chunk_id}` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1102,12 +1114,12 @@ An error response includes a JSON object like the following: **GET** `/api/v1/retrieval` -Retrieval test of a dataset +Retrieves chunks from specified datasets. 
### Request - Method: POST -- URL: `http://{address}/api/v1/retrieval` +- URL: `/api/v1/retrieval` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1252,7 +1264,7 @@ Creates a chat assistant. ### Request - Method: POST -- URL: `http://{address}/api/v1/chat` +- URL: `/api/v1/chat` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1486,7 +1498,7 @@ Updates configurations for a specified chat assistant. ### Request - Method: PUT -- URL: `http://{address}/api/v1/chat/{chat_id}` +- URL: `/api/v1/chat/{chat_id}` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1538,7 +1550,7 @@ Deletes chat assistants by ID. ### Request - Method: DELETE -- URL: `http://{address}/api/v1/chat` +- URL: `/api/v1/chat` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1586,16 +1598,16 @@ An error response includes a JSON object like the following: --- -## List chats (INCONSISTENT WITH THE PYTHON API) +## List chats -**GET** `/api/v1/chat?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}` +**GET** `/api/v1/chat?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={chat_name}&id={chat_id}` -Retrieves a list of chat assistants. +Lists chat assistants. ### Request - Method: GET -- URL: `http://{address}/api/v1/chat?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}` +- URL: `/api/v1/chat?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}` - Headers: - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1732,7 +1744,7 @@ Create a chat session. 
### Request - Method: POST -- URL: `http://{address}/api/v1/chat/{chat_id}/session` +- URL: `/api/v1/chat/{chat_id}/session` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1827,7 +1839,7 @@ Update a chat session ### Request - Method: PUT -- URL: `http://{address}/api/v1/chat/{chat_id}/session/{session_id}` +- URL: `/api/v1/chat/{chat_id}/session/{session_id}` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1882,7 +1894,7 @@ Lists sessions associated with a specified????????????? chat assistant. ### Request - Method: GET -- URL: `http://{address}/api/v1/chat/{chat_id}/session?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}` +- URL: `/api/v1/chat/{chat_id}/session?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}` - Headers: - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -1967,7 +1979,7 @@ Deletes sessions by ID. ### Request - Method: DELETE -- URL: `http://{address}/api/v1/chat/{chat_id}/session` +- URL: `/api/v1/chat/{chat_id}/session` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` @@ -2023,7 +2035,7 @@ Asks a question to start a conversation. 
### Request - Method: POST -- URL: `http://{address}/api/v1/chat/{chat_id}/completion` +- URL: `/api/v1/chat/{chat_id}/completion` - Headers: - `'content-Type: application/json'` - `'Authorization: Bearer {YOUR_API_KEY}'` diff --git a/api/python_api_reference.md b/api/python_api_reference.md index 96ab1ef45fb..36af91890b4 100644 --- a/api/python_api_reference.md +++ b/api/python_api_reference.md @@ -17,10 +17,9 @@ RAGFlow.create_dataset( name: str, avatar: str = "", description: str = "", + embedding_model: str = "BAAI/bge-zh-v1.5", language: str = "English", permission: str = "me", - document_count: int = 0, - chunk_count: int = 0, chunk_method: str = "naive", parser_config: DataSet.ParserConfig = None ) -> DataSet @@ -143,7 +142,7 @@ RAGFlow.list_datasets( ) -> list[DataSet] ``` -Retrieves a list of datasets. +Lists datasets. ### Parameters @@ -296,7 +295,7 @@ Updates configurations for the current document. A dictionary representing the attributes to update, with the following keys: -- `"name"`: `str` The name of the document to update. +- `"display_name"`: `str` The name of the document to update. - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document: - `"chunk_token_count"`: Defaults to `128`. - `"layout_recognize"`: Defaults to `True`. @@ -370,7 +369,7 @@ print(doc) Dataset.list_documents(id:str =None, keywords: str=None, offset: int=0, limit:int = 1024,order_by:str = "create_time", desc: bool = True) -> list[Document] ``` -Retrieves a list of documents from the current dataset. +Lists documents in the current dataset. ### Parameters @@ -388,7 +387,7 @@ The starting index for the documents to retrieve. Typically used in confunction #### limit: `int` -The maximum number of documents to retrieve. Defaults to `1024`. A value of `-1` indicates that all documents should be returned. +The maximum number of documents to retrieve. Defaults to `1024`. 
#### orderby: `str` @@ -412,7 +411,7 @@ A `Document` object contains the following attributes: - `name`: The document name. Defaults to `""`. - `thumbnail`: The thumbnail image of the document. Defaults to `None`. - `knowledgebase_id`: The dataset ID associated with the document. Defaults to `None`. -- `chunk_method` The chunk method name. Defaults to `""`. ?????naive?????? +- `chunk_method` The chunk method name. Defaults to `"naive"`. - `parser_config`: `ParserConfig` Configuration object for the parser. Defaults to `{"pages": [[1, 1000000]]}`. - `source_type`: The source type of the document. Defaults to `"local"`. - `type`: Type or category of the document. Defaults to `""`. Reserved for future use. @@ -425,7 +424,7 @@ A `Document` object contains the following attributes: - `process_begin_at`: `datetime` The start time of document processing. Defaults to `None`. - `process_duation`: `float` Duration of the processing in seconds. Defaults to `0.0`. - `run`: `str` The document's processing status: - - `"0"`: UNSTART (default) + - `"0"`: UNSTART (default) ????????? - `"1"`: RUNNING - `"2"`: CANCEL - `"3"`: DONE @@ -506,9 +505,9 @@ The IDs of the documents to parse. rag_object = RAGFlow(api_key="", base_url="http://:9380") dataset = rag_object.create_dataset(name="dataset_name") documents = [ - {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()}, - {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()}, - {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()} + {'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()}, + {'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()}, + {'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()} ] dataset.upload_documents(documents) documents = dataset.list_documents(keywords="test") @@ -546,9 +545,9 @@ The IDs of the documents for which parsing should be stopped. 
rag_object = RAGFlow(api_key="", base_url="http://:9380") dataset = rag_object.create_dataset(name="dataset_name") documents = [ - {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()}, - {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()}, - {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()} + {'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()}, + {'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()}, + {'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()} ] dataset.upload_documents(documents) documents = dataset.list_documents(keywords="test") @@ -566,7 +565,7 @@ print("Async bulk parsing cancelled.") ## Add chunk ```python -Document.add_chunk(content:str) -> Chunk ????????????????????? +Document.add_chunk(content:str, important_keywords:list[str] = []) -> Chunk ``` Adds a chunk to the current document. @@ -577,7 +576,7 @@ Adds a chunk to the current document. The text content of the chunk. -#### important_keywords: `list[str]` ?????????????????????? +#### important_keywords: `list[str]` The key terms or phrases to tag with the chunk. @@ -588,7 +587,7 @@ The key terms or phrases to tag with the chunk. A `Chunk` object contains the following attributes: -- `id`: `str` +- `id`: `str` - `content`: `str` Content of the chunk. - `important_keywords`: `list[str]` A list of key terms or phrases to tag with the chunk. - `create_time`: `str` The time when the chunk was created (added to the document). @@ -596,9 +595,9 @@ A `Chunk` object contains the following attributes: - `knowledgebase_id`: `str` The ID of the associated dataset. - `document_name`: `str` The name of the associated document. - `document_id`: `str` The ID of the associated document. -- `available`: `int`???? The chunk's availability status in the dataset. 
Value options: - - `0`: Unavailable - - `1`: Available +- `available`: `bool` The chunk's availability status in the dataset. Value options: + - `False`: Unavailable + - `True`: Available ### Examples @@ -619,26 +618,26 @@ chunk = doc.add_chunk(content="xxxxxxx") ## List chunks ```python -Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id : str = None) -> list[Chunk] +Document.list_chunks(keywords: str = None, offset: int = 1, limit: int = 1024, id : str = None) -> list[Chunk] ``` -Retrieves a list of chunks from the current document. +Lists chunks in the current document. ### Parameters -#### keywords: `str` +#### keywords: `str` The keywords used to match chunk content. Defaults to `None` #### offset: `int` -The starting index for the chunks to retrieve. Defaults to `1`?????? +The starting index for the chunks to retrieve. Defaults to `1`. -#### limit +#### limit: `int` -The maximum number of chunks to retrieve. Default: `30`????????? +The maximum number of chunks to retrieve. Default: `1024` -#### id +#### id: `str` The ID of the chunk to retrieve. Default: `None` @@ -713,9 +712,9 @@ A dictionary representing the attributes to update, with the following keys: - `"content"`: `str` Content of the chunk. - `"important_keywords"`: `list[str]` A list of key terms or phrases to tag with the chunk. -- `"available"`: `int` The chunk's availability status in the dataset. Value options: - - `0`: Unavailable - - `1`: Available +- `"available"`: `bool` The chunk's availability status in the dataset. 
Value options: + - `False`: Unavailable + - `True`: Available ### Returns @@ -741,10 +740,10 @@ chunk.update({"content":"sdfx..."}) ## Retrieve chunks ```python -RAGFlow.retrieve(question:str="", datasets:list[str]=None, document=list[str]=None, offset:int=1, limit:int=30, similarity_threshold:float=0.2, vector_similarity_weight:float=0.3, top_k:int=1024,rerank_id:str=None,keyword:bool=False,higlight:bool=False) -> list[Chunk] +RAGFlow.retrieve(question:str="", datasets:list[str]=None, document=list[str]=None, offset:int=1, limit:int=1024, similarity_threshold:float=0.2, vector_similarity_weight:float=0.3, top_k:int=1024,rerank_id:str=None,keyword:bool=False,higlight:bool=False) -> list[Chunk] ``` -??????? +Retrieves chunks from specified datasets. ### Parameters @@ -752,21 +751,21 @@ RAGFlow.retrieve(question:str="", datasets:list[str]=None, document=list[str]=No The user query or query keywords. Defaults to `""`. -#### datasets: `list[str]`, *Required*????? +#### datasets: `list[str]`, *Required* The datasets to search from. #### document: `list[str]` -The documents to search from. `None` means no limitation. Defaults to `None`. +The documents to search from. Defaults to `None`. #### offset: `int` -The starting index for the documents to retrieve. Defaults to `0`??????. +The starting index for the documents to retrieve. Defaults to `1`. #### limit: `int` -The maximum number of chunks to retrieve. Defaults to `6`.??????????????? +The maximum number of chunks to retrieve. Defaults to `1024`. #### Similarity_threshold: `float` @@ -786,14 +785,17 @@ The ID of the rerank model. Defaults to `None`. #### keyword: `bool` -Indicates whether keyword-based matching is enabled: +Indicates whether to enable keyword-based matching: -- `True`: Enabled. -- `False`: Disabled (default). +- `True`: Enable keyword-based matching. +- `False`: Disable keyword-based matching (default). 
#### highlight: `bool` -Specifying whether to enable highlighting of matched terms in the results (True) or not (False). +Specifies whether to enable highlighting of matched terms in the results: + +- `True`: Enable highlighting of matched terms. +- `False`: Disable highlighting of matched terms (default). ### Returns @@ -849,15 +851,15 @@ Creates a chat assistant. The following shows the attributes of a `Chat` object: -#### name: `str`, *Required*???????? +#### name: `str`, *Required* -The name of the chat assistant. Defaults to `"assistant"`. +The name of the chat assistant. #### avatar: `str` Base64 encoding of the avatar. Defaults to `""`. -#### knowledgebases: `list[str]` +#### knowledgebases: `list[str]` The IDs of the associated datasets. Defaults to `[""]`. @@ -1016,7 +1018,7 @@ RAGFlow.list_chats( ) -> list[Chat] ``` -Retrieves a list of chat assistants. +Lists chat assistants. ### Parameters