Retrieve document contents

Retrieve contents of a document by its ID.

get

Returns the content of the document with the specified ID, along with the index of the last chunk retrieved. Each call fetches up to 20 chunks. To get more, pass the end_chunk value from the response as the start_chunk of the next call.

Path parameters

document_id (integer, required)

The ID of the document to retrieve contents for.

Query parameters

start_chunk (integer, optional)

Index of the chunk to start retrieving from. Defaults to 0.

end_chunk (integer, optional)

Index of the chunk at which retrieval ends. Defaults to start_chunk + 20.
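The defaults above can be mirrored on the client with a small helper that fills in whichever query parameters the caller omits. This is a sketch only: `build_query_params` and `DEFAULT_PAGE_SIZE` are hypothetical names, and the helper simply restates the documented defaults (start_chunk = 0, end_chunk = start_chunk + 20) rather than reproducing any server logic.

```python
from typing import Optional

# Hypothetical constant: the documented default window of 20 chunks.
DEFAULT_PAGE_SIZE = 20


def build_query_params(start_chunk: Optional[int] = None,
                       end_chunk: Optional[int] = None) -> dict:
    """Return start_chunk/end_chunk with the documented defaults applied."""
    start = 0 if start_chunk is None else start_chunk
    end = start + DEFAULT_PAGE_SIZE if end_chunk is None else end_chunk
    return {"start_chunk": start, "end_chunk": end}


print(build_query_params())    # {'start_chunk': 0, 'end_chunk': 20}
print(build_query_params(40))  # {'start_chunk': 40, 'end_chunk': 60}
```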

Responses

200

The document content and the index of the last chunk retrieved.

application/json

404

Document not found.

500

Internal server error.


GET /api/v1/documents/{document_id}/contents/ HTTP/1.1
Host: api.rememberizer.ai
Accept: */*

{
  "content": "text",
  "end_chunk": 20
}

Example Requests

curl -X GET \
  "https://api.rememberizer.ai/api/v1/documents/12345/contents/?start_chunk=0&end_chunk=20" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Replace YOUR_JWT_TOKEN with your actual JWT token and 12345 with the actual document ID.
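For reference, the same GET request as the curl example can be assembled with only the Python standard library. This sketch builds the request without sending it; `build_contents_request` and `BASE_URL` are illustrative names, and the token is still the YOUR_JWT_TOKEN placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.rememberizer.ai/api/v1"


def build_contents_request(document_id: int, token: str,
                           start_chunk: int = 0, end_chunk: int = 20) -> Request:
    """Build (but do not send) the GET request shown in the curl example."""
    query = urlencode({"start_chunk": start_chunk, "end_chunk": end_chunk})
    url = f"{BASE_URL}/documents/{document_id}/contents/?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


req = build_contents_request(12345, "YOUR_JWT_TOKEN")
print(req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs a valid token.
```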

const getDocumentContents = async (documentId, startChunk = 0, endChunk = startChunk + 20) => {
  const url = new URL(`https://api.rememberizer.ai/api/v1/documents/${documentId}/contents/`);
  url.searchParams.append('start_chunk', startChunk);
  url.searchParams.append('end_chunk', endChunk);
  
  const response = await fetch(url.toString(), {
    method: 'GET',
    headers: {
      'Authorization': 'Bearer YOUR_JWT_TOKEN'
    }
  });
  
  // 404: document not found; 500: internal server error
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  
  const data = await response.json();
  console.log(data);
  
  // The response does not report a total chunk count. This example assumes the
  // document is exhausted when a page comes back empty or ends before endChunk.
  if (data.content && data.end_chunk >= endChunk) {
    // Fetch the next set of chunks, starting where this one ended
    await getDocumentContents(documentId, data.end_chunk, data.end_chunk + 20);
  }
};

getDocumentContents(12345).catch(console.error);


import requests

def get_document_contents(document_id, start_chunk=0, end_chunk=None):
    if end_chunk is None:
        end_chunk = start_chunk + 20  # same default as the API
    headers = {
        "Authorization": "Bearer YOUR_JWT_TOKEN"
    }
    
    params = {
        "start_chunk": start_chunk,
        "end_chunk": end_chunk
    }
    
    response = requests.get(
        f"https://api.rememberizer.ai/api/v1/documents/{document_id}/contents/",
        headers=headers,
        params=params
    )
    response.raise_for_status()  # raises on 404 (not found) or 500 (server error)
    
    data = response.json()
    print(data)
    
    # The response has no total chunk count. This example assumes the document is
    # exhausted when a page comes back empty or ends before the requested end_chunk.
    if data.get("content") and data.get("end_chunk", 0) >= end_chunk:
        # Fetch the next set of chunks, starting where this one ended
        get_document_contents(document_id, data["end_chunk"], data["end_chunk"] + 20)

get_document_contents(12345)


Path Parameters

| Parameter | Type | Description |
|---|---|---|
| document_id | integer | Required. The ID of the document whose contents to retrieve. |

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| start_chunk | integer | Index of the starting chunk. Defaults to 0. |
| end_chunk | integer | Index of the ending chunk. Defaults to start_chunk + 20. |

Response Format

{
  "content": "Full text content of the document chunks...",
  "end_chunk": 20
}
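A client may want a typed view of this response. The sketch below parses the body into a small dataclass; `DocumentContents` and `parse_contents` are hypothetical names, and only the two fields shown in the sample response (`content`, `end_chunk`) are assumed to exist.

```python
import json
from dataclasses import dataclass


@dataclass
class DocumentContents:
    """Text of the retrieved chunks plus the index of the last chunk returned."""
    content: str
    end_chunk: int


def parse_contents(body: str) -> DocumentContents:
    """Parse a JSON response body shaped like the sample above."""
    data = json.loads(body)
    return DocumentContents(content=data["content"], end_chunk=int(data["end_chunk"]))


page = parse_contents('{"content": "text", "end_chunk": 20}')
print(page.end_chunk)  # 20
```

The `end_chunk` value of one page is what you pass as `start_chunk` on the next request.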

Error Responses

| Status code | Description |
|---|---|
| 404 | Document not found |
| 500 | Internal server error |
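One way to surface these status codes in client code is to map them to distinct exceptions, so callers can tell a missing document apart from a server fault. The exception classes and `check_status` below are hypothetical; only 404 and 500 come from this page, and any other non-2xx code is reported generically as an assumption of this sketch.

```python
class DocumentNotFound(Exception):
    """Hypothetical exception for a 404: no document with the given ID."""


class ServerError(Exception):
    """Hypothetical exception for a 500: internal server error."""


def check_status(status_code: int, document_id: int) -> None:
    """Raise for the documented error codes; return None on a 2xx status."""
    if status_code == 404:
        raise DocumentNotFound(f"document {document_id} was not found")
    if status_code == 500:
        raise ServerError("the server reported an internal error")
    if not 200 <= status_code < 300:
        # Codes not documented for this endpoint are reported generically.
        raise RuntimeError(f"unexpected HTTP status {status_code}")


for code in (200, 404, 500):
    try:
        check_status(code, 12345)
        print(code, "ok")
    except (DocumentNotFound, ServerError) as exc:
        print(code, type(exc).__name__)
```

With the requests library, you would call `check_status(response.status_code, document_id)` before `response.json()`.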

Pagination for Large Documents

For large documents, the content is split into chunks. You can retrieve the entire document by making multiple requests:

1. Make an initial request with start_chunk=0.
2. Use the returned end_chunk value as the start_chunk for the next request.
3. Continue until you have retrieved all chunks.

This endpoint returns the raw text content of a document, giving you full access to its information for detailed processing or analysis.

