Retrieve document contents

Retrieve contents of a document by its ID.

Returns the content of the document with the specified ID, along with the index of the last chunk retrieved. Each call returns at most 20 chunks. To fetch more, pass the end_chunk value from the response as the start_chunk of the next call.

Path parameters

document_id (integer, required)

The ID of the document to retrieve contents for.

Query parameters

start_chunk (integer, optional)

Index of the first chunk to retrieve. Defaults to 0.

end_chunk (integer, optional)

Index of the last chunk to retrieve. Defaults to start_chunk + 20.
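
The defaults above can be illustrated with a short sketch. It is not part of the official client: the helper names build_contents_url and effective_window are hypothetical, and the code only restates the documented defaults (start_chunk = 0, end_chunk = start_chunk + 20) rather than querying the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.rememberizer.ai/api/v1"


def build_contents_url(document_id, start_chunk=None, end_chunk=None):
    """Build the request URL, leaving out query parameters that were not given."""
    params = {}
    if start_chunk is not None:
        params["start_chunk"] = start_chunk
    if end_chunk is not None:
        params["end_chunk"] = end_chunk
    url = f"{BASE_URL}/documents/{document_id}/contents/"
    return f"{url}?{urlencode(params)}" if params else url


def effective_window(start_chunk=None, end_chunk=None):
    """Return the (start, end) pair implied by the documented defaults."""
    start = 0 if start_chunk is None else start_chunk
    end = start + 20 if end_chunk is None else end_chunk
    return start, end
```

For example, effective_window(start_chunk=40) yields (40, 60), which is the window the server is documented to apply when end_chunk is omitted.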

Responses

200

Content of the document and index of the latest retrieved chunk.

application/json

404

Document not found.

500

Internal server error.

GET /api/v1/documents/{document_id}/contents/ HTTP/1.1
Host: api.rememberizer.ai
Accept: */*

{
  "content": "text",
  "end_chunk": 20
}

Example Requests

curl -X GET \
  "https://api.rememberizer.ai/api/v1/documents/12345/contents/?start_chunk=0&end_chunk=20" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Replace YOUR_JWT_TOKEN with your actual JWT token and 12345 with an actual document ID.

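const getDocumentContents = async (documentId, startChunk = 0, endChunk = startChunk + 20) => {
  const url = new URL(`https://api.rememberizer.ai/api/v1/documents/${documentId}/contents/`);
  url.searchParams.append('start_chunk', startChunk);
  url.searchParams.append('end_chunk', endChunk);

  const response = await fetch(url.toString(), {
    method: 'GET',
    headers: {
      'Authorization': 'Bearer YOUR_JWT_TOKEN'
    }
  });

  // Surface 404 (document not found) and 500 (internal server error) responses
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const data = await response.json();
  console.log(data);

  // The response does not include a total chunk count, so this example ASSUMES
  // that empty content or an end_chunk that did not advance means no chunks remain.
  if (data.content && data.end_chunk > startChunk) {
    // Fetch the next set of chunks, starting where this response ended
    await getDocumentContents(documentId, data.end_chunk, data.end_chunk + 20);
  }
};

getDocumentContents(12345);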

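import requests

def get_document_contents(document_id, start_chunk=0, end_chunk=None):
    # Default to a 20-chunk window, matching the documented API default
    if end_chunk is None:
        end_chunk = start_chunk + 20

    headers = {
        "Authorization": "Bearer YOUR_JWT_TOKEN"
    }

    params = {
        "start_chunk": start_chunk,
        "end_chunk": end_chunk
    }

    response = requests.get(
        f"https://api.rememberizer.ai/api/v1/documents/{document_id}/contents/",
        headers=headers,
        params=params
    )
    # Raise an exception on 404 (document not found) or 500 (internal server error)
    response.raise_for_status()

    data = response.json()
    print(data)

    # The response does not include a total chunk count, so this example ASSUMES
    # that empty content or an end_chunk that did not advance means no chunks remain.
    # This is a simplistic example; a loop avoids deep recursion on very large documents.
    if data.get('content') and data.get('end_chunk', start_chunk) > start_chunk:
        get_document_contents(document_id, data['end_chunk'], data['end_chunk'] + 20)

get_document_contents(12345)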

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| document_id | integer | Required. The ID of the document to retrieve contents for. |

Query Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| start_chunk | integer | The starting chunk index. Default is 0. |
| end_chunk | integer | The ending chunk index. Default is start_chunk + 20. |

Response Format

{
  "content": "The full text content of the document chunks...",
  "end_chunk": 20
}
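
As a rough illustration of consuming this response, the sketch below parses the two documented fields into a small typed record. The names ContentsPage and parse_contents_response are hypothetical, and the code assumes the body contains exactly the content and end_chunk fields shown above.

```python
import json
from dataclasses import dataclass


@dataclass
class ContentsPage:
    """One page of document content, mirroring the documented response fields."""
    content: str
    end_chunk: int


def parse_contents_response(body: str) -> ContentsPage:
    """Parse a JSON response body into a ContentsPage."""
    data = json.loads(body)
    return ContentsPage(content=data["content"], end_chunk=int(data["end_chunk"]))


page = parse_contents_response('{"content": "text", "end_chunk": 20}')
```

Here page.end_chunk is the value to pass as start_chunk on the next request.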

Error Responses

| Status Code | Description |
| --- | --- |
| 404 | Document not found |
| 500 | Internal server error |
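
One possible way to handle these statuses on the client side is sketched below. The exception classes and the check_status helper are hypothetical names introduced for illustration; only the 404 and 500 meanings come from the table above, and any other non-200 status is treated generically.

```python
class DocumentNotFound(Exception):
    """Raised for a 404 response: the document ID does not exist."""


class ServerError(Exception):
    """Raised for a 500 response: the server failed to process the request."""


def check_status(status_code: int) -> None:
    """Raise a descriptive exception for the documented error statuses."""
    if status_code == 404:
        raise DocumentNotFound("Document not found")
    if status_code == 500:
        raise ServerError("Internal server error")
    if status_code != 200:
        # Statuses not listed in the documentation are reported generically
        raise RuntimeError(f"Unexpected status code: {status_code}")
```

A caller could run check_status(response.status_code) before reading the response body, so a missing document is reported separately from a server fault.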

Pagination for Large Documents

For large documents, the content is split into chunks. You can retrieve the full document by making multiple requests:

1. Make an initial request with start_chunk=0.
2. Use the returned end_chunk value as the start_chunk for the next request.
3. Continue until you have retrieved all chunks.
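
The steps above can be sketched as a loop. This is a minimal illustration under stated assumptions, not the official client: fetch_page is a hypothetical callable that performs one request and returns the parsed JSON, and because the response does not report a total chunk count, the loop assumes that empty content or an end_chunk that does not advance marks the end of the document. It also assumes the pages can simply be concatenated.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 20  # each call fetches up to 20 chunks


def fetch_all_content(fetch_page: Callable[[int, int], Dict],
                      max_requests: int = 1000) -> str:
    """Collect document content page by page.

    fetch_page(start_chunk, end_chunk) is a hypothetical callable returning
    a dict with the documented "content" and "end_chunk" keys.
    """
    parts: List[str] = []
    start = 0
    for _ in range(max_requests):  # hard cap guards against an endless loop
        page = fetch_page(start, start + PAGE_SIZE)
        content = page.get("content", "")
        end = page.get("end_chunk", start)
        if content:
            parts.append(content)
        # ASSUMED stop condition: nothing returned, or the index did not advance
        if not content or end <= start:
            break
        start = end  # the returned end_chunk becomes the next start_chunk
    return "".join(parts)
```

In practice fetch_page might wrap a requests.get call to the endpoint and return response.json(). Keeping it as a parameter lets the loop be exercised without network access.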

This endpoint returns the raw text content of a document, allowing you to access the full information for detailed processing or analysis.
