Upload files to a Vector Store
Upload file content to Vector Store with batch operations
```http
POST /api/v1/vector-stores/{vector-store-id}/documents/upload HTTP/1.1
Host: api.rememberizer.ai
x-api-key: YOUR_API_KEY
Content-Type: multipart/form-data
Accept: */*
Content-Length: 20

{
  "files": [
    "binary"
  ]
}
```
```json
{
  "documents": [
    {
      "id": 1,
      "name": "text"
    }
  ],
  "errors": [
    {
      "file": "text",
      "error": "text"
    }
  ]
}
```
Example Requests
```bash
curl -X POST \
  https://api.rememberizer.ai/api/v1/vector-stores/vs_abc123/documents/upload \
  -H "x-api-key: YOUR_API_KEY" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.docx"
```
Path Parameters
- `vector-store-id` (string, required): The ID of the vector store to upload files to.
Request Body
This endpoint accepts a `multipart/form-data` request with one or more files in the `files` field.
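As a sketch, the same multipart request can be made from Python with the `requests` library. The helper names and the `API_BASE` constant below are illustrative, not part of the API:

```python
from pathlib import Path

import requests

API_BASE = "https://api.rememberizer.ai/api/v1"

def upload_url(vector_store_id):
    """Build the upload endpoint URL for a given vector store."""
    return f"{API_BASE}/vector-stores/{vector_store_id}/documents/upload"

def upload_files(vector_store_id, paths, api_key):
    """POST one or more local files as a single multipart/form-data request."""
    # Each file goes into the repeated "files" form field
    files = [("files", (Path(p).name, open(p, "rb"))) for p in paths]
    try:
        response = requests.post(
            upload_url(vector_store_id),
            headers={"x-api-key": api_key},
            files=files,
        )
    finally:
        # Close handles whether or not the request succeeded
        for _, (_, handle) in files:
            handle.close()
    response.raise_for_status()
    return response.json()
```

`requests` builds the multipart body and boundary automatically, which is why no explicit `Content-Type` header is set.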
Response Format
```json
{
  "documents": [
    {
      "id": 1234,
      "name": "document1.pdf",
      "type": "application/pdf",
      "size": 250000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    },
    {
      "id": 1235,
      "name": "document2.docx",
      "type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "size": 180000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    }
  ],
  "errors": []
}
```
If some files fail to upload, they are listed in the `errors` array:
```json
{
  "documents": [
    {
      "id": 1234,
      "name": "document1.pdf",
      "type": "application/pdf",
      "size": 250000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    }
  ],
  "errors": [
    {
      "file": "document2.docx",
      "error": "File format not supported"
    }
  ]
}
```
Authentication
This endpoint requires authentication using an API key in the `x-api-key` header.
Supported File Formats
- PDF (`.pdf`)
- Microsoft Word (`.doc`, `.docx`)
- Microsoft Excel (`.xls`, `.xlsx`)
- Microsoft PowerPoint (`.ppt`, `.pptx`)
- Text files (`.txt`)
- Markdown (`.md`)
- JSON (`.json`)
- HTML (`.html`, `.htm`)
File Size Limits
Individual file size limit: 50MB
Total request size limit: 100MB
Maximum number of files per request: 20
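These limits can be checked client-side before sending a request, avoiding a guaranteed 413 or 415 response. The following sketch (helper names are illustrative) flags a batch that would violate any of them:

```python
import os

MAX_FILE_BYTES = 50 * 1024 * 1024    # 50MB per file
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # 100MB per request
MAX_FILES = 20                       # files per request

SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".txt", ".md", ".json", ".html", ".htm",
}

def validate_batch(name_to_size):
    """Return a list of problems for a {filename: size_in_bytes} batch."""
    problems = []
    if len(name_to_size) > MAX_FILES:
        problems.append(f"too many files: {len(name_to_size)} > {MAX_FILES}")
    if sum(name_to_size.values()) > MAX_TOTAL_BYTES:
        problems.append("total request size exceeds 100MB")
    for name, size in name_to_size.items():
        ext = os.path.splitext(name)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: exceeds 50MB limit")
    return problems
```

An empty return value means the batch is safe to submit as far as these limits go.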
Error Responses
- `400` Bad Request - No files provided or invalid request format
- `401` Unauthorized - Invalid or missing API key
- `404` Not Found - Vector Store not found
- `413` Payload Too Large - Files exceed size limit
- `415` Unsupported Media Type - File format not supported
- `500` Internal Server Error
- `207` Multi-Status - Some files were uploaded successfully, but others failed
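One way to act on these codes in client code is to map each to a suggested next step. The mapping below is an interpretation of the table above, not an official client:

```python
def classify_status(code):
    """Map an upload response status code to a suggested client action."""
    if code in (200, 201):
        return "success"      # all files accepted
    if code == 207:
        return "partial"      # some files failed: inspect the errors array
    if code == 413:
        return "split-batch"  # payload too large: retry with fewer/smaller files
    if code in (400, 401, 404, 415):
        return "fix-request"  # client-side problem; retrying unchanged won't help
    if code >= 500:
        return "retry"        # likely transient server error
    return "unknown"
```

The key distinction is between client-side errors (fix the request first) and server-side or size errors (retry or split the batch).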
Processing Status
Files are initially accepted with a status of `processing`. You can check the processing status of the documents using the Get a List of Documents in a Vector Store endpoint. The final status will be one of:

- `done`: Document was successfully processed
- `error`: An error occurred during processing
- `processing`: Document is still being processed

Processing time depends on file size and complexity. Typical processing time is between 30 seconds and 5 minutes per document.
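A simple way to wait for processing to finish is to poll the document list until every document reaches a terminal status. The sketch below assumes a `GET /vector-stores/{id}/documents` endpoint returning a `documents` array with `status` fields; check the Get a List of Documents in a Vector Store reference for the exact path and response shape:

```python
import time

import requests

API_BASE = "https://api.rememberizer.ai/api/v1"

def is_terminal(status):
    """True once a document has finished processing (successfully or not)."""
    return status in ("done", "error")

def wait_for_documents(vector_store_id, api_key, timeout=600, interval=15):
    """Poll until every document in the vector store is done or errored."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(
            f"{API_BASE}/vector-stores/{vector_store_id}/documents",
            headers={"x-api-key": api_key},
        )
        resp.raise_for_status()
        docs = resp.json().get("documents", [])
        if docs and all(is_terminal(d.get("status")) for d in docs):
            return docs
        time.sleep(interval)
    raise TimeoutError("documents still processing after timeout")
```

The 15-second polling interval matches the typical 30-second-to-5-minute processing window; polling much faster wastes requests.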
Batch Operations
For efficiently uploading multiple files to your Vector Store, Rememberizer supports batch operations. This approach helps optimize performance when dealing with large numbers of documents.
Batch Upload Implementation
```python
import os
import time
from pathlib import Path

import requests

def batch_upload_to_vector_store(vector_store_id, folder_path, batch_size=5, file_types=None):
    """
    Upload all files from a directory to a Vector Store in batches.

    Args:
        vector_store_id: ID of the vector store
        folder_path: Path to the folder containing files to upload
        batch_size: Number of files to upload in each batch
        file_types: Optional list of file extensions to filter by (e.g., ['.pdf', '.docx'])

    Returns:
        List of upload results
    """
    api_key = "YOUR_API_KEY"
    headers = {"x-api-key": api_key}

    # Collect files in the directory, filtering by extension if requested
    files = []
    for entry in os.scandir(folder_path):
        if entry.is_file():
            file_path = Path(entry.path)
            if file_types is None or file_path.suffix.lower() in file_types:
                files.append(file_path)

    print(f"Found {len(files)} files to upload")
    results = []

    # Process files in batches
    for i in range(0, len(files), batch_size):
        batch = files[i:i + batch_size]
        total_batches = (len(files) + batch_size - 1) // batch_size
        print(f"Processing batch {i // batch_size + 1}/{total_batches}: {len(batch)} files")

        # Open each file and attach it to the repeated "files" form field
        upload_files = [('files', (path.name, open(path, 'rb'))) for path in batch]
        try:
            response = requests.post(
                f"https://api.rememberizer.ai/api/v1/vector-stores/{vector_store_id}/documents/upload",
                headers=headers,
                files=upload_files,
            )

            if response.status_code in (200, 201, 207):
                batch_result = response.json()
                results.append(batch_result)
                print(f"Successfully uploaded batch - {len(batch_result.get('documents', []))} documents processed")

                # Report any per-file errors
                if batch_result.get('errors'):
                    print(f"Errors encountered: {len(batch_result['errors'])}")
                    for error in batch_result['errors']:
                        print(f"- {error['file']}: {error['error']}")
            else:
                print(f"Batch upload failed with status code {response.status_code}: {response.text}")
                results.append({"error": f"Batch failed: {response.text}"})
        except Exception as e:
            print(f"Exception during batch upload: {e}")
            results.append({"error": str(e)})
        finally:
            # Always close file handles, even when the request fails
            for _, (_, file_obj) in upload_files:
                file_obj.close()

        # Rate limiting - pause between batches
        if i + batch_size < len(files):
            print("Pausing before next batch...")
            time.sleep(2)

    return results

# Example usage
results = batch_upload_to_vector_store(
    'vs_abc123',
    '/path/to/documents/folder',
    batch_size=5,
    file_types=['.pdf', '.docx', '.txt'],
)
```
Batch Upload Best Practices
To optimize performance and reliability when uploading large volumes of files:
- Manage Batch Size: Keep batch sizes between 5 and 10 files for optimal performance. Too many files in a single request increases the risk of timeouts.
- Implement Rate Limiting: Add delays between batches (2-3 seconds recommended) to avoid hitting API rate limits.
- Add Error Retry Logic: For production systems, implement retry logic for failed uploads with exponential backoff.
- Validate File Types: Pre-filter files to ensure they're supported types before attempting upload.
- Monitor Batch Progress: For user-facing applications, provide progress feedback on batch operations.
- Handle Partial Success: The API may return a 207 status code for partial success. Always check individual document statuses.
- Clean Up Resources: Ensure all file handles are properly closed, especially when errors occur.
- Parallelize Wisely: For very large uploads (thousands of files), consider multiple concurrent batch processes targeting different vector stores, then combine results later if needed.
- Implement Checksums: For critical data, verify file integrity before and after upload with checksums.
- Log Comprehensive Results: Maintain detailed logs of all upload operations for troubleshooting.
By following these best practices, you can efficiently manage large-scale document ingestion into your vector stores.
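The retry-with-exponential-backoff advice above can be sketched as a small wrapper around any upload callable (the names here are illustrative):

```python
import time

def with_retries(operation, attempts=3, base_delay=2.0):
    """Call `operation`, retrying on exception with exponential backoff.

    Delays between attempts are base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Example usage (upload_batch stands in for your actual upload call):
# result = with_retries(lambda: upload_batch(files), attempts=4)
```

Wrapping each batch this way keeps transient failures (network blips, 5xx responses raised as exceptions) from aborting a long-running ingestion job.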