# Upload files to a Vector Store

{% openapi src="/files/7T1Jx6BU3fZ2U5LsImW1" path="/vector-stores/{vector-store-id}/documents/upload" method="post" %}
[rememberizer\_openapi.yml](https://2952947711-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FyNqpTh7Mh66N0RnO0k24%2Fuploads%2Fgit-blob-77b6137eeb641262ec8e531c78123c02b825b865%2Frememberizer_openapi.yml?alt=media\&token=cbad765b-1613-4222-b591-9ae17a3b7cfa)
{% endopenapi %}

## Example Requests

{% tabs %}
{% tab title="cURL" %}

```bash
curl -X POST \
  https://api.rememberizer.ai/api/v1/vector-stores/vs_abc123/documents/upload \
  -H "x-api-key: YOUR_API_KEY" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.docx"
```

{% hint style="info" %}
Replace `YOUR_API_KEY` with your actual Vector Store API key, `vs_abc123` with your Vector Store ID, and provide the paths to your local files.
{% endhint %}
{% endtab %}

{% tab title="JavaScript" %}

```javascript
const uploadFiles = async (vectorStoreId, files) => {
  const formData = new FormData();
  
  // Add multiple files to the form data
  for (const file of files) {
    formData.append('files', file);
  }
  
  const response = await fetch(`https://api.rememberizer.ai/api/v1/vector-stores/${vectorStoreId}/documents/upload`, {
    method: 'POST',
    headers: {
      'x-api-key': 'YOUR_API_KEY'
      // Note: Do not set Content-Type header, it will be set automatically with the correct boundary
    },
    body: formData
  });
  
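  // Fail loudly on non-2xx responses instead of parsing an error body as a result
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}: ${await response.text()}`);
  }

  const data = await response.json();
  console.log(data);
  return data;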
};

// Example usage with file input element
const fileInput = document.getElementById('fileInput');
uploadFiles('vs_abc123', fileInput.files);
```

{% hint style="info" %}
Replace `YOUR_API_KEY` with your actual Vector Store API key and `vs_abc123` with your Vector Store ID.
{% endhint %}
{% endtab %}

{% tab title="Python" %}

```python
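import os
from contextlib import ExitStack

import requests

def upload_files(vector_store_id, file_paths):
    headers = {
        "x-api-key": "YOUR_API_KEY"
    }

    # Open every file inside an ExitStack so all handles are closed
    # after the request, even if it raises an exception
    with ExitStack() as stack:
        files = [
            ('files', (os.path.basename(file_path), stack.enter_context(open(file_path, 'rb'))))
            for file_path in file_paths
        ]

        response = requests.post(
            f"https://api.rememberizer.ai/api/v1/vector-stores/{vector_store_id}/documents/upload",
            headers=headers,
            files=files
        )

    data = response.json()
    print(data)
    return data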

upload_files('vs_abc123', ['/path/to/document1.pdf', '/path/to/document2.docx'])
```

{% hint style="info" %}
Replace `YOUR_API_KEY` with your actual Vector Store API key, `vs_abc123` with your Vector Store ID, and provide the paths to your local files.
{% endhint %}
{% endtab %}

{% tab title="Ruby" %}

```ruby
require 'net/http'
require 'uri'
require 'json'

def upload_files(vector_store_id, file_paths)
  uri = URI("https://api.rememberizer.ai/api/v1/vector-stores/#{vector_store_id}/documents/upload")
  
  # Create a new HTTP object
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  
  # Create a multipart-form request
  request = Net::HTTP::Post.new(uri)
  request['x-api-key'] = 'YOUR_API_KEY'
  
  # Create a multipart boundary
  boundary = "RubyFormBoundary#{rand(1000000)}"
  request['Content-Type'] = "multipart/form-data; boundary=#{boundary}"
  
  # Build the request body
  body = []
  file_paths.each do |file_path|
    file_name = File.basename(file_path)
    file_content = File.read(file_path, mode: 'rb')
    
    body << "--#{boundary}\r\n"
    body << "Content-Disposition: form-data; name=\"files\"; filename=\"#{file_name}\"\r\n"
    body << "Content-Type: #{get_content_type(file_name)}\r\n\r\n"
    body << file_content
    body << "\r\n"
  end
  body << "--#{boundary}--\r\n"
  
  request.body = body.join
  
  # Send the request
  response = http.request(request)
  
  # Parse and return the response
  JSON.parse(response.body)
end

# Helper method to determine content type
def get_content_type(filename)
  ext = File.extname(filename).downcase
  case ext
  when '.pdf'  then 'application/pdf'
  when '.doc'  then 'application/msword'
  when '.docx' then 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  when '.txt'  then 'text/plain'
  when '.md'   then 'text/markdown'
  when '.json' then 'application/json'
  else 'application/octet-stream'
  end
end

# Example usage
result = upload_files('vs_abc123', ['/path/to/document1.pdf', '/path/to/document2.docx'])
puts result
```

{% hint style="info" %}
Replace `YOUR_API_KEY` with your actual Vector Store API key, `vs_abc123` with your Vector Store ID, and provide the paths to your local files.
{% endhint %}
{% endtab %}
{% endtabs %}

## Path Parameters

| Parameter       | Type   | Description                                                  |
| --------------- | ------ | ------------------------------------------------------------ |
| vector-store-id | string | **Required.** The ID of the vector store to upload files to. |

## Request Body

This endpoint accepts a `multipart/form-data` request with one or more files in the `files` field.

## Response Format

```json
{
  "documents": [
    {
      "id": 1234,
      "name": "document1.pdf",
      "type": "application/pdf",
      "size": 250000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    },
    {
      "id": 1235,
      "name": "document2.docx",
      "type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "size": 180000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    }
  ],
  "errors": []
}
```

If some files fail to upload, they will be listed in the `errors` array:

```json
{
  "documents": [
    {
      "id": 1234,
      "name": "document1.pdf",
      "type": "application/pdf",
      "size": 250000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    }
  ],
  "errors": [
    {
      "file": "document2.docx",
      "error": "File format not supported"
    }
  ]
}
```
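As a rough sketch of how a client might read this response, the helper below separates accepted documents from rejected files. The function name `summarize_upload_response` is hypothetical, and the sample payload is a trimmed copy of the partial-failure response above; the field names are the only part taken from this page.

```python
def summarize_upload_response(payload):
    """Return (accepted_names, failed) from an upload response body.

    accepted_names: names of documents the API accepted for processing
    failed: mapping of file name to the error message reported for it
    """
    accepted = [doc["name"] for doc in payload.get("documents", [])]
    failed = {err["file"]: err["error"] for err in payload.get("errors", [])}
    return accepted, failed


# Trimmed copy of the partial-failure response shown above
sample = {
    "documents": [{"id": 1234, "name": "document1.pdf", "status": "processing"}],
    "errors": [{"file": "document2.docx", "error": "File format not supported"}],
}

accepted, failed = summarize_upload_response(sample)
print("Accepted:", accepted)
for name, reason in failed.items():
    print(f"Failed: {name} ({reason})")
```

Using `.get()` with an empty default keeps the helper from raising if either array is missing from the body.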

## Authentication

This endpoint requires authentication using an API key in the `x-api-key` header.

## Supported File Formats

* PDF (`.pdf`)
* Microsoft Word (`.doc`, `.docx`)
* Microsoft Excel (`.xls`, `.xlsx`)
* Microsoft PowerPoint (`.ppt`, `.pptx`)
* Text files (`.txt`)
* Markdown (`.md`)
* JSON (`.json`)
* HTML (`.html`, `.htm`)
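
A client can pre-filter files against this list before uploading. The snippet below is a minimal sketch: `SUPPORTED_EXTENSIONS` and `split_by_support` are hypothetical names, and the check looks only at the file extension, which may not match how the server decides whether a format is supported.

```python
from pathlib import Path

# Extensions copied from the list above
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".txt", ".md", ".json", ".html", ".htm",
}


def split_by_support(file_paths):
    """Split paths into (supported, unsupported) by case-insensitive extension."""
    supported, unsupported = [], []
    for path in file_paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, skipped = split_by_support(["report.PDF", "archive.zip", "notes.md"])
print("Will upload:", ok)
print("Skipping:", skipped)
```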

## File Size Limits

* Individual file size limit: 50MB
* Total request size limit: 100MB
* Maximum number of files per request: 20
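
These limits can be checked on the client before a request is sent. The sketch below assumes 1 MB means 1024 × 1024 bytes and ignores multipart encoding overhead; neither detail is stated on this page. The constant and function names are hypothetical.

```python
# Assumption: 1 MB = 1024 * 1024 bytes; the server may count differently
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_REQUEST_BYTES = 100 * 1024 * 1024
MAX_FILES_PER_REQUEST = 20


def check_request_limits(file_sizes):
    """Return a list of limit violations for a mapping of file name to size in bytes."""
    problems = []
    if len(file_sizes) > MAX_FILES_PER_REQUEST:
        problems.append(f"{len(file_sizes)} files exceeds the limit of {MAX_FILES_PER_REQUEST}")
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is larger than the per-file limit")
    if sum(file_sizes.values()) > MAX_REQUEST_BYTES:
        problems.append("total size is larger than the per-request limit")
    return problems


# Sizes taken from the sample response on this page
print(check_request_limits({"document1.pdf": 250000, "document2.docx": 180000}))
```

In practice the sizes could come from `os.path.getsize` for each local path.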

## Error Responses

| Status Code | Description                                                             |
| ----------- | ----------------------------------------------------------------------- |
| 207         | Multi-Status - Some files were uploaded successfully, but others failed |
| 400         | Bad Request - No files provided or invalid request format               |
| 401         | Unauthorized - Invalid or missing API key                               |
| 404         | Not Found - Vector Store not found                                      |
| 413         | Payload Too Large - Files exceed size limit                             |
| 415         | Unsupported Media Type - File format not supported                      |
| 500         | Internal Server Error                                                   |
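
One way a client might act on these codes is sketched below. The function name and the returned labels are hypothetical, and treating 5xx responses as retryable is an assumption rather than documented behavior.

```python
def next_step_for_status(status_code):
    """Map an upload response status code to a coarse client action."""
    if status_code == 207:
        return "partial"      # some files failed; inspect the errors array
    if 200 <= status_code < 300:
        return "success"
    if status_code >= 500:
        return "retry"        # assumption: server errors may be transient
    return "fix_request"      # 400, 401, 404, 413, 415: retrying unchanged will not help


for code in (200, 207, 401, 413, 500):
    print(code, next_step_for_status(code))
```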

## Processing Status

Files are initially accepted with a status of `processing`. You can check the processing status of the documents using the [Get a List of Documents in a Vector Store](/developer-resources/api-docs/vector-store/get-a-list-of-documents-in-a-vector-store.md) endpoint. Each document reports one of the following statuses:

* `done`: Document was successfully processed
* `error`: An error occurred during processing
* `processing`: Document is still being processed

Processing time depends on file size and complexity. Typical processing time is between 30 seconds and 5 minutes per document.
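
A polling loop for these statuses could look like the sketch below. Because the response shape of the list endpoint is not shown on this page, the loop takes a hypothetical `fetch_statuses` callable that returns a mapping of document ID to status; wiring it to the real endpoint is left to the caller. The interval and timeout defaults are illustrative choices.

```python
import time


def wait_until_processed(fetch_statuses, document_ids, poll_interval=10.0,
                         timeout=600.0, sleep=time.sleep):
    """Poll until every document is `done` or `error`, or the timeout passes.

    fetch_statuses: hypothetical callable returning {document_id: status}
    Returns {document_id: status}; unfinished documents stay `processing`.
    """
    pending = set(document_ids)
    final = {}
    deadline = time.monotonic() + timeout
    while pending:
        statuses = fetch_statuses()
        for doc_id in list(pending):
            if statuses.get(doc_id) in ("done", "error"):
                final[doc_id] = statuses[doc_id]
                pending.discard(doc_id)
        if not pending or time.monotonic() >= deadline:
            break
        sleep(poll_interval)
    for doc_id in pending:
        final[doc_id] = "processing"
    return final


# Stand-in for the real endpoint: the second poll reports both documents finished
fake_polls = iter([
    {1234: "processing", 1235: "done"},
    {1234: "done", 1235: "done"},
])
result = wait_until_processed(lambda: next(fake_polls), [1234, 1235],
                              poll_interval=0, sleep=lambda seconds: None)
print(result)
```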

## Batch Operations

For efficiently uploading multiple files to your Vector Store, Rememberizer supports batch operations. This approach helps optimize performance when dealing with large numbers of documents.

### Batch Upload Implementation

{% tabs %}
{% tab title="Python" %}

```python
import os
import requests
import time
from pathlib import Path

def batch_upload_to_vector_store(vector_store_id, folder_path, batch_size=5, file_types=None):
    """
    Upload all files from a directory to a Vector Store in batches
    
    Args:
        vector_store_id: ID of the vector store
        folder_path: Path to folder containing files to upload
        batch_size: Number of files to upload in each batch
        file_types: Optional list of file extensions to filter by (e.g., ['.pdf', '.docx'])
        
    Returns:
        List of upload results
    """
    api_key = "YOUR_API_KEY"
    headers = {"x-api-key": api_key}
    
    # Get list of files in directory
    files = []
    for entry in os.scandir(folder_path):
        if entry.is_file():
            file_path = Path(entry.path)
            # Filter by file extension if specified
            if file_types is None or file_path.suffix.lower() in file_types:
                files.append(file_path)
    
    print(f"Found {len(files)} files to upload")
    results = []
    
    # Process files in batches
    for i in range(0, len(files), batch_size):
        batch = files[i:i+batch_size]
        print(f"Processing batch {i//batch_size + 1}/{(len(files) + batch_size - 1)//batch_size}: {len(batch)} files")
        
        # Upload batch
        upload_files = []
        try:
            # Open files inside the try block so a failed open() is also handled
            for file_path in batch:
                upload_files.append(('files', (file_path.name, open(file_path, 'rb'))))

            response = requests.post(
                f"https://api.rememberizer.ai/api/v1/vector-stores/{vector_store_id}/documents/upload",
                headers=headers,
                files=upload_files
            )

            if response.status_code in (200, 201, 207):
                batch_result = response.json()
                results.append(batch_result)
                print(f"Successfully uploaded batch - {len(batch_result.get('documents', []))} documents processed")

                # Check for errors
                if batch_result.get('errors'):
                    print(f"Errors encountered: {len(batch_result['errors'])}")
                    for error in batch_result['errors']:
                        print(f"- {error['file']}: {error['error']}")
            else:
                print(f"Batch upload failed with status code {response.status_code}: {response.text}")
                results.append({"error": f"Batch failed: {response.text}"})

        except Exception as e:
            print(f"Exception during batch upload: {e}")
            results.append({"error": str(e)})

        finally:
            # Always close file handles, whether the upload succeeded or failed
            for _, (_, file_obj) in upload_files:
                file_obj.close()
        
        # Rate limiting - pause between batches
        if i + batch_size < len(files):
            print("Pausing before next batch...")
            time.sleep(2)
    
    return results

# Example usage
results = batch_upload_to_vector_store(
    'vs_abc123',
    '/path/to/documents/folder',
    batch_size=5,
    file_types=['.pdf', '.docx', '.txt']
)
```

{% endtab %}

{% tab title="JavaScript" %}

```javascript
/**
 * Upload files to a Vector Store in batches
 * 
 * @param {string} vectorStoreId - ID of the Vector Store
 * @param {FileList|File[]} files - Files to upload
 * @param {Object} options - Configuration options
 * @returns {Promise<Array>} - List of upload results
 */
async function batchUploadToVectorStore(vectorStoreId, files, options = {}) {
  const {
    batchSize = 5,
    delayBetweenBatches = 2000,
    onProgress = null
  } = options;
  
  const apiKey = 'YOUR_API_KEY';
  const results = [];
  const fileList = Array.from(files);
  const totalBatches = Math.ceil(fileList.length / batchSize);
  
  console.log(`Preparing to upload ${fileList.length} files in ${totalBatches} batches`);
  
  // Process files in batches
  for (let i = 0; i < fileList.length; i += batchSize) {
    const batch = fileList.slice(i, i + batchSize);
    const batchNumber = Math.floor(i / batchSize) + 1;
    
    console.log(`Processing batch ${batchNumber}/${totalBatches}: ${batch.length} files`);
    
    if (onProgress) {
      onProgress({
        currentBatch: batchNumber,
        totalBatches: totalBatches,
        filesInBatch: batch.length,
        totalFiles: fileList.length,
        completedFiles: i
      });
    }
    
    // Create FormData for this batch
    const formData = new FormData();
    batch.forEach(file => {
      formData.append('files', file);
    });
    
    try {
      const response = await fetch(
        `https://api.rememberizer.ai/api/v1/vector-stores/${vectorStoreId}/documents/upload`,
        {
          method: 'POST',
          headers: {
            'x-api-key': apiKey
          },
          body: formData
        }
      );
      
      if (response.ok) {
        const batchResult = await response.json();
        results.push(batchResult);
        
        console.log(`Successfully uploaded batch - ${batchResult.documents?.length || 0} documents processed`);
        
        // Check for errors
        if (batchResult.errors && batchResult.errors.length > 0) {
          console.warn(`Errors encountered: ${batchResult.errors.length}`);
          batchResult.errors.forEach(error => {
            console.warn(`- ${error.file}: ${error.error}`);
          });
        }
      } else {
        console.error(`Batch upload failed with status ${response.status}: ${await response.text()}`);
        results.push({ error: `Batch failed with status: ${response.status}` });
      }
    } catch (error) {
      console.error(`Exception during batch upload: ${error.message}`);
      results.push({ error: error.message });
    }
    
    // Add delay between batches to avoid rate limiting
    if (i + batchSize < fileList.length) {
      console.log(`Pausing for ${delayBetweenBatches}ms before next batch...`);
      await new Promise(resolve => setTimeout(resolve, delayBetweenBatches));
    }
  }
  
  console.log(`Upload complete. Processed ${fileList.length} files.`);
  return results;
}

// Example usage with file input element
document.getElementById('upload-button').addEventListener('click', async () => {
  const fileInput = document.getElementById('file-input');
  const vectorStoreId = 'vs_abc123';
  
  const progressBar = document.getElementById('progress-bar');
  
  try {
    const results = await batchUploadToVectorStore(vectorStoreId, fileInput.files, {
      batchSize: 5,
      onProgress: (progress) => {
        // Update progress UI
        const percentage = Math.round((progress.completedFiles / progress.totalFiles) * 100);
        progressBar.style.width = `${percentage}%`;
        progressBar.textContent = `${percentage}% (Batch ${progress.currentBatch}/${progress.totalBatches})`;
      }
    });
    
    console.log('Complete upload results:', results);
  } catch (error) {
    console.error('Upload failed:', error);
  }
});
```

{% endtab %}

{% tab title="Ruby" %}

```ruby
require 'net/http'
require 'uri'
require 'json'
require 'mime/types' # Provided by the mime-types gem (gem install mime-types)

# Upload files to a Vector Store in batches
#
# @param vector_store_id [String] ID of the Vector Store
# @param folder_path [String] Path to folder containing files to upload
# @param batch_size [Integer] Number of files to upload in each batch
# @param file_types [Array<String>] Optional array of file extensions to filter by
# @param delay_between_batches [Float] Seconds to wait between batches
# @return [Array] List of upload results
def batch_upload_to_vector_store(vector_store_id, folder_path, batch_size: 5, file_types: nil, delay_between_batches: 2.0)
  api_key = 'YOUR_API_KEY'
  results = []
  
  # Get list of files in directory
  files = Dir.entries(folder_path)
    .select { |f| File.file?(File.join(folder_path, f)) }
    .select { |f| file_types.nil? || file_types.include?(File.extname(f).downcase) }
    .map { |f| File.join(folder_path, f) }
  
  puts "Found #{files.count} files to upload"
  total_batches = (files.count.to_f / batch_size).ceil
  
  # Process files in batches
  files.each_slice(batch_size).with_index do |batch, batch_index|
    puts "Processing batch #{batch_index + 1}/#{total_batches}: #{batch.count} files"
    
    # Prepare the HTTP request
    uri = URI("https://api.rememberizer.ai/api/v1/vector-stores/#{vector_store_id}/documents/upload")
    request = Net::HTTP::Post.new(uri)
    request['x-api-key'] = api_key
    
    # Create a multipart form boundary
    boundary = "RubyBoundary#{rand(1000000)}"
    request['Content-Type'] = "multipart/form-data; boundary=#{boundary}"
    
    # Build the request body
    body = []
    batch.each do |file_path|
      file_name = File.basename(file_path)
      mime_type = MIME::Types.type_for(file_path).first&.content_type || 'application/octet-stream'
      
      begin
        file_content = File.binread(file_path)
        
        body << "--#{boundary}\r\n"
        body << "Content-Disposition: form-data; name=\"files\"; filename=\"#{file_name}\"\r\n"
        body << "Content-Type: #{mime_type}\r\n\r\n"
        body << file_content
        body << "\r\n"
      rescue => e
        puts "Error reading file #{file_path}: #{e.message}"
      end
    end
    body << "--#{boundary}--\r\n"
    
    request.body = body.join
    
    # Send the request
    begin
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true
      response = http.request(request)
      
      if [200, 201, 207].include?(response.code.to_i)
        batch_result = JSON.parse(response.body)
        results << batch_result
        
        puts "Successfully uploaded batch - #{batch_result['documents']&.count || 0} documents processed"
        
        # Check for errors
        if batch_result['errors'] && !batch_result['errors'].empty?
          puts "Errors encountered: #{batch_result['errors'].count}"
          batch_result['errors'].each do |error|
            puts "- #{error['file']}: #{error['error']}"
          end
        end
      else
        puts "Batch upload failed with status code #{response.code}: #{response.body}"
        results << { "error" => "Batch failed: #{response.body}" }
      end
    rescue => e
      puts "Exception during batch upload: #{e.message}"
      results << { "error" => e.message }
    end
    
    # Rate limiting - pause between batches
    if batch_index < total_batches - 1
      puts "Pausing for #{delay_between_batches} seconds before next batch..."
      sleep(delay_between_batches)
    end
  end
  
  puts "Upload complete. Processed #{files.count} files."
  results
end

# Example usage
results = batch_upload_to_vector_store(
  'vs_abc123',
  '/path/to/documents/folder',
  batch_size: 5,
  file_types: ['.pdf', '.docx', '.txt'],
  delay_between_batches: 2.0
)
```

{% endtab %}
{% endtabs %}

### Batch Upload Best Practices

To optimize performance and reliability when uploading large volumes of files:

1. **Manage Batch Size**: Keep batch sizes between 5 and 10 files for optimal performance. Too many files in a single request increases the risk of timeouts.
2. **Implement Rate Limiting**: Add delays between batches (2-3 seconds recommended) to avoid hitting API rate limits.
3. **Add Error Retry Logic**: For production systems, implement retry logic for failed uploads with exponential backoff.
4. **Validate File Types**: Pre-filter files to ensure they're supported types before attempting upload.
5. **Monitor Batch Progress**: For user-facing applications, provide progress feedback on batch operations.
6. **Handle Partial Success**: The API may return a 207 status code for partial success. Always check the `errors` array in the response to see which files were rejected.
7. **Clean Up Resources**: Ensure all file handles are properly closed, especially when errors occur.
8. **Parallelize Wisely**: For very large uploads (thousands of files), consider multiple concurrent batch processes targeting different vector stores, then combine results later if needed.
9. **Implement Checksums**: For critical data, verify file integrity before and after upload with checksums.
10. **Log Comprehensive Results**: Maintain detailed logs of all upload operations for troubleshooting.

By following these best practices, you can efficiently manage large-scale document ingestion into your vector stores.
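
The retry advice in item 3 can be sketched as a small wrapper. This is an illustration, not part of the Rememberizer API: `retry_with_backoff` is a hypothetical helper, the delay values are arbitrary defaults, and `flaky_upload` stands in for a real upload call.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add jitter so parallel clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay / 2))


# Stand-in for a real upload call: fails twice, then succeeds
calls = {"count": 0}

def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "uploaded"


waits = []
outcome = retry_with_backoff(flaky_upload, sleep=waits.append)
print(outcome, "after", calls["count"], "attempts")
```

Passing `sleep` as a parameter keeps the example fast to run; a real caller would leave the default `time.sleep` in place and wrap only requests that are safe to repeat.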

