Upload files to a vector store

Upload file content to a vector store using batch operations

post

Upload files to a vector store.

Path parameters

vector-store-id · string · Required

The ID of the vector store.

Header parameters

x-api-key · string · Required

The API key for authentication.

Body

files · string · binary[] · Optional

The files to upload.

Responses

201

Files uploaded successfully.

application/json

207

Some files failed to upload.

POST /api/v1/vector-stores/{vector-store-id}/documents/upload HTTP/1.1
Host: api.rememberizer.ai
x-api-key: text
Content-Type: multipart/form-data
Accept: */*
Content-Length: 20

{
  "files": [
    "binary"
  ]
}

{
  "documents": [
    {
      "id": 1,
      "name": "text"
    }
  ],
  "errors": [
    {
      "file": "text",
      "error": "text"
    }
  ]
}

Example requests

curl -X POST \
  https://api.rememberizer.ai/api/v1/vector-stores/vs_abc123/documents/upload \
  -H "x-api-key: YOUR_API_KEY" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.docx"

Replace YOUR_API_KEY with your actual vector store API key and vs_abc123 with your vector store ID, and provide the paths to your local files.

const uploadFiles = async (vectorStoreId, files) => {
  const formData = new FormData();
  
  // Append each file to the form data
  for (const file of files) {
    formData.append('files', file);
  }
  
  const response = await fetch(`https://api.rememberizer.ai/api/v1/vector-stores/${vectorStoreId}/documents/upload`, {
    method: 'POST',
    headers: {
      'x-api-key': 'YOUR_API_KEY'
      // Note: do not set the Content-Type header; it is set automatically with the correct boundary
    },
    body: formData
  });
  
  const data = await response.json();
  console.log(data);
};

// Example using a file input element
const fileInput = document.getElementById('fileInput');
uploadFiles('vs_abc123', fileInput.files);

Replace YOUR_API_KEY with your actual vector store API key and vs_abc123 with your vector store ID.

import os
from contextlib import ExitStack

import requests

def upload_files(vector_store_id, file_paths):
    headers = {
        "x-api-key": "YOUR_API_KEY"
    }

    # ExitStack closes every opened file handle once the request completes
    with ExitStack() as stack:
        files = [
            ('files', (os.path.basename(path), stack.enter_context(open(path, 'rb'))))
            for path in file_paths
        ]
        response = requests.post(
            f"https://api.rememberizer.ai/api/v1/vector-stores/{vector_store_id}/documents/upload",
            headers=headers,
            files=files
        )

    data = response.json()
    print(data)

upload_files('vs_abc123', ['/path/to/document1.pdf', '/path/to/document2.docx'])

Replace YOUR_API_KEY with your actual vector store API key and vs_abc123 with your vector store ID, and provide the paths to your local files.

require 'net/http'
require 'uri'
require 'json'

def upload_files(vector_store_id, file_paths)
  uri = URI("https://api.rememberizer.ai/api/v1/vector-stores/#{vector_store_id}/documents/upload")
  
  # Create a new HTTP object
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  
  # Create a multipart form request
  request = Net::HTTP::Post.new(uri)
  request['x-api-key'] = 'YOUR_API_KEY'
  
  # Create a multipart boundary
  boundary = "RubyFormBoundary#{rand(1000000)}"
  request['Content-Type'] = "multipart/form-data; boundary=#{boundary}"
  
  # Build the request body
  body = []
  file_paths.each do |file_path|
    file_name = File.basename(file_path)
    file_content = File.read(file_path, mode: 'rb')
    
    body << "--#{boundary}\r\n"
    body << "Content-Disposition: form-data; name=\"files\"; filename=\"#{file_name}\"\r\n"
    body << "Content-Type: #{get_content_type(file_name)}\r\n\r\n"
    body << file_content
    body << "\r\n"
  end
  body << "--#{boundary}--\r\n"
  
  request.body = body.join
  
  # Send the request
  response = http.request(request)
  
  # Parse and return the response
  JSON.parse(response.body)
end

# Helper method to determine the content type
def get_content_type(filename)
  ext = File.extname(filename).downcase
  case ext
  when '.pdf' then 'application/pdf'
  when '.doc' then 'application/msword'
  when '.docx' then 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  when '.txt' then 'text/plain'
  when '.md' then 'text/markdown'
  when '.json' then 'application/json'
  else 'application/octet-stream'
  end
end

# Example usage
result = upload_files('vs_abc123', ['/path/to/document1.pdf', '/path/to/document2.docx'])
puts result


<div data-gb-custom-block data-tag="hint" data-style='info'>

Replace `YOUR_API_KEY` with your actual vector store API key and `vs_abc123` with your vector store ID, and provide the paths to your local files.

</div>

</div>

</div>

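## Path Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| vector-store-id | string | **Required.** The ID of the vector store to upload files to. |

## Request Body

This endpoint accepts a `multipart/form-data` request containing one or more files in the `files` field.

## Response Format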

```json
{
  "documents": [
    {
      "id": 1234,
      "name": "document1.pdf",
      "type": "application/pdf",
      "size": 250000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    },
    {
      "id": 1235,
      "name": "document2.docx",
      "type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "size": 180000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    }
  ],
  "errors": []
}
```

If some files fail to upload, they are listed in the `errors` array:

{
  "documents": [
    {
      "id": 1234,
      "name": "document1.pdf",
      "type": "application/pdf",
      "size": 250000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    }
  ],
  "errors": [
    {
      "file": "document2.docx",
      "error": "Unsupported file format"
    }
  ]
}
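
Because a single response can carry both accepted documents and per-file failures, it can help to separate the two before deciding what to retry. The following is a minimal sketch, assuming the response has the `documents` and `errors` shape shown above; `summarize_upload` is a hypothetical helper name, not part of the API.

```python
def summarize_upload(result):
    """Split an upload response into accepted file names and per-file errors."""
    # Names of the documents the API accepted for processing
    accepted = [doc["name"] for doc in result.get("documents", [])]
    # Map each rejected file name to the error message reported for it
    failed = {err["file"]: err["error"] for err in result.get("errors", [])}
    return accepted, failed


# Sample body mirroring the partial-failure response shown above
response_body = {
    "documents": [{"id": 1234, "name": "document1.pdf", "status": "processing"}],
    "errors": [{"file": "document2.docx", "error": "Unsupported file format"}],
}
accepted, failed = summarize_upload(response_body)
print(accepted)  # ['document1.pdf']
print(failed)    # {'document2.docx': 'Unsupported file format'}
```

The files in `failed` are the candidates for correction and re-upload; the ones in `accepted` still need their processing status checked later.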

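## Authentication

This endpoint requires authentication with an API key in the `x-api-key` header.

## Supported File Formats

- PDF (.pdf)
- Microsoft Word (.doc, .docx)
- Microsoft Excel (.xls, .xlsx)
- Microsoft PowerPoint (.ppt, .pptx)
- Text files (.txt)
- Markdown (.md)
- JSON (.json)
- HTML (.html, .htm)

## File Size Limits

- Individual file size limit: 50MB
- Total request size limit: 100MB
- Maximum number of files per request: 20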
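
The limits above can be checked on the client before any request is sent. The sketch below groups files into batches that respect the per-file, per-request, and file-count limits and sets aside files with unsupported extensions. It is an illustration under stated assumptions: `plan_batches` is a hypothetical helper, 1MB is taken to be 1024 × 1024 bytes, and multipart encoding overhead is ignored, so a real implementation may want some headroom.

```python
from pathlib import Path

# Limits taken from the lists above; the byte interpretation of "MB" is an assumption
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_REQUEST_BYTES = 100 * 1024 * 1024
MAX_FILES_PER_REQUEST = 20
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".txt", ".md", ".json", ".html", ".htm",
}


def plan_batches(files):
    """Group (path, size_in_bytes) pairs into batches that respect the limits.

    Returns (batches, rejected), where rejected maps a path to the reason it was skipped.
    """
    batches, rejected = [], {}
    current, current_bytes = [], 0
    for path, size in files:
        if Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
            rejected[path] = "unsupported format"
            continue
        if size > MAX_FILE_BYTES:
            rejected[path] = "file exceeds 50MB"
            continue
        # Start a new batch when adding this file would break a per-request limit
        too_many = len(current) >= MAX_FILES_PER_REQUEST
        too_large = current_bytes + size > MAX_REQUEST_BYTES
        if current and (too_many or too_large):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, rejected


MB = 1024 * 1024
batches, rejected = plan_batches([
    ("a.pdf", 40 * MB), ("b.docx", 40 * MB), ("c.txt", 30 * MB),
    ("d.exe", 1 * MB), ("e.pdf", 60 * MB),
])
print(batches)   # [['a.pdf', 'b.docx'], ['c.txt']]
print(rejected)  # {'d.exe': 'unsupported format', 'e.pdf': 'file exceeds 50MB'}
```

In practice the sizes would come from something like `os.path.getsize`; passing them in explicitly keeps the planning step independent of the file system.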

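## Error Responses

| Status Code | Description |
|-------------|-------------|
| 400 | Bad Request - No files provided or invalid request format |
| 401 | Unauthorized - Invalid or missing API key |
| 404 | Not Found - Vector store not found |
| 413 | Payload Too Large - Files exceed the size limit |
| 415 | Unsupported Media Type - File format not supported |
| 500 | Internal Server Error |
| 207 | Multi-Status - Some files were uploaded successfully, but others failed |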

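## Processing Status

Files are initially accepted with a status of `processing`. You can check a file's processing status using the endpoint that lists the documents in a vector store. The status will be one of the following:

- `done`: the file was processed successfully
- `error`: an error occurred during processing
- `processing`: the file is still being processed

Processing time depends on file size and complexity. Typical processing time is between 30 seconds and 5 minutes per file.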
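
Since processing is asynchronous, a client usually polls until each document reaches `done` or `error`. The sketch below shows one way to structure that loop. It deliberately does not call the document-listing endpoint itself: `fetch_statuses` is a hypothetical caller-supplied function that returns a `{document_id: status}` mapping, and the 10-second interval and 10-minute timeout are assumptions chosen to sit comfortably around the typical processing times quoted above.

```python
import time


def wait_for_processing(fetch_statuses, document_ids, interval=10.0, timeout=600.0,
                        sleep=time.sleep):
    """Poll until every document leaves the "processing" state or the timeout elapses.

    fetch_statuses is a caller-supplied (hypothetical) function that returns
    {document_id: status}, for example by querying the document list endpoint.
    """
    pending = set(document_ids)
    final = {}
    waited = 0.0
    while pending:
        statuses = fetch_statuses()
        for doc_id in list(pending):
            # A document missing from the listing is treated as still processing
            status = statuses.get(doc_id, "processing")
            if status in ("done", "error"):
                final[doc_id] = status
                pending.discard(doc_id)
        if not pending or waited >= timeout:
            break
        sleep(interval)
        waited += interval
    # Anything still pending after the timeout is reported as "processing"
    for doc_id in pending:
        final[doc_id] = "processing"
    return final
```

The `sleep` parameter exists only so the loop can be exercised without real delays; in normal use the default `time.sleep` is fine.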

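## Batch Operations

To upload multiple files to your vector store efficiently, Rememberizer supports batch operations. This approach helps optimize performance when processing large numbers of documents.

### Batch Upload Implementation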

import os
import time
from pathlib import Path

import requests

def batch_upload_to_vector_store(vector_store_id, folder_path, batch_size=5, file_types=None):
    """
    Upload all files in a directory to a vector store in batches.

    Args:
        vector_store_id: ID of the vector store
        folder_path: Path to the folder containing the files to upload
        batch_size: Number of files to upload per batch
        file_types: Optional list of file extensions to filter by (e.g. ['.pdf', '.docx'])

    Returns:
        List of upload results
    """
    api_key = "YOUR_API_KEY"
    headers = {"x-api-key": api_key}

    # Collect the files in the directory
    files = []
    for entry in os.scandir(folder_path):
        if entry.is_file():
            file_path = Path(entry.path)
            # Filter by file extension if a filter was specified
            if file_types is None or file_path.suffix.lower() in file_types:
                files.append(file_path)

    print(f"Found {len(files)} files to upload")
    results = []

    # Process the files in batches
    for i in range(0, len(files), batch_size):
        batch = files[i:i+batch_size]
        print(f"Processing batch {i//batch_size + 1}/{(len(files) + batch_size - 1)//batch_size}: {len(batch)} files")

        upload_files = []
        try:
            # Open the files inside the try block so the handles are always closed
            for file_path in batch:
                upload_files.append(('files', (file_path.name, open(file_path, 'rb'))))

            response = requests.post(
                f"https://api.rememberizer.ai/api/v1/vector-stores/{vector_store_id}/documents/upload",
                headers=headers,
                files=upload_files
            )

            if response.status_code in (200, 201, 207):
                batch_result = response.json()
                results.append(batch_result)
                print(f"Batch uploaded successfully - processed {len(batch_result.get('documents', []))} files")

                # Check for per-file errors
                if batch_result.get('errors'):
                    print(f"Errors encountered: {len(batch_result['errors'])}")
                    for error in batch_result['errors']:
                        print(f"- {error['file']}: {error['error']}")
            else:
                print(f"Batch upload failed with status code {response.status_code}: {response.text}")
                results.append({"error": f"Batch failed: {response.text}"})

        except Exception as e:
            print(f"Exception during batch upload: {str(e)}")
            results.append({"error": str(e)})

        finally:
            # Close all file handles, whether or not the upload succeeded
            for _, (_, file_obj) in upload_files:
                file_obj.close()

        # Rate limiting: pause between batches
        if i + batch_size < len(files):
            print("Pausing before the next batch...")
            time.sleep(2)

    return results

# Example usage
results = batch_upload_to_vector_store(
    'vs_abc123',
    '/path/to/documents/folder',
    batch_size=5,
    file_types=['.pdf', '.docx', '.txt']
)

/**
 * Upload files to a vector store in batches
 * 
 * @param {string} vectorStoreId - ID of the vector store
 * @param {FileList|File[]} files - Files to upload
 * @param {Object} options - Configuration options
 * @returns {Promise<Array>} - List of upload results
 */
async function batchUploadToVectorStore(vectorStoreId, files, options = {}) {
  const {
    batchSize = 5,
    delayBetweenBatches = 2000,
    onProgress = null
  } = options;
  
  const apiKey = 'YOUR_API_KEY';
  const results = [];
  const fileList = Array.from(files);
  const totalBatches = Math.ceil(fileList.length / batchSize);
  
  console.log(`Preparing to upload ${fileList.length} files in ${totalBatches} batches`);
  
  // Process the files in batches
  for (let i = 0; i < fileList.length; i += batchSize) {
    const batch = fileList.slice(i, i + batchSize);
    const batchNumber = Math.floor(i / batchSize) + 1;
    
    console.log(`Processing batch ${batchNumber}/${totalBatches}: ${batch.length} files`);
    
    if (onProgress) {
      onProgress({
        currentBatch: batchNumber,
        totalBatches: totalBatches,
        filesInBatch: batch.length,
        totalFiles: fileList.length,
        completedFiles: i
      });
    }
    
    // Create the FormData for this batch
    const formData = new FormData();
    batch.forEach(file => {
      formData.append('files', file);
    });
    
    try {
      const response = await fetch(
        `https://api.rememberizer.ai/api/v1/vector-stores/${vectorStoreId}/documents/upload`,
        {
          method: 'POST',
          headers: {
            'x-api-key': apiKey
          },
          body: formData
        }
      );
      
      if (response.ok) {
        const batchResult = await response.json();
        results.push(batchResult);
        
        console.log(`Batch uploaded successfully - ${batchResult.documents?.length || 0} files processed`);
        
        // Check for per-file errors
        if (batchResult.errors && batchResult.errors.length > 0) {
          console.warn(`Errors encountered: ${batchResult.errors.length}`);
          batchResult.errors.forEach(error => {
            console.warn(`- ${error.file}: ${error.error}`);
          });
        }
      } else {
        console.error(`Batch upload failed with status ${response.status}: ${await response.text()}`);
        results.push({ error: `Batch failed with status: ${response.status}` });
      }
    } catch (error) {
      console.error(`Exception during batch upload: ${error.message}`);
      results.push({ error: error.message });
    }
    
    // Add a delay between batches to avoid rate limiting
    if (i + batchSize < fileList.length) {
      console.log(`Pausing ${delayBetweenBatches} ms before the next batch...`);
      await new Promise(resolve => setTimeout(resolve, delayBetweenBatches));
    }
  }
  
  console.log(`Upload complete. Processed ${fileList.length} files.`);
  return results;
}

// Example using a file input element
document.getElementById('upload-button').addEventListener('click', async () => {
  const fileInput = document.getElementById('file-input');
  const vectorStoreId = 'vs_abc123';
  
  const progressBar = document.getElementById('progress-bar');
  
  try {
    const results = await batchUploadToVectorStore(vectorStoreId, fileInput.files, {
      batchSize: 5,
      onProgress: (progress) => {
        // Update the progress UI
        const percentage = Math.round((progress.completedFiles / progress.totalFiles) * 100);
        progressBar.style.width = `${percentage}%`;
        progressBar.textContent = `${percentage}% (batch ${progress.currentBatch}/${progress.totalBatches})`;
      }
    });
    
    console.log('Complete upload results:', results);
  } catch (error) {
    console.error('Upload failed:', error);
  }
});

require 'net/http'
require 'uri'
require 'json'
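require 'mime/types' # provided by the mime-types gem

# Upload files to a vector store in batches
#
# @param vector_store_id [String] ID of the vector store
# @param folder_path [String] Path to the folder containing the files to upload
# @param batch_size [Integer] Number of files to upload per batch
# @param file_types [Array<String>] Optional array of file extensions to filter by
# @param delay_between_batches [Float] Seconds to wait between batches
# @return [Array] List of upload results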
def batch_upload_to_vector_store(vector_store_id, folder_path, batch_size: 5, file_types: nil, delay_between_batches: 2.0)
  api_key = 'YOUR_API_KEY'
  results = []
  
  # Collect the files in the directory
  files = Dir.entries(folder_path)
    .select { |f| File.file?(File.join(folder_path, f)) }
    .select { |f| file_types.nil? || file_types.include?(File.extname(f).downcase) }
    .map { |f| File.join(folder_path, f) }
  
  puts "Found #{files.count} files to upload"
  total_batches = (files.count.to_f / batch_size).ceil
  
  # Process the files in batches
  files.each_slice(batch_size).with_index do |batch, batch_index|
    puts "Processing batch #{batch_index + 1}/#{total_batches}: #{batch.count} files"
    
    # Prepare the HTTP request
    uri = URI("https://api.rememberizer.ai/api/v1/vector-stores/#{vector_store_id}/documents/upload")
    request = Net::HTTP::Post.new(uri)
    request['x-api-key'] = api_key
    
    # Create the multipart form boundary
    boundary = "RubyBoundary#{rand(1000000)}"
    request['Content-Type'] = "multipart/form-data; boundary=#{boundary}"
    
    # Build the request body
    body = []
    batch.each do |file_path|
      file_name = File.basename(file_path)
      mime_type = MIME::Types.type_for(file_path).first&.content_type || 'application/octet-stream'
      
      begin
        file_content = File.binread(file_path)
        
        body << "--#{boundary}\r\n"
        body << "Content-Disposition: form-data; name=\"files\"; filename=\"#{file_name}\"\r\n"
        body << "Content-Type: #{mime_type}\r\n\r\n"
        body << file_content
        body << "\r\n"
      rescue => e
        puts "Error reading file #{file_path}: #{e.message}"
      end
    end
    body << "--#{boundary}--\r\n"
    
    request.body = body.join
    
    # Send the request
    begin
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true
      response = http.request(request)
      
      if response.code.to_i == 200 || response.code.to_i == 201 || response.code.to_i == 207
        batch_result = JSON.parse(response.body)
        results << batch_result
        
        puts "Batch uploaded successfully - processed #{batch_result['documents']&.count || 0} files"
        
        # Check for per-file errors
        if batch_result['errors'] && !batch_result['errors'].empty?
          puts "Errors encountered: #{batch_result['errors'].count}"
          batch_result['errors'].each do |error|
            puts "- #{error['file']}: #{error['error']}"
          end
        end
      else
        puts "Batch upload failed with status code #{response.code}: #{response.body}"
        results << { "error" => "Batch failed: #{response.body}" }
      end
    rescue => e
      puts "Exception during batch upload: #{e.message}"
      results << { "error" => e.message }
    end
    
    # Rate limiting: pause between batches
    if batch_index < total_batches - 1
      puts "Pausing #{delay_between_batches} seconds before the next batch..."
      sleep(delay_between_batches)
    end
  end
  
  puts "Upload complete. Processed #{files.count} files."
  results
end

# Example usage
results = batch_upload_to_vector_store(
  'vs_abc123',
  '/path/to/documents/folder',
  batch_size: 5,
  file_types: ['.pdf', '.docx', '.txt'],
  delay_between_batches: 2.0
)

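### Batch Upload Best Practices

To optimize performance and reliability when uploading large numbers of files:

1. **Manage batch size**: Keep batches to 5-10 files for best performance. Too many files in a single request increases the risk of timeouts.
2. **Implement rate limiting**: Add a delay between batches (2-3 seconds recommended) to avoid hitting API rate limits.
3. **Add error retry logic**: For production systems, retry failed uploads with exponential backoff.
4. **Validate file types**: Pre-filter files before uploading to make sure they are of a supported type.
5. **Monitor batch progress**: For user-facing applications, provide progress feedback on batch operations.
6. **Handle partial success**: The API may return a 207 status code to indicate partial success. Always check the status of each individual document.
7. **Clean up resources**: Make sure all file handles are closed properly, including when errors occur.
8. **Parallelize wisely**: For very large uploads (thousands of files), consider running multiple concurrent batches against different vector stores, then merging the results if needed.
9. **Implement checksums**: For critical data, verify file integrity with checksums before and after upload.
10. **Log comprehensive results**: Keep detailed logs of all upload operations for troubleshooting.

By following these best practices, you can manage large-scale document ingestion into your vector store effectively.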
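
The retry recommendation above can be sketched as a small wrapper around whatever function uploads one batch. This is an illustration, not a prescribed implementation: `upload_with_retry` and `RetryableUploadError` are hypothetical names, and which failures count as retryable (for example a 500 response or a network error, as opposed to a 401 or 415) is an assumption left to the caller's upload function.

```python
import random
import time


class RetryableUploadError(Exception):
    """Raised by the caller's upload function for failures worth retrying."""


def upload_with_retry(upload_batch, batch, max_attempts=4, base_delay=1.0,
                      sleep=time.sleep):
    """Call upload_batch(batch), retrying with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_batch(batch)
        except RetryableUploadError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last failure to the caller
            # Delay doubles on each attempt (base, 2x, 4x, ...) plus up to 10% jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The jitter spreads out retries from clients that failed at the same moment. Non-retryable failures should be raised as a different exception type so they propagate immediately instead of being retried.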
