# Upload files to a vector store

Upload file content to a vector store using batch operations

POST /api/v1/vector-stores/{vector-store-id}/documents/upload

Upload files to a vector store.

Path parameters

vector-store-id · string · Required

The ID of the vector store.

Header parameters

x-api-key · string · Required

The API key for authentication.

Body

files · string · binary[] · Optional

The files to upload.

Responses

201

Files uploaded successfully.

application/json

207

Some files failed to upload.

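POST /api/v1/vector-stores/{vector-store-id}/documents/upload HTTP/1.1
Host: api.rememberizer.ai
x-api-key: YOUR_API_KEY
Content-Type: multipart/form-data; boundary=BOUNDARY
Accept: */*

--BOUNDARY
Content-Disposition: form-data; name="files"; filename="document1.pdf"
Content-Type: application/pdf

(binary file content)
--BOUNDARY--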

{
  "documents": [
    {
      "id": 1,
      "name": "text"
    }
  ],
  "errors": [
    {
      "file": "text",
      "error": "text"
    }
  ]
}

## Example requests

curl -X POST \
  https://api.rememberizer.ai/api/v1/vector-stores/vs_abc123/documents/upload \
  -H "x-api-key: YOUR_API_KEY" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.docx"

Replace `YOUR_API_KEY` with your actual Vector Store API key and `vs_abc123` with your Vector Store ID, and provide the paths to your local files.

const uploadFiles = async (vectorStoreId, files) => {
  const formData = new FormData();
  
  // Append each file to the form data
  for (const file of files) {
    formData.append('files', file);
  }
  
  const response = await fetch(`https://api.rememberizer.ai/api/v1/vector-stores/${vectorStoreId}/documents/upload`, {
    method: 'POST',
    headers: {
      'x-api-key': 'YOUR_API_KEY'
      // Note: do not set the Content-Type header. It is set automatically with the correct boundary.
    },
    body: formData
  });
  
  const data = await response.json();
  console.log(data);
};

// Example using a file input element
const fileInput = document.getElementById('fileInput');
uploadFiles('vs_abc123', fileInput.files);

Replace `YOUR_API_KEY` with your actual Vector Store API key and `vs_abc123` with your Vector Store ID.

import os
from contextlib import ExitStack

import requests

def upload_files(vector_store_id, file_paths):
    headers = {
        "x-api-key": "YOUR_API_KEY"
    }

    # Open the files inside an ExitStack so every handle is closed after the request
    with ExitStack() as stack:
        files = [
            ('files', (os.path.basename(file_path), stack.enter_context(open(file_path, 'rb'))))
            for file_path in file_paths
        ]

        response = requests.post(
            f"https://api.rememberizer.ai/api/v1/vector-stores/{vector_store_id}/documents/upload",
            headers=headers,
            files=files
        )

    data = response.json()
    print(data)
    return data

upload_files('vs_abc123', ['/path/to/document1.pdf', '/path/to/document2.docx'])

Replace `YOUR_API_KEY` with your actual Vector Store API key and `vs_abc123` with your Vector Store ID, and provide the paths to your local files.

require 'net/http'
require 'uri'
require 'json'

def upload_files(vector_store_id, file_paths)
  uri = URI("https://api.rememberizer.ai/api/v1/vector-stores/#{vector_store_id}/documents/upload")
  
  # Create a new HTTP object
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  
  # Create the multipart form request
  request = Net::HTTP::Post.new(uri)
  request['x-api-key'] = 'YOUR_API_KEY'
  
  # Create the multipart boundary
  boundary = "RubyFormBoundary#{rand(1000000)}"
  request['Content-Type'] = "multipart/form-data; boundary=#{boundary}"
  
  # Build the request body
  body = []
  file_paths.each do |file_path|
    file_name = File.basename(file_path)
    file_content = File.read(file_path, mode: 'rb')
    
    body << "--#{boundary}\r\n"
    body << "Content-Disposition: form-data; name=\"files\"; filename=\"#{file_name}\"\r\n"
    body << "Content-Type: #{get_content_type(file_name)}\r\n\r\n"
    body << file_content
    body << "\r\n"
  end
  body << "--#{boundary}--\r\n"
  
  request.body = body.join
  
  # Send the request
  response = http.request(request)
  
  # Parse and return the response
  JSON.parse(response.body)
end

# Helper method to determine the content type
def get_content_type(filename)
  ext = File.extname(filename).downcase
  case ext
  when '.pdf' then 'application/pdf'
  when '.doc' then 'application/msword'
  when '.docx' then 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  when '.txt' then 'text/plain'
  when '.md' then 'text/markdown'
  when '.json' then 'application/json'
  else 'application/octet-stream'
  end
end

# Example usage
result = upload_files('vs_abc123', ['/path/to/document1.pdf', '/path/to/document2.docx'])
puts result


<div data-gb-custom-block data-tag="hint" data-style='info'>

Replace `YOUR_API_KEY` with your actual Vector Store API key and `vs_abc123` with your Vector Store ID, and provide the paths to your local files.

</div>


## Path parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| vector-store-id | string | **Required.** The ID of the vector store to upload files to. |

## Request body

This endpoint accepts a `multipart/form-data` request with one or more files in the `files` field.

## Response format

```json
{
  "documents": [
    {
      "id": 1234,
      "name": "document1.pdf",
      "type": "application/pdf",
      "size": 250000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    },
    {
      "id": 1235,
      "name": "document2.docx",
      "type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "size": 180000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    }
  ],
  "errors": []
}
```

If a file fails to upload, it is listed in the `errors` array:

```json
{
  "documents": [
    {
      "id": 1234,
      "name": "document1.pdf",
      "type": "application/pdf",
      "size": 250000,
      "status": "processing",
      "created": "2023-06-15T10:15:00Z",
      "vector_store": "vs_abc123"
    }
  ],
  "errors": [
    {
      "file": "document2.docx",
      "error": "Unsupported file format"
    }
  ]
}
```
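
A caller usually needs to separate the accepted documents from the files that failed. The following is a minimal Python sketch, assuming the response body has the `documents` and `errors` shape shown above. The `summarize_upload` helper is hypothetical and not part of the Rememberizer API.

```python
def summarize_upload(result):
    """Split an upload response into accepted document IDs and failed files.

    Assumes the documents/errors shape documented on this page.
    summarize_upload is an illustrative helper, not an API call.
    """
    documents = result.get("documents") or []
    errors = result.get("errors") or []
    return {
        "accepted_ids": [doc["id"] for doc in documents],
        "failed": {err["file"]: err["error"] for err in errors},
        # True when some files were accepted and others were rejected
        "partial": bool(documents) and bool(errors),
    }


example = {
    "documents": [{"id": 1234, "name": "document1.pdf", "status": "processing"}],
    "errors": [{"file": "document2.docx", "error": "Unsupported file format"}],
}
summary = summarize_upload(example)
print(summary["accepted_ids"])
print(summary["failed"])
```

The `accepted_ids` list can then be used to follow the processing status of each document.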

## Authentication

This endpoint requires authentication using an API key in the `x-api-key` header.

## Supported file formats

- PDF (.pdf)
- Microsoft Word (.doc, .docx)
- Microsoft Excel (.xls, .xlsx)
- Microsoft PowerPoint (.ppt, .pptx)
- Text files (.txt)
- Markdown (.md)
- JSON (.json)
- HTML (.html, .htm)
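
Filtering files on the client before uploading avoids sending files the endpoint would reject. The following is a minimal Python sketch, assuming that checking the file extension is enough for a first pass; the server still makes the final decision. The `split_by_support` helper name is hypothetical.

```python
from pathlib import Path

# Extensions copied from the supported file formats list on this page.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".txt", ".md", ".json", ".html", ".htm",
}


def split_by_support(paths):
    """Return (supported, unsupported) lists, judged by file extension only."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        # Compare case-insensitively so "REPORT.PDF" is accepted too
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, rejected = split_by_support(["report.PDF", "notes.md", "photo.png"])
print([p.name for p in ok])
print([p.name for p in rejected])
```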

## File size limits

- Individual file size limit: 50MB
- Total request size limit: 100MB
- Maximum number of files per request: 20
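
These limits can be applied on the client when grouping files into requests. The following is a rough Python sketch, assuming 1MB means 1024 * 1024 bytes and ignoring the small overhead that multipart encoding adds to each request. The `plan_batches` helper is hypothetical.

```python
MAX_FILE_BYTES = 50 * 1024 * 1024      # per-file limit (assumes 1MB = 1024 * 1024 bytes)
MAX_REQUEST_BYTES = 100 * 1024 * 1024  # total request limit
MAX_FILES_PER_REQUEST = 20


def plan_batches(files):
    """Group (name, size_in_bytes) pairs into batches that respect the limits.

    Returns (batches, oversized): batches is a list of lists of names, and
    oversized lists the names that exceed the per-file limit.
    """
    batches, oversized = [], []
    current, current_bytes = [], 0
    for name, size in files:
        if size > MAX_FILE_BYTES:
            oversized.append(name)
            continue
        # Start a new batch when adding this file would break a limit
        too_many = len(current) >= MAX_FILES_PER_REQUEST
        too_big = current_bytes + size > MAX_REQUEST_BYTES
        if current and (too_many or too_big):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, oversized


MB = 1024 * 1024
batches, oversized = plan_batches([
    ("a.pdf", 40 * MB), ("b.pdf", 45 * MB), ("c.pdf", 30 * MB), ("huge.pdf", 60 * MB),
])
print(batches)    # [['a.pdf', 'b.pdf'], ['c.pdf']]
print(oversized)  # ['huge.pdf']
```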

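## Error responses

| Status code | Description |
|-------------|-------------|
| 400 | Bad request - no files provided or the request format is invalid |
| 401 | Unauthorized - invalid or missing API key |
| 404 | Not found - vector store not found |
| 413 | Payload too large - files exceed the size limit |
| 415 | Unsupported media type - file format not supported |
| 500 | Internal server error |
| 207 | Multi-status - some files uploaded successfully, but others failed |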
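
Client code can turn these status codes into a result or an exception. The following is a minimal Python sketch; the `UploadError` class and the `check_upload_status` function are hypothetical names, and the descriptions are copied from the status codes listed above.

```python
# Descriptions taken from the error responses listed on this page.
STATUS_DESCRIPTIONS = {
    400: "Bad request: no files provided or the request format is invalid",
    401: "Unauthorized: invalid or missing API key",
    404: "Not found: vector store not found",
    413: "Payload too large: files exceed the size limit",
    415: "Unsupported media type: file format not supported",
    500: "Internal server error",
}


class UploadError(Exception):
    """Hypothetical exception type for a failed upload request."""


def check_upload_status(status_code):
    """Return 'complete' or 'partial' for success codes, raise UploadError otherwise."""
    if status_code == 201:
        return "complete"
    if status_code == 207:
        # Some files failed: the caller should inspect the errors array
        return "partial"
    description = STATUS_DESCRIPTIONS.get(status_code, "Unexpected status")
    raise UploadError(f"{status_code}: {description}")


print(check_upload_status(207))
```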

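## Processing status

Files are initially accepted with the `processing` status. You can check a document's processing status with the endpoint that lists the documents in a vector store. The status is one of the following:

- `done`: the document was processed successfully
- `error`: an error occurred during processing
- `processing`: the document is still being processed

Processing time depends on file size and complexity. Typical processing time is between 30 seconds and 5 minutes per document.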
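
Waiting for processing to finish can be sketched as a polling loop. This page does not give the URL of the list-documents endpoint, so the Python example below takes a hypothetical `fetch_documents` callable that is assumed to return a list of dicts with `id` and `status` keys. The `wait_for_processing` name and its default interval and timeout are also assumptions.

```python
import time

TERMINAL_STATUSES = {"done", "error"}


def wait_for_processing(fetch_documents, document_ids, interval=5.0,
                        timeout=300.0, sleep=time.sleep):
    """Poll until every document is done or error, or until the timeout passes.

    fetch_documents is a hypothetical callable wrapping the list-documents
    endpoint. Returns a dict mapping document id to its last seen status.
    """
    wanted = set(document_ids)
    deadline = time.monotonic() + timeout
    statuses = {}
    while True:
        for doc in fetch_documents():
            if doc["id"] in wanted:
                statuses[doc["id"]] = doc["status"]
        finished = all(statuses.get(doc_id) in TERMINAL_STATUSES for doc_id in wanted)
        if finished or time.monotonic() >= deadline:
            return statuses
        sleep(interval)


# Fake fetcher for illustration: both documents finish on the second poll.
responses = iter([
    [{"id": 1234, "status": "processing"}, {"id": 1235, "status": "processing"}],
    [{"id": 1234, "status": "done"}, {"id": 1235, "status": "error"}],
])
final = wait_for_processing(lambda: next(responses), [1234, 1235],
                            interval=0, sleep=lambda seconds: None)
print(final)  # {1234: 'done', 1235: 'error'}
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.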

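## Batch operations

To upload many files to a vector store efficiently, Rememberizer supports batch operations. This approach helps optimize performance when processing a large number of documents.

### Batch upload implementation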

import os
import time
from pathlib import Path

import requests

def batch_upload_to_vector_store(vector_store_id, folder_path, batch_size=5, file_types=None):
    """
    Upload all files in a directory to a vector store in batches.
    
    Args:
        vector_store_id: ID of the vector store
        folder_path: Path to the folder containing the files to upload
        batch_size: Number of files to upload in each batch
        file_types: Optional list of file extensions to filter by (e.g. ['.pdf', '.docx'])
        
    Returns:
        List of upload results, one per batch
    """
    api_key = "YOUR_API_KEY"
    headers = {"x-api-key": api_key}
    
    # Get the list of files in the directory
    files = []
    for entry in os.scandir(folder_path):
        if entry.is_file():
            file_path = Path(entry.path)
            # Filter by file extension if specified
            if file_types is None or file_path.suffix.lower() in file_types:
                files.append(file_path)
    
    print(f"Found {len(files)} files to upload")
    results = []
    
    # Process the files in batches
    for i in range(0, len(files), batch_size):
        batch = files[i:i+batch_size]
        print(f"Processing batch {i//batch_size + 1}/{(len(files) + batch_size - 1)//batch_size}: {len(batch)} files")
        
        # Open and upload the batch; the finally clause closes every file handle
        upload_files = []
        try:
            for file_path in batch:
                upload_files.append(('files', (file_path.name, open(file_path, 'rb'))))
            
            response = requests.post(
                f"https://api.rememberizer.ai/api/v1/vector-stores/{vector_store_id}/documents/upload",
                headers=headers,
                files=upload_files
            )
            
            if response.status_code in (200, 201, 207):
                batch_result = response.json()
                results.append(batch_result)
                print(f"Batch uploaded successfully - {len(batch_result.get('documents', []))} documents processed")
                
                # Check for per-file errors
                errors = batch_result.get('errors') or []
                if errors:
                    print(f"Errors encountered: {len(errors)}")
                    for error in errors:
                        print(f"- {error['file']}: {error['error']}")
            else:
                print(f"Batch upload failed with status code {response.status_code}: {response.text}")
                results.append({"error": f"Batch failed: {response.text}"})
                
        except Exception as e:
            print(f"Exception during batch upload: {str(e)}")
            results.append({"error": str(e)})
            
        finally:
            # Close all file handles, whether the upload succeeded or failed
            for _, (_, file_obj) in upload_files:
                file_obj.close()
        
        # Rate limiting - pause between batches
        if i + batch_size < len(files):
            print("Pausing before the next batch...")
            time.sleep(2)
    
    return results

# Example usage
results = batch_upload_to_vector_store(
    'vs_abc123',
    '/path/to/documents/folder',
    batch_size=5,
    file_types=['.pdf', '.docx', '.txt']
)

/**
 * Upload files to a vector store in batches.
 * 
 * @param {string} vectorStoreId - ID of the vector store
 * @param {FileList|File[]} files - Files to upload
 * @param {Object} options - Configuration options
 * @returns {Promise<Array>} - List of upload results
 */
async function batchUploadToVectorStore(vectorStoreId, files, options = {}) {
  const {
    batchSize = 5,
    delayBetweenBatches = 2000,
    onProgress = null
  } = options;
  
  const apiKey = 'YOUR_API_KEY';
  const results = [];
  const fileList = Array.from(files);
  const totalBatches = Math.ceil(fileList.length / batchSize);
  
  console.log(`Preparing to upload ${fileList.length} files in ${totalBatches} batches`);
  
  // Process the files in batches
  for (let i = 0; i < fileList.length; i += batchSize) {
    const batch = fileList.slice(i, i + batchSize);
    const batchNumber = Math.floor(i / batchSize) + 1;
    
    console.log(`Processing batch ${batchNumber}/${totalBatches}: ${batch.length} files`);
    
    if (onProgress) {
      onProgress({
        currentBatch: batchNumber,
        totalBatches: totalBatches,
        filesInBatch: batch.length,
        totalFiles: fileList.length,
        completedFiles: i
      });
    }
    
    // Create FormData for this batch
    const formData = new FormData();
    batch.forEach(file => {
      formData.append('files', file);
    });
    
    try {
      const response = await fetch(
        `https://api.rememberizer.ai/api/v1/vector-stores/${vectorStoreId}/documents/upload`,
        {
          method: 'POST',
          headers: {
            'x-api-key': apiKey
          },
          body: formData
        }
      );
      
      if (response.ok) {
        const batchResult = await response.json();
        results.push(batchResult);
        
        console.log(`Batch uploaded successfully - ${batchResult.documents?.length || 0} documents processed`);
        
        // Check for errors
        if (batchResult.errors && batchResult.errors.length > 0) {
          console.warn(`Errors encountered: ${batchResult.errors.length}`);
          batchResult.errors.forEach(error => {
            console.warn(`- ${error.file}: ${error.error}`);
          });
        }
      } else {
        console.error(`Batch upload failed with status ${response.status}: ${await response.text()}`);
        results.push({ error: `Batch failed with status: ${response.status}` });
      }
    } catch (error) {
      console.error(`Exception during batch upload: ${error.message}`);
      results.push({ error: error.message });
    }
    
    // Add a delay between batches to avoid rate limiting
    if (i + batchSize < fileList.length) {
      console.log(`Pausing for ${delayBetweenBatches}ms before the next batch...`);
      await new Promise(resolve => setTimeout(resolve, delayBetweenBatches));
    }
  }
  
  console.log(`Upload complete. Processed ${fileList.length} files.`);
  return results;
}

// Example usage with a file input element
document.getElementById('upload-button').addEventListener('click', async () => {
  const fileInput = document.getElementById('file-input');
  const vectorStoreId = 'vs_abc123';
  
  const progressBar = document.getElementById('progress-bar');
  
  try {
    const results = await batchUploadToVectorStore(vectorStoreId, fileInput.files, {
      batchSize: 5,
      onProgress: (progress) => {
        // Update the progress UI
        const percentage = Math.round((progress.completedFiles / progress.totalFiles) * 100);
        progressBar.style.width = `${percentage}%`;
        progressBar.textContent = `${percentage}% (batch ${progress.currentBatch}/${progress.totalBatches})`;
      }
    });
    
    console.log('Upload results:', results);
  } catch (error) {
    console.error('Upload failed:', error);
  }
});

require 'net/http'
require 'uri'
require 'json'
require 'mime/types'

# Upload files to a vector store in batches
#
# @param vector_store_id [String] ID of the vector store
# @param folder_path [String] Path to the folder containing the files to upload
# @param batch_size [Integer] Number of files to upload in each batch
# @param file_types [Array<String>] Optional array of file extensions to filter by
# @param delay_between_batches [Float] Seconds to wait between batches
# @return [Array] List of upload results
def batch_upload_to_vector_store(vector_store_id, folder_path, batch_size: 5, file_types: nil, delay_between_batches: 2.0)
  api_key = 'YOUR_API_KEY'
  results = []
  
  # Get the list of files in the directory
  files = Dir.entries(folder_path)
    .select { |f| File.file?(File.join(folder_path, f)) }
    .select { |f| file_types.nil? || file_types.include?(File.extname(f).downcase) }
    .map { |f| File.join(folder_path, f) }
  
  puts "Found #{files.count} files to upload"
  total_batches = (files.count.to_f / batch_size).ceil
  
  # Process the files in batches
  files.each_slice(batch_size).with_index do |batch, batch_index|
    puts "Processing batch #{batch_index + 1}/#{total_batches}: #{batch.count} files"
    
    # Prepare the HTTP request
    uri = URI("https://api.rememberizer.ai/api/v1/vector-stores/#{vector_store_id}/documents/upload")
    request = Net::HTTP::Post.new(uri)
    request['x-api-key'] = api_key
    
    # Create the multipart form boundary
    boundary = "RubyBoundary#{rand(1000000)}"
    request['Content-Type'] = "multipart/form-data; boundary=#{boundary}"
    
    # Build the request body
    body = []
    batch.each do |file_path|
      file_name = File.basename(file_path)
      mime_type = MIME::Types.type_for(file_path).first&.content_type || 'application/octet-stream'
      
      begin
        file_content = File.binread(file_path)
        
        body << "--#{boundary}\r\n"
        body << "Content-Disposition: form-data; name=\"files\"; filename=\"#{file_name}\"\r\n"
        body << "Content-Type: #{mime_type}\r\n\r\n"
        body << file_content
        body << "\r\n"
      rescue => e
        puts "Error reading file #{file_path}: #{e.message}"
      end
    end
    body << "--#{boundary}--\r\n"
    
    request.body = body.join
    
    # Send the request
    begin
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true
      response = http.request(request)
      
      if response.code.to_i == 200 || response.code.to_i == 201 || response.code.to_i == 207
        batch_result = JSON.parse(response.body)
        results << batch_result
        
        puts "Batch uploaded successfully - #{batch_result['documents']&.count || 0} documents processed"
        
        # Check for errors
        if batch_result['errors'] && !batch_result['errors'].empty?
          puts "Errors encountered: #{batch_result['errors'].count}"
          batch_result['errors'].each do |error|
            puts "- #{error['file']}: #{error['error']}"
          end
        end
      else
        puts "Batch upload failed with status code #{response.code}: #{response.body}"
        results << { "error" => "Batch failed: #{response.body}" }
      end
    rescue => e
      puts "Exception during batch upload: #{e.message}"
      results << { "error" => e.message }
    end
    
    # Rate limiting - pause between batches
    if batch_index < total_batches - 1
      puts "Pausing for #{delay_between_batches} seconds before the next batch..."
      sleep(delay_between_batches)
    end
  end
  
  puts "Upload complete. Processed #{files.count} files."
  results
end

# Example usage
results = batch_upload_to_vector_store(
  'vs_abc123',
  '/path/to/documents/folder',
  batch_size: 5,
  file_types: ['.pdf', '.docx', '.txt'],
  delay_between_batches: 2.0
)

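### Batch upload best practices

To optimize performance and reliability when uploading a large number of files:

1. **Manage batch size**: Keep batches between 5 and 10 files for best performance. Too many files in a single request increases the risk of timeouts.
2. **Implement rate limiting**: Add a delay between batches (2-3 seconds recommended) to avoid hitting API rate limits.
3. **Add error retry logic**: For production systems, implement retry logic with exponential backoff for failed uploads.
4. **Validate file types**: Pre-filter files to confirm they are of a supported type before attempting to upload them.
5. **Monitor batch progress**: For user-facing applications, provide progress feedback on batch operations.
6. **Handle partial success**: The API may return a 207 status code for partial success. Always check the status of each individual document.
7. **Clean up resources**: Make sure all file handles are closed properly, especially when errors occur.
8. **Parallelize carefully**: For very large uploads (thousands of files), consider running multiple concurrent batch processes that target different vector stores, then combine the results afterwards as needed.
9. **Implement checksums**: For critical data, use checksums to verify file integrity before and after upload.
10. **Log comprehensive results**: Keep detailed logs of all upload operations for troubleshooting.

Following these best practices helps you manage large-scale document ingestion into vector stores efficiently.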
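
The retry recommendation above can be sketched as a small wrapper. This is a minimal Python illustration of exponential backoff, not an official client: `retry_with_backoff` and its parameters are hypothetical, and `operation` stands for any zero-argument function that uploads one batch and raises on failure.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The delay grows as base_delay * 2**attempt, capped at max_delay,
    with up to 10% random jitter added to spread out retries.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))


# Illustration with a fake operation that fails twice before succeeding.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return {"documents": [], "errors": []}


result = retry_with_backoff(flaky_upload, sleep=lambda seconds: None)
print(calls["count"], result)  # 3 {'documents': [], 'errors': []}
```

In real use, retrying only on network errors and 5xx responses is usually safer than retrying on every exception, since a 400 or 415 response will fail again with the same input.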
