Rememberizer Docs

Search for documents by semantic similarity

Semantic search endpoint with batch processing capabilities

Last updated 1 month ago

Example Requests

curl -X GET \
  "https://api.rememberizer.ai/api/v1/documents/search/?q=How%20to%20integrate%20Rememberizer%20with%20custom%20applications&n=5&from=2023-01-01T00:00:00Z&to=2023-12-31T23:59:59Z" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Replace YOUR_JWT_TOKEN with your actual JWT token.

const searchDocuments = async (query, numResults = 5, from = null, to = null) => {
  const url = new URL('https://api.rememberizer.ai/api/v1/documents/search/');
  url.searchParams.append('q', query);
  url.searchParams.append('n', numResults);
  
  if (from) {
    url.searchParams.append('from', from);
  }
  
  if (to) {
    url.searchParams.append('to', to);
  }
  
  const response = await fetch(url.toString(), {
    method: 'GET',
    headers: {
      'Authorization': 'Bearer YOUR_JWT_TOKEN'
    }
  });
  
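  // Fail loudly on 4xx/5xx responses instead of parsing an error body
  if (!response.ok) {
    throw new Error(`Search failed with status ${response.status}`);
  }
  
  const data = await response.json();
  console.log(data);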
};

searchDocuments('How to integrate Rememberizer with custom applications', 5);

import requests

def search_documents(query, num_results=5, from_date=None, to_date=None):
    headers = {
        "Authorization": "Bearer YOUR_JWT_TOKEN"
    }
    
    params = {
        "q": query,
        "n": num_results
    }
    
    if from_date:
        params["from"] = from_date
    
    if to_date:
        params["to"] = to_date
    
    response = requests.get(
        "https://api.rememberizer.ai/api/v1/documents/search/",
        headers=headers,
        params=params
    )
    
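    # Raise an exception for 4xx/5xx responses instead of parsing an error body
    response.raise_for_status()
    data = response.json()
    print(data)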

search_documents("How to integrate Rememberizer with custom applications", 5)

require 'net/http'
require 'uri'
require 'json'

def search_documents(query, num_results=5, from_date=nil, to_date=nil)
  uri = URI('https://api.rememberizer.ai/api/v1/documents/search/')
  params = {
    q: query,
    n: num_results
  }
  
  params[:from] = from_date if from_date
  params[:to] = to_date if to_date
  
  uri.query = URI.encode_www_form(params)
  
  request = Net::HTTP::Get.new(uri)
  request['Authorization'] = 'Bearer YOUR_JWT_TOKEN'
  
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  
  response = http.request(request)
  data = JSON.parse(response.body)
  puts data
end

search_documents("How to integrate Rememberizer with custom applications", 5)

Query Parameters

Parameter | Type | Description
q | string | Required. The search query text (up to 400 words).
n | integer | Number of results to return. Default: 3. Use higher values (e.g., 10) for more comprehensive results.
from | string | Start of the time range for documents to be searched, in ISO 8601 format.
to | string | End of the time range for documents to be searched, in ISO 8601 format.
prev_chunks | integer | Number of preceding chunks to include for context. Default: 2.
next_chunks | integer | Number of following chunks to include for context. Default: 2.
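
The request examples on this page pass only q, n, from, and to. As a minimal sketch, the helper below builds a request URL that also sets prev_chunks and next_chunks. The name build_search_url and its argument names are hypothetical; only the endpoint URL and the query parameter names come from this page.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.rememberizer.ai/api/v1/documents/search/"


def build_search_url(query, n=3, prev_chunks=2, next_chunks=2,
                     from_time=None, to_time=None):
    """Build a search URL from the documented query parameters.

    Illustrative helper only; it is not part of any Rememberizer SDK.
    """
    params = {
        "q": query,
        "n": n,
        "prev_chunks": prev_chunks,
        "next_chunks": next_chunks,
    }
    # The ISO 8601 bounds are optional, so send them only when provided.
    if from_time:
        params["from"] = from_time
    if to_time:
        params["to"] = to_time
    return f"{SEARCH_URL}?{urlencode(params)}"


url = build_search_url("vector embeddings", n=5, prev_chunks=1, next_chunks=1)
print(url)
```

Building the URL separately from sending it makes it easy to log or inspect the exact request before adding the Authorization header.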

Response Format

{
  "data_sources": [
    {
      "name": "Google Drive",
      "documents": 3
    },
    {
      "name": "Slack",
      "documents": 2
    }
  ],
  "matched_chunks": [
    {
      "document": {
        "id": 12345,
        "document_id": "1aBcD2efGhIjK3lMnOpQrStUvWxYz",
        "name": "Rememberizer API Documentation.pdf",
        "type": "application/pdf",
        "path": "/Documents/Rememberizer/API Documentation.pdf",
        "url": "https://drive.google.com/file/d/1aBcD2efGhIjK3lMnOpQrStUvWxYz/view",
        "size": 250000,
        "created_time": "2023-05-10T14:30:00Z",
        "modified_time": "2023-06-15T09:45:00Z",
        "indexed_on": "2023-06-15T10:30:00Z",
        "integration": {
          "id": 101,
          "integration_type": "google_drive"
        }
      },
      "matched_content": "To integrate Rememberizer with custom applications, you can use the OAuth2 authentication flow to authorize your application to access a user's Rememberizer data. Once authorized, your application can use the Rememberizer APIs to search for documents, retrieve content, and more.",
      "distance": 0.123
    },
    // ... more matched chunks
  ],
  "message": "Search completed successfully",
  "code": "success"
}
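
To show how a client might consume this response, the sketch below pulls the document name, distance, and a short snippet out of each matched chunk. It assumes that a smaller distance means a closer match, which this page does not state explicitly. The function summarize_chunks is hypothetical, and the second chunk in the sample data is invented to show the ordering.

```python
def summarize_chunks(response, snippet_length=60):
    """Return (document name, distance, snippet) tuples, smallest distance first."""
    chunks = response.get("matched_chunks", [])
    # Assumption: a smaller distance means a closer semantic match.
    ordered = sorted(chunks, key=lambda chunk: chunk["distance"])
    return [
        (
            chunk["document"]["name"],
            chunk["distance"],
            chunk["matched_content"][:snippet_length],
        )
        for chunk in ordered
    ]


# Trimmed sample shaped like the response format above.
sample_response = {
    "matched_chunks": [
        {
            "document": {"name": "Rememberizer API Documentation.pdf"},
            "matched_content": "To integrate Rememberizer with custom applications, "
                               "you can use the OAuth2 authentication flow.",
            "distance": 0.123,
        },
        {
            # Invented second chunk, included only to show the ordering.
            "document": {"name": "Example Notes.txt"},
            "matched_content": "Hypothetical chunk text.",
            "distance": 0.05,
        },
    ]
}

for name, distance, snippet in summarize_chunks(sample_response):
    print(f"{distance:.3f}  {name}: {snippet}")
```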

Search Optimization Tips

For Question Answering

When searching for an answer to a question, try formulating your query as if it were an ideal answer. For example:

Instead of: "What is vector embedding?"

Try: "Vector embedding is a technique that converts text into numerical vectors in a high-dimensional space."

Adjusting Result Count

  • Start with n=3 for quick, high-relevance results

  • Increase to n=10 or higher for more comprehensive information

  • If search returns insufficient information, try increasing the n parameter
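
One way to apply these tips is to retry with a larger n only when the first response is not enough. The sketch below is an assumption-laden illustration: search_with_escalation, the n values (3, 10, 20), and the enough_text check are all hypothetical, and the caller supplies both the search function and the definition of "enough". A stand-in search function keeps the example runnable offline.

```python
def search_with_escalation(search_fn, query, is_sufficient, n_values=(3, 10, 20)):
    """Call search_fn(query, n) with growing n until is_sufficient(response) is true.

    Returns the last response received, even if it was never judged sufficient.
    """
    response = None
    for n in n_values:
        response = search_fn(query, n)
        if is_sufficient(response):
            break
    return response


def fake_search(query, n):
    # Stand-in for a real API call so the sketch runs offline.
    return {"matched_chunks": [{"matched_content": "x" * 100} for _ in range(n)]}


def enough_text(response, min_chars=500):
    # Hypothetical sufficiency check: total matched text of at least min_chars.
    total = sum(len(chunk["matched_content"]) for chunk in response["matched_chunks"])
    return total >= min_chars


result = search_with_escalation(fake_search, "vector embedding", enough_text)
print(len(result["matched_chunks"]))  # prints 10
```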

Time-Based Filtering

Use the from and to parameters to focus on documents from specific time periods:

  • Recent documents: Set from to a recent date

  • Historical analysis: Specify a specific date range

  • Excluding outdated information: Set an appropriate to date
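
For the "recent documents" case, the from and to values can be computed rather than typed by hand. The sketch below is a minimal example; time_window is a hypothetical helper, and the timestamp layout is copied from the curl example on this page (UTC with a trailing Z).

```python
from datetime import datetime, timedelta, timezone


def time_window(days_back, now=None):
    """Return from/to query parameters covering the last `days_back` days."""
    # Normalize to UTC so the trailing "Z" in the output is accurate.
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(days=days_back)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # same layout as the timestamps in the curl example
    return {"from": start.strftime(fmt), "to": end.strftime(fmt)}


params = {"q": "release notes", "n": 5}
# A fixed "now" keeps the example reproducible; omit it to use the current time.
params.update(time_window(30, now=datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc)))
print(params)
```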

Batch Operations

To handle large volumes of search queries efficiently, you can batch them on the client side. The examples below send each batch of requests to the search endpoint in parallel and pause between batches to avoid API throttling.

Batch Search

import requests
import time
import json
from concurrent.futures import ThreadPoolExecutor

def batch_search_documents(queries, num_results=5, batch_size=10):
    """
    Perform batch searches with multiple queries
    
    Args:
        queries: List of search query strings
        num_results: Number of results to return per query
        batch_size: Number of queries to process in parallel
    
    Returns:
        List of search results for each query
    """
    headers = {
        "Authorization": "Bearer YOUR_JWT_TOKEN",
        "Content-Type": "application/json"
    }
    
    results = []
    
    # Process queries in batches
    for i in range(0, len(queries), batch_size):
        batch = queries[i:i+batch_size]
        
        # Create a thread pool to send requests in parallel
        with ThreadPoolExecutor(max_workers=batch_size) as executor:
            futures = []
            
            for query in batch:
                params = {
                    "q": query,
                    "n": num_results
                }
                
                future = executor.submit(
                    requests.get,
                    "https://api.rememberizer.ai/api/v1/documents/search/",
                    headers=headers,
                    params=params
                )
                futures.append(future)
            
            # Collect results as they complete
            for future in futures:
                response = future.result()
                results.append(response.json())
        
        # Rate limiting - pause between batches to avoid API throttling
        if i + batch_size < len(queries):
            time.sleep(1)
    
    return results

# Example usage
queries = [
    "How to use OAuth with Rememberizer",
    "Vector database configuration options",
    "Best practices for semantic search",
    # Add more queries as needed
]

results = batch_search_documents(queries, num_results=3, batch_size=5)

/**
 * Perform batch searches with multiple queries
 * 
 * @param {string[]} queries - List of search query strings
 * @param {number} numResults - Number of results to return per query
 * @param {number} batchSize - Number of queries to process in parallel
 * @param {number} delayBetweenBatches - Milliseconds to wait between batches
 * @returns {Promise<Array>} - List of search results for each query
 */
async function batchSearchDocuments(queries, numResults = 5, batchSize = 10, delayBetweenBatches = 1000) {
  const results = [];
  
  // Process queries in batches
  for (let i = 0; i < queries.length; i += batchSize) {
    const batch = queries.slice(i, i + batchSize);
    
    // Create an array of promises for concurrent requests
    const batchPromises = batch.map(query => {
      const url = new URL('https://api.rememberizer.ai/api/v1/documents/search/');
      url.searchParams.append('q', query);
      url.searchParams.append('n', numResults);
      
      return fetch(url.toString(), {
        method: 'GET',
        headers: {
          'Authorization': 'Bearer YOUR_JWT_TOKEN'
        }
      }).then(response => response.json());
    });
    
    // Wait for all requests in the batch to complete
    const batchResults = await Promise.all(batchPromises);
    results.push(...batchResults);
    
    // Rate limiting - pause between batches to avoid API throttling
    if (i + batchSize < queries.length) {
      await new Promise(resolve => setTimeout(resolve, delayBetweenBatches));
    }
  }
  
  return results;
}

// Example usage
const queries = [
  "How to use OAuth with Rememberizer",
  "Vector database configuration options",
  "Best practices for semantic search",
  // Add more queries as needed
];

batchSearchDocuments(queries, 3, 5)
  .then(results => console.log(results))
  .catch(error => console.error('Error in batch search:', error));

require 'net/http'
require 'uri'
require 'json'
require 'concurrent'

# Perform batch searches with multiple queries
#
# @param queries [Array<String>] List of search query strings
# @param num_results [Integer] Number of results to return per query
# @param batch_size [Integer] Number of queries to process in parallel
# @param delay_between_batches [Float] Seconds to wait between batches
# @return [Array] List of search results for each query
def batch_search_documents(queries, num_results = 5, batch_size = 10, delay_between_batches = 1.0)
  results = []
  
  # Create one thread pool up front so it stays in scope for the shutdown below
  pool = Concurrent::FixedThreadPool.new(batch_size)
  batch_count = (queries.length / batch_size.to_f).ceil
  
  # Process queries in batches
  queries.each_slice(batch_size).with_index do |batch, batch_index|
    # Submit one future per query to the shared pool
    futures = batch.map do |query|
      Concurrent::Future.execute(executor: pool) do
        uri = URI('https://api.rememberizer.ai/api/v1/documents/search/')
        params = {
          q: query,
          n: num_results
        }
        
        uri.query = URI.encode_www_form(params)
        
        request = Net::HTTP::Get.new(uri)
        request['Authorization'] = 'Bearer YOUR_JWT_TOKEN'
        
        http = Net::HTTP.new(uri.host, uri.port)
        http.use_ssl = true
        
        response = http.request(request)
        JSON.parse(response.body)
      end
    end
    
    # Collect results from all threads
    batch_results = futures.map(&:value)
    results.concat(batch_results)
    
    # Rate limiting - pause between batches to avoid API throttling
    sleep(delay_between_batches) if batch_index < batch_count - 1
  end
  
  pool.shutdown
  results
end

# Example usage
queries = [
  "How to use OAuth with Rememberizer",
  "Vector database configuration options",
  "Best practices for semantic search",
  # Add more queries as needed
]

results = batch_search_documents(queries, 3, 5)
puts results

Performance Considerations

When implementing batch operations, consider these best practices:

  1. Optimal Batch Size: Start with batch sizes of 5-10 queries and adjust based on your application's performance characteristics.

  2. Rate Limiting: Include delays between batches to prevent API throttling. A good starting point is 1 second between batches.

  3. Error Handling: Implement robust error handling to manage failed requests within batches.

  4. Resource Management: Monitor client-side resource usage, particularly with large batch sizes, to prevent excessive memory consumption.

  5. Response Processing: Process batch results asynchronously when possible to improve user experience.
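
The error-handling point can be sketched as follows: each query is retried a few times with a growing delay, and queries that still fail are reported separately so they do not abort the rest of the batch. The names run_batch and fetch_with_retry and the retry settings are assumptions made for illustration. The fetch argument stands for any function that performs one search request, and a stand-in is used here so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(fetch, queries, max_workers=5, max_retries=2, backoff_seconds=0.5):
    """Run fetch(query) for each query in parallel, retrying failed calls.

    Returns (results, errors), two dicts keyed by query string.
    Queries are assumed to be unique within a batch.
    """
    def fetch_with_retry(query):
        for attempt in range(max_retries + 1):
            try:
                return fetch(query)
            except Exception:
                if attempt == max_retries:
                    raise
                # Exponential backoff: wait longer after each failed attempt.
                time.sleep(backoff_seconds * (2 ** attempt))

    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch_with_retry, q): q for q in queries}
        for future in as_completed(futures):
            query = futures[future]
            try:
                results[query] = future.result()
            except Exception as exc:
                errors[query] = exc
    return results, errors


attempts = {}


def fake_fetch(query):
    # Stand-in for a real request: one query always fails, one fails once.
    attempts[query] = attempts.get(query, 0) + 1
    if query == "always fails":
        raise RuntimeError("simulated server error")
    if query == "flaky" and attempts[query] == 1:
        raise RuntimeError("simulated timeout")
    return {"matched_chunks": []}


results, errors = run_batch(fake_fetch, ["ok", "flaky", "always fails"],
                            backoff_seconds=0.01)
print(sorted(results), sorted(errors))
```

Keeping failures in a separate dict lets the caller decide whether to retry later, log, or surface them to the user.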

For high-volume applications, consider implementing a queue system to manage large numbers of search requests efficiently.
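
As a rough illustration of the queue approach, the sketch below uses an in-process queue drained by a fixed number of worker threads. This is an assumption about what such a system could look like, not a Rememberizer feature: run_search_queue and its parameters are hypothetical, and a production setup would more likely use a persistent queue. The search_fn argument stands for any function that performs one search request.

```python
import queue
import threading
import time


def run_search_queue(search_fn, queries, num_workers=2, delay_seconds=0.0):
    """Drain a queue of queries with worker threads, preserving input order."""
    pending = queue.Queue()
    for index, query in enumerate(queries):
        pending.put((index, query))

    results = [None] * len(queries)

    def worker():
        while True:
            try:
                index, query = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            try:
                results[index] = search_fn(query)
            except Exception as exc:
                # Store the exception so one failure does not stop the worker.
                results[index] = exc
            # Optional pause between requests made by this worker.
            time.sleep(delay_seconds)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_search(query):
    # Stand-in for a real API call so the sketch runs offline.
    return {"query": query, "matched_chunks": []}


queued = run_search_queue(fake_search, ["first", "second", "third"])
print([item["query"] for item in queued])
```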

This endpoint provides semantic search across your entire knowledge base. It uses vector embeddings to match content by meaning rather than by exact keywords.

For a deeper understanding of how vector embeddings work and why this search approach is effective, see What are Vector Embeddings and Vector Databases?

GET /documents/search/

Initiate a search operation with a query text of up to 400 words and receive the most semantically similar responses from the stored knowledge. For question-answering, convert your question into an ideal answer and submit it to receive similar real answers.
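
A client can check the 400-word limit before sending a request. The sketch below assumes that words are separated by whitespace; how the server counts words is not specified on this page, and check_query_length is a hypothetical helper.

```python
def check_query_length(query, max_words=400):
    """Return the query unchanged, or raise ValueError if it is too long."""
    # Assumption: a "word" is any run of non-whitespace characters.
    word_count = len(query.split())
    if word_count > max_words:
        raise ValueError(f"query has {word_count} words; the limit is {max_words}")
    return query


ideal_answer = ("Vector embedding is a technique that converts text into "
                "numerical vectors in a high-dimensional space.")
print(check_query_length(ideal_answer))
```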

Query parameters
q · string · Optional

A sentence of up to 400 words for which you want to find semantically similar chunks of knowledge.

n · integer · Optional

Number of semantically similar chunks of text to return. Use 'n=3' for up to 5, and 'n=10' for more information. If you do not receive enough information, consider trying again with a larger 'n' value.

from · string · date-time · Optional

Start of the time range for documents to be searched, in ISO 8601 format.

to · string · date-time · Optional

End of the time range for documents to be searched, in ISO 8601 format.

Responses

200: Successful retrieval of documents (application/json)
400: Bad request
401: Unauthorized
404: Not found
500: Internal server error

GET /api/v1/documents/search/ HTTP/1.1
Host: api.rememberizer.ai
Accept: */*
{
  "data_sources": [
    {
      "name": "text",
      "documents": 1
    }
  ],
  "matched_chunks": [
    {
      "document": {
        "id": 18,
        "document_id": "text",
        "name": "text",
        "type": "text",
        "path": "text",
        "url": "text",
        "size": 1,
        "created_time": "2025-05-16T14:28:47.378Z",
        "modified_time": "2025-05-16T14:28:47.378Z",
        "indexed_on": "2025-05-16T14:28:47.378Z",
        "integration": {
          "id": 1,
          "integration_type": "text"
        }
      },
      "matched_content": "text",
      "distance": 1
    }
  ]
}