Vector Stores

This guide will help you understand how to use the Rememberizer Vector Store as a developer.

The Rememberizer Vector Store simplifies working with vector data, letting you focus on text input while leveraging the power of vectors for applications such as search and data analysis.

Introduction

The Rememberizer Vector Store provides an easy-to-use interface for handling vector data while abstracting away the complexity of vector embeddings. Powered by PostgreSQL with the pgvector extension, the Rememberizer Vector Store lets you work directly with text. The service handles chunking, vectorizing, and storing the text data, so you can focus on your core application logic.

For a deeper understanding of the theoretical concepts behind vector embeddings and vector databases, see What are Vector Embeddings and Vector Databases?.

Technical Overview

How Vector Stores Work

Rememberizer Vector Stores convert text into high-dimensional vector representations (embeddings) that capture semantic meaning. This enables:

  1. Semantic Search: Find documents based on meaning rather than just keywords

  2. Similarity Matching: Identify conceptually related content

  3. Efficient Retrieval: Quickly locate relevant information from large datasets
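
As a sketch of how semantic search ranks results, the toy example below compares a query vector against a few document vectors using cosine similarity. The three-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions and are produced by the store's embedding model.

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity: values near 1.0 mean nearly identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real stores use 768-1536 dimensions).
documents = {
    "cats are small pets": [0.9, 0.1, 0.0],
    "dogs are loyal pets": [0.8, 0.2, 0.1],
    "stock markets fell":  [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # imagined embedding of "what animals make good pets?"

# Rank documents by similarity to the query vector.
ranked = sorted(documents, key=lambda d: cosine_similarity(query, documents[d]),
                reverse=True)
print(ranked)  # the pet documents outrank the finance one
```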

Key Components

  • Document Processing: Text is split into optimally sized chunks with overlapping boundaries for context preservation

  • Vectorization: Chunks are converted to embeddings using state-of-the-art models

  • Indexing: Specialized algorithms organize vectors for efficient similarity search

  • Query Processing: Search queries are vectorized and compared against stored embeddings

Architecture

Rememberizer implements vector stores using:

  • PostgreSQL with pgvector extension: For efficient vector storage and search

  • Collection-based organization: Each vector store has its own isolated collection

  • API-driven access: Simple RESTful endpoints for all operations

Getting Started

Creating a Vector Store

  1. Navigate to the Vector Stores Section in your dashboard

  2. Click on "Create new Vector Store":

    • A form will appear prompting you to enter details.

  3. Fill in the Details:

    • Name: Provide a unique name for your vector store.

    • Description: Write a brief description of the vector store.

    • Embedding Model: Select the model that converts text to vectors.

    • Indexing Algorithm: Choose how vectors will be organized for search.

    • Search Metric: Define how similarity between vectors is calculated.

    • Vector Dimension: The size of the vector embeddings (typically 768-1536).

  4. Submit the Form:

    • Click on the "Create" button. You will receive a success notification, and the new store will appear in your vector store list.

Configuration Options

Embedding Models

| Model | Dimensions | Description | Best For |
| --- | --- | --- | --- |
| openai/text-embedding-3-large | 1536 | High-accuracy embedding model from OpenAI | Production applications requiring maximum accuracy |
| openai/text-embedding-3-small | 1536 | Smaller, faster embedding model from OpenAI | Applications with higher throughput requirements |

Indexing Algorithms

| Algorithm | Description | Tradeoffs |
| --- | --- | --- |
| IVFFLAT (default) | Inverted file with flat compression | Good balance of speed and accuracy; works well for most datasets |
| HNSW | Hierarchical Navigable Small World | Better accuracy for large datasets; higher memory requirements |

Search Metrics

| Metric | Description | Best For |
| --- | --- | --- |
| cosine (default) | Measures angle between vectors | General purpose similarity matching |
| inner product (ip) | Dot product between vectors | When vector magnitude is important |
| L2 (Euclidean) | Straight-line distance between vectors | When spatial relationships matter |
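
The differences between these metrics can be seen in a small worked example. The two vectors below point in the same direction but differ in magnitude, so cosine treats them as near-identical while inner product and L2 do not.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    """Angle between vectors; ignores magnitude."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def inner_product(a, b):
    """Dot product; grows with magnitude."""
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    """Straight-line (Euclidean) distance."""
    return math.dist(a, b)

v = [1.0, 2.0]
w = [2.0, 4.0]   # same direction as v, twice the magnitude

print(cosine(v, w))         # ≈ 1.0: identical direction, magnitude ignored
print(inner_product(v, w))  # 10.0: reflects magnitude
print(l2(v, w))             # ≈ 2.236: the vectors are not at the same point
```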

Managing Vector Stores

  1. View and Edit Vector Stores:

    • Access the management dashboard to view, edit, or delete vector stores.

  2. Viewing Documents:

    • Browse individual documents and their associated metadata within a specific vector store.

  3. Statistics:

    • View detailed statistics such as the number of vectors stored, query performance, and operational metrics.

API Key Management

API keys are used to authenticate and authorize access to the Rememberizer Vector Store's API endpoints. Proper management of API keys is essential for maintaining the security and integrity of your vector stores.

Creating API Keys

  1. Navigate to your Vector Store details page

  2. Navigate to the API Key Management Section:

    • It can be found within the "Configuration" tab

  3. Click on "Add API Key":

    • A form will appear prompting you to enter details.

  4. Fill in the Details:

    • Name: Provide a name for the API key to help you identify its use case.

  5. Submit the Form:

    • Click on the "Create" button. The new API key will be generated and displayed. Make sure to copy and store it securely. This key is used to authenticate requests to that specific vector store.

Revoking API Keys

If an API key is no longer needed, you can delete it to prevent any potential misuse.

For security reasons, you may want to rotate your API keys periodically. This involves generating a new key and revoking the old one.

Using the Vector Store API

After creating a Vector Store and generating an API key, you can interact with it using the REST API.

Code Examples
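
The sketch below shows the general shape of an authenticated request to a vector store, built with Python's standard library. The base URL, endpoint path, header name, and payload fields are assumptions for illustration; consult the Vector Store APIs page for the exact routes and schemas.

```python
import json
import urllib.request

API_BASE = "https://api.rememberizer.ai/api/v1"  # assumed base URL

def build_upload_request(api_key, store_id, name, text):
    """Construct (but do not send) a document-upload request.
    The path and field names are illustrative assumptions."""
    payload = json.dumps({"name": name, "content": text}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/vector-stores/{store_id}/documents",
        data=payload,
        headers={"x-api-key": api_key,  # header name is an assumption
                 "Content-Type": "application/json"},
        method="POST",
    )

# "vs_abc123" is a hypothetical vector store ID.
req = build_upload_request("YOUR_API_KEY", "vs_abc123", "notes.txt",
                           "Vector stores map text to embeddings.")
print(req.full_url)
# Send with: urllib.request.urlopen(req)
```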

Performance Considerations

Coming soon: Vector Store Architecture Diagram

This technical architecture diagram will illustrate:

  • The PostgreSQL + pgvector foundation architecture

  • Indexing algorithm structures (IVFFLAT vs. HNSW)

  • How search metrics work in vector space (visual comparison)

  • Document chunking process with overlap visualization

  • Performance considerations visualized across different scales

Optimizing for Different Data Volumes

| Data Volume | Recommended Configuration | Notes |
| --- | --- | --- |
| Small (<10k documents) | IVFFLAT, cosine similarity | Simple configuration provides good performance |
| Medium (10k-100k documents) | IVFFLAT, ensure regular reindexing | Balance between search speed and index maintenance |
| Large (>100k documents) | HNSW, consider increasing vector dimensions | Higher memory usage but maintains performance at scale |

Chunking Strategies

The chunking process significantly impacts search quality:

  • Chunk Size: Rememberizer uses a default chunk size of 1024 bytes with a 200-byte overlap

  • Smaller Chunks (512-1024 bytes): More precise matches, better for specific questions

  • Larger Chunks (1500-2048 bytes): More context in each match, better for broader topics

  • Overlap: Ensures context is not lost at chunk boundaries
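
A minimal chunker illustrating the size-and-overlap scheme described above. This is a simplified sketch: a production chunker would typically also respect sentence or paragraph boundaries.

```python
def chunk_text(text, chunk_size=1024, overlap=200):
    """Split text into fixed-size chunks whose boundaries overlap,
    mirroring the defaults above (1024 bytes with a 200-byte overlap)."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 3000 characters of varied sample text.
sample = "".join(chr(65 + i % 26) for i in range(3000))
chunks = chunk_text(sample)
print(len(chunks))      # → 4
print(len(chunks[0]))   # → 1024
# The last 200 characters of each chunk repeat at the start of the next,
# so context spanning a boundary is never lost.
print(chunks[0][-200:] == chunks[1][:200])  # → True
```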

Query Optimization

  • Context Windows: Use prev_chunks and next_chunks to retrieve surrounding content

  • Results Count: Start with 3-5 results (n parameter) and adjust based on precision needs

  • Threshold: Adjust the t parameter to filter results by similarity score
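
These parameters can be combined into a single search request. The endpoint path and the `q` parameter name are assumptions; `n`, `t`, `prev_chunks`, and `next_chunks` are the tuning parameters described above.

```python
from urllib.parse import urlencode

def build_search_url(store_id, query, n=5, t=0.5, prev_chunks=1, next_chunks=1):
    """Compose a search URL using the tuning parameters described above.
    The path is an assumption based on typical REST conventions."""
    base = f"https://api.rememberizer.ai/api/v1/vector-stores/{store_id}/documents/search"
    params = {"q": query, "n": n, "t": t,
              "prev_chunks": prev_chunks, "next_chunks": next_chunks}
    return f"{base}?{urlencode(params)}"

# "vs_abc123" is a hypothetical vector store ID.
url = build_search_url("vs_abc123", "how does chunk overlap work?", n=3, t=0.6)
print(url)
```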

Advanced Usage

Reindexing

Rememberizer automatically triggers reindexing when vector counts exceed predefined thresholds, but consider manual reindexing after:

  • Uploading a large number of documents

  • Changing the embedding model

  • Modifying the indexing algorithm

Query Enhancement

For better search results:

  1. Be specific in search queries

  2. Include context when possible

  3. Use natural language rather than keywords

  4. Adjust parameters based on result quality

Migrating from Other Vector Databases

If you're currently using other vector database solutions and want to migrate to Rememberizer Vector Store, the following guides will help you transition your data efficiently.

Migration Overview

Migrating vector data involves:

  1. Exporting data from your source vector database

  2. Converting the data to a format compatible with Rememberizer

  3. Importing the data into your Rememberizer Vector Store

  4. Verifying the migration was successful
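
Step 2 of this process can be sketched as a small conversion function. Because Rememberizer re-embeds text itself, the migration carries over the source text and metadata rather than the stored vectors. All field names below are illustrative assumptions about a typical export format.

```python
def to_rememberizer_documents(exported_records):
    """Convert records exported from a source vector DB into text-upload
    payloads. Stored vectors ("values") are dropped because Rememberizer
    re-embeds the text; only text and metadata are carried over."""
    payloads = []
    for record in exported_records:
        text = record.get("metadata", {}).get("text")
        if not text:  # skip records whose source text was not stored
            continue
        payloads.append({
            "name": record["id"],
            "content": text,
            "metadata": {k: v for k, v in record["metadata"].items()
                         if k != "text"},
        })
    return payloads

# Example: a Pinecone-style export where the original text lives in metadata.
exported = [
    {"id": "doc-1", "values": [0.1, 0.2],
     "metadata": {"text": "hello world", "source": "wiki"}},
    {"id": "doc-2", "values": [0.3, 0.4],
     "metadata": {"source": "no text stored"}},
]
payloads = to_rememberizer_documents(exported)
print(payloads)  # only doc-1 survives; its raw vector values are dropped
```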

Benefits of Migrating to Rememberizer

  • PostgreSQL Foundation: Built on mature database technology with built-in backup and recovery

  • Integrated Ecosystem: Seamless connection with other Rememberizer components

  • Simplified Management: Unified interface for vector operations

  • Advanced Security: Row-level security and fine-grained access controls

  • Scalable Architecture: Performance optimization as your data grows

Migrating from Pinecone

Migrating from Qdrant

Migrating from Supabase pgvector

If you're already using Supabase with pgvector, the migration to Rememberizer is particularly straightforward since both use PostgreSQL with the pgvector extension.

Migration Best Practices

Follow these recommendations for a successful migration:

  1. Plan Ahead:

    • Estimate the data volume and time required for migration

    • Schedule migration during low-traffic periods

    • Increase disk space before starting large migrations

  2. Test First:

    • Create a test vector store in Rememberizer

    • Migrate a small subset of data (100-1000 vectors)

    • Verify search functionality with key queries

  3. Data Validation:

    • Compare document counts before and after migration

    • Run benchmark queries to ensure similar results

    • Validate that metadata is correctly preserved

  4. Optimize for Performance:

    • Use batch operations for efficiency

    • Consider geographic colocation of source and target databases

    • Monitor API rate limits and adjust batch sizes accordingly

  5. Post-Migration Steps:

    • Verify index creation in Rememberizer

    • Update application configurations to point to new vector store

    • Keep source database as backup until migration is verified
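
The batching and rate-limit advice above can be sketched as a simple helper. `upload_fn` is a stand-in for whatever client call sends one batch; it is not a real Rememberizer API.

```python
import time

def batched(items, batch_size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def upload_in_batches(documents, upload_fn, batch_size=100, pause_seconds=0.0):
    """Upload documents in batches, pausing between batches so a sustained
    migration stays under API rate limits. Shrink batch_size or raise
    pause_seconds if you hit rate-limit errors."""
    sent = 0
    for batch in batched(documents, batch_size):
        upload_fn(batch)
        sent += len(batch)
        time.sleep(pause_seconds)
    return sent

# Dry run with a stub upload function:
count = upload_in_batches(list(range(250)), upload_fn=lambda b: None,
                          batch_size=100)
print(count)  # → 250
```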

For detailed API reference and endpoint documentation, visit the Vector Store APIs page.


Make sure to handle the API keys securely and follow best practices for API key management.
