Vector Stores
This guide explains how to use the Rememberizer Vector Store as a developer.
The Rememberizer Vector Store simplifies working with vector data, letting you focus on text input while leveraging the power of vectors for applications such as search and data analysis.
Introduction
The Rememberizer Vector Store provides an easy-to-use interface for handling vector data while abstracting away the complexity of vector embeddings. Powered by PostgreSQL with the pgvector extension, Rememberizer Vector Store allows you to work directly with text. The service handles chunking, vectorizing, and storing the text data, making it easier for you to focus on your core application logic.
For a deeper understanding of the theoretical concepts behind vector embeddings and vector databases, see What are Vector Embeddings and Vector Databases?.
Technical Overview
How Vector Stores Work
Rememberizer Vector Stores convert text into high-dimensional vector representations (embeddings) that capture semantic meaning. This enables:
Semantic Search: Find documents based on meaning rather than just keywords
Similarity Matching: Identify conceptually related content
Efficient Retrieval: Quickly locate relevant information from large datasets
Key Components
Document Processing: Text is split into optimally sized chunks with overlapping boundaries for context preservation
Vectorization: Chunks are converted to embeddings using state-of-the-art models
Indexing: Specialized algorithms organize vectors for efficient similarity search
Query Processing: Search queries are vectorized and compared against stored embeddings
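The minimal sketch below walks through these four stages in order. It is only an illustration of the flow, not Rememberizer's implementation: the embed() stub stands in for a real embedding model, and a plain Python list stands in for the pgvector index.

import numpy as np

def embed(text):
    # Stub for a real embedding model: a pseudo-random unit vector derived from the text
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=768)
    return v / np.linalg.norm(v)

document = "Vector stores convert text into embeddings. Embeddings capture semantic meaning."

# 1. Document Processing: split into chunks (the real service uses overlapping chunks)
chunks = [s.strip() for s in document.split(".") if s.strip()]

# 2. Vectorization and 3. Indexing: embed each chunk and keep it in an index structure
index = [(chunk, embed(chunk)) for chunk in chunks]

# 4. Query Processing: embed the query and rank stored chunks by cosine similarity
query_vector = embed("How is meaning represented?")
ranked = sorted(index, key=lambda item: float(item[1] @ query_vector), reverse=True)
print(ranked[0][0])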
Architecture
Rememberizer implements vector stores using:
PostgreSQL with pgvector extension: For efficient vector storage and search
Collection-based organization: Each vector store has its own isolated collection
API-driven access: Simple RESTful endpoints for all operations
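For context on what the pgvector layer does, the sketch below shows a typical pgvector similarity query. The table and column names are illustrative assumptions, not Rememberizer's actual schema; you never need to run SQL yourself because the REST API wraps this for you.

import psycopg2

# Illustrative connection only; Rememberizer manages its own PostgreSQL internally
conn = psycopg2.connect("dbname=vectors user=postgres")
cur = conn.cursor()

# A query embedding produced by the embedding model, serialized in pgvector's text format
query_embedding = [0.01] * 768
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

# pgvector operators: <=> cosine distance, <-> L2 distance, <#> negative inner product
cur.execute(
    """
    SELECT chunk_id, content, embedding <=> %s::vector AS distance
    FROM document_chunks
    ORDER BY embedding <=> %s::vector
    LIMIT 5
    """,
    (vector_literal, vector_literal),
)
for chunk_id, content, distance in cur.fetchall():
    print(chunk_id, round(distance, 4), content[:60])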
Getting Started
Creating a Vector Store
Navigate to the Vector Stores Section in your dashboard
Click on "Create new Vector Store":
A form will appear prompting you to enter details.
Fill in the Details:
Name: Provide a unique name for your vector store.
Description: Write a brief description of the vector store.
Embedding Model: Select the model that converts text to vectors.
Indexing Algorithm: Choose how vectors will be organized for search.
Search Metric: Define how similarity between vectors is calculated.
Vector Dimension: The size of the vector embeddings (typically 768-1536).
Submit the Form:
Click on the "Create" button. You will receive a success notification, and the new store will appear in your vector store list.

Configuration Options
Embedding Models
openai/text-embedding-3-large (1536 dimensions): High-accuracy embedding model from OpenAI. Best for production applications requiring maximum accuracy.
openai/text-embedding-3-small (1536 dimensions): Smaller, faster embedding model from OpenAI. Best for applications with higher throughput requirements.
Indexing Algorithms
IVFFLAT (default): Inverted file with flat compression. Good balance of speed and accuracy; works well for most datasets.
HNSW: Hierarchical Navigable Small World. Better accuracy for large datasets; higher memory requirements.
Search Metrics
cosine (default): Measures the angle between vectors. General-purpose similarity matching.
inner product (ip): Dot product between vectors. Useful when vector magnitude is important.
L2 (Euclidean): Straight-line distance between vectors. Useful when spatial relationships matter.
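To make the metrics concrete, here is how each one scores the same pair of vectors, using plain NumPy (independent of the Rememberizer API):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 1.0])

cosine_similarity = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # angle only
inner_product = float(a @ b)                                                # magnitude matters
l2_distance = float(np.linalg.norm(a - b))                                  # straight-line distance

print(f"cosine: {cosine_similarity:.3f}  ip: {inner_product:.3f}  L2: {l2_distance:.3f}")
# Doubling b changes the inner product and L2 distance but leaves the cosine value unchanged,
# which is why cosine is the default general-purpose metric.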
Managing Vector Stores
View and Edit Vector Stores:
Access the management dashboard to view, edit, or delete vector stores.
Viewing Documents:
Browse individual documents and their associated metadata within a specific vector store.
Statistics:
View detailed statistics such as the number of vectors stored, query performance, and operational metrics.

API Key Management
API keys are used to authenticate and authorize access to the Rememberizer Vector Store's API endpoints. Proper management of API keys is essential for maintaining the security and integrity of your vector stores.
Creating API Keys
Head over to your Vector Store details page
Navigate to the API Key Management Section:
It can be found within the "Configuration" tab
Click on "Add API Key":
A form will appear prompting you to enter details.
Fill in the Details:
Name: Provide a name for the API key to help you identify its use case.
Submit the Form:
Click on the "Create" button. The new API key will be generated and displayed. Make sure to copy and store it securely. This key is used to authenticate requests to that specific vector store.
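Once the key is stored, you can confirm it works by passing it in the x-api-key header of a request to that vector store, for example with a quick search:

import requests

API_KEY = "your_api_key_here"   # the key you just generated
VECTOR_STORE_ID = "vs_abc123"   # the vector store the key belongs to
BASE_URL = "https://api.rememberizer.ai/api/v1"

response = requests.get(
    f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents/search",
    headers={"x-api-key": API_KEY},
    params={"q": "test query", "n": 1},
)
print(response.status_code)  # 200 means the key works; 401/403 indicates an authentication problem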

Revoking API Keys
If an API key is no longer needed, you can delete it to prevent any potential misuse.
For security reasons, you may want to rotate your API keys periodically. This involves generating a new key and revoking the old one.
Using the Vector Store API
After creating a Vector Store and generating an API key, you can interact with it using the REST API.
Code Examples
import requests
import json

API_KEY = "your_api_key_here"
VECTOR_STORE_ID = "vs_abc123"  # Replace with your vector store ID
BASE_URL = "https://api.rememberizer.ai/api/v1"

# Upload a document to the vector store
def upload_document(file_path, document_name=None):
    if document_name is None:
        document_name = file_path.split("/")[-1]
    with open(file_path, "rb") as f:
        files = {"file": (document_name, f)}
        headers = {"x-api-key": API_KEY}
        response = requests.post(
            f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents",
            headers=headers,
            files=files
        )
    if response.status_code == 201:
        print(f"Document '{document_name}' uploaded successfully!")
        return response.json()
    else:
        print(f"Error uploading document: {response.text}")
        return None

# Upload text content to the vector store
def upload_text(content, document_name):
    headers = {
        "x-api-key": API_KEY,
        "Content-Type": "application/json"
    }
    data = {
        "name": document_name,
        "content": content
    }
    response = requests.post(
        f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents/text",
        headers=headers,
        json=data
    )
    if response.status_code == 201:
        print(f"Text document '{document_name}' uploaded successfully!")
        return response.json()
    else:
        print(f"Error uploading text: {response.text}")
        return None

# Search the vector store
def search_vector_store(query, num_results=5, prev_chunks=1, next_chunks=1):
    headers = {"x-api-key": API_KEY}
    params = {
        "q": query,
        "n": num_results,
        "prev_chunks": prev_chunks,
        "next_chunks": next_chunks
    }
    response = requests.get(
        f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents/search",
        headers=headers,
        params=params
    )
    if response.status_code == 200:
        results = response.json()
        print(f"Found {len(results['matched_chunks'])} matches for '{query}'")
        # Print the top result
        if results['matched_chunks']:
            top_match = results['matched_chunks'][0]
            print(f"Top match (distance: {top_match['distance']}):")
            print(f"Document: {top_match['document']['name']}")
            print(f"Content: {top_match['matched_content']}")
        return results
    else:
        print(f"Error searching: {response.text}")
        return None

# Example usage
# upload_document("path/to/document.pdf")
# upload_text("This is a sample text to be vectorized", "sample-document.txt")
# search_vector_store("How does vector similarity work?")
Performance Considerations
Coming soon: Vector Store Architecture Diagram
This technical architecture diagram will illustrate:
The PostgreSQL + pgvector foundation architecture
Indexing algorithm structures (IVFFLAT vs. HNSW)
How search metrics work in vector space (visual comparison)
Document chunking process with overlap visualization
Performance considerations visualized across different scales
Optimizing for Different Data Volumes
Small (<10k documents): IVFFLAT with cosine similarity. A simple configuration provides good performance.
Medium (10k-100k documents): IVFFLAT with regular reindexing. Balances search speed and index maintenance.
Large (>100k documents): HNSW; consider increasing vector dimensions. Higher memory usage but maintains performance at scale.
Chunking Strategies
The chunking process significantly impacts search quality:
Chunk Size: Rememberizer uses a default chunk size of 1024 bytes with a 200-byte overlap
Smaller Chunks (512-1024 bytes): More precise matches, better for specific questions
Larger Chunks (1500-2048 bytes): More context in each match, better for broader topics
Overlap: Ensures context is not lost at chunk boundaries
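Chunking happens server-side, so you never implement it yourself; the simplified sketch below only illustrates how a fixed chunk size with overlap keeps boundary content in two adjacent chunks.

def chunk_with_overlap(text, chunk_size=1024, overlap=200):
    # Simplified illustration of fixed-size chunking with overlapping boundaries
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "x" * 3000
chunks = chunk_with_overlap(sample)
print([len(c) for c in chunks])  # [1024, 1024, 1024, 528]: each chunk repeats the
                                 # last 200 characters of the previous one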
Query Optimization
Context Windows: Use prev_chunks and next_chunks to retrieve surrounding content
Results Count: Start with 3-5 results (the n parameter) and adjust based on precision needs
Threshold: Adjust the t parameter to filter results by similarity score
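Combining these parameters, a tuned search request might look like the sketch below. The q, n, prev_chunks, and next_chunks parameters appear in the code examples above; the t threshold value shown here is an assumed example, so check the API reference for its exact range.

import requests

API_KEY = "your_api_key_here"
VECTOR_STORE_ID = "vs_abc123"
BASE_URL = "https://api.rememberizer.ai/api/v1"

params = {
    "q": "How does vector similarity work?",  # natural-language query
    "n": 3,                                   # start with 3-5 results
    "prev_chunks": 1,                         # one chunk of context before each match
    "next_chunks": 1,                         # one chunk of context after each match
    "t": 0.5,                                 # example similarity threshold
}
response = requests.get(
    f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents/search",
    headers={"x-api-key": API_KEY},
    params=params,
)
for match in response.json().get("matched_chunks", []):
    print(match["document"]["name"], match["distance"])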
Advanced Usage
Reindexing
Rememberizer automatically triggers reindexing when vector counts exceed predefined thresholds, but consider manual reindexing after:
Uploading a large number of documents
Changing the embedding model
Modifying the indexing algorithm
Query Enhancement
For better search results:
Be specific in search queries
Include context when possible
Use natural language rather than keywords
Adjust parameters based on result quality
Migrating from Other Vector Databases
If you're currently using other vector database solutions and want to migrate to Rememberizer Vector Store, the following guides will help you transition your data efficiently.
Migration Overview
Migrating vector data involves:
Exporting data from your source vector database
Converting the data to a format compatible with Rememberizer
Importing the data into your Rememberizer Vector Store
Verifying the migration was successful (a minimal verification sketch follows this list)
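For the verification step, a quick check is to compare a document count from your source database against the number of documents in the target vector store. The listing endpoint and response shape used below are assumptions; confirm them in the Vector Store API reference.

import requests

REMEMBERIZER_API_KEY = "your_rememberizer_api_key"
VECTOR_STORE_ID = "vs_abc123"
BASE_URL = "https://api.rememberizer.ai/api/v1"

def count_rememberizer_documents():
    # Assumed listing endpoint and response shape; adjust to the actual API reference
    response = requests.get(
        f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents",
        headers={"x-api-key": REMEMBERIZER_API_KEY},
    )
    response.raise_for_status()
    return len(response.json().get("documents", []))

source_count = 1250  # replace with the count reported by your source vector database
target_count = count_rememberizer_documents()
print(f"Source: {source_count}  Rememberizer: {target_count}  match: {source_count == target_count}")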
Benefits of Migrating to Rememberizer
PostgreSQL Foundation: Built on mature database technology with built-in backup and recovery
Integrated Ecosystem: Seamless connection with other Rememberizer components
Simplified Management: Unified interface for vector operations
Advanced Security: Row-level security and fine-grained access controls
Scalable Architecture: Performance optimization as your data grows
Migrating from Pinecone
import os
import pinecone
import requests
import json
import time

# Set up Pinecone client
pinecone.init(api_key="PINECONE_API_KEY", environment="PINECONE_ENV")
source_index = pinecone.Index("your-pinecone-index")

# Set up Rememberizer Vector Store client
REMEMBERIZER_API_KEY = "your_rememberizer_api_key"
VECTOR_STORE_ID = "vs_abc123"  # Your Rememberizer vector store ID
BASE_URL = "https://api.rememberizer.ai/api/v1"

# 1. Set up batch size for migration (adjust based on your data size)
BATCH_SIZE = 100

# 2. Function to get vectors from Pinecone
def fetch_vectors_from_pinecone(index_name, batch_size, cursor=None):
    # Use the list operation if available in your Pinecone version
    try:
        result = source_index.list(limit=batch_size, cursor=cursor)
        vectors = result.get("vectors", {})
        next_cursor = result.get("cursor")
        return vectors, next_cursor
    except AttributeError:
        # For older Pinecone versions without list operation
        # This is a simplified approach; actual implementation depends on your data access pattern
        query_response = source_index.query(
            vector=[0] * source_index.describe_index_stats()["dimension"],
            top_k=batch_size,
            include_metadata=True,
            include_values=True
        )
        return {item.id: {"id": item.id, "values": item.values, "metadata": item.metadata}
                for item in query_response.matches}, None

# 3. Function to upload vectors to Rememberizer
def upload_to_rememberizer(vectors):
    headers = {
        "x-api-key": REMEMBERIZER_API_KEY,
        "Content-Type": "application/json"
    }
    for vector_id, vector_data in vectors.items():
        # Convert Pinecone vector data to Rememberizer format
        document_name = vector_data.get("metadata", {}).get("filename", f"pinecone_doc_{vector_id}")
        content = vector_data.get("metadata", {}).get("text", "")
        if not content:
            print(f"Skipping {vector_id} - no text content found in metadata")
            continue
        data = {
            "name": document_name,
            "content": content,
            # Optional: include additional metadata
            "metadata": vector_data.get("metadata", {})
        }
        response = requests.post(
            f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents/text",
            headers=headers,
            json=data
        )
        if response.status_code == 201:
            print(f"Document '{document_name}' uploaded successfully!")
        else:
            print(f"Error uploading document {document_name}: {response.text}")
        # Add a small delay to prevent rate limiting
        time.sleep(0.1)

# 4. Main migration function
def migrate_pinecone_to_rememberizer():
    cursor = None
    total_migrated = 0
    print("Starting migration from Pinecone to Rememberizer...")
    while True:
        vectors, cursor = fetch_vectors_from_pinecone("your-pinecone-index", BATCH_SIZE, cursor)
        if not vectors:
            break
        print(f"Fetched {len(vectors)} vectors from Pinecone")
        upload_to_rememberizer(vectors)
        total_migrated += len(vectors)
        print(f"Progress: {total_migrated} vectors migrated")
        if not cursor:
            break
    print(f"Migration complete! {total_migrated} total vectors migrated to Rememberizer")

# Run the migration
# migrate_pinecone_to_rememberizer()
Migrating from Qdrant
import requests
import json
import time
from qdrant_client import QdrantClient
from qdrant_client.http import models as rest

# Set up Qdrant client
QDRANT_URL = "http://localhost:6333"  # or your Qdrant cloud URL
QDRANT_API_KEY = "your_qdrant_api_key"  # if using Qdrant Cloud
QDRANT_COLLECTION_NAME = "your_collection"

qdrant_client = QdrantClient(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY  # Only for Qdrant Cloud
)

# Set up Rememberizer Vector Store client
REMEMBERIZER_API_KEY = "your_rememberizer_api_key"
VECTOR_STORE_ID = "vs_abc123"  # Your Rememberizer vector store ID
BASE_URL = "https://api.rememberizer.ai/api/v1"

# Batch size for processing
BATCH_SIZE = 100

# Function to fetch points from Qdrant
def fetch_points_from_qdrant(collection_name, batch_size, offset=0):
    try:
        # Get collection info to determine vector dimension
        collection_info = qdrant_client.get_collection(collection_name=collection_name)
        # Scroll through points
        scroll_result = qdrant_client.scroll(
            collection_name=collection_name,
            limit=batch_size,
            offset=offset,
            with_payload=True,
            with_vectors=True
        )
        points = scroll_result[0]  # Tuple of (points, next_offset)
        next_offset = scroll_result[1]
        return points, next_offset
    except Exception as e:
        print(f"Error fetching points from Qdrant: {e}")
        return [], None

# Function to upload vectors to Rememberizer
def upload_to_rememberizer(points):
    headers = {
        "x-api-key": REMEMBERIZER_API_KEY,
        "Content-Type": "application/json"
    }
    results = []
    for point in points:
        # Extract data from Qdrant point
        point_id = point.id
        metadata = point.payload
        text_content = metadata.get("text", "")
        document_name = metadata.get("filename", f"qdrant_doc_{point_id}")
        if not text_content:
            print(f"Skipping {point_id} - no text content found in payload")
            continue
        data = {
            "name": document_name,
            "content": text_content,
            # Optional: include additional metadata
            "metadata": metadata
        }
        try:
            response = requests.post(
                f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents/text",
                headers=headers,
                json=data
            )
            if response.status_code == 201:
                print(f"Document '{document_name}' uploaded successfully!")
                results.append({"id": point_id, "success": True})
            else:
                print(f"Error uploading document {document_name}: {response.text}")
                results.append({"id": point_id, "success": False, "error": response.text})
        except Exception as e:
            print(f"Exception uploading document {document_name}: {str(e)}")
            results.append({"id": point_id, "success": False, "error": str(e)})
        # Add a small delay to prevent rate limiting
        time.sleep(0.1)
    return results

# Main migration function
def migrate_qdrant_to_rememberizer():
    offset = None
    total_migrated = 0
    print("Starting migration from Qdrant to Rememberizer...")
    while True:
        points, next_offset = fetch_points_from_qdrant(
            QDRANT_COLLECTION_NAME,
            BATCH_SIZE,
            offset
        )
        if not points:
            break
        print(f"Fetched {len(points)} points from Qdrant")
        results = upload_to_rememberizer(points)
        success_count = sum(1 for r in results if r.get("success", False))
        total_migrated += success_count
        print(f"Progress: {total_migrated} points migrated successfully")
        if next_offset is None:
            break
        offset = next_offset
    print(f"Migration complete! {total_migrated} total points migrated to Rememberizer")

# Run the migration
# migrate_qdrant_to_rememberizer()
Migrating from Supabase pgvector
If you're already using Supabase with pgvector, the migration to Rememberizer is particularly straightforward since both use PostgreSQL with the pgvector extension.
import psycopg2
import requests
import json
import time
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Supabase PostgreSQL configuration
SUPABASE_DB_HOST = os.getenv("SUPABASE_DB_HOST")
SUPABASE_DB_PORT = os.getenv("SUPABASE_DB_PORT", "5432")
SUPABASE_DB_NAME = os.getenv("SUPABASE_DB_NAME")
SUPABASE_DB_USER = os.getenv("SUPABASE_DB_USER")
SUPABASE_DB_PASSWORD = os.getenv("SUPABASE_DB_PASSWORD")
SUPABASE_VECTOR_TABLE = os.getenv("SUPABASE_VECTOR_TABLE", "documents")

# Rememberizer configuration
REMEMBERIZER_API_KEY = os.getenv("REMEMBERIZER_API_KEY")
VECTOR_STORE_ID = os.getenv("VECTOR_STORE_ID")  # e.g., "vs_abc123"
BASE_URL = "https://api.rememberizer.ai/api/v1"

# Batch size for processing
BATCH_SIZE = 100

# Connect to Supabase PostgreSQL
def connect_to_supabase():
    try:
        conn = psycopg2.connect(
            host=SUPABASE_DB_HOST,
            port=SUPABASE_DB_PORT,
            dbname=SUPABASE_DB_NAME,
            user=SUPABASE_DB_USER,
            password=SUPABASE_DB_PASSWORD
        )
        return conn
    except Exception as e:
        print(f"Error connecting to Supabase PostgreSQL: {e}")
        return None

# Fetch documents from Supabase pgvector
def fetch_documents_from_supabase(conn, batch_size, offset=0):
    try:
        cursor = conn.cursor()
        # Adjust this query based on your table structure
        query = f"""
        SELECT id, content, metadata, embedding
        FROM {SUPABASE_VECTOR_TABLE}
        ORDER BY id
        LIMIT %s OFFSET %s
        """
        cursor.execute(query, (batch_size, offset))
        documents = cursor.fetchall()
        cursor.close()
        return documents
    except Exception as e:
        print(f"Error fetching documents from Supabase: {e}")
        return []

# Upload documents to Rememberizer
def upload_to_rememberizer(documents):
    headers = {
        "x-api-key": REMEMBERIZER_API_KEY,
        "Content-Type": "application/json"
    }
    results = []
    for doc in documents:
        doc_id, content, metadata, embedding = doc
        # Parse metadata if it's stored as a JSON string
        if isinstance(metadata, str):
            try:
                metadata = json.loads(metadata)
            except json.JSONDecodeError:
                metadata = {}
        elif metadata is None:
            metadata = {}
        document_name = metadata.get("filename", f"supabase_doc_{doc_id}")
        if not content:
            print(f"Skipping {doc_id} - no content found")
            continue
        data = {
            "name": document_name,
            "content": content,
            "metadata": metadata
        }
        try:
            response = requests.post(
                f"{BASE_URL}/vector-stores/{VECTOR_STORE_ID}/documents/text",
                headers=headers,
                json=data
            )
            if response.status_code == 201:
                print(f"Document '{document_name}' uploaded successfully!")
                results.append({"id": doc_id, "success": True})
            else:
                print(f"Error uploading document {document_name}: {response.text}")
                results.append({"id": doc_id, "success": False, "error": response.text})
        except Exception as e:
            print(f"Exception uploading document {document_name}: {str(e)}")
            results.append({"id": doc_id, "success": False, "error": str(e)})
        # Add a small delay to prevent rate limiting
        time.sleep(0.1)
    return results

# Main migration function
def migrate_supabase_to_rememberizer():
    conn = connect_to_supabase()
    if not conn:
        print("Failed to connect to Supabase. Aborting migration.")
        return
    offset = 0
    total_migrated = 0
    print("Starting migration from Supabase pgvector to Rememberizer...")
    try:
        while True:
            documents = fetch_documents_from_supabase(conn, BATCH_SIZE, offset)
            if not documents:
                break
            print(f"Fetched {len(documents)} documents from Supabase")
            results = upload_to_rememberizer(documents)
            success_count = sum(1 for r in results if r.get("success", False))
            total_migrated += success_count
            print(f"Progress: {total_migrated} documents migrated successfully")
            offset += BATCH_SIZE
    finally:
        conn.close()
    print(f"Migration complete! {total_migrated} total documents migrated to Rememberizer")

# Run the migration
# migrate_supabase_to_rememberizer()
Migration Best Practices
Follow these recommendations for a successful migration:
Plan Ahead:
Estimate the data volume and time required for migration
Schedule migration during low-traffic periods
Increase disk space before starting large migrations
Test First:
Create a test vector store in Rememberizer
Migrate a small subset of data (100-1000 vectors)
Verify search functionality with key queries
Data Validation:
Compare document counts before and after migration
Run benchmark queries to ensure similar results
Validate that metadata is correctly preserved
Optimize for Performance:
Use batch operations for efficiency
Consider geographic colocation of source and target databases
Monitor API rate limits and adjust batch sizes accordingly
Post-Migration Steps:
Verify index creation in Rememberizer
Update application configurations to point to new vector store
Keep source database as backup until migration is verified
For detailed API reference and endpoint documentation, visit the Vector Store Documentation page.
Make sure to handle API keys securely and follow best practices for API key management.