# LangChain Integration

Rememberizer integrates with LangChain through the `RememberizerRetriever` class, allowing you to easily incorporate Rememberizer's semantic search capabilities into your LangChain-powered applications. This guide explains how to set up and use this integration to build advanced LLM applications with access to your knowledge base.

## Introduction

LangChain is a popular framework for building applications with large language models (LLMs). By integrating Rememberizer with LangChain, you can:

* Use your Rememberizer knowledge base in RAG (Retrieval Augmented Generation) applications
* Create chatbots with access to your documents and data
* Build question-answering systems that leverage your knowledge
* Develop agents that can search and reason over your information

The integration is available in the `langchain_community.retrievers` module.

{% embed url="https://python.langchain.com/docs/integrations/retrievers/rememberizer/" %}

## Getting Started

### Prerequisites

Before you begin, you need:

1. A Rememberizer account with at least one Common Knowledge created
2. An API key for accessing that Common Knowledge
3. A Python environment with LangChain installed

For detailed instructions on creating Common Knowledge and generating an API key, see [Registering and Using API Keys](https://docs.rememberizer.ai/developer/registering-and-using-api-keys).

### Installation

Install the required packages:

```bash
pip install langchain langchain_community
```

If you plan to use OpenAI models (as the examples below do), also install:

```bash
pip install langchain_openai
```

### Authentication Setup

There are two ways to authenticate the `RememberizerRetriever`:

1. **Environment Variable**: Set the `REMEMBERIZER_API_KEY` environment variable

   ```python
   import os
   os.environ["REMEMBERIZER_API_KEY"] = "rem_ck_your_api_key"
   ```
2. **Direct Parameter**: Pass the API key directly when initializing the retriever

   ```python
   from langchain_community.retrievers import RememberizerRetriever

   retriever = RememberizerRetriever(rememberizer_api_key="rem_ck_your_api_key")
   ```

## Configuration Options

The `RememberizerRetriever` class accepts these parameters:

| Parameter              | Type | Default | Description                                                          |
| ---------------------- | ---- | ------- | -------------------------------------------------------------------- |
| `top_k_results`        | int  | 10      | Number of documents to return from search                            |
| `rememberizer_api_key` | str  | None    | API key for authentication (optional if set as environment variable) |

Behind the scenes, the retriever calls Rememberizer's document search endpoint, which accepts these additional parameters:

| Advanced Parameter    | Description                                                       |
| --------------------- | ----------------------------------------------------------------- |
| `prev_chunks`         | Number of chunks before the matched chunk to include (default: 2) |
| `next_chunks`         | Number of chunks after the matched chunk to include (default: 2)  |
| `return_full_content` | Whether to return full document content (default: true)           |
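
To make the mapping concrete, here is a minimal sketch of the search call the retriever makes, using `requests`. The endpoint path, `X-API-Key` header, and response shape are assumptions for illustration; verify them against the Rememberizer [API Documentation](https://docs.rememberizer.ai/developer/api-docs/).

```python
import os

import requests

# Hypothetical illustration of the underlying search request.
# Endpoint path, header name, and parameter names are assumptions --
# check the Rememberizer API docs for the authoritative contract.
response = requests.get(
    "https://api.rememberizer.ai/api/v1/documents/search/",
    headers={"X-API-Key": os.environ["REMEMBERIZER_API_KEY"]},
    params={
        "q": "How do vector embeddings work?",
        "n": 5,            # plays the role of top_k_results
        "prev_chunks": 2,  # context chunks before each match
        "next_chunks": 2,  # context chunks after each match
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```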

## Basic Usage

Here's a simple example of retrieving documents from Rememberizer using LangChain:

```python
import os
from langchain_community.retrievers import RememberizerRetriever

# Set your API key
os.environ["REMEMBERIZER_API_KEY"] = "rem_ck_your_api_key"

# Initialize the retriever
retriever = RememberizerRetriever(top_k_results=5)

# Get relevant documents for a query (invoke supersedes the
# deprecated get_relevant_documents method on recent LangChain versions)
docs = retriever.invoke("How do vector embeddings work?")

# Display the first document
if docs:
    print(f"Document: {docs[0].metadata['name']}")
    print(f"Content: {docs[0].page_content[:200]}...")
```

### Understanding Document Structure

Each document returned by the retriever has:

* `page_content`: The text content of the matched document chunk
* `metadata`: Additional information about the document

Example of metadata structure:

```python
{
  'id': 13646493,
  'document_id': '17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP',
  'name': 'What is a large language model (LLM)_ _ Cloudflare.pdf',
  'type': 'application/pdf',
  'path': '/langchain/What is a large language model (LLM)_ _ Cloudflare.pdf',
  'url': 'https://drive.google.com/file/d/17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP/view',
  'size': 337089,
  'created_time': '',
  'modified_time': '',
  'indexed_on': '2024-04-04T03:36:28.886170Z',
  'integration': {'id': 347, 'integration_type': 'google_drive'}
}
```
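
Because every result carries this metadata, building a de-duplicated source list is straightforward. This snippet reuses the `docs` returned by the Basic Usage example above:

```python
# Collect unique source documents from the retrieved chunks
sources = {}
for doc in docs:
    name = doc.metadata.get("name", "unknown")
    sources.setdefault(name, doc.metadata.get("url"))

# Print each source once, with its URL when available
for name, url in sources.items():
    print(f"{name}: {url}")
```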

## Advanced Examples

### Building a RAG Question-Answering System

This example creates a question-answering system that retrieves information from Rememberizer and uses GPT-3.5 to formulate answers:

```python
import os
from langchain_community.retrievers import RememberizerRetriever
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# Set up API keys
os.environ["REMEMBERIZER_API_KEY"] = "rem_ck_your_api_key"
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"

# Initialize the retriever and language model
retriever = RememberizerRetriever(top_k_results=5)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Create a retrieval QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # Simplest method - just stuff all documents into the prompt
    retriever=retriever,
    return_source_documents=True
)

# Ask a question
response = qa_chain.invoke({"query": "What is RAG in the context of AI?"})

# Print the answer
print(f"Answer: {response['result']}")
print("\nSources:")
for idx, doc in enumerate(response['source_documents']):
    print(f"{idx+1}. {doc.metadata['name']}")
```
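
The chain's default prompt works for general questions, but you can supply your own. A minimal sketch, assuming the standard "stuff" chain (which fills in `context` and `question` variables) and reusing `llm` and `retriever` from above:

```python
from langchain.prompts import PromptTemplate

template = """Use the following context to answer the question.
If the answer is not in the context, say you don't know.

Context: {context}

Question: {question}
Answer:"""

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    # Override the default prompt used by the stuff chain
    chain_type_kwargs={"prompt": PromptTemplate.from_template(template)},
)
```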

### Building a Conversational Agent with Memory

This example creates a conversational agent that can maintain conversation history:

```python
import os
from langchain_community.retrievers import RememberizerRetriever
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

# Set up API keys
os.environ["REMEMBERIZER_API_KEY"] = "rem_ck_your_api_key"
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"

# Initialize components
retriever = RememberizerRetriever(top_k_results=5)
llm = ChatOpenAI(model_name="gpt-3.5-turbo")
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# Create the conversational chain
conversation = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory
)

# Example conversation
questions = [
    "What is RAG?",
    "How do large language models use it?",
    "What are the limitations of this approach?",
]

for question in questions:
    response = conversation.invoke({"question": question})
    print(f"Question: {question}")
    print(f"Answer: {response['answer']}\n")
```

## Best Practices

### Optimizing Retrieval Performance

1. **Be specific with queries**: More specific queries usually yield better results
2. **Adjust `top_k_results`**: Start with 3-5 results and adjust based on application needs
3. **Use context windows**: The retriever automatically includes context around matched chunks

### Security Considerations

1. **Protect your API key**: Store it securely using environment variables or secret management tools (one option is sketched after this list)
2. **Create dedicated keys**: Create separate API keys for different applications
3. **Rotate keys regularly**: Periodically generate new keys and phase out old ones
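
For local development, one common option is a `.env` file loaded with the third-party `python-dotenv` package, so the key never appears in source code:

```python
# pip install python-dotenv
from dotenv import load_dotenv

# Reads REMEMBERIZER_API_KEY (and any other variables) from a local
# .env file into the process environment; keep .env out of version control.
load_dotenv()
```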

### Integration Patterns

1. **Pre-retrieval processing**: Consider preprocessing user queries to improve search relevance
2. **Post-retrieval filtering**: Filter or rank retrieved documents before passing them to the LLM (see the sketch after the ensemble example below)
3. **Hybrid search**: Combine Rememberizer with other retrievers using `EnsembleRetriever`

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import RememberizerRetriever, WebResearchRetriever

# Create retrievers
rememberizer_retriever = RememberizerRetriever(top_k_results=3)
web_retriever = WebResearchRetriever(...)  # Configure another retriever

# Create an ensemble with weighted score
ensemble_retriever = EnsembleRetriever(
    retrievers=[rememberizer_retriever, web_retriever],
    weights=[0.7, 0.3]  # Rememberizer results have higher weight
)
```
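
For the post-retrieval filtering pattern, a minimal sketch might look like this. The `type` metadata field follows the metadata example shown earlier; adjust the predicate to your application's needs:

```python
from langchain_community.retrievers import RememberizerRetriever

retriever = RememberizerRetriever(top_k_results=10)

def filter_docs(docs, allowed_type="application/pdf", max_docs=3):
    """Keep only documents of the allowed type, capped at max_docs."""
    filtered = [d for d in docs if d.metadata.get("type") == allowed_type]
    return filtered[:max_docs]

docs = retriever.invoke("What is RAG in the context of AI?")
context_docs = filter_docs(docs)  # pass these to the LLM instead of docs
```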

## Troubleshooting

### Common Issues

1. **Authentication errors**: Verify your API key is correct and properly configured
2. **No results returned**: Ensure your Common Knowledge contains relevant information
3. **Rate limiting**: Be mindful of API rate limits for high-volume applications

### Debug Tips

* Set the LangChain debug mode to see detailed API calls:

  ```python
  import langchain

  langchain.debug = True
  # On recent LangChain versions, the preferred equivalent is:
  # from langchain.globals import set_debug
  # set_debug(True)
  ```
* Examine raw search results before passing them to the LLM to identify retrieval issues:
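
  A quick way to do this, reusing the `retriever` from the examples above:

  ```python
  docs = retriever.invoke("your test query")
  for i, doc in enumerate(docs, start=1):
      print(f"--- Result {i}: {doc.metadata.get('name')} ---")
      print(doc.page_content[:300])
  ```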

## Related Resources

* LangChain [Retriever conceptual guide](https://python.langchain.com/docs/concepts/#retrievers)
* LangChain [Retriever how-to guides](https://python.langchain.com/docs/how_to/#retrievers)
* Rememberizer [API Documentation](https://docs.rememberizer.ai/developer/api-docs/)
* [Vector Stores](https://docs.rememberizer.ai/developer/vector-stores) in Rememberizer
* [Creating a Rememberizer GPT](https://docs.rememberizer.ai/developer-resources/integration-options/creating-a-rememberizer-gpt) - An alternative approach for AI integration
