RAG Chat with Gemini
2025
5-minute read
Python
Streamlit
LangChain
Google AI
Docker
Try it yourself
I built a comprehensive RAG (Retrieval-Augmented Generation) chat system using Google Gemini and LangChain. The application combines document retrieval with AI generation, allowing users to chat with their PDFs, analyze images, and have intelligent conversations powered by Google's latest Gemini 1.5 Flash model.
The core of the system revolves around vector embeddings that transform documents into searchable representations. When you upload a PDF, the system chunks the text and creates embeddings using Google's embedding model:
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.documents import Document

def create_vector_store(embeddings, documents):
    # Accept either raw text chunks or ready-made Document objects.
    if isinstance(documents[0], str):
        docs = [Document(page_content=doc) for doc in documents if doc.strip()]
    else:
        docs = documents
    # Embed every chunk and keep the vectors in an in-memory index.
    vectorstore = DocArrayInMemorySearch.from_documents(
        docs,
        embedding=embeddings,
    )
    return vectorstore
DocArrayInMemorySearch creates a vector database that enables semantic search. When you ask a question, the system finds the most relevant document chunks and passes them to Gemini for context-aware responses.
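To make the retrieval step concrete, here is a rough sketch of how the store can be queried once it exists. The embedding model name and the sample texts are illustrative assumptions, not lifted from the project.

from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Assumed embedding model; the project only says "Google's embedding model".
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = create_vector_store(
    embeddings,
    ["LangChain ties retrieval to generation.", "Gemini 1.5 Flash writes the answers."],
)

# Pull back the chunks most similar to the question to use as context.
relevant_chunks = vectorstore.similarity_search("Which model generates answers?", k=2)
for chunk in relevant_chunks:
    print(chunk.page_content)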
Multi-Modal Architecture
The application supports three distinct chat modes, each optimized for different use cases. I implemented a flexible architecture using Streamlit's session state management and LangChain's modular components to handle various input types seamlessly.
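As a minimal sketch of what that session-state wiring can look like (the exact keys and mode labels here are assumptions, not the app's actual code):

import streamlit as st

# Keep the chat history alive across Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Each mode routes user input to a different backend but shares the same history.
mode = st.sidebar.radio("Chat mode", ["RAG Chat", "Image Analysis", "Plain Chat"])

prompt = st.chat_input("Ask a question")
if prompt:
    st.session_state.messages.append({"role": "user", "content": prompt})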
Now, there are three main features to focus on:
1. RAG Chat Mode: This is where the magic happens. The system processes your documents, creates embeddings, and uses a retrieval chain to find relevant context before generating responses. Each answer includes source attribution so you know exactly which documents informed the AI's response.
from langchain.chains import ConversationalRetrievalChain

# The chain condenses follow-up questions, retrieves relevant chunks,
# and answers using the retrieved context plus the running conversation memory.
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=st.session_state.memory,
    return_source_documents=True,
    verbose=True,
)

response = qa_chain({"question": prompt})
answer = response["answer"]
source_docs = response.get("source_documents", [])
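Picking up answer and source_docs from the snippet above, the attribution can be surfaced in the UI with something like the following. The expander layout is illustrative rather than the app's exact interface.

st.markdown(answer)
if source_docs:
    with st.expander("Sources"):
        # Show a short excerpt of each retrieved chunk that informed the answer.
        for i, doc in enumerate(source_docs, start=1):
            st.markdown(f"**Source {i}:** {doc.page_content[:300]}")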
2. Image Analysis with Gemini Vision: The system can process multiple image formats and analyze them using Gemini's vision capabilities. Images are converted to base64 and sent to the model along with your questions, enabling visual understanding and detailed analysis.
import base64
import io

from langchain_core.messages import HumanMessage
from PIL import Image

def process_image_with_gemini(image_file, prompt, vision_llm):
    # Re-encode the upload as PNG and base64 so it can travel inline in the message.
    image = Image.open(image_file)
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")
    img_base64 = base64.b64encode(buffered.getvalue()).decode()

    # A single multimodal message: the text prompt plus the embedded image.
    message = HumanMessage(
        content=[
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_base64}"}},
        ]
    )
    response = vision_llm.invoke([message])
    return response.content
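A hypothetical usage path, assuming the vision model is the same Gemini 1.5 Flash model instantiated through LangChain's ChatGoogleGenerativeAI and the image arrives via a standard Streamlit uploader:

import streamlit as st
from langchain_google_genai import ChatGoogleGenerativeAI

vision_llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

uploaded_image = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
if uploaded_image:
    # Ask Gemini to analyze the uploaded image and show the result in the chat.
    description = process_image_with_gemini(uploaded_image, "Describe this image in detail.", vision_llm)
    st.write(description)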
3. Intelligent PDF Processing: Large documents are automatically chunked using recursive text splitting to maintain context while staying within token limits. The system handles multiple file formats and preserves document structure for better retrieval accuracy.
from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    # Overlapping chunks preserve context that would otherwise be cut at boundaries.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
    )
    return text_splitter.split_text(text)
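To show how the pieces fit together, here is a rough end-to-end sketch of the PDF path under the assumption that pypdf handles extraction; the actual project may read PDFs differently, and pdf_to_chunks is a hypothetical helper.

from pypdf import PdfReader

def pdf_to_chunks(pdf_file):
    # Extract raw text from every page, then split it with the chunker above.
    reader = PdfReader(pdf_file)
    full_text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return chunk_text(full_text)

chunks = pdf_to_chunks("example.pdf")
vectorstore = create_vector_store(embeddings, chunks)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})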
The conversation memory ensures context is maintained across multiple turns, while the modular design allows easy switching between chat modes. The entire system is built with privacy considerations, using temporary file handling and session-based state management.
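A minimal sketch of keeping that memory per Streamlit session, assuming LangChain's ConversationBufferMemory is the class feeding the retrieval chain shown earlier:

import streamlit as st
from langchain.memory import ConversationBufferMemory

if "memory" not in st.session_state:
    st.session_state.memory = ConversationBufferMemory(
        memory_key="chat_history",   # key the conversational retrieval chain reads from
        return_messages=True,
        output_key="answer",         # required when return_source_documents=True
    )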
You can find the full project on my GitHub: RAG Chat with Gemini