Build a RAG-Powered Chat App with Spring AI and PGVector


Large Language Models (LLMs) like ChatGPT and Gemini are incredibly powerful, but their knowledge is typically limited to the public data they were trained on.

What if you need an AI to answer questions about your company’s internal documents or a specific product’s confidential data? The solution is Retrieval-Augmented Generation, or RAG.

This powerful technique allows you to provide an AI with a private repository of information to use as a reference, enabling it to generate accurate, context-specific answers.

In this tutorial, we’ll walk through building a complete RAG pipeline using Spring Boot. You’ll learn how to create a Java application that takes a user’s question, retrieves relevant information from your private data source, and generates a precise, AI-powered response. We’ll cover everything from vector embeddings to running and testing the final application.

If you just want to see the code, you can find it on GitHub.

Understanding the Core Concepts

Before we start coding, let’s break down the two key concepts that make this all possible: Vector Embeddings and RAG.

How do Vector Embeddings Work?

Have you ever wondered how AI models understand the meaning and relationships behind words? The magic is in vector embeddings, which are numerical representations of data like text. You can think of them as a special translator that converts a word, sentence, or even a whole document into a list of numbers called a vector.

What’s powerful is that these numbers are meaningful. Words or phrases with similar meanings will have vectors that are numerically close to each other. Imagine plotting these vectors on a simple 2D graph. The vectors for “dog” and “bark” would be clustered together, just as “cat” and “meow” would be. The distance between any two vectors indicates how closely related they are.

A conceptual diagram showing related words clustered together in a 2D vector space.

In a real application, these vectors have hundreds or even thousands of dimensions, but the principle remains the same. To find information relevant to a user’s query, we can convert the query to a vector and then search for the text vectors in our database that are closest to it.
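
To make this concrete, here’s a toy sketch (not part of the tutorial code) that measures how close two vectors are using cosine similarity, the same distance metric we’ll configure pgvector to use later. The three-dimensional vectors are made up purely for illustration:

public class CosineSimilarityDemo {

    // Cosine similarity: 1.0 means the vectors point in the same direction, 0 means unrelated.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
        double[] dog = {0.9, 0.8, 0.1};
        double[] bark = {0.85, 0.75, 0.2};
        double[] cat = {0.1, 0.2, 0.9};

        System.out.println("dog vs bark: " + cosineSimilarity(dog, bark)); // ~0.99, closely related
        System.out.println("dog vs cat:  " + cosineSimilarity(dog, cat));  // ~0.30, not very related
    }
}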

What is Retrieval-Augmented Generation (RAG)?

RAG is a framework that enhances the responses of LLMs by connecting them to external knowledge bases. Instead of just relying on its training data, the model can query your private data source to find relevant information before generating an answer.

Here’s a simplified overview of the RAG flow:

  1. Retrieve: When a user asks a question, the system first searches a knowledge base (in our case, a vector database) for relevant documents.
  2. Augment: The content of these retrieved documents is then added to the user’s original question, creating an “augmented” prompt.
  3. Generate: This augmented prompt is sent to the LLM, which uses the provided context to generate a detailed and accurate answer.

This approach ensures the AI’s responses are not just generic but are grounded in the specific, proprietary information you provide.

The Overall Architecture

Our RAG system will consist of several key components working together. The following diagram illustrates the entire workflow from receiving a user’s query to returning an AI-generated response.

Architecture diagram of the RAG pipeline, from the user’s query to the AI-generated response.

Setting Up the Backend Infrastructure

With the concepts clear, let’s prepare the necessary tools and database for our application.

Step 1: Running a Local Embedding Model

To create vector embeddings, you need a specialized AI model. While paid APIs are an option, you can easily run powerful open-source models on your local machine. A great tool for this is LM Studio.

  1. Download and install LM Studio.
  2. Use the search bar to find and download an embedding model. For this demo, we’ll use the Nomic Embed Text model.
  3. Navigate to the Local Server tab.
  4. Select the downloaded model and click Start Server.

This creates a local API endpoint that is compatible with the OpenAI API, which makes it incredibly easy to integrate with libraries like Spring AI.
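
Before wiring this into Spring, you can sanity-check the server with a direct request to its OpenAI-compatible embeddings endpoint. This assumes LM Studio’s default port (1234) and the same model name we’ll reference in the application configuration later:

# Ask the local LM Studio server for an embedding of a short piece of text
curl http://127.0.0.1:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-nomic-embed-text-v2-moe", "input": "StarlightDB is a serverless graph database."}'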

Step 2: Preparing the PostgreSQL Vector Store

Next, we need a place to store our text and its corresponding vector embeddings. We’ll use PostgreSQL with the pgvector extension, which adds powerful vector similarity search capabilities.

If you’re on a Mac, Postgres.app is a great option as it comes with pgvector pre-installed. Otherwise, follow the official installation guides for PostgreSQL and pgvector.

Once Postgres is running, use the psql command line to create your database and table:

  1. Create the database:

    CREATE DATABASE rag_demo;
  2. Connect to your new database:

    \c rag_demo
  3. Enable extensions and create the table:

    -- Enable required extensions
    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE EXTENSION IF NOT EXISTS hstore;
    CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
    
    -- Create the table to store our documents and embeddings
    CREATE TABLE IF NOT EXISTS vector_store (
        id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
        content text,
        metadata json,
        embedding vector(768)
    );

    The embedding column’s dimension (768) must match the output dimension of our Nomic Embed Text model.

  4. Create an index for efficient search:

    CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);

    HNSW (Hierarchical Navigable Small World) is an indexing algorithm that significantly speeds up approximate nearest-neighbor searches in high-dimensional spaces.
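
You won’t normally write this query by hand, since Spring AI issues an equivalent one for you, but it’s useful to see what a similarity search looks like at the SQL level. This is illustrative only; the vector literal is a placeholder for a real 768-dimension query embedding:

-- '<=>' is pgvector's cosine distance operator (smaller distance = more similar)
SELECT content, embedding <=> '[...]'::vector AS distance
FROM vector_store
ORDER BY distance
LIMIT 4;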

Building the Spring Boot Application

Now it’s time to write the code. We’ll use the Spring Initializr to create a new project.

Step 3: Project Setup and Dependencies

Head over to start.spring.io and create a new Gradle project with the following dependencies:

  • Spring Web: To build our REST controller.
  • Spring AI OpenAI Starter: To connect to our local embedding model (via its OpenAI-compatible API) and our online chat model.
  • Spring AI PGVector Store Starter: To connect to our PostgreSQL vector database.

Spring Initializr project setup with the selected dependencies.

Here is the build.gradle file with the necessary dependencies:

plugins {
	id 'java'
	id 'org.springframework.boot' version '3.2.7' 
	id 'io.spring.dependency-management' version '1.1.7'
}

group = 'com.sohamkamani'
version = '0.0.1-SNAPSHOT'

java {
	toolchain {
		languageVersion = JavaLanguageVersion.of(21)
	}
}

repositories {
	mavenCentral()
	maven { url 'https://repo.spring.io/milestone' }
	maven { url 'https://repo.spring.io/snapshot' }
	maven {
		name = 'Central Portal Snapshots'
		url = 'https://central.sonatype.com/repository/maven-snapshots/'
	}
}

ext {
	set('springAiVersion', "1.0.0")
}

dependencies {
  implementation 'org.springframework.boot:spring-boot-starter-web'
  implementation 'org.springframework.ai:spring-ai-advisors-vector-store'
  implementation 'org.springframework.ai:spring-ai-starter-model-openai'
  implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
  testImplementation 'org.springframework.boot:spring-boot-starter-test'
  testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
}

dependencyManagement {
	imports {
		mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}"
	}
}

tasks.named('test') {
	useJUnitPlatform()
}

Step 4: Loading Data into the Vector Store (DocumentLoader)

To populate our database with proprietary information, we’ll create a DocumentLoader component. It implements CommandLineRunner, so its run method executes once on application startup.

For this example, we’ll feed our RAG system information about a fictional database called “StarlightDB”.

package com.sohamkamani.spring_rag_demo.rag;

import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class DocumentLoader implements CommandLineRunner {

    private final VectorStore vectorStore;

    public DocumentLoader(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void run(String... args) {
        List<Document> documents = List.of(
            new Document("StarlightDB is a serverless graph database designed for real-time analytics on complex, interconnected data."),
            new Document("The core of StarlightDB is its 'Quantum-Leap' query engine, which uses speculative execution to deliver query results up to 100x faster than traditional graph databases."),
            new Document("StarlightDB features 'Chrono-Sync' for effortless time-travel queries, allowing developers to query the state of their graph at any point in the past."),
            new Document("StarlightDB includes a built-in visualization tool called 'Nebula' that renders interactive 3D graphs directly within the development environment for easier analysis."),
            new Document("Security in StarlightDB is handled by 'Cosmic Shield', which provides end-to-end encryption and fine-grained access control at the node and edge level.")
        );
        vectorStore.add(documents);
        System.out.println("Documents loaded into VectorStore.");
    }
}

When vectorStore.add() is called, Spring AI automatically sends our text to the configured embedding model, gets the vectors back, and stores both the original text and the embeddings in our pgvector table.
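
The Document class also accepts a metadata map, which is what ends up in the metadata column of our table. We don’t need it for this demo, but here’s a small sketch of how you could tag entries with a source for later filtering or tracing; the "source" key is just an illustrative choice, not something Spring AI requires:

// Requires: import java.util.Map;
// The metadata map is stored in the 'metadata' json column alongside the text and its embedding.
new Document(
        "Security in StarlightDB is handled by 'Cosmic Shield', which provides end-to-end encryption.",
        Map.of("source", "starlight-security-guide"))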

Step 5: Implementing the RAG Service (RagService)

The RagService contains the core retrieval and generation logic.

package com.sohamkamani.spring_rag_demo.rag;

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class RagService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    @Value("classpath:/prompts/rag-prompt.st")
    private Resource ragPromptTemplate;

    public RagService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        this.chatClient = chatClientBuilder.build();
        this.vectorStore = vectorStore;
    }

    public String retrieveAndGenerate(String message) {
        // 1. Retrieve similar documents
        List<Document> similarDocuments = vectorStore.similaritySearch(SearchRequest.builder().query(message).topK(4).build());
        String information = similarDocuments.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n"));

        // 2. Augment the prompt
        SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(ragPromptTemplate);
        Prompt prompt = new Prompt(List.of(
                systemPromptTemplate.createMessage(Map.of("information", information)),
                new UserMessage(message)));
        
        // 3. Generate the response
        return chatClient.prompt(prompt).call().content();
    }
}
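
As an aside, the spring-ai-advisors-vector-store dependency we added also provides a QuestionAnswerAdvisor that bundles the retrieve-and-augment steps into a single call. The sketch below shows the rough shape of that approach; the package and constructor shown are based on Spring AI 1.0.0 and may differ in other versions:

// Advisor-based alternative (package name assumed for Spring AI 1.0.0).
// The advisor runs the similarity search and augments the prompt before calling the model.
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;

public class AdvisorBasedRagService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public AdvisorBasedRagService(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder.build();
        this.vectorStore = vectorStore;
    }

    public String answer(String message) {
        return chatClient.prompt()
                .advisors(new QuestionAnswerAdvisor(vectorStore))
                .user(message)
                .call()
                .content();
    }
}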

Step 6: Creating the API Endpoint (RagController)

Finally, the RagController exposes a simple REST endpoint to interact with our service.

package com.sohamkamani.spring_rag_demo.rag;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagController {

    private final RagService ragService;

    public RagController(RagService ragService) {
        this.ragService = ragService;
    }

    @PostMapping("/ai/rag")
    public String generate(@RequestBody MessageRequest request) {
        return ragService.retrieveAndGenerate(request.message());
    }

    public static record MessageRequest(String message) {
    }
}

Step 7: Configuring the Application

All the components are tied together in the application.properties file.

# src/main/resources/application.properties

spring.application.name=spring-rag-demo

# PostgreSQL Configuration
spring.datasource.url=jdbc:postgresql://localhost:5432/rag_demo
spring.datasource.username=soham
# spring.datasource.password=password
spring.datasource.driver-class-name=org.postgresql.Driver

# Spring AI OpenAI Configuration for LOCAL EMBEDDING MODEL
spring.ai.openai.base-url=http://127.0.0.1:1234
# LM Studio's local server ignores the API key, so any placeholder value works here
spring.ai.openai.api-key=""
spring.ai.openai.embedding.options.model=text-embedding-nomic-embed-text-v2-moe
spring.ai.openai.embedding.options.dimensions=768

# Spring AI OpenAI Configuration for ONLINE CHAT MODEL (via OpenRouter)
spring.ai.openai.chat.base-url=https://openrouter.ai/api
spring.ai.openai.chat.api-key=${OPENROUTER_API_KEY}
spring.ai.openai.chat.options.model=google/gemini-2.5-flash

# Spring AI PgVector Configuration
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768

We also need a prompt template file at src/main/resources/prompts/rag-prompt.st to guide the LLM.

You are a helpful assistant. Use the following information to answer the question in detail. Please use a friendly and professional tone. Please acknowledge the question and relate the answer back to it. 
If the answer is not in the provided information, say "I don't know."

Information:
{information}

Answer:

Screenshot of application.properties with all the settings configured.

Running and Testing the RAG Pipeline

Let’s see our application in action!

  1. Start the application:

    OPENROUTER_API_KEY=YOUR_OPENROUTER_API_KEY ./gradlew bootRun

    You should see the log Documents loaded into VectorStore. on startup.

  2. Test the endpoint: Open a new terminal and use curl to send a question to the /ai/rag endpoint.

    Question: “is starlight secure?”

    curl -X POST -H "Content-Type: application/json" http://localhost:8080/ai/rag -d '{"message": "is starlight secure?"}'

    Response:

    "Yes, StarlightDB is secure!\n\nYou asked if StarlightDB is secure, and I'm happy to tell you that it is. Security in StarlightDB is managed by a feature called 'Cosmic Shield'. This provides comprehensive protection through end-to-end encryption and also offers fine-grained access control. This control is applied at both the node and edge levels, ensuring robust security for your data."

    The model correctly answered using the imaginary information about “Cosmic Shield” that only exists in our private vector store!

Production Considerations

While this example works well, here are a few things to consider for a production environment:

  • Document Chunking: Our example uses small text snippets. For large documents, similarity search works better if you break them down into smaller, logical chunks, with each chunk stored as a separate entry in your vector store (see the sketch after this list).
  • Embedding Quality: The performance of your RAG system heavily depends on the quality of your embeddings. The Nomic model is a great start, but you might want to experiment with other models, including paid APIs like OpenAI’s Ada or Google’s Gemini embeddings, to see what works best for the nuances of your specific data.
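
For chunking, Spring AI’s core module ships a TokenTextSplitter that breaks documents into token-sized chunks before you add them to the vector store. Here’s a minimal sketch, assuming the Spring AI 1.0.0 class and its default settings:

// Minimal chunking sketch (class and defaults assumed from Spring AI 1.0.0 core).
import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;

public class ChunkingLoader {

    public void load(VectorStore vectorStore, String longText) {
        // Split one large document into token-based chunks using the default chunk size
        TokenTextSplitter splitter = new TokenTextSplitter();
        List<Document> chunks = splitter.apply(List.of(new Document(longText)));

        // Each chunk is embedded and stored as its own row in the vector_store table
        vectorStore.add(chunks);
    }
}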
