How to Build an AI Chatbot for Your Website with LangChain and Next.js
Why Build Your Own Chatbot?
Third-party chat widgets are fine for simple FAQ bots. But if you want an AI that knows your product, answers questions from your docs, and hands off to a human when needed, you need to build it yourself. The stack I use on every project: LangChain for orchestration, OpenAI for the language model, Pinecone or pgvector for retrieval, and Next.js for the frontend.
Architecture Overview
The chatbot has three components: a vector store with your knowledge base, a LangChain chain that retrieves relevant context and generates responses, and a streaming API route in Next.js that pushes tokens to the client as they are generated. This gives users a fast, responsive experience without waiting for the full response.
Setting Up the RAG Pipeline
RAG (Retrieval-Augmented Generation) is the pattern that makes chatbots actually useful. Instead of relying on the LLM's training data alone, you feed it relevant chunks from your own documents. Here is the flow: chunk your documents into roughly 500-token segments, embed them with OpenAI's embedding model, store the vectors in Pinecone or pgvector, and at query time retrieve the top-k most similar chunks to include in the prompt.
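The ingestion step might look like this (a sketch, not a drop-in script: `pineconeIndex` is a Pinecone index client you have already created, and `rawText` stands in for your document contents):

```typescript
// Ingestion: split documents, embed them, and upsert the vectors into Pinecone
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,   // note: measured in characters; use a token-based splitter for true 500-token chunks
  chunkOverlap: 50, // a small overlap preserves context across chunk boundaries
});
const docs = await splitter.createDocuments([rawText]);
await PineconeStore.fromDocuments(docs, new OpenAIEmbeddings(), { pineconeIndex });
```

Run this once per document update, not per request; retrieval at query time only reads from the index.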
// LangChain retrieval chain setup — assumes `llm` (e.g. a ChatOpenAI instance) and `pineconeIndex` are already initialized
import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";

const vectorStore = await PineconeStore.fromExistingIndex(new OpenAIEmbeddings(), { pineconeIndex });
const retriever = vectorStore.asRetriever({ k: 4 });
const chain = ConversationalRetrievalQAChain.fromLLM(llm, retriever, {
  returnSourceDocuments: true,
  // outputKey tells the memory which output field to store when source documents are also returned
  memory: new BufferMemory({ memoryKey: "chat_history", returnMessages: true, outputKey: "text" }),
});
Streaming Responses in Next.js
Next.js App Router supports streaming natively via the ReadableStream API. Create a route handler that returns a stream, and on the client use the Vercel AI SDK's useChat hook for a clean developer experience. The user sees tokens appearing in real time, which dramatically improves perceived performance.
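A minimal route handler could be sketched as follows; it assumes the retrieval chain from earlier is exported from a local module (the "@/lib/chain" path is hypothetical), and that the underlying LLM was created with streaming enabled:

```typescript
// app/api/chat/route.ts — push tokens to the client as the chain generates them
import { chain } from "@/lib/chain"; // hypothetical module exporting the retrieval chain

export async function POST(req: Request) {
  const { messages } = await req.json();
  const question = messages[messages.length - 1].content;

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      // handleLLMNewToken fires once per generated token
      await chain.call(
        { question },
        { callbacks: [{ handleLLMNewToken: (t: string) => controller.enqueue(encoder.encode(t)) }] }
      );
      controller.close();
    },
  });

  return new Response(stream, { headers: { "Content-Type": "text/plain; charset=utf-8" } });
}
```

On the client, the AI SDK's useChat hook can consume this endpoint; since this handler emits plain text rather than the SDK's default data protocol, configure the hook for text streams.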
Conversation Memory
LangChain's BufferMemory keeps conversation history in process memory, which works for a single session but is lost on every restart or serverless cold start. For production, use PostgresChatMessageHistory or RedisChatMessageHistory so conversations persist across page reloads and can be retrieved per user.
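Swapping in Redis-backed history is a small change, sketched here under some assumptions: `sessionId` is a per-user identifier you supply, and `REDIS_URL` is an environment variable pointing at your Redis instance:

```typescript
import { BufferMemory } from "langchain/memory";
import { RedisChatMessageHistory } from "@langchain/redis";

// Same BufferMemory interface as before, but messages persist in Redis per session
const memory = new BufferMemory({
  memoryKey: "chat_history",
  returnMessages: true,
  chatHistory: new RedisChatMessageHistory({
    sessionId,                              // e.g. the authenticated user's id
    sessionTTL: 60 * 60 * 24,               // expire stale conversations after a day
    config: { url: process.env.REDIS_URL }, // node-redis client options
  }),
});
```

Because the memory interface is unchanged, the retrieval chain from earlier needs no modification beyond receiving this memory instance.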
Deploying to Production
Deploy the Next.js app to Vercel. Store your OpenAI and Pinecone API keys as environment variables. Set a rate limiter on the API route to prevent abuse — I use Upstash Redis with a sliding window of 20 requests per minute per IP. Add a fallback to a contact form if the API is unavailable.
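The rate limiter described above is roughly this, assuming the UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN environment variables are set for `Redis.fromEnv()`:

```typescript
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Sliding window: at most 20 requests per minute per IP
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, "1 m"),
});

export async function POST(req: Request) {
  const ip = req.headers.get("x-forwarded-for") ?? "anonymous";
  const { success } = await ratelimit.limit(ip);
  if (!success) {
    return new Response("Too many requests", { status: 429 });
  }
  // ...continue with the chat handler
}
```

Rejecting early, before the request ever reaches the LLM, is what keeps abuse from turning into an OpenAI bill.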
An AI chatbot built this way converts significantly better than a generic contact form. Reach out if you want one built for your product.
Hire me for similar projects
Looking for a developer who can build what you just read about? Let's talk.
Get in Touch