15 Days to LangChain Mastery: Build Production‑Ready AI Apps Like a Pro
Master LangChain in 15 Days 🚀
What if you could go from zero to LangChain expert in just 15 days?
Most developers only scratch the surface of LLM APIs—but the real breakthroughs happen when you learn to orchestrate memory, RAG, agents, and deployment pipelines like a pro. In this video, I’m breaking down a proven 15‑day LangChain Expert Engineer roadmap that top AI teams are already using to ship production‑ready intelligent apps. If you skip this, you’ll keep building toy projects while others build the future.
🔧 Learn the full LangChain ecosystem: LangChain Core, LangGraph, LangSmith, LangServe, and more
🧠 RAG pipelines, memory management, and agent design—explained with real-world patterns
⚡ Advanced workflows: Multi‑query retrieval, contextual compression, and ReAct loops
🚀 Production deployment: Turn your LangChain app into scalable APIs
✅ Debug & monitor like a pro: LangSmith insights, evaluation loops, and best practices
Whether you’re a machine learning engineer, startup founder, or AI enthusiast hungry for a career‑boosting skill, this curriculum takes you from concept to production. Don’t miss your chance to join the ranks of true LangChain experts.
The 15-Day LangChain Expert Engineer Curriculum: From Beginner to Production-Grade Proficiency
Introduction: The Modern LLM Application Stack
The advent of powerful Large Language Models (LLMs) has marked a paradigm shift in software development. While a direct API call to a model like GPT-4 or Gemini can produce impressive results, transitioning from a simple proof-of-concept to a robust, production-grade application reveals a host of engineering challenges. A production application requires more than just text generation; it necessitates a structured way to manage prompts, connect to external data sources, maintain conversational state (memory), and enable the LLM to interact with other systems and APIs through tools. LangChain emerged in October 2022 to address precisely this need, providing a comprehensive open-source framework for orchestrating the complex workflows inherent in LLM-powered applications. Its rapid adoption and vibrant community underscore its critical role in the modern AI development landscape.
This report provides an intensive, 15-day curriculum designed for a Machine Learning Engineer to achieve expert-level proficiency with the LangChain framework. The syllabus progresses from foundational components to the construction of intelligent, stateful agents and culminates in productionization, evaluation, and advanced architectural patterns.
The LangChain Ecosystem
To achieve mastery, it is essential to understand that LangChain is not a monolithic library but a suite of interoperable products designed to support the entire application lifecycle. An expert engineer leverages the entire stack to build, debug, and deploy reliable applications.
LangChain: This is the core framework providing the components for what can be described as an application's "cognitive architecture". It includes a vast library of integrations with hundreds of model providers, databases, and tools, alongside high-level abstractions for composing these components into chains and agents.
LangGraph: For applications that demand more than a linear sequence of operations, LangGraph provides a powerful framework for building stateful, multi-actor systems. By modeling workflows as graphs with nodes and edges, it naturally supports complex control flows like cycles, branching, and human-in-the-loop interventions, which are difficult to implement with simple chains.
LangSmith: An indispensable platform for productionization, LangSmith offers critical observability into the inner workings of LLM applications. It provides detailed tracing, monitoring, and evaluation capabilities that allow developers to debug poor performance, test new versions, and continuously improve their applications with confidence. Notably, LangSmith is framework-agnostic and can be used to trace and evaluate any LLM application, regardless of whether it is built with LangChain.
LangServe & LangGraph Platform: These are the deployment solutions within the ecosystem. LangServe facilitates the deployment of LangChain chains as production-ready REST APIs, while the LangGraph Platform is specifically designed to deploy and manage the stateful, long-running workflows built with LangGraph.
Architectural Overview
LangChain's power and flexibility stem from its intentionally modular architecture. This design separates core abstractions from specific implementations, promoting maintainability and allowing developers to select only the components they need. Understanding this structure is the first step toward using the framework effectively.
langchain-core: This is the foundational package. It contains the base abstractions for all key components, such as BaseChatModel, BaseRetriever, and BaseLoader, as well as the LangChain Expression Language (LCEL), the modern syntax for composing components. A key design principle is its lightweight nature; it has minimal dependencies and contains no third-party integrations.
langchain: This package builds upon langchain-core and contains the generic, high-level implementations of chains, agents, and retrieval strategies. These components form the cognitive architecture of an application and are designed to be independent of any specific integration.
Integration Packages & langchain-community: The vast ecosystem of third-party integrations is what connects LangChain to the outside world. To keep the core libraries lightweight and to manage versioning effectively, popular integrations (e.g., langchain-openai, langchain-anthropic) are maintained in their own packages. The langchain-community package serves as a repository for a wider range of community-maintained integrations, with all dependencies being optional to minimize installation footprint.
The following table provides a clear reference for these architectural components, demystifying their roles and dependencies.

Package | Role | Dependencies
langchain-core | Base abstractions (BaseChatModel, BaseRetriever, BaseLoader) and LCEL | Minimal; no third-party integrations
langchain | Generic chains, agents, and retrieval strategies (the cognitive architecture) | Builds on langchain-core
langchain-openai, langchain-anthropic, etc. | Individually versioned packages for popular provider integrations | The provider's own SDK
langchain-community | Broad collection of community-maintained integrations | All optional
Part I: The Foundational Components (Days 1-5)
This first week focuses on mastering the fundamental building blocks of any LangChain application. By the end of this section, the engineer will be able to construct and debug a complete, data-aware question-answering application.
Day 1: Models, Prompts, and First Invocations
Conceptual Learning
The core of any LLM application is the interaction between a well-crafted prompt and a language model. This first day establishes a solid foundation in managing this interaction using LangChain's abstractions.
Models as the Core Engine: LangChain provides a standardized interface for two primary types of language models. The first, BaseLLM, represents older models that operate on a simple string-in, string-out basis. The modern and more common type is the BaseChatModel, which is designed for conversational interactions and uses a sequence of messages (each with an assigned role) as both input and output. For nearly all modern use cases, chat models are the standard. The value of this abstraction lies in model interoperability: an application built against the BaseChatModel interface can swap the underlying model provider (for instance, from OpenAI to Google) with a single line of code change, future-proofing the application against a rapidly evolving model landscape.

Instantiating Models: To use a model, one must first instantiate it from its corresponding integration package (e.g., langchain_openai or langchain_google_genai). This typically involves setting an environment variable with the provider's API key and then calling the model class. For example: from langchain_openai import ChatOpenAI; model = ChatOpenAI(model="gpt-4o").

The Power of Prompt Engineering: Prompts are the primary mechanism for instructing and guiding an LLM's behavior. While simple f-strings can work for basic tasks, building scalable and maintainable applications requires a more structured approach. PromptTemplates in LangChain provide this structure, formalizing the process of prompt construction and making prompts reusable components.

LangChain Prompt Templates: LangChain offers several template classes to handle different scenarios.
PromptTemplate: The most basic type, used for creating a single formatted string from input variables. It is suitable for completion-style models.
ChatPromptTemplate: The standard for chat models. It is constructed from a list of message templates, each with a specific role (e.g., system, human, ai). This allows for the creation of complex, multi-turn conversational prompts.
MessagesPlaceholder: A crucial component for building chatbots. It acts as a variable within a ChatPromptTemplate that can be dynamically populated with a list of previous conversation messages, enabling the model to have context of the ongoing dialogue.
Hands-On Lab
Setup: Obtain API keys from at least two different LLM providers (e.g., OpenAI, Google). Set them as environment variables.
Installation: Install the necessary packages: pip install langchain langchain-openai langchain-google-genai python-dotenv.

Model Instantiation: In a Python script or Jupyter notebook, import and instantiate both a ChatOpenAI and a ChatGoogleGenerativeAI model.

Prompt Creation: Create a ChatPromptTemplate. The template should have a system message that sets a persona (e.g., "You are a helpful translation assistant.") and a user message that takes two input variables: input_text and target_language. The user message should be something like "Translate the following text to {target_language}: {input_text}".

Invocation: Use the .invoke() method on the prompt template, passing a dictionary with values for the input variables (e.g., {"input_text": "Hello, world!", "target_language": "French"}). This will return a PromptValue.

Model Call: Pass the resulting PromptValue to the .invoke() method of one of the instantiated models. Print the content of the returned AIMessage.

LangSmith Integration: Set the LANGCHAIN_TRACING_V2 environment variable to "true" and provide your LANGCHAIN_API_KEY. Rerun the script and navigate to the LangSmith dashboard to view the trace of your first run. This demonstrates the immediate observability gained from the platform. (A consolidated sketch of these steps follows.)
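For reference, the steps above can be combined into one short script. This is a minimal sketch: the Google model name and the persona text are illustrative, and the API keys are assumed to be set in the environment.
Python
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

# Assumes OPENAI_API_KEY and GOOGLE_API_KEY are set as environment variables
openai_model = ChatOpenAI(model="gpt-4o")
google_model = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # model name is illustrative

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful translation assistant."),
    ("human", "Translate the following text to {target_language}: {input_text}"),
])

# Format the prompt, then pass the resulting PromptValue to either model
prompt_value = prompt.invoke({"input_text": "Hello, world!", "target_language": "French"})
response = openai_model.invoke(prompt_value)
print(response.content)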
Day 2: Composing with LangChain Expression Language (LCEL)
Conceptual Learning
LangChain Expression Language (LCEL) is the modern, declarative way to compose components into chains. It makes the flow of data explicit and unlocks powerful, production-ready features.
The "Why" of LCEL: In earlier versions of LangChain, chains were built by subclassing and using classes like LLMChain. This approach was often criticized for being too "magical" and for hiding the underlying logic, making debugging difficult. LCEL was introduced as a direct response, providing a transparent and composable syntax that makes the data flow explicit.

The Runnable Protocol: The foundation of LCEL is the Runnable protocol. Every core component in LangChain (prompts, models, parsers, retrievers) implements this standard interface. This protocol guarantees that each component has a consistent set of invocation methods: .invoke() for single inputs, .batch() for multiple inputs in parallel, .stream() for streaming output, and asynchronous counterparts for each (.ainvoke(), .abatch(), .astream()). This consistency is what makes them seamlessly composable.

Sequential Composition with the Pipe Operator (|): The pipe operator is the primary syntax for building a RunnableSequence, which chains components together sequentially. The expression prompt | model | parser creates a chain where the output of the prompt is piped as input to the model, and the model's output is piped to the parser. This is not just syntactic sugar; it leverages a pre-built, optimized execution layer. A developer could manually write a Python function to perform these steps, but they would then have to implement their own logic for streaming, batching, and parallelism. By using LCEL, they get these production-grade features for free, as the Runnable protocol enables the LangChain runtime to handle this complex boilerplate code automatically.

Parallel Execution with RunnableParallel: For more complex data flows, RunnableParallel allows for the concurrent execution of multiple Runnables. It is typically used as a dictionary where each key is mapped to a Runnable. When invoked, it runs all Runnables in parallel with the same input and returns a dictionary of their outputs. This is essential for patterns like RAG, where both the original question and the retrieved documents need to be passed to the next step.

RunnablePassthrough and RunnableLambda: These are utility Runnables for fine-grained control over the data flow. RunnablePassthrough simply passes its input through unchanged, which is useful in a RunnableParallel map to preserve an original input value. RunnableLambda allows any arbitrary Python function to be wrapped as a Runnable component, making it easy to integrate custom logic into an LCEL chain.
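To make the shared Runnable interface concrete, here is a small sketch driving one chain through .invoke(), .batch(), and .stream(). It assumes an OpenAI API key is configured; the model name is illustrative.
Python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = (
    ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

print(chain.invoke({"topic": "bears"}))                     # single input
print(chain.batch([{"topic": "cats"}, {"topic": "dogs"}]))  # multiple inputs in parallel
for token in chain.stream({"topic": "llamas"}):             # incremental, token-by-token output
    print(token, end="", flush=True)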
Hands-On Lab
Refactor Day 1 Lab: Take the translation application from Day 1 and rewrite it as a single LCEL chain using the pipe operator: chain = prompt_template | model | output_parser.

Introduce an Output Parser: Import and add a StrOutputParser from langchain_core.output_parsers to the end of the chain. This will automatically convert the AIMessage object from the chat model into a simple string.

Build a Multi-Step Chain: Create a new, more complex chain with the following logic:
Input: A topic and a language.
Step 1: Generate a joke about the topic.
Step 2: Translate the resulting joke into the specified language.
Architect with RunnableParallel: This requires a more sophisticated chain. The first part of the chain will generate the joke. The second part needs both the joke and the original language. This is a perfect use case for RunnableParallel: a dictionary in an LCEL chain is coerced into a RunnableParallel map, so the joke can be generated while the original language value is carried forward. Note that a bare RunnablePassthrough() would forward the entire input dictionary, so itemgetter("language") is used to pull out just the language value.
Python
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

# A prompt to generate a joke
joke_prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
joke_chain = joke_prompt | model | StrOutputParser()

# A prompt to translate
translate_prompt = ChatPromptTemplate.from_template("Translate this joke to {language}: {joke}")

# The full chain: the dictionary is an implicit RunnableParallel over the same input
full_chain = {
    "joke": joke_chain,
    "language": itemgetter("language"),  # Extracts just the original 'language' value from the input
} | translate_prompt | model | StrOutputParser()

# Invoke with a dictionary containing both 'topic' and 'language'
full_chain.invoke({"topic": "bears", "language": "Spanish"})
Debug with LangSmith: Run the chain and inspect the trace in LangSmith. The visualization will clearly show the parallel step and how the outputs were combined for the final prompt, reinforcing the understanding of the data flow.
Day 3: The RAG Pipeline - Data Ingestion & Preparation
Conceptual Learning
Retrieval-Augmented Generation (RAG) is arguably the most important pattern in applied LLM development. It grounds the model in factual, external data, mitigating hallucinations and allowing it to answer questions about information it was not trained on. This day begins the process of building a RAG pipeline by focusing on the crucial first stage: data ingestion.
Document Loaders: The RAG process begins with loading data from a source. LangChain provides a vast collection of DocumentLoaders, each designed for a specific data source, such as web pages (WebBaseLoader), PDFs (PyPDFLoader), CSV files (CSVLoader), or databases (SQLDatabase). All loaders return data structured as a list of Document objects. A Document is the standard unit of data in LangChain, consisting of page_content (the text itself) and metadata (a dictionary of associated information like the source URL or file path).

The Importance of Text Splitting: Once loaded, large documents must be split into smaller, more manageable chunks. This step is critical for several reasons:
Context Window Limits: LLMs have a finite context window, and passing an entire large document may exceed this limit.
Retrieval Accuracy: Searching for a specific fact within a massive chunk of text is inefficient and inaccurate. Smaller, more focused chunks lead to better retrieval results.
Cost and Latency: Sending fewer, more relevant tokens to the LLM is cheaper and faster.

Text Splitter Strategies: LangChain provides various strategies for splitting text. The most common and recommended approach for generic text is the RecursiveCharacterTextSplitter. This splitter works by attempting to split the text based on a hierarchical list of separators (by default: ["\n\n", "\n", " ", ""]). It first tries to split by double newlines (paragraphs). If a resulting chunk is still too large, it takes that chunk and tries to split it by single newlines (sentences), and so on, down to individual characters. This strategy is effective because it tries to keep the most semantically related blocks of text together for as long as possible. The chunk_size and chunk_overlap parameters allow for fine-tuning the size of the chunks and the amount of text repeated between them to maintain context.
It is vital to recognize that the metadata field of a Document is not merely for bookkeeping; it is a powerful feature for enabling advanced retrieval. Loaders automatically populate metadata with information like the source file or URL. During custom ingestion pipelines, an engineer can enrich this metadata further, adding publication dates, authors, or categorical tags. This foresight during the ingestion stage allows for the implementation of sophisticated filtered searches later on (e.g., "find documents about 'attention mechanism' from papers by 'Vaswani' published in 2017"), a capability that is impossible without well-structured metadata.
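As a small illustration of enriching metadata at ingestion time, the sketch below updates each loaded Document with extra fields. The file name and field values are hypothetical; the fields only pay off later, when a filtered or self-querying retriever uses them.
Python
from langchain_community.document_loaders import PyPDFLoader

docs = PyPDFLoader("attention_is_all_you_need.pdf").load()  # hypothetical local file
for doc in docs:
    # Fields like these can later drive metadata-filtered retrieval (see Day 13)
    doc.metadata.update({"author": "Vaswani", "year": 2017, "topic": "attention"})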
Hands-On Lab
Select a Data Source: Choose a document to work with. A good starting point is a long, text-heavy web page, such as a technical blog post or a Wikipedia article. The Lilian Weng blog post on agents is a common example used in official tutorials. Alternatively, find a multi-page PDF of a research paper.
Install Dependencies: pip install langchain-community beautifulsoup4 pypdf.

Load the Data:
If using a web page, use from langchain_community.document_loaders import WebBaseLoader and instantiate it with the URL.
If using a PDF, use from langchain_community.document_loaders import PyPDFLoader and instantiate it with the local file path.
Call the .load() method on your loader instance to get a list of Document objects.

Split the Documents:
Import from langchain_text_splitters import RecursiveCharacterTextSplitter.
Instantiate the splitter, setting a chunk_size (e.g., 1000 characters) and a chunk_overlap (e.g., 200 characters).
Call the .split_documents() method on the splitter instance, passing in the list of documents loaded in the previous step. This will return a new list of smaller document chunks.

Inspect the Results: Print the page_content and metadata of the first few chunks. Observe how the text has been divided and how the source metadata has been preserved. (A consolidated sketch of these steps follows.)
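The load-and-split steps above map to just a few lines of code. This sketch assumes the Lilian Weng agents post as the source; any URL or PDF path can be substituted.
Python
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = splitter.split_documents(docs)

print(len(all_splits))
print(all_splits[0].page_content[:200])
print(all_splits[0].metadata)  # the 'source' URL is carried through from the loader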
Day 4: The RAG Pipeline - Vectorization & Retrieval
Conceptual Learning
After preparing the data chunks, the next stage is to index them in a way that enables efficient semantic search. This involves converting the text into numerical representations (embeddings) and storing them in a specialized database (a vector store).
Embedding Models: An embedding model is a neural network that converts a piece of text into a high-dimensional vector. The key property of these embeddings is that texts with similar semantic meanings will have vectors that are numerically close to each other in the vector space. LangChain provides a standard Embeddings class with numerous integrations, such as OpenAIEmbeddings or HuggingFaceEmbeddings.

Vector Stores: A vector store is a database specifically designed to store these vector embeddings and perform extremely fast nearest-neighbor searches. Given a query vector, the store can efficiently find the vectors (and their corresponding text chunks) that are most similar to it. LangChain integrates with dozens of vector stores. For local development and rapid prototyping, in-memory options like FAISS (from Facebook AI) and Chroma are excellent choices. For production applications, managed and scalable solutions like Pinecone, Weaviate, or database extensions like pgvector for PostgreSQL are more common.

The Retriever Interface: In LangChain, the process of fetching documents is abstracted by the Retriever interface. A retriever is a component that takes a string query as input and returns a list of relevant Document objects. While a vector store is the most common component to back a retriever, the interface is intentionally general. For example, a retriever could be built to wrap the Wikipedia API or a traditional keyword search engine. Any vector store object in LangChain can be easily converted into a retriever by calling its .as_retriever() method.

The quality of the entire RAG application is fundamentally constrained by the quality of its retriever. The LLM in the final step only has access to the documents that the retriever provides; it cannot see the rest of the vector store. If the retriever fails to fetch the correct documents for a given query, even the most advanced LLM will be unable to generate an accurate answer. This makes the retrieval step the most common point of failure and the most critical area for optimization in any RAG system. This reality justifies the focus on advanced retrieval strategies in the later stages of this curriculum.
Hands-On Lab
Continue from Day 3: Start with the list of split document chunks from the previous lab.
Install Dependencies: pip install langchain-openai "chromadb<0.5.0" (or another vector store of choice, like faiss-cpu).

Initialize Components:
Import and initialize an embedding model: from langchain_openai import OpenAIEmbeddings; embeddings = OpenAIEmbeddings().
Import the chosen vector store: from langchain_community.vectorstores import Chroma.

Index the Documents: Use the vector store's class method .from_documents() to perform the indexing in a single step. This method takes the list of document chunks and the embedding model instance as arguments. It will iterate through the chunks, generate an embedding for each one, and store the chunk and its embedding in the vector store.
Python
# all_splits is the list of document chunks from Day 3
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)

Create a Retriever: Convert the initialized vector store into a retriever. The .as_retriever() method can be configured, for example, to specify the number of documents to return (k).
Python
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})  # Retrieve the top 5 most relevant chunks

Test the Retriever: Use the retriever's .invoke() method with a sample query related to your document's content.
Python
query = "What is task decomposition in AI agents?"
retrieved_docs = retriever.invoke(query)

Inspect the Results: Print the page_content and metadata of the retrieved_docs. Verify that they are semantically relevant to the query.
Day 5: Building Your First RAG Application
Conceptual Learning
This session culminates the work of the first week by assembling all the previously built components—the loader, splitter, embedding model, and retriever—into a complete, end-to-end question-answering chain using LCEL.
Architecting the RAG Chain: The goal is to create a chain that takes a user's question, uses it to retrieve relevant context, and then passes both the context and the original question to an LLM to generate an answer. This data flow requires parallel processing, which is a perfect use case for RunnableParallel. The structure of the chain will be as follows:
Python
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Assume 'retriever', 'prompt', and 'model' are already defined
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
Breaking Down the LCEL Chain:
{"context": retriever, "question": RunnablePassthrough()}: This is a RunnableParallel map. When the chain is invoked with a question string, this map is the first step.
The retriever is invoked with the question, and its output (a list of Document objects) is assigned to the context key.
RunnablePassthrough() is invoked with the question, and it simply passes the question string through unchanged, assigning it to the question key.
This step runs in parallel and produces a dictionary: {"context": [docs], "question": "user's question"}.
| prompt: The dictionary from the previous step is piped into the prompt template. The template must have context and question as its input_variables. It formats these into the final prompt.
| model: The formatted prompt is sent to the LLM.
| StrOutputParser(): The AIMessage output from the model is parsed into a clean string for the final answer.
A common pitfall in RAG is creating a prompt that allows the model to ignore the provided context and answer from its own internal knowledge, leading to hallucinations. The prompt is a critical guardrail. A weak prompt like "Context: {context}\nQuestion: {question}" can be ineffective. A much stronger prompt explicitly instructs the model on its task and constraints: "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer from the context, just say that you don't know. Do not try to make up an answer. \n\nContext: {context} \n\nQuestion: {question} \n\nAnswer:". This level of instruction is a crucial piece of prompt engineering that forces the model to adhere to the RAG pattern.
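As a concrete illustration, the instructional prompt above can be expressed as a reusable template (a minimal sketch; the wording can be adapted to your domain).
Python
from langchain_core.prompts import ChatPromptTemplate

# The instructional RAG prompt described above, as a reusable component
prompt = ChatPromptTemplate.from_template(
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer from the context, just say that you don't know. "
    "Do not try to make up an answer.\n\n"
    "Context: {context}\n\nQuestion: {question}\n\nAnswer:"
)
In practice, many tutorials also insert a small formatting step (for example, joining each document's page_content with newlines) between the retriever and the prompt so the context reads cleanly; the chain works without it, but the prompt is tidier with it.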
Hands-On Lab
Assemble Components: Bring together the retriever from Day 4 and a ChatOpenAI model instance.

Create the RAG Prompt: Define a ChatPromptTemplate with the robust instructional format described above, ensuring it has context and question as input variables.

Build the LCEL Chain: Construct the full RAG chain using the RunnableParallel structure and pipe operators as detailed in the conceptual section.

Invoke the Chain: Call the chain's .invoke() method with several different questions that can be answered from your source document. Print the results.
Python
# Example invocation
answer = rag_chain.invoke("What are the main components of an AI agent system?")
print(answer)
Full-Cycle Debugging with LangSmith: Examine the LangSmith trace for one of your invocations. This is a critical step to solidify understanding.
Click into the trace for the overall rag_chain.
Observe the RunnableParallel step. You can inspect the inputs and outputs for both the retriever and the RunnablePassthrough.
Examine the ChatPromptTemplate step to see the final, formatted prompt that was sent to the model, including the full text of the retrieved context.
Check the ChatOpenAI step to see token usage and latency.
This end-to-end visibility is invaluable for debugging and optimization.
Part II: Building Intelligent & Stateful Applications (Days 6-10)
Having mastered the fundamentals of stateless RAG applications, the second week focuses on adding intelligence and statefulness. This involves giving applications memory to conduct conversations and empowering them with tools to interact with external systems, moving from simple chains to reasoning agents.
Day 6: Adding Memory to Your Chains
Conceptual Learning
Standard LCEL chains are stateless; each invocation is treated as an independent event. This is insufficient for building applications like chatbots, which must remember the history of the conversation to provide coherent and contextually relevant responses. LangChain's memory components are designed to solve this problem.
How Memory Works: At a high level, memory components work by integrating into a chain to perform two actions:
Load Variables: Before the core logic of the chain executes, the memory component reads the conversation history and loads it into the prompt's input variables (e.g., under a key like chat_history).
Save Context: After the chain executes and the model generates a response, the memory component takes the latest human input and AI output and saves them to its history store.
LangChain Memory Components: LangChain provides several types of memory, each representing a different trade-off between context fidelity, token cost, and complexity. For example, ConversationBufferMemory stores the full history verbatim, ConversationBufferWindowMemory keeps only the last k interactions, and ConversationSummaryMemory compresses older turns into a running summary.
Hands-On Lab
This lab focuses on adding memory to a simple, non-RAG conversational chain to isolate and understand the memory mechanism itself.
Setup: Ensure you have langchain-openai installed.

Create a Basic Conversational Chain:
Python
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferWindowMemory
llm = ChatOpenAI(model="gpt-4o-mini")
# The prompt now includes a placeholder for memory
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly, helpful assistant."),  # persona text is illustrative
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])
Instantiate Memory: Create an instance of ConversationBufferWindowMemory. We use k=2 to remember the last two pairs of interactions. The memory_key must match the variable_name in the MessagesPlaceholder.
Python
memory = ConversationBufferWindowMemory(k=2, memory_key="chat_history", return_messages=True)
Create the Chain with Memory: While modern applications often use LangGraph for state, the legacy LLMChain provides a simple way to demonstrate memory.
Python
conversation_chain = LLMChain(
llm=llm,
prompt=prompt,
memory=memory,
verbose=True # Set to True to see the prompt being sent
)
Have a Multi-Turn Conversation: Interact with the chain several times to see the memory in action.
Python
conversation_chain.predict(input="Hi, my name is Alice.")
conversation_chain.predict(input="I live in London.")
# The model should now know your name and location
conversation_chain.predict(input="What is my name?")
# The model should forget your name after more interactions
conversation_chain.predict(input="What's the weather like today?")
conversation_chain.predict(input="And tomorrow?")
conversation_chain.predict(input="What was my name again?") # Should fail here
Inspect with LangSmith: Review the traces in LangSmith. For each call to predict, expand the LLMChain trace and look at the prompt sent to the model. You will see the chat_history being populated with previous turns, demonstrating how the memory component works.
Day 7: Building a Conversational RAG Chatbot
Conceptual Learning
This day integrates the concepts from the entire curriculum so far, building a chatbot that is both conversational (using memory) and data-aware (using RAG). This is a highly practical and common application pattern.
The Architectural Challenge: Combining memory and RAG presents a new challenge. A RAG retriever needs a clear, self-contained query to find relevant documents. However, in a conversation, a user's follow-up question might be ambiguous without the prior context (e.g., User: "What is LangChain?", Bot: "...", User: "How does it work?"). Feeding "How does it work?" directly to the retriever would yield poor results.
The History-Aware Retriever Pattern: The standard solution is to create a preliminary chain whose sole purpose is to "condense" the chat history and the follow-up question into a single, standalone question. This rephrased question is then passed to the retriever.
Input: chat_history and the new input (follow-up question).
Process: An LLM call with a prompt like: "Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {input}\nStandalone question:"
Output: A single, self-contained question (e.g., "How does LangChain work?").
This pattern ensures that the retriever always receives a high-quality, context-rich query, dramatically improving the relevance of the retrieved documents for conversational RAG.
While this entire flow can be built using LCEL, the increasing complexity of managing state (chat history) and orchestrating multiple steps (condense question, retrieve, generate answer) highlights the limitations of a purely linear chain. This is where a graph-based approach becomes more natural and maintainable. The official LangChain documentation and tutorials are increasingly recommending LangGraph for any application that involves state or memory. LangGraph models the application as an explicit state machine, which is a more robust and scalable paradigm for complex conversational systems. This shift in the framework's direction is a key indicator for an expert engineer that for new, stateful projects, LangGraph is the preferred architectural choice.
Hands-On Lab
This lab builds the full, multi-step conversational RAG chain.
Create the "Condense Question" Chain:
Define a PromptTemplate for the history-aware rephrasing task described above.
Create an LCEL chain: condense_question_prompt | model | StrOutputParser().

Create the "Answer" Chain:
This is the standard RAG chain from Day 5. It takes context and question as input.
answer_prompt = ChatPromptTemplate.from_template(...)
answer_chain = answer_prompt | model | StrOutputParser()

Assemble the Full Conversational Chain:
This requires chaining these components together and managing the flow of inputs.
We can use RunnableWithMessageHistory from langchain_core.runnables.history to wrap the entire logic, which helps manage the memory component. (One possible assembly is sketched at the end of this lab.)
Test the Full System:
Start a conversation with a question about your document.
Ask a follow-up question that relies on the context of the first question.
Verify that the system correctly rephrases the question, retrieves relevant documents, and provides a contextual answer.
Debug with LangSmith: This is essential. The trace will show the nested chains. You can inspect the input and output of the "condense question" chain to see the rephrased query, then see that query being used by the retriever in the main RAG chain.
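One way to assemble this pattern is with LangChain's built-in helper constructors rather than hand-rolled chains; this is a minimal sketch, assuming the model and retriever from the earlier labs, and the prompt wordings are illustrative.
Python
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Rephrases the follow-up question into a standalone query before retrieval
condense_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given the chat history and the latest user question, "
               "rephrase the question as a standalone question."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])
history_aware_retriever = create_history_aware_retriever(model, retriever, condense_prompt)

# Answers using only the retrieved context
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the retrieved context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])
answer_chain = create_stuff_documents_chain(model, answer_prompt)

conversational_rag_chain = create_retrieval_chain(history_aware_retriever, answer_chain)

chat_history = []  # in a real app this is populated turn by turn (or managed by RunnableWithMessageHistory)
result = conversational_rag_chain.invoke({"input": "What is task decomposition?", "chat_history": chat_history})
print(result["answer"])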
Day 8: Introduction to Agents and Tools
Conceptual Learning
While chains execute a predetermined sequence of steps, agents introduce a new level of autonomy. An agent uses an LLM as a reasoning engine to dynamically decide which actions to take to accomplish a goal. This makes them suitable for tasks where the path to a solution is not known in advance.
The Agent as a Reasoning Engine (ReAct): The most common framework for agentic behavior is ReAct (Reason + Act). The agent operates in a loop:
Reason (Thought): The LLM is given the user's query and a list of available tools. It reasons about the problem and decides which tool, if any, can help it make progress.
Act (Action): The LLM outputs a specific tool call with the necessary arguments.
Observe (Observation): The AgentExecutor runs the specified tool and captures its output.
Repeat: The observation is fed back to the LLM along with the original goal, and the loop continues until the LLM determines that the task is complete and generates a final answer.
Tools: The Agent's Capabilities: A Tool is the agent's interface to the outside world. It is a function or API that the agent can call to perform a specific action, such as searching the web, querying a database, performing a calculation, or interacting with a proprietary API.

The Importance of Tool Descriptions: The LLM's decision-making process is entirely dependent on the name and description provided for each tool. The agent does not "see" the tool's code; it only reads the description to understand what the tool does and when to use it. Therefore, crafting clear, precise, and unambiguous tool descriptions is one of the most critical skills in building reliable agents.
The Tool abstraction is also a powerful way to build what are effectively multi-modal applications. A text-based LLM can act as a "brain" or "router" for an agent. By providing it with tools that wrap other modalities (for example, a tool that calls the DALL-E API for image generation or a tool that calls a text-to-speech API), the agent can orchestrate complex, multi-modal workflows. The LLM reasons in text (e.g., "I need to create an image of a robot writing code") and outputs a textual tool call, which the executor then translates into an actual API call, bridging the gap between different modalities.
Hands-On Lab
Setup: Install packages for common tools: pip install langchain-community duckduckgo-search.

Instantiate Built-in Tools:
Import and initialize a web search tool: from langchain_community.tools import DuckDuckGoSearchRun; search = DuckDuckGoSearchRun().
LangChain also has a simple LLMMathChain that can be wrapped as a tool for calculations.
Create a Tool List: Put the instantiated tools into a Python list.
Python
from langchain.tools import Tool
from langchain.chains import LLMMathChain
# llm should be an initialized ChatOpenAI model
llm_math_chain = LLMMathChain.from_llm(llm=llm)
calculator_tool = Tool(name="Calculator", func=llm_math_chain.run,
                       description="Useful for math questions. Input should be a mathematical expression.")
tools = [search, calculator_tool]
Practice Tool Invocation: This lab focuses on understanding the tools themselves, not yet the full agent. Practice calling the .run() or .invoke() method on each tool directly with sample inputs to see what kind of output they produce. This helps in understanding what the agent will be "observing" in its reasoning loop.
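For example, calling the tools directly (a quick sketch using the search and calculator tools defined above):
Python
# Inspect the raw strings the agent will later receive as observations
print(search.invoke("current Chancellor of Germany"))
print(calculator_tool.invoke("What is 37 raised to the power of 0.5?"))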
Day 9: Creating Custom Tools
Conceptual Learning
The true power of agents is unlocked when they are given access to proprietary data, internal APIs, and custom business logic. This is achieved by creating custom tools.
The @tool Decorator: This is the simplest and most recommended way to create a custom tool from any Python function. The decorator intelligently inspects the function:
The function name becomes the tool's name.
The function's docstring becomes the tool's description.
The function's type hints are used to automatically generate a JSON schema for the arguments. This makes creating simple tools extremely straightforward.

StructuredTool and Pydantic: For tools that require more complex, multi-argument inputs, it is best practice to define the input schema explicitly using a Pydantic BaseModel. This schema can then be passed to a StructuredTool instance. This provides the LLM with a clear, structured format for the arguments it needs to generate, which significantly improves the reliability of tool calls compared to parsing a single string.

Error Handling: Production systems must be robust to failure. Tools can fail for many reasons (e.g., an external API is down, an invalid input is provided). The StructuredTool class has a handle_tool_error parameter, and developers can raise a ToolException from within their function. This allows for graceful error handling, where the error message can be passed back to the agent as an observation, allowing it to potentially retry or choose a different action.
The process of designing a tool's description and argument schema is a form of prompt engineering. The agent's LLM does not see the tool's Python code; it only sees the metadata provided. Vague descriptions like "search tool" will confuse an agent if multiple search tools are available. A precise description like, "Use this tool to search for financial data in the internal company SQL database. Input should be a full, valid SQL query." provides clear guidance. Similarly, a well-defined argument schema with descriptions for each field (e.g., date: str = Field(description="A date in YYYY-MM-DD format")) prevents the LLM from passing malformed data. An expert engineer iterates on these descriptions and schemas as a core part of the development process.
Hands-On Lab
Create a Simple Custom Tool:
Define a basic Python function, for example, get_word_length(word: str) -> int.
Add a clear docstring: """Returns the number of letters in a word.""".
Apply the @tool decorator to the function.
Inspect the .name, .description, and .args attributes of the resulting tool object.

Create a Structured Custom Tool:
Imagine a function to interact with a fictional API: get_flight_price(origin: str, destination: str, departure_date: str) -> str.
Import BaseModel and Field from Pydantic. Create an input schema class: class FlightSearchInput(BaseModel):... with fields for origin, destination, and departure_date, each with a description.
Use StructuredTool.from_function() to create the tool, passing the function and the args_schema. (A combined sketch of both tools follows this lab.)
Test the Tools: Invoke both custom tools directly with sample inputs to ensure they execute correctly and return the expected output.
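A combined sketch of both labs is below; the flight-price function body is a stand-in for a real API call, and the field descriptions are illustrative.
Python
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool, tool

@tool
def get_word_length(word: str) -> int:
    """Returns the number of letters in a word."""
    return len(word)

print(get_word_length.name, get_word_length.description, get_word_length.args)

class FlightSearchInput(BaseModel):
    origin: str = Field(description="IATA code of the departure airport, e.g. 'LHR'")
    destination: str = Field(description="IATA code of the arrival airport")
    departure_date: str = Field(description="A date in YYYY-MM-DD format")

def get_flight_price(origin: str, destination: str, departure_date: str) -> str:
    # Stand-in for a call to a real flight-pricing API
    return f"Cheapest {origin}->{destination} fare on {departure_date}: $123"

flight_tool = StructuredTool.from_function(
    func=get_flight_price,
    name="get_flight_price",
    description="Look up the cheapest flight price for a given route and departure date.",
    args_schema=FlightSearchInput,
)

print(get_word_length.invoke({"word": "LangChain"}))
print(flight_tool.invoke({"origin": "LHR", "destination": "JFK", "departure_date": "2025-07-01"}))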
Day 10: Building Your First Agent
Conceptual Learning
This session combines the tools from the previous days with a reasoning LLM to build a fully functional agent.
The Agent Executor: The AgentExecutor is the runtime that orchestrates the ReAct loop. It is responsible for:
Parsing the LLM's response to identify a tool call.
Executing the specified tool with the provided arguments.
Passing the tool's output (the observation) back to the LLM for the next reasoning step.
Repeating this process until the LLM outputs a final answer instead of a tool call.
Modern Agent Construction: The modern, high-level constructor create_react_agent from the langgraph.prebuilt module is the recommended way to build a ReAct agent. It simplifies the process by taking the model and the list of tools as direct inputs. It can also accept a checkpointer argument to seamlessly add conversational memory.

Observing the Thought Process: Debugging an agent is impossible without seeing its reasoning process. When using older agent types, setting verbose=True in the AgentExecutor prints the thought, action, and observation steps to the console. In modern LangGraph-based agents, this entire process is automatically and beautifully visualized in LangSmith, which is the superior method for debugging.
While powerful, agentic systems can be fragile. An LLM might get stuck in a loop, repeatedly calling the same tool with the same bad arguments. This is why guardrails are essential for production agents. The AgentExecutor includes a max_iterations parameter to prevent infinite loops. It also has parameters like handle_parsing_errors to gracefully manage cases where the LLM produces a malformed tool call. An expert engineer builds not just the agent, but a robust system around it with these safety mechanisms, and potentially a human-in-the-loop approval step for critical actions.
Hands-On Lab
Assemble Tools: Create a Python list containing the tools developed over the past two days: the web search tool and your custom calculator or word length tool.
Instantiate the Agent:
Import create_react_agent from langgraph.prebuilt.
Initialize a tool-calling capable chat model (e.g., ChatOpenAI(model="gpt-4o")).
Call agent_executor = create_react_agent(model, tools).

Define a Complex Query: Formulate a question that requires the agent to use multiple tools in a specific sequence. For example: "Search for the name of the current Chancellor of Germany, and then tell me how many letters are in their first name."

Run the Agent: Invoke the agent_executor with the complex query.
Python
query = "Search for the current Chancellor of Germany and tell me the length of their first name."
result = agent_executor.invoke({"messages": [{"role": "user", "content": query}]})
print(result['messages'][-1].content)
Deep Dive into the LangSmith Trace: This is the most important part of the lab. Open the trace for the agent run.
Follow the execution graph step-by-step.
Observe the first call to the agent node, where the LLM outputs a thought and an action to call the search tool.
See the tool executor node run the search and produce an observation.
Follow the execution back to the agent node, where the LLM now receives the observation and outputs a new thought and an action to call your custom word length tool.
Finally, see the agent produce the Final Answer. This detailed trace makes the entire ReAct loop transparent and debuggable.
Part III: Productionization and Advanced Topics (Days 11-15)
The final week of the curriculum transitions from building applications to ensuring they are robust, reliable, and ready for production. This involves mastering the tools for evaluation, deploying complex architectures with LangGraph, implementing advanced RAG techniques, and deploying the final application as a service.
Day 11: LangSmith - Observability, Debugging, and Evaluation
Conceptual Learning
Complex LLM applications, especially agents, can behave like "black boxes," making them difficult to debug and improve. LangSmith is the solution to this problem, providing the essential tools for observability and evaluation needed for a professional engineering workflow.
Tracing and Debugging: LangSmith automatically captures a detailed trace of every execution of a LangChain Runnable. The trace provides a hierarchical view of the run, allowing an engineer to inspect the exact inputs, outputs, latency, and token usage of every single component in the chain or agent. This granular visibility is indispensable for identifying bottlenecks and debugging failures.

Evaluation: Moving beyond anecdotal testing ("it seems to work") is critical for improving application quality. LangSmith provides a comprehensive evaluation framework:
Datasets: The process starts with creating a dataset, which is a collection of examples, each typically containing an input and a reference (ground truth) output. This serves as a "golden set" for benchmarking.
Evaluators: An evaluator is a function that scores an application's output against the reference output. LangSmith offers built-in evaluators for common tasks (e.g., QA or Correctness evaluators) and supports custom evaluators. A powerful pattern is the "LLM-as-judge," where another LLM is used to score the output based on criteria like helpfulness or style.
Monitoring and Feedback: For applications in production, LangSmith allows for the logging of user feedback programmatically. This feedback can be reviewed and used to curate high-quality examples, which can then be added to evaluation datasets, creating a continuous feedback loop for improving the application over time.
The core of a professional LLM engineering workflow is an iterative, data-driven cycle: build, evaluate, debug, improve, and re-evaluate. LangSmith is the platform that powers this entire loop. By establishing a benchmark with an evaluation dataset, an engineer can make a change (e.g., modify a prompt, swap a model), run a new evaluation, and receive quantitative evidence of whether the change resulted in an improvement or a regression. This transforms the "art" of prompt engineering into a more rigorous, scientific process.
Hands-On Lab
Create an Evaluation Dataset:
Take the RAG application built in Part I.
In the LangSmith UI, create a new dataset.
Manually add 10-15 examples. For each example, provide an input (a question you can ask your RAG app) and a reference output (the ideal, hand-written answer). The same dataset can also be created programmatically, as sketched at the end of this lab.
Run an Evaluation:
Navigate to your RAG application's project in LangSmith.
Select a set of recent runs to evaluate, or run the application on the inputs from your newly created dataset.
From the project view, initiate an evaluation. Select your dataset and choose a built-in evaluator, such as the Correctness evaluator.
Analyze the Results:
LangSmith will run the evaluation and produce a results table, showing the score for each example.
Identify the examples where your application performed poorly.
Click into the trace for a failed run. Use the detailed trace to diagnose the root cause. Was the retrieval step poor? Did the retriever fetch irrelevant documents? Was the final prompt to the LLM flawed? This hands-on debugging process is a critical skill.
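If you prefer to create the dataset in code rather than through the UI, the LangSmith SDK supports it. This is a minimal sketch; the dataset name and the question/answer pair are placeholders, and LANGCHAIN_API_KEY is assumed to be set.
Python
from langsmith import Client

client = Client()  # reads LANGCHAIN_API_KEY from the environment
dataset = client.create_dataset(dataset_name="rag-golden-set")
client.create_example(
    inputs={"question": "What is task decomposition?"},
    outputs={"answer": "Breaking a large task into smaller, manageable sub-tasks."},
    dataset_id=dataset.id,
)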
Day 12: LangGraph - Building Complex, Stateful Agents
Conceptual Learning
While LCEL is excellent for linear or simple branching chains, it becomes cumbersome for applications that require more complex control flow, such as cycles (loops), multi-step reasoning, or robust state management. LangGraph is the purpose-built solution for these scenarios, modeling applications as state machines.
LangGraph Fundamentals:
State: The core of a LangGraph application is its state, which is defined using a Python TypedDict. This object holds all the information that needs to persist and be passed between steps in the graph.

Nodes: A node is a function or a Runnable that represents a unit of work. Each node receives the current state as input and can return an object to update the state.
Human-in-the-Loop: A key feature of LangGraph is the ability to build graphs that can be interrupted. By adding a specific node for interruption, the graph can pause its execution, wait for human input or approval, and then resume. This is critical for building reliable agents that perform sensitive or costly actions.
A common point of confusion for developers is deciding which orchestration strategy to use. The following decision matrix clarifies the ideal use cases for each.

Approach | Best suited for | Limitations
LCEL chains | Linear or simple branching flows (prompt -> model -> parser, basic RAG) with streaming and batching for free | No cycles; state must be managed outside the chain
LangGraph | Stateful, cyclic, or multi-actor workflows; agents with loops; human-in-the-loop interventions | Requires explicitly defining state, nodes, and edges
Prebuilt agents (create_react_agent) | Standard tool-calling ReAct use cases with minimal setup | Less control over the internals of the loop
Hands-On Lab
This lab involves re-implementing the agent from Day 10 using the more explicit and powerful LangGraph framework. This will solidify the understanding of how the ReAct loop is constructed.
Define the State: Create a TypedDict for the agent's state. It should include keys like messages: Annotated[list, add_messages], agent_outcome: Union[AgentAction, AgentFinish, None], and intermediate_steps: Annotated[list[tuple[AgentAction, str]], operator.add].

Define the Nodes:
agent_node: A function that takes the state, calls the core agent Runnable, and returns an update for the agent_outcome key.
tool_node: A function that takes the state, executes the tool call specified in agent_outcome, and returns the result as an update to the intermediate_steps key.

Define the Conditional Edge: Write a function that inspects the agent_outcome in the state. If the outcome is an AgentFinish object, it should return the string "end". If it's an AgentAction, it should return "continue".

Construct the Graph:
Instantiate a StateGraph.
Add the agent_node and tool_node to the graph.
Set the agent_node as the entry point.
Add the conditional edge originating from the agent_node, mapping "continue" to the tool_node and "end" to the special END state.
Add a standard edge from the tool_node back to the agent_node, creating the ReAct cycle.

Compile and Run: Compile the graph into a Runnable and invoke it with the same complex query from the Day 10 lab. The behavior will be identical, but the underlying structure is now more robust, explicit, and extensible. (A simplified sketch of this graph follows.)
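The sketch below implements the same ReAct cycle in a simplified form: it uses a messages-only state with the prebuilt ToolNode and tool-calling, rather than the AgentAction/AgentFinish state described above. It assumes the tools list from Day 10; the model name is illustrative.
Python
from typing import Annotated, TypedDict

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]  # new messages are appended, not overwritten

model = ChatOpenAI(model="gpt-4o").bind_tools(tools)  # `tools` is the Day 10 list

def agent_node(state: AgentState):
    # The LLM sees the running history; its reply may contain tool calls
    return {"messages": [model.invoke(state["messages"])]}

def should_continue(state: AgentState):
    # Route to the tool node if the last AI message requested a tool, otherwise finish
    return "continue" if state["messages"][-1].tool_calls else "end"

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", ToolNode(tools))
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue, {"continue": "tools", "end": END})
graph.add_edge("tools", "agent")  # the ReAct cycle

app = graph.compile()
result = app.invoke({"messages": [HumanMessage(
    "Search for the current Chancellor of Germany and tell me the length of their first name.")]})
print(result["messages"][-1].content)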
Day 13: Advanced RAG Techniques
Conceptual Learning
Revisiting the principle from Day 4—that retrieval quality is the primary driver of RAG performance—this day focuses on advanced strategies for improving the "R" in RAG. Production-grade RAG is rarely a single vector search; it is a multi-stage pipeline designed to handle ambiguity and increase precision.
Advanced Retrievers:
MultiQueryRetriever: This retriever addresses the problem of user query ambiguity. It uses an LLM to generate several different versions of the user's question from various perspectives. It then runs a search for each generated query, collects all the results, and returns the unique set of documents. This broadens the search to catch relevant documents that might have been missed by the original phrasing. (A minimal sketch follows this list.)
SelfQueryRetriever: This powerful retriever enables structured, metadata-based filtering. It uses an LLM to parse a natural language query into a structured query that contains both a semantic query string and a filter to apply to the vector store's metadata. For example, the query "What did papers by Hinton from 2022 say about capsules?" would be parsed into a semantic query for "capsules" and a metadata filter for author='Hinton' and year=2022.
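A minimal sketch of the multi-query pattern, assuming the vectorstore from Day 4 and an llm instance are already defined:
Python
from langchain.retrievers.multi_query import MultiQueryRetriever

# The LLM generates several rephrasings of the question; results are deduplicated
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)
docs = multi_query_retriever.invoke("How do agents break big tasks into steps?")
print(len(docs))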
Contextual Compression and Reranking: Standard retrievers return whole document chunks. These chunks can contain a lot of text that is irrelevant to the specific query, adding noise to the context sent to the LLM. Contextual compression solves this.
A ContextualCompressionRetriever wraps a base retriever. It first fetches a larger number of documents (e.g., 20) and then passes them through a DocumentCompressor.
A common and highly effective compression technique is to use a reranker. A reranker, such as one from Cohere or a cross-encoder model, is a more powerful (but slower) model that re-scores the initial set of documents based on their relevance to the query. The compression retriever then returns only the new top-k documents (e.g., the top 3), resulting in a much smaller, more relevant, and less noisy context for the final generation step.
Hands-On Lab
Setup: Install necessary packages: pip install langchain-cohere. Obtain a Cohere API key for the reranker.

Upgrade the RAG Application: Start with the RAG chain from Day 5.

Implement Reranking:
Import ContextualCompressionRetriever and CohereRerank.
Initialize the reranker: compressor = CohereRerank().
Wrap your base retriever: compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever).

Test and Compare:
Run the same query through both your original retriever and the new compression_retriever.
Print the number of documents returned by each and inspect their content. Observe how the compression_retriever returns fewer, more precisely relevant documents.
Integrate the new compression_retriever into your full RAG chain and compare the quality of the final answers.
Day 14: LangServe - Deploying Your Application
Conceptual Learning
A completed LLM application in a Jupyter notebook is not a product. To be useful, it must be deployed as a scalable and reliable service. LangServe is LangChain's solution for turning any Runnable into a production-ready API with minimal effort.
Key Features of LangServe:
Automatic API Generation: LangServe automatically creates REST API endpoints for your Runnable, including /invoke, /batch, and /stream, directly from the Runnable's methods.

Interactive UI (Playground): When you run a LangServe server, it automatically hosts a web-based playground at /your-endpoint/playground/. This UI allows for interactive testing of your API directly in the browser.

FastAPI Integration: LangServe is built on top of FastAPI, a modern, high-performance Python web framework. This means it is robust and can be integrated into existing FastAPI applications.

Project Structure: A typical LangServe project involves creating a package (a directory with an __init__.py file) and a server.py file inside it. The server.py file is where you define your LangChain application and use the add_routes function from langserve to expose it.
The power of LangServe is the ultimate payoff of the standardized Runnable interface that underpins LCEL and LangGraph. Because every chain and graph conforms to this protocol, LangServe knows exactly how to interact with it and can generate the corresponding API endpoints automatically. This abstracts away a significant amount of boilerplate web development work (defining endpoints, handling serialization, implementing streaming logic), allowing the engineer to focus on the application logic and dramatically accelerating the path from prototype to production.
Hands-On Lab
Install LangServe: pip install "langserve[all]" langchain-cli (the CLI package provides the langchain serve command used below).

Create a Project:
Create a new directory for your project, e.g., my-rag-app.
Inside it, create another directory app. Add an empty __init__.py file to it.
Inside app, create a chain.py file and a server.py file.

Define the Chain: In app/chain.py, place the code for your complete conversational RAG application (from Day 7 or Day 13). Ensure it is defined as a single Runnable variable (e.g., my_chain).

Create the Server: In app/server.py:
Python
from fastapi import FastAPI
from langserve import add_routes
from app.chain import my_chain # Import your chain
app = FastAPI(
title="LangChain Server",
version="1.0",
description="A simple api server using Langchain's Runnable interfaces",
)
# Add the routes for your chain
add_routes(
app,
my_chain,
path="/rag-chatbot",
)
Run the Server: From your top-level project directory (my-rag-app), run the command: langchain serve.

Test the API:
Open your browser and navigate to http://localhost:8000/rag-chatbot/playground/. Use the interactive UI to send requests to your application.
Write a separate Python script using the requests library to programmatically send a POST request to the http://localhost:8000/rag-chatbot/invoke endpoint and print the response. (A minimal client sketch follows.)
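A minimal client sketch for the /invoke endpoint. The exact shape of the "input" payload depends on what your chain expects; a plain question string is assumed here.
Python
import requests

response = requests.post(
    "http://localhost:8000/rag-chatbot/invoke",
    json={"input": "What are the main components of an AI agent system?"},
)
response.raise_for_status()
# LangServe wraps the chain's result under the "output" key
print(response.json()["output"])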
Day 15: Becoming a LangChain Expert
Conceptual Learning
The final day focuses on developing the mindset of an expert engineer. This involves understanding the framework's limitations, adhering to best practices, and knowing how to choose the right architectural pattern for a given problem.
Addressing the Critiques: It is important to acknowledge and understand the common criticisms of LangChain.
"Too much magic/leaky abstractions": This was a valid critique of early versions. However, the modern framework built on the explicit Runnable protocol of LCEL and the state machine paradigm of LangGraph is far more transparent. An expert understands these underlying protocols, which demystifies the framework.
"Inflexible/forces you into a box": LangChain is not an all-or-nothing framework. An expert knows that its value lies in its composability. They can choose to use only a DocumentLoader and an Embedding model and write the rest of the logic themselves, or use a high-level create_react_agent constructor when it fits the use case. Mastery involves knowing when to use the abstractions and when to build from lower-level components.
Best Practices:
Security: Always be mindful of security risks like prompt injection. Sanitize user inputs and be cautious about the tools you give to an agent, especially any that can execute code or interact with sensitive systems.
Asynchronous Programming: For production servers that need to handle many concurrent requests, using the asynchronous methods of the Runnable interface (.ainvoke, .astream, etc.) is critical for performance and scalability. (A short example follows.)
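A short sketch of the async methods, assuming the joke-translation chain from Day 2:
Python
import asyncio

async def main() -> None:
    # abatch runs both inputs concurrently instead of one after the other
    results = await full_chain.abatch([
        {"topic": "cats", "language": "German"},
        {"topic": "dogs", "language": "French"},
    ])
    print(results)

asyncio.run(main())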
The Expert Mindset: A true LangChain expert is not someone who has memorized every function. They are an engineer who deeply understands the fundamental patterns of building LLM applications—like RAG, ReAct, and multi-stage reasoning—and views LangChain as a powerful, high-leverage toolkit for implementing those patterns efficiently and robustly. The framework provides the "LEGO bricks" (loaders, models, tools) and the "instruction manuals" (LCEL, LangGraph). The expert's job is to be the architect who knows how to combine them to build a sound structure.
Capstone Project Outline
To solidify the skills learned over the 15 days, the final task is to design a complex, multi-step agent. A good example is a "Research Assistant Agent" built with LangGraph.
Goal: Take a complex research question (e.g., "What are the latest advancements in constitutional AI and how do they compare to traditional reinforcement learning from human feedback?"), break it down, research each part, and synthesize a final report.
State (TypedDict):
original_question: str
sub_questions: list[str]
research_results: dict[str, str] (mapping sub-questions to their researched answers)
final_report: str

Nodes:
decompose_question: An LLM node that takes the original_question and generates a list of sub_questions.
research_node: A RAG chain (using web search as the retriever) that takes a single sub-question and returns a researched answer.
synthesis_node: A final LLM node that takes all the research_results and generates a comprehensive final_report.
Control Flow (Edges):
Start -> decompose_question.
From decompose_question, a conditional edge checks if there are sub-questions to research. If yes, it proceeds to a loop.
The loop iterates through the sub_questions list, calling the research_node for each one and aggregating the results in the research_results dictionary within the state.
Once all sub-questions are researched, an edge transitions to the synthesis_node.
synthesis_node -> END.

This capstone project requires combining state management, conditional logic, looping (cycles), and multiple LLM calls, representing a true culmination of the skills required to be a world-class LangChain expert engineer.