PolarSPARC

Common LangChain Recipes


Bhaskar S 12/28/2024


Overview

In the article on LangChain, we covered the basics of the LangChain framework and how to get started with it. In this article, we will provide some common code recipes for working with LangChain.


Installation and Setup

The installation and setup will be on an Ubuntu 24.04 LTS based Linux desktop. Ensure that Ollama is installed and set up on the desktop (see instructions).

In addition, ensure that the Python 3.x programming language as well as the Jupyter Notebook package are installed and set up on the desktop.

Finally, ensure that all the LangChain packages are properly set up on the desktop (see instructions).

For the LLM model, we will be using the recently released IBM Granite 3.1 1B model.

Open a new terminal window and execute the following docker command to download the LLM model:


$ docker exec -it ollama ollama run granite3.1-moe:1b


With this, we are ready to showcase the common code recipes using LangChain.


Code Recipes for LangChain

Create a file called .env with the following environment variables defined:


LLM_TEMPERATURE=0.0
OLLAMA_MODEL='granite3.1-moe:1b'
OLLAMA_BASE_URL='http://192.168.1.25:11434'
CHROMA_DB_DIR='/home/bswamina/.chromadb'
GPU_DATASET='./data/gpu_specs.csv'

To load the environment variables and assign them to Python variables, execute the following code snippet:


from dotenv import load_dotenv, find_dotenv

import os

load_dotenv(find_dotenv())

llm_temperature = float(os.getenv('LLM_TEMPERATURE'))
ollama_model = os.getenv('OLLAMA_MODEL')
ollama_base_url = os.getenv('OLLAMA_BASE_URL')
gpu_dataset = os.getenv('GPU_DATASET')
chroma_db_dir = os.getenv('CHROMA_DB_DIR')

Executing the above Python code snippet generates no output.
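Note that float(os.getenv('LLM_TEMPERATURE')) raises a TypeError if the variable is not defined, since os.getenv returns None in that case. The following is a minimal sketch of a more defensive lookup using only the Python standard library. The helper name env_float, the demo variable name LLM_TEMPERATURE_DEMO, and the fallback value are assumptions made for illustration and are not part of the original setup:

```python
import os

def env_float(name: str, default: float) -> float:
    """Return the environment variable `name` as a float, or `default` if unset or empty."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == '':
        return default
    return float(raw)

# Hypothetical demo variable, removed first so the fallback path is exercised
os.environ.pop('LLM_TEMPERATURE_DEMO', None)
missing = env_float('LLM_TEMPERATURE_DEMO', 0.0)

# Once the variable is defined, its value is parsed and returned
os.environ['LLM_TEMPERATURE_DEMO'] = '0.2'
present = env_float('LLM_TEMPERATURE_DEMO', 0.0)

print(missing, present)
```

The same pattern could be applied to the other settings if a missing .env file should not stop the notebook from running.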

To initialize an instance of Ollama running our desired LLM model granite3.1-moe:1b, execute the following code snippet:


from langchain_ollama import OllamaLLM

ollama_llm = OllamaLLM(base_url=ollama_base_url, model=ollama_model, temperature=llm_temperature)

Executing the above Python code snippet generates no output.

To initialize an instance of the vector embedding class corresponding to the model running in Ollama, execute the following code snippet:


from langchain_ollama import OllamaEmbeddings

ollama_embedding = OllamaEmbeddings(base_url=ollama_base_url, model=ollama_model)

Executing the above Python code snippet generates no output.


Recipe 1 : Execute a simple LLM prompt


The following code snippet creates a simple prompt template, an LLM chain, and executes the chain to get a response from the LLM model:


from langchain_core.prompts import PromptTemplate

template = """
Question: {question}

Answer: Summarize in less than {tokens} words.
"""

prompt = PromptTemplate.from_template(template=template)

chain = prompt | ollama_llm

result = chain.invoke({'question': 'describe langchain ai framework', 'tokens': 50})

print(result)

Executing the above Python code snippet generates the following typical output:


Output.1

Langchain is an open-source AI framework that enables developers to build and deploy language models for various applications, including chatbots, content generation, and translation services. It supports multiple languages and offers a modular architecture for customization.

We have successfully demonstrated the Python code snippet for Recipe 1.
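To see roughly what text reaches the LLM in Recipe 1, the placeholder substitution can be approximated with plain Python string formatting. This is only a sketch of the idea, not LangChain's implementation, and the helper name render_prompt is hypothetical:

```python
template = """
Question: {question}

Answer: Summarize in less than {tokens} words.
"""

def render_prompt(template: str, **values) -> str:
    """Fill the {placeholders} in the template with the supplied values."""
    return template.format(**values)

# Same inputs as the chain.invoke() call in Recipe 1
rendered = render_prompt(template, question='describe langchain ai framework', tokens=50)

print(rendered)
```

Printing the rendered text is a quick way to check a prompt before sending it to the model.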


Recipe 2 : Generate Structured Output from LLM


The following code snippet creates an LLM chat instance, a data class that the structured output will conform to, a chain using the data class as the schema, and executes the chain to get a structured response from the LLM model:


from pydantic import BaseModel
from langchain_ollama import ChatOllama

class GpuSpecs(BaseModel):
  name: str
  vram: int
  cuda_cores: int

ollama_struct_llm = ChatOllama(base_url=ollama_base_url, model=ollama_model, format='json', temperature=llm_temperature)

structured_llm = ollama_struct_llm.with_structured_output(GpuSpecs)

result2 = structured_llm.invoke('Get the GPU specs for RTX 4070 Ti')

print(result2)

Executing the above Python code snippet generates the following typical output:


Output.2

name='RTX 4070 Ti' vram=128 cuda_cores=128

We have successfully demonstrated the Python code snippet for Recipe 2.
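Conceptually, structured output means that the JSON text returned by the model is parsed and checked against the schema. The following standard-library sketch illustrates that idea with a dataclass in place of the pydantic model. It is not how LangChain or pydantic implement it, and the names GpuSpecsSketch and parse_gpu_specs are hypothetical:

```python
import json
from dataclasses import dataclass

@dataclass
class GpuSpecsSketch:
    name: str
    vram: int
    cuda_cores: int

def parse_gpu_specs(raw_json: str) -> GpuSpecsSketch:
    """Parse a JSON string and coerce its fields to the schema types.

    Raises KeyError for a missing field and ValueError for a non-numeric value.
    """
    data = json.loads(raw_json)
    return GpuSpecsSketch(name=str(data['name']),
                          vram=int(data['vram']),
                          cuda_cores=int(data['cuda_cores']))

# JSON text shaped like the response behind Output.2
raw = '{"name": "RTX 4070 Ti", "vram": 128, "cuda_cores": 128}'
specs = parse_gpu_specs(raw)

print(specs)
```

Note that a schema only constrains the shape and types of the response. The values in Output.2 conform to the schema, but that alone does not make them factually accurate, so they should be verified against a trusted source.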


Recipe 3 : Chat with LLM preserving History


The following code snippet creates a chat session, an in-memory chat history store, an LLM chain using the history store, and executes the chain to get responses from the LLM model:


from langchain_ollama import ChatOllama
from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

ollama_chat_llm = ChatOllama(base_url=ollama_base_url, model=ollama_model, temperature=llm_temperature)

in_memory_store = {}

def get_chat_history(session_id: str) -> BaseChatMessageHistory:
  if session_id not in in_memory_store:
    in_memory_store[session_id] = InMemoryChatMessageHistory()
  return in_memory_store[session_id]

prompt2 = ChatPromptTemplate.from_messages([
  ('system', 'Helpful AI assistant!'),
  MessagesPlaceholder(variable_name='chat_history'),
  ('human', '{input}')
])

config = {'configurable': {'session_id': 'recipes'}}

chain2 = prompt2 | ollama_chat_llm

chain2_with_history = RunnableWithMessageHistory(chain2, get_chat_history, input_messages_key='input', history_messages_key='chat_history')

result3 = chain2_with_history.invoke({'input': 'Suggest top 3 budget GPUs in one line'}, config=config)

print(result3.content)

print('-------------------------')

result4 = chain2_with_history.invoke({'input': 'Not impressed, try again'}, config=config)

print(result4.content)

Executing the above Python code snippet generates the following typical output:


Output.3

1. NVIDIA GeForce RTX 3060: High-performance GPU for gaming and professional use, with competitive price.
2. AMD Radeon RX 5600 XT: Affordable option offering excellent single-GPU performance.
3. NVIDIA GeForce GTX 1650 Super: Balanced budget choice providing good gaming performance.
-------------------------
1. NVIDIA GeForce RTX 3060 Ti: High-end GPU for demanding tasks and professional use, with excellent price-performance ratio.
2. AMD Radeon RX 5700 XT: Budget-friendly option offering competitive single-GPU performance in gaming and professional applications.
3. NVIDIA GeForce GTX 1660 Super: Balanced budget choice providing good gaming performance at a lower price point.

We have successfully demonstrated the Python code snippet for Recipe 3.
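The history mechanism in Recipe 3 can be pictured as a dictionary keyed by session id, where each call sends the system prompt, all prior turns, and the new human message to the model, and then records both sides of the turn. The following is a standard-library sketch of that flow. The function fake_llm is a stand-in for the chat model and, like chat and history_store, is a hypothetical name used only for illustration:

```python
from collections import defaultdict

# session_id -> list of (role, content) tuples
history_store = defaultdict(list)

def fake_llm(messages):
    """Stand-in for the chat model: reports how many messages it received."""
    return f'(reply based on {len(messages)} messages)'

def chat(session_id: str, user_input: str) -> str:
    history = history_store[session_id]
    # System prompt first, then prior turns, then the new human message
    messages = [('system', 'Helpful AI assistant!')] + history + [('human', user_input)]
    reply = fake_llm(messages)
    # Record both sides of the turn so the next call can see them
    history.append(('human', user_input))
    history.append(('ai', reply))
    return reply

first = chat('recipes', 'Suggest top 3 budget GPUs in one line')
second = chat('recipes', 'Not impressed, try again')

print(first)
print(second)
```

The second call sees four messages instead of two, which is why the model in Recipe 3 can interpret "try again" without the question being repeated.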


Recipe 4 : Q&A on a CSV file content using LLM


For this recipe, we will create a CSV file called gpu_specs.csv that will contain the GPU card specs for a handful of popular GPU cards. The following lists the contents of the CSV file:


manufacturer,productName,memorySize,memoryClock,gpuChip
NVIDIA,GeForce RTX 4060,12,5888,AD104
NVIDIA,GeForce RTX 4070,12,7680,AD104
NVIDIA,GeForce RTX 4080,16,9728,AD103
NVIDIA,GeForce RTX 4090,24,17408,AD102
AMD,Radeon RX 6950 XT,16,5120,Navi 21
AMD,Radeon RX 7700 XT,8,4096,Navi 33
AMD,Radeon RX 7800 XT,12,8192,Navi 32
AMD,Radeon RX 7900 XT,16,12288,Navi 31
NVIDIA,GeForce RTX 3060 Ti,8,4864,GA103S
NVIDIA,GeForce RTX 3070 Ti,8,5632,GA104
NVIDIA,GeForce RTX 3080,12,8960,GA102
NVIDIA,GeForce RTX 3090 Ti,24,10752,GA102
AMD,Radeon RX 6400,4,768,Navi 24
AMD,Radeon RX 6500 XT,4,1024,Navi 24
AMD,Radeon RX 6650 XT,8,2048,Navi 23
AMD,Radeon RX 6750 XT,12,2560,Navi 22
AMD,Radeon RX 6850M XT,12,2560,Navi 22
Intel,Arc A770,16,4096,DG2-512
Intel,Arc A780,16,4096,DG2-512

The following code snippet creates a CSV file loader, loads the rows from the file as documents into an in-memory vector store using the embedding class, creates a prompt template with the data from the CSV file as the context, creates a Q&A LLM chain, and executes the chain to get answers to questions from the LLM model:


from langchain_core.documents import Document
from langchain_community.document_loaders import CSVLoader
from langchain_core.vectorstores import InMemoryVectorStore
from langchain.chains import RetrievalQA

loader = CSVLoader(gpu_dataset, encoding='utf-8')

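# Byte order mark (BOM) that may prefix the first field of the file
encoding = '\ufeff'
newline = '\n'
comma_space = ', '

# Strip the BOM and flatten each row into a single comma-separated line
documents = [Document(metadata=doc.metadata,
                      page_content=doc.page_content.lstrip(encoding)
                                                   .replace(newline, comma_space)) for doc in loader.load()]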

vector_store = InMemoryVectorStore(ollama_embedding)

vector_store.add_documents(documents)

template2 = """
Given the following context, answer the question based only on the provided context.

Context: {context}

Question: {question}
"""

# Use a distinct name so that the prompts from Recipe 1 and Recipe 3 are not reused by mistake
prompt3 = PromptTemplate.from_template(template2)

retriever = vector_store.as_retriever()

qa_chain = RetrievalQA.from_chain_type(llm=ollama_llm, retriever=retriever, chain_type_kwargs={'prompt': prompt3})

result5 = qa_chain.invoke({'query': 'what is the memory size on GeForce RTX 3070 Ti'})

print(result5)

print('-------------------------')

result6 = qa_chain.invoke({'query': 'who is the manufacturer of rx 7800 xt'})

print(result6)

print('-------------------------')

result7 = qa_chain.invoke({'query': 'what is the GPU chip on RTX 3090 Ti'})

print(result7)

Executing the above Python code snippet generates the following typical output:


Output.4

{'query': 'what is the memory size on GeForce RTX 3070 Ti', 'result': 'The memory size on GeForce RTX 3070 Ti is 8GB.'}
-------------------------
{'query': 'who is the manufacturer of rx 7800 xt', 'result': 'The manufacturer of RX 7800 XT is AMD.'}
-------------------------
{'query': 'what is the GPU chip on RTX 3090 Ti', 'result': "The GPU chip on NVIDIA's GeForce RTX 3090 Ti is GA102."}

We have successfully demonstrated the Python code snippet for Recipe 4.
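To make the document preparation step in Recipe 4 more concrete, the following standard-library sketch turns each CSV row into a single line of 'column: value' pairs, which is the shape of text that ends up being embedded. It assumes the loader renders each row as one 'column: value' pair per line before the newlines are replaced. The helper name rows_to_texts is hypothetical and the sample uses two rows from the dataset above:

```python
import csv
import io

csv_text = """manufacturer,productName,memorySize,memoryClock,gpuChip
NVIDIA,GeForce RTX 3070 Ti,8,5632,GA104
AMD,Radeon RX 7800 XT,12,8192,Navi 32
"""

def rows_to_texts(text: str) -> list[str]:
    """Render each CSV row as one line of 'column: value' pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [', '.join(f'{key.strip()}: {value.strip()}' for key, value in row.items())
            for row in reader]

texts = rows_to_texts(csv_text)

for line in texts:
    print(line)
```

Keeping the column names next to the values gives the embedding model and the LLM some context about what each number means.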


Recipe 5 : Q&A on a PDF file content using LLM


For this recipe, we will analyze the Nvidia 3rd Quarter 2024 financial report.

Also, install the additional Python module by executing the following command:

$ pip install pypdf

The following code snippet creates a PDF file loader, loads the page chunks from the PDF file as embedded documents into the persistent vector store Chroma using the embedding class, creates a prompt template with the data from the PDF file as the context, creates a Q&A LLM chain, and executes the chain to get answers to questions from the LLM model:


from langchain_community.document_loaders import PyPDFLoader
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA

nvidia_q3_2024 = './data/NVIDIA-3rd-Qtr.pdf'

pdf_loader = PyPDFLoader(nvidia_q3_2024)

pdf_pages = pdf_loader.load_and_split()

vector_store2 = Chroma(collection_name='pdf_docs', embedding_function=ollama_embedding, persist_directory=chroma_db_dir)

vector_store2.add_documents(pdf_pages)

template3 = """
Given the following context, answer the question based only on the provided context.

Context: {context}

Question: {question}
"""

prompt4 = PromptTemplate.from_template(template3)

retriever2 = vector_store2.as_retriever()

qa_chain2 = RetrievalQA.from_chain_type(llm=ollama_llm, retriever=retriever2, chain_type_kwargs={'prompt': prompt4})

result8 = qa_chain2.invoke({'query': 'what was the revenue in q3 2024'})

print(result8)

print('-------------------------')

result9 = qa_chain2.invoke({'query': 'what were the expenses in q3 2023'})

print(result9)

Executing the above Python code snippet generates the following typical output:


Output.5

{'query': 'what was the revenue in q3 2024', 'result': 'The revenue for Q3 2024 is $17,475 million.'}
-------------------------
{'query': 'what were the expenses in q3 2023', 'result': "In Q3 FY23, NVIDIA's operating expenses increased by 16% compared to Q2 FY23. The specific increase was $298 million from $2,576 million to $2,874 million. This growth in expenses is part of the company's strategy to support its growth engines such as GPUs, CPUs, networking, AI foundry services, and NVIDIA AI Enterprise software."}

We have successfully demonstrated the Python code snippet for Recipe 5.
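Both Recipe 4 and Recipe 5 rely on a retriever that picks the stored documents whose embedding vectors are closest to the embedding of the question. The following is a toy sketch of that ranking step using cosine similarity and hand-made three-dimensional vectors. Real embeddings have far more dimensions, the distance metric actually used depends on the vector store and its configuration, and the names cosine_similarity, top_k, and toy_vectors are hypothetical:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Made-up vectors standing in for embedded PDF pages
toy_vectors = {
    'page-1': [1.0, 0.0, 0.0],
    'page-2': [0.8, 0.6, 0.0],
    'page-3': [0.0, 0.0, 1.0],
}

best = top_k([1.0, 0.1, 0.0], toy_vectors, k=2)

print(best)
```

Only the selected documents are placed into the {context} slot of the prompt, so the quality of the answer depends heavily on how well the embedding model separates relevant from irrelevant text.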


References

Quick Primer on LangChain

LangChain Documentation


