PolarSPARC

Quick Primer on Running GGUF models on Ollama


Bhaskar S 01/04/2025


GPT-Generated Unified Format (or GGUF for short) is a binary file format for the efficient storage, distribution, and deployment of large language models (LLMs).

In the article on Ollama, we demonstrated how one can deploy LLM models on a local desktop.

In this article, we will demonstrate how one can deploy and use any LLM model in the GGUF format using Ollama.

The installation and setup will be on an Ubuntu 24.04 LTS based Linux desktop. Ensure that Docker is installed and set up on the desktop (see instructions).

Also, ensure the command-line utility curl is installed on the Linux desktop.
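
For illustration, the following is how one could use curl to pre-download a GGUF model file from HuggingFace into a local models directory (the repository path and file name below are placeholders - substitute those of the actual model one intends to use):


$ mkdir -p $HOME/models

$ curl -L -o $HOME/models/deepseek-v3-Q8_0.gguf https://huggingface.co/<REPO-OWNER>/<REPO-NAME>/resolve/main/<FILE-NAME>.gguf
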

The following are the steps one can follow to download, deploy, and use a GGUF model in Ollama:

Open two Terminal windows (referred to as term-1 and term-2).
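
If the Ollama docker container is not already running, one could start it in term-1 and then register the downloaded GGUF model file with Ollama using a simple Modelfile. The following is a minimal sketch - it assumes the GGUF file was saved under $HOME/models as shown earlier and that the NVIDIA container toolkit is set up (drop the --gpus=all option on a CPU-only desktop):


$ docker run -d --gpus=all -v ollama:/root/.ollama -v $HOME/models:/models -p 11434:11434 --name ollama ollama/ollama

$ echo 'FROM /models/deepseek-v3-Q8_0.gguf' > $HOME/models/Modelfile

$ docker exec -it ollama ollama create deepseek-v3-Q8_0 -f /models/Modelfile
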

To test the 8-bit quantized DeepSeek-v3 model, execute the following docker command in term-2:


$ docker exec -it ollama ollama run deepseek-v3-Q8_0:latest


After the LLM model is loaded (which will take a few seconds the first time), the command will wait for user input at the prompt >>> Send a message (/? for help).

To test the just-loaded LLM model, enter the following user prompt:


>>> assuming ollama llm chat, generate python code using langchain to chat with the llm with message history


Using DeepSeek-v3, the following would be the typical response:


Output.4

```python
from langchain.chat_models import OLLAMA_LLM
from langchain.schema import (
    AIMessage, 
    HumanMessage, 
    SystemMessage
)

class LLMWithHistory:
    
    def __init__(self):
        self.llm = OLLAMA_LLM()
        self.messages = []
        
    def add_message(self, message):
        if isinstance(message, str):
            message = HumanMessage(content=message)
        self.messages.append(message)
    
    def get_response(self):
        ai_message = AIMessage(content=str(self.llm([msg.content for msg in self.messages], return_generated=True)))
        self.add_message(ai_message)
        return ai_message
        
chatbot = LLMWithHistory()
chatbot.add_message("Hello, how are you?")
print(chatbot.get_response().content) # "Hello!"
```

Interesting - there is no class called OLLAMA_LLM in LangChain !!!
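
For reference, the following is a minimal working sketch of the same idea using the ChatOllama class from the langchain-ollama package (assuming the package is installed and that the model name matches the one created earlier):


from langchain_ollama import ChatOllama
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# Chat model backed by the local Ollama instance
llm = ChatOllama(model="deepseek-v3-Q8_0", temperature=0.7)

# The running message history, seeded with a system message
messages = [SystemMessage(content="You are a helpful assistant.")]

def chat(user_input):
    # Record the user's message, invoke the model with the full
    # history, and record the model's reply in the history
    messages.append(HumanMessage(content=user_input))
    response = llm.invoke(messages)
    messages.append(AIMessage(content=response.content))
    return response.content

print(chat("Hello, how are you?"))
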

Using ChatGPT, the following was the generated response:


Output.5

from langchain.chat_models import ChatOpenAI
from langchain.schema import AIMessage, HumanMessage, SystemMessage

# Initialize the chat model
# Replace 'YOUR_API_KEY' with your actual API key or authentication method
llm = ChatOpenAI(model="ollama-llm", temperature=0.7)

# Initialize the message history
message_history = [
    SystemMessage(content="You are a helpful assistant."),
]

def chat_with_llm(user_input):
    global message_history

    # Add the user's message to the history
    message_history.append(HumanMessage(content=user_input))

    # Generate a response from the LLM
    response = llm(message_history)

    # Add the AI's response to the history
    message_history.append(AIMessage(content=response.content))

    # Return the AI's response
    return response.content

# Example usage
if __name__ == "__main__":
    print("Chat with the LLM. Type 'exit' to end the conversation.")
    while True:
        user_input = input("You: ")
        if user_input.lower() == "exit":
            print("Ending the chat. Goodbye!")
            break

        ai_response = chat_with_llm(user_input)
        print(f"AI: {ai_response}")

Interesting - the initialization of the class ChatOpenAI is not correct - as written, it targets the OpenAI API rather than the local Ollama instance !!!
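
One possible fix is to point ChatOpenAI at Ollama's OpenAI-compatible endpoint instead (a sketch assuming the langchain-openai package and the default Ollama port of 11434):


from langchain_openai import ChatOpenAI

# Point the client at the local Ollama endpoint; the api_key is
# required by the client, but can be any non-empty string locally
llm = ChatOpenAI(model="deepseek-v3-Q8_0",
                 temperature=0.7,
                 base_url="http://localhost:11434/v1",
                 api_key="ollama")
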

On to the next test: solving a calculus derivative problem using the following prompt:


>>> find the derivative of y = (x + 1) * sqrt(x)


Using DeepSeek-v3, the following would be the typical response:


[Image: DeepSeek-v3 response to the derivative prompt]

Hmm - the answer is WRONG !!!

Using ChatGPT, the following was the generated response:


[Image: ChatGPT response to the derivative prompt]

Good - the answer is CORRECT !!!
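
For reference, the correct derivative can be worked out by expanding the expression first:

y = (x + 1)\sqrt{x} = x^{3/2} + x^{1/2}

\frac{dy}{dx} = \frac{3}{2}x^{1/2} + \frac{1}{2}x^{-1/2} = \frac{3x + 1}{2\sqrt{x}}
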

To exit the interactive prompt, enter the following command:


>>> /bye


With this, we conclude this article on downloading, deploying, and using LLM models in the GGUF format !!!


References

GGUF Format

GGUF Models on HuggingFace



© PolarSPARC