PolarSPARC

Quick Primer on Ollama


Bhaskar S 07/04/2024


Overview

Ollama is a powerful open source platform that simplifies the process of running various Large Language Models (or LLMs for short) on a local machine. It enables one to download various pre-trained LLM models, such as Meta Llama-3, Microsoft Phi-3, and Google Gemma-2, and run them locally.

In addition, the Ollama platform exposes a local API endpoint, which enables developers to build AI applications/workflows that interact with the locally running LLMs through that endpoint.

Last but not least, the Ollama platform effectively leverages the underlying hardware resources of the local machine, such as the CPU(s) and GPU(s), to run the LLMs efficiently for better performance.

In this primer, we will demonstrate how one can effectively set up and run the Ollama platform using its Docker image.


Installation and Setup

The installation and setup will be on an Ubuntu 22.04 LTS based Linux desktop. Ensure that Docker is installed and set up on the desktop (see instructions).

Also, ensure that Python 3.x as well as the Jupyter Notebook packages are installed. In addition, ensure that the command-line utilities curl and jq are installed on the Linux desktop.

We will set up the two required directories by executing the following command in a terminal window:


$ mkdir $HOME/.ollama $HOME/.open-webui

To pull and download the docker image for Ollama, execute the following command in a terminal window:


$ docker pull ollama/ollama:0.1.48

The following should be the typical output:


Output.1

0.1.48: Pulling from ollama/ollama
7646c8da3324: Pull complete 
d1060ab4fb75: Pull complete 
e58f7d737fbb: Pull complete 
Digest: sha256:4a3c5b5261f325580d7f4f6440e5094d807784f0513439dcabfda9c2bdf4191e
Status: Downloaded newer image for ollama/ollama:0.1.48
docker.io/ollama/ollama:0.1.48

Next, to pull and download the docker image for open-webui, execute the following command in a terminal window:


$ docker pull ghcr.io/open-webui/open-webui:0.3.7

The following should be the typical output:


Output.2

0.3.7: Pulling from open-webui/open-webui
2cc3ae149d28: Pull complete 
dc57dfa1396c: Pull complete 
b275de30f399: Pull complete 
0ea58f563222: Pull complete 
251072225b40: Pull complete 
130662f3df11: Pull complete 
4f4fb700ef54: Pull complete 
de53a1836181: Pull complete 
d28e6308a168: Pull complete 
e2c345686679: Pull complete 
79a5f49fad7c: Pull complete 
7fda7d409e89: Pull complete 
3d58b296e488: Pull complete 
5825bf31e383: Pull complete 
23ecdec4be2a: Pull complete 
7a69894d20be: Pull complete 
Digest: sha256:0a424a7ab62cb8a7b7c9bcc0978f9a29f4ba94c30e8ced702e0aea176111d334
Status: Downloaded newer image for ghcr.io/open-webui/open-webui:0.3.7
ghcr.io/open-webui/open-webui:0.3.7

To install the necessary Python packages, execute the following command:


$ pip install ollama


This completes all the system installation and setup for the Ollama hands-on demonstration.


Hands-on with Ollama

Assuming that the IP address of the desktop is 192.168.1.25, start the Ollama platform by executing the following command in a terminal window:


$ docker run --rm --name ollama --network="host" -p 192.168.1.25:11434:11434 -v $HOME/.ollama:/root/.ollama ollama/ollama:0.1.48


The following should be the typical output:


Output.3

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFWDUmhQHV9dYigEEZ8EMo5sCaBiAwdwXmWOg50f2DTF

2024/07/03 23:41:49 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-03T23:41:49.407Z level=INFO source=images.go:730 msg="total blobs: 0"
time=2024-07-03T23:41:49.407Z level=INFO source=images.go:737 msg="total unused blobs removed: 0"
time=2024-07-03T23:41:49.407Z level=INFO source=routes.go:1111 msg="Listening on [::]:11434 (version 0.1.48)"
time=2024-07-03T23:41:49.408Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama438152879/runners
time=2024-07-03T23:41:52.664Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60101 cpu]"
time=2024-07-03T23:41:52.666Z level=INFO source=types.go:98 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="62.7 GiB" available="59.9 GiB"

Notice that we are using the host networking option with Docker for a reason that will become evident very soon.

If the Linux desktop has an NVIDIA GPU with a decent amount of VRAM and has been enabled for use with Docker (see instructions), then execute the following command instead to start Ollama:


$ docker run --rm --name ollama --gpus=all --network="host" -p 192.168.1.25:11434:11434 -v $HOME/.ollama:/root/.ollama ollama/ollama:0.1.48


For the hands-on demonstration, we will download and use the Microsoft Phi-3 Mini pre-trained LLM model.

Open a new terminal window and execute the following docker command to download the LLM model:


$ docker exec -it ollama ollama run phi3:mini


The following should be the typical output:


Output.4

pulling manifest 
pulling 3e38718d00bb... 100% ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 2.2 GB                         
pulling fa8235e5b48f... 100% ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 1.1 KB                         
pulling 542b217f179c... 100% |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||  148 B                         
pulling 8dde1baf1db0... 100% |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||   78 B                         
pulling ed7ab7698fdd... 100% |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success

After the pre-trained LLM model is downloaded successfully, the command will wait for user input.

To test the just-downloaded LLM model, enter the following user prompt:


>>> describe a gpu in less than 50 words


The following should be the typical output:


Output.5

A GPU, or Graphics Processing Unit, is a specialized electronic circuit designed to handle intensive graphical computations 
and rendering by efficiently processing large blocks of data simultaneously. It's essential for accelerating the creation of 
images, animations, and visual effects on computers.

Now that I have provided you with an explanation in less than 50 words: A GPU is a dedicated circuit designed to rapidly 
compute graphics tasks through parallelized calculations, crucial for rendering high-quality visual content like games or 
videos efficiently.

To exit the interactive session, enter the following command at the prompt:


>>> /bye


Now, shifting gears to test the local API endpoint, open a new terminal window and execute the following command to list all the LLM models that are hosted in the running Ollama platform:


$ curl http://192.168.1.25:11434/api/tags | jq


The following should be the typical output:


Output.6

{
  "models": [
    {
      "name": "phi3:mini",
      "model": "phi3:mini",
      "modified_at": "2024-07-03T23:44:37.053278672Z",
      "size": 2176178401,
      "digest": "d184c916657ef4eaff1908b1955043cec01e7aafd2cef8a5bbfd405a7d35d1fb",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "phi3",
        "families": [
          "phi3"
        ],
        "parameter_size": "3.8B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}

From Output.6 above, it is evident that we only have one LLM model - phi3:mini!

Next, to send a user prompt to the LLM model for a response, execute the following command:


$ curl http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "describe a gpu in less than 50 words",
  "stream": false
}' | jq

The following should be the typical output:


Output.7

{
  "model": "phi3:mini",
  "created_at": "2024-07-04T00:30:10.472200226Z",
  "response": " A GPU (Graphics Processing Unit) is specialized hardware designed to accelerate the rendering of images, videos, and animations. It excels at parallel processing tasks essential for graphics-intensive applications like video games or CAD software but can also handle general computing workloads when optimized with frameworks such as OpenCL or CUDA.",
  "done": true,
  "done_reason": "stop",
  "context": [
    32010, 8453, 263, 330, 3746, 297, 3109, 393, 29871, 29945, 29900, 3838, 32007, 32001, 319, 22796, 313, 17290, 10554, 292, 13223, 29897, 338, 4266, 1891, 12837, 8688, 304, 15592, 403, 278, 15061, 310, 4558, 29892, 19707, 29892, 322, 3778, 800, 29889, 739, 5566, 1379, 472, 8943, 9068, 9595, 18853, 363, 18533, 29899, 524, 6270, 8324, 763, 4863, 8090, 470, 315, 3035, 7047, 541, 508, 884, 4386, 2498, 20602, 664, 18132, 746, 27545, 411, 29143, 1316, 408, 4673, 6154, 470, 315, 29965, 7698, 29889, 32007
  ],
  "total_duration": 5252315330,
  "load_duration": 762191388,
  "prompt_eval_count": 14,
  "prompt_eval_duration": 261804000,
  "eval_count": 70,
  "eval_duration": 4184915000
}

VOILA - we have successfully tested the local API endpoints!
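
Note that the curl request above set "stream" to false so that the entire response is returned as a single JSON object. The /api/generate endpoint can also stream the response token by token as newline-delimited JSON objects. The following is a minimal Python sketch of consuming the streaming endpoint; it assumes the requests package is installed (pip install requests) and uses the same IP address 192.168.1.25 as above:


import json

import requests

# Send the prompt with "stream" set to true so that Ollama returns a sequence
# of newline-delimited JSON objects, each carrying a fragment of the response
with requests.post(
    'http://192.168.1.25:11434/api/generate',
    json={'model': 'phi3:mini',
          'prompt': 'describe a gpu in less than 50 words',
          'stream': True},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk contains a 'response' fragment; 'done' marks the last chunk
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            print()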

Now we will shift gears to get our hands dirty with Open WebUI, which is another open source project that provides an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and other OpenAI-compatible APIs.
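
As an aside on OpenAI-compatible APIs: recent versions of Ollama also expose an OpenAI-compatible endpoint under /v1, so tools and libraries written against the OpenAI API can talk to the local LLM directly. The following is a minimal, hedged Python sketch using the official openai package (an additional install, pip install openai); it assumes the Ollama version in use supports this compatibility layer, and the api_key value is just a required placeholder:


from openai import OpenAI

# Point the OpenAI client at the local Ollama instance; the api_key is a
# placeholder required by the client and is not validated by Ollama
openai_client = OpenAI(base_url='http://192.168.1.25:11434/v1', api_key='ollama')

completion = openai_client.chat.completions.create(
    model='phi3:mini',
    messages=[{'role': 'user', 'content': 'describe a gpu in less than 50 words'}],
)

print(completion.choices[0].message.content)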

To start the Open WebUI platform, execute the following command in a new terminal window:


$ docker run --rm --name open-webui --network="host" --add-host=host.docker.internal:host-gateway -v $HOME/.open-webui:/app/backend/data -e OLLAMA_API_BASE_URL=http://192.168.1.25:11434/api ghcr.io/open-webui/open-webui:0.3.7


The following should be the typical output:


Output.8

Loading WEBUI_SECRET_KEY from file, not provided as an environment variable.
Generating WEBUI_SECRET_KEY
Loading WEBUI_SECRET_KEY from .webui_secret_key
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
/app

  ___                    __        __   _     _   _ ___ 
 / _ \ _ __   ___ _ __   \ \      / /__| |__ | | | |_ _|
| | | | '_ \ / _ \ '_ \   \ \ /\ / / _ \ '_ \| | | || | 
| |_| | |_) |  __/ | | |   \ V  V /  __/ |_) | |_| || | 
 \___/| .__/ \___|_| |_|    \_/\_/ \___|_.__/ \___/|___|
      |_|                                               

      
v0.3.7 - building the best open-source AI user interface.

https://github.com/open-webui/open-webui

INFO:apps.openai.main:get_all_models()
INFO:apps.ollama.main:get_all_models()
INFO:     127.0.0.1:46718 - "GET /health HTTP/1.1" 200 OK


!!! ATTENTION !!!

The docker command option --add-host=host.docker.internal:host-gateway, along with the "host" network option, is very IMPORTANT as it enables the container to connect to services running on the host.

Open a web browser and enter the URL http://192.168.1.25:8080.

The following illustration depicts the browser prompting the user to authenticate or register:


Login or Register
Figure.1

Click on the Sign up link and register as user alice as shown in the following illustration:


Register User
Figure.2

Click on the Create Account button and the user alice is logged in and presented with a screen as shown in the following illustration:


Logged In User
Figure.3

Click on the Select a model drop-down and select the LLM model to interact with, as shown in the following illustration:


Model Selection
Figure.4

Once we have the LLM model selected, enter a user prompt in the bottom textbox and click on the Up Arrow as shown in the following illustration:


Enter Prompt
Figure.5

The LLM model will respond with text corresponding to the user prompt, as shown in the following illustration:


LLM Response
Figure.6

Finally, we will test Ollama using Python code snippets.

To list all the LLM models that are hosted in the running Ollama platform, execute the following code snippet:


from ollama import Client

client = Client(host='http://192.168.1.25:11434')

client.list()

The following should be the typical output:


Output.9

{'models': [{'name': 'phi3:mini',
  'model': 'phi3:mini',
  'modified_at': '2024-07-03T23:44:37.053278672Z',
  'size': 2176178401,
  'digest': 'd184c916657ef4eaff1908b1955043cec01e7aafd2cef8a5bbfd405a7d35d1fb',
  'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'phi3',
    'families': ['phi3'],
    'parameter_size': '3.8B',
    'quantization_level': 'Q4_0'}}]}

To send a user prompt to the LLM model running on the Ollama platform, execute the following code snippet:


client.chat(model='phi3:mini', messages=[{'role': 'user', 'content': 'Describe ollama in less than 50 words'}])

The following should be the typical output:


Output.10

{'model': 'phi3:mini',
  'created_at': '2024-07-03T23:54:24.849518978Z',
  'message': {'role': 'assistant',
    'content': " Ollama is a multi-lingual chatbot and language learning companion that provides interactive, real-time conversation practice with users worldwide. It's accessible online at any time for free education on various languages without traditional classroom constraints."},
  'done_reason': 'stop',
  'done': True,
  'total_duration': 4246531629,
  'load_duration': 765618210,
  'prompt_eval_count': 15,
  'prompt_eval_duration': 284682000,
  'eval_count': 50,
  'eval_duration': 3151915000}
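
In addition, the ollama Python client can stream the response incrementally instead of waiting for the complete message. The following is a minimal sketch that reuses the client object created above; it assumes the stream=True option of the ollama package, which makes chat() return a generator of partial responses:


# Stream the response from the LLM model chunk by chunk
stream = client.chat(
    model='phi3:mini',
    messages=[{'role': 'user', 'content': 'Describe ollama in less than 50 words'}],
    stream=True,
)

# Each chunk carries a fragment of the assistant message content
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
print()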

This concludes the various demonstrations of using the Ollama platform for running and working with pre-trained LLM models locally!


References

Ollama

Ollama API

Open WebUI



© PolarSPARC