Run Ollama on Intel Arc GPU (IPEX)

As of the time of writing, Ollama does not officially support Intel Arc GPUs in its releases. However, Intel provides a Docker image that includes a version of Ollama compiled with Arc GPU support enabled. This guide walks you through setting up and running Ollama on your Intel Arc GPU using the IPEX-LLM Docker image (IPEX is the Intel Extension for PyTorch; IPEX-LLM is Intel's LLM acceleration library built on top of it).



Prerequisites

Before proceeding, ensure you have the following installed and properly configured:

  • Docker Desktop
  • Intel Arc GPU drivers

Links to the installation guides for Docker and the Arc drivers are provided at the end of this article. Be sure to follow the appropriate guide for your operating system.



Set Up Ollama Container

  1. Pull the Intel Analytics IPEX Image:

Pull the Intel Analytics IPEX image from Docker Hub:

   docker pull intelanalytics/ipex-llm-inference-cpp-xpu:latest
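Optionally, you can confirm the image downloaded correctly before moving on:

   # Lists the locally available tags for the IPEX-LLM image
   docker images intelanalytics/ipex-llm-inference-cpp-xpu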

  2. Start the Container with Ollama Serve:

Because the Docker command to start the container is quite long, it’s convenient to save it to a script for easy adjustment and restarting.

Mac and Linux users: Create a file named start-ipex-llm.sh in your home directory and add the following content:

   #!/bin/bash

   docker run -d --restart=always \
       --net=bridge \
       --device=/dev/dri \
       -p 11434:11434 \
       -v ~/.ollama/models:/root/.ollama/models \
       -e PATH=/llm/ollama:$PATH \
       -e OLLAMA_HOST=0.0.0.0 \
       -e no_proxy=localhost,127.0.0.1 \
       -e ZES_ENABLE_SYSMAN=1 \
       -e OLLAMA_INTEL_GPU=true \
       -e ONEAPI_DEVICE_SELECTOR=level_zero:0 \
       -e DEVICE=Arc \
       --shm-size="16g" \
       --memory="32G" \
       --name=ipex-llm \
       intelanalytics/ipex-llm-inference-cpp-xpu:latest \
       bash -c "cd /llm/scripts/ && source ipex-llm-init --gpu --device Arc && bash start-ollama.sh && tail -f /llm/ollama/ollama.log"

Once you have the script saved, make it executable (for Mac and Linux users) and run it:

   chmod +x ~/start-ipex-llm.sh
   ~/start-ipex-llm.sh

Windows users: Create a file named start-ipex-llm.bat and adjust the above command for the Windows terminal. Make sure to modify paths and syntax accordingly.
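As a rough, untested sketch, a start-ipex-llm.bat adapted from the Linux script might look like the following. Note that cmd uses ^ instead of \ for line continuation and %USERPROFILE% in place of ~; the PATH variable from the Linux script is omitted here because %PATH% in a .bat expands to the Windows host PATH, not the container's. GPU passthrough under Docker Desktop/WSL2 may also require additional setup.

   @echo off
   REM Untested sketch of a Windows adaptation of the Linux start script above.
   docker run -d --restart=always ^
       --net=bridge ^
       --device=/dev/dri ^
       -p 11434:11434 ^
       -v %USERPROFILE%\.ollama\models:/root/.ollama/models ^
       -e OLLAMA_HOST=0.0.0.0 ^
       -e no_proxy=localhost,127.0.0.1 ^
       -e ZES_ENABLE_SYSMAN=1 ^
       -e OLLAMA_INTEL_GPU=true ^
       -e ONEAPI_DEVICE_SELECTOR=level_zero:0 ^
       -e DEVICE=Arc ^
       --shm-size="16g" ^
       --memory="32G" ^
       --name=ipex-llm ^
       intelanalytics/ipex-llm-inference-cpp-xpu:latest ^
       bash -c "cd /llm/scripts/ && source ipex-llm-init --gpu --device Arc && bash start-ollama.sh && tail -f /llm/ollama/ollama.log"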

Explanation of Flags:

  • --restart=always: Ensures the container restarts automatically if it stops.
  • --device=/dev/dri: Grants the container access to the GPU device.
  • --net=bridge: Uses the bridge networking driver.
  • -p 11434:11434: Maps port 11434 of the container to port 11434 on the host.
  • -e OLLAMA_HOST=0.0.0.0: Sets the host IP address for Ollama to listen on all interfaces. This allows other systems to call the Ollama API.
  • -e no_proxy=localhost,127.0.0.1: Prevents Docker from using the proxy server when connecting to localhost or 127.0.0.1.
  • -e ONEAPI_DEVICE_SELECTOR=level_zero:0: Tells Ollama which GPU device to use. This may need to be adjusted if you have an iGPU installed on your system.
  • -e PATH=/llm/ollama:$PATH: Adds the Ollama binary directory (/llm/ollama) to the PATH inside the container, so ollama commands can be run easily via docker exec.
  • -v ~/.ollama/models:/root/.ollama/models: Mounts the host’s Ollama models directory into the container. This allows downloaded models to be persisted when the container restarts.
  • --shm-size="16g": Sets the shared memory size to 16 GB. This setting may need to be adjusted for your system. See the Docker documentation for more information on shared memory.
  • --memory="32G": Limits the container’s memory usage to 32 GB. This setting may need to be adjusted for your system. See the Docker documentation for more information on memory usage.
  • --name=ipex-llm: Names the container ipex-llm. This name is used to reference the container in other commands.
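After running the script, a quick sanity check (assuming the container name and port used above) is to confirm the container is up and Ollama is answering on port 11434:

   # Confirm the container is running
   docker ps --filter name=ipex-llm

   # Tail the most recent log lines from the Ollama server
   docker logs ipex-llm --tail 20

   # The Ollama server should respond on the mapped port
   curl http://localhost:11434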
  3. Download a Model:

Once the container is up, you can pull a model from the Ollama library. Replace <MODEL_ID> with the specific model ID you wish to download (e.g., qwen2.5-coder:0.5b, llama3.2).

   docker exec ipex-llm ollama pull <MODEL_ID>

You can browse the Ollama model library for more options.



Using Ollama

With your desired model(s) downloaded, you can interact with them directly using the Ollama CLI, make API calls, or integrate with various tools. Below are some ways to get started.



Using the Ollama CLI

The Ollama CLI allows you to interact with models directly from your terminal. Any ollama command that you would typically run locally can now be executed within your container by prefixing the command with docker exec -it ipex-llm.

For example, to interact with the model you downloaded earlier:

docker exec -it ipex-llm ollama run <MODEL_ID>

Check the Ollama CLI Reference for more information about available commands.
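For example, a couple of other commonly used commands, run the same way:

# List the models that have been downloaded into the container
docker exec -it ipex-llm ollama list

# Show which models are currently loaded into memory
docker exec -it ipex-llm ollama ps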



Making API Calls

You can make API requests to the Ollama model endpoint:

curl http://localhost:11434/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
           "model": "<MODEL_ID>",
           "prompt": "Write a JavaScript function that takes an array of numbers and returns the sum of all elements in the array."
         }'
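Ollama also exposes an OpenAI-compatible chat endpoint. As a rough example, a chat-style request looks like this (replace <MODEL_ID> as before):

curl http://localhost:11434/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
           "model": "<MODEL_ID>",
           "messages": [
             {"role": "user", "content": "Explain JavaScript closures in one paragraph."}
           ]
         }'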



Additional Tools

Once Ollama is running, you can leverage it with a variety of AI tools. Here are a few of my favorites:

  • Open WebUI: A user-friendly interface for interacting with AI models, offering many features similar to ChatGPT (see the example after this list).

  • Continue.dev: An extension for VSCode and JetBrains that provides “Co-pilot” capabilities.

  • Aider: One of the first AI coding assistants, and still one of the best.

  • CrewAI: An easy-to-use AI agent framework that works with Ollama models to run agents locally.
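As an example, Open WebUI can be pointed at this Ollama container with a command roughly like the one below. Check the Open WebUI documentation for the currently recommended command; the OLLAMA_BASE_URL value here assumes Open WebUI runs in its own container and reaches Ollama through host.docker.internal:

docker run -d -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
    -v open-webui:/app/backend/data \
    --name open-webui --restart always \
    ghcr.io/open-webui/open-webui:main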

Feel free to suggest others that should be added to this list.



Troubleshooting

If you encounter issues, consider the following steps:

  1. Verify GPU Access:

Use sycl-ls within the container to check if the Arc GPU is recognized.

To start an interactive shell within the container:

   docker exec -it ipex-llm /bin/bash

Then run:

   sycl-ls
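Alternatively, you can run the same check in one step from the host:

   docker exec -it ipex-llm sycl-ls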

You can find helpful tips here.

  2. Check Ollama Logs:

Monitor the logs for any errors:

   docker logs ipex-llm -f
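If the log is noisy, you can filter it for likely problem areas (the search terms here are just suggestions):

   docker logs ipex-llm 2>&1 | grep -iE "error|sycl|level_zero"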

  3. Update Docker and Drivers:

Ensure that both Docker and your GPU drivers are up to date.

  4. Consult Community Resources:

Refer to Intel’s GitHub repositories and community forums for additional support.



Conclusion

Running Ollama on your Intel Arc GPU is straightforward once you have the proper drivers installed and Docker running. With your system set up, it’s as simple as running any other Docker container with a few extra arguments.

Keep an eye on the Ollama GitHub repository for updates, and consider supporting the open pull requests that aim to bring Intel Arc support to the official Ollama builds.



Additional Resources

Below is a running list of related links. Please feel free to comment with others that you think should be added to the list.

  • Intel Arc Driver Installation:

  • Docker Installation Guides:



Ollama Links


