
How to force locally run Ollama AI models to use all your CPU or GPU cores


While experimenting with different Ollama-sourced AI models, I discovered that my CPU or GPU resources were sometimes used surprisingly inefficiently.

Looking at the way Ollama models are packaged, we see that they are very similar to Docker images, where a full environment is specified.
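For instance, on a default Linux install the model layers are stored as content-addressed blobs, much like Docker image layers (the exact path may differ on your system):

# list the content-addressed blobs backing the installed models
ls /usr/share/ollama/.ollama/models/blobs/ | head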

For example, we can dump the Modelfile for qwen2.5-coder:

✦ ❯ ollama show qwen2.5-coder:32b --modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM qwen2.5-coder:32b

FROM /usr/share/ollama/.ollama/models/blobs/sha256-ac3d1ba8aa77755dab3806d9024e9c385ea0d5b412d6bdf9157f8a4a7e9fc0d9
TEMPLATE """{{- if .Suffix }}<|fim_prefix|>{{ .Prompt }}<|fim_suffix|>{{ .Suffix }}<|fim_middle|>
{{- else if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}"""
SYSTEM You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
LICENSE """
                                 Apache License
                           Version 2.0, January 2004
...

As you can see, we can very easily extend this model: dump the above Modelfile into our own file and change the FROM directive.

We can now create a new model by defining a file custom-qwen2.5-coder, into which we dump the above Modelfile:

ollama show qwen2.5-coder:32b --modelfile > custom-qwen2.5-coder

Then edit it to look like this:

...
FROM qwen2.5-coder:32b
...
SYSTEM You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
PARAMETER num_gpu 0
PARAMETER num_thread 18

LICENSE """
                                 Apache License
...

Note the new PARAMETER directives, which configure how the model will be run. Ollama's Modelfile documentation covers the full list of available parameters.
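To give a flavor, here are a few commonly used directives; the values below are illustrative, not recommendations:

PARAMETER num_ctx 8192       # context window size, in tokens
PARAMETER temperature 0.7    # sampling temperature
PARAMETER num_gpu 0          # number of model layers offloaded to the GPU
PARAMETER num_thread 18      # number of CPU threads used for inference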

Note that in our custom Modelfile we changed only the GPU offload and the CPU thread count.

PARAMETER num_gpu 0 tells Ollama to offload nothing to the GPU and run purely on the CPU (I do not have a good GPU on my test machine). Note that, despite the name, the value is the number of model layers offloaded to the GPU, not a count of GPU cores; increase it to maximize the use of your GPU.

PARAMETER num_thread 18 tells Ollama to use 18 threads, making better use of the CPU. Note that models are usually configured conservatively: for example, qwen2.5 was using at most 6 CPU threads even though my machine has 20 cores. A quick way to verify the change is sketched below.
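To confirm that the extra threads are actually used, one rough check (assuming a Linux machine and that the inference process shows up under the ollama name) is to compare the core count with the CPU usage while a prompt is being generated:

# number of cores available (num_thread should normally not exceed this)
nproc

# while the model is generating, the combined CPU% of the ollama
# processes should approach num_thread x 100
top -b -n 1 | grep ollama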

To create the new model image, just run:

ollama create custom-qwen2.5-coder:32b -f ./custom-qwen2.5-coder
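Once created, the custom model can be exercised straight from the CLI, for example:

ollama run custom-qwen2.5-coder:32b "Write a quicksort in Python"

While it generates, CPU (or GPU) usage should reflect the new parameters.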

Check if the new model was created:

✦ ❯ ollama list
NAME                        ID              SIZE     MODIFIED     
custom-qwen2.5-coder:32b    353407281410    19 GB    14 hours ago    
qwen2.5-coder:32b           4bd6cbf2d094    19 GB    15 hours ago    

Finally, do not forget to select the custom model (custom-qwen2.5-coder:32b) instead of the original one in your tests.
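If your tests go through the Ollama REST API instead of the CLI, the model is selected per request. A minimal sketch, assuming the default server address localhost:11434:

curl http://localhost:11434/api/generate -d '{
  "model": "custom-qwen2.5-coder:32b",
  "prompt": "Write a hello world in Go",
  "stream": false
}'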
