Ollama (local)

Ollama exposes an OpenAI-compatible API, so KhunQuant connects to it using the same openai_compat/ provider used for OpenAI. No separate installation is needed beyond Ollama itself.

Prerequisites

Install and start Ollama:

# macOS
brew install ollama
ollama serve

# Pull a model
ollama pull llama3.2
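Before configuring KhunQuant, it can help to confirm the server is actually reachable. A minimal sketch using only the Python standard library, assuming the default port 11434 and Ollama's OpenAI-compatible /v1/models endpoint:

```python
# Quick health check for a local Ollama server (illustrative sketch,
# not part of KhunQuant; assumes the default port 11434).
import json
import urllib.request
import urllib.error

def list_ollama_models(base_url="http://localhost:11434/v1"):
    """Return the model IDs Ollama advertises, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError):
        return None  # server not running or host unreachable

models = list_ollama_models()
if models is None:
    print("Ollama is not reachable -- is `ollama serve` running?")
else:
    print("Available models:", models)
```

If the list is empty, the server is up but no model has been pulled yet.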

Configuration

Add to ~/.khunquant/config.json:

{
  "model_list": [
    {
      "model_name": "local",
      "model": "llama3.2",
      "api_base": "http://localhost:11434/v1",
      "api_key": "ollama"
    }
  ]
}
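A typo in this file (a missing field, a misspelled key) is a common source of connection errors. The snippet below is an illustrative sanity check, not part of KhunQuant; the required keys are taken from the entry shown above:

```python
# Sanity-check a KhunQuant model_list entry (illustrative sketch).
import json

REQUIRED_KEYS = {"model_name", "model", "api_base", "api_key"}

def validate_config(text):
    """Raise ValueError if any model_list entry is missing a required field."""
    cfg = json.loads(text)
    for entry in cfg.get("model_list", []):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(
                f"entry {entry.get('model_name')!r} missing {sorted(missing)}"
            )
    return cfg

sample = """
{
  "model_list": [
    {
      "model_name": "local",
      "model": "llama3.2",
      "api_base": "http://localhost:11434/v1",
      "api_key": "ollama"
    }
  ]
}
"""
cfg = validate_config(sample)
print(cfg["model_list"][0]["model"])  # prints llama3.2
```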

No .security.yml entry is needed — api_key: "ollama" is a placeholder that Ollama accepts.

Using a different model

Pull any model from the Ollama library and update the model field:

ollama pull mistral

{
  "model_name": "local",
  "model": "mistral",
  "api_base": "http://localhost:11434/v1",
  "api_key": "ollama"
}

Remote Ollama instance

If Ollama runs on a different machine, change api_base:

{
  "api_base": "http://192.168.1.100:11434/v1"
}
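Because the API is OpenAI-compatible, requests to a remote host look exactly like requests to any OpenAI-style endpoint. A sketch of what such a request contains, using only the standard library (the host 192.168.1.100 is a placeholder from the example above; adjust to your network):

```python
# Build (but do not send) an OpenAI-style chat request to a remote
# Ollama host. Illustrative sketch; the helper name is hypothetical.
import json
import urllib.request

def build_chat_request(api_base, model, prompt, api_key="ollama"):
    """Construct a /chat/completions POST request for an Ollama endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama accepts any bearer token; "ollama" is the placeholder.
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_chat_request("http://192.168.1.100:11434/v1", "mistral", "hello")
print(req.full_url)  # prints http://192.168.1.100:11434/v1/chat/completions
```

Sending the request with `urllib.request.urlopen(req)` returns the usual OpenAI-shaped JSON response.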

Low-resource devices

Ollama is ideal for running KhunQuant on Raspberry Pi, MIPS routers, or RISC-V boards where cloud API costs or latency are a concern. KhunQuant's binary targets (linux/arm, linux/mips, linux/riscv64) are designed for exactly this use case.