# Ollama (local)
Ollama exposes an OpenAI-compatible API, so KhunQuant connects to it through the same `openai_compat/` provider used for OpenAI. No separate installation is needed beyond Ollama itself.
## Prerequisites
Install and start Ollama:
```bash
# macOS
brew install ollama
ollama serve   # runs in the foreground; use a second terminal for the next step

# Pull a model
ollama pull llama3.2
```
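
Optionally, you can verify that the OpenAI-compatible endpoint is up before configuring KhunQuant. This talks to Ollama directly, not to KhunQuant:

```bash
# Send a minimal chat request to Ollama's OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```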
## Configuration

Add to `~/.khunquant/config.json`:
```json
{
  "model_list": [
    {
      "model_name": "local",
      "model": "llama3.2",
      "api_base": "http://localhost:11434/v1",
      "api_key": "ollama"
    }
  ]
}
```
No `.security.yml` entry is needed; the `"api_key": "ollama"` value is just a placeholder, since Ollama does not validate API keys.
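
To double-check that `api_base` is correct, you can list the models Ollama exposes through the same OpenAI-compatible endpoint KhunQuant will use; the `id` values it returns are what the `model` field expects:

```bash
# Returns a JSON list of available models; each "id" is a valid "model" value
curl http://localhost:11434/v1/models
```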
## Using a different model

Pull any model from the Ollama library and update the `model` field:
```bash
ollama pull mistral
```

```json
{
  "model_name": "local",
  "model": "mistral",
  "api_base": "http://localhost:11434/v1",
  "api_key": "ollama"
}
```
## Remote Ollama instance

If Ollama runs on a different machine, change `api_base` (the other fields stay the same):
```json
{
  "api_base": "http://192.168.1.100:11434/v1"
}
```
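
Note that Ollama binds to 127.0.0.1 by default, so the remote machine must be told to listen on an externally reachable address. A typical setup looks like this:

```bash
# On the Ollama host: listen on all interfaces instead of loopback only
OLLAMA_HOST=0.0.0.0 ollama serve

# From the KhunQuant machine: confirm the endpoint is reachable
curl http://192.168.1.100:11434/v1/models
```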
## Low-resource devices

Ollama pairs well with KhunQuant on low-resource hardware such as a Raspberry Pi, a MIPS router, or a RISC-V board, where cloud API costs or latency are a concern. A Raspberry Pi with enough RAM can run small models like `llama3.2` directly, while weaker boards can point `api_base` at an Ollama instance on a more capable machine, as in the remote setup above. KhunQuant's binary targets (linux/arm, linux/mips, linux/riscv64) are designed for exactly this use case.