Ollama (local)

Ollama exposes an OpenAI-compatible API, so KhunQuant connects to it using the same openai_compat/ provider used for OpenAI. No separate installation is needed beyond Ollama itself.

Prerequisites

Install and start Ollama:

# macOS
brew install ollama
ollama serve

# Pull a model
ollama pull llama3.2
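Before configuring KhunQuant, it can help to confirm the server is actually reachable. A minimal sketch using only the Python standard library, assuming the default port 11434 and Ollama's OpenAI-compatible /v1/models endpoint:

```python
# Quick health check for a local Ollama server (illustrative sketch,
# not part of KhunQuant; assumes the default port 11434).
import json
import urllib.request
import urllib.error

def list_ollama_models(base_url="http://localhost:11434/v1"):
    """Return the model IDs Ollama advertises, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError):
        return None  # server not running or host unreachable

models = list_ollama_models()
if models is None:
    print("Ollama is not reachable -- is `ollama serve` running?")
else:
    print("Available models:", models)
```

If the list is empty, the server is up but no model has been pulled yet.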

Configuration

Add to ~/.khunquant/config.json:

{
  "model_list": [
    {
      "model_name": "local",
      "model": "llama3.2",
      "api_base": "http://localhost:11434/v1",
      "api_key": "ollama"
    }
  ]
}
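A typo in this file (a missing field, a misspelled key) is a common source of connection errors. The snippet below is an illustrative sanity check, not part of KhunQuant; the required keys are taken from the entry shown above:

```python
# Sanity-check a KhunQuant model_list entry (illustrative sketch).
import json

REQUIRED_KEYS = {"model_name", "model", "api_base", "api_key"}

def validate_config(text):
    """Raise ValueError if any model_list entry is missing a required field."""
    cfg = json.loads(text)
    for entry in cfg.get("model_list", []):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(
                f"entry {entry.get('model_name')!r} missing {sorted(missing)}"
            )
    return cfg

sample = """
{
  "model_list": [
    {
      "model_name": "local",
      "model": "llama3.2",
      "api_base": "http://localhost:11434/v1",
      "api_key": "ollama"
    }
  ]
}
"""
cfg = validate_config(sample)
print(cfg["model_list"][0]["model"])  # prints llama3.2
```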

No .security.yml entry is needed — api_key: "ollama" is a placeholder that Ollama accepts.

Using a different model

Pull any model from the Ollama library and update the model field:

ollama pull mistral

{
  "model_name": "local",
  "model": "mistral",
  "api_base": "http://localhost:11434/v1",
  "api_key": "ollama"
}

Remote Ollama instance

If Ollama runs on a different machine, change api_base:

{
  "api_base": "http://192.168.1.100:11434/v1"
}
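Because the API is OpenAI-compatible, requests to a remote host look exactly like requests to any OpenAI-style endpoint. A sketch of what such a request contains, using only the standard library (the host 192.168.1.100 is a placeholder from the example above; adjust to your network):

```python
# Build (but do not send) an OpenAI-style chat request to a remote
# Ollama host. Illustrative sketch; the helper name is hypothetical.
import json
import urllib.request

def build_chat_request(api_base, model, prompt, api_key="ollama"):
    """Construct a /chat/completions POST request for an Ollama endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{api_base}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama accepts any bearer token; "ollama" is the placeholder.
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_chat_request("http://192.168.1.100:11434/v1", "mistral", "hello")
print(req.full_url)  # prints http://192.168.1.100:11434/v1/chat/completions
```

Sending the request with `urllib.request.urlopen(req)` returns the usual OpenAI-shaped JSON response.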

Low-resource devices

Ollama is ideal for running KhunQuant on Raspberry Pi, MIPS routers, or RISC-V boards where cloud API costs or latency are a concern. KhunQuant's binary targets (linux/arm, linux/mips, linux/riscv64) are designed for exactly this use case.