Tale connects to AI models through providers — any OpenAI-compatible endpoint qualifies, including local runtimes like Ollama, vLLM, and LocalAI. Running agents against a local provider keeps prompts, completions, and knowledge-base context inside your own network; nothing reaches a hosted model vendor. This tutorial walks through adding Ollama as a provider, attaching a model to an agent, and switching it on. You need Admin access. A working Ollama or vLLM server reachable from your Tale instance is the only external prerequisite — see their respective install guides.

Step 1 — Start the local runtime

For Ollama on the same host as Tale:
ollama pull llama3.3
ollama serve
Ollama listens on http://localhost:11434 by default. Confirm it responds:
curl -s http://localhost:11434/api/tags | jq '.models[].name'
If Tale runs in Docker, point it at the host instead of localhost — on Linux, use http://host.docker.internal:11434 with an explicit Docker extra-host, or the host’s LAN IP. See Self-hosted configuration for the networking options.
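On Linux, the extra-host mapping can be supplied when the container starts; a minimal sketch, assuming Tale is launched with plain docker run (the image name here is a placeholder, not necessarily the real one):

```shell
# Map host.docker.internal to the host gateway so the container can reach
# the host's Ollama on port 11434 (requires Docker 20.10+).
# "tale/tale:latest" is an illustrative image name.
docker run -d --name tale \
  --add-host=host.docker.internal:host-gateway \
  tale/tale:latest
```

With Docker Compose, the equivalent is an extra_hosts entry on the Tale service.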

Step 2 — Add the provider in Tale

Navigate to Settings > Providers and click Add provider. Fill in:
| Field | Value |
| --- | --- |
| Name | ollama-local (slug, used internally) |
| Display name | Ollama (local) |
| Base URL | http://host.docker.internal:11434/v1 |
| API key | Any non-empty value; Ollama does not authenticate |
Add one model entry per runtime model you want to expose:
| Field | Value |
| --- | --- |
| ID | llama3.3 (must match the Ollama model name exactly) |
| Display name | LLaMA 3.3 |
| Tags | chat |
Save. The provider appears in the list. For vLLM the process is identical; the base URL is whatever you configured (commonly http://vllm.internal:8000/v1), and the model ID must match the --served-model-name flag you started vLLM with.
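Before wiring the provider into an agent, you can verify the OpenAI-compatible surface directly; a quick smoke test against Ollama's /v1 endpoint, assuming the server from Step 1 is still running:

```shell
# Send one chat completion through the OpenAI-compatible API and print
# the reply text; a non-empty answer confirms the base URL and model ID.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.3",
        "messages": [{"role": "user", "content": "Reply with one word."}]
      }' | jq -r '.choices[0].message.content'
```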

Step 3 — Add the model to an agent

Open Agents, pick the agent you want to run on the local model, and open its JSON at TALE_CONFIG_DIR/agents/<slug>.json. Add the model ID to supportedModels:
{
  "supportedModels": ["llama3.3"]
}
If two providers define the same model ID, prefix with the provider slug to pin routing:
{
  "supportedModels": ["ollama-local:llama3.3"]
}
See Providers — Making models available to agents for the full rules.
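Instead of hand-editing, the change can be scripted; a sketch using jq against a hypothetical agent file (the path and the existing model entry are illustrative, not Tale defaults):

```shell
# Replace the agent's model list with the pinned local model.
AGENT=./helper-agent.json
echo '{"supportedModels": ["gpt-4o"]}' > "$AGENT"   # stand-in for the existing file
jq -c '.supportedModels = ["ollama-local:llama3.3"]' "$AGENT" \
  > "$AGENT.tmp" && mv "$AGENT.tmp" "$AGENT"
cat "$AGENT"   # {"supportedModels":["ollama-local:llama3.3"]}
```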

Step 4 — Test from chat

Open Chat, pick the agent, and ask anything. In Settings > Usage analytics or the agent’s conversation history, confirm the request was served by ollama-local — the model ID appears in the thread metadata. Latency will be higher than with a hosted frontier model; that is expected on most hardware. If Tale falls back to a different provider, either the model ID does not match Ollama’s list, or supportedModels still contains a frontier-model entry that takes precedence — remove that entry or move the local model first.
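When debugging a fallback, it helps to see exactly which IDs the runtime advertises; Ollama exposes them on the OpenAI-compatible models route (assumes the Step 1 server is reachable):

```shell
# List model IDs as Tale would see them; supportedModels entries must
# match one of these strings exactly (IDs are case-sensitive).
curl -s http://localhost:11434/v1/models | jq -r '.data[].id'
```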

Step 5 — Wire it into the must-have tutorials

Both admin must-haves benefit from a local provider:
  • Office Agents — the add-in hits Tale; Tale routes to the local model. No change to the add-in side. See Word & Excel add-in.
  • Meeting transcription — Meetily already runs Whisper locally; adding a local provider closes the loop so the summarisation LLM is also local. See Meeting transcription.

Troubleshooting

  • Tale cannot reach Ollama from Docker — localhost inside the Tale container is not the host. Use host.docker.internal (Docker Desktop), the host LAN IP, or put Ollama and Tale on the same Docker network.
  • 404 on model — model ID is case-sensitive and must match the name ollama list prints.
  • Empty or very short answers — default Ollama context window is small. Pull a larger-context variant or override num_ctx in the model’s Modelfile.
  • Encrypted API key required — if you edit provider files directly, the API-key file must be SOPS-encrypted. Setting the key via the UI handles encryption for you; see Providers — SOPS-encrypted secrets.
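The num_ctx override mentioned above can be sketched as a Modelfile; llama3.3-8k is an illustrative variant name and 8192 an example context size:

```shell
# Derive a larger-context variant from the base model, then register it.
cat > Modelfile <<'EOF'
FROM llama3.3
PARAMETER num_ctx 8192
EOF
ollama create llama3.3-8k -f Modelfile
```

Remember to add the new ID (llama3.3-8k) to the provider's model list and the agent's supportedModels, since it is a distinct model name.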
Last modified on April 21, 2026