Intelligent AI Gateway

Enterprise-grade proxy for large language model inference. Secure, fast, and built for scale.

Available Models

Discover models via GET /api/tags — local, cloud, and Anthropic inference

Local Models — runs on local hardware via /api/chat

Cloud Models — Ollama Cloud via /api/ollama/api/chat

Anthropic Models — Anthropic Claude via /api/anthropicChat

API Endpoints

Integrate with your application using these endpoints

GET /api/tags

Discover all available models (local, cloud, and Anthropic). No auth required. Call this first to get exact model names.

    // No auth needed
    GET /api/tags

    // Response
    {
      "models": [
        {
          "name": "gemma4:26b",
          "provider": "local",
          "endpoint": "/api/chat",
          "details": { "parameter_size": "26B", ... }
        }
      ]
    }
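
As a rough sketch of the discovery step, the snippet below filters a `/api/tags` response by provider and maps each model name to its endpoint. The helper name `models_by_provider` and the sample payload are illustrative assumptions; only the `name`, `provider`, and `endpoint` fields come from the documented response.

```python
import json
from typing import Dict

# Sample payload shaped like the documented /api/tags response (values are illustrative).
SAMPLE_TAGS = json.loads("""
{
  "models": [
    {"name": "gemma4:26b", "provider": "local", "endpoint": "/api/chat"},
    {"name": "gemma4:31b-cloud", "provider": "cloud", "endpoint": "/api/ollama/api/chat"}
  ]
}
""")


def models_by_provider(tags: dict, provider: str) -> Dict[str, str]:
    """Map model name -> endpoint for every model with the given provider."""
    return {
        m["name"]: m["endpoint"]
        for m in tags.get("models", [])
        if m.get("provider") == provider
    }


print(models_by_provider(SAMPLE_TAGS, "local"))
```

Looking up the endpoint from the response, rather than hard-coding it, keeps the client correct if a model moves between providers.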

POST /api/chat

Send a conversation with message history and get a response.

    // Headers
    Authorization: Bearer YOUR_API_KEY
    Content-Type: application/json

    // Request body
    {
      "model": "qwen3:8b",
      "messages": [
        { "role": "user", "content": "Hello!" }
      ]
    }

    // Response
    {
      "model": "qwen3:8b",
      "message": { "role": "assistant", "content": "Hi there!" },
      "done": true
    }
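
A minimal sketch of building this request with Python's standard library follows. The helper name `build_chat_request` is an assumption for illustration. The body sets `"stream": false`, which the agent instructions on this page describe as the way to get a single JSON response from local models.

```python
import json
import urllib.request

GATEWAY = "https://llm.zithera.com.au"  # base URL given on this page


def build_chat_request(api_key: str, model: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST /api/chat request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        url=f"{GATEWAY}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("YOUR_API_KEY", "qwen3:8b", "Hello!")
# To send it:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)["message"]["content"]
```

Separating request construction from sending makes the payload easy to inspect and test without network access.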

POST /api/generate

Send a single prompt and get a text completion.

    // Headers
    Authorization: Bearer YOUR_API_KEY
    Content-Type: application/json

    // Request body
    {
      "model": "qwen3:8b",
      "prompt": "Explain AI in one sentence."
    }

    // Response
    {
      "model": "qwen3:8b",
      "response": "AI is...",
      "done": true
    }

GET /api/v1/health

Verify your API key is valid and the service is reachable.

    // Headers
    Authorization: Bearer YOUR_API_KEY

    // 200 Response (key valid)
    { "status": "ok", "app": "YourAppName" }

    // 401 Response (key invalid/revoked)
    { "message": "Unauthorized" }
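
The sketch below shows one way a client might turn the health check into a key-validity result. The function names `interpret_health` and `check_key` and the labels they return are assumptions; only the 200 and 401 behaviours are documented here.

```python
import urllib.error
import urllib.request

GATEWAY = "https://llm.zithera.com.au"  # base URL given on this page


def interpret_health(status_code: int) -> str:
    """Translate a /api/v1/health status code into a label."""
    if status_code == 200:
        return "valid"
    if status_code == 401:
        return "invalid"
    return "unknown"  # any other status is not documented on this page


def check_key(api_key: str, base_url: str = GATEWAY) -> str:
    """Call GET /api/v1/health and classify the result."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/health",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return interpret_health(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is still available.
        return interpret_health(err.code)
    except urllib.error.URLError:
        return "unreachable"
```

Calling `check_key("YOUR_API_KEY")` at application start-up would surface a revoked key before the first chat request fails.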

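Every endpoint except model discovery (GET /api/tags) and the unauthenticated health check (GET /api/health) requires an API key, sent in either an Authorization: Bearer header or an X-API-Key header.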
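
As an illustration of the two accepted header forms, here is a small helper. The name `auth_headers` and its `scheme` argument are hypothetical; the header names themselves come from this page.

```python
from typing import Dict


def auth_headers(api_key: str, scheme: str = "bearer") -> Dict[str, str]:
    """Return the header dict for one of the two documented auth methods."""
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unsupported auth scheme: {scheme!r}")


print(auth_headers("YOUR_API_KEY"))
print(auth_headers("YOUR_API_KEY", scheme="x-api-key"))
```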

Ollama Cloud Proxy

Route your Ollama Cloud requests through the gateway — just swap the base URL

POST /api/ollama/api/chat

Proxy to Ollama Cloud chat. Replaces calling ollama.com/api/chat directly.

    // Headers
    Authorization: Bearer YOUR_API_KEY
    Content-Type: application/json

    // Request body
    {
      "model": "gemma4:31b-cloud",
      "messages": [
        { "role": "user", "content": "Hello!" }
      ]
    }

    // Available models
    // gemma4:31b-cloud
    // gpt-oss:120b-cloud
    // minimax-m2.7:cloud

POST /api/ollama/api/generate

Proxy to Ollama Cloud generate. Replaces calling ollama.com/api/generate directly.

    // Headers
    Authorization: Bearer YOUR_API_KEY
    Content-Type: application/json

    // Request body
    {
      "model": "gpt-oss:120b-cloud",
      "prompt": "Explain AI in one sentence."
    }

    // Migration from direct Ollama Cloud
    // Old: https://ollama.com/api/generate
    // New: https://llm.zithera.com.au/api/ollama/api/generate

Apps only need to change the base URL from https://ollama.com to https://llm.zithera.com.au/api/ollama; request paths stay the same.
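
The base-URL swap can be sketched as a small rewrite function. The name `to_gateway_url` is an assumption, and the check on the host is an illustrative safeguard rather than gateway behaviour.

```python
from urllib.parse import urlsplit

GATEWAY_OLLAMA_BASE = "https://llm.zithera.com.au/api/ollama"  # from this page


def to_gateway_url(ollama_url: str) -> str:
    """Rewrite a direct ollama.com URL to its gateway equivalent, keeping the path."""
    parts = urlsplit(ollama_url)
    if parts.netloc != "ollama.com":
        raise ValueError(f"not an ollama.com URL: {ollama_url!r}")
    query = f"?{parts.query}" if parts.query else ""
    return f"{GATEWAY_OLLAMA_BASE}{parts.path}{query}"


print(to_gateway_url("https://ollama.com/api/generate"))
```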

Anthropic Cloud Proxy

Route your Anthropic Claude requests through the gateway

POST /api/anthropicChat

Proxy to Anthropic Messages API. Sends a conversation and gets a response. Streams by default via SSE.

    // Headers
    Authorization: Bearer YOUR_API_KEY
    Content-Type: application/json

    // Request body
    {
      "model": "claude-opus-4-7",
      "messages": [
        { "role": "user", "content": "Hello, Claude" }
      ],
      "max_tokens": 1024
    }

    // max_tokens optional (default 4096)
    // stream optional (default true, set false for single JSON)
    // system, tools, temperature, top_p: all passed through
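
Because this endpoint streams by default, a client has to assemble the reply from SSE events. The sketch below collects text from `content_block_delta` events in Anthropic's streaming format. The function name `collect_text` and the sample lines are illustrative, and the code assumes the gateway forwards the upstream events unchanged.

```python
import json
from typing import Iterable


def collect_text(sse_lines: Iterable[str]) -> str:
    """Concatenate text deltas from a sequence of SSE lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON payload
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)


SAMPLE_STREAM = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

print(collect_text(SAMPLE_STREAM))  # Hello, world
```

Clients that do not need incremental output can avoid this parsing entirely by sending `"stream": false`.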

POST /api/anthropicGenerate

Convenience wrapper: pass a single prompt string, and the gateway wraps it as a user message before calling Anthropic.

    // Headers
    Authorization: Bearer YOUR_API_KEY
    Content-Type: application/json

    // Request body
    {
      "model": "claude-sonnet-4-6",
      "prompt": "Explain AI in one sentence.",
      "max_tokens": 256
    }

    // Equivalent to /api/anthropicChat with:
    // messages: [{ role: "user", content: prompt }]
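
The equivalence described above can be written out as a small transformation. This is a client-side illustration of the documented mapping, not the gateway's own code; the name `generate_to_chat` is hypothetical.

```python
def generate_to_chat(body: dict) -> dict:
    """Convert an /api/anthropicGenerate body into the equivalent /api/anthropicChat body."""
    chat = {key: value for key, value in body.items() if key != "prompt"}
    chat["messages"] = [{"role": "user", "content": body["prompt"]}]
    return chat


generate_body = {
    "model": "claude-sonnet-4-6",
    "prompt": "Explain AI in one sentence.",
    "max_tokens": 256,
}
print(generate_to_chat(generate_body))
```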

Apps send Anthropic-native request bodies — all Anthropic features (tool use, vision, prompt caching, extended thinking) work via the body fields you'd use directly against api.anthropic.com

Agent Instructions

Copy and paste these into your AI coding agent (Claude Code, Cursor, Copilot, etc.)

Local LLM via Zithera Gateway
## LLM API Integration: Zithera Gateway (Local Models)

### Environment Variables

```
LLM_END_POINT=https://llm.zithera.com.au
LLM_API_KEY=YOUR_API_KEY
```

Authentication: Include the API key in every request via one of:
- Header: `Authorization: Bearer ${LLM_API_KEY}`
- Header: `X-API-Key: ${LLM_API_KEY}`

### GET ${LLM_END_POINT}/api/tags: Discover available models (no auth required)

```
GET ${LLM_END_POINT}/api/tags
```

Response: `{ "models": [{ "name": "gemma4:26b", "provider": "local", "endpoint": "/api/chat", ... }] }`

IMPORTANT: Always call /api/tags first to get the exact model names before making chat/generate requests. Each model includes a `provider` field ("local", "cloud", or "anthropic") and an `endpoint` field showing which API path to use.

### POST ${LLM_END_POINT}/api/chat: Chat completion (with message history)

```
POST ${LLM_END_POINT}/api/chat
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}

{
  "model": "qwen3:8b",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
```

Response: `{ "model": "qwen3:8b", "message": { "role": "assistant", "content": "..." }, "done": true }`

### POST ${LLM_END_POINT}/api/generate: Text completion (single prompt)

```
POST ${LLM_END_POINT}/api/generate
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}

{
  "model": "qwen3:8b",
  "prompt": "Explain AI in one sentence."
}
```

Response: `{ "model": "qwen3:8b", "response": "...", "done": true }`

### Notes

- LLM_END_POINT should be set to `https://llm.zithera.com.au` (no trailing slash, no /api).
- The app must append the path (e.g. `/api/chat`, `/api/generate`) to LLM_END_POINT.
- Responses stream by default (newline-delimited JSON). To disable streaming, add `"stream": false` to your request body.
- Rate limited per API key.
- Do NOT call upstream providers directly; always go through the gateway.

Ollama Cloud via Zithera Gateway
## LLM API Integration: Zithera Gateway (Ollama Cloud Models)

### Environment Variables

```
LLM_END_POINT=https://llm.zithera.com.au
LLM_API_KEY=YOUR_API_KEY
```

Authentication: Include the API key in every request via one of:
- Header: `Authorization: Bearer ${LLM_API_KEY}`
- Header: `X-API-Key: ${LLM_API_KEY}`

### GET ${LLM_END_POINT}/api/tags: Discover available models (no auth required)

```
GET ${LLM_END_POINT}/api/tags
```

Response: `{ "models": [{ "name": "gemma4:31b-cloud", "provider": "cloud", "endpoint": "/api/ollama/api/chat", ... }] }`

IMPORTANT: Always call /api/tags first to get the exact model names before making chat/generate requests. Filter by `provider: "cloud"` for cloud models. The `endpoint` field tells you which API path to use.

### POST ${LLM_END_POINT}/api/ollama/api/chat: Chat completion

```
POST ${LLM_END_POINT}/api/ollama/api/chat
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}

{
  "model": "gemma4:31b-cloud",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
```

Response: `{ "model": "gemma4:31b-cloud", "message": { "role": "assistant", "content": "..." }, "done": true }`

### POST ${LLM_END_POINT}/api/ollama/api/generate: Text completion

```
POST ${LLM_END_POINT}/api/ollama/api/generate
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}

{
  "model": "gpt-oss:120b-cloud",
  "prompt": "Explain AI in one sentence."
}
```

Response: `{ "model": "gpt-oss:120b-cloud", "response": "...", "done": true }`

### Notes

- LLM_END_POINT should be set to `https://llm.zithera.com.au` (no trailing slash, no /api).
- The app must append the path (e.g. `/api/ollama/api/chat`, `/api/ollama/api/generate`) to LLM_END_POINT.
- These endpoints proxy to Ollama Cloud (ollama.com). Do NOT call ollama.com directly.
- Responses are non-streaming JSON.
- Rate limited per API key.

Anthropic (Claude) via Zithera Gateway
## LLM API Integration: Zithera Gateway (Anthropic Claude)

### Environment Variables

```
LLM_END_POINT=https://llm.zithera.com.au
LLM_API_KEY=YOUR_API_KEY
```

Authentication: Include the API key in every request via one of:
- Header: `Authorization: Bearer ${LLM_API_KEY}`
- Header: `X-API-Key: ${LLM_API_KEY}`

### GET ${LLM_END_POINT}/api/tags: Discover available models (no auth required)

```
GET ${LLM_END_POINT}/api/tags
```

Response: `{ "models": [{ "name": "claude-opus-4-7", "display_name": "Claude Opus 4.7", "provider": "anthropic", "endpoint": "/anthropicChat" }] }`

IMPORTANT: Always call /api/tags first to get the exact model names before making chat/generate requests. Filter by `provider: "anthropic"` for Claude models. The `endpoint` field tells you which API path to use.

### POST ${LLM_END_POINT}/api/anthropicChat: Anthropic Messages API

```
POST ${LLM_END_POINT}/api/anthropicChat
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}

{
  "model": "claude-opus-4-7",
  "messages": [
    { "role": "user", "content": "Hello, Claude" }
  ],
  "max_tokens": 1024
}
```

Response: Anthropic-native JSON, e.g. `{ "id": "msg_...", "content": [{ "type": "text", "text": "..." }], "role": "assistant", "model": "claude-opus-4-7", "stop_reason": "end_turn", ... }`

### POST ${LLM_END_POINT}/api/anthropicGenerate: Single-prompt convenience wrapper

```
POST ${LLM_END_POINT}/api/anthropicGenerate
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}

{
  "model": "claude-sonnet-4-6",
  "prompt": "Explain AI in one sentence.",
  "max_tokens": 256
}
```

The gateway wraps `prompt` as `messages: [{ role: "user", content: prompt }]` then calls /v1/messages. Response shape is identical to /api/anthropicChat.

### Notes

- LLM_END_POINT should be set to `https://llm.zithera.com.au` (no trailing slash, no /api).
- `max_tokens` is optional on our side (gateway defaults to 4096). Anthropic requires it upstream.
- `stream` is optional and defaults to true. Streaming responses are Anthropic-native SSE (`text/event-stream` with `content_block_delta` events). Set `"stream": false` for a single JSON response.
- All Anthropic body fields pass through unchanged: `system`, `tools`, `tool_choice`, `temperature`, `top_p`, `metadata`, `stop_sequences`, vision content blocks, prompt-caching markers, etc. Use them exactly as documented at docs.anthropic.com.
- Rate limited per API key.
- Do NOT call api.anthropic.com directly; always go through the gateway.

All Endpoints — Full Reference
## LLM API Integration: Zithera Gateway (Full Reference)

### Environment Variables

```
LLM_END_POINT=https://llm.zithera.com.au
LLM_API_KEY=YOUR_API_KEY
```

Authentication: Include the API key in every request via one of:
- Header: `Authorization: Bearer ${LLM_API_KEY}`
- Header: `X-API-Key: ${LLM_API_KEY}`

---

### 0. Model Discovery (IMPORTANT: call this first)

**GET ${LLM_END_POINT}/api/tags**: List all available models (no auth required)

```
GET ${LLM_END_POINT}/api/tags
```

Response:

```
{
  "models": [
    { "name": "gemma4:26b", "provider": "local", "endpoint": "/api/chat", "details": { ... } },
    { "name": "gemma4:31b-cloud", "provider": "cloud", "endpoint": "/api/ollama/api/chat", "details": { ... } },
    { "name": "claude-opus-4-7", "display_name": "Claude Opus 4.7", "provider": "anthropic", "endpoint": "/anthropicChat" }
  ]
}
```

Each model includes:
- `name`: exact model name to use in requests
- `provider`: "local" (runs on local hardware), "cloud" (Ollama Cloud), or "anthropic" (Anthropic Claude)
- `endpoint`: which API path to use for this model (e.g. /api/chat, /api/ollama/api/chat, /api/anthropicChat)

IMPORTANT: Always call /api/tags first to get the exact model names. Using an incorrect model name will return a "model not found" error.

---

### 1. Local LLM Endpoints (proxied to Heedable)

**POST ${LLM_END_POINT}/api/chat**: Chat with message history

```
POST ${LLM_END_POINT}/api/chat

{
  "model": "qwen3:8b",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```

**POST ${LLM_END_POINT}/api/generate**: Single prompt completion

```
POST ${LLM_END_POINT}/api/generate

{
  "model": "qwen3:8b",
  "prompt": "Explain AI in one sentence."
}
```

---

### 2. Ollama Cloud Endpoints (proxied to ollama.com)

**POST ${LLM_END_POINT}/api/ollama/api/chat**: Ollama Cloud chat

```
POST ${LLM_END_POINT}/api/ollama/api/chat

{
  "model": "gemma4:31b-cloud",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```

**POST ${LLM_END_POINT}/api/ollama/api/generate**: Ollama Cloud text completion

```
POST ${LLM_END_POINT}/api/ollama/api/generate

{
  "model": "gpt-oss:120b-cloud",
  "prompt": "Explain AI in one sentence."
}
```

---

### 3. Anthropic Endpoints (proxied to api.anthropic.com)

**POST ${LLM_END_POINT}/api/anthropicChat**: Anthropic Messages API

```
POST ${LLM_END_POINT}/api/anthropicChat

{
  "model": "claude-opus-4-7",
  "messages": [{ "role": "user", "content": "Hello, Claude" }],
  "max_tokens": 1024
}
```

**POST ${LLM_END_POINT}/api/anthropicGenerate**: Single-prompt wrapper (gateway converts to messages)

```
POST ${LLM_END_POINT}/api/anthropicGenerate

{
  "model": "claude-sonnet-4-6",
  "prompt": "Explain AI in one sentence.",
  "max_tokens": 256
}
```

---

### Notes

- LLM_END_POINT should be set to `https://llm.zithera.com.au` (no trailing slash, no /api).
- The app must append the path to LLM_END_POINT when making requests.
- Local model responses (`/api/chat`, `/api/generate`) stream by default (newline-delimited JSON). To disable streaming, add `"stream": false` to your request body.
- Cloud model responses (`/api/ollama/api/chat`, `/api/ollama/api/generate`) are non-streaming JSON.
- Anthropic responses (`/api/anthropicChat`, `/api/anthropicGenerate`) stream by default via SSE (`text/event-stream`). `max_tokens` defaults to 4096 if omitted. All Anthropic body fields (system, tools, vision, prompt caching) pass through unchanged. Set `"stream": false` for a single JSON response.
- Rate limited per API key.
- Always use the gateway; never call upstream providers (heedable.com, ollama.com, api.anthropic.com) directly.
- Health check: GET ${LLM_END_POINT}/api/health (no auth needed)
- Auth health check: GET ${LLM_END_POINT}/api/v1/health (requires API key; use to verify key validity)
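
The notes above state that local endpoints stream newline-delimited JSON by default. A minimal sketch of reassembling a streamed `/api/chat` reply follows. It assumes each line has the same shape as the documented response, with a partial `message.content` per chunk and `"done": true` on the last one, which is how Ollama-style chat streaming behaves; the exact chunk shape is not shown on this page. The name `join_stream` and the sample lines are illustrative.

```python
import json
from typing import Iterable


def join_stream(ndjson_lines: Iterable[str]) -> str:
    """Concatenate message content from newline-delimited JSON chunks."""
    pieces = []
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between chunks
        chunk = json.loads(raw)
        pieces.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # final chunk reached
    return "".join(pieces)


SAMPLE_CHUNKS = [
    '{"model": "qwen3:8b", "message": {"role": "assistant", "content": "Hi"}, "done": false}',
    '{"model": "qwen3:8b", "message": {"role": "assistant", "content": " there!"}, "done": false}',
    '{"model": "qwen3:8b", "message": {"role": "assistant", "content": ""}, "done": true}',
]

print(join_stream(SAMPLE_CHUNKS))  # Hi there!
```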

Authenticated Access

SHA-256 hashed API keys with multiple auth methods.
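
For readers unfamiliar with hashed key storage, the sketch below shows the general idea: the server keeps only a SHA-256 digest and compares digests when a key is presented. This is a generic illustration under that assumption, not the gateway's actual implementation; `hash_key` and `key_matches` are hypothetical names.

```python
import hashlib
import hmac


def hash_key(api_key: str) -> str:
    """Return the hex SHA-256 digest of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def key_matches(presented_key: str, stored_hash: str) -> bool:
    """Compare digests in constant time to avoid timing leaks."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)


stored = hash_key("example-key")  # what a server would persist
print(key_matches("example-key", stored))  # True
print(key_matches("wrong-key", stored))    # False
```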

Rate Protected

Per-key throttling with configurable request limits.

Full Audit Trail

Every request logged with latency, status, and source.