Proxies to the Ollama Cloud generate endpoint. Use it instead of calling ollama.com/api/generate directly.
// Headers
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
// Request body
{
"model": "gpt-oss:120b-cloud",
"prompt": "Explain AI in one sentence."
}
// Migration from direct Ollama Cloud
// Old: https://ollama.com/api/generate
// New: https://llm.zithera.com.au/api/ollama/api/generate
Apps only need to change the base URL from https://ollama.com to https://llm.zithera.com.au/api/ollama; the paths stay the same.
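The base URL swap described above can be sketched as a small helper. This is only an illustration: the function name `to_gateway_url` and the constants are hypothetical, and the sketch assumes existing code holds full Ollama Cloud URLs as strings.

```python
OLLAMA_CLOUD_BASE = "https://ollama.com"
GATEWAY_OLLAMA_BASE = "https://llm.zithera.com.au/api/ollama"


def to_gateway_url(url: str) -> str:
    """Rewrite a direct Ollama Cloud URL so it goes through the gateway."""
    if url == OLLAMA_CLOUD_BASE or url.startswith(OLLAMA_CLOUD_BASE + "/"):
        # Swap only the base; the path (e.g. /api/generate) is kept as-is.
        return GATEWAY_OLLAMA_BASE + url[len(OLLAMA_CLOUD_BASE):]
    return url  # not an Ollama Cloud URL, leave it untouched
```

For example, `to_gateway_url("https://ollama.com/api/generate")` yields the "New" URL shown in the migration note.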
Anthropic Cloud Proxy
Route your Anthropic Claude requests through the gateway
POST /api/anthropicChat
Proxies to the Anthropic Messages API: send a conversation and receive a response. Responses stream by default via SSE.
// Headers
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
// Request body
{
"model": "claude-opus-4-7",
"messages": [
{ "role": "user", "content": "Hello, Claude" }
],
"max_tokens": 1024
}
// max_tokens optional (default 4096)
// stream optional (default true, set false for single JSON)
// system, tools, temperature, top_p: all passed through
POST /api/anthropicGenerate
Convenience wrapper — pass a single prompt string and the gateway wraps it as a user message before calling Anthropic.
// Headers
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
// Request body
{
"model": "claude-sonnet-4-6",
"prompt": "Explain AI in one sentence.",
"max_tokens": 256
}
// Equivalent to /api/anthropicChat with:
// messages: [{ role: "user", content: prompt }]
Apps send Anthropic-native request bodies. All Anthropic features (tool use, vision, prompt caching, extended thinking) work through the same body fields you would use directly against api.anthropic.com.
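The equivalence between /api/anthropicGenerate and /api/anthropicChat can be sketched client-side as follows. The gateway performs this wrapping itself; the helper name `generate_body_to_chat_body` is hypothetical and only illustrates the documented mapping, assuming every field other than `prompt` is carried over unchanged.

```python
def generate_body_to_chat_body(body: dict) -> dict:
    """Turn an /api/anthropicGenerate body into the equivalent /api/anthropicChat body."""
    # Copy everything except the prompt (model, max_tokens, ...).
    chat_body = {key: value for key, value in body.items() if key != "prompt"}
    # Wrap the single prompt string as one user message.
    chat_body["messages"] = [{"role": "user", "content": body["prompt"]}]
    return chat_body
```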
Agent Instructions
Copy and paste these into your AI coding agent (Claude Code, Cursor, Copilot, etc.)
Local LLM via Zithera Gateway
## LLM API Integration — Zithera Gateway (Local Models)
### Environment Variables
```
LLM_END_POINT=https://llm.zithera.com.au
LLM_API_KEY=YOUR_API_KEY
```
Authentication: Include the API key in every request (except endpoints marked "no auth required") via one of:
- Header: `Authorization: Bearer ${LLM_API_KEY}`
- Header: `X-API-Key: ${LLM_API_KEY}`
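A minimal sketch of the two header options, assuming the key has already been read from `LLM_API_KEY`. The function name `auth_headers` and its `use_x_api_key` flag are hypothetical, introduced only for illustration.

```python
def auth_headers(api_key: str, use_x_api_key: bool = False) -> dict:
    """Build request headers using one of the two documented auth methods."""
    headers = {"Content-Type": "application/json"}
    if use_x_api_key:
        headers["X-API-Key"] = api_key  # alternative header form
    else:
        headers["Authorization"] = f"Bearer {api_key}"  # bearer-token form
    return headers
```

Either form should be sufficient on its own; the sketch never sends both.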
### GET ${LLM_END_POINT}/api/tags — Discover available models (no auth required)
```
GET ${LLM_END_POINT}/api/tags
```
Response: `{ "models": [{ "name": "gemma4:26b", "provider": "local", "endpoint": "/api/chat", ... }] }`
IMPORTANT: Always call /api/tags first to get the exact model names before making chat/generate requests.
Each model includes a `provider` field ("local" or "cloud") and an `endpoint` field showing which API path to use.
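As a rough illustration of using the `provider` and `endpoint` fields, the sketch below filters an already-parsed /api/tags response. The helper name `endpoints_for_provider` is hypothetical, and the sample data only mirrors the response shape shown above.

```python
def endpoints_for_provider(tags_response: dict, provider: str) -> dict:
    """Map model name -> endpoint for every model from the given provider."""
    return {
        model["name"]: model["endpoint"]
        for model in tags_response.get("models", [])
        if model.get("provider") == provider
    }


# Sample shaped like the documented /api/tags response.
sample = {"models": [
    {"name": "gemma4:26b", "provider": "local", "endpoint": "/api/chat"},
    {"name": "gemma4:31b-cloud", "provider": "cloud", "endpoint": "/api/ollama/api/chat"},
]}
local_models = endpoints_for_provider(sample, "local")
```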
### POST ${LLM_END_POINT}/api/chat — Chat completion (with message history)
```
POST ${LLM_END_POINT}/api/chat
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}
{
"model": "qwen3:8b",
"messages": [
{ "role": "user", "content": "Hello!" }
]
}
```
Response: `{ "model": "qwen3:8b", "message": { "role": "assistant", "content": "..." }, "done": true }`
### POST ${LLM_END_POINT}/api/generate — Text completion (single prompt)
```
POST ${LLM_END_POINT}/api/generate
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}
{
"model": "qwen3:8b",
"prompt": "Explain AI in one sentence."
}
```
Response: `{ "model": "qwen3:8b", "response": "...", "done": true }`
### Notes
- LLM_END_POINT should be set to `https://llm.zithera.com.au` (no trailing slash, no /api).
- The app must append the path (e.g. `/api/chat`, `/api/generate`) to LLM_END_POINT.
- Responses stream by default (newline-delimited JSON). To disable streaming, add `"stream": false` to your request body.
- Rate limited per API key.
- Do NOT call upstream providers directly — always go through the gateway.
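Handling the default newline-delimited JSON stream might look like the sketch below. It assumes each streamed line is a partial version of the response objects shown above (text in `message.content` for /api/chat, in `response` for /api/generate, with `done: true` on the last line); that chunk shape follows Ollama's streaming format and is an assumption about the gateway. The name `collect_stream_text` is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the text carried by a newline-delimited JSON response stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between chunks
        chunk = json.loads(line)
        if "message" in chunk:
            # /api/chat style chunk
            parts.append(chunk["message"].get("content", ""))
        else:
            # /api/generate style chunk
            parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # final chunk reached
    return "".join(parts)
```

The function takes any iterable of text lines, so it can be fed decoded lines from an HTTP response or a list of strings in a test.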
Ollama Cloud via Zithera Gateway
## LLM API Integration — Zithera Gateway (Ollama Cloud Models)
### Environment Variables
```
LLM_END_POINT=https://llm.zithera.com.au
LLM_API_KEY=YOUR_API_KEY
```
Authentication: Include the API key in every request (except endpoints marked "no auth required") via one of:
- Header: `Authorization: Bearer ${LLM_API_KEY}`
- Header: `X-API-Key: ${LLM_API_KEY}`
### GET ${LLM_END_POINT}/api/tags — Discover available models (no auth required)
```
GET ${LLM_END_POINT}/api/tags
```
Response: `{ "models": [{ "name": "gemma4:31b-cloud", "provider": "cloud", "endpoint": "/api/ollama/api/chat", ... }] }`
IMPORTANT: Always call /api/tags first to get the exact model names before making chat/generate requests.
Filter by `provider: "cloud"` for cloud models. The `endpoint` field tells you which API path to use.
### POST ${LLM_END_POINT}/api/ollama/api/chat — Chat completion
```
POST ${LLM_END_POINT}/api/ollama/api/chat
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}
{
"model": "gemma4:31b-cloud",
"messages": [
{ "role": "user", "content": "Hello!" }
]
}
```
Response: `{ "model": "gemma4:31b-cloud", "message": { "role": "assistant", "content": "..." }, "done": true }`
### POST ${LLM_END_POINT}/api/ollama/api/generate — Text completion
```
POST ${LLM_END_POINT}/api/ollama/api/generate
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}
{
"model": "gpt-oss:120b-cloud",
"prompt": "Explain AI in one sentence."
}
```
Response: `{ "model": "gpt-oss:120b-cloud", "response": "...", "done": true }`
### Notes
- LLM_END_POINT should be set to `https://llm.zithera.com.au` (no trailing slash, no /api).
- The app must append the path (e.g. `/api/ollama/api/chat`, `/api/ollama/api/generate`) to LLM_END_POINT.
- These endpoints proxy to Ollama Cloud (ollama.com). Do NOT call ollama.com directly.
- Responses are non-streaming JSON.
- Rate limited per API key.
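The notes on appending the path to `LLM_END_POINT` can be sketched with Python's standard library. The helper name `build_post` is hypothetical; the sketch only constructs the request object and does not send it.

```python
import json
import urllib.request


def build_post(base: str, path: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request for a gateway path such as /api/ollama/api/chat."""
    # Tolerate an accidental trailing slash on the base URL.
    url = base.rstrip("/") + path
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Since these endpoints are described as returning non-streaming JSON, the result of `urllib.request.urlopen(req)` could then be read in one go and parsed with `json.load`.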
Anthropic (Claude) via Zithera Gateway
## LLM API Integration — Zithera Gateway (Anthropic Claude)
### Environment Variables
```
LLM_END_POINT=https://llm.zithera.com.au
LLM_API_KEY=YOUR_API_KEY
```
Authentication: Include the API key in every request (except endpoints marked "no auth required") via one of:
- Header: `Authorization: Bearer ${LLM_API_KEY}`
- Header: `X-API-Key: ${LLM_API_KEY}`
### GET ${LLM_END_POINT}/api/tags — Discover available models (no auth required)
```
GET ${LLM_END_POINT}/api/tags
```
Response: `{ "models": [{ "name": "claude-opus-4-7", "display_name": "Claude Opus 4.7", "provider": "anthropic", "endpoint": "/anthropicChat" }] }`
IMPORTANT: Always call /api/tags first to get the exact model names before making chat/generate requests.
Filter by `provider: "anthropic"` for Claude models. The `endpoint` field tells you which API path to use.
### POST ${LLM_END_POINT}/api/anthropicChat — Anthropic Messages API
```
POST ${LLM_END_POINT}/api/anthropicChat
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}
{
"model": "claude-opus-4-7",
"messages": [
{ "role": "user", "content": "Hello, Claude" }
],
"max_tokens": 1024
}
```
Response: Anthropic-native JSON, e.g. `{ "id": "msg_...", "content": [{ "type": "text", "text": "..." }], "role": "assistant", "model": "claude-opus-4-7", "stop_reason": "end_turn", ... }`
### POST ${LLM_END_POINT}/api/anthropicGenerate — Single-prompt convenience wrapper
```
POST ${LLM_END_POINT}/api/anthropicGenerate
Content-Type: application/json
Authorization: Bearer ${LLM_API_KEY}
{
"model": "claude-sonnet-4-6",
"prompt": "Explain AI in one sentence.",
"max_tokens": 256
}
```
The gateway wraps `prompt` as `messages: [{ role: "user", content: prompt }]`, then calls Anthropic's /v1/messages endpoint. The response shape is identical to that of /api/anthropicChat.
### Notes
- LLM_END_POINT should be set to `https://llm.zithera.com.au` (no trailing slash, no /api).
- `max_tokens` is optional at the gateway, which fills in a default of 4096 when it is omitted because Anthropic requires the field upstream.
- `stream` is optional and defaults to true. Streaming responses are Anthropic-native SSE (`text/event-stream` with `content_block_delta` events). Set `"stream": false` for a single JSON response.
- All Anthropic body fields pass through unchanged: `system`, `tools`, `tool_choice`, `temperature`, `top_p`, `metadata`, `stop_sequences`, vision content blocks, prompt-caching markers, etc. Use them exactly as documented at docs.anthropic.com.
- Rate limited per API key.
- Do NOT call api.anthropic.com directly — always go through the gateway.
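Reading the default SSE stream might look like the sketch below. It assumes the gateway forwards Anthropic's event format unchanged, where text arrives in `content_block_delta` events whose `delta` has type `text_delta`. The name `collect_sse_text` is hypothetical, and other delta types (for example tool input) are simply ignored here.

```python
import json
from typing import Iterable


def collect_sse_text(lines: Iterable[str]) -> str:
    """Join the text deltas from an Anthropic-style SSE stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)
```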
All Endpoints — Full Reference
## LLM API Integration — Zithera Gateway (Full Reference)
### Environment Variables
```
LLM_END_POINT=https://llm.zithera.com.au
LLM_API_KEY=YOUR_API_KEY
```
Authentication: Include the API key in every request (except endpoints marked "no auth required") via one of:
- Header: `Authorization: Bearer ${LLM_API_KEY}`
- Header: `X-API-Key: ${LLM_API_KEY}`
---
### 0. Model Discovery (IMPORTANT — call this first)
**GET ${LLM_END_POINT}/api/tags** — List all available models (no auth required)
```
GET ${LLM_END_POINT}/api/tags
```
Response:
```
{
"models": [
{ "name": "gemma4:26b", "provider": "local", "endpoint": "/api/chat", "details": { ... } },
{ "name": "gemma4:31b-cloud", "provider": "cloud", "endpoint": "/api/ollama/api/chat", "details": { ... } },
{ "name": "claude-opus-4-7", "display_name": "Claude Opus 4.7", "provider": "anthropic", "endpoint": "/anthropicChat" }
]
}
```
Each model includes:
- `name` — exact model name to use in requests
- `provider` — "local" (runs on local hardware), "cloud" (Ollama Cloud), or "anthropic" (Anthropic Claude)
- `endpoint` — which API path to use for this model (e.g. /api/chat, /api/ollama/api/chat, /api/anthropicChat)
IMPORTANT: Always call /api/tags first to get the exact model names. Using an incorrect model name will return a "model not found" error.
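One way to act on the discovery step is to resolve a model name to a full request URL before calling it. The sketch below is hypothetical (`resolve_endpoint` is not part of the gateway). It also makes one labeled assumption: the sample response lists `/anthropicChat` without the `/api` prefix, so the helper adds `/api` when an `endpoint` value lacks it.

```python
def resolve_endpoint(tags_response: dict, model_name: str,
                     base: str = "https://llm.zithera.com.au") -> str:
    """Return the full gateway URL for a model listed by /api/tags."""
    for model in tags_response.get("models", []):
        if model.get("name") == model_name:
            path = model["endpoint"]
            # Assumption: every gateway path lives under /api.
            if not path.startswith("/api/"):
                path = "/api" + path
            return base.rstrip("/") + path
    # Fail locally instead of sending a request with an unknown model name.
    raise LookupError(f"model not listed by /api/tags: {model_name}")
```

Failing early on an unknown name avoids a round trip that would end in the "model not found" error.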
---
### 1. Local LLM Endpoints (proxied to Heedable)
**POST ${LLM_END_POINT}/api/chat** — Chat with message history
```
POST ${LLM_END_POINT}/api/chat
{ "model": "qwen3:8b", "messages": [{ "role": "user", "content": "Hello!" }] }
```
**POST ${LLM_END_POINT}/api/generate** — Single prompt completion
```
POST ${LLM_END_POINT}/api/generate
{ "model": "qwen3:8b", "prompt": "Explain AI in one sentence." }
```
---
### 2. Ollama Cloud Endpoints (proxied to ollama.com)
**POST ${LLM_END_POINT}/api/ollama/api/chat** — Ollama Cloud chat
```
POST ${LLM_END_POINT}/api/ollama/api/chat
{ "model": "gemma4:31b-cloud", "messages": [{ "role": "user", "content": "Hello!" }] }
```
**POST ${LLM_END_POINT}/api/ollama/api/generate** — Ollama Cloud text completion
```
POST ${LLM_END_POINT}/api/ollama/api/generate
{ "model": "gpt-oss:120b-cloud", "prompt": "Explain AI in one sentence." }
```
---
### 3. Anthropic Endpoints (proxied to api.anthropic.com)
**POST ${LLM_END_POINT}/api/anthropicChat** — Anthropic Messages API
```
POST ${LLM_END_POINT}/api/anthropicChat
{
"model": "claude-opus-4-7",
"messages": [{ "role": "user", "content": "Hello, Claude" }],
"max_tokens": 1024
}
```
**POST ${LLM_END_POINT}/api/anthropicGenerate** — Single-prompt wrapper (gateway converts to messages)
```
POST ${LLM_END_POINT}/api/anthropicGenerate
{ "model": "claude-sonnet-4-6", "prompt": "Explain AI in one sentence.", "max_tokens": 256 }
```
---
### Notes
- LLM_END_POINT should be set to `https://llm.zithera.com.au` (no trailing slash, no /api).
- The app must append the path to LLM_END_POINT when making requests.
- Local model responses (`/api/chat`, `/api/generate`) stream by default (newline-delimited JSON). To disable streaming, add `"stream": false` to your request body.
- Cloud model responses (`/api/ollama/api/chat`, `/api/ollama/api/generate`) are non-streaming JSON.
- Anthropic responses (`/api/anthropicChat`, `/api/anthropicGenerate`) stream by default via SSE (`text/event-stream`). `max_tokens` defaults to 4096 if omitted. All Anthropic body fields (system, tools, vision, prompt caching) pass through unchanged. Set `"stream": false` for a single JSON response.
- Rate limited per API key.
- Always use the gateway — never call upstream providers (heedable.com, ollama.com, api.anthropic.com) directly.
- Health check: GET ${LLM_END_POINT}/api/health (no auth needed)
- Auth health check: GET ${LLM_END_POINT}/api/v1/health (requires API key — use to verify key validity)
Authenticated Access
SHA-256 hashed API keys with multiple auth methods.
Rate Protected
Per-key throttling with configurable request limits.
Full Audit Trail
Every request logged with latency, status, and source.