API Reference
OpenAI-compatible chat completions, embeddings, and models endpoints. SSE streaming.
Mikan Cloud speaks the OpenAI wire format. For the endpoints documented here, anything you'd send to https://api.openai.com/v1 works against https://api.mikancloud.com/v1: same request bodies, same response shapes, same SSE event format.
POST /v1/chat/completions
Mirrors the OpenAI Chat Completions spec.
Models
Send GET /v1/models for the live list. Current chat models include deepseek-v3.2, gpt-4o, gpt-4o-mini, claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5, gemini-3-pro, gemini-3-flash. Unknown ids return 400 model_not_found.
Request
{
  "model": "deepseek-v3.2",
  "stream": true,
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "ping"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024
}
Response
200 OK returns the standard OpenAI shape: id, object: "chat.completion", choices[], usage{}. With stream: true, you receive text/event-stream chunks (data: {…}) terminated by data: [DONE].
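To illustrate how the stream could be consumed, here is a minimal sketch that pulls text out of `data:` lines and stops at `data: [DONE]`. The helper name `iter_sse_content` is hypothetical, and the sketch assumes each chunk carries its text in `choices[].delta.content`, as OpenAI's streaming chunks do. The sample lines are hand-written, not captured from the live API.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from already-decoded SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Assumption: text lives in choices[].delta.content
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Offline demo with hand-written chunks.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "po"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "ng"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

In a real client the `lines` argument would come from whatever HTTP library is in use, decoded to text line by line.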
POST /v1/embeddings
Mirrors the OpenAI Embeddings spec. Same request shape, same { object: "list", data: [{ embedding: [...] }], usage: {...} } response.
Models
| Model | Provider | Dim | Multimodal | $/1M tok |
|---|---|---|---|---|
| qwen3-vl-embedding-8b | self-hosted (Mikan) | 768 | yes (text in P0; image/video next) | $0.05 |
| text-embedding-3-small | OpenAI | 1536 | no | $0.02 |
| text-embedding-3-large | OpenAI | 3072 | no | $0.13 |
qwen3-vl-embedding-8b runs on a Mikan-operated GPU behind Cloudflare Access. It is the only multimodal embedding model on the gateway and undercuts Voyage / Cohere on price.
Request
curl https://api.mikancloud.com/v1/embeddings \
  -H "Authorization: Bearer $MIKAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl-embedding-8b",
    "input": "the quick brown fox"
  }'
input accepts a single string or an array of strings (batch). Multimodal block-array input ({ type: "image_url", … }) is not yet supported; such calls return 400 bad_request. That extension is tracked for a near-term release.
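The batch form can be sketched in Python with only the standard library. The function name `build_embeddings_request` is hypothetical. It builds the request without sending it, wraps a single string into a one-element list, and rejects non-string items up front, since block-array input is documented as returning 400.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.mikancloud.com/v1"


def build_embeddings_request(texts, model="qwen3-vl-embedding-8b", api_key=None):
    """Build (but do not send) a POST /v1/embeddings request."""
    if isinstance(texts, str):
        texts = [texts]  # a single string is also accepted by the API
    texts = list(texts)
    if not all(isinstance(t, str) for t in texts):
        # Image/video blocks are not supported yet, so fail before the call.
        raise TypeError("input must be a string or a list of strings")
    key = api_key or os.environ.get("MIKAN_KEY", "")
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )


req = build_embeddings_request(
    ["the quick brown fox", "jumps over the lazy dog"],
    api_key="sk-mk-example",  # placeholder, not a real key
)
# To send it: urllib.request.urlopen(req)
```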
Response
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.012, -0.034] }
  ],
  "model": "qwen3-vl-embedding-8b",
  "usage": { "prompt_tokens": 12, "total_tokens": 12 }
}
The model field echoes the alias you sent, not the upstream id.
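When a batch is sent, each `data` entry carries an `index` that ties it back to its input position. A small sketch of reading the vectors back in input order follows. The helper name `vectors_by_index` is hypothetical, and the sample payload uses made-up two-element vectors rather than real 768-dimension output.

```python
import json


def vectors_by_index(response: dict) -> list:
    """Return embeddings ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hand-written sample in the documented response shape (values are made up).
sample_response = json.loads("""
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 1, "embedding": [0.5, 0.25]},
    {"object": "embedding", "index": 0, "embedding": [0.012, -0.034]}
  ],
  "model": "qwen3-vl-embedding-8b",
  "usage": {"prompt_tokens": 12, "total_tokens": 12}
}
""")

vectors = vectors_by_index(sample_response)
print(len(vectors), vectors[0])
```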
GET /v1/models
Returns every supported chat and embedding model, plus a mikan_capabilities hint:
{
  "object": "list",
  "data": [
    { "id": "deepseek-v3.2", "object": "model", "owned_by": "deepseek",
      "mikan_capabilities": { "kind": "chat" } },
    { "id": "qwen3-vl-embedding-8b", "object": "model", "owned_by": "self-hosted-qwen",
      "mikan_capabilities": { "kind": "embedding", "dimensions": 768, "multimodal": true } }
  ]
}
OpenAI clients ignore unknown fields, so mikan_capabilities is safe to leave on. atomos / orangebot read it to filter for embedding-capable or multimodal models.
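A client might filter on `mikan_capabilities` roughly as follows. This is only a sketch: the function name `embedding_model_ids` is hypothetical, and it does not show how atomos or orangebot actually do it. Entries that lack the hint are treated as non-embedding models.

```python
def embedding_model_ids(models_response: dict, multimodal_only: bool = False) -> list:
    """Pick embedding model ids out of a GET /v1/models response."""
    ids = []
    for model in models_response.get("data", []):
        caps = model.get("mikan_capabilities") or {}  # hint may be absent
        if caps.get("kind") != "embedding":
            continue
        if multimodal_only and not caps.get("multimodal", False):
            continue
        ids.append(model["id"])
    return ids


# The sample response from this section, as a Python dict.
models = {
    "object": "list",
    "data": [
        {"id": "deepseek-v3.2", "object": "model", "owned_by": "deepseek",
         "mikan_capabilities": {"kind": "chat"}},
        {"id": "qwen3-vl-embedding-8b", "object": "model",
         "owned_by": "self-hosted-qwen",
         "mikan_capabilities": {"kind": "embedding", "dimensions": 768,
                                "multimodal": True}},
    ],
}

print(embedding_model_ids(models))
```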
Rate limits
Default: 60 requests per minute per API key across all endpoints. Exceeding this returns 429 rate_limit_exceeded. Bump the cap by emailing support — verified accounts can be lifted to 600 rpm.
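One way to stay under the cap on the client side is to space calls evenly, one per `60 / rpm` seconds. The sketch below does that. The class name `MinIntervalThrottle` is hypothetical, and even spacing is an assumption made here for simplicity: how the gateway counts its window is not specified, so this is stricter than it may need to be. The clock and sleep functions are injectable so the logic can be checked without real waiting.

```python
import time


class MinIntervalThrottle:
    """Space calls at least 60/rpm seconds apart (client-side only)."""

    def __init__(self, rpm: int = 60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_ok = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return seconds slept."""
        now = self._clock()
        delay = self._next_ok - now
        if delay > 0:
            self._sleep(delay)
            now += delay  # assume the sleep lasted as long as requested
        else:
            delay = 0.0
        self._next_ok = now + self.interval
        return delay


# Usage: call throttle.wait() immediately before each API request.
throttle = MinIntervalThrottle(rpm=60)
```

A 429 can still occur when several processes share one key, so server-side `retry-after` handling remains necessary.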
Headers
| Header | Direction | Purpose |
|---|---|---|
| Authorization: Bearer sk-mk-… | request | required on every call |
| x-mikan-client: <label> | request, optional | tags the call in your usage log (e.g. atomos, cursor) |
| x-mikan-request-id: <uuid> | response | echoed on every response; quote it in support tickets |
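As a small sketch of how these headers might be handled in client code: the helper names `request_headers` and `request_id` are hypothetical. Header names are matched case-insensitively on the response side, since HTTP libraries differ in how they report them.

```python
from typing import Mapping, Optional


def request_headers(api_key: str, client_label: Optional[str] = None) -> dict:
    """Headers for an outgoing call; x-mikan-client is added only if given."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if client_label:
        headers["x-mikan-client"] = client_label
    return headers


def request_id(response_headers: Mapping[str, str]) -> Optional[str]:
    """Find x-mikan-request-id in response headers, ignoring case."""
    for name, value in response_headers.items():
        if name.lower() == "x-mikan-request-id":
            return value
    return None


# Placeholder key and a made-up request id, for illustration only.
outgoing = request_headers("sk-mk-example", client_label="atomos")
rid = request_id({"X-Mikan-Request-Id": "00000000-0000-0000-0000-000000000000"})
```

Logging the returned id alongside any failure makes it easy to quote in a support ticket.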
Errors
OpenAI-shape envelopes everywhere. Common codes:
- missing_authorization (401): no Bearer header
- invalid_api_key (401): key revoked or doesn't match
- model_not_found (400): id not in /v1/models
- insufficient_balance (402): top up via Stripe
- rate_limit_exceeded (429): retry-after header included
- upstream_error (502): upstream provider down or timed out
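A client could turn these codes into a retry decision along the following lines. This is a sketch under assumptions: it guesses that the code string sits at `error.code` inside an OpenAI-style `{"error": {...}}` envelope, treats only 429 and 502 as retryable, and falls back to a 1 second delay when `retry-after` is missing or not a plain number. The function name `classify_error` and the fallback value are hypothetical.

```python
import json

RETRYABLE_CODES = {"rate_limit_exceeded", "upstream_error"}
DEFAULT_DELAY = 1.0  # assumed fallback, not a documented value


def classify_error(status: int, body: str, headers: dict) -> dict:
    """Return the error code, whether to retry, and how long to wait."""
    try:
        err = json.loads(body).get("error")
    except (ValueError, AttributeError):
        err = None  # body was not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {}
    code = err.get("code")  # assumption: code is at error.code
    retry = code in RETRYABLE_CODES or status in (429, 502)
    delay = None
    if retry:
        lowered = {k.lower(): v for k, v in headers.items()}
        try:
            delay = float(lowered["retry-after"])
        except (KeyError, ValueError):
            delay = DEFAULT_DELAY
    return {"code": code, "retry": retry, "delay": delay}


# Hand-written example envelope, not captured from the live API.
result = classify_error(
    429,
    '{"error": {"code": "rate_limit_exceeded", "message": "slow down"}}',
    {"Retry-After": "7"},
)
print(result)
```

The 401, 400, and 402 cases are left non-retryable because repeating the same request would not change the outcome.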
What we don't expose yet
- Image generation, audio transcription, file uploads, threads / assistants — out of scope.
- Function calling / tool use — passes through where the upstream supports it; not actively tested.
- Multimodal embedding input shape (image / video blocks) — coming after the text-only soak.