Overview
The Chat Completions API is fully compatible with OpenAI's API format, allowing you to use the OpenAI Python SDK, JavaScript SDK, or any OpenAI-compatible client.

Model Parameter (Optional): The `model` parameter is optional and exists primarily for SDK compatibility; your organization's orchestrator agent automatically uses the configured LLM model. If you're using an OpenAI SDK that requires the `model` parameter, you can pass any value (e.g., `"junis-orchestrator"`) for compatibility.

Endpoint
Authentication
Include your API key in the `X-API-Key` header:
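A minimal Python sketch of building the required headers (the environment-variable name `JUNIS_API_KEY` is illustrative, not mandated by the API):

```python
import os

def auth_headers(api_key: str) -> dict:
    """Build request headers: Junis expects the key in X-API-Key."""
    return {
        "X-API-Key": api_key,
        "Content-Type": "application/json",
    }

# Read the key from an environment variable rather than hardcoding it.
headers = auth_headers(os.environ.get("JUNIS_API_KEY", "sk-example"))
```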
Request Format
Basic Request
cURL
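As a stand-in sketch in Python: the payload below assumes the OpenAI-compatible endpoint path `/v1/chat/completions` and a placeholder base URL; only `messages` is required.

```python
import json

# Placeholder base URL; substitute your deployment's actual host.
BASE_URL = "https://your-junis-host"

def build_request(messages):
    """Build a minimal Chat Completions payload. Only `messages` is required."""
    return {"messages": messages}

payload = build_request([{"role": "user", "content": "Hello!"}])
body = json.dumps(payload)
# POST `body` to f"{BASE_URL}/v1/chat/completions" with the X-API-Key header set.
```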
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `messages` | array | Yes | Array of message objects (OpenAI format) |
| `stream` | boolean | No | Enable streaming mode (default: `false`) |
| `temperature` | number | No | Sampling temperature (0.0 - 2.0) |
| `max_tokens` | integer | No | Maximum tokens in the response |
| `top_p` | number | No | Nucleus sampling parameter |
| `frequency_penalty` | number | No | Frequency penalty (-2.0 to 2.0) |
| `presence_penalty` | number | No | Presence penalty (-2.0 to 2.0) |
| `stop` | string or array | No | Stop sequences |
| `n` | integer | No | Number of completions to generate (default: 1) |
| `tools` | array | No | Available tools for function calling (keyword-based detection) |
| `elevenlabs_extra_body` | object | No | ElevenLabs metadata (e.g., `conversation_id` for session continuity) |
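The parameters above can be combined into a single payload. A hedged sketch (the validation helper is illustrative, not part of any SDK):

```python
def build_payload(messages, **options):
    """Combine the required `messages` with any optional parameters from the table."""
    allowed = {"stream", "temperature", "max_tokens", "top_p", "frequency_penalty",
               "presence_penalty", "stop", "n", "tools", "elevenlabs_extra_body"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {unknown}")
    return {"messages": messages, **options}

payload = build_payload(
    [{"role": "user", "content": "Summarize this call."}],
    temperature=0.3,
    max_tokens=512,
    elevenlabs_extra_body={"conversation_id": "conv_123"},  # session continuity
)
```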
Message Format
Messages follow the OpenAI format:
- `system`: System instructions (optional, rarely needed with Junis)
- `user`: User messages
- `assistant`: Assistant responses (for conversation context)
- `tool`: Tool call results (for function calling)
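For example, a typical multi-turn history in this format (content is illustrative):

```python
# Each message is a dict with a `role` and `content`, in chronological order.
messages = [
    {"role": "user", "content": "What's the weather in Berlin?"},
    {"role": "assistant", "content": "Let me check that for you."},
    {"role": "user", "content": "Thanks, and tomorrow?"},
]

# Every role must be one of the four supported values.
assert all(m["role"] in {"system", "user", "assistant", "tool"} for m in messages)
```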
Response Format
Non-Streaming Response
Status Code: `200 OK`
Response Body:
Response Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion ID |
| `object` | string | Object type (always `"chat.completion"`) |
| `created` | integer | Unix timestamp |
| `model` | string | Model used |
| `choices` | array | Array of completion choices |
| `choices[].message` | object | Generated message |
| `choices[].message.role` | string | Always `"assistant"` |
| `choices[].message.content` | string | Response text |
| `choices[].finish_reason` | string | Reason for completion (`stop`, `length`, `tool_calls`) |
| `usage` | object | Token usage statistics |
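Extracting the reply from a response is a matter of walking the fields above. A sketch with an illustrative sample response (all values made up):

```python
# Hedged sample matching the field table above; values are illustrative only.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "junis-orchestrator",
    "choices": [{
        "message": {"role": "assistant", "content": "Hello! How can I help?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16},
}

def extract_text(resp):
    """Pull the assistant text out of the first choice."""
    return resp["choices"][0]["message"]["content"]
```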
Streaming Mode
Enable real-time token-by-token streaming with `stream: true`.
Streaming Request
cURL
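As a Python sketch of the streaming request body (endpoint and transport omitted; the `requests` usage in the comment is one common approach, not a requirement):

```python
import json

payload = {
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "stream": True,  # switches the response to SSE chunks
}

# With the `requests` library, you would stream the body line by line, e.g.:
#   resp = requests.post(url, headers=headers, json=payload, stream=True)
#   for line in resp.iter_lines(): ...
body = json.dumps(payload)
```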
Streaming Response Format
Server-Sent Events (SSE) format:
- Each line starts with `data: `
- First chunk contains `role`
- Subsequent chunks contain `content` deltas
- Final chunk has `finish_reason`
- Stream ends with `data: [DONE]`
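The steps above can be sketched as a small parser (chunk shapes assumed from the OpenAI streaming format; sample lines are illustrative):

```python
import json

def accumulate_sse(lines):
    """Reassemble a streamed completion from SSE `data:` lines."""
    text = []
    finish_reason = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # SSE comments / blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":  # end of stream
            break
        chunk = json.loads(data)
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if "content" in delta:
            text.append(delta["content"])
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(text), finish_reason

# Illustrative chunk sequence: role first, then content deltas, then finish.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
```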
Session Management
Junis automatically manages conversation sessions for you.

How Sessions Work
Session Detection
Junis uses smart session detection with the following priority:
1. ElevenLabs `conversation_id` (in `elevenlabs_extra_body`) → Uses or creates a session with this ID
2. Single message → Creates a new session
3. Multiple messages → Uses a message hash to match existing sessions (allows conversation continuation)
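The priority order can be sketched as follows; note the hashing scheme here is purely illustrative, not Junis's actual implementation:

```python
import hashlib
import json

def detect_session(messages, elevenlabs_extra_body=None):
    """Sketch of the documented priority: conversation_id first, then a new
    session for a single message, then a hash over the message history."""
    if elevenlabs_extra_body and elevenlabs_extra_body.get("conversation_id"):
        return elevenlabs_extra_body["conversation_id"]
    if len(messages) <= 1:
        return None  # signal: create a new session
    # Illustrative hash; the real matching scheme is internal to Junis.
    digest = hashlib.sha256(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    return f"hash:{digest[:16]}"
```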
For ElevenLabs integration: Pass `conversation_id` in `elevenlabs_extra_body` to maintain session continuity across calls.

Viewing Sessions
Retrieve your sessions via the Sessions API:

Function Calling (Tool Use)
When a supported keyword is detected in the assistant's response, the API returns a `tool_calls` object with `finish_reason: "tool_calls"`.
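A client can branch on `finish_reason` to tell a plain reply from a tool call. A sketch with an illustrative response (the tool-call structure is assumed from the OpenAI format):

```python
def handle_choice(choice):
    """Dispatch on finish_reason: return text, or the tool names to invoke."""
    if choice["finish_reason"] == "tool_calls":
        return [tc["function"]["name"] for tc in choice["message"]["tool_calls"]]
    return choice["message"]["content"]

# Illustrative tool-call choice; names and arguments are made up.
choice = {
    "finish_reason": "tool_calls",
    "message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": "{\"city\": \"Berlin\"}"}}
        ],
    },
}
```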
Error Handling
Common Errors
Error Types
| Error Type | Status Code | Description | Action |
|---|---|---|---|
| `invalid_request_error` | 400 | Malformed request | Check request format |
| `authentication_error` | 401 | Invalid API key | Verify the API key |
| `permission_error` | 403 | Insufficient permissions | Check API key scopes |
| `rate_limit_error` | 429 | Too many requests | Implement exponential backoff |
| `api_error` | 500 | Server error | Retry with exponential backoff |
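Per the table, only rate limits and server errors are worth retrying; 4xx client errors need a corrected request. A minimal sketch:

```python
# Retryable statuses, per the table above: rate limits and server errors.
RETRYABLE = {429, 500}

def should_retry(status_code: int) -> bool:
    """True for transient errors; client errors (400/401/403) need a fixed request."""
    return status_code in RETRYABLE
```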
Best Practices
- Use Streaming: Enable `stream: true` for long responses to provide real-time feedback
- Handle Rate Limits: Implement exponential backoff when hitting rate limits (see error codes above)
- Keep Message History Concise: Trim old messages to avoid token limits (keep the last 10-20 messages)
- Secure API Keys: Never hardcode API keys; use environment variables
- Log Requests: Log requests and responses for debugging and monitoring
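Two of these practices, backoff and history trimming, can be sketched as small helpers (the jitter factor and defaults are illustrative choices):

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry `call` with jittered exponential backoff (for 429/500-style errors)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))

def trim_history(messages, keep_last=20):
    """Keep any leading system message plus the most recent turns."""
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```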
Response Parameters
Temperature Guide
| Use Case | Recommended Temperature |
|---|---|
| Factual Q&A | 0.0 - 0.3 |
| General chat | 0.5 - 0.7 |
| Creative writing | 0.8 - 1.0 |
| Code generation | 0.0 - 0.2 |
