
Overview

The Junis External API implements rate limiting to ensure fair usage and maintain service quality for all users. Rate limits are applied per API key and are configurable when creating or updating an API key.

Base URL: https://api.junis.ai
Rate limits are specific to each API key. If your organization has multiple API keys, each key has independent rate limits.

Rate Limiting Architecture

The External API uses a Redis-based sliding window counter with dual time windows (per-minute and per-hour) for precise, distributed rate limiting across multiple server instances.
Storage: Redis with distributed support across all API servers
Windows: Two independent sliding windows
  • Per-Minute: Rolling 60-second window, resets every minute
  • Per-Hour: Rolling 3600-second window, resets every hour
Algorithm: Sliding window counter with atomic increments
Check Before Request: Rate limits are checked before processing your request
Headers Included: Every response includes current rate limit status
429 on Exceed: Returns 429 Too Many Requests if limit exceeded
Automatic Reset: Counters automatically reset at the end of each window
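
For intuition, the sketch below shows how a sliding window counter over two one-minute Redis buckets could estimate the rolling request count. This is a conceptual illustration only; the API's actual implementation is internal, and the key naming and the 100 requests/minute limit are assumptions for the example.

# Conceptual sketch of a sliding window counter (illustration only).
# Two per-minute buckets are kept; the previous bucket is weighted by how
# much of it still overlaps the rolling 60-second window.
NOW=$(date +%s)
BUCKET=$((NOW / 60))
ELAPSED=$((NOW % 60))
PREFIX="ratelimit:jns_live_YOUR_API_KEY:minute"   # hypothetical key naming

CURRENT=$(redis-cli INCR "${PREFIX}:${BUCKET}")           # atomic increment
redis-cli EXPIRE "${PREFIX}:${BUCKET}" 120 > /dev/null    # keep two windows around
PREVIOUS=$(redis-cli GET "${PREFIX}:$((BUCKET - 1))")
PREVIOUS=${PREVIOUS:-0}

# Weighted estimate of requests seen in the last 60 seconds.
ESTIMATE=$(( CURRENT + PREVIOUS * (60 - ELAPSED) / 60 ))
if [ "$ESTIMATE" -gt 100 ]; then
  echo "would return 429 Too Many Requests"
fi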

Default Rate Limits

When you create a new API key, the following default limits apply:
| Limit Type          | Default Value | Customizable |
|---------------------|---------------|--------------|
| Requests per minute | 100           | Yes ✓        |
| Requests per hour   | 1,000         | Yes ✓        |
You can customize rate limits when creating an API key in Admin → API Keys. Contact your organization admin to adjust limits for existing keys.

How to Set Custom Limits

When creating an API key via the Admin panel:
  1. Navigate to Admin → API Keys
  2. Click “+ Create API Key”
  3. Configure Rate Limits section:
    • Requests per Minute: 1 to 1,000 (default: 100)
    • Requests per Hour: 1 to 100,000 (default: 1,000)
  4. Click “Create”

Rate Limit Headers

Every API response includes headers that show your current rate limit status.

Response Headers

X-RateLimit-Limit-Minute: 60
X-RateLimit-Remaining-Minute: 55
X-RateLimit-Reset-Minute: 1699564800
X-RateLimit-Limit-Hour: 1000
X-RateLimit-Remaining-Hour: 950
X-RateLimit-Reset-Hour: 1699564800
| Header                       | Description                                 |
|------------------------------|---------------------------------------------|
| X-RateLimit-Limit-Minute     | Maximum requests allowed per minute         |
| X-RateLimit-Remaining-Minute | Requests remaining in current minute window |
| X-RateLimit-Reset-Minute     | Unix timestamp when minute window resets    |
| X-RateLimit-Limit-Hour       | Maximum requests allowed per hour           |
| X-RateLimit-Remaining-Hour   | Requests remaining in current hour window   |
| X-RateLimit-Reset-Hour       | Unix timestamp when hour window resets      |
Always monitor X-RateLimit-Remaining-* headers to avoid hitting rate limits. Consider pausing requests when remaining count is low.
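
For example, you can inspect these headers from the command line. The sketch below reuses the endpoint and placeholder API key from the other examples on this page; the grep filtering is just one way to read the values.

# Send one request, discard the body, and print the rate limit headers.
curl -s -D - -o /dev/null \
  https://api.junis.ai/api/external/v1/chat/completions \
  -H "X-API-Key: jns_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"ping"}]}' \
  | grep -i '^x-ratelimit'

If X-RateLimit-Remaining-Minute is close to zero, pause until the timestamp in X-RateLimit-Reset-Minute before sending more requests.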

429 Too Many Requests

When you exceed your rate limit, the API returns a 429 status code.

Response Format

HTTP Status:
HTTP/1.1 429 Too Many Requests
Headers:
X-RateLimit-Limit-Minute: 60
X-RateLimit-Remaining-Minute: 0
X-RateLimit-Reset-Minute: 1699564800
X-RateLimit-Limit-Hour: 1000
X-RateLimit-Remaining-Hour: 950
X-RateLimit-Reset-Hour: 1699564800
Retry-After: 45
Body:
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded: 60 requests per minute",
  "limit": 60,
  "window": "minute",
  "reset_at": 1699564800,
  "retry_after": 45
}
| Field       | Description                               |
|-------------|-------------------------------------------|
| error       | Error type: rate_limit_exceeded           |
| message     | Human-readable error message              |
| limit       | The rate limit that was exceeded          |
| window      | Which window was exceeded: minute or hour |
| reset_at    | Unix timestamp when the window resets     |
| retry_after | Seconds to wait before retrying           |
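
As a rough sketch of using this body, the example below reads retry_after with jq and waits before retrying once; the jq dependency and placeholder key are assumptions, not requirements of the API.

# Make a request; if it is rate limited, wait retry_after seconds before retrying.
RESPONSE=$(curl -s -w '\n%{http_code}' \
  https://api.junis.ai/api/external/v1/chat/completions \
  -H "X-API-Key: jns_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"test"}]}')

STATUS=$(echo "$RESPONSE" | tail -n1)   # last line is the HTTP status code
BODY=$(echo "$RESPONSE" | sed '$d')     # everything else is the response body

if [ "$STATUS" = "429" ]; then
  WAIT=$(echo "$BODY" | jq -r '.retry_after // 60')   # fall back to 60s if missing
  echo "Rate limited; sleeping ${WAIT}s before retrying"
  sleep "$WAIT"
  # retry the request here
fi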

Handling Rate Limits

Best Practices

  • Monitor Headers: Always check X-RateLimit-Remaining-Minute and X-RateLimit-Remaining-Hour headers to track your usage
  • Implement Exponential Backoff: When you receive a 429 response, wait at least the Retry-After header value before retrying, and increase the delay if 429s persist (see the sketch after this list)
  • Cache Responses: Implement caching (TTL: 5 minutes recommended) to reduce API calls for frequently accessed data
  • Batch Requests: Use paginated endpoints instead of individual requests (e.g., GET /api/external/sessions?limit=100)
  • Use Streaming: For real-time chat, use stream: true parameter instead of polling to avoid repeated API calls
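
A minimal sketch of that retry loop in bash, assuming the same endpoint and placeholder key used in the testing example later on this page; in practice this logic usually lives in your application code.

# Retry with exponential backoff, honoring the Retry-After header when present.
URL="https://api.junis.ai/api/external/v1/chat/completions"
DATA='{"messages":[{"role":"user","content":"test"}]}'
DELAY=1

for ATTEMPT in 1 2 3 4 5; do
  HEADERS=$(mktemp)
  STATUS=$(curl -s -o /dev/null -D "$HEADERS" -w '%{http_code}' \
    "$URL" \
    -H "X-API-Key: jns_live_YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$DATA")

  if [ "$STATUS" != "429" ]; then
    echo "Finished with status $STATUS"
    rm -f "$HEADERS"
    break
  fi

  RETRY_AFTER=$(grep -i '^retry-after:' "$HEADERS" | tr -dc '0-9')
  WAIT=${RETRY_AFTER:-$DELAY}          # prefer the server-provided value
  echo "429 received; waiting ${WAIT}s (attempt $ATTEMPT)"
  sleep "$WAIT"
  DELAY=$((DELAY * 2))                 # exponential backoff fallback
  rm -f "$HEADERS"
done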

Rate Limit Optimization

Identify Bottlenecks

  1. Log Rate Limit Headers: Track X-RateLimit-Remaining-* headers in your application logs
  2. Analyze Request Patterns: Identify high-frequency endpoints that may benefit from caching or batching
    • Repeated session fetches → Add caching
    • Individual message fetches → Use paginated list endpoints
    • Polling for updates → Switch to streaming
  3. Implement Caching: Use Redis or in-memory cache with 5-minute TTL for frequently accessed data
  4. Monitor & Alert: Set up alerts when rate limit usage exceeds 90% (a simple header-based check is sketched below)
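
A rough sketch of that check, driven by the response headers; the endpoint, placeholder key, and the echo standing in for your alerting hook are assumptions.

# Compute per-minute usage from the rate limit headers and warn above 90%.
HEADERS=$(curl -s -D - -o /dev/null \
  https://api.junis.ai/api/external/v1/chat/completions \
  -H "X-API-Key: jns_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"ping"}]}')

LIMIT=$(echo "$HEADERS" | grep -i '^x-ratelimit-limit-minute:' | tr -dc '0-9')
REMAINING=$(echo "$HEADERS" | grep -i '^x-ratelimit-remaining-minute:' | tr -dc '0-9')
USED_PCT=$(( (LIMIT - REMAINING) * 100 / LIMIT ))

if [ "$USED_PCT" -ge 90 ]; then
  echo "WARNING: minute rate limit usage at ${USED_PCT}%"   # hook up alerting here
fi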

Testing Rate Limits

Local Testing

Test your rate limit handling before deploying to production:
# Test minute rate limit (send 65 requests in 1 minute)
# With a 60 requests/minute key (as in the header examples above), the last few responses should return 429.
for i in {1..65}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    https://api.junis.ai/api/external/v1/chat/completions \
    -H "X-API-Key: jns_live_YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"test"}]}'
  sleep 0.5
done

Increasing Rate Limits

If you need higher rate limits for your use case, you have two options:

Option 1: Update Existing API Key (Admin Only)

Organization admins can update rate limits for existing API keys:
  1. Navigate to Admin → API Keys
  2. Click “Edit” on the API key
  3. Update Rate Limits section
  4. Click “Save”
Only organization admins can modify API key rate limits. Contact your admin if you need higher limits.

Option 2: Create New API Key with Higher Limits

If you’re an admin, create a new API key with custom limits:
  1. Navigate to Admin → API Keys
  2. Click “+ Create API Key”
  3. Set higher limits:
    • Requests per Minute: Up to 1,000
    • Requests per Hour: Up to 100,000
  4. Click “Create”
  5. Copy the API key (shown only once)
  6. Update your application to use the new API key (see the example below)
Set rate limits based on your actual usage patterns. Start conservative and increase as needed.
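
For example, if your application reads the key from an environment variable (the name JUNIS_API_KEY below is just an illustration), switching to the new key is a configuration change rather than a code change:

# JUNIS_API_KEY is a hypothetical variable name; use whatever your app expects.
export JUNIS_API_KEY="jns_live_NEW_API_KEY"

curl -s https://api.junis.ai/api/external/v1/chat/completions \
  -H "X-API-Key: $JUNIS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"test"}]}'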

FAQs

Are rate limits per API key or per organization?

Answer: Rate limits are per API key.
Example: If your organization has 2 API keys:
  • API Key 1: 60 requests/minute
  • API Key 2: 300 requests/minute
  • Total capacity: 360 requests/minute (independent limits)

What happens when I exceed my rate limit?

Response: 429 Too Many Requests with a Retry-After header
Action: Wait for the number of seconds specified in Retry-After or until the next window reset
No penalties: No account suspension or additional charges

Can I increase my rate limits?

Short-term: Yes, organization admins can update rate limits in Admin → API Keys
Long-term: Create new API keys with higher limits as needed

Do streaming responses count as multiple requests?

Answer: Streaming responses count as 1 request, regardless of how long the stream lasts.
Example:
  • Non-streaming: 1 request = 1 complete response
  • Streaming: 1 request = entire conversation stream
Streaming is more efficient for real-time chat.
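
As an illustration that a stream is a single request, the curl sketch below sends one streaming chat request; the stream parameter is mentioned in Best Practices above, and the rest of the request body shape is assumed from the other examples on this page.

# One streaming request counts once against the rate limit.
# -N disables curl's output buffering so chunks print as they arrive.
curl -N -s https://api.junis.ai/api/external/v1/chat/completions \
  -H "X-API-Key: jns_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"stream":true}'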

How can I monitor my rate limit usage?

Method 1: Check response headers (X-RateLimit-Remaining-*)
Method 2: Set up logging in your application (see Best Practices)
Method 3: Monitor 429 error rates in your application metrics

Next Steps