Why this happens
Serverless Inference enforces concurrency limits to ensure fair usage and service stability. When the number of simultaneous requests from your account exceeds the allowed limit, additional requests are rejected with HTTP status 429 (Too Many Requests).
What you can do
- Reduce concurrent requests
  - Implement request queuing or throttling in your application
  - Use exponential backoff when retrying failed requests
- Increase your limits
  - Review your plan’s concurrency limits and upgrade if needed
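The two client-side remedies above can be combined roughly as in the sketch below. It is a minimal illustration under stated assumptions, not an official SDK pattern. `RateLimitError`, `call_with_backoff`, `run_all`, and the `send` callables are hypothetical names. The sketch assumes your HTTP client raises `RateLimitError` whenever the service answers with 429. An `asyncio.Semaphore` caps how many requests are in flight at once, and each rejected request is retried with capped exponential backoff plus random ("full") jitter. Set `max_concurrency` at or below your plan's concurrency limit. The delay values are illustrative.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical: raised by your own client code when the API returns HTTP 429."""


async def call_with_backoff(send, max_retries=5, base_delay=0.5, max_delay=30.0):
    """Await send(); on RateLimitError, wait and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return await send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            # The delay ceiling doubles on each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay. A random wait below the ceiling ("full jitter")
            # keeps many clients from retrying in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            await asyncio.sleep(random.uniform(0, ceiling))


async def run_all(requests, max_concurrency=4, **backoff):
    """Run async request callables with at most max_concurrency in flight."""
    # Create the semaphore inside the running event loop.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(send):
        # A request keeps its slot while backing off, so retries never push
        # the number of in-flight requests above max_concurrency.
        async with semaphore:
            return await call_with_backoff(send, **backoff)

    # gather() returns results in the same order as `requests`.
    return await asyncio.gather(*(limited(send) for send in requests))
```

For example, `asyncio.run(run_all(jobs, max_concurrency=2))` sends every callable in `jobs` with no more than two requests outstanding at any moment. A request that keeps receiving 429 after `max_retries` retries re-raises `RateLimitError`, which makes `gather` fail too. Handle that exception in your application, for instance by queuing the request for later.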
See also: Quotas & Rate Limits