reasoning field of responses. The value of this field is null in the responses of non-reasoning models.
Supported models with reasoning
The following table lists the models on Serverless Inference that return reasoning output. Each supported model might always include reasoning, or might disable or enable reasoning by default:| Model ID (for API usage) | Reasoning support |
|---|---|
deepseek-ai/DeepSeek-V4-Flash | Disabled by default |
google/gemma-4-31B-it | Enabled by default |
MiniMaxAI/MiniMax-M2.5 | Always on |
moonshotai/Kimi-K2.6 | Always on |
moonshotai/Kimi-K2.5 | Always on |
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 | Enabled by default |
openai/gpt-oss-120b | Always on |
openai/gpt-oss-20b | Always on |
Qwen/Qwen3.5-35B-A3B | Enabled by default |
Qwen/Qwen3.5-27B | Enabled by default |
Qwen/Qwen3-235B-A22B-Thinking-2507 | Always on |
zai-org/GLM-5.1 | Enabled by default |
Models with Always on reasoning
If a model is listed as Always on in the preceding Supported models table, it always includes reasoning and this cannot be disabled.
Disable reasoning
If a model is listed asEnabled by default in the preceding Supported models table, you can disable reasoning to reduce token usage or simplify the response. To opt out of reasoning for a request, in chat_template_kwargs, set the enable_thinking flag to the value False (Python) or false (Bash):
- Python
- Bash
Enable reasoning
If a model is listed asDisabled by default in the preceding Supported models table, you can enable reasoning by setting the enable_thinking flag to the value True (Python) or true (Bash) in the preceding code snippet.