Select-then-Solve

Paradigm Routing as Inference-Time Optimization for LLM Agents

Heng Zhou Zelin Tan Zhemeng Zhang Yutao Fan Yibing Lin Li Kang
Xiufeng Song Rui Li Songtao Huang Ao Yu Yuchen Fan Yanxu Chen
Kaixin Xu Xiaohong Liu Yiran Qin Philip Torr Chen Zhang Zhenfei Yin

TL;DR

We compare 6 reasoning paradigms across 4 frontier LLMs and 10 benchmarks (~18k runs). No single paradigm wins everywhere.

Our embedding-based router selects the best paradigm per task, improving accuracy from 47.6% to 53.1% and recovering up to 37% of the oracle gap at half the token cost of always using ReAct.

Models cannot select their own paradigm: zero-shot self-routing fails for weaker models, revealing paradigm selection as a distinct meta-reasoning capability.

Key Findings

44pp

Structure Helps (Sometimes)

ReAct improves over Direct by 44pp on GAIA where web search is essential.

-15pp

Structure Hurts (Sometimes)

CoT degrades HumanEval accuracy by 15pp: forcing step-by-step reasoning disrupts direct code generation.

+5.5pp

Router Beats All Fixed Choices

The embedding router improves accuracy by 5.5pp over Direct and by 2.8pp over the best fixed paradigm, averaged across 4 models.

67.1%

Self-Routing Partially Works

GPT-5 self-selects effectively (67.1%, above its learned-router score), but weaker models fail at self-routing, trailing the learned router on average.

The Pipeline

Before answering, a lightweight router selects the best reasoning paradigm for each task.
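A minimal sketch of what such an embedding-based router could look like. This is illustrative, not the authors' implementation: the toy hash embedding, the k-nearest-neighbor voting rule, and all names here are assumptions standing in for a real sentence-embedding model and whatever selection rule the paper actually trains.

```python
# Sketch of an embedding-based paradigm router (illustrative, not the
# paper's code). Each training task has an embedding and a known best
# paradigm; at inference we return the majority paradigm among the k
# nearest training tasks by cosine similarity.
import numpy as np
from collections import Counter

# The six paradigms compared in the paper.
PARADIGMS = ["Direct", "CoT", "ReAct", "Plan-Execute", "Reflection", "ReCode"]

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic bag-of-words hash embedding (a stand-in for a
    real sentence-embedding model), L2-normalized."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

class EmbeddingRouter:
    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, tasks, best_paradigms):
        # Store one embedding per training task plus its best paradigm.
        self.X = np.stack([embed(t) for t in tasks])
        self.y = list(best_paradigms)

    def route(self, task: str) -> str:
        # Unit vectors, so the dot product is cosine similarity.
        sims = self.X @ embed(task)
        top = np.argsort(sims)[::-1][: self.k]
        return Counter(self.y[i] for i in top).most_common(1)[0][0]
```

In use, `fit` would be called on tasks whose best paradigm is known from the benchmark runs, and `route` then picks a paradigm for each new task before the solver model is invoked.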

Select-then-Solve Pipeline

Results at a Glance

No Single Paradigm Wins

Direct vs. best paradigm vs. oracle on each dataset for GPT-5. The best paradigm (colored) differs for every task type.

Best paradigm per dataset

Router Comparison Across Models

The embedding router (green) consistently outperforms Direct and Best-single. Self-routing (red) shows mixed results.

Router comparison

GPT-5 Success Rate Heatmap

Success rates across all paradigms and datasets. No single row dominates all columns.

GPT-5 heatmap

6 Paradigms Compared

No Tools

Direct

Free-form answer. No scaffold imposed. The model decides how to reason.

No Tools

Chain-of-Thought

Step-by-step reasoning before answering.

Web + Code

ReAct

Interleave reasoning with tool calls in a thought-action loop.

Web + Code

Plan-Execute

Create a plan first, then execute each step with tools.

Web + Code

Reflection

Answer, critique, and revise iteratively.

Web + Code

ReCode

Solve via recursive code generation and execution.

Router Results

| Method           | GPT-5 | Gemini | Qwen3-Max | Qwen3-30B | Avg  |
|------------------|-------|--------|-----------|-----------|------|
| Direct           | 60.3  | 55.5   | 49.8      | 24.9      | 47.6 |
| Best-single      | 62.4  | 55.5   | 50.7      | 32.8      | 50.3 |
| Embedding Router | 64.2  | 61.0   | 54.6      | 32.8      | 53.1 |
| Self-route       | 67.1  | 56.8   | 42.4      | 27.5      | 48.4 |
| Oracle           | 72.9  | 73.4   | 72.5      | 56.8      | 68.9 |
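A quick sanity check on the table (my arithmetic, not the paper's): the fraction of the Direct-to-Oracle gap that the embedding router closes per model. These per-model averages land around 21–31%; the "up to 37%" headline presumably refers to a finer-grained (e.g. per-dataset) breakdown.

```python
# Oracle-gap recovery per model, computed from the Router Results table:
# (router - direct) / (oracle - direct).
direct = {"GPT-5": 60.3, "Gemini": 55.5, "Qwen3-Max": 49.8, "Qwen3-30B": 24.9}
router = {"GPT-5": 64.2, "Gemini": 61.0, "Qwen3-Max": 54.6, "Qwen3-30B": 32.8}
oracle = {"GPT-5": 72.9, "Gemini": 73.4, "Qwen3-Max": 72.5, "Qwen3-30B": 56.8}

recovery = {m: (router[m] - direct[m]) / (oracle[m] - direct[m]) for m in direct}
for m, r in recovery.items():
    print(f"{m}: {r:.0%} of the oracle gap recovered")
```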

Citation

@misc{zhou2026selectthensolveparadigmroutinginferencetime,
  title={Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents},
  author={Heng Zhou and Zelin Tan and Zhemeng Zhang and Yutao Fan and Yibing Lin and Li Kang and Xiufeng Song and Rui Li and Songtao Huang and Ao Yu and Yuchen Fan and Yanxu Chen and Kaixin Xu and Xiaohong Liu and Yiran Qin and Philip Torr and Chen Zhang and Zhenfei Yin},
  year={2026},
  eprint={2604.06753},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.06753},
}