Standard chain-of-thought generates a single reasoning trace, so one early mistake can carry through to a wrong final answer with no recovery mechanism. Advanced variants address this. Self-consistency, introduced by Google researchers in 2022, samples multiple reasoning traces (typically 5–40) at a higher sampling temperature for each query, then takes the majority vote over the final answers. Because independent reasoning paths converge on the correct answer more reliably than on any one specific wrong answer, self-consistency reliably improves accuracy on math and logic tasks, often by 10–20 percentage points on hard benchmarks. The cost is linear in the number of samples: 40-sample self-consistency uses roughly 40x the inference compute of a single trace. Tree-of-thought (ToT) extends the idea by exploring multiple reasoning branches at each step, evaluating partial progress, and pruning weak branches, effectively performing beam search over the reasoning space rather than Monte Carlo sampling. ToT is powerful for puzzle-like problems with clear intermediate evaluation criteria, but it requires careful prompt design for branch evaluation. Reflection and self-critique prompts work differently: after generating an answer, the model is asked to review its own reasoning for errors and produce a corrected version. This is surprisingly effective for coding tasks and detail-heavy outputs where single-shot generation misses requirements. The newest frontier is reasoning models such as o1, o3, and R1, which bake extensive CoT and verification into training and reason effectively without CoT-specific prompt engineering. For teams building LLM applications in 2026, choosing between these variants is primarily a cost/accuracy tradeoff calibrated to task stakes.
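The self-consistency voting step described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `sample_answer` is a hypothetical callable standing in for whatever code sends the prompt to a model at nonzero temperature and extracts the final answer from the trace, and `fake_sampler` is an invented stub used only so the sketch runs without a model.

```python
import random
from collections import Counter
from typing import Callable, Optional, Tuple


def self_consistency(
    sample_answer: Callable[[str], Optional[str]],
    question: str,
    n_samples: int = 10,
) -> Tuple[Optional[str], float]:
    """Sample n_samples reasoning traces and majority-vote the final answers.

    sample_answer is a hypothetical hook: it should run one sampled
    chain-of-thought and return only the extracted final answer,
    or None if no answer could be parsed from the trace.
    Returns (winning answer, share of parsed answers that agreed).
    """
    answers = []
    for _ in range(n_samples):  # cost grows linearly with n_samples
        answer = sample_answer(question)
        if answer is not None:
            answers.append(answer.strip())  # normalise before voting
    if not answers:
        return None, 0.0
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)


if __name__ == "__main__":
    rng = random.Random(0)

    def fake_sampler(question: str) -> Optional[str]:
        # Stub for illustration only: mostly "42", sometimes a wrong answer.
        return rng.choice(["42", "42", "42", "41", "17"])

    answer, agreement = self_consistency(fake_sampler, "What is 6 * 7?", n_samples=20)
    print(answer, round(agreement, 2))
```

The agreement share is a convenient by-product: a low value suggests the traces disagree, which may be a signal to spend more samples or escalate to a more expensive method for high-stakes queries.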
Advanced Chain-of-Thought Variants: Self-Consistency, Tree-of-Thought, and Reflection
Basic chain-of-thought is just the starting point. Self-consistency samples multiple reasoning paths and votes, tree-of-thought explores reasoning branches, and reflection prompts the model to critique its own output. Each variant improves reasoning reliability at the cost of additional compute — choose based on task stakes.
chain-of-thought-reasoning, self-consistency, tree-of-thought