“BOLT”: Revolutionizing Long Chain-of-Thought in Language Models Without Distillation
BOLT (Bootstrap Long Chain-of-Thought) is a novel approach designed to enhance the reasoning capabilities of Large Language Models (LLMs) by enabling them to generate long chains of thought (LongCoT) without relying on distillation from existing LongCoT models or on expensive human annotations. Traditional approaches to developing LongCoT models typically depend on knowledge distillation from advanced models (e.g., OpenAI’s o1) or are limited to specific domains such as mathematics and coding. The paper addresses these gaps by proposing a more systematic and generalizable method.

Key Contributions:
Novel Training Pipeline: BOLT employs a three-stage training process:
- LongCoT Bootstrapping: Utilizes in-context learning to generate LongCoT data from a ShortCoT model. Only a few in-context examples are needed (e.g., 10 examples in their experiments).
- LongCoT Supervised Finetuning (SFT): Fine-tunes a ShortCoT model using the bootstrapped data, enabling it to adapt to the LongCoT reasoning style.
- LongCoT Online Training: Refines LongCoT reasoning further through online exploration and outcome reward models (ORM).
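The three stages above can be sketched end to end in a few lines. This is a minimal, hypothetical illustration: the callables `model` and `orm` stand in for real LLM and outcome-reward-model APIs, which the paper does not specify at this level of detail.

```python
# Hypothetical sketch of BOLT's three-stage pipeline; `model` and `orm`
# are toy stand-ins for a ShortCoT LLM and an outcome reward model.

def bootstrap_longcot(model, demos, queries):
    """Stage 1: few-shot prompt a ShortCoT model with LongCoT
    demonstrations to collect (query, long reasoning) pairs."""
    pairs = []
    for q in queries:
        prompt = "\n\n".join(demos) + "\n\n" + q
        pairs.append((q, model(prompt)))
    return pairs

def longcot_sft(pairs):
    """Stage 2: supervised finetuning on the bootstrapped pairs
    (represented here as simply packaging the dataset)."""
    return {"train_set": pairs}

def longcot_online_training(model, orm, queries, n_rollouts=4):
    """Stage 3: sample several responses per query online and keep
    the one the outcome reward model (ORM) scores highest."""
    chosen = {}
    for q in queries:
        rollouts = [model(f"{q} [rollout {i}]") for i in range(n_rollouts)]
        chosen[q] = max(rollouts, key=orm)
    return chosen

# Toy stand-ins to show the data flow from stage 1 through stage 3.
toy_model = lambda prompt: f"reasoning({len(prompt)} chars)"
toy_orm = lambda response: len(response)  # pretend longer = better
pairs = bootstrap_longcot(toy_model, ["demo 1", "demo 2"], ["q1", "q2"])
dataset = longcot_sft(pairs)
best = longcot_online_training(toy_model, toy_orm, [q for q, _ in pairs])
```

In the real pipeline, stage 2 would run gradient updates on the bootstrapped data and stage 3 would use the ORM signal for reinforcement-style optimization rather than simple best-of-n selection.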
No Dependency on Knowledge Distillation: Unlike prior approaches, BOLT avoids the need for distillation from existing LongCoT models, making it more accessible and transparent.
Broad Generalizability: The method is validated across multiple model scales (7B, 8B, 70B) and a variety of benchmarks, including:
- Arena-Hard: Challenging real-world user queries drawn from Chatbot Arena.
- MT-Bench: Multi-turn questions spanning domains such as writing, reasoning, and STEM.
- WildBench: Real-world queries from human-chatbot interactions.
- ZebraLogic: Logic grid puzzles assessing constraint satisfaction.
- MATH500: Competition-level mathematics problems.
Methodology:
LongCoT Bootstrapping:
- Leverages in-context learning by providing a ShortCoT model with LongCoT examples.
- Only 10 carefully curated examples were needed, demonstrating the method’s data efficiency.
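The bootstrapping prompt itself is just a concatenation of demonstrations ahead of a new query. The template below is a hypothetical sketch (the paper’s exact prompt format is not reproduced here): each demonstration pairs a question with a long chain-of-thought answer, so the ShortCoT model imitates that style in-context.

```python
# Hypothetical few-shot prompt assembly for LongCoT bootstrapping;
# the "Question:/Long reasoning:" labels are illustrative, not the
# paper's actual template.

def build_bootstrap_prompt(demos, query):
    """Concatenate (question, LongCoT) demonstrations ahead of a new
    query so a ShortCoT model continues in the long reasoning style."""
    parts = []
    for question, long_cot in demos:
        parts.append(f"Question: {question}\nLong reasoning: {long_cot}")
    parts.append(f"Question: {query}\nLong reasoning:")
    return "\n\n".join(parts)

demos = [
    ("What is 2 + 2?",
     "First, interpret the sum. Adding 2 and 2 gives 4. Answer: 4."),
]
prompt = build_bootstrap_prompt(demos, "What is 3 + 5?")
```

The resulting prompt ends with an open `Long reasoning:` slot, which is where the ShortCoT model’s in-context completion becomes a new LongCoT training example.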