10.1 - The Concept - From Simple to Complex Tasks
A significant challenge in reinforcement learning is that training an agent on a complex final task from a random starting point can be extraordinarily inefficient or even impossible. The agent often fails to discover the rare, early-stage actions that lead to long-term success.
Curriculum Learning is an advanced training strategy that addresses this problem by structuring the learning process as a sequence of tasks of increasing difficulty. This mirrors how humans learn: mastering foundational skills before moving on to more complex applications.
The Core Idea - Bootstrapping a Policy
Instead of a single, monolithic training run, we create a curriculum of progressively harder environments. The agent's learned policy from an easier stage is used as the starting point, or "bootstrap," for training on the next, harder stage. This is a form of Transfer Learning.
The Curriculum Learning Workflow:
+--------------------------------+
| Stage 1: Foundational Task |
| (e.g., 1 Marine vs 1 Zergling)|
+--------------------------------+
|
v
+--------------------------------+
| Train Agent A until proficient.|
| (Save weights to `model_A.zip`)|
+--------------------------------+
|
| Load weights from `model_A.zip`
v
+----------------------------------+
| Stage 2: Intermediate Task |
| (e.g., 2 Marines vs 2 Zerglings)|
+----------------------------------+
|
v
+--------------------------------+
| Continue training Agent B. |
| (Save weights to `model_B.zip`)|
+--------------------------------+
|
| Load weights from `model_B.zip`
v
+--------------------------------+
| Stage 3: Target Task |
| (e.g., 5v5 mixed combat) |
+--------------------------------+
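With a library such as Stable-Baselines3, the bootstrapping step in this workflow is mechanically simple: train on the easy stage, save the weights, then load them into a model attached to the next stage's environment and keep training. The sketch below is a minimal illustration, assuming PPO and a hypothetical `MicroEnv(num_marines, num_zerglings)` environment like the one described under the design principles later in this section; the timestep budgets and file names are placeholders.

```python
# A minimal sketch of the two-stage bootstrap, assuming Stable-Baselines3 PPO and a
# hypothetical MicroEnv(num_marines, num_zerglings) environment (sketched later in
# this section). Timestep budgets and file names are illustrative.
from stable_baselines3 import PPO

from micro_env import MicroEnv  # hypothetical module containing the parameterized env

# Stage 1: foundational task, trained from randomly initialized weights.
stage1_env = MicroEnv(num_marines=1, num_zerglings=1)
model = PPO("MlpPolicy", stage1_env, verbose=1)
model.learn(total_timesteps=500_000)
model.save("model_A")  # writes model_A.zip

# Stage 2: intermediate task, bootstrapped from the Stage 1 policy.
# Observation and action spaces must match across stages for the weights to transfer.
stage2_env = MicroEnv(num_marines=2, num_zerglings=2)
model = PPO.load("model_A", env=stage2_env)  # restores policy and value networks
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
model.save("model_B")  # writes model_B.zip, the starting point for Stage 3
```

Passing `reset_num_timesteps=False` keeps the timestep counter running across stages, so the learning curves for the whole curriculum read as one continuous run.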
Why This Approach is Effective
- Mitigates Sparse Rewards and Credit Assignment: In early, simple stages, positive rewards are dense and easily attributable to specific actions, so the agent can quickly learn a basic but valuable policy (e.g., how to shoot and move).
- Guided, Meaningful Exploration: The agent's exploration is focused. It builds upon a foundation of competence rather than exploring from a state of complete randomness, which is highly inefficient in a large state space.
- Improved Training Stability and Speed: By starting each new stage with a competent policy, the agent often converges on a high-performance solution for the final, complex task much faster and more reliably than an agent trained from scratch on that same task.
Curriculum Design Principles
Designing an effective curriculum is an engineering task that requires careful planning.
| Principle | Description | Example |
|---|---|---|
| Atomic Skills | Each stage should isolate and teach a single new skill or concept. | Stage 1: Teach 1v1 kiting. Stage 2: Teach target selection (2v1). Stage 3: Teach group cohesion (2v2). |
| Smooth Difficulty Ramp | The increase in difficulty between stages should be gradual. A jump that is too large can cause the agent to fail to adapt. | Good: 1v1 -> 2v1 -> 2v2. Bad: 1v1 -> 10v10. |
| Quantifiable Graduation | There must be a clear, measurable metric to determine when an agent has "mastered" a stage and is ready to advance. | "The agent graduates from Stage 1 when it achieves a >95% win rate against the built-in AI on the 1v1 task." |
| Environment Parameterization | The gymnasium environment should be designed to be easily configurable, so that a single `Env` class can represent all stages of the curriculum. | The `MicroEnv` `__init__` could accept parameters like `(num_marines, num_zerglings)`; see the sketch below this table. |
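The parameterization principle is easiest to see in code. Below is a hypothetical, skeletal `MicroEnv` whose constructor takes the stage parameters; the observation and action spaces are given fixed shapes so that the same policy network can be carried from one stage to the next. The space shapes and the omitted game logic are placeholders, not a working StarCraft II integration.

```python
# Hypothetical parameterized environment: one class covers every curriculum stage.
import gymnasium as gym
import numpy as np


class MicroEnv(gym.Env):
    """Skeletal micro-combat environment; unit counts are curriculum parameters."""

    def __init__(self, num_marines: int = 1, num_zerglings: int = 1):
        super().__init__()
        self.num_marines = num_marines
        self.num_zerglings = num_zerglings
        # Spaces are sized once, for the largest stage, so weights transfer across stages.
        self.observation_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=(64,), dtype=np.float32
        )
        self.action_space = gym.spaces.Discrete(8)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # A real implementation would spawn num_marines vs. num_zerglings here (omitted).
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        return obs, {}

    def step(self, action):
        # Game interaction omitted; a real step would issue orders and score the outcome.
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}


# Each curriculum stage is then just a different constructor call:
stage_configs = [
    {"num_marines": 1, "num_zerglings": 1},  # Stage 1: foundational task
    {"num_marines": 2, "num_zerglings": 2},  # Stage 2: intermediate task
    {"num_marines": 5, "num_zerglings": 5},  # Stage 3: target task
]
envs = [MicroEnv(**cfg) for cfg in stage_configs]
```

A training driver can then loop over such a list of stage configurations, advancing to the next stage only when the graduation metric for the current one (e.g., win rate over a batch of evaluation episodes) crosses its threshold.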
In the next section, we will design a concrete example of a simple curriculum for our MicroBot, demonstrating how to put these principles into practice.