10.2 - Example - A Curriculum for the Micro Bot

Let's apply the theory of Curriculum Learning to a practical example. We will design a simple, three-stage curriculum to teach our MicroBot agent how to handle progressively more complex combat scenarios.

Our goal is to start with the foundational 1v1 kiting skill and build up to managing a small group engagement.

Curriculum Specification

This table defines our staged learning plan.

Stage	Task Name	Scenario	Skill Being Taught	Graduation Criterion
1	Kiting Fundamentals	1 Marine vs. 1 Zergling	The core mechanic of kiting: move during weapon cooldown.	>95% win rate.
2	Target Prioritization	2 Marines vs. 1 Zergling	How to focus fire on a single target with multiple units.	>98% win rate.
3	Group Cohesion	2 Marines vs. 2 Zerglings	How to manage multiple threats and maintain positioning as a group.	>90% win rate.

Step 1- Parameterize the Bot and Environment

First, we must modify our bot so that it can be configured for each stage of the curriculum. We will add parameters to its __init__ method to control the number of units spawned. The environment will then pass these parameters during the bot's instantiation.

Updated micro_bot.py (Relevant parts):

# in micro_bot.py
import gymnasium as gym
from sc2.bot_ai import BotAI
from sc2.ids.unit_typeid import UnitTypeId

# A simple, custom gym-like environment is now common
# as SC2GymEnv is no longer the standard.
# This is a simplified representation for the example.
class MicroEnv(gym.Env):
    def __init__(self, map_name="AcropolisLE", num_marines: int = 1, num_zerglings: int = 1):
        self.map_name = map_name
        self.num_marines = num_marines
        self.num_zerglings = num_zerglings
        # Define action and observation spaces appropriate for your task
        # This is highly dependent on the specific implementation of the bot's actions/observations
        self.action_space = gym.spaces.Discrete(4) # Placeholder
        self.observation_space = gym.spaces.Box(low=0, high=1, shape=(10,)) # Placeholder

    # The environment needs methods like step, reset, render, close
    # which would contain the game logic. For this example, we focus on the setup.

    def get_bot(self):
        """Returns an instance of the bot with the configured parameters."""
        return MicroBot(num_marines=self.num_marines, num_zerglings=self.num_zerglings)


class MicroBot(BotAI):
    # __init__ no longer takes action_queue and obs_queue
    def __init__(self, num_marines: int, num_zerglings: int):
        super().__init__()
        self.num_marines = num_marines
        self.num_zerglings = num_zerglings
        # ... rest of __init__ ...

    async def on_start(self):
        """Spawns units based on the parameters."""
        p = self.start_location
        # Use the parameters to control the number of units created
        await self._client.debug_create_unit([[UnitTypeId.MARINE, self.num_marines, p, 1]])
        await self._client.debug_create_unit([[UnitTypeId.ZERGLING, self.num_zerglings, p.towards(self.enemy_start_locations[0], 8), 2]])

Step 2- Create a Curriculum Training Manager

Next, we need a script that manages the stage-by-stage training process. This script will load the model from the previous stage, create the new, harder environment, and continue training.

The following script, training_manager.py, implements the logic required to automate the curriculum.

training_manager.py (Implementation):

import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv
from stable_baselines3.common.env_util import make_vec_env
from micro_bot import MicroEnv # Assuming MicroEnv is a gym.Env compatible wrapper

# A placeholder function for evaluation. In a real scenario, this would
# run several episodes without learning and calculate the win rate.
def evaluate_win_rate(model, env, num_episodes=100) -> float:
    """
    Evaluates the win rate of the model over a number of episodes.
    NOTE: This is a placeholder. A real implementation would need to
    run the simulation and track wins vs. losses.
    """
    # Placeholder logic: return a dummy value that increases over time
    # to allow the curriculum to proceed.
    # In a real implementation, you would run episodes and calculate true win rate.
    print("Evaluating model... (using placeholder logic)")
    if not hasattr(evaluate_win_rate, "win_rate"):
        evaluate_win_rate.win_rate = 0.80 # Initial dummy win rate
    else:
        evaluate_win_rate.win_rate += 0.08 # Increase dummy win rate each time it's called
    
    print(f"Current win rate: {evaluate_win_rate.win_rate:.2f}")
    return min(evaluate_win_rate.win_rate, 1.0)


def run_curriculum(if_multiprocess:bool = False):
    # --- STAGE 1: Kiting Fundamentals ---
    print("--- Starting Stage 1: 1v1 ---")
    # Using make_vec_env to create a vectorized environment
    env_stage1 = make_vec_env(MicroEnv, n_envs=4 if if_multiprocess else 1,
                              vec_env_cls=SubprocVecEnv if if_multiprocess else None,
                              env_kwargs={'num_marines': 1, 'num_zerglings': 1})
                              
    model = PPO("MlpPolicy", env_stage1, verbose=1)
    
    # Train until the graduation criterion is met
    while evaluate_win_rate(model, env_stage1) < 0.95:
        model.learn(total_timesteps=20_000)
    
    model.save("model_stage1")
    print("--- Stage 1 Complete. ---")
    env_stage1.close()

    # --- STAGE 2: Target Prioritization ---
    print("--- Starting Stage 2: 2v1 ---")
    env_stage2 = make_vec_env(MicroEnv, n_envs=4 if if_multiprocess else 1,
                              vec_env_cls=SubprocVecEnv if if_multiprocess else None,
                              env_kwargs={'num_marines': 2, 'num_zerglings': 1})
    
    # Load the model and set the new environment
    model = PPO.load("model_stage1")
    model.set_env(env_stage2)

    while evaluate_win_rate(model, env_stage2) < 0.98:
        model.learn(total_timesteps=20_000)

    model.save("model_stage2")
    print("--- Stage 2 Complete. ---")
    env_stage2.close()

    # --- STAGE 3: Group Cohesion ---
    print("--- Starting Stage 3: 2v2 ---")
    env_stage3 = make_vec_env(MicroEnv, n_envs=4 if if_multiprocess else 1,
                              vec_env_cls=SubprocVecEnv if if_multiprocess else None,
                              env_kwargs={'num_marines': 2, 'num_zerglings': 2})

    model = PPO.load("model_stage2")
    model.set_env(env_stage3)

    while evaluate_win_rate(model, env_stage3) < 0.90:
        model.learn(total_timesteps=20_000)
        
    model.save("final_micro_model")
    print("--- Curriculum Complete! Final model saved. ---")
    env_stage3.close()

if __name__ == '__main__':
    # Set to True if you have configured your system for multiprocessing
    # e.g. by wrapping the script entry point in if __name__ == '__main__':
    run_curriculum(if_multiprocess=True)

This structured, staged approach provides a clear path for teaching a bot complex behaviors. By mastering simple skills first, the agent is much better equipped to find effective policies for the final, more difficult tasks.

Curriculum Specification​

Step 1- Parameterize the Bot and Environment​

Step 2- Create a Curriculum Training Manager​

Curriculum Specification

Step 1- Parameterize the Bot and Environment

Step 2- Create a Curriculum Training Manager