Skip to main content

10.2 - Example - A Curriculum for the Micro Bot

Let's apply the theory of Curriculum Learning to a practical example. We will design a simple, three-stage curriculum to teach our MicroBot agent how to handle progressively more complex combat scenarios.

Our goal is to start with the foundational 1v1 kiting skill and build up to managing a small group engagement.

Curriculum Specification

This table defines our staged learning plan.

StageTask NameScenarioSkill Being TaughtGraduation Criterion
1Kiting Fundamentals1 Marine vs. 1 ZerglingThe core mechanic of kiting: move during weapon cooldown.>95% win rate.
2Target Prioritization2 Marines vs. 1 ZerglingHow to focus fire on a single target with multiple units.>98% win rate.
3Group Cohesion2 Marines vs. 2 ZerglingsHow to manage multiple threats and maintain positioning as a group.>90% win rate.

Step 1- Parameterize the Bot and Environment

First, we must modify our bot so that it can be configured for each stage of the curriculum. We will add parameters to its __init__ method to control the number of units spawned. The environment will then pass these parameters during the bot's instantiation.

Updated micro_bot.py (Relevant parts):

# in micro_bot.py
import gymnasium as gym
from sc2.bot_ai import BotAI
from sc2.ids.unit_typeid import UnitTypeId

# A simple, custom gym-like environment is now common
# as SC2GymEnv is no longer the standard.
# This is a simplified representation for the example.
class MicroEnv(gym.Env):
def __init__(self, map_name="AcropolisLE", num_marines: int = 1, num_zerglings: int = 1):
self.map_name = map_name
self.num_marines = num_marines
self.num_zerglings = num_zerglings
# Define action and observation spaces appropriate for your task
# This is highly dependent on the specific implementation of the bot's actions/observations
self.action_space = gym.spaces.Discrete(4) # Placeholder
self.observation_space = gym.spaces.Box(low=0, high=1, shape=(10,)) # Placeholder

# The environment needs methods like step, reset, render, close
# which would contain the game logic. For this example, we focus on the setup.

def get_bot(self):
"""Returns an instance of the bot with the configured parameters."""
return MicroBot(num_marines=self.num_marines, num_zerglings=self.num_zerglings)


class MicroBot(BotAI):
# __init__ no longer takes action_queue and obs_queue
def __init__(self, num_marines: int, num_zerglings: int):
super().__init__()
self.num_marines = num_marines
self.num_zerglings = num_zerglings
# ... rest of __init__ ...

async def on_start(self):
"""Spawns units based on the parameters."""
p = self.start_location
# Use the parameters to control the number of units created
await self._client.debug_create_unit([[UnitTypeId.MARINE, self.num_marines, p, 1]])
await self._client.debug_create_unit([[UnitTypeId.ZERGLING, self.num_zerglings, p.towards(self.enemy_start_locations[0], 8), 2]])

Step 2- Create a Curriculum Training Manager

Next, we need a script that manages the stage-by-stage training process. This script will load the model from the previous stage, create the new, harder environment, and continue training.

The following script, training_manager.py, implements the logic required to automate the curriculum.

training_manager.py (Implementation):

import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv
from stable_baselines3.common.env_util import make_vec_env
from micro_bot import MicroEnv # Assuming MicroEnv is a gym.Env compatible wrapper

# A placeholder function for evaluation. In a real scenario, this would
# run several episodes without learning and calculate the win rate.
def evaluate_win_rate(model, env, num_episodes=100) -> float:
"""
Evaluates the win rate of the model over a number of episodes.
NOTE: This is a placeholder. A real implementation would need to
run the simulation and track wins vs. losses.
"""
# Placeholder logic: return a dummy value that increases over time
# to allow the curriculum to proceed.
# In a real implementation, you would run episodes and calculate true win rate.
print("Evaluating model... (using placeholder logic)")
if not hasattr(evaluate_win_rate, "win_rate"):
evaluate_win_rate.win_rate = 0.80 # Initial dummy win rate
else:
evaluate_win_rate.win_rate += 0.08 # Increase dummy win rate each time it's called

print(f"Current win rate: {evaluate_win_rate.win_rate:.2f}")
return min(evaluate_win_rate.win_rate, 1.0)


def run_curriculum(if_multiprocess:bool = False):
# --- STAGE 1: Kiting Fundamentals ---
print("--- Starting Stage 1: 1v1 ---")
# Using make_vec_env to create a vectorized environment
env_stage1 = make_vec_env(MicroEnv, n_envs=4 if if_multiprocess else 1,
vec_env_cls=SubprocVecEnv if if_multiprocess else None,
env_kwargs={'num_marines': 1, 'num_zerglings': 1})

model = PPO("MlpPolicy", env_stage1, verbose=1)

# Train until the graduation criterion is met
while evaluate_win_rate(model, env_stage1) < 0.95:
model.learn(total_timesteps=20_000)

model.save("model_stage1")
print("--- Stage 1 Complete. ---")
env_stage1.close()

# --- STAGE 2: Target Prioritization ---
print("--- Starting Stage 2: 2v1 ---")
env_stage2 = make_vec_env(MicroEnv, n_envs=4 if if_multiprocess else 1,
vec_env_cls=SubprocVecEnv if if_multiprocess else None,
env_kwargs={'num_marines': 2, 'num_zerglings': 1})

# Load the model and set the new environment
model = PPO.load("model_stage1")
model.set_env(env_stage2)

while evaluate_win_rate(model, env_stage2) < 0.98:
model.learn(total_timesteps=20_000)

model.save("model_stage2")
print("--- Stage 2 Complete. ---")
env_stage2.close()

# --- STAGE 3: Group Cohesion ---
print("--- Starting Stage 3: 2v2 ---")
env_stage3 = make_vec_env(MicroEnv, n_envs=4 if if_multiprocess else 1,
vec_env_cls=SubprocVecEnv if if_multiprocess else None,
env_kwargs={'num_marines': 2, 'num_zerglings': 2})

model = PPO.load("model_stage2")
model.set_env(env_stage3)

while evaluate_win_rate(model, env_stage3) < 0.90:
model.learn(total_timesteps=20_000)

model.save("final_micro_model")
print("--- Curriculum Complete! Final model saved. ---")
env_stage3.close()

if __name__ == '__main__':
# Set to True if you have configured your system for multiprocessing
# e.g. by wrapping the script entry point in if __name__ == '__main__':
run_curriculum(if_multiprocess=True)

This structured, staged approach provides a clear path for teaching a bot complex behaviors. By mastering simple skills first, the agent is much better equipped to find effective policies for the final, more difficult tasks.