1.3 - Our Solution - The Gym Environment Wrapper
To resolve the async vs. sync conflict, we will architect a solution using a well-established software design pattern: an Adapter. We will create a custom Python class that acts as a bridge, presenting a standard, synchronous `gymnasium` interface to the RL agent while internally managing the complex, asynchronous `python-sc2` game process. This wrapper is the cornerstone of our entire RL implementation.
The `gymnasium.Env` API Contract

`stable-baselines3` requires its environments to adhere to the `gymnasium.Env` API. Our wrapper must implement this contract precisely.
| Method Signature | Role in Our Wrapper |
|---|---|
| `__init__(self, ...)` | Defines the `action_space` and `observation_space`. Creates the communication queues. |
| `reset(self, seed=None, options=None)` | **Episode Start.** Terminates any old game process, launches a new one, waits for the initial observation, and returns it along with an `info` dict. |
| `step(self, action)` | **Core Interaction.** Takes an action from the agent, puts it on the action queue, waits for the game to process it, retrieves the result from the observation queue, and returns it. |
| `close(self)` | **Cleanup.** Ensures the `python-sc2` game process is properly terminated when training finishes. |
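To make the contract concrete, here is a minimal, runnable skeleton of the interface. The `Discrete(4)` action space and the 8-value `Box` observation space are placeholder assumptions rather than our final design, and the method bodies return dummy values purely to satisfy the API:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class SC2GymEnv(gym.Env):
    """Adapter: a synchronous gymnasium facade over the async python-sc2 game."""

    def __init__(self):
        super().__init__()
        # Placeholder spaces -- the real definitions depend on our state and
        # action design, developed in later chapters.
        self.action_space = spaces.Discrete(4)
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(8,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(8, dtype=np.float32)  # dummy initial observation
        return obs, {}  # gymnasium's reset returns (observation, info)

    def step(self, action):
        obs = np.zeros(8, dtype=np.float32)  # dummy observation
        # gymnasium's step returns (obs, reward, terminated, truncated, info)
        return obs, 0.0, False, False, {}

    def close(self):
        pass  # the real version will terminate the python-sc2 process
```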
The Implementation Plan
Our `SC2GymEnv` wrapper will be responsible for orchestrating the two processes. Our `BotAI` subclass will also need to be modified to listen for commands.
- `SC2GymEnv` (The Wrapper - Process 1):
  - Inherit from `gymnasium.Env`.
  - Initialize `multiprocessing.Queue` instances for actions and observations.
  - Implement `reset` to manage the game process lifecycle.
  - Implement `step` to pass data between the queues.
  - Implement `close` for a clean shutdown.
- `RLBotAI` (The Game Actor - Process 2):
  - Be initialized with the communication queues.
  - In `on_step`, get an `action` from the action queue (blocking).
  - Execute the action in the game world.
  - Calculate the `observation` and `reward`.
  - Put the `(obs, reward, terminated, ...)` tuple on the observation queue.

Both halves of this plan are sketched in code below.
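As a preview, here is a compressed sketch of the plan under some stated assumptions: `execute_action`, `build_observation`, and `compute_reward` are hypothetical helpers on the bot, `run_sc2_game` is an assumed launcher function, and the map and opponent settings are arbitrary examples. The full implementation in the next chapter will differ in detail.

```python
import multiprocessing as mp

import gymnasium as gym
from sc2 import maps
from sc2.bot_ai import BotAI
from sc2.data import Difficulty, Race
from sc2.main import run_game
from sc2.player import Bot, Computer


class RLBotAI(BotAI):
    """Game actor (Process 2): executes queued actions and reports results."""

    def __init__(self, action_q, obs_q):
        super().__init__()
        self.action_q = action_q
        self.obs_q = obs_q

    async def on_step(self, iteration: int):
        if iteration == 0:
            # Handshake: push an initial observation so reset() can return.
            self.obs_q.put((self.build_observation(), 0.0, False, False, {}))
        action = self.action_q.get()       # block until the agent decides
        await self.execute_action(action)  # hypothetical helper
        obs = self.build_observation()     # hypothetical helper
        reward = self.compute_reward()     # hypothetical helper
        self.obs_q.put((obs, reward, False, False, {}))


def run_sc2_game(action_q, obs_q):
    """Entry point of Process 2: launch a game driven by RLBotAI."""
    run_game(
        maps.get("AcropolisLE"),  # arbitrary example map
        [Bot(Race.Terran, RLBotAI(action_q, obs_q)),
         Computer(Race.Random, Difficulty.Easy)],
        realtime=False,
    )


class SC2GymEnv(gym.Env):
    """Adapter (Process 1): synchronous facade over the asynchronous game."""

    def __init__(self):
        super().__init__()
        self.action_q = mp.Queue()
        self.obs_q = mp.Queue()
        self.game_process = None
        # action_space / observation_space as in the skeleton above.

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        if self.game_process is not None:
            self.game_process.terminate()  # discard any stale game
            self.game_process.join()
        self.game_process = mp.Process(
            target=run_sc2_game, args=(self.action_q, self.obs_q), daemon=True
        )
        self.game_process.start()
        # Block until the bot's iteration-0 handshake arrives.
        obs, _reward, _term, _trunc, info = self.obs_q.get()
        return obs, info

    def step(self, action):
        self.action_q.put(action)  # hand the action to the game process
        obs, reward, terminated, truncated, info = self.obs_q.get()
        return obs, reward, terminated, truncated, info

    def close(self):
        if self.game_process is not None:
            self.game_process.terminate()
            self.game_process.join()
```

Note the iteration-0 handshake: the bot pushes the first observation before blocking on the action queue, which is what allows the wrapper's `reset` to return an initial observation before any action has been taken.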
System Architecture Diagram

This diagram illustrates the complete, two-process architecture with our `SC2GymEnv` wrapper as the bridge.
```
+--------------------------+      +---------------------------+      +---------------------------+
|   stable-baselines3      |      |    SC2GymEnv Wrapper      |      |  python-sc2 Game Proc.    |
|   (Training Loop)        |      |    (Our Adapter)          |      |  (Our RLBotAI)            |
+--------------------------+      +---------------------------+      +---------------------------+
            |                                  |                                  |
            | calls env.step(action)           |                                  |
            |--------------------------------->| action_q.put(action)             |
            |                                  |--------------------------------->| action = action_q.get()
            |                                  |                                  |
            |                                  |                                  | ...plays game step...
            |                                  |                                  |
            |                                  |                                  | new_obs, ... = ...
            |                                  |                                  | obs_q.put(new_obs, ...)
            |                                  | obs, ... = obs_q.get() <---------|
            |                                  |                                  |
            | returns (obs, reward, ...)       |                                  |
            |<---------------------------------|                                  |
            |                                  |                                  |
   Synchronous Process 1            Inter-Process Queues             Asynchronous Process 2
```
With this design, the RL agent operates within its standard synchronous loop, completely abstracted from the asynchronous nature of the game it is learning to control. In the next chapter, we will implement this `SC2GymEnv` class.
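Once the wrapper exists, the training side stays entirely standard. Here is a hedged sketch of how it would plug into `stable-baselines3` (the choice of PPO and the timestep budget are arbitrary illustrations, not a prescription):

```python
from stable_baselines3 import PPO

# The finished wrapper behaves like any other gymnasium environment.
env = SC2GymEnv()
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)  # illustrative training budget
env.close()
```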