Skip to main content

4.1 - Goal - Learning an Economic Trade-off

Having successfully trained an agent on a single task, we now introduce a core element of strategy games: decision-making under constraints. Our second agent will learn to manage the fundamental economic trade-off between building more workers (increasing income) and building more supply (increasing capacity).

This task is designed to teach the agent to prioritize actions based on a dynamic game state, moving beyond a single, repetitive goal.

The Conceptual Challenge- Balancing Competing Goods

The agent must learn that both actions are "good" but that one is often more critical than the other depending on the situation.

+------------------------+      +-----------------------------+
| Action 1: Build SCV | | Action 2: Build Depot |
+------------------------+ +-----------------------------+
| PROS: | | PROS: |
| + Increases Income | | + Unlocks Army Growth |
| CONS: | | CONS: |
| - Consumes Supply | | - Does not increase Income |
| | | |
+-----------^------------+ +-------------^---------------+
| |
| WHICH ONE IS BETTER |
| RIGHT NOW? |
+--------------------------------+
|
+-------------v---------------+
| The Agent's Policy |
+-----------------------------+

Success Criteria

  • The agent must learn to continuously produce workers.
  • The agent must learn to proactively build Supply Depots to avoid being supply-blocked.
  • The agent must learn to prioritize depot construction when supply_left is low, even if it has enough minerals to build a worker.
  • The episode must terminate successfully upon reaching a target of 30 workers.

Environment Design

1. Observation Space Specification

To make an informed decision, the agent now needs full visibility into the supply situation.

  • Gymnasium Type: gymnasium.spaces.Box
  • Shape: (4,)
  • Data Type: numpy.float32
IndexSource FeatureNormalizationPurpose
0self.minerals1000.0"Can I afford my chosen action?"
1self.workers.amount50.0"How is my economic progress?"
2self.supply_used200.0"How much pressure is on my supply?"
3self.supply_cap200.0"What is my current supply limit?"

2. Action Space Specification

The action space is expanded to allow for the new building choice.

  • Gymnasium Type: gymnasium.spaces.Discrete
  • Size: 3
Action ValueAgent's Intent
0Do Nothing
1Build Worker (SCV)
2Build Supply (Supply Depot)

3. Reward Function Specification

The reward function is designed to heavily punish the primary failure state (getting supply-blocked) while lightly encouraging all productive actions.

Pseudocode Logic:

function get_reward(action, game_state, last_supply_left):
reward = 0

# Event-Based Penalty for a critical failure
if game_state.supply_left == 0 and last_supply_left > 0:
reward -= 10

# Sparse Rewards for productive actions
if action == 1 and successfully_trained_worker:
reward += 1
elif action == 2 and successfully_built_depot:
reward += 2 // Slightly higher to incentivize this crucial task

return reward

This design forces the agent to develop a more sophisticated policy. It cannot simply learn to always build workers; it must learn to pay attention to its supply and act preemptively to avoid the large penalty.