7.2 - What to Look For in Training Graphs
TensorBoard provides a rich suite of metrics, but effective diagnostics requires focusing on a few key indicators. These graphs tell a story about the health and progress of your agent's learning process. By learning to interpret their patterns, you can quickly diagnose issues with your environment design or hyperparameter tuning.
All the following metrics are found in the SCALARS tab of the TensorBoard dashboard.
Primary Diagnostic Metrics
This table outlines the four most critical metrics for assessing a training run.
Metric (TensorBoard Tag) | Primary Question Answered | Healthy Pattern (Desired Trend) | Problem Pattern (Indicates an Issue) |
---|---|---|---|
rollout/ep_rew_mean | "Is the agent learning?" | Steadily increasing. ^ ` | .·´<br /> |
train/loss | "Is the training process stable?" | Steadily decreasing. ^ ` | ·<br /> |
train/entropy_loss | "Is the agent exploring enough?" | Gradually decreasing. ^ ` | ·<br /> |
rollout/ep_len_mean | "Is the agent achieving its goal efficiently?" | Depends on the task goal. For "reach a goal" tasks (Worker/Macro bots), a decreasing length is good (it's getting faster). For "survival" tasks (Micro bot), an increasing length is good. | The trend opposes the task goal. If the Micro bot's episode length is decreasing, it's learning to die faster. This is a classic sign that the reward function is inadvertently incentivizing the wrong outcome. |
A Structured Diagnostic Flowchart
When analyzing a new training run, follow this structured process to quickly identify potential issues.
+---------------------------------+
| Start: View TensorBoard |
+---------------------------------+
|
v
+---------------------------------+
| 1. Is `ep_rew_mean` trending UP?|--[NO]--> Investigate Reward Function Design
+---------------------------------+
|
[YES]
|
v
+---------------------------------+
| 2. Is `loss` trending DOWN? |--[NO]--> Investigate Hyperparameters (esp. learning_rate)
+---------------------------------+
|
[YES]
|
v
+---------------------------------+
| 3. Does `ep_len_mean` match |--[NO]--> Re-evaluate Reward Function for unintended consequences
| the task goal? |
+---------------------------------+
|
[YES]
|
v
+---------------------------------+
| Training run appears healthy. |
| Continue monitoring. |
+---------------------------------+
By methodically answering these questions, you can efficiently diagnose the state of your training process and determine where to focus your debugging efforts.