# V2 Development and Experimentation
Recall that at the end of V1, I had identified three areas for improvement:
- Policy network improvements
- Training algorithm improvements
- End-to-end tensorization
## End-to-end Tensorization
V1's training loop wasn't terribly slow, but I knew that scaling it up in its current form, with either a more powerful policy network or a more sophisticated training algorithm, would put a significant drag on development. End-to-end tensorization, i.e., expressing the entire simulation environment and all training algorithms as tensor operations, would solve this with a major speedup. I decided to start here, hoping to unlock a step change in training speed.
**Note**

This is something that I wouldn't normally take on in a side project. Tensorization isn't rocket science, but it demands a sustained level of moderately high mental effort -- it would be like signing up for three hours of voluntary math homework.

This is a case where having AI code assist absolutely helped me achieve something that I wouldn't have taken on at all. All of the cutting-edge AI models were more than capable of converting the environment, and I was freed up to operate as a tech lead: thinking up useful utilities, building an effective testing harness, and keeping the visualizations compatible.
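As an example of the kind of utility I mean, a lockstep parity test between the V1 and V2 environments is the natural heart of such a harness. Here is a minimal sketch; the environment objects, their `step` signatures, and the `ball_position` attribute are hypothetical stand-ins for the real interfaces:

```python
import torch

def test_v2_matches_v1(v1_env, v2_env, action_trace, n_steps=100):
    """Step the single-game V1 env and a batch-size-1 V2 env in lockstep,
    asserting that the ball stays in (numerically) the same place."""
    for t in range(n_steps):
        v1_env.step(action_trace[t])               # plain-Python environment
        v2_env.step(action_trace[t].unsqueeze(0))  # same actions, with a [1, ...] batch dim
        v1_ball = torch.as_tensor(v1_env.ball_position)
        v2_ball = v2_env.ball_position[0]          # strip the batch dimension
        assert torch.allclose(v1_ball, v2_ball, atol=1e-5), f"diverged at step {t}"
```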
In V1 I had implemented a batch training script, where multiple games were played at the same time and update gradients were averaged across the games. It's trivial to generate a batch of predictions from a policy network, but the simulation environment was "vectorized" by simply running it multiple times serially. In V2, I wanted the environment to manage the state of a batch of games by holding all of the information in multi-dimensional arrays (in our case, torch tensors).
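To make that distinction concrete, here is a minimal sketch of the two stepping styles (the function names and `step` interfaces are illustrative, not the project's actual code):

```python
# V1-style "vectorization": N independent single-game environments, stepped serially
def step_serial(envs, per_game_actions):
    return [env.step(actions) for env, actions in zip(envs, per_game_actions)]

# V2-style batching: one environment whose state tensors carry a leading batch
# dimension, so a single set of tensor ops advances every game at once
def step_batched(batched_env, batched_actions):  # batched_actions: [batch_size, ...]
    return batched_env.step(batched_actions)
```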
For example, this is how I handled kicks in V1 (the logic being that the ball is kicked by any player within a certain distance of it):
```python
# Sum up all kick forces from players within kicking distance
total_kx = sum(
    a.kx for p, a in zip(self.team_a, team_a_actions)
    if self._distance_to_ball(p.position) <= MIN_KICKING_DISTANCE
) + sum(
    a.kx for p, a in zip(self.team_b, team_b_actions)
    if self._distance_to_ball(p.position) <= MIN_KICKING_DISTANCE
)
total_ky = sum(
    a.ky for p, a in zip(self.team_a, team_a_actions)
    if self._distance_to_ball(p.position) <= MIN_KICKING_DISTANCE
) + sum(
    a.ky for p, a in zip(self.team_b, team_b_actions)
    if self._distance_to_ball(p.position) <= MIN_KICKING_DISTANCE
)
```
In V2, this became:
```python
# Calculate distances from players to ball
ball_expanded = self.ball_position.unsqueeze(1)  # [batch_size, 1, 2]
team1_to_ball = torch.norm(self.team1_positions - ball_expanded, dim=2)  # [batch_size, n_players]
team2_to_ball = torch.norm(self.team2_positions - ball_expanded, dim=2)

# Determine players within kicking distance
team1_can_kick = team1_to_ball < MIN_KICKING_DISTANCE  # bool, [batch_size, n_players]
team2_can_kick = team2_to_ball < MIN_KICKING_DISTANCE

# Calculate new ball velocity based on kicks
# (teamN_kick_vel holds each player's kick vector: [batch_size, n_players, 2])
team1_kick_mask = team1_can_kick.float().unsqueeze(-1)  # [batch_size, n_players, 1]
team1_total_kicks = torch.sum(team1_kick_mask * team1_kick_vel, dim=1)  # [batch_size, 2]
team1_kickers_count = torch.sum(team1_kick_mask, dim=(1, 2))  # [batch_size]
team2_kick_mask = team2_can_kick.float().unsqueeze(-1)
team2_total_kicks = torch.sum(team2_kick_mask * team2_kick_vel, dim=1)
team2_kickers_count = torch.sum(team2_kick_mask, dim=(1, 2))

# Combine team 1 and 2
total_kicks = team1_total_kicks + team2_total_kicks
total_kickers = team1_kickers_count + team2_kickers_count

# Safely average: clamp the divisor so games with zero kickers don't divide by zero
kicking_mask = (total_kickers > 0).float().unsqueeze(-1)  # flags games with at least one kicker (applied downstream)
safe_kickers = torch.clamp(total_kickers.unsqueeze(-1), min=1.0)
averaged_kicks = total_kicks / safe_kickers  # [batch_size, 2]
```
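One aside on the "safely average" step: clamping the divisor is one way to dodge a 0/0 in games where nobody is within kicking range; `torch.where` is an equivalent guard. A sketch, using the same shapes as above:

```python
# Equivalent zero-kicker guard with torch.where; the clamp stays inside the
# division so no NaNs are ever produced (NaNs would poison gradients even
# when masked out afterwards)
denom = total_kickers.unsqueeze(-1)  # [batch_size, 1]
averaged_kicks = torch.where(
    denom > 0,
    total_kicks / denom.clamp(min=1.0),
    torch.zeros_like(total_kicks),
)
```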
This is the unsexy plumbing work of machine learning, but it paid off in significantly faster simulation steps. Unfortunately, I still ran into trouble when trying to run it on a GPU.
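In principle, running the tensorized environment on a GPU is just a matter of where its tensors are allocated. A minimal sketch, with illustrative names and sizes rather than the project's actual setup:

```python
import torch

batch_size, n_players = 64, 3  # illustrative sizes
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Every tensor the environment owns lives on the same device, so the
# step logic above runs unchanged on CPU or GPU.
ball_position = torch.zeros(batch_size, 2, device=device)
team1_positions = torch.zeros(batch_size, n_players, 2, device=device)
team2_positions = torch.zeros(batch_size, n_players, 2, device=device)
```

As the results below show, though, device placement alone didn't deliver a speedup for this workload.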
### Results
| Environment | Device | Batch Size | Iterations per Second | Total Games per Second |
| --- | --- | --- | --- | --- |
| V1 | CPU | 1 | 8 | 8 |
| V1 | CPU | 64 | 0.5 | 32 |
| V2 | CPU | 64 | 6 | 384 |
| V2 | GPU | 64 | 0.5 | 32 |
| V2 | GPU | 512 | 0.5 | 256 |
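For context on how numbers like these are typically measured, here is a sketch of a timing loop (illustrative names; note the `torch.cuda.synchronize()` calls, without which GPU timings would measure only kernel launches rather than completed work):

```python
import time
import torch

def benchmark(env, batched_actions, n_iters=100):
    """Time env.step and report iterations and games per second."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain any pending GPU work before timing
    start = time.perf_counter()
    for _ in range(n_iters):
        env.step(batched_actions)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure queued kernels actually finished
    elapsed = time.perf_counter() - start
    ips = n_iters / elapsed
    batch_size = batched_actions.shape[0]
    print(f"{ips:.1f} it/s -> {ips * batch_size:.0f} games/s")
```

Total games per second is simply iterations per second times the batch size, which is the relationship visible in the table.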