OPEN_SOURCE
REDDIT · 4h ago · VIDEO
Final Fight BC Agent Makes Progress
This project shows a behavior-cloned agent learning to play Final Fight from demonstrations, then testing how far it can get in the first stage. The author is treating it as a stepping stone toward GAIL + PPO, with the real value in the engineering lessons around action remapping, trajectory alignment, and recurrent policies.
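For context, behavior cloning here means treating the recorded demonstrations as a plain supervised dataset: frames in, demonstrated actions out. A minimal sketch of that loop, assuming demonstrations stored as stacked-frame/action tensors (the file names, 84x84 frame size, and CNN shape are illustrative assumptions, not the repo's code):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical demonstration tensors: obs is (N, 4, 84, 84) stacked frames,
# acts is (N,) discrete action indices. Both file names are illustrative.
obs = torch.load("demo_obs.pt")
acts = torch.load("demo_acts.pt")
n_actions = int(acts.max()) + 1

policy = nn.Sequential(                        # small Atari-style CNN head
    nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 9 * 9, 512), nn.ReLU(),
    nn.Linear(512, n_actions),                 # logits over the discrete actions
)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

loader = DataLoader(TensorDataset(obs, acts), batch_size=64, shuffle=True)
for epoch in range(10):
    for o, a in loader:
        opt.zero_grad()
        # Standard BC objective: cross-entropy against the demonstrated action.
        loss = loss_fn(policy(o.float() / 255.0), a.long())
        loss.backward()
        opt.step()
```

This is where trajectory alignment matters: if observations and actions are offset by even one frame during recording, the policy learns a systematically wrong mapping, which is exactly the class of bug the author's engineering notes are about.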
// ANALYSIS
Promising as a learning log, not a victory lap: the interesting part here is how clearly it surfaces the usual RL failure modes instead of hiding them behind a polished demo.
- The repo is part of a broader RL/imitation-learning notebook collection, and the Final Fight code includes custom action wrappers (see the first sketch after this list) plus an LSTM feature extractor, so this is hands-on systems work, not a toy example.
- The biggest red flag is the eval/manual-rollout mismatch; that usually points to hidden-state resets, sequence handling, or observation/action offset bugs before it means the model is “bad” (see the second sketch after this list).
- Behavior cloning is doing the right job here: bootstrapping a policy from demonstrations, not solving long-horizon survival or consistency on its own.
- Moving to GAIL + PPO is the sensible next step, but only after the demonstration pipeline is proven clean enough that the policy is learning the game, not the bugs.
- The partial-observability note matters more than the game choice; if the LSTM is unstable across rollout modes, that is the core bottleneck to fix.
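On the action-wrapper point: a common remapping pattern in retro-game RL projects is to collapse the console's MultiBinary button space into a small Discrete set of button combos, so the BC targets and the policy head agree. A sketch of that idea (the class name and combo list are hypothetical; the repo's wrapper may differ):

```python
import gymnasium as gym
import numpy as np

class DiscreteActionWrapper(gym.ActionWrapper):  # hypothetical name
    """Map a Discrete index to a fixed button-combo bitmask."""

    def __init__(self, env, combos):
        super().__init__(env)
        buttons = env.unwrapped.buttons  # gym-retro-style envs expose this list
        self._masks = []
        for combo in combos:
            mask = np.zeros(len(buttons), dtype=np.uint8)
            for button in combo:
                mask[buttons.index(button)] = 1
            self._masks.append(mask)
        self.action_space = gym.spaces.Discrete(len(combos))

    def action(self, act):
        # Translate the policy's Discrete choice into the emulator's bitmask.
        return self._masks[act].copy()

# Example usage (combo list is illustrative, not Final Fight's actual moveset):
# env = DiscreteActionWrapper(retro_env,
#     combos=[[], ["LEFT"], ["RIGHT"], ["A"], ["RIGHT", "A"]])
```

If the combo ordering used at demonstration-recording time drifts from the ordering used at training or eval time, the policy silently learns permuted actions, which is one plausible source of the mismatch described above.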
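On the eval/manual-rollout mismatch: a frequent culprit with recurrent policies is dropping or resetting the LSTM hidden state between steps in one rollout path but not the other. With sb3-contrib's RecurrentPPO (an assumption about the stack; the checkpoint name is hypothetical and `env` is any gymnasium env, e.g. the wrapped one above), the state returned by `predict()` must be threaded back in, and `episode_start` must flag resets:

```python
import numpy as np
from sb3_contrib import RecurrentPPO

model = RecurrentPPO.load("final_fight_policy.zip")  # hypothetical checkpoint

obs, _ = env.reset()
lstm_states = None                        # fresh hidden state at episode start
episode_start = np.ones((1,), dtype=bool)

done = False
while not done:
    # Thread the returned hidden state back in every step; forgetting this
    # makes the policy behave as if every frame were the start of an episode.
    action, lstm_states = model.predict(
        obs, state=lstm_states, episode_start=episode_start, deterministic=True
    )
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    episode_start = np.array([done])      # only reset the state on a new episode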
// TAGS
final-fight · training · fine-tuning · agent · evaluation · debugging · research
DISCOVERED
2026-05-03
PUBLISHED
2026-05-03
RELEVANCE
6/10
AUTHOR
AgeOfEmpires4AOE4