OPEN_SOURCE
REDDIT · 4h ago · VIDEO
Final Fight BC Agent Makes Progress
This project shows a behavior-cloned agent learning to play Final Fight from demonstrations, then testing how far it can get in the first stage. The author is treating it as a stepping stone toward GAIL + PPO, with the real value in the engineering lessons around action remapping, trajectory alignment, and recurrent policies.
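For context, behavior cloning here means treating the recorded demonstrations as a plain supervised dataset: frames in, demonstrated actions out. A minimal sketch of that loop, assuming demonstrations stored as stacked-frame/action tensors (the file names, 84x84 frame size, and CNN shape are illustrative assumptions, not the repo's code):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical demonstration tensors: obs is (N, 4, 84, 84) stacked frames,
# acts is (N,) discrete action indices. Both file names are illustrative.
obs = torch.load("demo_obs.pt")
acts = torch.load("demo_acts.pt")
n_actions = int(acts.max()) + 1

policy = nn.Sequential(                        # small Atari-style CNN head
    nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 9 * 9, 512), nn.ReLU(),
    nn.Linear(512, n_actions),                 # logits over the discrete actions
)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

loader = DataLoader(TensorDataset(obs, acts), batch_size=64, shuffle=True)
for epoch in range(10):
    for o, a in loader:
        opt.zero_grad()
        # Standard BC objective: cross-entropy against the demonstrated action.
        loss = loss_fn(policy(o.float() / 255.0), a.long())
        loss.backward()
        opt.step()
```

This is where trajectory alignment matters: if observations and actions are offset by even one frame during recording, the policy learns a systematically wrong mapping, which is exactly the class of bug the author's engineering notes are about.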
// ANALYSIS
Promising as a learning log, not a victory lap: the interesting part here is how clearly it surfaces the usual RL failure modes instead of hiding them behind a polished demo.
- The repo is part of a broader RL/imitation-learning notebook collection, and the Final Fight code includes custom action wrappers (see the first sketch after this list) plus an LSTM feature extractor, so this is hands-on systems work, not a toy example.
- The biggest red flag is the eval/manual-rollout mismatch; that usually points to hidden-state resets, sequence handling, or observation/action offset bugs before it means the model is “bad” (see the second sketch after this list).
- Behavior cloning is doing the right job here: bootstrapping a policy from demonstrations, not solving long-horizon survival or consistency on its own.
- Moving to GAIL + PPO is the sensible next step, but only after the demonstration pipeline is proven clean enough that the policy is learning the game, not the bugs.
- The partial-observability note matters more than the game choice; if the LSTM is unstable across rollout modes, that is the core bottleneck to fix.
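On the action-wrapper point: a common remapping pattern in retro-game RL projects is to collapse the console's MultiBinary button space into a small Discrete set of button combos, so the BC targets and the policy head agree. A sketch of that idea (the class name and combo list are hypothetical; the repo's wrapper may differ):

```python
import gymnasium as gym
import numpy as np

class DiscreteActionWrapper(gym.ActionWrapper):  # hypothetical name
    """Map a Discrete index to a fixed button-combo bitmask."""

    def __init__(self, env, combos):
        super().__init__(env)
        buttons = env.unwrapped.buttons  # gym-retro-style envs expose this list
        self._masks = []
        for combo in combos:
            mask = np.zeros(len(buttons), dtype=np.uint8)
            for button in combo:
                mask[buttons.index(button)] = 1
            self._masks.append(mask)
        self.action_space = gym.spaces.Discrete(len(combos))

    def action(self, act):
        # Translate the policy's Discrete choice into the emulator's bitmask.
        return self._masks[act].copy()

# Example usage (combo list is illustrative, not Final Fight's actual moveset):
# env = DiscreteActionWrapper(retro_env,
#     combos=[[], ["LEFT"], ["RIGHT"], ["A"], ["RIGHT", "A"]])
```

If the combo ordering used at demonstration-recording time drifts from the ordering used at training or eval time, the policy silently learns permuted actions, which is one plausible source of the mismatch described above.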
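On the eval/manual-rollout mismatch: a frequent culprit with recurrent policies is dropping or resetting the LSTM hidden state between steps in one rollout path but not the other. With sb3-contrib's RecurrentPPO (an assumption about the stack; the checkpoint name is hypothetical and `env` is any gymnasium env, e.g. the wrapped one above), the state returned by `predict()` must be threaded back in, and `episode_start` must flag resets:

```python
import numpy as np
from sb3_contrib import RecurrentPPO

model = RecurrentPPO.load("final_fight_policy.zip")  # hypothetical checkpoint

obs, _ = env.reset()
lstm_states = None                        # fresh hidden state at episode start
episode_start = np.ones((1,), dtype=bool)

done = False
while not done:
    # Thread the returned hidden state back in every step; forgetting this
    # makes the policy behave as if every frame were the start of an episode.
    action, lstm_states = model.predict(
        obs, state=lstm_states, episode_start=episode_start, deterministic=True
    )
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    episode_start = np.array([done])      # only reset the state on a new episode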
// TAGS
final-fight · training · fine-tuning · agent · evaluation · debugging · research
DISCOVERED
2026-05-03
PUBLISHED
2026-05-03
RELEVANCE
6/10
AUTHOR
AgeOfEmpires4AOE4