Final Fight BC Agent Makes Progress
OPEN_SOURCE ↗
REDDIT · 4h ago · VIDEO


This project shows a behavior-cloned agent learning to play Final Fight from demonstrations, then testing how far it can get in the first stage. The author is treating it as a stepping stone toward GAIL + PPO, with the real value in the engineering lessons around action remapping, trajectory alignment, and recurrent policies.
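The action remapping mentioned above usually means collapsing the emulator's full button space down to a handful of meaningful combos. A dependency-free sketch of the idea (button layout and combo table are hypothetical, not taken from the author's code):

```python
class DiscreteToButtons:
    """Map a small discrete action set onto a retro-emulator-style
    button bitmask. The 9-button layout below is an assumption."""

    BUTTONS = ["B", "NULL", "SELECT", "START", "UP", "DOWN", "LEFT", "RIGHT", "A"]

    # Discrete action index -> buttons held that frame.
    COMBOS = {
        0: [],                    # no-op
        1: ["LEFT"],              # walk left
        2: ["RIGHT"],             # walk right
        3: ["B"],                 # attack
        4: ["A"],                 # jump
        5: ["B", "RIGHT", "A"],   # jump-attack moving right
    }

    def action(self, act):
        """Expand a discrete action into the full button bitmask."""
        pressed = set(self.COMBOS[act])
        return [int(b in pressed) for b in self.BUTTONS]
```

In a real setup this would typically be a `gym.ActionWrapper` so the policy only ever sees the small `Discrete(6)` space; mismatches between this table and the demonstrations' recorded buttons are exactly the kind of alignment bug the author is debugging.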

// ANALYSIS

Promising as a learning log, not a victory lap: the interesting part here is how clearly it surfaces the usual RL failure modes instead of hiding them behind a polished demo.

  • The repo is a broader RL/imitation-learning notebook collection, and the Final Fight code includes custom action wrappers plus an LSTM feature extractor, so this is hands-on systems work, not a toy example.
  • The biggest red flag is the eval/manual-rollout mismatch; that usually means hidden-state resets, sequence handling, or observation/action offset bugs before it means the model is “bad.”
  • Behavior cloning is doing exactly its job here: it bootstraps a policy from demonstrations, but it cannot solve long-horizon survival or consistency on its own.
  • Moving to GAIL + PPO is the sensible next step, but only after the demonstration pipeline is proven clean enough that the policy is learning the game, not the bugs.
  • The partial observability note matters more than the game choice; if the LSTM is unstable across rollout modes, that’s the core bottleneck to fix.
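The eval/manual-rollout mismatch can be reproduced without any ML at all: a recurrent policy's action depends on carried hidden state, so a manual loop that silently resets that state every call behaves differently from an evaluation helper that threads it through. A toy sketch (all names hypothetical, not the author's code):

```python
class TinyRecurrentPolicy:
    """Toy stand-in for an LSTM policy: the action depends on the
    observation AND the carried hidden state."""

    def predict(self, obs, state=None, episode_start=False):
        if state is None or episode_start:
            state = 0                # fresh hidden state
        state = state + obs          # hidden state accumulates history
        action = state % 4           # action depends on the whole history
        return action, state


policy = TinyRecurrentPolicy()
obs_seq = [1, 2, 3]

# Correct manual rollout: thread the state through every step.
state, good = None, []
for t, obs in enumerate(obs_seq):
    act, state = policy.predict(obs, state, episode_start=(t == 0))
    good.append(act)

# Buggy manual rollout: state defaults to None on every call,
# so the hidden state is reset each step without any error raised.
bad = [policy.predict(obs)[0] for obs in obs_seq]

print(good)  # [1, 3, 2]  history-dependent actions
print(bad)   # [1, 2, 3]  same weights, different behavior
```

This is why "the model looks worse in my own loop than in evaluate()" points at state handling before it points at the model.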
// TAGS
final-fight · training · fine-tuning · agent · evaluation · debugging · research

DISCOVERED

4h ago

2026-05-03

PUBLISHED

4h ago

2026-05-03

RELEVANCE

6/10

AUTHOR

AgeOfEmpires4AOE4