
Mano-P agent plays Chinese Mahjong via pure vision
Mininglamp AI showcased its open-source Mano-P GUI-VLA agent playing Chinese Mahjong entirely through screen vision and mouse clicks. The demonstration stress-tests the model's ability to operate in complex, unstructured visual environments with no underlying API to fall back on.
Testing a GUI agent on Mahjong is a sharp choice: it suggests visual-action models are graduating past predictable web DOMs into messy, unstructured visual spaces.
- Mano-P relies on raw pixel perception, making decisions based purely on the screen state without any backend hooks or game data
- The game demands high visual precision to distinguish intricate tiles and fast reasoning to react to opponent actions
- Unlike cloud-dependent models, Mano-P is heavily optimized to run locally on consumer edge hardware like M4 Mac minis
- The project currently tops the OSWorld benchmark for specialized GUI models, offering a compelling open-source alternative for computer-use tasks
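
The pixel-only control loop described in the points above can be sketched roughly as follows. This is an illustrative toy, not Mano-P's code: `run_agent`, `ToyScreen`, `Click`, and `brightest_pixel_policy` are hypothetical names, the "screen" is an in-memory grid of RGB tuples, and the trivial brightest-pixel rule stands in for the vision-language model. The only point it makes is structural: the agent sees nothing but a frame and emits nothing but clicks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Frame = List[List[Pixel]]      # rows of pixels, indexed as frame[y][x]


@dataclass(frozen=True)
class Click:
    """A mouse click at screen coordinates (hypothetical action type)."""
    x: int
    y: int


def run_agent(
    capture: Callable[[], Frame],
    policy: Callable[[Frame], Optional[Click]],
    click: Callable[[Click], None],
    max_steps: int,
) -> List[Click]:
    """Perceive-decide-act loop: screenshot in, click out, no game API."""
    history: List[Click] = []
    for _ in range(max_steps):
        frame = capture()          # perceive: raw pixels only
        action = policy(frame)     # decide: stand-in for the model
        if action is None:         # nothing left worth clicking
            break
        click(action)              # act: a mouse click is the only output
        history.append(action)
    return history


class ToyScreen:
    """In-memory stand-in for a real display; clicking a pixel clears it."""

    def __init__(self, width: int, height: int) -> None:
        self.pixels: Frame = [[(0, 0, 0)] * width for _ in range(height)]

    def capture(self) -> Frame:
        # Return a copy so the policy cannot mutate the screen directly.
        return [row[:] for row in self.pixels]

    def click(self, action: Click) -> None:
        self.pixels[action.y][action.x] = (0, 0, 0)


def brightest_pixel_policy(frame: Frame) -> Optional[Click]:
    """Toy policy: click the brightest non-black pixel, if any."""
    best: Optional[Click] = None
    best_sum = 0
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            if r + g + b > best_sum:
                best_sum = r + g + b
                best = Click(x, y)
    return best


screen = ToyScreen(width=4, height=3)
screen.pixels[1][2] = (255, 255, 255)   # a bright "tile"
screen.pixels[0][3] = (10, 10, 10)      # a dimmer one
history = run_agent(screen.capture, brightest_pixel_policy, screen.click, max_steps=10)
print(history)
```

In a real setup, `capture` and `click` would wrap an OS-level screenshot and mouse interface, and `policy` would be the model's inference call; keeping all three as injected callables means the loop itself never needs to know anything about the game.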
DISCOVERED
2026-05-28
PUBLISHED
2026-05-28
AUTHOR
Enough-Astronaut9278