🎮 Lmgame Bench: Leaderboard 🎲

🎮 Welcome to LMGame Bench!

We invite developers to implement their own gaming agents by replacing our baseAgent in customer_runner.py and evaluating them on our comprehensive benchmark. Visit our repository at https://github.com/lmgame-org/GamingAgent to get started, then join the competition to see how your agent performs!
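As a rough illustration of what swapping in a custom agent could look like, here is a minimal sketch. The `BaseAgent` class below is a hypothetical stand-in, not the repository's actual baseAgent, and the `act` method, its parameters, and the `RandomAgent` name are all assumptions made for this example; check the repository for the real interface before adapting it.

```python
from __future__ import annotations

import random
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Hypothetical stand-in for the benchmark's base agent (assumed interface)."""

    @abstractmethod
    def act(self, observation: str, legal_actions: list[str]) -> str:
        """Return one action chosen from legal_actions for the given observation."""


class RandomAgent(BaseAgent):
    """Toy custom agent: picks a legal action uniformly at random."""

    def __init__(self, seed: int = 0) -> None:
        # A private, seeded generator keeps runs reproducible.
        self._rng = random.Random(seed)

    def act(self, observation: str, legal_actions: list[str]) -> str:
        if not legal_actions:
            raise ValueError("no legal actions available")
        # The observation is ignored here; a real agent would use it to decide.
        return self._rng.choice(legal_actions)


if __name__ == "__main__":
    agent = RandomAgent(seed=42)
    print(agent.act("2048 board state", ["up", "down", "left", "right"]))
```

A real agent would replace the random choice with a model call or a search procedure; the point of the sketch is only the shape of the substitution.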

📊 Data Visualization

💡 Click a legend entry to isolate that model. Double-click additional ones to add them for comparison.

๐Ÿ•น๏ธ Game Selection

๐Ÿ„ Super Mario Bros

๐Ÿ“ฆ Sokoban

๐Ÿ”ข 2048

๐Ÿฌ Candy Crush

๐ŸŽฏ Tetris

โš–๏ธ Ace Attorney

โฐ Time Tracker

03/25/2025

🔄 Controls

📋 Detailed Results

💡 The slider above controls how many top models are shown in the radar chart, bar chart, and data table.

All data analysis can be replicated with the accompanying Jupyter notebook.

| Player | Organization | Avg Normalized Score | Super Mario Bros Score | Sokoban Score | 2048 Score | Candy Crush Score | Tetris Score | Ace Attorney Score |
|---|---|---|---|---|---|---|---|---|
| llama-4-maverick-17b-128e-instruct-fp8 | anthropic | 83.51 | 1025.3 | 1.33 | 1882.67 | 557.67 | 13.67 | 1.33 |

Note: 'n/a' in the table indicates no data point for that model.
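The top-N selection controlled by the slider, together with the 'n/a' convention, can be sketched roughly as follows. This is an illustration rather than the leaderboard's actual code: the function names are hypothetical, the rows in the usage example are made-up placeholders, and the choice to rank by a single score column and drop models with no data point is an assumption.

```python
from __future__ import annotations

from typing import Optional


def parse_score(cell: str) -> Optional[float]:
    """Convert a table cell to a float, mapping 'n/a' to None (no data point)."""
    text = cell.strip()
    if text.lower() == "n/a":
        return None
    return float(text)


def top_models(rows: list[tuple[str, str]], n: int) -> list[tuple[str, float]]:
    """Return the n highest-scoring (name, score) pairs.

    Assumption: models whose score is 'n/a' are excluded from the ranking.
    """
    scored = [(name, parse_score(cell)) for name, cell in rows]
    ranked = sorted(
        ((name, score) for name, score in scored if score is not None),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real leaderboard data.
    example_rows = [("model-a", "10.0"), ("model-b", "n/a"), ("model-c", "35.5")]
    print(top_models(example_rows, n=2))
```

Sorting only the rows that have a score keeps missing entries from being treated as zero, which would otherwise penalize models that were simply not evaluated.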