Lmgame Bench: Leaderboard
Welcome to LMGame Bench!
We invite developers to implement their own gaming agents by replacing our baseAgent in customer_runner.py and to evaluate them on our comprehensive benchmark. Visit our repository at https://github.com/lmgame-org/GamingAgent to get started, then join the competition to see how your agent performs!
Data Visualization
Click a legend entry to isolate that model. Double-click additional ones to add them for comparison.
Game Selection
Super Mario Bros
Sokoban
2048
Candy Crush
Tetris
Ace Attorney
Detailed Results
All data analysis can be replicated by checking this Jupyter notebook
Player | Organization | Avg Normalized Score | Super Mario Bros Score | Sokoban Score | 2048 Score | Candy Crush Score | Tetris Score | Ace Attorney Score |
---|---|---|---|---|---|---|---|---|
llama-4-maverick-17b-128e-instruct-fp8 | anthropic | 83.51 | 1025.3 | 1.33 | 1882.67 | 557.67 | 13.67 | 1.33 |
Note: 'n/a' in the table indicates no data point for that model.
Model Name (GamingAgent) - Our specialized gaming agents
Rank | Player | Organization | Avg Normalized Score | Super Mario Bros Score | Sokoban Score | 2048 Score | Candy Crush Score | Tetris Score | Ace Attorney Score |
---|---|---|---|---|---|---|---|---|---|
1 | o3-2025-04-16 (GamingAgent) | openai | 98.33 | 3445 | 8 | 7120 | 647 | 42 | 16 |
2 | o1-2024-12-17 (GamingAgent) | openai | 54.57 | 855 | 2.33 | 7580 | 159 | 35 | 16 |
3 | o4-mini-2025-04-16 (GamingAgent) | openai | 50.91 | 1448 | 5.33 | 4432 | 487.3 | 25.3 | 4 |
4 | gemini-2.5-pro-preview-05-06 (GamingAgent) | google | 46.66 | 1498.3 | 4.33 | 3586.67 | 416.3 | 23.3 | 7 |
5 | deepseek-r1-0528 (GamingAgent) | deepseek | 40.99 | n/a | 4.67 | 3330 | 491.67 | 33.67 | n/a |
Note: 'n/a' in the table indicates no data point for that model.
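
For readers who want to work with these numbers programmatically, here is a minimal Python sketch. It copies the five ranked GamingAgent rows from the table above, treats 'n/a' as a missing value, selects the top N entries by average normalized score, and lists the best player(s) per game. The helper names (`parse_score`, `load_rows`, `top_models`, `best_per_game`) and the data layout are illustrative assumptions; they are not taken from the GamingAgent repository or its notebook.

```python
from typing import Optional

# Game columns, in the same order as the leaderboard table.
GAMES = ["Super Mario Bros", "Sokoban", "2048", "Candy Crush", "Tetris", "Ace Attorney"]

# Rows copied from the ranked table: (player, organization, avg score, per-game scores).
RAW_ROWS = [
    ("o3-2025-04-16", "openai", "98.33", ["3445", "8", "7120", "647", "42", "16"]),
    ("o1-2024-12-17", "openai", "54.57", ["855", "2.33", "7580", "159", "35", "16"]),
    ("o4-mini-2025-04-16", "openai", "50.91", ["1448", "5.33", "4432", "487.3", "25.3", "4"]),
    ("gemini-2.5-pro-preview-05-06", "google", "46.66", ["1498.3", "4.33", "3586.67", "416.3", "23.3", "7"]),
    ("deepseek-r1-0528", "deepseek", "40.99", ["n/a", "4.67", "3330", "491.67", "33.67", "n/a"]),
]


def parse_score(cell: str) -> Optional[float]:
    """Convert a table cell to a float; 'n/a' means no data point."""
    cell = cell.strip()
    return None if cell.lower() == "n/a" else float(cell)


def load_rows(raw_rows):
    """Turn the raw string tuples into dictionaries with numeric scores."""
    rows = []
    for player, org, avg, scores in raw_rows:
        rows.append({
            "player": player,
            "organization": org,
            "avg": float(avg),
            "scores": dict(zip(GAMES, (parse_score(s) for s in scores))),
        })
    return rows


def top_models(rows, n):
    """Return the n rows with the highest average normalized score."""
    return sorted(rows, key=lambda r: r["avg"], reverse=True)[:n]


def best_per_game(rows):
    """For each game, list the player(s) with the highest score, skipping missing values."""
    best = {}
    for game in GAMES:
        scored = [(r["scores"][game], r["player"]) for r in rows if r["scores"][game] is not None]
        if not scored:
            best[game] = []
            continue
        top = max(score for score, _ in scored)
        best[game] = [player for score, player in scored if score == top]
    return best


ROWS = load_rows(RAW_ROWS)
```

With this data, `top_models(ROWS, 3)` returns the o3, o1, and o4-mini rows in that order, and `best_per_game(ROWS)["2048"]` names o1-2024-12-17, the only case in the table where the top-ranked model does not hold the highest per-game score outright.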