Lmgame Bench: Leaderboard
Detailed Results
All data analysis can be replicated by checking this Jupyter notebook
Player | Organization | Super Mario Bros Score | Sokoban Score | 2048 Score | Candy Crush Score | Tetris Score | Ace Attorney Score |
---|---|---|---|---|---|---|---|
gamingagent + llama-4-maverick-17b-128e-instruct-fp8 | anthropic | 1498.3 | 2.33 | 3586.67 | 491.67 | 33.67 | 3.67 |
Note: 'n/a' in the table indicates no data point for that model.
Detailed Results
Player | Organization | Super Mario Bros Score | Sokoban Score | 2048 Score | Candy Crush Score | Tetris Score | Ace Attorney Score |
---|---|---|---|---|---|---|---|
llama-4-maverick-17b-128e-instruct-fp8 | anthropic | 1540.7 | 1.33 | 1738.67 | 557.67 | 13.67 | 1.33 |

Rank | Player | Organization | Super Mario Bros Score | Sokoban Score | 2048 Score | Candy Crush Score | Tetris Score | Ace Attorney Score |
---|---|---|---|---|---|---|---|---|
1 | o3-2025-04-16 | openai | 1955 | 2 | 7220 | 106 | 31 | 8 |
2 | o1-2024-12-17 | openai | 1434 | 0 | 7176 | 90 | 13 | 3 |
3 | claude-sonnet-4-20250514 | anthropic | n/a | 0 | 3844 | 557.67 | 13.67 | 1.33 |
4 | claude-3-7-sonnet-20250219 | anthropic | 1430 | 0 | 2624 | 126.3 | 13 | 3 |
5 | gemini-2.5-flash-preview-05-20 | google | n/a | 0 | 2750 | 254 | 16 | 2.33 |
6 | gemini-2.5-flash-preview-04-17 | google | 1540.7 | 0 | 1738.67 | 97.7 | 19 | 1 |
7 | o4-mini-2025-04-16 | openai | 1348.3 | 1.33 | 1882.67 | 110.7 | 15 | 2 |
8 | gemini-2.5-pro-preview-06-05 | google | n/a | 0.33 | 2232 | 496 | 13.67 | 1.33 |
9 | gpt-4.1-2025-04-14 | openai | 1991.3 | 0 | 1113.33 | 101 | 13 | 0 |
10 | random (x30) | unknown | 986.97 | 0 | 1228 | 116.5 | 10.2 | 0 |
11 | claude-3-5-sonnet-20241022 | anthropic | 1540 | 0 | 84 | 17 | 12.3 | 1 |
12 | gemini-2.5-pro-preview-05-06 | google | 1025.3 | 1 | 120.5 | 177.3 | 12.3 | 8 |
13 | gpt-4o-2024-11-20 | openai | 1028.3 | 0 | 176 | 59 | 14.7 | 0 |
14 | llama-4-maverick-17b-128e-instruct-fp8 | meta | 786 | 0 | 28 | 32.3 | 11.7 | 0 |
Note: 'n/a' in the table indicates no data point for that model.
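
When working with these scores outside the page, the 'n/a' cells need to be treated as missing values rather than as zeros. The following is a minimal sketch of one way to do that, using three rows copied from the table above and comparing each model against the `random (x30)` row. The helper names (`parse_score`, `parse_row`, `beats_baseline`) are hypothetical and are not taken from the Lmgame Bench notebook.

```python
from typing import Optional

# Game columns in the order they appear in the leaderboard table.
GAMES = ["Super Mario Bros", "Sokoban", "2048", "Candy Crush", "Tetris", "Ace Attorney"]

# Three rows copied from the table (rank and organization columns omitted).
RAW_ROWS = [
    "o3-2025-04-16 | 1955 | 2 | 7220 | 106 | 31 | 8",
    "claude-sonnet-4-20250514 | n/a | 0 | 3844 | 557.67 | 13.67 | 1.33",
    "random (x30) | 986.97 | 0 | 1228 | 116.5 | 10.2 | 0",
]


def parse_score(cell: str) -> Optional[float]:
    """Return the score as a float, or None when the cell is 'n/a'."""
    cell = cell.strip()
    return None if cell.lower() == "n/a" else float(cell)


def parse_row(line: str) -> tuple[str, dict[str, Optional[float]]]:
    """Split one pipe-delimited row into (model name, {game: score})."""
    name, *cells = line.split("|")
    return name.strip(), dict(zip(GAMES, map(parse_score, cells)))


def beats_baseline(scores, baseline):
    """List the games where a score exists and is strictly above the baseline."""
    return [g for g in GAMES if scores[g] is not None and scores[g] > baseline[g]]


table = dict(parse_row(row) for row in RAW_ROWS)
baseline = table["random (x30)"]

for model, scores in table.items():
    if model != "random (x30)":
        print(model, "->", beats_baseline(scores, baseline))
```

Keeping missing scores as `None` means a model with no Super Mario Bros data point is skipped for that game rather than counted as scoring below the random baseline.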