Technology
Is Pokémon a challenging benchmark for AI? A team of researchers claims that Super Mario Bros. is even more difficult.
On Friday, Hao AI Lab, a research organization at the University of California, San Diego, put AI models into live Super Mario Bros. gameplay. Anthropic’s Claude 3.7 performed best, followed by Claude 3.5, while Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o struggled.
To be clear, it wasn’t exactly the same version of Super Mario Bros. as the 1985 original. The game ran in an emulator and was integrated with a framework called GamingAgent, which allowed the AIs to control Mario.
GamingAgent, developed in-house by Hao, fed the AI basic instructions such as “If an obstacle or enemy is nearby, move/jump left to evade,” along with in-game screenshots. The AI then generated inputs in the form of Python code to control Mario’s actions.
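The article doesn’t detail GamingAgent’s internals, but the loop it describes — send an instruction plus a screenshot to a model, get back Python code, run that code as controller input — can be sketched roughly as follows. Everything here is hypothetical: `fake_model`, `press`, and the prompt wording stand in for the real framework’s API, which is not shown in the source.

```python
# Hypothetical sketch of a GamingAgent-style control loop.
# Function and message names are illustrative, NOT the real GamingAgent API.

PROMPT = (
    "If an obstacle or enemy is nearby, move/jump left to evade. "
    "Reply only with Python code that calls press(key, frames)."
)

def fake_model(prompt: str, screenshot: bytes) -> str:
    """Stand-in for a vision-language model call (e.g., Claude or GPT-4o)."""
    return "press('right', 30)\npress('a', 10)"

def run_step(screenshot: bytes, pressed: list) -> None:
    """One agent step: send prompt + screenshot, execute the returned code."""
    code = fake_model(PROMPT, screenshot)

    # Expose only a narrow press() helper to the generated code, so the
    # model's output can do nothing except queue controller inputs.
    def press(key: str, frames: int) -> None:
        pressed.append((key, frames))

    exec(code, {"press": press})

pressed = []
run_step(b"<png bytes from the emulator>", pressed)
print(pressed)  # the queued controller inputs for this frame
```

The design choice worth noting is the restricted `exec` namespace: since the model returns arbitrary Python text, a real system would want to constrain what that code can touch far more aggressively than this sketch does.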
Hao says the game forced each model to “learn” to execute intricate maneuvers and formulate gameplay strategies. Interestingly, the lab found that reasoning models such as OpenAI’s o1, which “think” through problems step by step, performed worse than “non-reasoning” models, despite generally being superior on most benchmarks.
The current flashy gaming benchmarks highlight what Andrej Karpathy…