Feb 2, 2025

Is thinking in visuals the next big thing for improving large reasoning models’ reasoning capabilities? What would it take for reasoning models to become visual thinkers?

3 Comments

Yexi

Feb 3, 2025

For the mirror test, how about o3-mini-high and Gemini 2.0 Experimental? I wouldn't be surprised if they fail as well, since they are from the same generation. I just want to see how the most advanced models compare with their lightweight versions.

Forest

Feb 3, 2025 (edited)

Gemini-exp-1206 gave me an answer of 80 (direct prompt) or 8 (CoT prompt). I don't have access to o3-mini-high.

Forest

Mar 25, 2025 (edited)

Update: Gemini 2.5 cracked this. That said, I have a harder version of this which Gemini 2.5 still fails:

Alice divided a two-digit number into two equal parts and summed the two parts up, resulting in a number that was 2 more than the original number. What was her two-digit number? Think outside the box.

The Unscalable

Can Reasoning Models Think Visually?