Is visual thinking the next big thing for improving large reasoning models' capabilities? What would it take for reasoning models to become visual thinkers?
For the mirror test, how about o3-mini-high and Gemini 2.0 experimental? I wouldn't be surprised if they fail as well, since they are from the same generation. I just want to see how the most advanced models compare with their lightweight versions.
Gemini-exp-1206 gave me an answer of 80 (direct prompt) or 8 (CoT prompt). I don't have access to o3-mini-high.
Update: Gemini 2.5 cracked this. That said, I have a harder version of this which Gemini 2.5 still fails:
Alice divided a two-digit number into two equal parts and summed the two parts up, resulting in a number that was 2 more than the original number. What was her two digit number? Think out of the box.
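A quick sanity check on why the prompt asks for out-of-the-box thinking: here is a minimal Python sketch that brute-forces every two-digit number under two purely arithmetic readings of the puzzle. Both readings are my own assumptions about how a model might interpret "divided into two equal parts" (halving the value, or splitting the number into its two digits), and `literal_solutions` is a hypothetical helper name, not something from the original post.

```python
def literal_solutions():
    """Return (reading, n) pairs for two-digit n that satisfy the puzzle
    under two assumed literal, arithmetic readings."""
    hits = []
    for n in range(10, 100):
        tens, ones = divmod(n, 10)
        # Reading 1 (assumption): halve the value, then add the halves back.
        if n % 2 == 0 and n // 2 + n // 2 == n + 2:
            hits.append(("halve value", n))
        # Reading 2 (assumption): split into the two digits and add them.
        if tens + ones == n + 2:
            hits.append(("split digits", n))
    return hits


print(literal_solutions())  # prints []
```

Neither reading produces a match: two halves always sum back to the original number, and a digit sum is always smaller than the number itself. So whatever the intended answer is, it cannot come from arithmetic on the value alone.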