3 Comments
Yexi

For the mirror test, how about o3-mini-high and Gemini 2.0 experimental? I wouldn't be surprised if they fail as well, since they are from the same generation. I just want to see how the most advanced models compare with their lightweight versions.

Forest

Gemini-exp-1206 gave me an answer of 80 (direct prompt) or 8 (CoT prompt). I don't have access to o3-mini-high.

Forest

Update: Gemini 2.5 cracked this. That said, I have a harder version that Gemini 2.5 still fails:

Alice divided a two-digit number into two equal parts and summed the two parts up, resulting in a number that was 2 more than the original number. What was her two digit number? Think out of the box.
