3 Comments
Yexi

For the mirror test, how about o3-mini-high and Gemini 2.0 Experimental? I wouldn't be surprised if they fail as well, since they are from the same generation. I just want to see how the most advanced solutions compare with their lightweight versions.

Forest

Gemini-exp-1206 gave me an answer of 80 (direct prompt) or 8 (CoT prompt). I don't have access to o3-mini-high.

Forest

Update: Gemini 2.5 cracked this. That said, I have a harder version of this which Gemini 2.5 still fails:

Alice divided a two-digit number into two equal parts and summed the two parts up, resulting in a number that was 2 more than the original number. What was her two-digit number? Think out of the box.