Discussion about this post

Yexi

For the mirror test, how about o3-mini-high and Gemini 2.0 Experimental? I wouldn't be surprised if they fail as well, since they are from the same generation. I just want to see how the most advanced models compare with their lightweight versions.
