o3-mini Vs. 4o! The Bouncing Ball In A Hexagon Test Is Back, And The Upgraded 4o Seems To Be The Best.

Who did it better, o3-mini or 4o? The bouncing ball inside a hexagon test is back, and this time it’s the upgraded ChatGPT 4o’s turn to take the test. In the video, we’re displaying the o3-mini version alongside the 4o version, and it seems that the 4o model is handling the physics in a more natural way, with fewer errors.

The o3-mini model was previously shown to be considerably better than the DeepSeek R1 model at this challenge, so this is another step forward, and the gap is widening. I wonder how long it will be before we look back with amusement at the fact that this was one of the toughest challenges for AI to take on. Not very long, I suspect.
