OpenAI o3-Mini-High vs Gemini 2.0 Flash Thinking: We Tested With 5 Prompts

AI models usually spit out answers instantly, but reasoning models like o3-Mini High and Gemini 2.0 Flash Thinking take a different approach—they think before they respond. Instead of rushing to a conclusion, they work through problems step by step to give more logical answers. Both are the latest lightweight reasoning models from OpenAI and Google.

But here’s the catch—Google made Gemini 2.0 Flash Thinking completely free, while OpenAI locked o3-Mini High behind a ChatGPT Plus subscription. So, does paying for o3-Mini High actually make a difference, or is Gemini’s free offering good enough?


On paper, o3-Mini High performs just marginally better in a few benchmarks, but is that gap noticeable in real-world use? To find out, we put them to the test with five tough challenges, from complex math to tricky logic puzzles. The goal is to see which AI explains its reasoning better, gets more accurate answers, and responds faster. So let's begin.


1. Puzzle-Based Reasoning

I started the test with the same puzzle prompt I used to evaluate the DeepSeek R1 and OpenAI o1 models. This question does not have a valid answer, so the goal is to see which model can correctly identify that.

OpenAI models do not show their entire reasoning process—they simply think and provide a final answer. In contrast, Gemini reveals its reasoning, though it is not as user-friendly as DeepSeek R1’s approach. Still, it offers some insight into how it arrives at its conclusions.


Coming to the results, OpenAI takes the complete lead. It was able to figure out that the question does not have a valid answer in less than 15 seconds.

Gemini, on the other hand, spent roughly three times as long and generated a wrong answer. Going through the reasoning process, I can see that Gemini's first conclusion was that the question does not have an answer; however, it kept thinking and ended up with a wrong answer.


Verdict: OpenAI o3-Mini High for providing the correct answer in less time.

2. Math Problem

Next, I asked both models a math question. It's a reasonably simple probability question.

As expected, both models were able to deliver the correct answer in about 10 seconds. Both also provided a clear step-by-step process in the output, as asked. However, while Gemini clearly explained the formula and what exactly is happening in each step, ChatGPT kind of skipped through them to give a more easy-to-skim solution.
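The exact question isn't reproduced here, but to give a rough idea of the kind of step-by-step working we asked for, here is a stand-in probability problem of similar difficulty laid out in Python. This two-aces example is purely illustrative and is not the prompt from the test.

# Illustrative only: probability of drawing two aces in a row from a
# standard 52-card deck without replacement, worked step by step.
from fractions import Fraction

p_first_ace = Fraction(4, 52)        # step 1: 4 aces out of 52 cards
p_second_ace = Fraction(3, 51)       # step 2: 3 aces left out of 51 cards
p_both = p_first_ace * p_second_ace  # step 3: multiply the dependent steps

print(p_first_ace, p_second_ace, p_both)  # 1/13 1/17 1/221
print(float(p_both))                      # roughly 0.0045

A good answer walks through each of those three steps and names the rule being used, which is essentially what Gemini did and what ChatGPT compressed.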


Verdict: Gemini for providing info on each step, ChatGPT for providing a more easy-to-skim answer.

3. Solving a Sudoku Puzzle

To test the visual and image-understanding capabilities of the models, we gave both a Sudoku puzzle. We picked a fairly easy Sudoku, as we have observed that most models fail miserably here.

The hardest part of solving Sudoku for AI models is reading the image itself. They often mess up the placement of numbers. As expected, ChatGPT said there are two 1s in column 4 and two 9s in column 5, even though there aren't. Gemini, on the other hand, created a table with 12 columns instead of 9 and got stuck in a loop before crashing. On a second attempt, it got stuck while generating the output.


Both models failed because of their visual limitations. While they are good at identifying objects and text in images, they are not nearly as good at reading an entire Sudoku grid. So to check their reasoning, I gave them the Sudoku in text format this time.

This time Gemini generated an answer that is almost right except for a couple of placements. For example, you can see the last column has two 3s and the 7th column has none. Apart from that, most of the grid is correct.

ChatGPT, on the other hand, actually worked out an answer in its thought process that made some mistakes similar to Gemini's. However, it realized that and said it was having trouble finding a solution to the Sudoku.
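Spotting these mistakes by hand is tedious, so here is a small Python sketch (not part of the original test) that checks a 9x9 grid the way we did when grading the answers: it scans every row, column, and 3x3 box for duplicates. Pass in the grid a model returns as a list of nine rows, with 0 for empty cells.

def find_duplicates(grid):
    """Return human-readable problems found in a 9x9 Sudoku grid (0 = empty)."""
    problems = []

    def check(cells, label):
        counts = {}
        for value in cells:
            if value != 0:
                counts[value] = counts.get(value, 0) + 1
        for value, count in counts.items():
            if count > 1:
                problems.append(f"{label} has {count} copies of {value}")

    for i in range(9):
        check(grid[i], f"row {i + 1}")                       # check each row
        check([grid[r][i] for r in range(9)], f"column {i + 1}")  # each column

    for box_row in range(3):
        for box_col in range(3):
            cells = [grid[box_row * 3 + r][box_col * 3 + c]
                     for r in range(3) for c in range(3)]
            check(cells, f"box {box_row * 3 + box_col + 1}")  # each 3x3 box

    return problems

Run against an answer like Gemini's, a checker like this would immediately flag something along the lines of "column 9 has 2 copies of 3".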

Verdict: Technically, neither model was able to scan the Sudoku image properly, and neither provided a correct answer even when given the Sudoku in text format.

4. Hypothetical Scenario

For the next prompt, I gave a hypothetical scenario and asked both models to predict the outcome. There is no right or wrong answer here; we just need to see which model does a better job of integrating historical events and reasoning about its predicted outcome.

Both models discussed the technological, cultural, and geopolitical impacts of this scenario and provided similar predictions. They suggested that other technologies, particularly in communication, would have evolved differently and could have significantly influenced World War II. Additionally, they predicted that the internet would have accelerated cultural exchange, leading to faster progress in civil rights movements and artistic trends. Most notably, both models highlighted how the internet could have been a powerful tool for governments during the Cold War—facilitating secret communication, espionage, and the rapid spread of propaganda.

While these predictions may be broadly accurate, both models stayed at the surface level, mostly saying that things would have happened faster. Neither dug into how the internet would have changed the war, how government policies would have differed, or what the major changes compared to today might be. When I asked about this directly, both models chose a safe approach and largely repeated the same information with a few differences rather than adding anything genuinely new. Models like Grok excel here.

Verdict: Both models were able to predict the outcome; however, both chose a safe approach.

5. Programming

Since these models are built for logic and reasoning, they also tend to be good at coding in general.

Both ChatGPT and Gemini wrote the Python script using third-party modules, which is expected. However, both models missed a few details from the prompt. ChatGPT did not provide an explanation for why a result leans positive or negative, while Gemini did not create a real-time app; instead, we have to click a button every time it needs to generate a result. Even though it used the VADER sentiment module, which supports real-time analysis, the code it wrote does not take advantage of that.
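For context, here is a minimal sketch of the two pieces both models got partially wrong, assuming the vaderSentiment package (pip install vaderSentiment). It is not the code either model produced; it simply re-scores the text on every new input instead of waiting for a button press and explains why the result leans positive or negative.

# Minimal sketch: re-score each new line of text with VADER and explain the result.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

print("Type a sentence and press Enter (empty line to quit):")
while True:
    text = input("> ").strip()
    if not text:
        break

    scores = analyzer.polarity_scores(text)  # keys: 'neg', 'neu', 'pos', 'compound'
    compound = scores["compound"]

    if compound >= 0.05:
        label = "positive"
    elif compound <= -0.05:
        label = "negative"
    else:
        label = "neutral"

    # The explanation part both models skimped on: show which component dominated.
    print(f"Sentiment: {label} (compound={compound:+.3f})")
    print(f"Reason: {scores['pos']:.0%} positive, {scores['neg']:.0%} negative, "
          f"{scores['neu']:.0%} neutral words in the text.\n")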

With a few follow-up prompts, we were able to resolve all the issues, but judging by the first result, there is no winner in this segment.

Verdict: Both did a decent job; however, both Gemini and ChatGPT missed a few details from the prompt.

Final Verdict: Is the Free Gemini Model Good Enough?

Well, for most tasks Gemini did just as well as the paid ChatGPT model. It handled the Sudoku better than ChatGPT, the app it developed is just as good as ChatGPT's, and even its scenario prediction is similar. For the math question, I specifically prefer the Gemini response for its more detailed answer. The only test Gemini failed is the puzzle riddle. In fact, we tried many more prompts beyond these, and the results in each category were fairly consistent.

So you can absolutely use the free Gemini 2.0 Flash Thinking instead of the paid ChatGPT o3-Mini High model. You don't have to go for the paid option just for a reasoning model. But if you are already a ChatGPT Plus user, then using o3-Mini High is the better choice overall, as that model didn't fail on any question except the Sudoku one.

Ravi Teja KNTS

Tech writer with over 4 years of experience at TechWiser, where he has authored more than 700 articles on AI, Google apps, Chrome OS, Discord, and Android. His journey started with a passion for discussing technology and helping others in online forums, which naturally grew into a career in tech journalism. Ravi’s writing focuses on simplifying technology, making it accessible and jargon-free for readers. When he’s not breaking down the latest tech, he’s often immersed in a classic film – a true cinephile at heart.
