(2025-03-28) ZviM Gemini 2.5 Is The New SoTA

Zvi Mowshowitz: Gemini 2.5 is the New SoTA. Google Gemini 2.5 Pro Experimental is America’s next top large language model. That doesn’t mean it is the best model for everything. In particular, it’s still Gemini, so it still is a proud member of the Fun Police, in terms of censorship and also just not being friendly or engaging, or willing to take a stand.

If you want a friend, or some flexibility and fun, or you want coding that isn’t especially tricky, then call Claude, now with web access.
If you want an image, call GPT-4o.
But if you mainly want reasoning, or raw intelligence? For now, you call Gemini.

The feedback is overwhelmingly positive. Many report Gemini 2.5 is the first LLM to solve some of their practical problems, including favorable comparisons to o1-pro. It’s fast. It’s not $200 a month. The benchmarks are exceptional.

(On other LLMs I’ve used in the past and may use again when they update: I’ve stopped using Perplexity entirely now that Claude has web access, I never use r1, and I only use Grok narrowly for when I need exactly real time reactions from Twitter.)

Table of Contents

Introducing Gemini 2.5 Pro.
Their Lips are Sealed.
On Your Marks.
The People Have Spoken.
Adjust Your Projections.

Introducing Gemini 2.5 Pro

Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.

Gemini 2.5 Pro is available now in Google AI Studio and in the Gemini app for Gemini Advanced users, and will be coming to Vertex AI soon. We’ll also introduce pricing in the coming weeks, enabling people to use 2.5 Pro with higher rate limits for scaled production use.

Their Lips are Sealed

The thing about this failure is that it is not simply irresponsible. It is also bad marketing, and therefore bad business. You want people seeing those details.

To be fair to Google, they’re against sharing information about their products in general. This isn’t unique to safety information. I don’t think it is malice, or them hiding anything. I think it’s operational incompetence. But we need to fix that.

On Your Marks

Note that Claude 3.7 really is a monster for coding.

The People Have Spoken

Adjust Your Projections

Edited: 2025-04-04 13:15:02.391248 | Tweet this! | Search Twitter for discussion

Bill Seitz