(2025-06-18) Zvim Gemini25 Pro From0506 To0605

Zvi Mowshowitz: Google Gemini 2.5 Pro: From 0506 to 0605. Google recently came out with Gemini-2.5-0605, to replace Gemini-2.5-0506, because I mean at this point it has to be the companies intentionally fucking with us, right?

The general consensus seems to be that this was a mixed update the same way going from 0304 to 0506 was a mixed update.
If you want to do the particular things they were focused on improving, you’re happy. If you want to be told you are utterly brilliant, we have good news for you as well.
If you don’t want those things, then you’re probably sad. If you want to maximize real talk, well, you seem to have been outvoted. Opinions on coding are split.

This post also covers the release of Gemini 2.5 Flash Lite.

What’s In A Name

I do not think this constant ‘here is the new model and you are about to lose the old version’ is good for developers? I would not want this to be constantly sprung on me. Even if the new version is better, it is different, and old assumptions won’t hold.

On Your Marks

Gallabytes: new Gemini is quite strong, somewhere between Claude 3.7 and Claude 4 as far as agentic coding goes. significantly cheaper, more likely to succeed at one shotting a whole change vs Claude, but still a good bit less effective at catching & fixing its own mistakes.

It’s got a solid lead on what’s left of the text LMArena, but then that’s also a hint that you’re likely going to have a sycophancy issue.

This is a big win for Gemini 2.5, with their models the only ones on the Pareto frontier for Arena, but it doesn’t reflect real world utility and it suggests that they got there by caring about Arena. There are a number of things Gemini does that are good for Arena, but that are not good for my experience using Gemini, and as we update I worry this is getting worse.

The thing about Gemini 2.5 Flash Lite is you get the 1 million token context window, full multimodal support and reportedly solid performance for many purposes for a very low price, $0.10 per million input tokens and $0.40 per million output, plus caching and a 50% discount if you batch. That’s a huge discount even versus regular 2.5 Flash (which is $0.30/$2.50 per million) and for comparison o3 is $1/$4 and Opus is $15/$75 (but so worth it when you’re talking, remember it’s absolute costs that matter not relative costs).

Gemini 2.5 Flash Lite Preview

This too is being offered.

The Model Report

We finally have a complete 70-page report on everything Gemini 2.5, thread here. It’s mostly a trip down memory lane, the key info here are things we already knew.

Safety Testing

Section 5 is the safety report. I’ve covered a lot of these in the past, so I will focus on details that are surprising. The main thing I notice is that Google cares a lot more about mundane ‘don’t embarrass Google’ concerns than frontier safety concerns.

General Reactions to Gemini 2.5-Pro 0605

Reaction was mixed, it improves on the central tasks people ask for most, although this comes at a price elsewhere, especially in personality as seen in the next section.

The Personality Upgrades Are Bad

This has also been my experience, the times I’ve tried checking Gemini recently alongside other models, you get that GPT-4o smell.
The problem is that the evaluators have no taste. If you are optimizing for ‘personality,’ the judges of personality effectively want a personality that is sycophantic, uncreative and generally bad.

Also this, which I hate with so much passion and is a pattern with Gemini:
Alex Krusz: Feels like it’s been explicitly told not to have opinions.

The Lighter Side


Edited:    |       |    Search Twitter for discussion