(2025-04-17) ZviM AI #112 Release The Everything

Zvi Mowshowitz: AI #112: Release the Everything. OpenAI has upgraded its entire suite of models. By all reports, they are back in the game for more than images. GPT-4.1 and especially GPT-4.1-mini are their new API non-reasoning models. All reports are that GPT-4.1-mini especially is very good. o3 is the new top of the line ChatGPT reasoning model, with o3-pro coming in a few weeks. Reports are that it too looks very good, even without us yet taking much advantage of its tool usage. If you have access, check it out. Full coverage is coming soon. There’s also o4-mini and o4-mini-high. Oh, they also made ChatGPT memory cover all your conversations, if you opt in, and gave us a version of Claude Code called Codex. And an update to their preparedness framework that I haven’t had time to examine yet.
Anthropic gave us (read-only for now) Google integration (as in Gmail and Calendar to complement GDrive), and also a mode known as Research, which would normally be exciting but this week we’re all a little busy. Google and everyone else also gave us a bunch of new stuff. The acceleration continues.

Table of Contents

  • Not covered yet, but do go check them out: OpenAI’s o3 and o4-mini. Previously this week: GPT-4.1 is a Mini Upgrade, OpenAI #13: Altman at TED, and OpenAI Cutting Corners on Safety Testing.
  • Language Models Offer Mundane Utility. But doctor, you ARE ChatGPT!
  • Language Models Don’t Offer Mundane Utility. Cuomo should have used o3.
  • Huh, Upgrades. ChatGPT now has full memory across conversations.
  • On Your Marks. A new benchmark for browsing agents.
  • Research Quickly, There’s No Time. Just research. It’s cleaner. Check your email.
  • Choose Your Fighter. Shoutouts to Google-AI-in-search and Mistral-Small-24B?
  • Deepfaketown and Botpocalypse Soon. Building your own AI influencer.
  • The Art of the Jailbreak. ChatGPT can now write its own jailbreaks.
  • Get Involved. Study with UT Austin, or work for Ted Cruz. We all make choices.
  • Introducing. Google offers agent development kit, OpenAI copies Claude Code.
  • In Other AI News. Oh no, please, not another social network.
  • Come on OpenAI, Again? Funny what keeps happening to the top safety people.
  • Show Me the Money. Thinking Machines and SSI.
  • In Memory Of. Ways to get LLMs real memory?
  • Quiet Speculations. What even is AGI anyway, and other questions.
  • America Restricts H20 Sales. We did manage to pull this one out.
  • House Select Committee Report on DeepSeek. What they found was trouble.
  • Tariff Policy Continues To Be How America Loses. It doesn’t look good.
  • The Quest for Sane Regulations. Dean Ball joins the White House, congrats man.
  • The Week in Audio. Hassabis, Davidson and several others.
  • Rhetorical Innovation. No need to fight, they can all be existential dangers.
  • Aligning a Smarter Than Human Intelligence is Difficult. Working among us.
  • AI 2027. Okay, fair.
  • People Are Worried About AI Killing Everyone. Critch is happy with A2A.
  • The Lighter Side. The numbers are scary. The words are scarier.

Language Models Offer Mundane Utility

Kate Pickert asserts in Bloomberg ‘Why AI is Better Than Doctors at the Most Human Part of Medicine.’ AI can reliably express sympathy to match the situation, is always there to answer and doesn’t make you feel pressured or rushed. Even the gung ho doctors are still saying things like ‘AI is not going to replace physicians, but physicians who know how to use AI are going to be at the top of their game going forward’ and talking about how it ‘will allow doctors to be more human,’ and the article calls that an ‘ideal state.’ Isn’t it amazing how every vision of the future picks some point where it stops?

The US Government is deploying AI to clean up its personnel records and correct inaccurate information. That’s great if we do a good job.

Translate dolphin vocalizations?

Pin down where photographs were taken. It seems to be very good at this.

Language Models Don’t Offer Mundane Utility

Andrew Cuomo used ChatGPT for his snoozefest of a vacuous housing plan, which is fine except he did not check its work.

It’s actively good to use AI to help you, but this is not that. He didn’t even have someone check its work. If New York City elects Andrew Cuomo as mayor we deserve what we will get.

Apple’s demo of Siri’s new abilities to access the user’s emails, find real-time flight data and plot routes in maps came as news to the people working on Siri. In general MacRumors paints a picture of a deeply troubled and confused AI effort at Apple, with eyes very much not on the ball.

Huh, Upgrades

ChatGPT memory now extends to the full contents of all your conversations. You can opt out of this. You can also do incognito windows that won’t interact with your other chats. You can also delete select conversations.

Noam Brown: Memory isn’t just another product feature. It signals a shift from episodic interactions (think a call center) to evolving ones (more like a colleague or friend). Still a lot of research to do but it’s a step toward fundamentally changing how we interact with LLMs.

I wonder if it is now time to build a tool to let one easily port their chat histories between various chatbots? Presumably this is actually easy: you can copy over the entire back-and-forth, with tags marking each speaker, and paste it in, saying ‘this is so you can access these other conversations as context’ or what not?
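
A minimal sketch of that porting idea, assuming the chat history has already been exported as a list of (role, text) pairs. The tag names `user` and `assistant`, the function name `to_tagged_transcript`, and the exact preamble wording are illustrative assumptions, not any chatbot’s actual export or import format.

```python
from typing import Iterable, Tuple

# Hypothetical preamble; the wording is an assumption based on the idea above.
PREAMBLE = "This is so you can access these other conversations as context."


def to_tagged_transcript(
    turns: Iterable[Tuple[str, str]], preamble: str = PREAMBLE
) -> str:
    """Wrap each turn in a tag named after its speaker and join the result
    into a single string that can be pasted into another chatbot."""
    lines = [preamble, ""]
    for role, text in turns:
        # The role string doubles as the tag name, e.g. <user>...</user>.
        lines.append(f"<{role}>{text.strip()}</{role}>")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [
        ("user", "Summarize my notes from last week."),
        ("assistant", "Here is a short summary of those notes."),
    ]
    print(to_tagged_transcript(history))
```

Whether the receiving chatbot makes good use of a pasted transcript is a separate question; this only covers the copy-and-tag step.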

Anna Gat is super gung ho on memory, especially on it letting ChatGPT take on the role of therapist. It can tell you your MBTI and philosophy and lead you to insights about yourself and take different points of view and other neat stuff like that. I am skeptical that doing this is the best idea, but different people work differently.

On Your Marks

LM Arena launches a ‘search Arena’ leaderboard. Gemini 2.5 Pro is on top, with Perplexity-Sonar-Reasoning-Pro (high) slightly behind on presumably more compute.

A lot of getting good at using LLMs is figuring out how, or doing the necessary work, to give them the appropriate context. That includes you knowing that context too.

Research Quickly, There’s No Time

Well, well, what do we have here.
Anthropic: Today we’re launching Research, alongside a new Google Workspace integration. Claude now brings together information from your work and the web. Research represents a new way of working with Claude. It explores multiple angles of your question, conducting searches and delivering answers in minutes. The right balance of depth and speed for your daily work. Claude can also now connect with your Gmail, Google Calendar, and Google Docs. It understands your context and can pull information from exactly where you need it. Research is available in beta for Max, Team, and Enterprise plans in the United States, Japan, and Brazil. Separately, the Google Workspace integration is now available for all paid plans.
Oh. My. God. Huge if true! And by true I mean good at job. I’m excited for both features, but long term I’m more excited for Google integration than for research. Yes, this should 100% be Google Gemini’s You Had One Job, but Gemini is not exactly nailing it, so Claude? You’re up. Right now it’s read-only, and it’s been having trouble finding things and having proper access in my early tests, but I’m waiting until I try it more. Might be a few bugs to work out here.

Peter Wildeford:
Just tried “Claude Research”, it’s much faster (takes <1min) but much weaker than Gemini’s and ChatGPT’s “Deep Research” (fails to find key facts and sources that Gemini/ChatGPT do). I think it’s a great replacement for a Google quick dive but not for serious research. Claude integration with Google apps seems potentially awesome and I’d be curious how it compares against Gemini (which is surprisingly weak in this area given the home field advantage). But alas “Claude for Google Drive is not approved by Advanced Protection” so can’t yet try it. Still, I expect to use this a lot. Also LOL in the video how everyone compliments the meticulous sabbatical planning even though it was just a Claude copy+paste.

Choose Your Fighter

