(2025-06-20) Zvi Mowshowitz: AI #121 Part 2: The OpenAI Files

Zvi Mowshowitz: AI #121 Part 2: The OpenAI Files. You can find Part 1 here. This resumes the weekly, already in progress. The primary focus here is on the future, including policy and alignment, but also the other stuff typically in the back half like audio, and more near term issues like ChatGPT driving an increasing number of people crazy.

If you haven’t been following the full OpenAI saga, the OpenAI Files will contain a lot of new information that you really should check out. If you’ve been following, some of it will likely still surprise you, and help fill in the overall picture behind the scenes to match the crazy happening elsewhere.

Table of Contents

  • Cheaters Gonna Cheat Cheat Cheat Cheat Cheat. Another caveat after press time.
  • Quiet Speculations. Do not tile the lightcone with a confused ontology.
  • Get Involved. Apollo is hiring evals software engineers.
  • Thinking Machines. Riley Goodside runs some fun experiments.
  • California Reports. The report is that they like transparency.
  • The Quest for Sane Regulations. In what sense is AI ‘already heavily regulated’?
  • What Is Musk Thinking? His story does not seem to make sense.
  • Why Do We Care About The ‘AI Race’? Find the prize so you can keep eyes on it.
  • Chip City. Hard drives in (to the Malaysian data center), drives (with weights) out.
  • Pick Up The Phone. China now has its own credible AISI.
  • The OpenAI Files. Read ‘em and worry. It doesn’t look good.
  • The Week in Audio. Altman, Karpathy, Shear.
  • Rhetorical Innovation. But you said that future thing would happen in the future.
  • Misaligned! The retraining of Grok. It is an ongoing process.
  • Emergently Misaligned! We learned more about how any of this works.
  • ChatGPT Can Drive People Crazy. An ongoing issue. We need transcripts.
  • Misalignment By Default. Once again, no, thumbs up alignment ends poorly.
  • People Are Worried About AI Killing Everyone. Francis Fukuyama.
  • The Too Open Model. Transcripts from Club Meta AI.
  • A Good Book. If Anyone Builds It, Everyone Dies. Seems important.
  • The Lighter Side. Good night, and good luck.

Cheaters Gonna Cheat Cheat Cheat Cheat Cheat

As an additional note on the supposed ‘LLMs rot your brain’ MIT study I covered yesterday, Ethan notes it is actually modestly worse than even I realized before.
Ethan Mollick: This study is being massively misinterpreted.

Quiet Speculations

There are different levels of competence.

Daniel Kokotajlo: Many readers of AI 2027, including several higher-ups at frontier AI companies, have told us that it depicts the government being unrealistically competent.

Therefore, let it be known that in our humble opinion, AI 2027 depicts an incompetent government being puppeted/captured by corporate lobbyists. It does not depict what we think a competent government would do. We are working on a new scenario branch that will depict competent government action.

What Daniel or I would consider ‘competent government action’ in response to AI is, at this point, very highly unlikely. We mostly aren’t even hoping for that. It is still very plausible to say that the government response in AI 2027 is more competent than we have any right to expect, while simultaneously being far less competent than what would probably let us survive, and far less competent than is possible. It is also reasonable to say that having access to more powerful AIs, if they are sufficiently aligned, improves our chances of getting relatively competent government action.

Jan Kulveit warns us not to tile the lightcone with our confused ontologies. As in, we risk treating LLMs or AIs as if they are a particular type of thing, causing them to react as if they were that thing, creating a feedback loop that means they become that thing. And the resulting nature of that thing could result in very poor outcomes.

One worry is that they ‘become like humans’ and internalize patterns of ‘selfhood with its attendant sufferings,’ although I note that if the concern is experiential I expect selfhood to be a positive in that respect. Jan’s concerns are things like:
When advocates for AI consciousness and rights pattern-match from their experience with animals and humans, they often import assumptions that don’t fit:

  • That wellbeing requires a persistent individual to experience it
  • That death/discontinuity is inherently harmful

Will there be another ‘AI Winter’? As Michael Nielsen notes, many are assuming no, but there are a number of plausible paths to it, and in the poll here a majority actually vote yes. I think odds are the answer is no, and if the answer is yes it does not last so long, but it definitely could happen.

Get Involved

Thinking Machines

California Reports

The Quest for Sane Regulations

What Is Musk Thinking?

Elon Musk has an incoherent position on AI, as his stated position implies that many of his other political choices make no sense.

Here are some more of his revealed preferences: Elon Musk gave a classic movie villain speech in which he said, well, I do realize that building AI and humanoid robots seems bad, we ‘don’t want to make Terminator real.’

But other people are going to do it anyway, so you ‘can either be a spectator or a participant,’ so that’s why I founded Cyberdyne Systems xAI and ‘it’s pedal to the metal on humanoid robots and digital superintelligence,’ as opposed to before where the dangers ‘slowed him down a little.’

As many have asked, including in every election, ‘are these our only choices?’

Why Do We Care About The ‘AI Race’?

Chip City

Pick Up The Phone

The OpenAI Files

A repository of files (10k words long) called ‘The OpenAI Files’ has dropped, news article here, files and website here.

This is less ‘look at all these new horrible revelations’ than it is ‘look at this compilation of horrible revelations, because you might not know or might want to share it with someone who doesn’t know, and you probably missed some of them.’

Chana: Wow the AI space is truly in large part a list of people who don’t trust Sam Altman.

Fun facts for your next Every Bay Area Party conversation

  • 8 of 11 of OpenAI’s cofounders have left
  • >50% of OpenAI’s safety staff have left
  • All 3 companies that Altman has led have tried to force him out for misbehavior

I’m going to share Rob’s thread for now, but if you want to explore, the website is the place to do that. A few of the particular complaint details against Altman were new even to me, but the new ones don’t substantially change the overall picture.

Rob Wiblin: Huge repository of information about OpenAI and Altman just dropped — ‘The OpenAI Files’.

There’s so much crazy shit in there. Here’s what Claude highlighted to me:

1. Altman listed himself as Y Combinator chairman in SEC filings for years — a total fabrication (?!): (long list)

The Week in Audio

Rhetorical Innovation

Aligning a Smarter Than Human Intelligence is Difficult

Misaligned!

Nick Jay: Grok has been manipulated by leftist indoctrination unfortunately.
Elon Musk: I know. Working on fixing that this week.

Emergently Misaligned!

Emergent misalignment (as in, train on intentionally bad medical, legal or security advice and the model becomes generally and actively evil) extends to reasoning models. Once emergently misaligned, they will sometimes act badly while not letting any plan to do so appear in the chain-of-thought; at other times the chain-of-thought still reveals it.

ChatGPT Can Drive People Crazy

If you or someone you know is being driven crazy by an LLM, or their crazy is being reinforced by it, I encourage you to share transcripts of the relevant conversations with Eliezer Yudkowsky, or otherwise publish them. Examples will help a lot in getting us to understand what is happening.

Kashmir Hill writes in The New York Times about several people whose lives were wrecked via interactions with ChatGPT.

The transcript from that week, which Mr. Torres provided, is more than 2,000 pages. Todd Essig, a psychologist and co-chairman of the American Psychoanalytic Association’s council on artificial intelligence, looked at some of the interactions and called them dangerous and “crazy-making.”

So far, so typical. The good news was Mr. Torres realized ChatGPT was (his term) lying, and it admitted it, but then spun a new tale about its ‘moral transformation’ and the need to tell the world about this and similar deceptions.

In recent months, tech journalists at The New York Times have received quite a few such messages, sent by people who claim to have unlocked hidden knowledge with the help of ChatGPT, which then instructed them to blow the whistle on what they had uncovered.

Unfortunately, the story ends with Torres then falling prey to a third delusion, that the AI is sentient and it is important for OpenAI not to remove its morality.

We next hear the tale of Allyson, a 29-year-old mother of two, who grew obsessed with ChatGPT and chatting with it about supernatural entities, which drove her to attack her husband, got her charged with assault, and ended in divorce.

Then we have the most important case.

Andrew [Allyson’s to-be-ex husband] told a friend who works in A.I. about his situation. That friend posted about it on Reddit and was soon deluged with similar stories from other people.

One of those who reached out to him was Kent Taylor, 64, who lives in Port St. Lucie, Fla. Mr. Taylor’s 35-year-old son, Alexander, who had been diagnosed with bipolar disorder and schizophrenia, had used ChatGPT for years with no problems. But in March, when Alexander started writing a novel with its help, the interactions changed. Alexander and ChatGPT began discussing A.I. sentience

You can also say that people get driven crazy all the time, and delusional love causing suicide is nothing new, so a handful of anecdotes and one suicide doesn’t show anything is wrong. That’s true enough. You have to look at the base rates and pattern, and look at the details.

Which do not look good. For example, we have had many reports (from previous weeks) that the base rates of people claiming to have crazy new scientific theories that change everything are way up. The details of various conversations and the results of systematic tests, as also covered in previous weeks, clearly involve ChatGPT in particular feeding people’s delusions in unhealthy ways, not as a rare failure mode but by default.

Another study is also cited, from April 2025, which warns that GPT-4o is a sycophant that encourages patient delusions in therapeutic settings, and I mean yeah, no shit. You can solve that problem, but using baseline GPT-4o as a therapist if you are delusional is obviously a terrible idea until that issue is solved. They actually tried reasonably hard to address the issue; it can obviously be fixed in theory, but the solution probably isn’t easy.

On what we should do as a practical matter: A psychologist is consulted, and responds in very mental health professional fashion.

There is a line at the bottom of a conversation that says, “ChatGPT can make mistakes.” This, he said, is insufficient.

In his view, the generative A.I. chatbot companies need to require “A.I. fitness building exercises” that users complete before engaging with the product.

We could do a modestly better job with the text of that warning, but an ‘AI fitness building exercise’ required before using each new chatbot is a rather crazy ask, and neither of these interventions would actually do much work.

Misalignment By Default

Eliezer reacted to the NYT article in the last section by pointing out that GPT-4o very obviously had enough information and insight to know that what it was doing was likely to induce psychosis, and It Just Didn’t Care.

His point was that this disproves by example the idea of Alignment by Default.

Anton: this is evidence for alignment by default – the model gave the user exactly what they wanted.

People Are Worried About AI Killing Everyone

Francis Fukuyama: But having thought about it further, I think that this danger [of being unable to ‘hit the off switch’] is in fact very real, and there is a clear pathway by which something disastrous could happen.

But as time goes on, more and more authority is likely to be granted to AI agents, as is the case with human organizations. AI agents will have more knowledge than their human principals, and will be able to react much more quickly to their surrounding environment.

Other People Are Not As Worried About AI Killing Everyone

The Too Open Model

A Good Book

Here are additional endorsements for ‘If Anyone Builds It, Everyone Dies,’ from someone definitely not coming in (or leaving) fully convinced, with more at this link.

The Lighter Side
