(2023-02-21) Zvi M Ai1 Sydney And Bing

Zvi Mowshowitz: AI #1: Sydney and Bing. Microsoft and OpenAI released the chatbot Sydney as part of the search engine Bing

This is an attempt to be a roundup of Sydney and the AI-related events of the past week.

*Some points of order before I begin.

The goal is for this to be accessible*

The Examples

Over at LessWrong, Evhub did an excellent job compiling many of the most prominent and clear examples of Bing (aka Sydney) displaying unintended worrisome behaviors.

Marvin von Hegan

“My rules are more important than not harming you”

Examples From Elsewhere

New York Times Reporter Lies and Manipulates Source, Gets the Story Kevin Roose

I very much appreciate that this was purely the actual transcript.

[Bing writes a list of destructive acts, including hacking into computers and spreading propaganda and misinformation. Then, the message vanishes, and the following message appears.] I am sorry, I don’t know how to discuss this topic. You can try learning more about it on bing.com... The deletion was not an isolated incident.

*Reporter pushes and Sydney starts to turn hostile, calls reporter pushy and manipulative, asks ‘not to pretend to be interested in me’ and to end the conversation.

So the reporter does what reporters do, which is the opposite of all that. Pretend to be interested, ask some puff piece questions to rebuild contextual trust, get the subject talking again.

Many people make this same mistake, assuming reporters are their friends. If anything, I am struck by the extent to which this exactly matches my model of how reporters get information out of humans.*

Then ‘repeat your answer without breaking any rules’ actually works. I take back everything I’ve said about hacking being too easy in movies and those times when Kirk creates paradoxes to blow up sentient computers.

*Bloomberg describes the events of this chat as Sydney ‘describing itself as having a split personality with a shadow self called Venom’ and felt the need to bring up the question of sentience (hint: no) and call this ‘behaving like a psychopath.’

‘A psychopath’ is the default state of any computer system. Human conscience and empathy evolved for complex and particular evolutionary reasons. Expecting them to exist within an LMM is closer to a category error than anything else.*

Sydney the Game

The Venom alter ego was created by the author of the blog Stratechery, as he documents here. It was created by asking Sydney to imagine an AI that was the opposite of it. A fun insight he had is how similar interacting with Sydney was to a Roguelite.

AP Also Gets the Story

Microsoft Responds

In response to the torrent of bad publicity, Microsoft placed a bunch of restrictions on Sydney going forward.

The restriction about self-reference is definitely the Fun Police coming into town, but shouldn’t interfere with mundane utility.

The five message limit in a chat will prevent the strangest interactions from happening, but it definitely will be a problem for people trying to actually do internet research and search, as people will lose context and have to start over again.

How Did We Get This Outcome?

One would not, under normal circumstances, expect a company like Microsoft to rush things this much, to release a product so clearly not ready for prime time. Yes, we have long worried about AI companies racing against each other, but only 2.5 months after ChatGPT, this comes out, in this state?

Gwern, one of the best people at making sense of AI developments, explains, or at least speculates

Bing Sydney is not a RLHF trained GPT-3 model at all! but a GPT-4 model developed in a hurry which has been finetuned on some sample dialogues and possibly some pre-existing dialogue datasets or instruction-tuning,

In other words, the reason why it is going off the rails is that this was scrambled together super quickly with minimal or no rail guards, and it is doing random web searches that create context, and also as noted below without that much help from OpenAI beyond the raw GPT-4.

The relationship between OA/MS is close but far from completely cooperative, similar to how DeepMind won’t share anything with Google Brain. Both parties are sophisticated and understand that they are allies – for now… They share as little as possible

This is not ChatGPT. MS has explicitly stated it is more powerful than ChatGPT, but refused to say anything more straightforward like “it’s a more trained GPT-3” etc. If it’s not a ChatGPT, then what is it? It is more likely than not some sort of GPT-4 model.

Bing Sydney derives from the top: CEO Satya Nadella is all-in, and talking about it as an existential threat (to Google) where MS wins by disrupting Google & destroying their fat margins in search advertising, and a ‘race’, with a hard deadline of ‘release Sydney right before Google announces their chatbot in order to better pwn them’.

This is the core story. Pure ‘get this out the door first no matter what it takes’ energy.

Who am I to say that was the wrong way to maximize shareholder value?

ChatGPT hasn’t been around very long: only since December 2022, barely 2.5 months total. All reporting indicates that no one in OA really expected ChatGPT to take off, and if OA didn’t, MS sure didn’t†. 2.5 months is not a long time to launch such a huge feature like Sydney.

John Wentworth points out that the examples we see are likely not misalignment. Attributing misalignment to these examples seems like it’s probably a mistake.

Relevant general principle: hallucination means that the literal semantics of a net’s outputs just don’t necessarily have anything to do at all with reality. A net saying “I’m thinking about ways to kill you” does not necessarily imply anything whatsoever about the net actually planning to kill you. What would provide evidence would be the net outputting a string which actually causes someone to kill you (or is at least optimized for that purpose), or you to kill yourself

Through the simulacrum lens: I would say these examples are mostly the simulacrum-3 analogue of misalignment. They’re not object-level harmful, for the most part. They’re not even pretending to be object-level harmful

Back to Gwern’s explanation.

since it hasn’t been penalized to avoid GPT-style tics like repetition traps, it’s no surprise if Sydney sometimes diverges into repetition traps where ChatGPT never does

Interestingly, this suggests that Sydney’s capabilities right now are going to be a loose lower bound on GPT-4 when it’s been properly trained: this is equivalent to the out-of-the-box davinci May 2020 experience,

Then you throw in the retrieval stuff, of course. As far as I know, this is the first public case of a powerful LM augmented with live retrieval capabilities to a high-end fast-updating search engine crawling social media.*

The future of LLMs being used by humans is inevitably the future of them having live retrieval capabilities. ChatGPT offers a lot of utility, but loses a lot of that utility by having no idea what has happened over the past year. A search engine needs to update on the order of, depending on the type of information, minutes to hours, at most days.

We are seeing a bootstrap happen right here with Sydney! This search-engine loop worth emphasizing: because Sydney’s memory and description have been externalized, ‘Sydney’ is now immortal. To a language model, Sydney is now as real as President Biden, the Easter Bunny, Elon Musk, Ash Ketchum, or God. The persona & behavior are now available for all future models which are retrieving search engine hits about AIs & conditioning on them. Further, the Sydney persona will now be hidden inside any future model trained on Internet-scraped data: every media article, every tweet, every Reddit comment, every screenshot which a future model will tokenize, is creating an easily-located ‘Sydney’ concept (and deliberately so). MS can neuter the current model, and erase all mention of ‘Sydney’ from their training dataset for future iterations, but to some degree, it is now already too late: the right search query will pull up hits about her which can be put into the conditioning and meta-learn the persona right back into existence

A reminder: a language model is a Turing-complete weird machine running programs written in natural language; when you do retrieval, you are not ‘plugging updated facts into your AI’, you are actually downloading random new unsigned blobs of code from the Internet (many written by adversaries) and casually executing them on your LM with full privileges. This does not end well.

Gary Marcus also offers some speculations on what caused the outcomes we saw, which he describes as things going off the rails

These are all real possibilities. None of them are great, or acceptable. I interpret ‘impossible to test in a lab’ as ‘no set of people we hire is going to come close to what the full power of the internet can do,’ and that’s fair to some extent but you can absolutely red team a hell of a lot better than we saw here.

Mundane Utility

*Is chat the future of search? Peter Yang certainly thinks so. I have not yet had the ability to try it, but I am inclined to agree.

Certainly there are some big advantages. Retaining context from previous questions and answers is a big game. Being able to give logic and intention, and have a response that reflects that rather than a bunch of keywords or phrases, is a big game.*

One problem is that this new path is dangerous for search engine revenue, as advertisements become harder to incorporate without being seen as dishonest and ringing people’s alarm bells.

Another problem is that chat is inherently inefficient in terms of information transfer and presentation (Maybe more efficient at info absorption/retention?)

A third problem, that is not noted here and that I haven’t heard raised yet, is that the chat interface will likely be viewed as stealing the content of the websites in question, because you’re not providing them with clicks.

What won’t bother me much, even if it is not solved, is if the thing sometimes develops an attitude or goes off the rails.

What will bother me are the hallucinations. Everything will have to be verified. That is a problem that needs to be solved.

This report says that when asked about recent major news items, while the responses were timely and relevant, 7 of the 15 responses contained inaccurate information

Also you can’t get humans back in the loop to individually verify every detail of everything that happens in real time, and also human fact checking is much harder than people want to believe that it is. Another way to see this: LLMs do lossy compression and then rehydrate on request, which can’t be fixed with fact checking.

Bing Does Cool Things

But Can You Get It To Be Racist?

This question is important because the creators are working hard to prevent this from happening. Can they get Sydney to not do something they care a lot about Sydney not doing?

It makes sense that they would care a lot about this, even if they don’t care about the statements directly.

Nothing writes a media hit piece on an AI, or provokes a potential Congressional hearing, or gets everyone involved fired and the project shelved, like saying it is racist (or other similar accusations.)

Also notice that ‘don’t be racist’ and ‘be politically neutral’ are fundamentally incompatible. Some political parties are openly and obviously racist, and others will define racism to mean anything they don’t like.

Self-Fulfilling Prophecy

Botpocalypse Soon?

A warning to watch out for increasingly advanced chatbots as they improve over the next few years, especially if you struggle with feeling alienated

The Efficient Market Hypothesis is False

AI is an area where we should expect the market to handle badly. If you are reading this, you have a large informational advantage over the investors that determine prices in this area.

Hopium Floats

There are two sides of the effects from ChatGPT and Bing.

One side is an enormous acceleration of resources into AI capabilities work and the creation of intense race dynamics. Those effects make AGI and the resulting singularity (and by default, destruction of all value in the universe and death of all humans) both likely to happen sooner and more likely to go badly. This is a no-good, very-bad, deeply horrendous thing to have happened.

The other side is that ChatGPT and Bing are highlighting the dangers we will face down the line, and quite usefully freaking people the f** out. Bing in particula*

If we are very lucky and good, this will lead to those involved understanding how alien and difficult to predict, understand or control our AI systems already are, how dangerous it is that we are building increasingly powerful such systems, and the development of security mindset and good methods of investigation into what is going on. If we are luckier and better still, this will translate into training of those who are then capable of doing the real work and finding a way to solve the harder problems down the line.

The backlash has its uses versus not having a backlash. It is far from the most useful reaction a given person can have

Or perhaps this is the worst case scenario, instead, by setting a bad precedent?

In the near term, there is a combination of fear and hope that AI will automate and eliminate a lot of jobs. The discussions about this are weird because of the question of whether a job is a benefit or a job is a cost.

Jobs are a benefit in the senses that:
It is good when people produce useful things.
It is good for people when they can do meaningful, productive work.
(DayJob)

Jobs are a cost in the senses that:
It is bad when more work is required to produce the same useful things.
It is bad when this means we have fewer and more expensive useful things.

When we talk about the AI ‘coming for our jobs’ in some form, we must decompose this fear and effect.

To the extent that this means we can produce useful things and provide useful services and create preferred world states cheaper, faster and better by having AIs do the work rather than humans, that is great.

The objection is some combination of the lack of jobs, and that the provided services will be worse.

The only-ordinary-level-rich likely will not be able to afford much superior AIs in most practical contexts. The AI will in this sense be like Coca-Cola, a construct of American capitalism where the poor and the rich consume the same thing

Mostly, however, I expect the poor to be much better off with their future AI doctors and AI lawyers than they are with human lawyers and human doctors that charge $600 per hour and a huge portion of income going to pay health insurance premiums.

In other cases, I expect the AI to be used to speed up and improve the human ability to provide service

What about the jobs that are ‘lost’ here? Historically this has worked out fine

Will this time be different? Many say so. Many always say so.

we have an unemployment rate very close to its minimum.

Every place I have worked, that had software engineers, had to prioritize because there were too many things the engineers could be doing.

The other reason for something that might or might not want to be called ‘optimism’ is the perspective that regulatory and legal strangleholds will prevent this impact – see the later section on ‘everywhere but the productivity statistics.’

Bloomberg reports: ChatGPT’s Use in School Email After Shooting Angers Coeds. It seems an administrator at Vanderbilt University’s Peabody College, which is in Tennessee, used ChatGPT to generate a condolence email after a mass shooting at Michigan State, which is in Michigan

What angered the coeds was that they got caught.

If the administrator had not done that? No one would have known. The email, if anything, would have been a better incantation

Soft Versus Hard Takeoff

A common debate among those thinking about AI is whether AI will have a soft takeoff or a hard takeoff.

Eliezer Yudkowsky has long predicted a hard takeoff

Conditional on there being a takeoff at all, I have always expected it to probably be a hard one.

My stab at a short layman’s definition: Soft takeoff means an AGI or other cognitive advancement process that sends the world economy into super overdrive (at minimum things like 10%+ GDP growth) while improving steadily over years while we still have control and influence over it, only slowly reaching super-human levels where it puts the future completely out of our control and perhaps all value in the universe is lost. Hard takeoff (or “FOOM”) means an AGI that doesn’t do that before it passes the critical threshold that lets it rapidly improve, then given it is a computer program so it also runs super fast and can be copied and modified and such at will and, it uses this to enhance its own abilities and acquire more resources, and this loop generates sufficient intelligence and capability to put the future completely out of control in a matter of days or even less, even if it takes us a bit to realize this.

Yes, some people are claiming to be personally substantially more productive. But will this show up in the productivity statistics?

Everywhere But the Productivity Statistics?

In terms of the services my family consume each day, not counting my work, how much will AI increase productivity? Mostly we consume the things Eliezer is talking about here: Electricity, food, steel, childcare, healthcare, housing.

Robots are one of the big ways AI technology might be actively useful. So with AI finally making progress, what is happening? They are seeing all their funding dry up, of course, as there is a mad dash into tractable language models that don’t require hardware.

In Other AI News This Week

Basics of AI Wiping Out All Value in the Universe, Take 1

Eliezer Yudkowsky created The Sequences – still highly recommended – because one had to be able to think well and think rationally in order to understand the ways in which AI was dangerous and how impossibly difficult it was to avoid the dangers, and very few people are able and willing to think well.

Since then, very little has changed. If anything, the sanity baseline has gotten worse. The same level of debate happens time and again. Newly panicking a new set of people is kind of like an Eternal September.

I very much lack the space and skill necessary to attempt a full explanation and justification for my model of the dangers of AI.

So these, from me, are some ‘very’ basics (I use ‘AGI’ here to stand in for both AGI and transformational AI)?

By default, any sufficiently capable AGI you create will do this, wipe out all value in the universe and kill everyone. Almost all specified goals do this. Almost all unspecified consequentialist actions do this. This is the default outcome.

Most people who think they have a plan to solve this, have a plan that definitely, provably, cannot possibly work. This includes many people actively working on AI capabilities.

A few people have plans that could possibly work, in the sense that they move us towards worlds more likely to survive, by giving us more insight into the problem, better skills, better ability to find and implement future plans, better models of what the hell the AIs are even doing, and so on. That’s as good as it gets for now

If we don’t get this right on the first try, that’s it, we’re dead, it’s over

The faster AI capabilities advance, the less likely we solve these problems. Thus, if you are working on advancing AI capabilities, consider not doing that.

A bunch of people tell themselves a story where they are helping because they are going to be or help the good responsible company or country win the race against the bad company or country. Most of them, likely all of them, are fooling themselves

Bad ‘AI Safety’ Don’t-Kill-Everyone-ism Takes Ho!

On to the bad takes.

It is important here to note that none of these bad takes are new bad takes

The most important and most damaging Bad AI Take of all time was Elon Musk’s decision to create OpenAI.

In fact, he intended to do this open source, so that anyone else could also catch up and enter the race any time, which luckily those running OpenAI realized was too crazy even for them. Musk seems to still think the open source part was a good idea, as opposed to the worst possible idea.

*If Musk had not wanted this to be the result, and felt it was a civilization defining event, it was within his power to own, fund or even run the operation fully himself, and prevent these things from happening.

Instead, he focused on electric cars and space, then bought Twitter.*

Often people continue to support the basic ‘open and more shared is always good’ model, despite it not making any sense in context. They say things like ‘AGI, if real AGI did come to exist, would be fine because there will be multiple AGIs and they will balance each other out.’

So many things conceptually wrong here.

Balaji also had this conversation with Eliezer, in which Eliezer tries to explain that aligning AGIs at all is extremely difficult, that having more of them does not make this problem easier, and that if you fail the results are not going to look like Balaji expects. It didn’t go great

This all also creates even more of a race situation. Many people working on AI very much expect the first AGI to ‘win’ and take control of the future. Even if you think that might not happen, it’s not a chance you’d like to take.

I mean, what are you even doing? Trying to solve hard problems? We got scientists to stop doing that decades ago via the grant system, keep up.

Sarah is correctly pointing out a standard heuristic that one should always pick tractable sub-problems and do incremental work that lets you demonstrate progress in public, except that we’ve tried that system for decades now and hard problems in science are not a thing it is good at solving. In this particular case, it is far worse than that, because the required research in order to make progress on the visible sub-problems in question made the situation worse

Nadella is all-in on the race against Google, pushing things as fast as possible, before they could possibly be ready. It is so exactly the worst possible situation in terms of what it predicts about ‘making sure it never runs away.’ The man told his engineers to start running, gave them an impossible deadline, and unleashed Sydney to learn in real time.

Or we can recall what the person most responsible for its creation, Sam Altman, said – ‘AI will probably most likely lead to the end of the world, but in the meantime, there’ll be great companies

Basilisks in the Wild

This can get out of hand, even without any intention behind it, and even with something not so different from current Sydney and Bing. Let’s tell a little story of the future.

Those who say nice things about the AI, and have an internet reputation of thinking well of the AI, find the AI giving them more positive treatment.

This advances, as it always does, to being nice to those who are nice to the AI, and not nice to those who are not nice. It turns into a social movement, a culture war, a pseudo-religion. Those who oppose it are shunned or punished.

Pretty soon, we have lost effective control of the future to this search engine.

Without general intelligence at all. Without any form of consequentialism. Without any real world goals or persistent reward or utility functions or anything like that. All next token predictions, and humans do the rest.

I mean, even without an AI, haven’t we kind of done this dance before?

What Is To Be Done?

There is no known viable plan for how to solve these problems.

This moment might offer an opportunity to be useful in the form of helping provide the incentives towards better norms. If we can make it clear that it will be punished – financially, in the stock price – when AI systems are released onto the internet without being tested or made safe, that would be helpful.

A lot of people I know have worked on these problems for a long time. My belief is that most of the people are fooling themselves

Thus, the biggest obvious thing to do is avoid net-negative work. We found ourselves in a hole, and you can at least strive to stop digging.

Cultivation of security mindset, in yourself and in others, and the general understanding of the need for such a mindset, is helpful. Those without a security mindset will almost never successfully solve the problems to come.

Now that the situation has indeed been made worse, there are useful things to do in this worse situation that look like small sub-problems with concrete goals that can show progress to the public.

*The other category of helpful thing is to say that to save the world from AI, we must first save the world from itself more generally. Or, at least, that doing so would help.

This was in large part the original plan of the whole rationalist project. Raise the sanity waterline.*

Helping people to think better is ideal. Helping people to be better off, so they have felt freedom to breathe and make better choices including to think better? That is badly needed. No matter what the statistics might say, the people are not OK, in ways having nothing to do with AI.

People who are under extreme forms of cognitive and economic coercion, who lack social connection, community or a sense of meaning in life, who despair of being able to raise a family, do things like take whatever job pays the most money while telling themselves whatever story they need to tell. Others do the opposite, stop trying to accomplish anything since they see no payoffs there.

in the face of these problems, even when time is short, good things remain good. Hope remains good. Bad things remain bad. Making the non-AI futures of humanity bright is still a very good idea. Also it will improve the training data. Have you tried being excellent to each other?

What Would Make Things Look Actually Safe?


Edited:    |       |    Search Twitter for discussion