(2017-11-21) Yudkowsky Hero Licensing
Eliezer Yudkowsky on Hero Licensing. I expect most readers to know me either as MIRI's co-founder and the originator of a number of the early research problems in AI alignment, or as the author of Harry Potter and the Methods of Rationality, a popular work of Harry Potter fanfiction. I’ve described how I apply concepts in Inadequate Equilibria to various decisions in my personal life, and some readers may be wondering how I see these tying in to my AI work and my fiction-writing. And I do think these serve as useful case studies in inadequacy, exploitability, and modesty.
As a supplement to Inadequate Equilibria, then, the following is a dialogue that never took place—largely written in 2014, and revised and posted online in 2017.
i. Outperforming and the outside view
eliezer-2010: I’m trying to write a nonfiction book on rationality. The blog posts I wrote on Overcoming Bias—I mean Less Wrong—aren’t very compact or edited, and while they had some impact, it seems like a book on rationality could reach a wider audience and have a greater impact.
pat: It looked, in fact, like Harry Potter fanfiction. Like, I’m pretty sure I saw the words “Harry” and “Hermione” in configurations not originally written by J. K. Rowling.
pat: Excuse me if this is a silly question. I don’t mean to say that Harry Potter fanfiction is bad—in fact I’ve read quite a bit of it myself—but as I understand it, according to your basic philosophy the world is currently on fire and needs to be put out. Now given that this is true, why are you writing Harry Potter fanfiction, rather than doing something else?
stranger: Pat and Eliezer-2010, I think the two of you are having some trouble communicating. The two of you actually disagree much more than you think.
Eliezer has a heuristic of planning on the mainline, which means that his primary justification for anything will be phrased in terms of how it positively contributes to a “normal” future timeline, not low-probability side-scenarios.
stranger: Eliezer-2010 also has a heuristic that might be described as “never try to do anything unless you have a chance of advancing the Pareto frontier of the category.”
So, off-hours or not, Eliezer wouldn’t be working on this story if he thought it would be strictly dominated along every dimension by any other work of fanfiction, or indeed, any other book.
pat: Um—
eliezer: I wouldn’t put it in exactly those terms.
stranger: Yes, because when you say things like that out loud, people start saying the word “arrogance” a lot.
The fact that he’s working on it at all lets you infer that Eliezer-2010 thinks Methods has a chance of being outstanding along some key dimension that interests him—of advancing the frontiers of what has ever been done—although he might hesitate to tell you that before he’s actually done it.
eliezer: Okay, yes, that’s true. I’m unhappy with the treatment of supposedly “intelligent” and/or “rational” characters in fiction and I want to see it done right just once, even if I have to write the story myself.
pat: Can you say more about how you think your Harry Potter story will have outstandingly “intelligent” characters?
I don’t think the concept of “intelligence” or “rationality” that’s being used in typical literature has anything to do with discerning good choices or making good predictions. I don’t think there is a standard literary concept for characters who excel at cognitive optimization, distinct from characters who just win because they have a magic sword in their brains.
with respect to sufficiently competent individuals making decisions that they can make on their own cognizance—as opposed to any larger bureaucracy or committee, or the collective behavior of a field—it is often appropriate to ask if they might be smarter than you think, or have better justifications than are obvious to you.
nonfiction writing conveys facts; fiction writing conveys experiences. I’m worried that my previous two years of nonfiction blogging haven’t produced nearly enough transfer of real cognitive skills. The hope is that writing about the inner experience of someone trying to be rational will convey things that I can’t easily convey with nonfiction blog posts.
crafting a Harry Potter story with, you hope, exceptionally rational characters. Which will cause some of your readers to absorb the experience of being rational. Which you think eventually ends up important to saving the world.
in my experience, though, people who use the phrase “outside view” usually don’t offer advice that I think is true
ask you to consider what the average story with a rational character in it accomplishes in the way of skill transfer to readers.
eliezer: I’m not trying to write an average story. The whole point is that I think the average story with a “rational” character is screwed up.
A. E. van Vogt’s The World of Null-A was an inspiration to me as a kid. Null-A didn’t just teach me the phrase “the map is not the territory”; it was where I got the idea that people employing rationality techniques ought to be awesome, and that if they weren’t awesome, that meant they were doing something wrong.
It appears to me that since the “outside view” as usually invoked is really about status hierarchy, signs of disrespecting the existing hierarchy will tend to provoke stronger reactions, and disrespectful-seeming claims that you can outperform some benchmark will be treated as much larger factors predicting failure than respectful-seeming claims that you can outperform an equivalent benchmark.
This illustrates a critical life lesson about the difference between making obeisances toward a field by reading works to demonstrate social respect, and trying to gather key knowledge from a field so you can advance it. The latter is necessary for success; the former is primarily important insofar as public relations with gatekeepers is important. I think that people who aren’t status-blind have a harder time telling the difference.
What you’re currently doing is what I call “demanding to see my hero license.” Roughly, I’ve declared my intention to try to do something that’s in excess of what you think matches my current social standing, and you want me to show that I already have enough status to do it.
Eliezer-2010 doesn’t use PredictionBook as often as Gwern Branwen, doesn’t play calibration party games as often as Anna Salamon and Carl Shulman, and didn’t join Philip Tetlock’s superforecasting study. But I did make bets whenever I had the opportunity, and still do; and I try to set numeric odds whenever I feel uncertain and know I’ll find out the true value shortly.
ii. Success factors and belief sharing
By picking the right set of factors to “elicit,” someone can easily make people’s “answers” come out as low as desired. As an example, see van Boven and Epley’s “The Unpacking Effect in Evaluative Judgments.” The problem here is that people... how can I compactly phrase this... people tend to assign median-tending probabilities to any category you ask them about, so you can very strongly manipulate their probability distributions by picking the categories for which you “elicit” probabilities.
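A toy illustration of that category-picking effect (a minimal sketch; the numbers below are invented for illustration, not taken from van Boven and Epley’s data): if respondents give any named category a median-tending probability rather than reasoning about the whole partition, then simply re-listing the alternatives as more named sub-categories shrinks the focal hypothesis’s apparent share.

```python
# Hypothetical elicitation showing how the choice of categories drives the answer.

# Packed framing: the focal hypothesis vs. a single "everything else" bucket.
packed = {"focal": 0.5, "everything_else": 0.5}

# Unpacked framing: the same "everything else" split into four named
# alternatives, each of which respondents still rate at a median-ish ~0.3.
unpacked_raw = {"focal": 0.5, "alt_1": 0.3, "alt_2": 0.3, "alt_3": 0.3, "alt_4": 0.3}

# Normalize so the "elicited" numbers form a probability distribution.
total = sum(unpacked_raw.values())
unpacked = {k: v / total for k, v in unpacked_raw.items()}

print(packed["focal"])               # 0.5
print(round(unpacked["focal"], 2))   # ~0.29 -- lower, purely from re-listing the alternatives
```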
I’m always careful to avoid the “I shall helpfully break down this proposition into a big conjunction and ask you to assign each term a probability” trick.
Its only real use, at least in my experience, is that it’s a way to get people to feel like they’ve “assigned” probabilities while you manipulate the setup to make the conclusion have whatever probability you like. (Decision Rationalizing)
The multiple-stage fallacy is an amazing trick, by the way. You can ask people to think of key factors themselves and still manipulate them really easily into giving answers that imply a low final answer, because so long as people go on listing things and assigning them probabilities, the product is bound to keep getting lower.
It may be wise to list out many possible failure scenarios and decide in advance how to handle them—that’s Murphyjitsu (pre-mortem)—but if you start assigning “the probability that X will go wrong and not be handled, conditional on everything previous on the list having not gone wrong or having been successfully handled,” then you’d better be willing to assign conditional probabilities near 1 for the kinds of projects that succeed sometimes.
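As a rough arithmetic sketch of why the product keeps shrinking (the stage count and probabilities here are made up for illustration, not drawn from the dialogue): even generous-looking per-stage numbers multiply out to a small joint probability, so the stage estimates for a project that succeeds a fair fraction of the time have to sit near 1.

```python
# Ten stages, each assigned a seemingly generous 0.9 chance of
# "going right or being handled": the product is already small.
p_per_stage = 0.9
n_stages = 10
print(round(p_per_stage ** n_stages, 2))   # ~0.35

# Conversely, for a project that actually succeeds about half the time,
# ten listed stages would each need a conditional probability of roughly
# 0.5 ** (1/10), i.e. about 0.93 -- "near 1".
print(round(0.5 ** (1 / n_stages), 3))     # ~0.933
```

Unless most of the listed conditionals are assigned values close to 1, the break-it-down procedure predetermines a low final answer regardless of the project’s actual prospects.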
I believe the technical term for the methodology is “pulling numbers out of your ass.” It’s important to practice calibrating your ass numbers on cases where you’ll learn the correct answer shortly afterward. It’s also important that you learn the limits of ass numbers, and don’t make unrealistic demands on them by assigning multiple ass numbers to complicated conditional events.
If there’s a way to produce genuinely, demonstrably superior judgments using some kind of break-it-down procedure, I haven’t read about it in the literature and I haven’t practiced using it yet.
The argument against your success in Harry Potter fanfiction seems to me about as strong as any argument the outside-view perspective might make.
stranger: Oh, we aren’t disputing that.
pat: You aren’t?
stranger: That’s the whole point, from my perspective. If modest epistemology sounds persuasive to you, then it’s trivial to invent a crushing argument against any project that involves doing something important that hasn’t been done in the past. Any project that’s trying to exceed any variety of civilizational inadequacy is going to be ruled out.
hidden purpose #7 of the Less Wrong Sequences—to provide an earnest-token of all the techniques I couldn’t show. All I can tell you is that everything you’re so busy worrying about is not the correct thing for me to be thinking about. That your entire approach to the problem is wrong. It is not just that your arguments are wrong. It is that they are about the wrong subject matter.
I can say that you ought to discard all thoughts from your mind about competing with others. The others who’ve come before you are like probes, flashes of sound, pingbacks that give you an incomplete sonar of your problem’s difficulty. Sometimes you can swim past the parts of the problem that tangled up other people and enter a new part of the ocean. Which doesn’t actually mean you’ll succeed; all it means is that you’ll have very little information about which parts are difficult.
iii. Social heuristics and problem importance, tractability, and neglectedness
I'll mention as an aside that talk of “Friendly” AI has been going out of style where I’m from. We’ve started talking instead in terms of “aligning smarter-than-human AI with operators’ goals,” mostly because “AI alignment” smacks less of anthropomorphism than “friendliness.”
status hierarchy maintenance
There’s a world in which some scruffy outsider like you wouldn’t be able to estimate a significant chance of making a major contribution to AI alignment, let alone help found the field, because people had been trying to do serious technical work on it since the 1960s, and were putting substantial thought, ingenuity, and care into making sure they were working on the right problems and using solid methodologies.
Functional decision theory was developed in 1971, two years after Robert Nozick’s publication of “Newcomb’s Problem and Two Principles of Choice.”
Nobody assumes you can “just pull the plug” on something much smarter than you are. And the world's other large-scale activities and institutions all scale up similarly in competence.
We could call this the Adequate World, and contrast it to the way things actually are.
The Adequate World has a property that we could call inexploitability; or inexploitability-by-Eliezer.
They still make mistakes in the Adequate World, because they’re not perfect. But they’re smarter and nicer at the group level than Eliezer Yudkowsky, so you can’t know which things are epistemic or moral mistakes.
Suppose that you have an instinct to regulate status claims, to make sure nobody gets more status than they deserve.
eliezer: Okay...
stranger: This gives rise to the behavior you’ve been calling “hero licensing.”
Your model of heroic status is that it ought to be a reward for heroic service to the tribe. You think that while of course we should discourage people from claiming this heroic status without having yet served the tribe, no one should find it intuitively objectionable to merely try to serve the tribe, as long as they’re careful to disclaim that they haven’t yet served it and don’t claim that they already deserve the relevant status boost.
It’s fine for “status-blind” people like you, but it isn’t how the standard-issue status emotions work. Simply put, there’s a level of status you need in order to reach up for a given higher level of status; and this is a relatively basic feeling for most people, not something that’s trained into them.
Since your current status in the relevant hierarchy seems much lower than that, you aren’t allowed to endorse the relevant probability assignments or act as though you think they’re correct. You are not allowed to just try it and see what happens, since that already implies that you think the probability is non-tiny. The very act of affiliating yourself with the possibility is status-overreaching, requiring a slapdown.
But how do we get from there to delusions of civilizational adequacy?
Backward chaining of rationalizations, perhaps mixed with some amount of just-world and status-quo bias.
Pat tries to preserve the idea of an inexploitable-by-Eliezer market in fanfiction (since on a gut level it feels to him like you’re too low-status to be able to exploit the market), and comes up with the idea that there are a thousand other people who are writing equally good Harry Potter fanfiction. The result is that Pat hypothesizes a world that is adequate in the relevant respect.
And the phenomenon generalizes. If someone believes that you don’t have enough status to make better predictions than the European Central Bank, they’ll have to believe that the European Central Bank is reasonably good at its job. Traditional economics doesn’t say that the European Central Bank has to be good at its job—an economist would tell you to look at incentives, and that the decisionmakers don’t get paid huge bonuses if Europe’s economy does better.
For the world’s status order to be unchallengeable, it has to be right and wise; for it to be right and wise, it has to be inexploitable. A gut-level appreciation of civilizational inadequacy is a powerful tool for dispelling mirages like hero licensing and modest epistemology, because when modest epistemology backward-chains its rationalizations for why you can’t achieve big things, it ends up asserting adequacy.
The notion of an Adequate World more closely matches the intuitive sense that the world’s most respectable and authoritative people are just untouchable—too well-organized, well-informed, and well-intentioned for just anybody to spot Moloch’s handiwork, whether or not they can do anything about it.
pat: I mean, there are all sorts of barriers I could imagine a typical academic running into if they wanted to work on AI alignment. Maybe it’s just hard to get academic research grants for this kind of work.
maude: If it’s hard to get grants, then that’s because the grant-makers correctly recognize that this isn’t a priority problem.
pat: So now the state of academic funding is said to be so wise that people can’t find neglected research opportunities?
You sound like you’re talking about a silent conspiracy of competent grantmakers at a hundred different organizations, who have in some way collectively developed or gained access to a literature of strategic and technical research that Nick Bostrom and I have never heard about.
These are very different hypotheses, but they share this property: they allow there to be something like an efficient market in high-value research, where individuals and groups that have high status in the standard academic system can’t end up visibly dropping the ball.
No real economist would tell us to expect an efficient market here.