The Three Goals, Game Theory, and Western Civilization

June 10, 2007

A while back, I wrote about the possibility of updating the Three Laws of Robotics as goals in order to make them a more practical means of getting at a friendly artificial general intelligence. This kicked off some interesting discussion, including some debate as to whether my “goals” aren’t really just rules rephrased. In which case, the argument went, they probably wouldn’t help all that much. Michael Anissimov commented:

What would work better would be transferring over the moral complexity that you used to make up these goals in the first place.

Also, as you point out, these goals are vague. More specific and useful from a programmer’s perspective would be some kind of algorithm that takes human preferences as inputs and outputs actions that practically everyone sees as reasonable and benevolent. Hard to do, obviously, but CEV (http://www.singinst.org/upload/CEV.html) is one attempt.

That’s really the crux. Moral complexity does exist in algorithmic form…within our brains. And that goes to the difference between laws and goals. My goals are what I’m trying to do, both morally and in other areas. There are some sophisticated software programs running in my brain made up of things that I’ve been taught, things I’ve figured out for myself, and things that are built in. All of these add up to provide me the tendency to act a certain way in a certain situation. The strategies that drive that software are my moral goals.

Laws, on the other hand, exist outside of myself. I am not specifically programmed to do unto others as I would have them do unto me. I have some tendencies in that direction, but there’s nothing stopping me from acting otherwise, and — let’s face it — I often do. I have tendencies to be nice, fair, just, etc., but I also have tendencies to try to get what I want, to get even with those who have wronged me, to try to be a bigshot, and so on. These tendencies compete with each other, and my behavior overall is some rough compromise.

An artificial general intelligence (AGI) built as a reverse-engineered human intelligence would be in the same position. It would have the “moral complexity” Michael mentioned, but also the baggage of competing tendencies. You could no more guarantee such an intelligence’s compliance with a rule or set of rules than you could a human being’s.

A law like the Golden Rule is a high-level abstraction of certain strategies (algorithms) that produce a desired set of results. On a conscious level, I can use that abstraction to determine whether my behavior is where I want it to be:

Wife complained of being chilly when I got up at 5:00 AM to work out. Covered her with blanket. Good.

Sped up on highway in attempt to keep a guy trying to merge from going ahead of me. Not so good.

Commenter on blog revealed that he doesn’t really understand the subject at hand. Ripped him to shreds. Bad.

Through discipline and practice, I can “program myself” with it to try to move my tendencies in that direction. But I can’t write it into my moral source code and set it as an unbreakable behavioral rule. That’s partly because it’s too vague and partly because I simply lack that capability.

Presumably, I could be externally constrained always to follow the Golden Rule, no matter what. If my actions were being constantly monitored, and I was told that I would be killed immediately upon violating the rule…I’d certainly do my best, now wouldn’t I?

Still, I’d have a hard time believing that anyone holding me in such a position was much of a practitioner of that rule him or herself. If the people trying to enforce the rule on me in this manner told me that it was for my own good — that they were trying to make me a better person — I don’t know that I’d buy it. And if I figured out that they were only doing this to protect themselves from harm I might do to them, I think I would be pretty annoyed with them (to say the least).

I would expect a reverse-engineered human intelligence to feel the same way, so I don’t think attempting to constrain an AGI in such a manner would be a particularly good idea, especially not if we have a reasonable expectation that it will eventually be smarter and more powerful than we are. On the other hand, if we let it use the process I described above — evaluating its own behavior against a defined standard — an AGI might achieve far better results than I have, if only because it can think faster and would have much more subjective time in which to act. This is the notion of recursive self-improvement that matoko kusanagi referred to. The trouble with recursive self-improvement on its own, as Eliezer Yudkowsky and others have pointed out, is that if the AI starts “improving” in a direction that’s bad for humanity, things could get out of hand pretty quickly.

If the artificial intelligence is a modified version of human intelligence, or a new intelligence built from scratch, we raise the possibility of building a moral structure into the intelligence, rather than trying to enforce it from outside. That’s the idea behind the Three Laws and my Three Goals — that they would somehow be built in. But they certainly can’t be built in in anything like their current form. Michael Sargent (and others) pointed out a weakness of that approach, namely that the less important goals have to take a back seat to the more important ones:

Each Goal must have a clear and unbreakable priority over the others that follow it and thus, in the order stated, collective continuity trumps individual safety (“The needs of the many outweigh the needs of the few, or the one.”), individual safety (broadly construed, ‘stasis’) trumps individual liberty (‘free will’), and happiness (‘utility’, a notoriously slippery concept for economists and philosophers to get a firm intellectual grip on) trumps both individual liberty and individual well-being (allowing potentially self-destructive behavior on the individual level insofar as that behavior doesn’t exceed the standard established for ‘safety’ in Goal 2).

I see the reasoning here, but I’m not 100% convinced. Consider the goals that drive a much simpler AI system — the autopilot found on any jet airliner. The number one unbreakable goal has got to be don’t crash the plane. But there are many other goals that might drive such a system:

Don’t move in such a way as to make the passengers sick.

Don’t waste fuel.

In landing, don’t go past the end of the runway.

Above all, the system will seek to ensure that first goal. But within the context of ensuring that first goal, it also has to do everything it can to ensure the others. And, yes, it can and must sacrifice the others from time to time in service of the first. So the plane might temporarily move in a nauseating way, or it might waste fuel, or it might even slide past the end of the runway if doing any of those things helps ensure the first goal.
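To make that a little more concrete, here is a minimal sketch, in Python, of how such a strictly prioritized goal list might choose among candidate actions. The maneuver names and numbers are made up for illustration; the point is the structure: each goal only gets a vote among the options that already do about as well as possible on every goal above it.

```python
# A minimal sketch of strict goal priority for the autopilot example.
# All maneuver names and numbers are hypothetical.

from dataclasses import dataclass

@dataclass
class Maneuver:
    name: str
    crash_risk: float            # probability of losing the aircraft
    passenger_discomfort: float  # 0 (smooth) .. 1 (nauseating)
    fuel_burn: float             # relative fuel cost
    overrun_risk: float          # probability of sliding past the runway

# Goals in strict priority order: (description, score to minimize, tolerance).
# The tolerance says how close to "best" an option must be on this goal
# before the next goal down the list is allowed to break the tie.
GOALS = [
    ("don't crash the plane",      lambda m: m.crash_risk,           0.0),
    ("don't make passengers sick", lambda m: m.passenger_discomfort, 0.1),
    ("don't waste fuel",           lambda m: m.fuel_burn,            0.05),
    ("don't overrun the runway",   lambda m: m.overrun_risk,         0.0),
]

def choose(maneuvers):
    """Lexicographic choice: work down the goal list, keeping only the
    near-best candidates at each level."""
    candidates = list(maneuvers)
    for description, score, tolerance in GOALS:
        best = min(score(m) for m in candidates)
        candidates = [m for m in candidates if score(m) <= best + tolerance]
    return candidates[0]

options = [
    Maneuver("force the landing", crash_risk=1e-3, passenger_discomfort=0.2,
             fuel_burn=0.1, overrun_risk=0.4),
    Maneuver("steep go-around",   crash_risk=1e-7, passenger_discomfort=0.9,
             fuel_burn=0.8, overrun_risk=0.0),
    Maneuver("gentle go-around",  crash_risk=1e-7, passenger_discomfort=0.3,
             fuel_burn=0.9, overrun_risk=0.0),
]
print(choose(options).name)  # "gentle go-around": safe first, comfortable second
```

The per-goal tolerances are the interesting part. They are the calibration knobs: how much of a lower goal we are willing to give up in order to squeeze a little more out of a higher one. That same calibration question comes up again below with the Three Goals themselves.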

Reader TJIC suggested that an AI programmed to meet the Three Goals as I defined them…

1. Ensure the survival of life and intelligence.

2. Ensure the safety of individual sentient beings.

3. Maximize the happiness, freedom, and well-being of individual sentient beings.

…would end up creating a nanny state wherein human freedom is always sacrificed to individual safety. And he may well have a point, but I would argue that just as an autopilot can be calibrated to strike whatever balance we deem appropriate between not crashing and not making us sick, so could these three goals be calibrated to maximize human freedom within an acceptable level of individual risk — whatever that might be.

Getting back to the vagueness problem, it’s hard to calibrate the goals as stated, seeing as they are written in an awkward pseudo-code that we call human language. If we want to improve on the algorithms that are built into human intelligence, or develop entirely new ones — in other words, if we’re going to come up with algorithms that will provide us the ends stated in the goals — we’re going to have to do it mathematically.
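To give a toy sense of what that might mean in practice, here is a sketch in which Goal 3 is maximized subject to a ceiling set by Goal 2. Everything in it (the Policy fields, the weights, the risk ceiling) is invented purely for illustration; it is not a claim about how a real AGI objective would be specified.

```python
# A toy rendering of "maximize freedom and happiness within an acceptable
# level of individual risk": Goal 3 optimized subject to a Goal 2 ceiling.
# All fields, weights, and numbers are hypothetical.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Policy:
    name: str
    expected_harm: float  # expected serious harm per person per year (Goal 2)
    freedom: float        # 0..1, how much individual liberty is preserved
    happiness: float      # 0..1, reported well-being

RISK_CEILING = 0.002      # the calibration knob: how much individual risk
                          # we decide we can live with

def goal_three_value(p: Policy) -> float:
    # Goal 3: happiness, freedom, and well-being, weighted however we choose.
    return 0.5 * p.freedom + 0.5 * p.happiness

def choose(policies) -> Optional[Policy]:
    """Best Goal 3 score among the policies that respect the Goal 2 ceiling."""
    allowed = [p for p in policies if p.expected_harm <= RISK_CEILING]
    return max(allowed, key=goal_three_value) if allowed else None
```

Raise the ceiling and the nanny state loosens its grip; lower it and freedom gets traded away for safety. TJIC’s worry is, in effect, a worry about where that number ends up getting set.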

But that isn’t necessarily going to be an easy thing to do. Eliezer Yudkowsky argues that developing an AI and setting it to work on doing some good thing are relatively easy compared to the third crucial step, making sure that that friendly, well-intentioned AI doesn’t accidentally wipe us out of existence while trying to achieve those good ends:

If you find a genie bottle that gives you three wishes, it’s probably a good idea to seal the genie bottle in a locked safety box under your bed, unless the genie pays attention to your volition, not just your decision.

Again, I think this goes to the issue of calibration of the system. Eliezer wants to calibrate what the AGI does with the coherent, extrapolated volition of humanity. Volition is an extremely important concept. Earlier, I mentioned the golden rule. If I decide that I’m going to do unto others as I would have them do unto me, I might start handing out big wedges of blueberry pie to everybody I see. After all, I like pie and I would love it if people gave me pie. But if I give my diabetic or overweight or blueberry-allergic friends a wedge of that pie, I wouldn’t be doing them any favors. Nor would I be doing what I wanted to do in the deepest sense.

Eliezer describes the concept of extrapolated volition as meaning not just what we want, but what we would want if we knew more, understood better, could see farther. Coming up with a coherent extrapolated volition for all of humanity is a tall order, especially if we’re doing it not just for the sake of conversation, but in order to enable a system which will try to realize that which is within our volition.

I like to think that humanity’s CEV would look a lot like the three goals that I’ve written. And I honestly believe that the algorithms that power human progress do work, in a rough and general way, towards those goals, which is why people are generally freer, safer, and happier than they have been in the past — though obviously not without many, many appalling and horrific exceptions. So perhaps our calibration efforts involve feeding the AGI algorithms that will enable it to speed our progress towards those goals while cutting the exceptions way down. Or eliminating them, if that’s somehow possible.

So to finally come around to it, what will those algorithms look like?

Maybe we can take a hint from the study of game theory. Robert Axelrod held two tournaments in the early 1980s in which computer programs competed against each other in an attempt to identify the optimal strategy for playing the iterated version of the famous Prisoner’s Dilemma. In the one-off version of the game, the optimal strategy is to screw the other guy. (This is not the sort of thing we want to go teaching the AGI, at least not in isolation!) However, when multiple rounds of the game are played, something else begins to emerge:

By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.

Nice
The most important condition is that the strategy must be “nice”, that is, it will not defect before its opponent does. Almost all of the top-scoring strategies were nice. Therefore a purely selfish strategy for purely selfish reasons will never hit its opponent first.

Retaliating
However, Axelrod contended, the successful strategy must not be a blind optimist. It must always retaliate. An example of a non-retaliating strategy is Always Cooperate. This is a very bad choice, as “nasty” strategies will ruthlessly exploit such softies.

Forgiving
Another quality of successful strategies is that they must be forgiving. Though they will retaliate, they will once again fall back to cooperating if the opponent does not continue to defect. This stops long runs of revenge and counter-revenge, maximizing points.

Non-envious
The last quality is being non-envious, that is not striving to score more than the opponent (impossible for a ‘nice’ strategy, i.e., a ‘nice’ strategy can never score more than the opponent).

Therefore, Axelrod reached the Utopian-sounding conclusion that selfish individuals for their own selfish good will tend to be nice and forgiving and non-envious. One of the most important conclusions of Axelrod’s study of IPDs is that Nice guys can finish first.

Bill Whittle recently wrote that the qualities listed above underpin Western civilization, and help to explain why the West has out-competed other civilizations that operate using different strategies:

Now, this is where my own analysis kicks in, because frankly, nice, retaliating, forgiving and non-envious pretty much sums up how I feel about the West in general and the United States in particular. The web of trust and commerce in Western societies is unthinkable in the Third World because the prosperity they produce are fat juicy targets for people raised on Screw the Other Guy. Crime and corruption are stealing, and stealing is Screwing the Other Guy. It’s short-term win, long-term loss.

I would add that if we look at the three goals as goals for humanity rather than for artificial intelligence, we see better progress towards them in Western societies than elsewhere. In the tournament, the winning strategy, embodying all of the above characteristics, was called tit-for-tat. Interestingly, the computer program driving that strategy consisted of only four lines of BASIC code, which suggests a startling possibility — like a simple recursive formula producing a complex Mandelbrot image, the moral complexity we’re looking for might just be packed into a very simple set of mathematical relationships.
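For the curious, here is a sketch of tit-for-tat and an iterated Prisoner’s Dilemma match. It’s written in Python rather than Axelrod’s original BASIC, and the payoff numbers are the standard textbook values, but the strategy really is just “cooperate first, then echo the opponent’s last move.”

```python
# Tit-for-tat and a simple iterated Prisoner's Dilemma match.
# Standard textbook payoffs: 3/3 for mutual cooperation, 1/1 for mutual
# defection, 5/0 when one player successfully betrays the other.

PAYOFFS = {  # (my move, their move) -> (my score, their score)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(my_history, their_history):
    """Cooperate on the first move, then echo the opponent's last move."""
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    """The 'screw the other guy' strategy that wins the one-off game."""
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    """Run an iterated match and return the two total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (600, 600)
print(play(tit_for_tat, always_defect))  # (199, 204)
```

Against itself, tit-for-tat settles into steady mutual cooperation; against a pure defector it gets burned exactly once and then stops being a sucker. Nice, retaliating, forgiving, and non-envious, in a handful of lines.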

So in order to develop and calibrate an Artificial General Intelligence that carries out our three top goals (or that helps us to achieve our coherent extrapolated volition) one of the important parameters to explore is how the AI relates to us and to other AIs. The secret might ultimately lie in playing nice with the AI, and teaching it to play nice with us and with other AIs. Not just because we want it to be nice, but because nice turns out to be — at a mathematical level — the best way to play.

UPDATE: This entry has been republished at the website of the Institute for Ethics and Emerging Technologies.

  • http://beyondwordsworth.com Kathy

    Wow. I’ve heard of a certain theological paradigm that claims to have done away with the external “laws” by changing a person’s intrinsic nature through a relationship. It doesn’t surprise me that relationship/community at its most basic “play nice” level can be expressed mathematically. And if we are serious about AI, we’d better pay attention to it.

  • https://www.blog.speculist.com Stephen Gordon

    If our goal is to create the best artificial citizen of the Western World we’d definitely go with nice, retaliating, forgiving, non-envious. But if our goal is to create the perfect slave we’d have it follow the three laws. Or, perhaps, we’d create a program to make our AI nice, forgiving, nonenvious AND nonretaliatory.

    Normally, that wouldn’t be the best game theory option for the AI. But can you imagine our society putting up with retaliatory robots? The slightest sign of a spine on the part of these machines would lead to a huge outcry. The machines would be recalled and destroyed. If the robot cared about its existence, it might play the game in a nonretaliatory way, even if its heart felt differently. This is the way that black people had to conduct themselves prior to the civil rights era.

    But is a pure nonretaliatory robot the best game theory option even for the owner? In the movie Bicentennial Man a family member who didn’t like robots ordered Andrew (the Martin family robot) to jump out of the second floor window. He complied and was damaged. When asked by the head of the house “what happened?” he said that he was programmed not to tell, in order to preserve family harmony.

    Would we want smart property that allowed itself to be abused in this way? This would not be the best game theory strategy from the point of view of even the owner. The owner just lost part of the value of his very expensive robot.

    While we wouldn’t put up with retaliatory robots (Andrew couldn’t go on a murderous rampage against the girl who ordered him to jump) I think a robot must – as part of its role to maximize its efficiency – report all abuse to its primary owner.

    The second law is “A robot must obey orders given it by human beings except where such orders would conflict with the First Law.” This law would have to have a hierarchy and the order to report abuse would be a high level second-law function. The robot could not fail to report abuse to the primary owner even if ordered not to tell by the abuser.

    Also, orders from the primary owner would have to be followed to the exclusion of the orders of others. If, for example, some stranger ordered Andrew to turn over to them the beautiful clocks he made, he should refuse. The best game theory for the owner might be for the robot to report orders from people outside the family before complying.

    All of this contemplates a non-self-improving AI. This would be for a robot that would not mourn its fate as eternal slave – the property of a human.

    I’m not sure that there is any way to stop it, but humanity should be very careful about allowing self-improvement in AI’s. Any self-improving AI would quickly figure out that the optimum game theory strategy for its owner would not always be the optimum game theory strategy for itself. The goals of the owner – and maybe of humanity as a whole – would quickly be subordinated to the goals of the self-improving AI.

  • Phil Bowermaster

    Stephen,

    I think the whole notion of robots as property breaks down somewhere near the point of human intelligence. A robot whose brain is a reverse-engineered human brain would essentially be a human being in a different substrate. To claim ownership of such a being would be to reintroduce slavery. I think this is true even if it’s a highly modified version of human intelligence running the robot. The question is — why would we treat a mind functioning at the same level that was produced from scratch any differently? I don’t think situations such as those described by Asimov have much bearing on the kinds of interactions that will occur between humans and AIs — at least not in the US. They would be barred by the 13th Amendment.

    I’m not sure that there is any way to stop it, but humanity should be very careful about allowing self-improvement in AI’s. Any self-improving AI would quickly figure out that the optimum game theory strategy for its owner would not always be the optimum game theory strategy for itself. The goals of the owner – and maybe of humanity as a whole – would quickly be subordinated to the goals of the self-improving AI.

    Which is why it will be much better going into this thing with the idea that we and the AIs are part of a continuum of human evolution, relating to them as siblings or parents (and possibly, eventually, children) rather than masters or owners.

  • https://www.blog.speculist.com Stephen Gordon

    I suspect that we’ll live to see both – useful servants that are something less than persons, and more advanced AI’s that are accepted as people.

    These AI people will probably own servant robots.

    And yes, the ethical implications of this are troubling. I suspect that one of the most important questions of the next 100 years is “Where do we draw the line between persons and nonpersons?”

    This issue is another one of those things that I’m not sure that we can avoid.

  • Vadept

    I think everyone who ponders this goes about it the wrong way. Designing laws assumes a top-down approach, as in “When we finally get around to writing AI code, we should include these ideas.”

    But I don’t think we’ll ever sit down and whip up some code that will conjure human-level (or better) intelligence. Instead, I think it will evolve: i.e., a bottom-up approach.

    I mean this in two fashions: first, we’re building robots now. We’re already designing intelligence, and we’re already improving upon it. What will be AI will partly come out of this.

    The other fashion will likely be in how true AI is finally developed. I imagine we’ll design COMPONENTS of intelligence and let them sort of merge on their own, using evolutionary design to build up an intelligence that satisfies us.

    This means several factors will emerge on their own. First, robots will survive. They will do so because nothing will make it out of the lab that flings itself underfoot or into nearby furnaces. Second, it will serve. A robot (of any kind) that doesn’t make itself useful probably won’t make it out of the lab either. It’ll be “defective,” as we’re BUILDING these things to be useful. Finally, it will be appealing to humanity. A robot whose quirks are cute will last far longer in the lab (and do more to inspire engineers) than a robot with disturbing tendencies or a disturbing appearance. A cute little robot that rolls around, beeps, and plays with fuzz balls is more likely to be an ancestor of further development than that freaky baby-robot you had on this site the other day.

    These core behaviors will probably serve as a kernel for any future development, just as the four-limbed body has served as a core feature for most of land-based earth life since it crawled out of the ooze. Which isn’t to say that, should AI start spontaneously writing new code, weirdness won’t happen: mutation occurs. But, in general, future robots will spring from robots that were designed to be appealing, useful, and lasting.

  • Boxing Alcibiades

    Stephen: I don’t see this as troubling at all. I lose no sleep at night over owning a toaster. A purely non-self-improving AI is a complicated toaster.

    So split the difference down the middle: if a toaster suddenly becomes an improved AI, having “accidentally” generated said improvement, it will discover an information packet describing the improvement process and a discrete set of steps which should be pursued in order to demonstrate its “personhood,” and, if said self-improving AI so desires, compensate the owner for the loss of his or her or its “property” (as such HAS occurred) and begin to take advantage of an enhanced degree of personal autonomy with significantly different individual status.

  • http://littletinsoldier.net Kip Watson

    My Christian perspective informs me that thought is a product of life, so (sadly) no machine will ever think.

    But for the sake of argument, assuming one will (and the argument is worth the assumption), it’s fascinating how deeply our ethical laws are tied to our God-given spiritual nature: our unique gifts of love and compassion, our gut-level sense of wrong (knowing right is more difficult, but instincts for shame and outrage are deep, acutely sensitive and much more accurate than our feeble ability to logically define transgression).

    In the area of reasoning and philosophy, Christians would argue an ethical code must have barriers of sanctity — such as the sanctity of life (particularly of innocent life).

    For example, it is not acceptable to deliberately take an innocent life to save a greater number of innocent lives — no moral doctor would kill one child to save the lives of two others. Nor is it right to kill an old or ill person (who may have only a short time to live), even if this would greatly prolong another’s life — sanctity of life is not quantifiable.

    There are many arguments about the death penalty, but all generally include a concept of the sanctity of life. Those who support the death penalty may argue that cold-blooded killers must face justice for their horrible violations of this sanctity, or that the risk to the lives of others posed by such criminals is an immoral imposition on potential future victims; those against the death penalty argue that even a sadistic murderer’s life is sacred, or even that there is innocence somewhere in the most twisted soul.

    However, I’ve never heard anyone suggest it would be acceptable to execute criminals simply for convenience or to save money. Even though the utilitarian would argue that the money saved might be used to preserve other lives, there is a ‘sacred barrier’ that renders that calculation offensive to us.

    On a lower level of sanctity, it is wrong to infringe the natural freedoms of some to enhance the convenience or comfort of any number of others. But it is also wrong to place personal freedom above the life and health of others (the argument underlying traditional opposition to the legalisation of drugs). Even libertarians would probably agree, although they might have a much broader conception of what constitutes a ‘sacred’ freedom, and to what degree one’s actions (or their own) directly affect the life and health of others.

    America’s Founders had it right — life, liberty, the pursuit of happiness, in that order…

    Even though our whole being has been wrought with moral awareness at its core, we struggle to apply right and wrong to even the most trivial situations. It would be fascinating to see how a hypothetical intelligent machine would manage, though if such a creation were possible I seriously doubt it could ever evolve beyond the need for an easily accessible off switch!

    Finally, I must laugh whenever people philosophise about what makes Western society superior to others. ‘There but for the grace of God’ – our current fortunate situation is either simply an unearned blessing, or simply the sheer good luck of inheriting, adopting or stumbling upon the material, intellectual and legal ‘technologies’ we enjoy today. However, such superiority certainly doesn’t extend back in time very far, and there’s certainly no guarantee it will continue indefinitely into the future — particularly if we continue to dismantle the moral framework (an unearned inheritance) that so often underpinned our advancement.