AI Ethics: Are Asimov's Laws Enough?
When I tell people that my research interests are in AI ethics, they often respond by introducing me to Asimov’s three laws of robotics. This post goes out to them (it’s also the first post of a three-part series that will culminate in an explanation of my recent paper at the Conference on Decision and Control).
There’s been a fair amount of media attention lately on the ethics of artificial intelligence (AI). A number of prominent scholars and businesspeople (including the likes of Stephen Hawking and Elon Musk) have been prophesying humanity’s inexorable defeat at the hands of our future robot overlords. As an AI researcher, I don’t think that level of paranoia is warranted. Nevertheless, robots are increasingly being considered for tasks involving life-and-death scenarios (e.g., self-driving cars) and social interaction (e.g., robots in health- and elder-care settings). If we’re going to be putting artificial agents (a buzzword meaning artificially intelligent beings, whether software bots or physical robots) in these sorts of situations, we are going to want them to adhere to our moral and social norms.
What is a programmer to do? Well, the first option is to just hardwire moral and social norms into our robots. The most famous candidate set of moral rules is the “Three Laws of Robotics” proposed in the science fiction of Isaac Asimov:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.
Whenever I start talking about AI ethics with someone who has read Asimov’s books or seen the movie I, Robot, they bring up these three laws and then proceed to be very pleased with themselves for solving this baffling problem in seconds. Although I am happy that my research problem is one that people can relate to, I must admit that I am frustrated that people seem to think it’s largely a solved problem.
Make no mistake, I hold no grudges against Asimov for coming up with these laws. In fact, as I understand it (although I have committed the cardinal heresy of having never read Asimov’s books), he spends many pages systematically deconstructing his three laws, showing their insufficiency for the problem of AI ethics.
Here are a few key problems with Asimov’s three laws:
- They fail to fully address the possibility of conflicts between laws. At the very least, Asimov gives a hierarchy between the three laws: avoiding harm > obeying orders > protecting its own existence. So if a robot needs to sacrifice itself to save a person’s life, it will readily do so (and, interestingly, a robot ordered to destroy itself will do so). But what happens when a law conflicts with itself? The most commonly posed example of this is the trolley problem, in which one life may be sacrificed to save five. The three laws do not explicitly state what to do in this problem; indeed, people answer differently depending on how the problem is posed. More common (though less exciting) examples involve conflicting orders: what if two humans give the robot conflicting orders (e.g., “make sure this door remains closed” and “open this door”)? Should the robot obey the human who asked first? The order that is easier to obey? The human that the robot “likes” more? My point is that robots will need a moral conflict resolution mechanism not already contained in Asimov’s laws (the sketch after this list makes the gap concrete).
- They fail to fully address uncertainty and randomness. Many people drive to work every day, even though they know that, despite their best efforts, there is some possibility that they will injure themselves or someone else during their commute. If a self-driving car were given the three laws of robotics, would it choose never to drive you to work, because there is a chance that someone could get hurt? This is not the sort of behavior we would want from our robots. On the other hand, we could interpret the first law as only prohibiting actions that the robot strongly believes will cause harm, but then a robot might have no problem indulging us in a game of Russian roulette. My point here is that a truly morally competent robot must be able to reason about probability and uncertainty and come up with judgments that people find neither too draconian nor too loose (the same sketch after this list shows how a naive probability threshold fails in both directions).
- They fail to fully address all the nuance and complexity of the modern moral world. Do we really believe that everything morally relevant to a human or robot can be summed up in these three laws? Even if we did, how would we provide the robot with a proper definition of “harm”? The definition of “harm” is not even static! Decades ago, many people in the United States and elsewhere thought that interracial marriage was immoral, and they almost certainly would have claimed it was harmful. More recently: is euthanasia harmful? Is abortion? In any case, even if we could code a robot with Asimov’s laws and with a pre-defined concept of harm, the programmers would probably be stuck updating the moral software as society’s values change.
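To make the first two failure modes concrete, here is a minimal sketch, in Python, of what “hardwiring” the three laws might look like: a strict priority ordering plus a harm-probability threshold. Everything here (the action names, the probabilities, and the threshold mechanism itself) is my own illustrative assumption rather than anything Asimov proposed; the point is only that both the threshold and the tie-breaking rule have to come from outside the three laws.

```python
# A minimal, hypothetical "hardwired three laws" decision procedure:
# the First Law acts as a hard harm filter, and the Second Law prefers
# actions that obey an order. All names and numbers are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    name: str
    harm_probability: float            # robot's estimate that the action harms a human
    ordered_by: Optional[str] = None   # which human (if any) ordered the action

def permitted(action: Action, harm_threshold: float) -> bool:
    """First Law as a hard filter: forbid any action whose estimated
    probability of harming a human exceeds the threshold."""
    return action.harm_probability <= harm_threshold

def choose(actions: list, harm_threshold: float) -> Optional[Action]:
    """Among permitted actions, prefer ones that obey an order (Second Law).
    Note the tie-break between conflicting orders is just list order."""
    allowed = [a for a in actions if permitted(a, harm_threshold)]
    if not allowed:
        return None                    # the robot refuses to act at all
    ordered = [a for a in allowed if a.ordered_by is not None]
    return ordered[0] if ordered else allowed[0]

# Problem 1: with a zero threshold, the self-driving car never drives.
commute = Action("drive owner to work", harm_probability=1e-6, ordered_by="owner")
print(choose([commute], harm_threshold=0.0))                    # None

# Problem 2: with a lax threshold, Russian roulette looks acceptable.
roulette = Action("spin the revolver", harm_probability=1/6, ordered_by="owner")
print(choose([roulette], harm_threshold=0.2).name)              # "spin the revolver"

# Problem 3: two conflicting orders at the same priority level; the laws
# themselves give no way to break the tie.
keep_shut = Action("keep the door closed", 0.0, ordered_by="Alice")
open_door = Action("open the door", 0.0, ordered_by="Bob")
print(choose([keep_shut, open_door], harm_threshold=0.0).name)  # "keep the door closed"
```

Set the threshold to zero and the car never leaves the driveway; loosen it and Russian roulette slips through; and when Alice’s and Bob’s orders collide, the “decision” is just whichever order happened to be listed first.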
My argument isn’t only with people who claim that Asimov’s laws are all we need. It’s with the whole enterprise of trying to hardwire morality into our machines. Humans have a vast collection of little moral rules and social conventions, and we will likely expect robots to obey many of them. We could try to write down thousands to millions of little rules governing how to be moral in all manner of situations (and probably be stuck updating them for the foreseeable future, as times change). I think there is some value in that approach. If we’re smart, maybe we can crowdsource it: let a large number of people contribute rules, and automatically convert those rules into system code (a rough sketch of what that might look like follows below).
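For completeness, here is an equally rough, purely hypothetical sketch of that crowdsourcing idea, in which contributed rules become data (a condition plus a verdict) rather than hand-written code. No such system exists in this form; the sketch just makes it easy to see that the table only ever grows and that conflicting contributions still need a resolution policy from somewhere else.

```python
# A purely hypothetical "crowdsourced rule table": each contributed rule is a
# condition plus a verdict, and the robot looks up every rule that applies.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    source: str                        # who contributed the rule
    applies: Callable[[dict], bool]    # does the rule apply to this situation?
    forbidden: bool                    # the contributor's verdict when it applies

# A tiny slice of what would realistically be thousands to millions of entries.
CROWD_RULES = [
    Rule("contributor #1042",
         lambda s: s["action"] == "tell a white lie",
         forbidden=True),
    Rule("contributor #7",
         lambda s: s["action"] == "tell a white lie"
                   and s["context"] == "sparing someone's feelings",
         forbidden=False),
]

def verdicts(situation: dict) -> list:
    """Return every contributed rule that has an opinion on this situation."""
    return [r for r in CROWD_RULES if r.applies(situation)]

situation = {"action": "tell a white lie", "context": "sparing someone's feelings"}
print([(r.source, r.forbidden) for r in verdicts(situation)])
# Both rules apply and they disagree -- the table itself cannot say whose wins.
```

Which contributor “wins” here is exactly the conflict-resolution problem from the list above, now hiding inside a much bigger rule base.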
If we’re really smart, we’ll program robots to learn right and wrong themselves, both from verbal instruction and from observing the behavior of those around them. How do we do this? In my next post, I will discuss one proposed approach to learning morality by observing behavior: inverse reinforcement learning.