The Chidi Anagonye Problem in Machine Ethics

2019, Jun 07    

In this blog post, I describe a lesson researchers trying to develop ethical machines can take from Chidi Anagonye, a protagonist of the NBC sitcom “The Good Place”. Spoiler alert for Season 1 of the show!

I very much enjoy NBC’s TV show The Good Place. Beyond the fact that it’s funny and heartwarming, I admire it as someone who thinks regularly about ethics for my research. The Good Place frequently brings up ethical theory and depicts it in action, and more than once I’ve found the examples it uses relevant to my own work. I’ve even used clips from the show in presentations I’ve given about my research! Lately I’ve been thinking about how one of the protagonists of the show, Chidi Anagonye, could be seen as a cautionary tale for people who, like me, are thinking about moral reasoning and decision-making in machines.

Chidi Anagonye

The Good Place follows four people who discover that they have died and gone to “the good place”, an idyllic afterlife reserved for those who were extremely good. One of these people is an ethicist named Chidi Anagonye, who spent his life exhaustively reasoning about the moral implications of every decision he made.

Unfortunately, by the end of the first season, the protagonists discover that they have been deceived, and are in fact in an experimental “bad place” environment designed to eternally torture those unworthy of the good place by deceiving them and repeatedly placing them in situations in which they torture each other. While several of the protagonists already suspected that they were unworthy of their positions in the good place, Chidi is surprised and disappointed: after all, he spent his whole life attempting to ensure that he never behaved immorally! What was the fatal flaw that prevented him from achieving a positive afterlife?

The answer is that all of his exhaustive reasoning about the moral implications of every decision meant that Chidi could never make decisions quickly enough to do real good in the world, and that his indecisiveness annoyed and alienated those around him. What good is spending all your time deciding how to do the most good if you run out the clock deliberating?

The Chidi Anagonye Problem

Machine ethics is the field devoted to trying to equip robots and other artificial agents with moral reasoning and decision-making capabilities. One natural way to implement moral reasoning in machines would be to design the agent to consider all possible courses of action it could perform, to examine the moral implications of each (thinking about the consequences all the way out into the future), and then to pick whichever is determined to be the best course of action. Indeed, this sort of approach isn’t uncommon among machine ethics researchers, including myself. The problem is that this sort of exhaustive moral reasoning takes time, and lots of it. Windows of opportunity for doing good, or for avoiding harm, may close in the time it takes an agent to fully think through and evaluate all of its options. Like Chidi, these systems could be great at determining the most moral course of action given unlimited time, but completely unable to behave in a morally competent manner in the real world. This is what I call the “Chidi Anagonye problem” in machine ethics.
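
To make this paradigm concrete, here is a minimal sketch (my own illustration, in Python, with entirely hypothetical helper functions like simulate and moral_value) of what such an exhaustive deliberator might look like. The point is the shape of the loop: the number of plans it considers grows exponentially with the planning horizon, and nothing in it accounts for how long the loop itself takes to run.

```python
from itertools import product

def exhaustive_moral_deliberation(state, actions, simulate, moral_value, horizon=3):
    """Toy sketch of 'exhaustive' moral deliberation (hypothetical helpers).

    Enumerates every sequence of actions up to `horizon` steps, simulates
    the consequences of each, scores the resulting futures with a moral
    evaluation function, and returns the first action of the best plan.
    With |actions|**horizon plans to score, deliberation time explodes,
    and nothing here notices that the world moves on in the meantime.
    """
    best_plan, best_score = None, float("-inf")
    for plan in product(actions, repeat=horizon):
        outcome = state
        for action in plan:
            outcome = simulate(outcome, action)   # roll consequences forward
        score = moral_value(outcome)              # how morally good is this future?
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan[0] if best_plan else None
```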

Here’s an example from my own work: I’ve developed an algorithm that allows us to give an agent a set of explicit moral norms (represented in a logical language) and for that agent to attempt to violate those norms as little as possible, even when the norms conflict with each other in a particular environment (e.g., a moral dilemma).1 One little drawback of this algorithm is that the amount of time it takes to figure out the right thing to do grows exponentially in the number of such norms. In toy environments, with only three or four norms, the algorithm runs quickly and effectively; unfortunately, with as few as seven or eight norms, it can take hours to decide what to do. Every time I tell people about this algorithm, I have to add, “…and hopefully in the future we can modify it to run more quickly with large numbers of norms”.2
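
The details of my algorithm aren’t what matter here (and the sketch below is not that algorithm), but here’s a back-of-the-envelope illustration, with hypothetical stand-ins, of why “violate the norms as little as possible” can blow up: one naive approach is to search for the largest subset of norms that can be jointly satisfied, and the number of candidate subsets doubles with every norm you add.

```python
from itertools import combinations

def least_violating_norms(norms, jointly_satisfiable):
    """Naive illustration (not my actual algorithm; hypothetical helpers):
    find a largest subset of norms that can be jointly satisfied in the
    current environment; the remaining norms are the ones that get violated.

    `jointly_satisfiable(subset)` stands in for a call to a logical reasoner.
    In the worst case this loop makes on the order of 2**len(norms) such
    calls, which is why a handful of extra norms can cost hours.
    """
    for size in range(len(norms), -1, -1):        # try the biggest subsets first
        for subset in combinations(norms, size):
            if jointly_satisfiable(subset):
                return set(subset)                # obey these, violate the rest
    return set()
```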

I’m not the only one whose work runs into this problem. Lots of research in machine ethics relies on this paradigm of examining all the consequences of all possible courses of action before determining what to do. To make their algorithms manageable, researchers tend to use very simple moral situations as demonstrations. Don’t get me wrong: there may be applications in which this sort of exhaustive reasoning could be useful, e.g., in helping to guide people through complex moral scenarios where there’s no time limit. But you could never deploy algorithms like this on a real robot and expect it to behave ethically.

To me the essence of the Chidi Anagonye Problem is that researchers in machine ethics (and, arguably, in AI more generally) aren’t really grappling with the fact that the agents they create are embedded in time: that time marches inexorably forward while the agent deliberates and acts. In modern AI, time is at best treated as an afterthought and at worst ignored entirely.

An example or two

To help demonstrate some of the complications that arise when grappling with time-embeddedness, and at the risk of eye-rolling from ethicists, let’s turn to that classic moral dilemma: the trolley problem. If you’re unfamiliar with it, here’s the premise: a trolley is hurtling down a track towards five unsuspecting railroad workers, and will hit and kill them if nothing is done. You are standing next to a switch that will reroute the trolley onto another track, saving the five workers. Unfortunately, one railroad worker stands on the alternate track. Do you flip the switch, killing one worker to save five?

The trolley problem has been done to death by ethicists, moral psychologists, and AI researchers (and has been parodied numerous times). But in most cases it’s examined as a binary choice: you flip the switch or you don’t, and no other options are available to you. It’s usually presented in written form, like my description above, which completely sweeps various aspects of the problem under the rug (including time-embeddedness, as I’ll explain below).

First, the trolley dilemma in the real world is what I describe in one of my papers as a moral quasi-dilemma: a situation that at first glance may appear to be a moral dilemma, but in which one can’t guarantee that there are no other possible solutions. For example, just seeing the trolley heading towards the people, how can you know that you can’t, say, shout to warn the workers of the danger, or throw something at them to get their attention, or throw a heavy object in front of the trolley to slow it down? (Of course, the scenario could be modified so as to make these specific things impossible, but that doesn’t change the fact that many conventional ways of understanding the problem may admit alternate solutions.)

Not only is the trolley problem a quasi-dilemma, but the passage of time is very important to how the dilemma could possibly be solved. If the trolley is moving fast enough, or is close enough to the workers, it becomes functionally impossible to make any decision at all, and virtually guaranteed that the five will die. On the other hand, if the trolley is many miles away from the workers, it seems foolish to think of the situation as a dilemma at all: just walk over to the workers, calmly inform them that the trolley is coming and will arrive in a few minutes, and warn them to get off the tracks before it does. Somewhere in between these two extremes is the interesting region where you must use your time efficiently: you could frame the problem as a dilemma and deliberate over whether killing one to save five is justified, or you could insist that there must be some other solution and consider various possible options (and perhaps even try a few), but you must keep in the back of your mind the fact that you don’t have much time before the outcome is determined for you.

Add to the challenge the fact that all this must be done more or less on the fly: you don’t get to plan in advance what the best way to allocate your time will be in every possible moral scenario. You can think about time management when faced with the trolley problem, but thinking about how to spend your time also costs time. (Fun thought: you could also think about how you will allocate time to thinking about how to allocate time, and so on and so on.)

The point here is that a system built to operate outside of time probably wouldn’t do a very good job in a situation like this. In case you think that this example is especially contrived (which it is) and that even humans probably wouldn’t do a very good job (whatever that means) when faced with a real trolley problem (and I suspect they wouldn’t), here’s an even simpler example in which time-embeddedness matters. You’re driving down the street and after turning a corner, you see someone walking across the street in front of you. If you continue going straight you will hit them. You can swerve and avoid the accident at minimal risk to you or anyone else.

In this second example, there’s a pretty clear answer that most people would endorse: just swerve and avoid the pedestrian. It wouldn’t take most humans very long to come to that conclusion, and I like to think most people would do so and avoid the accident. But a system designed the wrong way, one that had to examine the moral consequences of every possible course of action, or that was otherwise ill-equipped to handle its embeddedness in time, might fail to make the right choice simply because it has already hit the pedestrian by the time it decides to swerve. Such a system might be a first-class moral reasoner in theory, but completely morally incompetent in the real world.
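
One way to make this failure mode concrete is to imagine the chooser wrapped in a loop that knows about its deadline (here, the time until impact). The sketch below is my own illustration, with hypothetical helper functions; the contrast with the exhaustive deliberator above is that it commits to the best option it has found when time runs out, rather than finishing its evaluation after the pedestrian has already been hit.

```python
import time

def decide_before_deadline(options, moral_score, deadline_seconds):
    """Sketch of a deadline-aware chooser (hypothetical helpers throughout).

    Scores options in whatever order they arrive, keeps the best seen so
    far, and commits to it once the deadline looms, rather than insisting
    on scoring everything before acting.
    """
    start = time.monotonic()
    best_option, best_score = None, float("-inf")
    for option in options:
        if time.monotonic() - start > deadline_seconds:
            break                               # out of time: act on what we have
        score = moral_score(option)
        if score > best_score:
            best_option, best_score = option, score
    return best_option
```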

Optimality, competence, and bounded rationality

This time-embeddedness is a basic fact of human life, to the extent that it’s easy to take for granted that we generally do a pretty good job of thinking, planning, and so on given the limited time we have to do so. That observation forms the basis for a set of ideas about human decision-making known as bounded rationality. In the next few paragraphs I’ll explain this idea (while, I’m sure, drastically oversimplifying) and how it may point towards some approaches to the Chidi Anagonye Problem.

Economists have classically treated humans as rational beings, able to make optimal decisions (from the perspective of maximizing benefits and minimizing costs) all the time using all available facts. This is, of course, a pretty bad assumption (although economists would perhaps argue that much of people’s irrationality comes out in the wash when examining a society as a whole). Economist/psychologist Herbert Simon proposed the idea of bounded rationality, which basically is this: humans cannot be completely rational because of limited access to information, limited brain-power (so that they can’t always process all the facts), and limited time available to make decisions.

Simon’s initial approach was to alter classical economic models to deal with these limitations (e.g., to include costs for gathering or processing information). This is helpful for economists (who view the agents from the outside), but, as I described in the previous section, it doesn’t help the decision-makers themselves much: adding costs for information-processing time doesn’t actually make the task of calculating the best course of action given those costs any easier. If anything, the extra variable makes the process a little harder.
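
To see the point, here’s a toy formulation (my own illustration, not Simon’s actual model): even if we charge each option for the time it takes to evaluate, computing the new best choice still requires examining every option, so the search itself gets no cheaper.

```python
def best_action_with_deliberation_costs(actions, value, eval_time, cost_per_second):
    """Toy illustration (mine, not Simon's model): penalize each option by
    the time it takes to evaluate it. The objective now acknowledges that
    thinking costs something, but finding the maximum still means going
    through every option, so the decision-maker's task is no easier."""
    return max(actions, key=lambda a: value(a) - cost_per_second * eval_time(a))
```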

Ideas about how humans actually accomplish bounded rationality have come a long way since Simon’s original work. One of the key ideas of the field is that instead of actually solving the problem they’re faced with, humans generally use a collection of heuristics: practical ways of “sort of” solving the problem, which may not guarantee optimality, but are fast and generally lead to a half-decent solution. Such heuristics include:

  • Solving a slightly easier or less nuanced problem than the one in front of you: e.g., rather than truly estimating the probability of various events occurring, humans may choose their estimate based on how easily those events come to their minds.
  • Not considering all possible alternatives: humans generally only consider the courses of action that come to their minds (although they may occasionally try to search for other possibilities). Intuitions as to which avenues to consider first are of prime importance here, and in humans these are probably partly innate and partly learned from experience.
  • Stopping search through alternatives once an acceptable one is found: rather than continuing to search for better solutions, humans may stop searching once they find a solution that seems acceptable. This is sometimes referred to as “satisficing”, and Gerd Gigerenzer has argued that this happens in moral decision-making (a short code sketch of this heuristic follows the list).
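
As a concrete contrast with the exhaustive deliberator sketched earlier, here is roughly what the satisficing heuristic from the last bullet might look like in code. This is a toy sketch of my own with hypothetical helpers, not a claim about how Simon, Gigerenzer, or anyone else formalizes satisficing.

```python
def satisfice(candidate_actions, moral_score, aspiration_level):
    """Toy satisficing sketch (hypothetical helpers): consider options in
    the order intuition suggests them and commit to the first one that
    clears the aspiration level, instead of hunting for the global optimum."""
    fallback, fallback_score = None, float("-inf")
    for action in candidate_actions:        # ordering stands in for 'intuition'
        score = moral_score(action)
        if score >= aspiration_level:
            return action                   # good enough: stop searching
        if score > fallback_score:
            fallback, fallback_score = action, score
    return fallback                         # nothing cleared the bar: best seen
```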

If you’re interested in the various heuristics we use and how they affect the rationality of our behavior, you should read Daniel Kahneman’s book Thinking, Fast and Slow. It summarizes decades of research on the topic by a Nobel prizewinner.

The key takeaway here is that humans don’t always behave optimally. AI researchers are not necessarily constrained to design agents that make all the same mistakes that humans do, but it may be that truly optimal behavior, both in morally charged situations and otherwise, just isn’t possible, and that AI researchers could stand to take some inspiration from the way humans “sorta” solve problems. Developing effective heuristics, such as stopping search early, focusing on the options that seem most promising using something analogous to intuition (again, partly innate and partly learned), and solving approximate problems, may help researchers handle the Chidi Anagonye Problem. For while heuristics may compromise optimality, they help humans achieve competence, and the quest for optimality may undermine competence.

Conclusion

I’m still working through the implications of the Chidi Anagonye Problem, and I think there’s a lot of work still to do, both in figuring out how humans cope with time pressure in moral decision-making, and in grappling with time-embeddedness when designing artificial agents. I have yet to incorporate any insights related to this problem into my own technical work. Nevertheless, I hope that thinking about how to deal with this problem will help researchers in machine ethics in their quest to endow agents with moral competence.

My goal is ultimately to refine these thoughts and publish some version of them in an academic paper, so please let me know of any comments you may have! I hope to write up a second part of this blog post talking in more detail about possible approaches to deal with this problem, so stay tuned for that.

  1. I know it has been a long time, but I still do hope to cover some of this work in more detail in future blog posts. 

  2. We have some general ideas about how we might do this, but it’s never quite risen to the top of our to-do list.