BOSTON – You’re driving along the highway when, suddenly, a person darts out across the busy road. There’s speeding traffic all around you, and you have a split second to make the decision: do you swerve to avoid the person and risk causing an accident?
Do you carry on and hope to miss them? Do you brake? How does your calculus change if, for example, there’s a baby strapped in the back seat?
In many ways, this is the classic “moral dilemma,” often called the trolley problem. It has a million perplexing variants, designed to expose human bias, but they all share the same basics. You’re in a situation with life-or-death stakes and no easy options, where the decision you make effectively prioritizes who lives and who dies.
A new paper from MIT published last week in Nature attempts to come up with a working solution to the trolley problem, crowdsourcing it from millions of volunteers. The experiment, launched in 2016, defied all expectations, receiving over 40 million responses from 233 countries and territories, making it one of the largest moral surveys ever conducted.
A human might not consciously make these decisions. It’s hard to weigh up relevant ethical systems as your car veers off the road. But, in our world, decisions are increasingly made by algorithms, and computers just might be able to react faster than we can.
Hypothetical situations with self-driving cars are not the only moral decisions algorithms will have to make. Healthcare algorithms will choose who gets which treatment with limited resources. Automated drones will choose how much “collateral damage” to accept in military strikes.
Not All Morals Are Created Equal
Yet “solutions” to trolley problems are as varied as the problems themselves. How can machines make moral decisions when problems of morality are not universally agreed upon, and may have no solution? Who gets to choose right and wrong for the algorithm?
The crowd-sourcing approach adopted by the Moral Machine researchers is a pragmatic one. After all, for the public to accept self-driving cars, they must accept the moral framework behind their decisions. It’s no good if the ethicists or lawyers agree on a solution that’s unacceptable or inexplicable to ordinary drivers.
The results have the intriguing implication that moral priorities (and hence the types of algorithmic decisions that might be acceptable to people) vary depending on where you are in the world.
The researchers first acknowledge that it’s impossible to know the frequency or character of these situations in real life. Those involved in accidents often can’t tell us exactly what happened, and the range of possible situations defies easy classification. So, to make the problem tractable, they break it down into simplified scenarios, looking for universal moral rules.
As you take the survey, you’re presented with thirteen questions that ask for a simple yes or no choice, trying to narrow down responses to nine factors.
Should the car swerve into the other lane, or should it keep going? Should you spare the young over the old? Women over men? Pets over humans? Should you try to spare the most lives possible, or is one baby “worth” two elderly people? Spare the passengers in the car or the pedestrians? Those who are crossing the road legally or those crossing illegally?
Should you spare people who are more physically fit? What about those with higher social status, like doctors or businessmen?
In this harsh, hypothetical world, somebody’s got to die, and you’ll find yourself answering each of these questions—with varying degrees of enthusiasm. Yet making these decisions exposes deeply-ingrained cultural norms and biases.
Crunching through the vast dataset the researchers obtained as a result of the survey yields universal rules as well as fascinating exceptions. The three most dominant preferences, averaged across the entire population, were to spare more lives over fewer, humans over pets, and the young over the elderly.
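To give a feel for what "averaged across the entire population" can mean, here is a minimal sketch that tallies how often each option was spared for a given factor. The responses are invented for illustration, and the function name `spare_rates` and the record layout are hypothetical. The study's own statistical analysis is more involved than this simple count.

```python
from collections import defaultdict

# Invented toy responses: (factor, option the respondent chose to spare).
# These are NOT data from the study.
responses = [
    ("number of lives", "more"),
    ("number of lives", "more"),
    ("number of lives", "fewer"),
    ("species", "humans"),
    ("species", "humans"),
    ("species", "pets"),
    ("species", "humans"),
    ("age", "young"),
    ("age", "elderly"),
]

def spare_rates(records):
    """Return {factor: {option: share of choices that spared that option}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for factor, spared in records:
        counts[factor][spared] += 1
    rates = {}
    for factor, options in counts.items():
        total = sum(options.values())
        rates[factor] = {opt: n / total for opt, n in options.items()}
    return rates

rates = spare_rates(responses)
for factor, shares in rates.items():
    print(factor, shares)
```

A share well above one half for an option suggests a preference for sparing it. A real analysis would also need to account for the other attributes that vary within each scenario.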
Regional Differences
You might agree with these broad strokes, but looking further yields some pretty disturbing moral conclusions. Respondents more often chose to save a criminal than a cat, but fractionally preferred to save a dog over a criminal. As a global average, being old is judged more harshly than being homeless, yet homeless people were spared less often than the obese.
These rules didn’t apply universally: respondents from France, the United Kingdom, and the US had the greatest preference for youth, while respondents from China and Taiwan were more willing to spare the elderly. Respondents from Japan displayed a strong preference for saving pedestrians over passengers in the car, while respondents from China tended to choose to save passengers over pedestrians.
The researchers found that they could cluster responses by country into three groups:
“Western,” mainly North America and Europe, where they argued morality was predominantly influenced by Christianity; “Eastern,” consisting of Japan, Taiwan, and Middle Eastern countries influenced by Confucianism and Islam, respectively; and “Southern” countries including Central and South America, alongside those with a strong French cultural influence. In the Southern cluster there were stronger preferences for sparing women and the fit than anywhere else.
In the Eastern cluster, the bias towards saving young people was least powerful.
Filtering by the various attributes of the respondent yields endless interesting tidbits. “Very religious” respondents are fractionally more likely to save humans over animals, but both religious and irreligious respondents display roughly equal preference for saving those of high social status vs. those of low social status, even though (one might argue) it contradicts some religious doctrines. Both men and women prefer to save women, on average—but men are ever-so-slightly less inclined to do so.
Questions With No Answer
No one is arguing that this study somehow “resolves” these weighty moral questions. The authors of the study note that crowdsourcing the data online introduces a sample bias. The respondents skewed young, skewed male, and skewed well-educated; in other words, they looked like the kind of people who might spend 20 minutes online filling out a survey about morality for self-driving cars from MIT.
Even with a vast sample size, the number of questions the researchers posed was limited. Getting nine different variables into the mix was hard enough; it required making the decisions simple and clear-cut. What happens if, as you might expect in reality, the risks were different depending on the decision you took? What if the algorithm were able to calculate, for example, that you had only a 50 percent chance of killing pedestrians given the speed you’re going?
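The question about uncertain outcomes can be made concrete with a little arithmetic. The sketch below weights each possible outcome of an action by its probability to get an expected number of fatalities. Every number and name here (`expected_fatalities`, the two actions, the probabilities) is hypothetical and chosen only for illustration. Nothing in the study prescribes this calculation.

```python
def expected_fatalities(outcomes):
    """Sum probability * fatalities over the possible outcomes of one action."""
    return sum(p * deaths for p, deaths in outcomes)

# Hypothetical numbers, invented for illustration only.
# Each action maps to a list of (probability, fatalities) pairs.
actions = {
    "stay in lane": [(0.5, 2), (0.5, 0)],  # 50% chance of killing two pedestrians
    "swerve": [(0.2, 1), (0.8, 0)],        # 20% chance of killing one passenger
}

risks = {name: expected_fatalities(o) for name, o in actions.items()}
lowest = min(risks, key=risks.get)
print(risks, lowest)
```

Picking the action with the lowest expected fatalities is only one possible rule, and choosing that rule is itself a moral decision of the kind the survey asks about.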
Edmond Awad, one of the authors of the study, expressed caution about over-interpreting the results. “It seems concerning that people found it okay to a significant degree to spare higher status over lower status,” he told MIT Technology Review. “It’s important to say, ‘Hey, we could quantify that’ instead of saying, ‘Oh, maybe we should use that.’ The discussion should move to risk analysis, about who is at more risk or less risk, instead of saying who’s going to die or not, and also about how bias is happening.”
Perhaps the most important result of the study is the discussion it has generated. As algorithms start to make more and more important decisions, affecting people’s lives, it’s crucial that we have a robust discussion of AI ethics. Designing an “artificial conscience” should be a process with input from everybody. While there may not always be easy answers, it’s surely better to understand, discuss, and attempt to agree on the moral framework for these algorithms, rather than allowing the algorithms to shape the world with no human oversight.