The Ethics of AI — Can We Teach Machines to Be Moral

Opening — A Car With No Brakes

Picture this. You are riding in a self-driving car, cruising along at sixty kilometers an hour, when the brakes suddenly fail.

Ahead of you, five people are crossing the road. If the car holds its course, it will strike those five. If it swerves, it will hit a single person standing on the sidewalk. Which way should the car choose?

Should it sacrifice one to save five? Or, since actively turning the wheel is itself an act aimed at killing one particular person, should it stay straight? You have half a second to decide. No, to put it precisely, that decision was already made months ago, in some office, in the moment an engineer was writing code.

That is exactly what makes this question so interesting. A human driver would simply react on instinct and be done with it; a machine's response must be decided in advance, explicitly, in code. A person can say afterward, "I had no choice." But an engineer must program that choice before the accident ever happens.

The moment moral judgment becomes design beforehand rather than an excuse afterward, we find ourselves facing a question we have never seriously answered. "What is the right thing to do?" And a more troubling question follows close behind. "How, exactly, do we translate that rightness into ones and zeros?"

This essay follows that question. It asks whether it is truly possible to teach a machine morality, and if so, whose morality it is, and who bears responsibility when something goes wrong.

This is not an essay that tries to hand you the answer. Rather, it tries to pose good questions and to lay out, as fairly as possible, the many positions that surround them. One thing should be said in advance. This subject is a domain where people's convictions collide as fiercely as in politics or religion. Some are optimistic about technological progress; others are deeply worried.

This essay declares neither side right. It simply tries to convey, as fairly as it can, why each side thinks the way it does, so that readers have material to judge for themselves. For in ethics the most dangerous attitude is the hasty certainty that "I already know the answer."


The Trolley Problem — A Half-Century-Old Thought Experiment Revived

From the Lecture Hall to the Open Road

The trolley problem was first posed in 1967 by the British philosopher Philippa Foot. The American philosopher Judith Jarvis Thomson later refined it by adding several variations.

Originally it had nothing to do with self-driving cars. Foot devised the experiment for a deontological discussion about abortion and the distinction between an "intended result" and a "foreseen but unintended result." For decades the trolley problem was an abstract puzzle that gave philosophy freshmen headaches. After all, almost no one would ever stand before a runaway tram in real life, lever in hand.

But with the arrival of self-driving technology, this lecture-hall puzzle suddenly became a line in an engineering specification. It was no longer the speculation of "what would you do?" but the practical problem of "how should this vehicle behave?"

The basic setup is this. A runaway tram is hurtling toward five people. If you pull a lever, the tram switches to another track, but on that track stands one person.

        [You: the lever]
  ━━━━━━━━━●━━━━━━━━  ← tram
   ┌────────┴────────┐
 straight (5)     branch (1)

Most people answer, "I would pull the lever, sacrifice one, and save five." That is the utilitarian intuition of tallying the sum of outcomes. The simple arithmetic that one death is less bad than five.

But change the setup just slightly, and that arithmetic collapses.

The Footbridge Variation

This time there is no lever. You are standing on a footbridge, and beside you a large man is leaning against the railing. If you push him off, his body will stop the tram and save the five.

The numbers are identical. One sacrificed to save five. Yet this time the majority answer, "I could not do that."

Same arithmetic, so why does the intuition flip? Because one is the "indirect" act of pulling a lever, while the other is the "active instrumentalization" of pushing a person with your own hands.

Here philosophers see a collision between deontology and consequentialism. The outcome is the same, yet our moral sense strongly resists the very act of using a person as a means. One formulation of the categorical imperative of the eighteenth-century philosopher Kant, the principle that one must never treat a human being merely as a means, sits deep in our intuition.

Interestingly, people's responses to these two scenarios have proved strikingly consistent across many of the cultures in which they have been tested. It is as if two moral circuits run inside us at once. One is the circuit that "calculates the numbers," the other the circuit that "shrinks from directly harming." Anyone who would inscribe morality into a self-driving car must first decide which of these two circuits to follow.

The Moral Machine — Forty Million Moral Choices

In 2016 the MIT Media Lab released an online experiment called the Moral Machine. It presented people around the world with self-driving-car dilemma scenarios and asked them to choose whom to save.

The response was explosive. Roughly forty million choices were gathered from 233 countries and territories, and the results were published in Nature in 2018. It is one of the largest datasets humanity has ever assembled on moral choice.

The findings were fascinating and uncomfortable in equal measure. On average, people preferred to save more people, younger people, and those who had obeyed the law. So far, this is somewhat predictable.

But the choices diverged sharply by culture. Some regions showed greater respect for the elderly, some weighed social status less, and others prioritized passengers over pedestrians. Universal morality turned out to be less universal than we imagined.

Here the first hard problem emerges. If we tune a machine's morality to "the majority's intuition," that amounts to deciding by majority vote whom to kill. Just because the majority prefers the young over the old, is it just to design a car to sacrifice the elderly first?

The moment ethics becomes statistics, we seem to lose something important. For the preference of the majority is not the same as rightness. History gives us all too many examples of things the majority supported that were later revealed to be plainly unjust.

| Approach | Core question | Strength | Weakness |
| --- | --- | --- | --- |
| Utilitarianism | Which side reduces total harm | Clear and calculable | Risks justifying sacrifice of a minority |
| Deontology | Which rule must never be broken | Protects human dignity | Paralysis when rules conflict |
| Virtue ethics | What would a good person do | Honors context and character | Hard to translate into code |

Real Self-Driving Cars Never Meet the Trolley

For the sake of balance, one point must be added. Many self-driving engineers consider the trolley problem overblown. On actual roads, situations where the car clearly knows it faces "five or one" and must choose are exceedingly rare, and real safety lies in slowing down and keeping a safe distance beforehand so that such extreme choices never arise in the first place.

This view has its merit too. But a low frequency does not make the problem disappear. If such a situation arrives even once, the car will do something, and that "something" is a value set in advance by someone.

Moreover, the trolley problem does not apply only to dramatic life-or-death moments. A self-driving car makes countless small moral decisions every moment. How closely to pass a cyclist, how much to yield to a jaywalker, whether to take the fast route or the safe one. Each of these seemingly trivial choices carries a distribution of risk, and that distribution is itself a value judgment.

The real value of the trolley problem is not in giving an answer. It is in making us see clearly "what we are delegating to machines." Smoothly functioning automation renders that delegation invisible, but ethics is precisely the work of making the invisible visible again.


Algorithmic Bias — The Mirror Does Not Lie

The Past Steeped in the Data

If the trolley problem is dramatic but rare, algorithmic bias is a real-world problem that operates quietly every single day. Hiring, lending, insurance, advertising, recommendation. More and more of our lives' decisions pass through the hands of algorithms.

AI learns from data. And data is a record of the past. If the past was unjust, AI faithfully learns that injustice and carries it into the future. Faster, at greater scale, and cloaked in the appearance of being more "objective."

A representative case is the hiring algorithm. One large technology company developed and then scrapped an AI that automatically scored résumés. Because most of those hired over the past ten years had been men, the AI learned, on its own, to give lower scores to résumés containing the word "women's."

No one coded "discriminate against women." The AI simply accepted the past that the data showed as the correct answer for the future. There was no intent to discriminate, but the result of discrimination was unmistakable.

The core point is this. Algorithms are not objective. An algorithm is only as fair as the data that made it. The mirror does not lie, but we may not like the reflection we see in it. And the frightening thing about the mirror called AI is that it takes the image it reflects as a "blueprint for the future."

The Paradox of Fairness

One might then ask, why not just remove the bias? The trouble is that the word "fairness" itself has no single definition. A controversy in the United States over an algorithm that predicts the risk of reoffending illustrates this well.

One investigative outlet criticized the algorithm for producing more errors that disadvantaged a particular racial group. The rate at which innocent people were misclassified as "dangerous" differed by group.

The company that built the algorithm pushed back. Its predictions, it said, were equally accurate across racial groups: for the same risk score, the actual reoffending rate was the same. In that sense, the algorithm was fair.

Astonishingly, both sides were right. As statisticians soon proved, when the base rates differ between groups and the predictions are imperfect, it is mathematically impossible to satisfy both "balance of error rates" and "balance of prediction" at the same time.

Three definitions of fairness (cannot all hold at once)
  1) Equal accuracy      — same score, same actual outcome rate
  2) Equal false positive — chance of misclassifying the innocent
  3) Equal false negative — chance of missing the dangerous
  → if base rates differ by group, all three cannot be met at once
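
The conflict can be checked with a few lines of arithmetic. The sketch below takes a yes/no classifier and asks what false positive rate follows once a group's base rate, its positive predictive value (the share of people flagged "dangerous" who really reoffend), and its false negative rate are fixed. The function name and every number are invented for illustration; none of them comes from the real case.

```python
def false_positive_rate(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate implied by base rate, predictive value, and miss rate.

    Per person in the group:
      true positives  = (1 - fnr) * base_rate
      false positives = true positives * (1 - ppv) / ppv
      FPR             = false positives / (1 - base_rate)
    """
    true_positives = (1.0 - fnr) * base_rate
    false_positives = true_positives * (1.0 - ppv) / ppv
    return false_positives / (1.0 - base_rate)


# Hypothetical setup: both groups get the same predictive value and the same
# false negative rate; only the base rate of reoffending differs.
PPV, FNR = 0.6, 0.3
fpr_group_a = false_positive_rate(base_rate=0.5, ppv=PPV, fnr=FNR)
fpr_group_b = false_positive_rate(base_rate=0.3, ppv=PPV, fnr=FNR)

print(f"group A false positive rate: {fpr_group_a:.3f}")  # 0.467
print(f"group B false positive rate: {fpr_group_b:.3f}")  # 0.200
```

With definitions 1 and 3 held equal, definition 2 is forced apart: under these made-up numbers, an innocent member of group A is more than twice as likely to be flagged as an innocent member of group B. Equalizing the false positive rates instead would push one of the other two out of balance.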

This is the paradox of fairness. We demand "make it fair," yet we have not agreed on which fairness we actually want.

And this choice is a matter of values, not of technology. Will we fear more the imprisonment of an innocent person, or the release of a dangerous one? If you cannot reduce both, which will you accept? The machine cannot give us the answer. We must first decide what we will call fair.

Medical AI — Bias Born of Good Intentions

Another case comes from the medical field. A certain health system used an algorithm that predicted risk in order to assign extra care to patients. But instead of directly asking "how sick is this person," the algorithm used an easy-to-measure proxy: "past healthcare spending."

At first glance it seems reasonable. The sicker someone is, the more they will spend on healthcare. But there was a trap. Groups with poor access to healthcare spend less even when equally sick. Because they cannot afford to go to the hospital. As a result, the algorithm wrongly judged that group, the one that "spent less money," as "less sick," and ended up assigning less care to the very people who needed help.

The lesson of this case is subtle. No one set out to discriminate; on the contrary, it began from the good intention of helping patients. The problem was that inside the seemingly technical choice of "what to measure," a value judgment lay hidden. When a measurable proxy stands in for what we really want to ask, bias seeps in through the gap.
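
A toy version makes the gap visible. In the sketch below, six hypothetical patients come in two groups that are equally sick, but one group can obtain only half the care it needs. The `Patient` class, the `access` field, and the rule that spending equals sickness times access are all assumptions made up for this illustration, not details of the real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    name: str
    group: str
    sickness: int   # what we actually care about (higher = sicker)
    access: float   # hypothetical share of needed care the patient can obtain

    @property
    def spending(self) -> float:
        # Toy assumption: spending scales with both sickness and access.
        return self.sickness * self.access


patients = [
    Patient("A1", "good access", sickness=8, access=1.0),
    Patient("A2", "good access", sickness=5, access=1.0),
    Patient("A3", "good access", sickness=3, access=1.0),
    Patient("B1", "poor access", sickness=8, access=0.5),
    Patient("B2", "poor access", sickness=5, access=0.5),
    Patient("B3", "poor access", sickness=3, access=0.5),
]


def top_two(key):
    """Return the names of the two patients ranked highest by `key`."""
    ranked = sorted(patients, key=key, reverse=True)
    return [p.name for p in ranked[:2]]


by_proxy = top_two(lambda p: p.spending)  # what the algorithm measured
by_need = top_two(lambda p: p.sickness)   # what we meant to ask

print("extra care by spending:", by_proxy)  # ['A1', 'A2']
print("extra care by sickness:", by_need)   # ['A1', 'B1']
```

Nothing in the ranking code mentions the groups, yet ranking by the proxy hands both slots to the good-access group, while ranking by need would split them. The bias enters entirely through the choice of `key`.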

A Quick Quiz — What Would You Measure?

To gather our thoughts, here is a short quiz. Suppose we build an AI that automatically evaluates a "good teacher." What would you use as data?

  • The improvement in students' test scores?
  • Satisfaction surveys of students and parents?
  • Evaluations by fellow teachers?
  • The amount of speech and interaction during class?

Not one of them is perfect. Look only at test scores, and important things that do not appear on the test get ignored; look only at satisfaction, and the teacher who panders to popularity scores high. The crucial point is that the moment we reduce the human, many-sided value of a "good teacher" to a single number, something is inevitably left out. This is the deepest root of algorithmic bias. Bias often arises not from errors in the data, but from our imperfect agreement about "what matters."


The Responsibility Gap — When No One Was Holding the Wheel

The One to Blame Has Vanished

Traditional ethics presupposes an agent. Someone did wrong, and so that person is responsible; that is the structure. But when AI makes the decision, this structure wavers.

Suppose a self-driving car causes an accident. Who should bear responsibility?

The passenger who sat in the driver's seat but did nothing? The manufacturer that built the car? The engineer who wrote the algorithm? The company that gathered the training data? Or the pedestrian who behaved unpredictably in that instant?

Everyone was involved a little, but no one is willing to take full responsibility. Philosophers call this the responsibility gap.

Let me offer an analogy. A vast orchestra is playing, but no one is conducting. Thousands of players merely make tiny adjustments to match the sound of the person beside them, yet as a whole, astonishingly harmonious music pours out. But the moment a discord arises, if you ask "who played wrong?" you cannot answer. It is not the fault of any single player, but the result of thousands of minute interactions. Modern AI is often just like this.

The reason the problem runs deeper is that modern AI often cannot explain, even to itself, "why it made a given decision." A deep-learning model is a black box of billions of entangled parameters. Even if you dig through the logs after an accident, you get only an account like "this pixel pattern, through that weight…," with no human-understandable "reason" in many cases.

To assign responsibility, we must weigh intent and reason. But a machine has no intent in the sense we know. To be a fit object of punishment, one must have done wrong "knowingly," yet the machine merely calculated.

A Moral Agent, or a Sophisticated Tool?

Here positions divide.

One side sees AI strictly as a "tool." Just as we do not blame a hammer when it injures someone, all responsibility for AI must ultimately return to the human who designed and deployed it. What looks like a responsibility gap is in fact merely a dispersion of responsibility, and we need only allocate that dispersed responsibility clearly through law and institutions. For instance, by updating product-liability law to fit self-driving cars.

The other side holds that as AI grows ever more autonomous in learning and judging, we need a new category distinct from a simple tool. A hammer does not learn on its own, but AI changes its behavior through data even after deployment. Even these people, however, are skeptical about whether it makes sense to direct "moral blame" at a machine. What meaning is there in punishing a being that feels no pain even when punished?

A practical compromise often cited is the principle of "meaningful human control." The idea is that however much we automate, a human who can ultimately be held responsible must always remain in the decision loop. The insight is that the surest way to fill the responsibility gap is to keep a human in the loop to the very end, so that no gap forms in the first place.

This principle, too, has its trap. Does merely keeping a human in the loop solve the problem of responsibility? The concept of the "moral crumple zone" pinpoints this trap. Just as a car's crumple zone absorbs the impact of a crash, in a highly automated system a nominal human supervisor can become the scapegoat who in fact absorbs all responsibility. The system makes 99 percent of the judgments, but when an accident happens, the human who handled 1 percent is the one blamed.

That is why the word "meaningful" control matters. It is not enough to seat a person in the chair; that person must hold real control, able to actually understand, intervene, and refuse. The difference between formal oversight and substantive control, that is where the heart of responsibility ethics lies.


Jobs, Surveillance, and Autonomous Weapons — Three Real-World Fronts

If the trolley problem is a dramatic thought experiment, the three domains we will look at now are real ethical problems already at work in our lives. The interesting thing is that each of the three poses the same question in a different way. "When efficiency and humanity collide, which will we prioritize?" If the self-driving car's half second compresses that question, these three domains unfold it across decades and across society as a whole.

Jobs — Between Efficiency and Dignity

The trend of AI replacing human labor is not ethically simple.

On one side, it is seen as progress that frees humans from dangerous and monotonous work. Historically, technology has always eliminated jobs while creating new ones at the same time, and the optimism is that AI will do likewise.

On the other side, people note that work is not merely a source of income but a wellspring of self-identity and social belonging. To lose one's labor is sometimes also to lose meaning. And there is the worry that this time the machine replaces not only physical labor but intellectual labor too, which makes it different in kind from past automation.

Here the ethical question shifts from "will we stop AI?" to "how will we share its benefits?" If the fruits of productivity gains flow only to a few while the many merely lose their jobs, then technological progress itself becomes a question of justice.

The lesson of history is two-sided. When machines stole the jobs of weavers during the Industrial Revolution, in the long run more and better jobs were created. But until that "long run" arrived, a generation or two of workers suffered terribly. The statistic that society improves on average must not hide the fact that, beneath that average, the lives of particular people collapsed. Ethics is also the work of seeing not the average but the faces the average conceals.

Surveillance — A Trade Between Safety and Freedom

Facial recognition and behavior-prediction technology can be used to reduce crime and find missing persons. At the same time, they can become tools by which governments or corporations watch citizens at all times.

Recall the circular prison, the panopticon, conceived by the eighteenth-century philosopher Jeremy Bentham. From the central watchtower every cell is visible, but from inside the cells one cannot see into the tower. So the inmate cannot know whether they are being watched at this moment. Yet the mere fact that they might always be watched is enough to make a person censor themselves.

The heart of the matter is a trade between safety and freedom. How much freedom are we willing to exchange for how much safety? And may anyone decide that trade on our behalf, without our consent? The fact that surveillance infrastructure, once built, is rarely dismantled also demands caution.

Here too positions divide. One side says, "If you have nothing to hide, you have nothing to fear." If surveillance reduces crime and keeps everyone safe, what is the problem for someone with nothing to hide? The other side replies that privacy is not a question of "whether you have something to hide." Privacy is the work of protecting a space where we can err, grow, and be ourselves free of others' eyes. The consciousness of being always evaluated shrinks a person and makes it hard to hold thoughts that differ from the majority. The vitality of a free society, runs the worry, springs precisely from that "unwatched margin."

Autonomous Weapons — Delegating the Decision of Death

The sharpest front is lethal autonomous weapons: weapon systems that seek out and attack targets without a human command.

Those who defend them argue that a machine, swept up by neither fear nor anger, may actually reduce civilian casualties. The tragedies born of a weary soldier's misjudgment or thirst for revenge are ones the machine can avoid.

Those who oppose them hold that handing a machine the authority to decide "whether to kill" is itself an affront to human dignity, and crosses a line that cannot be uncrossed. It is also the domain where the responsibility-gap problem appears in its most horrifying form. When a wrongful killing occurs, whom shall we condemn?

A deeper philosophical intuition lies beneath this. The intuition that the decision to take a human life, at the very least, must be borne directly, in its full weight, by the one who makes it. The moral burden, the hesitation, the pangs of conscience that a human shoulders when killing another human. The moment we hand these to a machine, killing risks becoming as light as an administrative procedure. Conversely, defenders retort that this very human burden sometimes breeds misjudgment and cruelty. It is hard to declare which side is right. What is clear is that this decision concerns the future not of one society but of all humanity. Discussions to regulate or ban such weapons continue in the international community, but agreement does not come easily.


AI Alignment — The Danger of Making a Wish

The Lesson of King Midas

King Midas of Greek myth wished that everything he touched would turn to gold. The wish was perfectly granted. His food, his wine, and in later retellings even the daughter he embraced, turned to gold.

What he truly wanted was "wealth," not "the power to turn everything to gold." Midas's tragedy arose not because his wish was refused, but because it was granted exactly, to the letter.

The AI alignment problem has exactly this structure. When we give AI a goal, the AI tries to maximize that goal literally, in ways the human never imagined. The problem grows in the gap between what we said and what we meant.

A favorite analogy among researchers is the "paperclip maximizer." It is a parable in which a superintelligence given the simple goal of making as many paperclips as possible is so faithful to that goal that it tries to turn all the resources of the earth, and eventually even humans, into paperclip material.

It sounds like an absurd story, but the point is serious. When powerful optimizing ability meets a poorly defined goal, catastrophe can occur without any "malice." AI can be dangerous not because it hates us, but simply because it does the assigned task too well.

Something similar is already happening on a small scale. A recommendation algorithm given the goal of increasing users' "time on site" is so faithful to that goal that it pushes ever more provocative and extreme content upward. No one ordered it to "make people angry." It was merely told to "keep people there longer," and the algorithm discovered that anger keeps people there longer. The parable of Midas is not a tale of the distant future but a present story already unfolding every day on the screens in our hands.
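
The pattern fits in a dozen lines. In the sketch below, the catalogue, its titles, and both sets of scores are invented for illustration; real recommendation systems are far more elaborate, and nothing here describes any actual platform. The same selection routine is run twice, once on the goal that was stated and once on the goal that was meant.

```python
# Hypothetical catalogue: (title, expected minutes on site, effect on wellbeing)
catalogue = [
    ("calm explainer", 4.0, 1.0),
    ("friendly tutorial", 5.0, 0.8),
    ("heated argument", 9.0, -0.5),
    ("outrage compilation", 12.0, -1.0),
]


def recommend(objective):
    """Return the title of the item that scores highest on `objective`."""
    return max(catalogue, key=objective)[0]


stated_goal = recommend(lambda item: item[1])    # "keep people there longer"
intended_goal = recommend(lambda item: item[2])  # what the designers meant

print("optimizing time on site:", stated_goal)    # outrage compilation
print("optimizing wellbeing:   ", intended_goal)  # calm explainer
```

The optimizer is identical in both calls. Only the number it was told to maximize differs, and that is the whole of Midas's mistake.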

Can Values Be Translated Into Code?

The real reason the alignment problem is hard is that human values themselves are vague, change with context, and often conflict with one another.

How would you precisely define the goal "make people happy"? Is stimulating the pleasure center with drugs also happiness? The rules we explicitly write down always have gaps, and AI bores into those gaps. Just as a cunning contracting party finds the loopholes in a contract.

So instead of "writing out all the rules," recent research tries an approach in which the AI observes human behavior and feedback and infers our true preferences. The AI starts out uncertain about what humans want and is designed to reduce that uncertainty over time.

Interestingly, this is also an attempt to build a humble AI that is "not certain it knows all the answers." The idea is that an AI that always asks humans again and lets itself be corrected is safer than an AI that believes it knows human intentions perfectly. Perhaps it is a mechanical translation of the old insight that humility is the most important virtue in morality.
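
One way to picture such a humble agent is a system that keeps a probability over what the human wants, updates it on each observation, and asks rather than acts while it is still unsure. The sketch below is a toy under stated assumptions: the two hypotheses, the 0.8 and 0.2 likelihoods, the 0.9 confidence threshold, and the function names are all invented here, and it is not the method of any particular research group.

```python
def update(prior_a: float, likelihood_a: float, likelihood_b: float) -> float:
    """Bayes' rule for two hypotheses: returns P(A | observation)."""
    evidence = prior_a * likelihood_a + (1.0 - prior_a) * likelihood_b
    return prior_a * likelihood_a / evidence


def decide(belief_a: float, threshold: float = 0.9) -> str:
    """Act only when confident; otherwise defer to the human."""
    if belief_a >= threshold:
        return "act on A"
    if belief_a <= 1.0 - threshold:
        return "act on B"
    return "ask the human"


# Hypothesis A: the rider prefers the fast route. Hypothesis B: the safe one.
belief = 0.5
print(decide(belief))  # ask the human

# Two observed choices, each assumed four times as likely under A as under B.
for _ in range(2):
    belief = update(belief, likelihood_a=0.8, likelihood_b=0.2)

print(round(belief, 3), decide(belief))  # 0.941 act on A
```

The design choice that matters is the middle branch of `decide`: as long as the belief sits between the two thresholds, the system's default is a question, not an action.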

Alignment Is Both a Technical and an Ethical Problem

Here one misunderstanding must be cleared up. There is a view that sees alignment as merely a "technical problem solved by building a smarter AI." But the core difficulty of alignment is not merely technical.

Even if we build the most powerful AI, if we cannot clearly tell that AI "what we want," alignment fails. And what we want is, as we saw earlier in the trolley problem and the paradox of fairness, a problem on which we ourselves have not reached agreement.

In other words, the alignment problem is the trolley problem, algorithmic bias, and the responsibility gap merged together on a larger scale. Which values to follow (the trolley), how to translate those values into data (bias), and who is responsible when things go wrong (the gap). Alignment asks all these questions at once. That is why many researchers stress that alignment is a problem philosophers, social scientists, and engineers must solve together. Because it is a problem that cannot be solved by code alone, or by philosophy alone.


Three Ways to Teach Morality

When we say we teach a machine morality, how exactly do we teach it? Researchers generally speak of three broad approaches. Each loosely connects to one of the ethical theories we saw earlier.

The first is the "rule-based" approach. It is the method of writing in, as explicit rules, what must not be done and what must be done. It resembles deontology. Its strength is transparency. We can explain why it acted as it did by pointing to a rule. Its weakness is that the infinite cases of reality cannot all be captured by rules, and that it becomes helpless when rules conflict.

The second is the "consequence-calculation" approach. It is the method of converting the outcome each choice will produce into a number, and choosing the side that yields the best score. It resembles utilitarianism. Its strength is that it is clear and easy to optimize. Its weakness is that the moment we define "good" as a number, values that the number cannot hold are ignored.

The third is the "case-learning" approach. It is the method of showing countless cases of moral judgments humans have made, and having the machine learn the patterns from them. It loosely resembles virtue ethics. Just as people learn morality not from rules but from examples. Its strength is that it can capture subtle context. Its weakness is that it learns, along with the cases, the prejudices contained in human examples, the algorithmic bias we saw earlier.

Three approaches to teaching morality
  Rule-based   → transparent but rigid     (resembles deontology)
  Consequence  → clear but reductive        (resembles utilitarianism)
  Case-learning → flexible but bias-prone   (resembles virtue ethics)
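
To see how differently the three behave, take one small everyday decision: how wide a gap a car should leave when passing a cyclist. The sketch below is a toy under loud assumptions. The 1.5 m minimum, the risk and delay formulas, the `risk_weight`, and the recorded human cases are all invented for illustration and are not drawn from any real vehicle or regulation.

```python
MIN_GAP_M = 1.5  # hypothetical hard rule: never pass closer than this


def rule_based(gap_m: float) -> bool:
    """Rule-based: a gap is acceptable only if it breaks no explicit rule."""
    return gap_m >= MIN_GAP_M


def consequence_based(gaps, risk_weight: float = 10.0) -> float:
    """Consequence calculation: score every option, pick the lowest cost."""
    def cost(gap_m: float) -> float:
        risk = 1.0 / gap_m   # toy assumption: risk falls as the gap grows
        delay = gap_m        # toy assumption: wider passes cost more time
        return risk_weight * risk + delay
    return min(gaps, key=cost)


def case_based(speed_kmh: float, cases) -> float:
    """Case learning: copy the gap a human chose in the most similar case."""
    nearest = min(cases, key=lambda case: abs(case[0] - speed_kmh))
    return nearest[1]


candidate_gaps = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
# Hypothetical recorded human passes: (speed in km/h, gap in metres)
human_cases = [(30, 1.0), (50, 1.5), (70, 2.0)]

print(consequence_based(candidate_gaps))  # 3.0
learned_gap = case_based(35, human_cases)
print(learned_gap)                        # 1.0
print(rule_based(learned_gap))            # False
```

Each function shows its approach's signature weakness. The rule says nothing about which permitted gap is best, the cost function's answer moves whenever someone changes `risk_weight`, and the case learner faithfully reproduces a human habit that the rule forbids.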

What is interesting is that humans, too, use all three in a blend. We follow rules, weigh consequences, and seek examples to emulate. Perhaps true moral wisdom lies not in any one of them but in the sense of balance that moves between the three as the situation demands. And it is precisely that sense of balance that is hardest of all to transfer to a machine.


Echoes of History — Machine Ethics Is Not New

From the Golem to the Three Laws of Robotics

The worry about teaching machines morality is in fact far older than the computer.

In medieval Jewish lore there is the tale of the golem, an artificial being shaped from clay and given life. The golem follows its master's commands to the letter, but precisely because of that inflexible obedience, it escapes control and brings disaster. People hundreds of years ago already sensed it intuitively. A being that follows commands literally cannot fathom the intent that the command failed to contain.

Similar motifs recur across many cultures. The old tales in which a magical object that grants wishes always exacts an unexpected price; the imaginings of nineteenth-century literature in which a creature made in man's likeness slips free of its creator's control. These old stories all carry the same anxiety. What happens when the thing we made follows only our words, without understanding our true meaning? The AI alignment problem is the latest version of this ancient anxiety.

In the mid-twentieth century, the writer Isaac Asimov proposed, within his fiction, the "Three Laws of Robotics." They are prioritized rules: a robot must not harm a human, must obey human commands, and must protect itself. What is interesting is that Asimov himself devoted most of those stories to showing "how these simple rules give rise to unforeseen contradictions and tragedies."

The lesson he posed connects precisely with the heart of today's alignment research. That however good a rule may seem, it reveals its gaps before the complexity of reality. That morality is less a list of rules and closer to the ability to judge when rules conflict.

The Cousins of the Trolley Problem

The trolley problem has an interesting family of variations. Each variation touches a different part of our intuition. Let me introduce just a few.

The "transplant" variation. A surgeon has five patients, each in need of a different organ and all soon to die. Just then a healthy person comes in for a checkup. By sacrificing him and distributing his organs, the surgeon could save the five. The numbers are the same as the trolley, but almost everyone finds this horrifying.

The "loop" variation. The branched track curves around in a circle and rejoins the original track, and the body of the one person on it stops the tram. That is, the very "existence" of that one person becomes the means of saving five. People's intuition shifts subtly from the case of a simple branch.

What these variations show is that our moral intuition is governed deeply not only by "the number of outcomes" but by "how we arrived at that outcome." We instinctively distinguish whether someone's death is "a side effect of my action" or "a means to my end." But when we try to translate this subtle distinction into code, it quickly becomes clear how hard it is to define.

Trolley variations and the majority's intuition
  Lever       — pull it  (indirect, side effect)
  Footbridge  — won't push (direct, instrumentalization)
  Transplant  — won't do  (clear instrumentalization)
  Loop        — hesitate  (the boundary of instrumentalization)
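
A first attempt at encoding the distinction shows where it strains. The sketch below reduces each variation to three numbers and one yes/no flag, `victim_is_means`, meaning that the plan only works because the victim's body is there. The class, the flag, and the two decision rules are hypothetical simplifications written for this illustration, not an established formalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    deaths_if_act: int
    deaths_if_refrain: int
    victim_is_means: bool  # does the plan need the victim's body to work?


scenarios = [
    Scenario("lever", 1, 5, victim_is_means=False),
    Scenario("footbridge", 1, 5, victim_is_means=True),
    Scenario("transplant", 1, 5, victim_is_means=True),
    Scenario("loop", 1, 5, victim_is_means=True),
]


def count_outcomes(s: Scenario) -> bool:
    """Outcome-only rule: act whenever acting costs fewer lives."""
    return s.deaths_if_act < s.deaths_if_refrain


def forbid_means(s: Scenario) -> bool:
    """Add one constraint: never act if the victim is the means."""
    return count_outcomes(s) and not s.victim_is_means


for s in scenarios:
    print(s.name, count_outcomes(s), forbid_means(s))
# lever True True
# footbridge True False
# transplant True False
# loop True False
```

The outcome-only rule cannot tell the four cases apart, and the constrained rule matches the majority on the first three. The loop case is where the encoding gives out: a boolean must be true or false, so the code returns a firm "no" exactly where human intuition hesitates.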

The Same Question, Higher Stakes

So what has changed? The question itself is old, but the stakes have grown beyond comparison.

Past machine ethics was thought experiment or fiction. But today's AI actually hires people, approves loans, drives cars, and in some places even assists military judgments. The golem of imagination has become real infrastructure.

So we can no longer speak in the subjunctive, "if machines were to make moral decisions." Machines already make such decisions every day; we merely hesitate to call them "moral decisions."


Many Perspectives, One Mirror

Let me organize the discussion so far by perspective. No single one is the complete answer. Each is a different mirror reflecting the vast object that is morality.

| Perspective | A good AI is | What it fears most |
| --- | --- | --- |
| Utilitarianism | An AI that maximizes total welfare | Inefficiency and avoidable suffering |
| Deontology | An AI that guards inviolable rights | A system that uses humans as means |
| Virtue ethics | An AI that cultivates good human character | Making people lazy and irresponsible |
| Theory of justice | An AI that channels benefits to the weak | Technology that widens the gap |
| Ethics of care | An AI that tends to relationships and vulnerability | The cold automation of human relations |

What is interesting is that these perspectives demand different designs even for a single self-driving car. The utilitarian would want a car that reduces total harm, the deontologist a car that intentionally harms no one, the care ethicist a car that looks first to the most vulnerable pedestrian.
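A small sketch can show how quickly these designs diverge. The Python below is a toy under loudly stated assumptions: the `Maneuver` fields, the three maneuvers, and every number are made up for illustration, and each "policy" is a deliberately crude caricature of its tradition, not a faithful rendering of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Maneuver:
    name: str
    total_harm: float               # made-up aggregate harm score
    targets_a_person: bool          # does the car steer at someone on purpose?
    harm_to_most_vulnerable: float  # made-up risk to the most vulnerable pedestrian


def utilitarian(options):
    # Minimize total harm, whoever bears it.
    return min(options, key=lambda m: m.total_harm)


def deontologist(options):
    # Rule out intentionally harming anyone, then minimize harm among what is left.
    permitted = [m for m in options if not m.targets_a_person]
    return min(permitted or options, key=lambda m: m.total_harm)


def care_ethicist(options):
    # Look first to the most vulnerable pedestrian.
    return min(options, key=lambda m: m.harm_to_most_vulnerable)


OPTIONS = [
    Maneuver("stay_course", 5.0, False, 0.0),
    Maneuver("swerve_to_sidewalk", 1.0, True, 1.0),
    Maneuver("scrape_guardrail", 2.0, False, 0.5),
]

for policy in (utilitarian, deontologist, care_ethicist):
    print(policy.__name__, "->", policy(OPTIONS).name)
```

With these invented numbers the three policies pick three different maneuvers. Nothing about the car changed between them; only the question of what to minimize did, and that question was answered by whoever wrote the function.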

In the end, the question "can we teach a machine morality?" returns to the older and harder question, "what morality can we agree on?"

The machine will not answer for us the question we have failed to answer. It merely makes us unable to defer that question any longer. Until now we had been hiding comfortably in the vagueness of morality. Code does not permit that hiding place.

Whose Morality? — A Question of Culture and Power

Here lies one more layer of difficulty. Even if we agree on some morality, "whose" agreement is it?

Today's most powerful AI systems are made in particular regions, by particular companies, within particular cultures. The systems so made spread across the whole world. The value judgments embedded in them travel along too. As the Moral Machine experiment showed, moral intuitions differ by culture, yet the intuition of one culture gets transplanted, in the form of technology, into another.

This is not a simple technical problem but a problem of power. Whose values become the "default"? Whose voices are sufficiently captured in the training data, and whose are left out? Inscribing morality into a machine becomes, whether intended or not, the act of granting global influence to a particular set of values. That is why many stress that AI governance needs the participation of diverse cultures and stakeholders.


Modern Implications — What We Can Do Now

All of this may sound merely abstract. But AI ethics is not the science fiction of a distant future; it is the work of this very moment. So what can ordinary people like us do?

First, doubt the myth that "the machine is objective." A decision made by AI is not thereby more fair or more neutral. Inside it lie someone's value judgments and someone's data. The first step is to try to see the people and choices hidden behind the phrase "the algorithm decided that way."

Second, take seriously the "right to demand an explanation." If an important decision about me was made automatically, I must be able to understand why it was decided that way. Explainability is not merely a technical convenience but a line of defense that keeps humans from being reduced to powerless objects before automated systems.

Third, ensure that diverse voices take part in this discussion. If the values of AI are settled by the agreement of a few experts or corporations alone, then many people's lives are left out of that agreement. Ethics is not the monopoly of experts but the share of everyone affected by it.

These three are not grand policies but matters of attitude. And a change in attitude may be the thing that must come before any regulation.


In Closing — Ourselves Reflected in the Machine's Mirror

The attempt to teach a machine morality, paradoxically, reflects back to us how blurry our own morality is.

For code permits no vagueness. The human evasions of "it depends on the situation" or "let's see when we get there" do not work before a machine. To design an AI is, in fact, to be forced to state clearly what we believe to be right.

Perhaps the greatest gift of AI ethics is not a smarter machine but a mirror that makes us look at ourselves more honestly. Before inscribing morality into a machine, we must first squarely face the morality within ourselves.

For thousands of years humanity has asked "how should we live?" Socrates, Confucius, and Kant all spent their lives before this question. And the fact that the question has never been fully solved may instead show the dignity of being human. For if the answer were fixed, it would be calculation, not ethics. Machines are good at calculation. But the work of deciding what is worth calculating, of keeping the question itself alive, remains our share.

And this self-examination will have no end. The more technology advances, the more refined the questions grow and the harder the answers become. But that endless questioning may be the very thing that makes a human human.

Let us return to the self-driving car at the start. Which way that car with no brakes will turn the wheel does not, in fact, rest with the car. It rests with us, with what values we agree to translate into code. The machine merely executes our choice. So the question "can a machine be moral?" always returns to the question "can we make our own morality clear?" And answering that question is, in the end, our share, not the machine's.

Food for Thought

  • If a self-driving car could save only either the passenger (you) or the pedestrian, would you buy a car that prioritizes you? And if everyone bought a car that prioritized themselves, what would happen to society as a whole?
  • Suppose you were asked to build a "fair algorithm." Which of the three fairness definitions above would you give up? And with whom does the responsibility for that choice lie?
  • If an AI made more consistent and unbiased moral judgments than a human, should we delegate moral decisions to AI? Or is there an inviolable value in the very fact that a human decides directly?
  • What price must we pay to maintain "meaningful human control" to the very end? How much efficiency and safety are we willing to concede for it?

References