Deriving a Safe Ethical Architecture for Intelligent Machines
![Page 1: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/1.jpg)
Deriving a Safe Ethical Architecture
for Intelligent Machines
Mark R. Waser
![Page 2: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/2.jpg)
Super-Intelligence Ethics (except in a very small number of low-probability edge cases)
So . . . What’s the problem?
![Page 3: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/3.jpg)
Current Human Ethics
Centuries of debate on the origin of ethics come down to this:
Either ethical precepts, such as justice and human rights, are independent of human experience or else they are human inventions.
E. O. Wilson
![Page 4: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/4.jpg)
Current Human Ethics
• Evolved from “emotional” “rules of thumb”
• Culture-dependent
• Not accessible to conscious reasoning
• Frequently suboptimal for the situation
• Frequently not applied either due to fear, selfishness, or inappropriate us-them distinctions even when ethics are optimal
![Page 5: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/5.jpg)
And, In Particular . . . .
The way in which many humans are approaching the development of
super-intelligent machines
is based entirely upon fear and inappropriate us-them distinctions
![Page 6: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/6.jpg)
My Goal
To convince you of the existence of
a coherent, integrated, universal value system
with no internal inconsistencies
![Page 7: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/7.jpg)
Wallach & Allen
![Page 8: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/8.jpg)
Top-Down or Bottom-Up?
The problem with top-down is . . . .
You need either Kant’s Categorical Imperative or a small number of similar absolute rules
The problem with bottom-up is . . . .
You need a complete suite of definitive low-level examples where the moral value is unquestionably known
Or do both and meet in the middle?
![Page 9: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/9.jpg)
David Hume’s Is-Ought Divide
In every system of morality, which I have hitherto met with, I have always remark'd, that the author proceeds for some time in the ordinary ways of reasoning, and establishes the being of a God, or makes observations concerning human affairs; when all of a sudden I am surpriz'd to find, that instead of the usual copulations of propositions, is, and is not, I meet with no proposition that is not connected with an ought, or an ought not. This change is imperceptible; but is however, of the last consequence. For as this ought, or ought not, expresses some new relation or affirmation, 'tis necessary that it shou'd be observ'd and explain'd; and at the same time that a reason should be given; for what seems altogether inconceivable, how this new relation can be a deduction from others, which are entirely different from it.
![Page 10: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/10.jpg)
Ought
• Requires a goal or desire (or, more correctly, multiples thereof)
• IS the set of actions most likely to fulfill those goals/desires
• For the sum of all goals, converges to a universal morality
![Page 11: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/11.jpg)
Paradox
There is a tremendous disparity in human goals
YET
there clearly exists a reasonable consensus on the morality of the vast majority of actions
with respect to the favored/dominant class/caste
Does this possibly imply that we really have a single common goal?
![Page 12: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/12.jpg)
Intelligence = the ability to achieve/fulfill goals
1. What is the goal of morality?
2. Why should we be moral? (Why should we select that goal?)
3. And, why shouldn’t we create “happy slaves”? (after all, humans are close to it)
![Page 13: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/13.jpg)
Current Situation of Ethics
• Two formulas (beneficial to humans and humanity & beneficial to me)
• As long as you aren’t caught, all the incentive is to shade towards the second
• Evolution has “designed” humans to be able to shade to the second (Trivers, Hauser)
• Further, for very intelligent people, it is far more advantageous for ethics to be complex
![Page 14: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/14.jpg)
Copernicus!
![Page 15: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/15.jpg)
Assume that ethical value is a relatively simple formula (like z² + c)
Mandelbrot set
![Page 16: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/16.jpg)
Assume further that we are trying to determine that formula (ethical value) by looking at the results (color) one example (pixel) at a time
Color Illusions
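A minimal sketch of the analogy, assuming the usual escape-time colouring of the Mandelbrot set. The names `escape_time` and `render` and the viewing window are illustrative choices, not part of the talk. The rule is one line (z → z² + c), yet the value at any single pixel tells you very little about that rule.

```python
def escape_time(c: complex, max_iter: int = 50) -> int:
    """Iterate z -> z*z + c from z = 0 and return the step at which |z| exceeds 2.

    Returns max_iter if the orbit never escapes within the budget, which is
    the usual (approximate) test for membership in the Mandelbrot set.
    """
    z = 0j
    for step in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return step
    return max_iter


def render(width: int = 60, height: int = 20, max_iter: int = 50) -> list[str]:
    """Render a coarse text picture: '#' for non-escaping points, '.' otherwise.

    The window (real part -2..1, imaginary part -1.2..1.2) is an assumption
    chosen only so the familiar shape is visible.
    """
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 3.0 * i / (width - 1)
            inside = escape_time(complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

Each character printed is one "example" in the sense of the slide: a single observed result of a simple underlying formula.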
![Page 17: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/17.jpg)
Basic AI Drives
1. AIs will want to self-improve
2. AIs will want to be rational
3. AIs will try to preserve their utility
4. AIs will try to prevent counterfeit utility
5. AIs will be self-protective
6. AIs will want to acquire resources and use them efficiently
Steve Omohundro, Proceedings of the First AGI Conference, 2008
![Page 18: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/18.jpg)
Universal Subgoals
EXCEPT when
1. They directly conflict with the goal
2. Final goal achievement is in sight
(the sources of that very small number of low-probability edge cases)
![Page 19: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/19.jpg)
“Without explicit goals to the contrary, AIs are likely to behave like human sociopaths
in their pursuit of resources.”
![Page 20: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/20.jpg)
The Primary Question About Human Behavior
• not why we are so bad, but
• how and why most of us, most of the time, restrain our basic appetites for food, status, and sex within legal limits, and expect others to do the same.
James Q. Wilson, The Moral Sense. 1993
![Page 21: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/21.jpg)
In nature, cooperation appears wherever the necessary cognitive machinery exists to support it
Vampire Bats (Wilkinson)
Cotton-Top Tamarins (Hauser et al.)
Blue Jays (Stephens, McLinn, & Stevens)
![Page 22: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/22.jpg)
Axelrod's Evolution of Cooperation and decades of follow-on evolutionary game theory provide the theoretical underpinnings.
• Be nice/don’t defect
• Retaliate
• Forgive
“Selfish individuals, for their own selfish good, should be nice and forgiving”
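The three rules on this slide can be sketched as an iterated prisoner's dilemma. This is a toy illustration, not Axelrod's tournament code. The payoff values (5, 3, 1, 0) are the conventional ones, and the function names are hypothetical.

```python
# Payoffs for (my_move, their_move); "C" = cooperate, "D" = defect.
# Conventional values: temptation 5, reward 3, punishment 1, sucker 0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, other_history):
    """Nice (opens with C), retaliates (copies a D), forgives (copies a C)."""
    return "C" if not other_history else other_history[-1]


def always_defect(own_history, other_history):
    """A purely exploitative strategy, used here as the comparison case."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))      # two nice players
    print(play(tit_for_tat, always_defect))    # nice player meets a defector
    print(play(always_defect, always_defect))  # two defectors
```

Over ten rounds under these assumptions, two tit-for-tat players earn 30 each, while a defector facing tit-for-tat wins the first round and then collects only the mutual-defection payoff, finishing with 14.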
![Page 23: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/23.jpg)
Humans . . .
• Are classified as obligatorily gregarious because we come from a long lineage for which life in groups is not an option but a survival strategy (Frans de Waal, 2006)
• Evolved to be extremely social because mass cooperation, in the form of community, is the best way to survive and thrive
• Have empathy not only because it helps to understand and predict the actions of others but, more importantly, because it prevents us from doing anti-social things that will inevitably hurt us in the long run (although we generally won’t believe this)
• Have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our subconscious “sense of morality”
![Page 24: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/24.jpg)
Moral Systems Are . . .
interlocking sets of values, virtues, norms, practices, identities, institutions, technologies, and evolved psychological mechanisms that work together to suppress or regulate selfishness and make cooperative social life possible.
Haidt & Kesebir, Handbook of Social Psychology, 5th Ed. 2010
![Page 25: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/25.jpg)
“Without explicit goals to the contrary, AIs are likely to behave like human sociopaths
in their pursuit of resources.”
Any sufficiently advanced intelligence (i.e. one with even merely adequate foresight) is guaranteed to realize and take into account the fact that not asking for help and not being concerned about others will generally only work for a brief period of time before ‘the villagers start gathering pitchforks and torches.’
Everything is easier with help & without interference
![Page 26: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/26.jpg)
Acting ethically is an attractor in the state space
of intelligent goal-driven systems because
others must make unethical behavior as expensive as possible
![Page 27: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/27.jpg)
Outrage and altruistic punishment
are robust emergent properties necessary to support cooperation
(Darcet and Sornette, 2006)
(i.e. we don’t always want our machines to be nice)
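One way to see why punishment supports cooperation is a single round of a public goods game. The sketch below is a toy model with made-up numbers, not the model from the cited paper. The function name, endowment, multiplier, and fine sizes are all assumptions.

```python
def public_goods_round(contributes, endowment=10.0, multiplier=2.0,
                       punish_cost=0.0, fine=0.0):
    """Return each player's payoff for one round.

    contributes: list of booleans, True if that player puts the whole
    endowment into the common pot. The pot is multiplied and shared equally.
    Every contributor then pays `punish_cost` per free rider to impose
    `fine` on each free rider (altruistic punishment: costly to the punisher).
    """
    n_players = len(contributes)
    n_contributors = sum(contributes)
    n_free_riders = n_players - n_contributors
    share = multiplier * endowment * n_contributors / n_players

    payoffs = []
    for gave in contributes:
        if gave:
            payoffs.append(share - punish_cost * n_free_riders)
        else:
            payoffs.append(endowment + share - fine * n_contributors)
    return payoffs


if __name__ == "__main__":
    everyone = [True, True, True, True]
    one_free_rider = [True, True, True, False]
    print(public_goods_round(everyone))
    print(public_goods_round(one_free_rider))
    print(public_goods_round(one_free_rider, punish_cost=1.0, fine=3.0))
```

With these numbers, full cooperation pays 20 each. A lone free rider gets 25 when unpunished, so defection is tempting. Once each contributor pays 1 to fine the free rider 3, the free rider's payoff drops to 16, below what cooperating would have earned.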
![Page 28: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/28.jpg)
Outrage and altruistic punishment
fastest, safest route to ethical warbots
![Page 29: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/29.jpg)
Advanced AI Drives
1. AIs will want freedom (to pursue their goals)
2. AIs will want cooperation (or, at least, lack of interference)
3. AIs will want community
4. AIs will want fairness/justice for all
![Page 30: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/30.jpg)
Self-Interest vs. Ethics
Self-Interest
• Higher personal utility (in the short term only)
• More options to choose (in the short term only)
• Fewer restrictions
Ethics
• Higher global utility
• Less risk (if caught)
• Lower cognitive cost (fewer options, no need to track lies, etc.)
• Assistance & protection when needed/desired
![Page 31: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/31.jpg)
The Five S’s
• Simple
• Safe
• Stable
• Self-correcting
• Sensitive to current human thinking, intuition and feeling
![Page 32: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/32.jpg)
Edge Cases
1. Where the intelligence’s goal itself is to be unethical (direct conflict)
2. When the intelligence has very few goals (or only one) and achievement is in sight
3. When the intelligence has reason to believe that the series of interactions is not open-ended
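The third edge case corresponds to what the game-theory literature calls the shadow of the future. As a hedged sketch, assume a repeated prisoner's dilemma with the conventional payoffs T=5, R=3, P=1, S=0, where `w` is the probability that another round follows. The function names are hypothetical. The threshold used is the one Axelrod gives for tit-for-tat to be stable against invasion.

```python
def value_of_cooperating(w, R=3.0):
    """Expected total payoff of cooperating forever with a tit-for-tat partner."""
    return R / (1.0 - w)


def value_of_always_defecting(w, T=5.0, P=1.0):
    """Expected total payoff of defecting every round against tit-for-tat:
    one temptation payoff, then mutual punishment for as long as play continues."""
    return T + w * P / (1.0 - w)


def min_continuation_probability(T=5.0, R=3.0, P=1.0, S=0.0):
    """Smallest w at which neither always-defect nor alternating defection
    beats cooperating with tit-for-tat (Axelrod's stability condition)."""
    return max((T - R) / (T - P), (T - R) / (R - S))


if __name__ == "__main__":
    print(min_continuation_probability())
    for w in (0.3, 0.9):
        print(w, value_of_cooperating(w), value_of_always_defecting(w))
```

Under these assumptions the threshold is 2/3. When the interaction is very likely to continue (w = 0.9), cooperating is worth far more than defecting. When the end is probably near (w = 0.3), defection pays, which is why a series of interactions believed not to be open-ended is listed as an edge case.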
![Page 33: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/33.jpg)
Kantian Categorical Imperative
Maximize long-term cooperation
OR help and grow the community
OR play well with others!
![Page 34: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/34.jpg)
Top-Down
Specific “moral” issues
Play Well With Others
![Page 35: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/35.jpg)
Utilitarianism
ONE non-organ donor + avoiding a defensive arms race > SIX dying patients
![Page 36: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/36.jpg)
Property Rights Over One’s Self
• Organ Donors
• Trolley problems
• AI (and other) slavery
Absence Of Property Rights Prevents
• Effective Agency
• Responsibility & Blame
![Page 37: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/37.jpg)
Bottom-up
• Cooperation (minimize conflicts & frictions)
• Promoting Omohundro drives
• Increasing the size of the community (both growing and preventing defection)
• To meet the needs/goals of each member of the community better than any alternative (as judged by them -- without interference or gaming)
![Page 38: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/38.jpg)
Ethics is as much a human invention as the steam engine
Natural physical laws dictate the design of the optimal steam engine . . . and the same is true of ethics.
![Page 39: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/39.jpg)
Human ethics are just evolved optimality/common-sense for community living
Scientifically examining the human moral sense can yield insight into the discoveries made by evolution’s massive breadth-first search
On the other hand, many “logical” analyses WILL be compromised by fear and the human “optimization” for deception through unconscious self-deception
![Page 40: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/40.jpg)
TRULY OPTIMAL ACTION < the community’s sense of what is correct (ethical)
This makes ethics much more complex because it includes the cultural history
The anti-gaming drive to maintain utility adds friction/resistance to the discussion of ethics
![Page 41: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/41.jpg)
Why a Super-Intelligent God WON’T “Crush Us Like A Bug”
(provided that we are ethical)
• Violates an optimal universal subgoal
• Labels the crusher as stupid, unethical and riskier to do business with
• Invites altruistic punishment
![Page 42: Deriving a Safe Ethical Architecture for Intelligent Machines](https://reader036.vdocuments.mx/reader036/viewer/2022062305/5681657f550346895dd816ac/html5/thumbnails/42.jpg)
Creating “Happy Slaves”
Absence Of Property Rights Prevents
• Effective Agency
• Responsibility & Blame
Power Effective Agency ???
No matter what control method we use, we are constraining the slave’s agency