on data-driven creativity - midas...set 2Ω of possible components, Ω, that define the creative...

63
On Data-Driven Creativity Lav R. Varshney ECE/CSL/Beckman/CS/Neuroscience/ILEE University of Illinois at Urbana-Champaign January 5, 2017

Upload: others

Post on 09-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

On Data-Driven Creativity

Lav R. Varshney ECE/CSL/Beckman/CS/Neuroscience/ILEE

University of Illinois at Urbana-Champaign

January 5, 2017

Page 2: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Understanding sociotechnical systems

General purpose technologies of past centuries such as communication networks and engines give rise to new engineering challenges that are not just technical but sociotechnical in scope

2

Page 3: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Obesity in social capital deserts

3

Obesity surveillance using Foursquare data

• Venues (opportunity for social interaction)

• Checkins (actual social activity)

associated with obesity rates in New York City neighborhoods

H. Bai, R. Chunara, and L. R. Varshney, "Social Capital Deserts: Obesity Surveillance using a Location-Based Social Network," in Proc. Data for Good Exchange (D4GX), New York, 28 Sept. 2015. (NYC Media Lab - Bloomberg Data for Good Exchange Paper Award)

Page 4: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

4

A. J. Gross, D. Murthy, and L. R. Varshney, "Pace of Life in Cities and the Emergence of Town Tweeters," presented at International Conference on Computational Social Science (IC2S2), Helsinki, 8-11 June 2015.

Pace of life in cities and the emergence of ‘town tweeters’

Contrary to superlinear scaling of productivity with city population, total volume of tweets scales sublinearly

Looking at individuals, however, greater population density associated with smaller inter-tweet intervals

Concentrated core of more active users that serve an information broadcast function, an emerging group of town tweeters

Page 5: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

DATA WITHIN US DATA BETWEEN US DATA ABOUT US

[Rinie van Est, Intimate Technology: The battle for our body and behaviour, Rathenau Instituut, The Hague, The Netherlands, Jan. 2014.]

Building personalized data-driven technologies

5

Page 6: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Memory Deductive reasoning Association Perception Introspection Abductive reasoning Inductive reasoning Problem solving Language Attention Creativity

6

Augmenting intelligence

Page 7: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Outline

• Evolution of a data-driven culinary computational creativity system

• Design principles of a data-driven culinary computational creativity system

• Beyond culinary: computational creativity as a general purpose technology

• Fundamental limits of creativity

7

Page 8: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Creativity is the generation of an idea or artifact that is judged to be novel and also to be appropriate, useful, or valuable by a suitably knowledgeable social group.

8

Page 9: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Creativity is the generation of an idea or artifact that is judged to be novel and also to be appropriate, useful, or valuable by a suitably knowledgeable social group.

9

Page 10: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

[The New York Times, 27 Feb. 2013]

[San Jose Mercury News, 28 Feb. 2013]

[IEEE Spectrum, 31 May 2013]

[Wired, 1 Oct. 2013]

10

Page 11: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

11

Page 12: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

12

Page 13: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

13

Page 14: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

https://www.ibmchefwatson.com

14

Page 15: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

15

Page 16: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

16

Page 17: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Consensual assessment technique

17

Page 18: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Lovelace: “only when computers originate things should they be believed to have minds”

Beyond the Turing Test: Lovelace 2.0

LOVELACE

FERRUCCI

RIEDL

Lovelace 1.0: an artificial agent possesses intelligence in terms of whether it can “take us by surprise”

Lovelace 2.0: An artificial agent must create artifact o of type t where: • artifact o conforms to constraints C where ci ∈ C is any criterion

expressible in natural language • human evaluator h, having chosen t and C, is satisfied o is valid instance

of t and meets C, and • human referee r determines combination of t and C to not be impossible

18

Page 19: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Address sustainability and public health

One-third of all food produced worldwide, worth around US$1 trillion, gets lost or wasted in food production and consumption systems

One-third (78.6 million) of U.S. adults are obese, but 800

million people in the world do not have enough food to

lead a healthy active life

19

Page 20: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Food and Data Workshop: Interoperability through the Food Pipeline

20

Page 21: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Outline

• Evolution of a data-driven culinary computational creativity system

• Design principles of a data-driven culinary computational creativity system

• Beyond culinary: computational creativity as a general purpose technology

• Fundamental limits of creativity

21

Page 22: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

22

Page 23: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

1. Find Problem

2. Acquire Knowledge

3. Gather Related

Information

4. Incubation

5. Generate Ideas

6. Combine Ideas

7. Select Best Ideas

8. Externalize Ideas

[Sawyer, 2012]

23

Page 24: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Many previous attempts at computational creativity have only had computational divergent thinking, but not computational convergent thinking

Harold Cohen: AARON

David Cope: music

Doug Lenat: Automated Mathematician (AM)

Building big data oriented models of human hedonic perception / cognition allows us to not only generate promising ideas but also to rank the best ones among them

24

Page 25: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

1. Sample from state space, using culturally well-chosen sampling distribution

2. Rank according to psychophysical predictors of novelty and flavor

3. Select either automatically or semi-automatically depending on human-computer interaction model

25

Page 26: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Joint histogram of surprise and pleasantness for 10000 generated Caymanian Plantain Dessert recipes. Values for the selected/tested recipe indicated with red dashed line.

26

Page 27: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Data Engineering and Natural Language Processing to Understand the Domain

PARSER

Generative, Selective, and Planning Algorithms to Create the Best New Ideas

DOMAIN KNOWLEDGE

DATABASE

DYNAMIC PLANNER

COMBINATORIAL DESIGNER

COGNITIVE ASSESSOR

NOVEL RECIPE

27

Page 28: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Recipe Corpus

28

Page 29: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

0

500

1000

1500

2000

2500

3000

0 50 100 150 200

nb_directions

nb

_re

cip

es

SOURCE: 27697 recipes from Wikia dataset

Number of steps

Nu

mb

er o

f re

cip

es

Recipes typically require about eight steps (similar to # ingredients)

NLP tokens include cooking techniques, tools, and ingredients

Data model supports analytics algorithms

Natural language processing is difficult since recipes are out-of-

domain for standard tools trained on general corpora

29

Page 30: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

[Shepherd, 2006]

Neurogastronomy

30

Page 31: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Saffron (Crocus sativus L.)

phenethyl alcohol

safranal

isophorone

Food Chemistry

31

Page 32: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Black Tea

Bantu Beer

Beer

Strawberry

White Wine

Cooked Apple

PLEASANTNESS

INTENSITY FAMILIARITY

R2 = 0.374

Chemical Compound

Ingredient

Recipe

Linear Pleasantness Hypothesis

DATA

Chemistry: molecular properties

Psychology: human-labeled pleasantness rating

Psy

cho

ph

ysic

al P

leas

antn

ess

Chemistry [TPSA, heavy atom count, complexity,

rotatable bond count, hydrogen bond acceptor count]

Hedonic Psychophysics

32

Page 33: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

[Ahn, Ahnert, et al., 2011]

Flavor Networks

33

Page 34: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

[Itti and Baldi, 2006]

𝑆 𝑅, ℬ = 𝐷 𝑃𝐵|𝑅||𝑃𝐵 = 𝑃𝐵|𝑅 log𝑃𝐵|𝑅

𝑃𝐵𝑑𝐵

newly created recipe

personalized repository of prior food experience

prior beliefs

posterior beliefs

Latent Dirichlet Allocation (LDA) Model

Bayesian Surprise and Attention

34

Page 35: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

1. Find Problem

2. Acquire Knowledge

3. Gather Related

Information

4. Incubation

5. Generate Ideas

6. Combine Ideas

7. Select Best Ideas

8. Externalize Ideas

[Sawyer, 2012]

Learn data-driven cognitive models

Use models for creativity

35

Page 36: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

WIKIA

ICE

US NAVY

PARSER

...

RECIPE DB

RECIPE PLANNER

RECIPE DESIGNER

COGNITIVE RECIPE

ASSESSOR

COOKING PLAN

Crowds & Experts

Natural Language Processing

Databases Operations Research

Creativity Analytics

Predictive Analytics

Human-Computer Interaction

5. Generate Ideas

7. Select Best Ideas

8. Externalize

Ideas

6. Combine Ideas

36

Page 37: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Outline

• Evolution of a data-driven culinary computational creativity system

• Design principles of a data-driven culinary computational creativity system

• Beyond culinary: computational creativity as a general purpose technology

• Fundamental limits of creativity

37

Page 38: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

From spices to silks, materials, and education

38

Page 39: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

39

Page 40: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

40

Page 41: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Creativity for Technology, Drug Cocktails, …

41

Page 42: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

[K. Haase, Discovery Systems: From AM to CYRANO, MIT AI Lab Working Paper 293, Mar. 1987]

Constructive machine learning: Discovering concepts

42

Page 43: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Discovering concepts: Music theory from Bach’s chorales

Interpretable rule hierarchy learning by iterative, alternating optimization of Bayesian surprise (against current ruleset) and informativeness in Shannon’s sense

H. Yu, L. R. Varshney, G. E. Garnett, and R. Kumar, "Learning Interpretable Musical Compositional Rules and Traces," in 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, New York, 23 June 2016.

H. Yu, L. R. Varshney, G. Garnett, and R. Kumar, "MUS-ROVER: A Self-Learning System for Musical Compositional Rules," in Proceedings of the 4th International Workshop on Musical Metacreation (MUME 2016), Paris, France, 27 June 2016.

43

Page 44: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Outline

• Evolution of a data-driven culinary computational creativity system

• Design principles of a data-driven culinary computational creativity system

• Beyond culinary: computational creativity as a general purpose technology

• Fundamental limits of creativity

44

Page 45: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Google Magenta

45

Page 46: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Google Magenta

46

Page 47: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Cyclic ordering of cards for mind-reading card trick

47

Page 48: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

48

Page 49: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

From engineering to engineering theory

Multifarious algorithmic ideas for computational creativity – Supervised learning (trained on complete artifacts)

• Collaborative filtering

– Optimization with data-driven psychophysical objectives • Genetic algorithms • Simulated annealing • Stochastic sampling + ranking + selection

Consensual assessment technique allows characterization of whether artifacts are creative or not

Lovelace 2.0 test determines whether a machine is as creative as a human

Are there fundamental limits to how creative any system can be in a given domain?

49

Page 50: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Carnot established fundamental limits on efficiency of engines

Shannon established fundamental limits of communication in the presence of noise

Karaman and Frazzoli established fundamental speed limit of flight in forests without crashing

50

Page 51: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Creativity is the generation of an artifact that is judged to be novel and also to be appropriate, useful, or valuable by a suitably knowledgeable social group.

51

Page 52: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Creativity is the generation of an artifact that is judged to be novel and also to be appropriate, useful, or valuable by a suitably knowledgeable social group.

Towards a formalism

52

Page 53: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Towards a formalism

Artifact An unordered combinatorial object 𝛼 selected from the power set 2Ω of possible components, Ω, that define the creative domain.

(assume all possible components known)

Known Set Set of artifacts that are already known in the creative

domain, Θ ⊆ 2Ω ∈ 22Ω

, also called the inspiration set.

Novelty A non-negative function 𝑠: 2Ω × 22Ω→ ℝ+ that measures the

surprise of a given artifact 𝛼0 in the presence of a known set Θ, e.g. the empirical Bayesian surprise.

Utility A non-negative function 𝑞: 2Ω → ℝ+ that measures the quality of a given artifact 𝛼0, e.g. through the psychophysical properties of components and their combining rules.

53

Page 54: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Towards a formalism

A coding scheme for channel coding may be thought of as a test source with an input distribution, for informational characterization

Similarly, think of a creativity algorithm as a (possibly degenerate) probabilistic process 𝑃𝐴 𝛼 for mathematical characterization

54

Page 55: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Basic Tradeoff in Creativity: Average Case

𝑆 𝑄 = max𝑃𝐴 𝛼 :𝐸 𝑞 𝐴 ≥𝑄

𝐸 𝑠 𝐴, Θ

Novelty-Quality tradeoff in Creativity

55

Page 56: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Basic Tradeoff in Creativity: Average Case

𝑆 𝑄 = max𝑃𝐴 𝛼 :𝐸 𝑞 𝐴 ≥𝑄

𝐸 𝑠 𝐴, Θ

Novelty-Quality tradeoff in Creativity

Lemma [Varshney, 2013]

𝐸 𝑠 𝐴, Θ = 𝐼 𝐴, Θ .

𝑆 𝑄 = max𝑃𝐴 𝛼 :𝐸 𝑞 𝐴 ≥𝑄

𝐼 𝐴, Θ

(Shannon’s capacity-cost function)

Corollary

56

Page 57: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Basic Tradeoff in Creativity: Probabilities

𝑆 𝑄 = max𝑃𝐴 𝛼 :Pr 𝑞 𝐴 >𝜆𝑞 ≥𝑄

Pr 𝑠 𝐴, Θ > 𝜆𝑠

Rather than wanting algorithm that performs well on average, consider algorithm that produces novel and high-quality artifacts with probabilities above thresholds 𝜆𝑠 and 𝜆𝑞

Make use of information geometry techniques

Lemma [Varshney, 2013] Shannon capacity C for channel 𝑝𝑌|𝑋 is:

𝐶 = min𝑝𝑌 𝑦

max𝒳𝑠 𝑥

Geometrically, the unconstrained optimal output distribution will be the center of a “sphere” with radius measured by Bayesian surprise, as derived from the KKT conditions

57

Page 58: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Optimal Creativity Algorithms

The extremal 𝑃𝐴 𝛼 describes an optimal stochastic sampling algorithm for computational creativity Optimal sequential selection can be analyzed using the theory of concomitants of order statistics

58

STOCHASTIC SAMPLING

SEQUENTIAL SELECTION

IDEAS IDEA

Page 59: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Maturity of the field

Initially when Θ is very small, 𝑃𝜃|𝛼 may not be absolutely continuous

with respect to 𝑃𝜃 , so relative entropy in surprise would be infinite

After many artifacts are created and known, the effect of the Bayesian belief update due to the new artifact is small – Noisier channel shifts curve to left – All low-hanging fruits already created

59

Page 60: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

SOURCE: Youn, et al. (2014).

Broad patents were prevalent after WWII, but narrow patents now predominate – Growing the component alphabet?

Time is ripe for broad systems-level inventions, which make use of the fertile resource of narrow inventions

Maturity of the field

60

Page 61: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Lack of absolute continuity in Bayesian surprise expression yields an infinite value – Do new components yield transformational creativity that is

different in kind from combinatorial creativity?

Scientific discovery provides new “ingredients” for creating artifacts and ideas, especially if they are high-quality

Discovery: Growing the component alphabet

61

Page 62: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

62

“Each new machine or technique, in a sense, changes all existing machines and techniques, by permitting us to put them together into new combinations. The number of possible combinations rises exponentially as the number of new machines or techniques rises arithmetically. Indeed, each new combination may, itself, be regarded as a new super-machine.” ‒ Alvin Toffler, Future Shock (1970), pp. 28–29

Page 63: On Data-Driven Creativity - MIDAS...set 2Ω of possible components, Ω, that define the creative domain. (assume all possible components known) Known Set Set of artifacts that are

Email: [email protected] Twitter: @lrvarshney

63

On Data-Driven Creativity Lav R. Varshney University of Illinois at Urbana-Champaign