Lecture 04: Knowledge Representation

2003.09.04 - SLIDE 1 IS 202 - FALL 2003

Lecture 04: Knowledge Representation

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2003

SIMS 202: Information Organization and Retrieval

Credits to Warren Sack for some of the slides in this lecture

Today

• Review of Categorization

• Knowledge Representation

– The Vocabulary Problem

– Commonsense

– Cyc

• Discussion Questions

• Phone Project Overview and Assignment 2

• Action Items for Next Time

Categorization

• Processes of categorization are fundamental to human cognition

• Categorization is messier than our computer systems would like

• Human categorization is characterized by

– Family resemblances

– Prototypes

– Basic-level categories

• Considering how human categorization functions is important in the design of information organization and retrieval systems

Categorization

• Classical categorization

– Necessary and sufficient conditions for membership

– Generic-to-specific monohierarchical structure

• Modern categorization

– Characteristic features (family resemblances)

– Centrality/typicality (prototypes)

– Basic-level categories
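The classical view can be made concrete in a few lines. This is a minimal Python sketch, not taken from the lecture: the geometric categories and their feature names are hypothetical, membership is modeled as an all-or-nothing subset test, and each subcategory inherits the defining features of its single parent (a monohierarchy).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassicalCategory:
    """A category defined by necessary and jointly sufficient features."""
    name: str
    defining_features: frozenset
    parent: Optional["ClassicalCategory"] = None  # single parent: a monohierarchy

    def all_defining_features(self) -> frozenset:
        # Generic-to-specific: a subcategory inherits every defining
        # feature of its parent and adds its own.
        inherited = self.parent.all_defining_features() if self.parent else frozenset()
        return inherited | self.defining_features

    def is_member(self, features: set) -> bool:
        # All-or-nothing: every defining feature must be present,
        # so no member can be a "better example" than another.
        return self.all_defining_features() <= features


# Hypothetical example categories.
polygon = ClassicalCategory("polygon", frozenset({"closed", "plane figure", "straight sides"}))
triangle = ClassicalCategory("triangle", frozenset({"three sides"}), parent=polygon)

print(triangle.is_member({"closed", "plane figure", "straight sides", "three sides"}))  # True
print(triangle.is_member({"closed", "plane figure", "three sides"}))  # False
```

Prototype effects cannot arise in this model: every member passes exactly the same test, which is the rigidity the modern view reacts against.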

Properties of Categorization

• Family Resemblance

– Members of a category may be related to one another without all members having any property in common

• Prototypes

– Some members of a category may be “better examples” than others, i.e., “prototypical” members
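A small Python sketch can show both properties at once. The feature sets for four kinds of game are hypothetical (Wittgenstein's own illustration of family resemblance was the concept "game"), and scoring typicality as a member's summed Jaccard similarity to the other members is an assumption of this sketch, one simple stand-in for the family-resemblance scores used in prototype research.

```python
from functools import reduce

# Hypothetical features for members of the category "game".
games = {
    "chess": {"competitive", "board", "skill", "two players"},
    "poker": {"competitive", "cards", "luck", "skill"},
    "solitaire": {"cards", "luck", "solo"},
    "catch": {"ball", "amusement", "two players"},
}


def common_features(members):
    """Features shared by every member of the category."""
    return reduce(lambda a, b: a & b, members.values())


def jaccard(a, b):
    """Overlap between two feature sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b)


def typicality(members):
    """Score each member by its summed similarity to all other members."""
    return {
        name: sum(jaccard(feats, other) for o, other in members.items() if o != name)
        for name, feats in members.items()
    }


scores = typicality(games)
prototype = max(scores, key=scores.get)

print(common_features(games))  # set(): no property is shared by all members
print(prototype)               # poker
```

No feature is common to all four, yet each member overlaps with at least one other (catch with chess on "two players", solitaire with poker on cards and luck), and the member with the most overlap comes out as the most typical.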

Basic-Level Categorization

• Perception

– Overall perceived shape

– Single mental image

– Fast identification

• Function

– General motor program

• Communication

– Shortest, most commonly used and contextually neutral words

– First learned by children

• Knowledge Organization

– Most attributes of category members stored at this level

– Tends to be in the “middle” of a classification hierarchy

Information Hierarchy

Wisdom

Knowledge

Information

Data

Information Hierarchy

Knowledge

Information

Wisdom

Data

Today’s Thinkers/Tinkerers

George Furnas http://www.si.umich.edu/~furnas/

Marvin Minsky http://web.media.mit.edu/~minsky/

Doug Lenat http://www.cyc.com/staff.html

The Birth of AI

• Rockefeller-sponsored Institute at Dartmouth College, Summer 1956

– John McCarthy, Dartmouth (->MIT->Stanford)

– Marvin Minsky, MIT (geometry)

– Herbert Simon, CMU (logic)

– Allen Newell, CMU (logic)

– Arthur Samuel, IBM (checkers)

– Alex Bernstein, IBM (chess)

– Nathan Rochester, IBM (neural networks)

– Etc.

Definition of AI

“... artificial intelligence [AI] is the science of making machines do things that would require intelligence if done by [humans]” (Minsky, 1963)

The Goals of AI Are Not New

• Ancient Greece

– Daedalus’ automata

• Judaism’s myth of the Golem

• 18th century automata

– Singing, dancing, playing chess?

• Mechanical metaphors for mind

– Clock

– Telegraph/telephone network

– Computer

Some Areas of AI

• Knowledge representation

• Programming languages

• Natural language understanding

• Speech understanding

• Vision

• Robotics

• Planning

• Machine learning

• Expert systems

• Qualitative simulation

AI or IA?

• Artificial Intelligence (AI)

– Make machines as smart as (or smarter than) people

• Intelligence Amplification (IA)

– Use machines to make people smarter

Furnas: The Vocabulary Problem

• People use different words to describe the same things

– “If one person assigns the name of an item, other untutored people will fail to access it on 80 to 90 percent of their attempts.”

– “Simply stated, the data tell us there is no one good access term for most objects.”
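The arithmetic behind a failure rate in that range is easy to reproduce. This Python sketch uses invented counts (not Furnas et al.'s data) of the first word each of 50 hypothetical users chose for one command. If the system accepts a single name, its hit rate is that name's share of users; accepting several aliases raises it.

```python
from collections import Counter

# Hypothetical first-choice words from 50 imagined users for one command.
# Invented for illustration; NOT the data reported by Furnas et al.
names = Counter({
    "delete": 7, "remove": 6, "erase": 5, "kill": 4, "discard": 4,
    "trash": 4, "clear": 3, "drop": 3, "purge": 3, "cancel": 3,
    "destroy": 2, "wipe": 2, "omit": 2, "scratch": 2,
})


def hit_rate(accepted_terms, counts):
    """Fraction of users whose first-choice word the system accepts."""
    total = sum(counts.values())
    return sum(counts[t] for t in accepted_terms) / total


def top_k_hit_rate(k, counts):
    """Best case: the system accepts the k most popular words."""
    return hit_rate([t for t, _ in counts.most_common(k)], counts)


print(hit_rate(["erase"], names))  # 0.1  (a designer's "armchair" choice)
print(top_k_hit_rate(1, names))    # 0.14 (best possible single name)
print(top_k_hit_rate(5, names))    # 0.52 (five aliases)
```

With these made-up numbers even the best single name fails 86 percent of users, and five aliases still miss about half of them.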

The Vocabulary Problem

• How is it that we come to understand each other?

– Shared context

– Dialogue

• How can machines come to understand what we say?

– Shared context?

– Dialogue?

Vocabulary Problem Solutions?

• Furnas et al.

– Make the user memorize precise system meanings

– Have the user and system interact to identify the precise referent

– Provide infinite aliases to objects

• Minsky and Lenat

– Give the system “commonsense” so it can understand what the user’s words can mean
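The interaction and aliasing remedies can be sketched together as an adaptive alias index. The class and method names below are hypothetical, and the adaptive rule used here (when a user's word fails but the user then reaches the object another way, keep that word as a new alias) is a simplification of the adaptive-indexing idea, not a reproduction of the paper's system.

```python
from collections import defaultdict


class AliasIndex:
    """Hypothetical many-to-many index from user words to system objects."""

    def __init__(self):
        self._index = defaultdict(set)

    def add_alias(self, word, obj):
        self._index[word.lower()].add(obj)

    def lookup(self, word):
        # A word may name several objects (polysemy), so return every
        # candidate and let the user pick one: the "interaction" remedy.
        return sorted(self._index.get(word.lower(), set()))

    def record_success(self, failed_word, chosen_obj):
        # Adaptive step: the user's word found nothing, but the user then
        # reached the object another way, so keep the word as an alias.
        self.add_alias(failed_word, chosen_obj)


idx = AliasIndex()
idx.add_alias("delete", "rm")         # the designer's "armchair" name
print(idx.lookup("erase"))            # []
idx.record_success("erase", "rm")     # a user found "rm" some other way
print(idx.lookup("Erase"))            # ['rm']

idx.add_alias("kill", "terminate_process")
idx.add_alias("kill", "rm")
print(idx.lookup("kill"))             # ['rm', 'terminate_process']
```

The risk of unchecked alias growth shows up in the last lookup: a popular word can come to point at many objects, trading the vocabulary problem for a precision problem.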

Lenat on the Vocabulary Problem

• “The important point is that users will be able to find information without having to be familiar with the precise way the information is stored, either through field names or by knowing which databases exist, and can be tapped.”

Minsky on the Vocabulary Problem

• “To make our computers easier to use, we must make them more sensitive to our needs. That is, make them understand what we mean when we try to tell them what we want. […] If we want our computers to understand us, we’ll need to equip them with adequate knowledge.”

Commonsense

• Commonsense is background knowledge that enables us to understand, act, and communicate

• Things that most children know

• Minsky on commonsense:

– “Much of our commonsense knowledge information has never been recorded at all because it has always seemed so obvious we never thought of describing it.”

Commonsense Example

• “I want to get inexpensive dog food.”

• The food is not made out of dogs.

• The food is not for me to eat.

• Dogs cannot buy their own food.

• I am not asking to be given dog food.

• I am not saying that I want to understand why some dog food is inexpensive.

• The dog food is not more than $5 per can.

Engineering Commonsense

• Use multiple ways to represent knowledge

• Acquire huge amounts of that knowledge

• Find commonsense ways to reason with it (“knowledge about how to think”)

Multiple Representations

• Minsky

– “I think this is what brains do instead: Find several ways to represent each problem and to represent the required knowledge. Then when one method fails to solve a problem, you can quickly switch to another description.”

• Furnas

– “But regardless of the number of commands or objects in a system and whatever the choice of their ‘official’ names, the designer must make many, many alternative verbal access routes to each.”
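In retrieval terms, both quotations suggest giving the system several descriptions of a query and trying them in turn. This Python sketch assumes a hypothetical catalog of command names and three illustrative matchers (exact, normalized, and word overlap); it is a simple fallback chain in the spirit of these remarks, not an implementation of Minsky's architecture.

```python
import re

# Hypothetical catalog of command names.
CATALOG = ["Print Document", "Save As...", "Page Setup", "Print Preview"]


def words(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact(query, catalog):
    return [c for c in catalog if c == query]


def normalized(query, catalog):
    # Ignore case, punctuation, and spacing differences.
    return [c for c in catalog if words(c) == words(query)]


def word_overlap(query, catalog):
    # Loosest description: any shared word counts as a match.
    q = set(words(query))
    return [c for c in catalog if q & set(words(c))]


# Tried in order, from most precise to most forgiving.
STRATEGIES = [("exact", exact), ("normalized", normalized), ("word overlap", word_overlap)]


def find(query, catalog=CATALOG):
    """Return the first strategy that finds anything, with its matches."""
    for name, method in STRATEGIES:
        hits = method(query, catalog)
        if hits:
            return name, hits
    return None, []


print(find("Print Document"))    # ('exact', ['Print Document'])
print(find("save as"))           # ('normalized', ['Save As...'])
print(find("preview the page"))  # ('word overlap', ['Page Setup', 'Print Preview'])
```

Ordering the strategies from strict to loose keeps precise answers when they exist and only falls back to broader, noisier matches when the narrower descriptions fail.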

CYC

• Decades-long effort to build a commonsense knowledge base

• Storied past

• 100,000 basic concepts

• 1,000,000 assertions about the world

• The validity of Cyc’s assertions is context-dependent (default reasoning)
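Context-dependent, default assertions can be sketched with a toy knowledge base. Everything in this Python sketch is an assumption: it does not use CycL or Cyc's own context machinery (Cyc calls its contexts microtheories), and the contexts, kinds, and attributes are hypothetical. A lookup searches the query's context before its more general parent, and within a context checks the most specific kind first, so an exception (penguins) overrides a default (birds fly).

```python
# Hypothetical kind hierarchy: individual or kind -> more general kind.
KIND_OF = {"tweety": "bird", "opus": "penguin", "penguin": "bird",
           "wilbur": "pig", "pig": "animal", "bird": "animal"}

# Hypothetical contexts: context -> more general context.
PARENT_CONTEXT = {"cartoon": "general", "general": None}

# (context, kind, attribute) -> value
ASSERTIONS = {
    ("general", "bird", "can_fly"): True,      # default for birds
    ("general", "penguin", "can_fly"): False,  # exception to the default
    ("general", "animal", "can_talk"): False,
    ("cartoon", "animal", "can_talk"): True,   # holds only in this context
}


def chain(start, parent_of):
    """Yield start, then its parent, grandparent, ... until there is none."""
    node = start
    while node is not None:
        yield node
        node = parent_of.get(node)


def ask(entity, attribute, context="general"):
    # Outer loop: the asked-about context wins over more general ones.
    for ctx in chain(context, PARENT_CONTEXT):
        # Inner loop: a specific kind's assertion overrides a general default.
        for kind in chain(entity, KIND_OF):
            value = ASSERTIONS.get((ctx, kind, attribute))
            if value is not None:
                return value
    return None  # nothing asserted: unknown, which is not the same as False


print(ask("tweety", "can_fly"))             # True
print(ask("opus", "can_fly"))               # False
print(ask("wilbur", "can_talk"))            # False
print(ask("wilbur", "can_talk", "cartoon")) # True
print(ask("wilbur", "can_fly"))             # None
```

Returning `None` rather than `False` when nothing applies keeps "not asserted" distinct from "denied", which matters once defaults and exceptions are in play.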

Cyc Examples

• Cyc can find the match between a user's query for "pictures of strong, adventurous people" and an image whose caption reads simply "a man climbing a cliff"

• Cyc can notice if an annual salary and an hourly salary are inadvertently being added together in a spreadsheet

• Cyc can combine information from multiple databases to guess which physicians in practice together had been classmates in medical school

• When someone searches for "Bolivia" on the Web, Cyc knows not to offer a follow-up question like "Where can I get free Bolivia online?"
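The salary example depends on knowing that an annual and an hourly amount are different kinds of quantity. Cyc draws that from general knowledge; this Python sketch swaps in a much simpler technique, tagging each amount with the time unit it is paid per and refusing to add mismatched units. The `Money` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """An amount paid per unit of time, e.g. per "hour" or per "year"."""
    amount: float
    per: str

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        if self.per != other.per:
            # The mistake the slide describes: summing an annual salary
            # with an hourly wage yields a meaningless number.
            raise TypeError(f"cannot add a per-{self.per} amount to a per-{other.per} amount")
        return Money(self.amount + other.amount, self.per)


annual = Money(52000, "year")
hourly = Money(25, "hour")

print(annual + Money(3000, "year"))  # Money(amount=55000, per='year')
try:
    annual + hourly
except TypeError as err:
    print(err)  # cannot add a per-year amount to a per-hour amount
```

Converting instead of rejecting would need an extra assumption, such as hours worked per year, which is exactly the kind of background knowledge the slide attributes to Cyc.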

Cyc Applications

• Applications currently available or in development

– Integration of Heterogeneous Databases

– Knowledge-Enhanced Retrieval of Captioned Information

– Guided Integration of Structured Terminology (GIST)

– Distributed AI

– WWW Information Retrieval

• Potential applications

– Online brokering of goods and services

– "Smart" interfaces

– Intelligent character simulation for games

– Enhanced virtual reality

– Improved machine translation

– Improved speech recognition

– Sophisticated user modeling

– Semantic data mining

Cyc’s Top-Level Ontology

• Fundamentals • Top Level • Time and Dates • Types of Predicates • Spatial Relations • Quantities • Mathematics • Contexts • Groups • "Doing" • Transformations • Changes Of State • Transfer Of Possession • Movement • Parts of Objects • Composition of Substances • Agents • Organizations • Actors • Roles • Professions • Emotion • Propositional Attitudes • Social • Biology • Chemistry • Physiology • General Medicine • Materials • Waves • Devices • Construction • Financial • Food • Clothing • Weather • Geography • Transportation • Information • Perception • Agreements • Linguistic Terms • Documentation

http://www.cyc.com/cyc-2-1/toc.html
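Topic areas like these are organized as a generalization hierarchy. CycL relates an individual to a collection with `isa` and a collection to a more general collection with `genls`; this Python sketch borrows only those two ideas, and the collection names are hypothetical, not copied from Cyc. Unlike the monohierarchy of classical categorization, a collection here may have several parents.

```python
# Hypothetical genls links: collection -> its more general collections.
GENLS = {
    "Dog": {"Mammal", "Pet"},  # two parents: not a monohierarchy
    "Mammal": {"Animal"},
    "Pet": {"Animal"},
    "Animal": {"LivingThing"},
    "LivingThing": {"Thing"},
}

# Hypothetical isa links: individual -> collections it directly belongs to.
ISA = {"Fido": {"Dog"}}


def all_generalizations(collection):
    """Every collection reachable through genls links (transitive closure)."""
    seen, stack = set(), [collection]
    while stack:
        for parent in GENLS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def isa(individual, collection):
    """True if the individual is in the collection directly or by inheritance."""
    return any(
        kind == collection or collection in all_generalizations(kind)
        for kind in ISA.get(individual, ())
    )


print(sorted(all_generalizations("Dog")))  # ['Animal', 'LivingThing', 'Mammal', 'Pet', 'Thing']
print(isa("Fido", "Animal"))               # True
print(isa("Fido", "Plant"))                # False
```

Storing only direct links and computing the closure on demand means a fact asserted once about "Animal" is available for every dog, pet, and mammal below it.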

OpenCYC

• Cyc’s knowledge base is now coming online

– http://www.opencyc.org/

• How could Cyc’s knowledge-base affect the design of information organization and retrieval systems?

Discussion Questions (Furnas)

• Alison Billings & Vijay Viswanathan on Furnas

– Are unlimited alias indexes an effective design solution to the problem of precision in "term-based" searches? Is it possible to implement such a system that could maintain an accurate relation (category) to the designer’s “armchair” term in the presence of polysemy? Would the adaptive nature of this solution propagate an all-inclusive alias category that could include all accessible information in a particular index?

Discussion Questions (Furnas)

• Alison Billings & Vijay Viswanathan on Furnas

– In the 16 years since this article was published in 1987, advances in information retrieval technology have been profound. Is the Vocabulary Problem still a major issue in Human-System Communication? Furnas et al. provide some solutions to the Vocabulary Problem, such as “unlimited aliasing”, “keyword harvesting”, and “adaptive indices.” But now there are WYSIWYG interfaces such as Windows that may reduce the need for command-line word choices, search engines that harvest the content from web pages, and services like Google that suggest “Did you mean xxxxx?” when search results are sparse. Has the Vocabulary Problem been solved?

Discussion Questions (Minsky)

• Joseph Hall on Minsky

– Minsky talks a lot about commonsense. How would you define what is within the commonsense? Do you think that commonsense would be easy or difficult to teach to a computer? Why? Is commonsense a cross-cultural, basic-level category in the sense of what Lakoff described? Or is it more culturally specific (like "Don't step in front of moving traffic.") and thus harder to define? How would culturally-dependent definitions of "commonsense" complicate Minsky's theory?

– Are machines that learn such a good thing? For example, I would like my computer to learn certain things (like how to fix common errors) but not others (like how to play the stock market with my bank account). Are ethics (cyber and otherwise) to be programmed into learning computers?

Discussion Questions (Minsky)

• Joseph Hall on Minsky

– What Minsky describes is all fine and dandy... but there seems to be a rather large gap between the machines of today and the machines he is postulating. To learn, machines would not only have to be able to note (and take action) when they are deviating from "operational parameter space" (malfunctioning, blue screen of death, etc.) but be able to decide on and implement a solution to the problem at hand from a different direction and/or using a different technique, quickly.

Discussion Questions (Minsky)

• Joseph Hall on Minsky

– Do you think that building such a commonsense-aware machine is possible today? (That is, is Minsky's model of a commonsense-based machine a reasonable *goal* or just an ideal?) If not, what are some of the impediments to the realization of one of Minsky's machines?

– Do user expectations (reasonable or not) of what a computer should be doing factor into this at all?

Discussion Questions (Lenat)

• Rebecca Shapley on Lenat

– What does this article imply for best practices in information organization & retrieval? How would you articulate the potential for a commonsense knowledgebase to revolutionize information retrieval? Does the premise of a commonsense-base feeding efforts at machine learning or natural language understanding make sense to you? Which of the potential applications Lenat mentions are compelling to you?

– This article is from 1995 - do we hear anything more about this CYC? Did it revolutionize things? Why does Minsky call for a huge commonsense knowledgebase in 2000 when CYC was nearly complete in 1995?

Discussion Questions (Lenat)

• Rebecca Shapley on Lenat

– How would you apply the conduit metaphor & toolmaker's paradigms to describe, or perhaps critique, the CYC project?

– If CYC is 'automating the whitespace in documents' - capturing the context for information, how would you describe the context it is capturing? How would you describe where the captured context is no longer applicable? How do you feel about the notion that 10+ people in Palo Alto CA were able to describe your context? Do you trust them with that task? Do you consider it necessary that some shared automated context be created? What challenges do you see for their ostensible goal, or limitations do you see to their approach?

Discussion Questions (Lenat)

• Rebecca Shapley on Lenat

– Anything in particular you can imagine yourself unwilling to have represented a particular way in the commonsensebase? Let's say you believe in reincarnation but the assertions in the commonsensebase don't leave any room for this idea, and how to interpret what you might say to a bereaved friend. How do you feel about the ability to 'automatically' interpret your expression being left out? Does it make you feel invisible, relieved, angry? What would be necessary to have it be culturally sensitive, and would that be encodable?

Discussion Questions (Lenat)

• Rebecca Shapley on Lenat

– What can you piece together about how CYC is implemented, how it makes decisions? What questions do you still have about how it works?

– Do you think the tone of the article was influenced by the fact that Lenat was writing as President of Cycorp?

– So, can this common-sense-base 'think'? Is it intelligent? Why and why not?

Assignment 0 Check-In

• Deliverables

– Personal web page

– Assignments page

– Email address

– Focus statement

– Online Questionnaire

Phone Project Overview

• In this project we will be creating, sharing, and reusing mobile media and metadata

• You and your Project Group will design application use scenarios and develop and refine metadata frameworks for your photos

• Some of you may even choose to develop retrieval applications for the photo database in the second half of the course

• We will be using the Nokia 3650 mobile media phone and software developed by Garage Cinema Research

Phone Project Overview

• In the SIMS 202 Phone Project you and your Project Group will

– Experience the actual process of information organization and retrieval (especially as regards metadata creation and use)

– Work in small, focused teams performing a variety of tasks in image acquisition, description, and application design

– Develop an ongoing resource for SIMS (an annotated photo database) that can be used for internal research and teaching, as well as for external promotional and informational purposes

Phone Project Requirements

• Create engaging and useful application scenarios and photos for use by your team and the entire class

– The photos you take and the applications you will design to use them should be interesting and useful to you and your colleagues

• Create a shared, reusable resource of annotated photos

– Design your metadata such that all photos are accessible not only for the needs of your particular application, but also for the reusability of your photos and metadata by other applications

Phone Project Assignments

• Photo Use Scenario – Application Idea (Assignment 2)

– You will brainstorm and storyboard an application for a mobile media device that accesses a server and facilitates the creation, sharing, and reuse of media and metadata. You will develop user personas and scenarios of how the application works and how the user experiences it.

• Photo Capture and Annotation (Assignment 3)

– With the goals of your application and the overall goals of the class project in mind, each group member is required to take at least 5 pictures relevant to the scenario you specified in the prior assignment. You will also get hands-on experience annotating photos using the Mobile Media Metadata (MMM) framework, an application available on the mobile phones, and you will identify strengths and weaknesses of the MMM framework.

Phone Project Assignments

• Photo Metadata Design (Assignment 4)

– With your application and the overall project goals in mind, you will design a suitable metadata framework to annotate the photos in the collection. You will also annotate more photos using your metadata framework.
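One way to start on a metadata framework is a record type plus faceted filtering, the style of access the Flamenco browser offers. The field names, sample photos, and helper functions in this Python sketch are hypothetical and are not the MMM framework's schema. The design point is that fields like location, people, and subjects describe the photo itself, so other groups' applications can reuse them, while the application field records the scenario the photo was taken for.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PhotoRecord:
    """One annotated photo (hypothetical schema)."""
    photo_id: str
    taken_on: date
    location: str
    people: list = field(default_factory=list)
    subjects: list = field(default_factory=list)  # what the photo depicts
    application: str = ""                         # scenario it was taken for


def _values(record, facet):
    """A facet's values as a list, whether the field holds one value or many."""
    value = getattr(record, facet)
    return value if isinstance(value, list) else [value]


def facet_filter(records, **facets):
    """Keep the records that match every requested facet value."""
    return [r for r in records if all(v in _values(r, f) for f, v in facets.items())]


def facet_counts(records, facet):
    """How many records carry each value of a facet, as a faceted browser shows."""
    return Counter(v for r in records for v in _values(r, facet))


photos = [
    PhotoRecord("p1", date(2003, 9, 10), "South Hall", ["Ana"], ["building"], "campus tour"),
    PhotoRecord("p2", date(2003, 9, 11), "Sather Gate", ["Ana", "Ben"], ["gate", "students"], "campus tour"),
    PhotoRecord("p3", date(2003, 9, 12), "South Hall", ["Ben"], ["lecture"], "class notes"),
]

print([p.photo_id for p in facet_filter(photos, location="South Hall")])  # ['p1', 'p3']
print([p.photo_id for p in facet_filter(photos, people="Ben", application="campus tour")])  # ['p2']
print(facet_counts(photos, "people"))  # Counter({'Ana': 2, 'Ben': 2})
```

A group whose scenario is not "campus tour" can still find p1 and p3 through the location facet, which is the kind of reuse the project requirements ask for.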

Phone Project Assignments

• Project Presentations (Assignment 6)

– In a special class session, your group will present your application ideas, metadata frameworks, and annotated photos to your fellow students using the Flamenco browser. Each group will have about 10 minutes to present their innovative work.

• Metadata Consolidation (Assignment 8)

– You will consolidate your classification scheme with those belonging to other groups. The entire class will collaborate to create one overall metadata framework, which will be used for Phase II of the project.

Phone Project Assignments

• Phone Project Phase II – Application Selection (Assignment 10)

– The entire class will decide on an application to implement from among the application ideas presented by the various project groups as well as from among any ideas you or your Project group have come up with.

• Phone Project Phase II – Specification & Design (Assignment 13)

– A group of class volunteers will draft specifications and designs for the application selected in the previous assignment.

• Phone Project Phase II – Implementation & Testing (Assignment 14)

– A group of class volunteers will implement and test the application selected in the previous assignment.

Assignment 2: Process

• Brainstorm application ideas

• Evaluate your ideas and agree on one to pursue

• Come up with a persona and scenario for your application idea

• Write a description of your application idea involving one persona and one scenario

• Draw a storyboard with explanatory text

• Document the results of your brainstorming

• Create your group website

Assignment 2: Deliverables

• Brief description of the application idea you selected

• Persona description

• Scenario description

• Annotated storyboard

• Work distribution table

• List of all brainstorming ideas and reasons for selecting or rejecting each

Assignment 2: Turning It In

• Submit an email to [email protected] with the following information (due September 16, before class):

– Group name

– URL of your group website

– URL to description (application, persona, scenario), storyboard, brainstorming results, work distribution table

– Time it took you to complete the assignment

– Any comments on assignment (optional)

Homework (!)

• Read

– Word Association Norms, Mutual Information, and Lexicography (Kenneth Church and Patrick Hanks)

– WordNet: An Electronic Lexical Database, Introduction & Ch. 1 (C. Fellbaum, G.A. Miller) (handout)

• Assignment 2: Photo Use Scenario

– Due by Tuesday, September 16

Next Time

• Lexical Relations and WordNet (RRL)