automated scoring of open-ended ethics questions

Automated Scoring of Open-ended Ethics Questions Kelly Laas Alan D. Mead Illinois Institute of Technology


Post on 04-Jan-2016



Page 1: Automated Scoring of Open-ended Ethics Questions

Automated Scoring of Open-ended Ethics Questions

Kelly Laas, Alan D. Mead

Illinois Institute of Technology

Page 2: Automated Scoring of Open-ended Ethics Questions

Agenda

• Importance of ethics and ethics assessment

• Use of scenario-based assessment and open-ended responses

• Research Goal

• Method

• Results

• Future Directions

Page 3: Automated Scoring of Open-ended Ethics Questions

Interprofessional Projects Program

• Brings together students from all IIT disciplines (engineering, business, architecture, psychology, the sciences, social sciences and the humanities) to solve a real-world problem.

• All students take a mandatory 2 semesters of IPRO.

• Examples of projects are finding new uses for Chicago-area abandoned buildings, testing and developing products for local businesses, and designing a portable operating room for use in disaster areas.

Page 4: Automated Scoring of Open-ended Ethics Questions

Importance of ethics

• Ethical problems arise at all stages in life, and are part of all professions, disciplines and jobs.

• Teaches students critical thinking and how to deal with “grey areas” when the best course of action is unclear.

• ABET and other accreditors require that all students develop an understanding of professional and ethical responsibility.

Page 5: Automated Scoring of Open-ended Ethics Questions

Goals for IPRO Ethics Component

Raise Ethical Awareness - Students gain a stronger awareness of and sensitivity to ethical issues as they arise in the course of research.

Improve Ethical Knowledge - Students gain a basic understanding of relevant ethical topics in their IPRO project, and how these issues should be addressed as their project work progresses.

Improve Knowledge of Resources - Students get a basic idea of resources, people and policies they can turn to when ethical questions arise.

Understand Importance of Ethics - Students begin to appreciate the importance of ethics in research and practice.

Page 6: Automated Scoring of Open-ended Ethics Questions

Ethics Module Samples

• Ethics Roundtable – students give a short presentation and then ask questions of a panel of “experts” drawn from IIT and Chicago’s professional community.

• Professional code of ethics discussion

• Case study discussion

• Faculty or student developed ideas, which often lead to new modules being developed

Page 7: Automated Scoring of Open-ended Ethics Questions

Importance of ethics assessment

• Are teaching methods working? Are students growing in the following areas from pre-test to post-test?

• Try to measure:
– Ethical Sensitivity - Do students recognize ethical issues when they arise?
– Ethical Knowledge - Do students know about applicable codes, guidelines and laws, and have they developed skills, e.g. discussing issues in a clear way?
– Ethical Judgment - Can they take a plausible course of action using relevant knowledge?
– Increased Ethical Commitment - Will they act more ethically in the future? (Hardest to measure)

Davis, Michael. “Instructional Assessment in the Classroom: Objectives, Methods and Outcomes.” in Practical Guidance on Science and Engineering Ethics Education for Instructors and Administrators. Washington D.C.: National Academies Press, 2013. p. 30.

Page 8: Automated Scoring of Open-ended Ethics Questions

Scenario-based Assessment

“At the beginning of the semester, IPRO 111 went on a tour of Gelco Inc., a company who is one of the largest widget producers in North America. The goal of IPRO 111 was to assist Gelco in developing a more efficient way of shipping widgets …”

• What are the ethical issues facing Marco in this case?

• What factors does Marco need to consider in reaching a decision about what to do?

Page 9: Automated Scoring of Open-ended Ethics Questions

Why open-ended responses?

• Unlike existing multiple-choice assessment methods, open-ended responses place no limit on the number of answers to a scenario.

“Ethical problems are ‘ill-structured’ because there is often no clearly specified goal, only incomplete information, several possible solutions, and several paths to reach each. Since a single, simple response is not an option (or at least not a good one), students must investigate the problem, seek relevant information, consider alternative possible solutions, and evaluate short and long term consequences.”

Keefer, M.W. and Davis, M. (2012). “Curricular Design and Assessment in Professional Ethics.” Teaching Ethics, 13(1): 89.

Page 10: Automated Scoring of Open-ended Ethics Questions

Research Goal

• Open-ended responses are great, but scoring them by hand is time-consuming

• This research seeks to provide an automated scoring mechanism
– Ideally, to replace human grading entirely
– But it would also be helpful to provide a second, independent score to complement human grading or as a first analysis in a computerized assessment process

Page 11: Automated Scoring of Open-ended Ethics Questions

Automated Essay Scoring Methods

• PEG = Regression using surface characteristics as predictors

• Text categorization (Naive Bayes and k-nearest neighbor)

• Semantic similarity

• Natural Language Processing

• Or a combination of these methods

Page 12: Automated Scoring of Open-ended Ethics Questions

Classification

• “Essay” scoring systems are oriented to extracting “quality of writing” metrics

• We’re interested in similarity to exemplars
– Content is key, not its expression
– Therefore, we want a supervised method

• That leaves naïve Bayesian classifier and LSA-based similarity
– LSA is relatively hard and requires a lot of data

Page 13: Automated Scoring of Open-ended Ethics Questions

Naïve Bayesian Classification

• Goal: Compute P(domain|item) for each domain and classify the item as the domain of maximum probability

• Predicted domain = argmax P(domain|item)
  = argmax P(item|domain) P(domain) / P(item)
  = argmax P(item|domain) P(domain)
  = argmax P(w1,w2,…,wn|domain) P(domain)
  ≈ argmax P(w1|domain) P(w2|domain) … P(wn|domain) P(domain)
– P(item) is the same for every domain, so it can be dropped from the argmax

• “Naïve” refers to the assumption of independence of the predictors:
– P(w1,w2|domain) = P(w1|domain) P(w2|domain)
– Calculating P(w1|domain) is easy: count how many items in this domain contain the word and divide by the total number of items in this domain
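The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `train_nb` and `classify_nb` are hypothetical, the add-one smoothing is an assumption (the slides do not say how zero counts are handled), and log-probabilities are used only to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict

def train_nb(items, labels):
    """items: list of token lists; labels: parallel list of domain labels."""
    n_items = Counter(labels)                  # number of items per domain
    doc_freq = defaultdict(Counter)            # domain -> word -> items containing it
    for tokens, label in zip(items, labels):
        doc_freq[label].update(set(tokens))    # count presence, not frequency
    return n_items, doc_freq

def classify_nb(tokens, model):
    """Return the domain maximizing log P(domain) + sum of log P(w|domain)."""
    n_items, doc_freq = model
    total = sum(n_items.values())
    best_label, best_score = None, -math.inf
    for label, n in n_items.items():
        score = math.log(n / total)            # log prior P(domain)
        for w in set(tokens):
            # Add-one smoothing: an assumption, not taken from the slides.
            score += math.log((doc_freq[label][w] + 1) / (n + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Calling `classify_nb(tokens, train_nb(items, labels))` returns the label with the highest posterior score for a tokenized item.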

Page 14: Automated Scoring of Open-ended Ethics Questions

Method

• 45 students completed the Marco Gelco scenario
– 2 cases deleted due to missing data
– 3 exemplar answers

• Responses to four open-ended questions (we’ll focus on Q1)

• Leave-one-out cross-validation (LOOCV)
– Hold out response 1, train NB classifier, classify 1
– Hold out response 2, train NB classifier, classify 2
– Etc…
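The LOOCV loop can be sketched as follows. This is an illustrative sketch only: `loocv_predictions` is a hypothetical helper, and the majority-class `train_majority` / `classify_majority` pair is a stand-in used to keep the example self-contained; in the study the classifier would be the NB classifier.

```python
from collections import Counter

def loocv_predictions(items, labels, train, classify):
    """Hold out each item in turn, train on the rest, classify the held-out item."""
    predictions = []
    for i in range(len(items)):
        train_items = items[:i] + items[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = train(train_items, train_labels)
        predictions.append(classify(items[i], model))
    return predictions

# Stand-in classifier for illustration: always predicts the most common label.
def train_majority(items, labels):
    return Counter(labels).most_common(1)[0][0]

def classify_majority(item, model):
    return model
```

Each response is scored by a model that never saw it during training, which matters with a sample this small.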

Page 15: Automated Scoring of Open-ended Ethics Questions

Distribution of Scores

Score   Q1     Q2     Q3     Q4
3       15     13     4      14
2       22     16     18     17
1       5      11     11     4
0       2      4      8      9
Mean    2.14   1.86   1.44   1.82
SD      0.80   0.95   0.92   1.11

Page 16: Automated Scoring of Open-ended Ethics Questions

Using NB to score responses

1. Pre-process text
– Normalize text, eliminate symbols, etc.
– Stem words (using Porter algorithm)

2. Train NB classifier on each “score class”
– Train classifier for all score=3 responses, then all score=2 responses, etc.
– Training involves counting conditional presence

3. Classify response in class with maximum probability given words used
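A minimal sketch of the normalization step in Python, assuming that normalizing means lower-casing and dropping everything except letters and whitespace. The exact rules used in the study are not given; the helper name `normalize` is hypothetical, and stemming is not shown here.

```python
import re

def normalize(text):
    """Lower-case the text, drop symbols and digits, and split into tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)   # removes punctuation, apostrophes, digits
    return text.split()
```

Under this assumption "shouldn't" becomes "shouldnt", which matches forms such as "didnt" and "doesnt" in the unknown-word list.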

Page 17: Automated Scoring of Open-ended Ethics Questions

Porter Stemming and Lemmatization

• Words change form as they are used
– Plan, plans, planned, planful, etc.
– This could cause classifier to be trained incorrectly

• A “lemma” is “the canonical form of an inflected word”

• But lemmatization is difficult
– Porter stemming is an easier heuristic method
– Series of rules for removing word suffixes
– As a heuristic, some mistakes happen
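To make the idea of rule-based suffix removal concrete, here is a deliberately tiny suffix stripper in Python. It is not the Porter algorithm (which has many more rules and conditions); `toy_stem` and its four suffixes are assumptions chosen only to cover the slide's "plan" example and to show how a heuristic can go wrong.

```python
def toy_stem(word):
    """Strip one common suffix if at least three letters remain (toy rules only)."""
    for suffix in ("ing", "ful", "ed", "s"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            # Undo consonant doubling, e.g. "plann" -> "plan".
            if (suffix in ("ing", "ed") and stem[-1] == stem[-2]
                    and stem[-1] not in "aeioulsz"):
                stem = stem[:-1]
            return stem
    return word
```

With these rules "plan", "plans", "planned" and "planful" all map to "plan", but "string" is wrongly cut to "str", the kind of mistake a heuristic stemmer makes.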

Page 18: Automated Scoring of Open-ended Ethics Questions

Confusion Matrix for Q1

• What are the ethical issues facing Marco in this case?

• 40% strictly correct (18 of 45); 87% if we also count the 21 responses that were actually a 2 but classified as a 3

            Predicted
Actual      1      2      3
1           2      2      2
2           0      2      21
3           0      2      14
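The two percentages can be recomputed from the matrix. The short Python check below uses only the counts shown on the slide; the variable names are illustrative.

```python
# Rows are actual scores, columns are predicted scores (counts from the slide).
confusion = {
    1: {1: 2, 2: 2, 3: 2},
    2: {1: 0, 2: 2, 3: 21},
    3: {1: 0, 2: 2, 3: 14},
}

total = sum(sum(row.values()) for row in confusion.values())
exact = sum(confusion[c][c] for c in confusion)   # diagonal: strictly correct
lenient = exact + confusion[2][3]                 # also accept actual 2 predicted as 3

exact_rate = exact / total
lenient_rate = lenient / total
print(f"{exact_rate:.0%} strict, {lenient_rate:.0%} lenient")
```

The matrix shows that the classifier almost never predicts a score of 1 or 2, so most of the errors come from over-predicting 3.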

Page 19: Automated Scoring of Open-ended Ethics Questions

What happened to score=0?

• We’re working on that

• A score of zero represents either a non-response or a completely unaware answer
– It is hard (impossible) to gather a representative sample of non-responses or completely wrong responses to train a classifier on score class 0
– Instead, we need to develop heuristics to tell us when classification cannot be performed

Page 20: Automated Scoring of Open-ended Ethics Questions

Unknown Words

• 131 words were unknown: able; accident; actually; after; against;

allegiance; allowed; any; assist; assuming; because; been; benefit; both; build; carefully; chose; comes; commercial; committing; communications; competition; conscious; consider; contract; courses; create; derived; despite; details; didnt; directly; does; doesnt; doing; due; effectively; execution; experience; explained; facing; former; friend; gained; gathered; given; good; graduation; granted; how; implications; inc; inevitably; infringing; initial; inside; inspired; instead; intended; lack; leave; legal; likely; limit; main; make; many; most; must; needs; nonuse; observed; obtained; offered; old; opportunities; piece; plans; position; potentially; prevent; previous; privilege; probably; problem; procedure; production; profits; proposition; proprietary; protected; purposes; pursuit; question; read; recalling; regarding; relationships; research; resources; rights; save; say; secret; sell; shouldnt; similar; skills; solution; some; someone; somewhere; stay; such; supposed; team; tell; theft; their; there; think; ultimately; understanding; value; waiver; wants; was; which; while; who; would
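One plausible way to find such words, sketched in Python, assumes that "unknown" means a word in the held-out response that never appears in the training responses. The slides do not define the term, and the helper name `unknown_words` is hypothetical.

```python
def unknown_words(response_tokens, training_items):
    """Return the words in a response that appear in no training item."""
    vocabulary = set()
    for tokens in training_items:
        vocabulary.update(tokens)
    return sorted(set(response_tokens) - vocabulary)
```

With a small training set, even common words such as "because" or "was" can be missing from the vocabulary, which is consistent with the list above.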

Page 21: Automated Scoring of Open-ended Ethics Questions

Future Directions

• Better coverage of unknown words (and mis-spellings)

• Phrase parsing using n-gram analysis
– not good ≠ good
– Requires larger sample size

• Develop heuristics for score class=0

• Analyze reliability of human scores
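As a small illustration of the n-gram idea, the Python sketch below turns a token list into bigrams so that "not good" becomes a single feature distinct from "good". The helper name `ngrams` is hypothetical and this is not part of the study's method.

```python
def ngrams(tokens, n=2):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Because there are far more distinct bigrams than distinct words, each is seen less often, which is why this approach needs a larger sample.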