SIGIR 2013 Study Group Slides


TRANSCRIPT

1. SIGIR 2013 (Users and Interactive IR I)

2. An Effective Implicit Relevance Feedback Technique Using Affective, Physiological and Behavioural Features

Background: implicit relevance feedback infers relevance from user behaviour. Prior work interpreted clickthrough data as implicit feedback [1] and used features such as dwell time, showing that implicit-feedback performance depends on the user's task intention [2].
[1] Accurately interpreting clickthrough data as implicit feedback (SIGIR 2005), Thorsten Joachims et al.
[2] A study on the effects of personalization and task information on implicit feedback performance (CIKM 2006), Ryen W. White et al.

3. This paper brings affective and physiological signals into implicit relevance feedback and examines how their usefulness changes with task intention (seeking information, re-finding, entertainment).

4. Experiment setup (1): a video retrieval system and four search tasks.
- INS Task: information seeking intent
- INF Task: re-finding search intent
- ENA Task: entertainment-based search intent where searchers adjust their arousal level
- ENM Task: entertainment-based search intent where searchers adjust their mood

[Figure 1 of the paper: a snapshot of the video retrieval system for the query "avengers".]

Paper excerpt (sensing hardware and preprocessing): "[...] NeuroSky MindKit-EM features two key technologies: (i) the ThinkGear-EM headset and (ii) the eSense-EM software (i.e. brainwave interpretation software). The headset is used to extract, filter, and amplify brainwave (EEG) signals and convert that information into digital mental state outputs for the eSense-EM software. The EEG signals read by the MindKit-EM are detected on the forehead via point Fp1 (electrode placement system by the International Federation in Encephalography and Clinical Neurophysiology). The headset has three dry active sensors: one sensor located on the forehead and two sensors located behind the ears as ground/reference sensors. It also has electronic circuitry that filters and amplifies the brainwaves. The eSense-EM software further processes and analyses the obtained brainwave signals into two useful neuro-sensory values: the user's Attention and Meditation levels at any given moment. The output of the eSense-EM software has been tested over a wide population and under different environmental condi[tions ...] the output of the BodyMedia SenseWear Pro3 Armband; and the Attention or Meditation data (referred to as NV) from the output of the eSense-EM software. For our behavioural signal, we considered the dwell time (referred to as DT) logged by the system as our dwell time feature. Finally, the task intention was considered as the task feature (referred to as Task). Preprocessing: for each visited video, the value of each sensory feature (for both affective and physiological features) was calculated by averaging the data logged by its sensory device during the dwell time period. Since none of the instruments we used normalised the data, we scaled signal values before applying any classification method, to avoid having attributes in greater numeric ranges dominating those in smaller numeric ranges. [...] 3.4 Video Retrieval System: For the completion of the search tasks we used a custom-made search environment (named VideoHunt) [...]"

5. Experiment setup (2): affective and physiological signals, four feature groups.
1) FX: affective features, extracted with the eMotion software; MU (motion-unit features, 7 values including happiness, sadness, anger, fear, disgust, and surprise) + AU (Ekman's action units, 12)
2) HR: heart rate data
3) AB: near-body ambient temperature and heat flux (BodyMedia SenseWear Pro3)
4) NV: Attention or Meditation data (eSense-EM)
In addition, the dwell time (DT) logged by the video retrieval system serves as the behavioural feature.

6. Search tasks (1):
1) INS Task (information seeking intent)
2) INF Task (information re-finding search intent); in the INF tasks, participants re-found videos they had seen before (the system's tags could serve as cues)

7. Search tasks (2):
3) ENA Task (entertainment-based search intent where searchers adjust their arousal level)
4) ENM Task (entertainment-based search intent where searchers adjust their mood)

8. Results: on their own, affective/physiological features improve relevance prediction only modestly (around 5%), while dwell time alone yields around 15%; combining sensory features with dwell time works best.

Table 2 of the paper: prediction accuracy of a model trained on different sets of features (rows), given different search intentions (columns). [BL1]-[BL3] are baselines; parenthesised values are relative improvements over the corresponding baseline ([BL1] for sensor-only rows, [BL2] for DT+sensor rows, [BL3] for the DT+Task row); * and ** mark statistically significant improvements; cells shown as "-" are not legible in this transcript.

Feature set           | INS              | ENA              | ENM             | INF          | ALL - INF        | ALL
Random [BL1]          | 54.88%           | 64.06%           | 64.53%          | 98.83%       | 61.19%           | 50.60%
DT [BL2]              | 62.40%           | 65.62%           | 66.16%          | 98.83%       | 71.31%           | 72.74%
DT+Task [BL3]         | -                | -                | -               | -            | 69.65%           | 76.63%
FX                    | 55.63%* (+1.3%)  | 66.4%** (+3.6%)  | 64.53% (+0%)    | 98.83% (+0%) | 62.43%** (+2.0%) | 64.54%** (+27.4%)
AB                    | 54.88% (+0%)     | 64.06% (+0%)     | 64.53% (+0%)    | 98.83% (+0%) | 61.19% (+0%)     | 50.60% (+0%)
HR                    | 57.89%** (+5.4%) | 64.06% (+0%)     | 64.53% (+0%)    | 98.83% (+0%) | 62.93%** (+2.8%) | 53.27%** (+5.2%)
NV                    | 55.63%* (+1.3%)  | 64.06% (+0%)     | 64.53% (+0%)    | 98.83% (+0%) | 61.19% (+0%)     | 55.73%** (+10.1%)
FX+AB+HR+NV           | 55.63%* (+1.3%)  | 69.53%** (+8.5%) | 64.53% (+0%)    | 98.83% (+0%) | 67.16%** (+9.7%) | 65.98%** (+30.3%)
DT+FX                 | 67.66% (+8.4%)   | 68.75% (+4.7%)   | 71.63% (+8.2%)  | 98.83% (+0%) | 72.88% (+2.2%)   | 77.04% (+5.9%)
DT+AB                 | 66.91% (+7.2%)   | 67.96% (+3.5%)   | 81.56% (+23.2%) | 98.83% (+0%) | 71.64% (+0.4%)   | 76.22% (+4.5%)
DT+HR                 | 63.15% (+1.2%)   | 73.43% (+11.9%)  | 82.26% (+24.3%) | 98.83% (+0%) | 72.13% (+1.1%)   | 76.22% (+4.5%)
DT+NV                 | 64.41% (+3.2%)   | 70.31% (+7.1%)   | 74.46% (+12.5%) | 98.83% (+0%) | 72.13% (+1.1%)   | 75.40% (+5.6%)
DT+FX+AB+HR+NV        | 66.16% (+6.0%)   | 75% (+14.2%)     | 80.14% (+21.1%) | 98.83% (+0%) | 75.37% (+5.6%)   | 77.04% (+5.9%)
DT+Task+FX+AB+HR+NV   | -                | -                | -               | -            | 76.36% (+9.6%)   | 78.89% (+2.9%)

Paper excerpt (discussion): "[...] the prediction accuracy of a model trained on dwell time and task features significantly (i.e. DT+Task row). The results also show that the prediction accuracy of such a model is even higher than a model trained on all features except the task one (i.e. DT+FX+AB+HR+NV row). This shows that research on task prediction is complementary to this study rather than contradictory. An interesting finding is that the discriminative power of sensory signals changes once they are combined with dwell time, even though they show no such power individually. For example, a sensory feature that was not discriminative on its own for a task (e.g. the HR feature for the ENM task), when combined with dwell time, can result in the highest prediction accuracy (i.e. DT+HR features for the ENM task)."
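As a concrete reading of the preprocessing on slide 4 (average each sensory signal over the dwell-time window, then rescale so attributes with large numeric ranges do not dominate) and of the feature combinations in Table 2, here is a minimal Python sketch. It is not the authors' code: the log layout, the toy values, and the choice of an RBF-kernel SVM are assumptions made purely for illustration.

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC

    def sensor_features(log, t_start, t_end):
        """Average each sensory signal over the dwell-time window
        [t_start, t_end], mirroring the paper's preprocessing step.
        `log` is an assumed layout: {signal_name: [(timestamp, value), ...]}."""
        feats = {}
        for name, samples in log.items():
            vals = [v for t, v in samples if t_start <= t <= t_end]
            feats[name] = float(np.mean(vals)) if vals else 0.0
        return feats

    log = {"HR": [(0.0, 70.0), (5.0, 74.0)], "NV": [(1.0, 50.0), (6.0, 60.0)]}
    print(sensor_features(log, t_start=0.0, t_end=5.0))  # {'HR': 72.0, 'NV': 50.0}

    # One row per visited video: [DT, FX, AB, HR, NV]; labels come from
    # explicit relevance judgements. Values below are toy numbers.
    X = np.array([[12.0, 0.41, 33.1, 72.0, 55.0],
                  [ 3.5, 0.10, 32.8, 80.0, 40.0]])
    y = np.array([1, 0])

    # Scale to [0, 1] so attributes with greater numeric ranges (e.g. HR)
    # do not dominate those with smaller ranges (e.g. FX probabilities).
    X_scaled = MinMaxScaler().fit_transform(X)

    clf = SVC(kernel="rbf").fit(X_scaled, y)   # assumed classifier choice
    print(clf.predict(X_scaled))

Dropping columns from X reproduces the feature-set comparisons in Table 2 (e.g. DT only vs. DT+HR).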
9. How Do Users Respond to Voice Input Errors? Lexical and Phonetic Query Reformulation in Voice Search

10. Terminology: qv is a voice query's actual content; qtr is the system's transcription of it.

Paper excerpt (the voice interface): "[... audio cues] indicate its various statuses, which include: starting or stopping listening to a voice query; displaying the transcribed query; and failing to generate the transcribed query. These audio cues are very useful in our transcriptions of the experiment recordings."

[Figure 1 of the paper: screenshots (a)-(c) of the Google search app on iPad.]

Paper excerpt (3.2 Search Tasks and Topics): "Our experiment setting is similar to the one adopted by the TREC session track [17], in which users can issue multiple queries to work on one search topic."

11. Experiment: 50 TREC search tasks (30 from the robust track, 20 from the web track); 20 participants searched with the Google app.

12. EXPERIMENT PROCEDURE (90 MIN)
- User background questionnaire
- Training (one TREC topic)
- First session (10 topics)
- Interview
- 10 min break
- Second session (15 topics)
For each topic: work on the TREC topic for 2 min, then fill in a post-task questionnaire.

13. Voice input errors: 908 queries (55% of 1,650) had voice input errors; 810 were caused by speech recognition errors and 98 by improper system interruption.

[Chart: share of all 1,650 voice queries; No Error 45%, Speech Recognition Error 49%, Improper System Interruption 6%.]

QUERY TRANSCRIPTION
- qv (a voice query's actual content): manually transcribed from the recordings; the two authors had an agreement of 100%, except on casing, plurals, and prepositions.
- qtr (the system's transcription of a voice query): available from the log.

INDIVIDUAL QUERIES: WORDS
- Missing words: words in qv but not in qtr.
- Incorrect words: words in qtr but not in qv.

INDIVIDUAL QUERIES: PERFORMANCE. Significant decline of search performance (nDCG@10):

                       No Errors (742 queries) | Speech Rec Errors (810 queries)
                       mean     SD             | mean     SD
nDCG@10 of qv          0.275    0.20           | 0.264    0.22
nDCG@10 of qtr         0.275    0.20           | 0.083    0.16
delta nDCG@10          -        -              | -0.182   0.23
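The word-level comparison on slide 13 amounts to a set difference between qv and qtr. A small sketch follows (my own illustration, not the paper's code; whitespace tokenization and lowercasing are simplifying assumptions):

    def word_errors(qv: str, qtr: str):
        """Word-level comparison of a voice query's actual content (qv)
        with the system's transcription (qtr), as defined on slide 13."""
        v, tr = set(qv.lower().split()), set(qtr.lower().split())
        missing = v - tr     # words in qv but not in qtr
        incorrect = tr - v   # words in qtr but not in qv
        return missing, incorrect

    # The ADD example from slide 14: "the sun" was transcribed as "the son".
    print(word_errors("the sun", "the son"))   # ({'sun'}, {'son'})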
14. Lexical patterns (1): Query Term Addition (ADD) / Query Term Substitution (SUB). SUB word pairs were manually coded (93% agreement).

ADD example:
     Voice Query            Transcribed Query      ADD words
q1   the sun                the son                -
q2   the sun solar system   the sun solar system   solar system

SUB example:
     Voice Query        Transcribed Query   SUB words
q1   art theft          test                -
q2   art embezzlement   are in Dublin       theft -> embezzlement
q3   stolen artwork     stolen artwork      embezzlement -> stolen, art -> artwork

15. Lexical patterns (2): Query Term Removal (RMV) / Query Term Reordering (ORD).

RMV example:
     Voice Query                      Transcribed Query
q1   advantages of same sex schools   andy just open it goes
q2   same sex schools                 same sex schools

ORD example:
     Voice Query                           Transcribed Query
q1   interruptions to ireland peace talk   is directions to ireland peace talks
q2   ireland peace talk interruptions      ireland peace talks interruptions
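The lexical patterns above can be flagged heuristically by comparing the term sets of consecutive voice queries. This sketch is my own simplification: the paper coded SUB word pairs manually, so SUB is deliberately left out, and real use would need stemming and stop-word handling.

    def lexical_patterns(prev_q: str, cur_q: str):
        """Heuristically flag ADD / RMV / ORD between two consecutive
        voice queries (slides 14-15)."""
        prev, cur = prev_q.lower().split(), cur_q.lower().split()
        patterns = set()
        if set(cur) > set(prev):                   # terms added, none dropped
            patterns.add("ADD")
        if set(cur) < set(prev):                   # terms removed, none added
            patterns.add("RMV")
        if set(cur) == set(prev) and cur != prev:  # same terms, new order
            patterns.add("ORD")
        return patterns

    print(lexical_patterns("the sun", "the sun solar system"))                     # {'ADD'}
    print(lexical_patterns("advantages of same sex schools", "same sex schools"))  # {'RMV'}
    print(lexical_patterns("ireland peace talks interruptions",
                           "interruptions ireland peace talks"))                   # {'ORD'}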
16. Results (1), lexical patterns: when the previous query had a voice input error, users made more use of SUB and ORD and less use of ADD and RMV.

Patterns        Prev Q No Error   Prev Q Error   Overall
ADD             90.50%            32.98%         53.82%
SUB             15.04%            16.34%         14.87%
RMV             66.75%            37.93%         48.37%
ORD             33.51%            43.03%         39.58%
(All Lexical)   99.74%            77.36%         85.47%

17. Phonetic patterns: Partial Emphasis (PE), i.e. overstating a specific part of a query, plus WE and repeating the query (REP).

PE Type                         Example             Explanation
Stressing (STR)                 rap and crime       put stress on "rap"
Slow down (SLW)                 rap and c-r-i-m-e   slow down at "crime"
Spelling (SPL)                  Puerto Rico         spell out each letter in "Puerto"
Different Pronunciation (DIF)   Puerto Rico         pronounce "Puerto" differently

18. Results (2), phonetic patterns: use of phonetic patterns is almost always associated with a previous voice input error.

Patterns         Prev Q No Error   Prev Q Error   Overall
STR/SLW          0%                14.84%         9.46%
SPL              0%                0.60%          0.39%
DIF              0%                0.90%          0.57%
WE               0.26%             9.30%          6.02%
(All Phonetic)   0.26%             25.64%         16.44%
Repeat           0%                20.54%         13.58%

19. Results (3), effect of reformulation: overall a slight improvement (nDCG@10 0.129 -> 0.143, roughly +10%), but the outcome depends heavily on whether another voice input error occurred after the reformulation; reformulating did not reduce the likelihood of voice input errors.

The reformulated query has / is   nDCG@10 (before -> after)   # of cases
No Error                          0.150 -> 0.233              474 (40%)
Speech Rec Error                  0.104 -> 0.079              597 (51%)
Interruption                      0.156 -> 0.056              79 (6.7%)
Query Suggestion                  0.201 -> 0.223              32 (2.7%)
Overall                           0.129 -> 0.143              1,182

20. Mining Touch Interaction Data on Mobile Devices to Predict Web Search Result Relevance
Study participants: 26 on smartphones, 30 on desktop PCs.

21. Inactivity as a relevance feature: "inactive" means no touch gesture is in progress, i.e. the user is scanning (reading) the page rather than manipulating it.
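To make the inactivity feature concrete: from a timestamped touch-gesture log, one can measure the gaps between consecutive gestures while a result page is open; long gaps suggest the user is scanning (reading) rather than manipulating the page. A sketch under my own assumptions follows (the event layout and the 2-second gap threshold are hypothetical, not taken from the paper):

    def inactivity_features(gesture_times, t_open, t_close, min_gap=2.0):
        """Summarise inactivity on a result page: periods of at least
        `min_gap` seconds with no touch gesture between page open and close."""
        times = sorted([t_open] + list(gesture_times) + [t_close])
        gaps = [b - a for a, b in zip(times, times[1:]) if b - a >= min_gap]
        return {
            "total_inactive_time": sum(gaps),    # candidate relevance signal
            "num_inactive_periods": len(gaps),
            "max_inactive_period": max(gaps, default=0.0),
        }

    # Gestures at 1 s, 1.5 s and 9 s on a page viewed for 12 s:
    print(inactivity_features([1.0, 1.5, 9.0], t_open=0.0, t_close=12.0))
    # {'total_inactive_time': 10.5, 'num_inactive_periods': 2, 'max_inactive_period': 7.5}

Features like these, computed per result page, would feed a relevance predictor alongside the gesture features themselves.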