![Page 1: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/1.jpg)
Life is for sharing.
Towards a Universal Quality Scale for Narrowband, Wideband and Fullband Speech Services
Sebastian Möller (1), Jens Berger (2)
(1) Quality and Usability Lab, Telekom Innovation Laboratories, TU Berlin, Germany
(2) SwissQual AG – A Rohde & Schwarz Company, Solothurn, Switzerland
![Page 2: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/2.jpg)
Agenda
Motivation
Influence Factors
  Modelling a telephony situation
  Bandwidth
Establishment of the Universal Scale
  Integration of different types of degradations
  Scale requirements
  Proposed procedure
Conclusions and Next Steps
![Page 3: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/3.jpg)
Motivation: Today’s situation.
Problem statement:
Many different subjective experiments use a single scale from 1 to 5
The interpretation of a score depends strongly on the experimental context:
  Listening-only vs. conversation
  Bandwidth limitation (e.g. only one bandwidth in the test, or several)
  Length of the stimuli (short sentences vs. long passages or emulated calls)
In practice, two questions are relevant:
1) How does a score for a typical sentence relate to the quality of a longer call?
  Measurement episode
  Conversational mode
2) How does a narrowband score relate to a super-wideband score?
  Bandwidth
Idea: a bandwidth- and situation-independent “universal” scale
![Page 4: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/4.jpg)
![Page 5: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/5.jpg)
Influence Factors: Modelling a telephony situation.
Real human conversation
Free conversation: free conversation between two persons
Controlled conversation: scripted dialog between two persons
Emulated conversation: listening to pre-recorded samples; own speech activity is emulated by keyword spotting
3rd party listening test: listening to a pre-recorded conversation; no own activity
Listening-only test: listening to pre-recorded short speech samples; no own activity
Recommendations shown on the slide: ITU P.805, ITU P.800, ITU P.1302, ITU P.800
![Page 6: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/6.jpg)
Influence Factors: Bandwidth.
[Figure with axis label “Noisiness” (Wältermann et al., JAES 2010)]
![Page 7: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/7.jpg)
Influence Factors: Example: Test according to ITU P.800.
The result is a Mean Opinion Score representing overall listening quality (e.g. ITU-T P.800)
This integral score reflects all degradations perceived by the users, including individual preferences and cross-masking effects
Result: one score for each presented speech sample, regardless of length and bandwidth, addressing only the listening mode
Rating scale: excellent (5), good (4), fair (3), poor (2), bad (1)
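As a minimal sketch of how such a score is obtained, the snippet below averages individual category ratings (1 to 5) for one speech sample. The function name `mean_opinion_score` and the normal-approximation 95% confidence interval are illustrative assumptions, not part of the slides or a procedure prescribed by them.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    `ratings` are integer judgments on the 5-point scale
    (5 = excellent ... 1 = bad) from different listeners.
    The interval uses a normal approximation (factor 1.96), an assumption
    made here for illustration.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1..5 scale")
    mos = statistics.mean(ratings)
    ci95 = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, ci95

# Hypothetical ratings for one speech sample.
mos, ci95 = mean_opinion_score([5, 4, 4, 3, 4])
print(f"MOS = {mos:.2f} +/- {ci95:.2f}")
```

The score obtained this way is still tied to the test context; it says nothing by itself about how the same sample would be rated in a test with other bandwidths or in a conversation.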
![Page 8: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/8.jpg)
![Page 9: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/9.jpg)
Establishment of the Universal Scale: Integration of different types of degradations.
E-model approach:
[Diagram of a connection over an IP WAN with the degradations considered: background noise and acoustic coupling (at both ends), linear distortion, delay, codec, packet loss, jitter buffer, VAD, talker echo, listener echo, circuit noise]
![Page 10: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/10.jpg)
Establishment of the Universal Scale: Integration of different types of degradations.
E-model approach:
Overall quality: R = Ro - Is - Id - Ie,eff
Estimated user judgment: MOS = f(R)
Terms: Ro (SNR), Is (simultaneous impairments), Id (delayed impairments), Ie,eff (nonlinear/time-variant impairments)
Input parameters shown on the diagram: Ps, Ds, STMR, SLR, RLR, Ta, Ie, qdu, Ppl, Bpl, TELR, T, WEPL, Tr, Nc, Nfor, Pr, Dr, LSTR
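The two relations can be sketched in a few lines. The slide does not say which function f is used; the sketch below assumes the narrowband R-to-MOS conversion given in ITU-T Rec. G.107, and the function names and the example impairment values are hypothetical.

```python
def transmission_rating(ro, i_s, i_d, ie_eff):
    """Overall quality R = Ro - Is - Id - Ie,eff, as written on the slide."""
    return ro - i_s - i_d - ie_eff

def r_to_mos(r):
    """Assumed f(R): narrowband mapping from ITU-T Rec. G.107.

    MOS = 1                                         for R <= 0
    MOS = 1 + 0.035 R + R (R - 60)(100 - R) * 7e-6  for 0 < R < 100
    MOS = 4.5                                       for R >= 100
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

# Hypothetical impairment values, chosen only to exercise the functions.
r = transmission_rating(ro=94.0, i_s=1.5, i_d=10.0, ie_eff=12.5)
print(f"R = {r:.1f}, estimated MOS = {r_to_mos(r):.2f}")
```

Because impairments are subtracted on the R scale and only then mapped to MOS, a different judgment context can be handled by exchanging `r_to_mos` while leaving the R computation untouched.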
![Page 11: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/11.jpg)
Establishment of the Universal Scale: Integration of different types of degradations.
E-model approach:
Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus effect”
Definition of an “absolute quality scale” (R-scale) which should be independent of the judgment context
Relationship between judgment scale and quality scale is then context-dependent
[Figure: MOS (axis from 1 to 4.5) plotted over R (axis from 0 to 150)]
![Page 12: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/12.jpg)
Establishment of the Universal Scale: Scale requirements.
Telephony situation:
Scale should reflect conversational quality, measured e.g. according to ITU-T Rec. P.800 and P.805
Listening-only tests may be used if no “conversational impairments” are present; however, the end points of the scale might be used more frequently than in conversation tests
Conversations may be approximated by presenting selected stretches of speech (4…8 s) in “emulated conversation tests” according to ITU-T Rec. P.1302
Listening-only tests according to ITU-T Rec. P.800 may be used for evaluating the single stretches of speech
![Page 13: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/13.jpg)
Establishment of the Universal Scale: Scale requirements.
Bandwidth:
Scale should correctly rank narrowband, wideband, super-wideband and fullband signals, as well as a “per call quality”
It must be possible to transform individual (P.800) experiments from different bandwidth contexts to the universal scale
![Page 14: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/14.jpg)
Establishment of the Universal Scale: Proposed procedure.
Bandwidth:
Conduct different tests according to ITU P-series Recommendations in any mode
  Listening-only tests
  Conversation tests
  Emulated conversation tests
Transform results onto the universal scale using fixed anchor conditions
Use the transmission rating scale rather than the MOS scale as a first guess
![Page 15: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/15.jpg)
Establishment of the Universal Scale: Proposed procedure.
Transformation relative to anchor conditions
Regardless of the original experimental context, a condition receives the same score on the universal scale
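The slides leave the transformation law open, so the following is only a sketch under stated assumptions: each experiment contains the same anchor conditions, each anchor has a fixed value on the universal scale, and a first-order (linear) least-squares fit per experiment is good enough. The anchor names, their universal-scale values and all MOS numbers are hypothetical.

```python
def fit_anchor_mapping(anchor_mos, anchor_universal):
    """Fit u = a * mos + b through the anchors shared by both dicts.

    anchor_mos:       {anchor name: MOS obtained in this experiment}
    anchor_universal: {anchor name: fixed value on the universal scale}
    A linear law is an assumption of this sketch, not taken from the slides.
    """
    names = sorted(set(anchor_mos) & set(anchor_universal))
    if len(names) < 2:
        raise ValueError("need at least two shared anchor conditions")
    xs = [anchor_mos[n] for n in names]
    ys = [anchor_universal[n] for n in names]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("anchors must differ in MOS")
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b = mean_y - a * mean_x
    return a, b

def to_universal(mos, mapping):
    """Apply a fitted (a, b) mapping to a MOS from the same experiment."""
    a, b = mapping
    return a * mos + b

# Hypothetical fixed anchor values on the universal scale.
ANCHORS = {"anchor_low": 20.0, "anchor_high": 90.0}

# The same anchors rated in two experiments with different contexts.
map_a = fit_anchor_mapping({"anchor_low": 1.5, "anchor_high": 4.5}, ANCHORS)
map_b = fit_anchor_mapping({"anchor_low": 1.2, "anchor_high": 3.8}, ANCHORS)

# One test condition, rated 3.0 in experiment A and 2.5 in experiment B.
print(to_universal(3.0, map_a), to_universal(2.5, map_b))
```

In this constructed example the condition sits halfway between the anchors in both experiments, so both transformed scores coincide; real data would need more anchors and a check that a linear law actually holds.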
![Page 16: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/16.jpg)
![Page 17: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/17.jpg)
Conclusions and Next Steps: Universal quality scale.
Conclusions:
Requirements for the new scale have been set up regarding
  length of the measurement episode
  conversational mode
  bandwidth
Establishment of the scale requires top-down and bottom-up considerations
Next steps:
Define anchor conditions
Conduct subjective tests
Transform results and adjust
Define transformation laws also for instrumental models
![Page 18: Towards a Universal Quality Scale for Narrowband, Wideband ... · Problem: Quality judgment depends on the test context, i.e. the conditions included in the test corpus → “corpus](https://reader033.vdocuments.mx/reader033/viewer/2022050122/5f5289ed3f9172363c209c3c/html5/thumbnails/18.jpg)
Thank you for your attention!
Visit www.qu.tu-berlin.de for more information.