
Abstract

The purpose of this study was to measure the reliability, sensitivity, and validity of two types of field study measures: subjective rating techniques and self-assessment scales. Eight listeners with hearing loss served as subjects. In a laboratory study, two frequency-gain characteristics (FGCs) were selected: one that subjects rated as having better speech clarity and one that subjects rated as having poorer speech clarity. The clarity ratings for the two FGCs were the criterion measure to which the results of the field studies were compared. Subjects wore the two FGCs in the real world over six one-week periods. During each week, subjects completed either category ratings or paired comparisons. Two self-assessment scales were also completed. The paired comparison technique was the field study measure with the best psychometric characteristics.

Key Words: Hearing Aids, Evaluation Studies

Abbreviations: AI = Articulation Index; ANCOVA = Analysis of Covariance; ANOVA = Analysis of Variance; APHAB = Abbreviated Profile of Hearing Aid Benefit; CCT = California Consonant Test; COSI = Client Oriented Scale of Improvement; CR = compression ratio; CT = compression threshold; Exp = experienced; FGC = frequency-gain characteristic; HINT = Hearing In Noise Test; NAL-R = National Acoustic Laboratories-Revised; REAR = Real Ear Aided Response; SNR = signal-to-noise ratio.

Summary

The purpose of this study was to measure the reliability, sensitivity, and validity of two types of field study measures: subjective rating techniques and self-assessment scales. Eight subjects with hearing loss were evaluated. In a laboratory study, two frequency-gain characteristics (FGCs) were selected: one that subjects rated as having better clarity of speech reception and another that they rated as having poorer clarity. The clarity rating for the two FGCs was the criterion measure against which the results of the field studies were compared. Subjects used the FGCs in their real world over six one-week periods. During each week, subjects completed either category ratings or paired comparisons. Two self-assessment scales were also completed. The paired comparison technique was the field study measure with the best psychometric characteristics.

Key Words: Evaluation studies, hearing aids.

Abbreviations: AI = Articulation Index; ANCOVA = analysis of covariance; APHAB = Abbreviated Profile of Hearing Aid Benefit; CCT = California Consonant Test; COSI = Client Oriented Scale of Improvement; CR = compression ratio; CT = compression threshold; Exp = experience; FGC = frequency-gain characteristics; HINT = Hearing In Noise Test; NAL-R = National Acoustic Laboratories-Revised; REAR = Real Ear Aided Response; SNR = signal-to-noise ratio

Case-Study Analysis of Various Field Study Measures

Jill E. Preminger*
David R. Cunningham*

* University of Louisville School of Medicine, Program in Audiology, Louisville, Kentucky.
Reprint requests: Jill E. Preminger, University of Louisville, School of Medicine, Myers Hall, Louisville, KY 40292


Advances in hearing aid technology have outpaced our knowledge of hearing aid fitting. It is not clear which hearing aid processing scheme is most appropriate for listeners with different hearing losses, nor whether an individual with a particular hearing loss will benefit from specific speech processing technologies compared with more traditional technologies.

A variety of published studies have attempted to determine whether new hearing aid technologies provide more benefit than older hearing aid technologies for listeners with hearing loss. These studies have compared advanced hearing aid speech processing schemes with traditional linear hearing aids (Humes, et al., 1997; Surr, et al., 1997) and have compared different speech processing schemes with one another (Valente, et al., 1997; Valente, et al., 1998; Stone, et al., 1999). These investigations measured hearing aid benefit with a variety of objective and subjective techniques in the laboratory and with subjective techniques in the field (the real world). None of these projects showed an overall preference, on either objective or subjective tests, for a particular hearing aid processing technology across all listeners with hearing loss (Humes, et al., 1997; Surr, et al., 1997; Valente, et al., 1997; Valente, et al., 1998; Stone, et al., 1999). It is clear that potential benefit must be evaluated on an individual basis; in other words, some listeners with hearing loss will benefit from specific hearing aid processing schemes, while others will not. It is likely that listeners with similar degrees and configurations of hearing loss will benefit from similar speech processing schemes.

Sensitive tests of hearing aid benefit are required to determine the superiority of specific speech processing schemes on an individual basis. In many experiments, objective tests of hearing aid benefit have not been sensitive enough to show differences between new hearing aid processing schemes (Humes, et al., 1997; Valente, et al., 1998; Stone, et al., 1999). Many subjective tests used in the field have also not been sensitive enough to demonstrate differences between hearing aid processing schemes (Surr, et al., 1997; Valente, et al., 1997). Alternative field study techniques with good sensitivity are therefore needed for future research.

Walden (1997) described three types of field study evaluations: direct observation of hearing aid users in everyday life, subjective ratings of speech in everyday life, and standardized paper-and-pencil questionnaires of hearing aid performance.

Direct Observations of the Hearing Aid User

The first type of field study evaluation involves direct observation of the hearing aid user in everyday life. This can be accomplished by training observers to tally the appropriate and inappropriate communication behaviors and strategies used by hearing aid wearers in a variety of communication situations. Although this technique is clearly impractical, it is certainly valid, because it is a direct measure of performance in the real world. The reliability and sensitivity of this measure need further research.

Subjective Rating Techniques

In the second type of field study evaluation, the hearing aid user rates total speech understanding or overall speech quality while wearing the hearing aid in daily life. This can be accomplished using either a category rating or a paired comparison procedure.

The category rating technique is a subjective measure in which the listener rates a dimension of speech quality (for example, the percentage of speech understood) on a predetermined numerical scale (for example, 0 to 100%). The paired comparison technique is a subjective measure in which the listener can switch back and forth repeatedly between two hearing aid processing schemes and then select the better scheme on the dimension being measured (for example, under which condition does speech sound clearer?).

In controlled laboratory experiments, subjective judgments of speech quality (using paired comparisons or category ratings) have been shown to be more sensitive than traditional speech recognition measures in differentiating between hearing aids or frequency response characteristics (Byrne & Cotton, 1988; Gabrielsson, et al., 1988; Leijon, et al., 1991). Both paired comparisons and category ratings have been found to be reliable (Gabrielsson, et al., 1988; Gabrielsson, et al., 1990; Purdy & Pavlovic, 1992), even in children as young as six years of age (Eisenberg & Levitt, 1991; Eisenberg & Dirks, 1995). Subjective measures of speech recognition have been found to be valid predictors of traditional speech discrimination measures (Speaks, et al., 1972; Cox & McDaniel, 1984; Gabrielsson, et al., 1988). In the laboratory setting, the paired comparison technique has been shown to be more sensitive than the category rating technique (Eisenberg, et al., 1997).

Journal of the American Academy of Audiology/Volume 14, Number 1, 2003


Subjective rating techniques have been used in many field studies (Kuk & Pape, 1993; Valente, et al., 1997; Valente, et al., 1998; Preminger, et al., 2000). When these techniques are used in the real world, the listening situations can vary widely between rating periods, and it is unclear how this variability influences their statistical properties (Walden, 1997; Preminger, et al., 2000). There are no reports of the psychometric characteristics of either technique when used in the field.

Hearing Aid Rating Scales

The third type of field study evaluation is a standardized paper-and-pencil questionnaire of hearing aid performance. A variety of questionnaires are available. The advantage of these techniques is that their statistical properties are known, so significant differences in hearing aid benefit can be determined. The problem with these techniques is that they rely on memory, they are subject to bias, and they are not very sensitive (Walden, 1997).

The Abbreviated Profile of Hearing Aid Benefit (APHAB) has undergone extensive psychometric testing and is in common use (Cox & Rivera, 1992; Cox & Alexander, 1995). This scale explores the benefit of hearing aid use for communication in three listening environments: 1) Ease of Communication (EC), communication under ideal listening conditions; 2) Reverberation (RV), understanding in moderately reverberant rooms; and 3) Background Noise (BN), communication in noisy situations. The reliability of the APHAB has been shown to be good (Cox & Alexander, 1995). However, the APHAB has not been shown to be sensitive when comparing different hearing aid technologies (Surr, et al., 1997; Valente, et al., 1997; Newman & Sandridge, 1998). Newman and Sandridge (1998) suggested that this poor sensitivity may occur if a large percentage of the items on the scale cover areas that are not important to individual hearing aid users.

The Client Oriented Scale of Improvement (COSI) (Dillon, et al., 1997) is an alternative hearing aid self-assessment scale in which individual hearing aid users determine the content of the scale. The available data suggest good test-retest reliability and good validity (Dillon, et al., 1997). This test may also have good sensitivity, but its sensitivity has not been measured. Because the content of the scale is determined by individual hearing aid users, it ensures that the items are important to each listener in his or her daily life. It is possible that this increased validity will result in increased sensitivity when the COSI is used to differentiate between hearing aid technologies or processing schemes.

In the present investigation, the utility of two types of field studies was explored: subjective rating techniques and self-assessment scales. In a preliminary laboratory study, two frequency-gain characteristics (FGCs) were selected: one that subjects rated as having better speech clarity and one that subjects rated as having poorer speech clarity. The clarity ratings for the two FGCs selected in the laboratory study were the criterion measure, the ‘gold standard’ to which the results of the field studies were compared. Although speech clarity is not necessarily the validity criterion for all hearing aid users, it is important to most hearing aid users (Hagerman & Gabrielsson, 1985), and it is an aspect of speech quality that can be measured both in the laboratory and in the clinic. Speech clarity was therefore chosen as a tool to evaluate the field studies, not as a tool to evaluate the hearing aid.

Subjects wore the two FGCs in the real world over six one-week periods. During each week, subjects completed either category ratings or paired comparisons, with speech clarity as the criterion variable. Two self-assessment scales were also completed. The purpose of this study was to measure the sensitivity, validity, and reliability of the various field study measures. A test was considered sensitive if it differentiated between the two hearing aid settings under investigation, valid if subjects selected the same hearing aid setting in the field that they selected in the laboratory, and reliable if the same results were obtained in two different weeks.

METHOD

Subjects

Eight individuals with hearing loss participated in this study. Four of them had never worn hearing aids before this project. These four were hearing aid candidates who had been seen at one of the clinical offices associated with the University of Louisville’s Doctor of Audiology Program, and they agreed to participate in this research project before pursuing amplification. The remaining four participants were experienced hearing aid users with at least one year of hearing aid use.

Field Studies/Preminger and Cunningham


Individual subject data are shown in Table 1. Although this is a diverse subject group, the subjects are typical clinic patients. The purpose of this experiment was to determine the usefulness of field study measures in typical patients; it is important to determine whether these tests are useful on an individual basis regardless of subject age, degree of loss, and hearing aid experience.

Hearing Aids

Four subjects were fit with three-memory, single-channel, wide-dynamic-range compression, behind-the-ear hearing aids. Initially, the FGC was set according to the NAL-R prescription (Byrne & Dillon, 1986) using probe-microphone measurements. The compression threshold (CT) was held constant at 55 dB SPL. The compression ratio (CR) was adjusted while subjects listened to continuous discourse in a sound-treated room. Using a 7-point rating scale (7 = Uncomfortably Loud; 6 = Loud, But OK; 5 = Comfortable, But Slightly Loud; 4 = Comfortable; 3 = Comfortable, But Slightly Soft; 2 = Soft; 1 = Very Soft), the compression characteristics were adjusted so that speech at 45 dB Leq was rated ‘2’ (soft), speech at 65 dB Leq was rated ‘4’ (comfortable), and speech at 85 dB Leq was rated ‘6’ (loud but OK). These compression characteristics were used for all subsequent FGC settings. The resulting mean CR was 2.5:1.
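As a rough illustration of the compression settings described above, the sketch below models a hard-knee, single-channel compressor using the 55 dB SPL compression threshold and the 2.5:1 mean compression ratio reported here, and encodes the 7-point loudness scale with its 45/65/85 dB Leq targets. The function names and the 30 dB linear gain in the example are hypothetical, and the model ignores attack/release time constants and frequency-dependent gain.

```python
# Seven-point loudness scale used while adjusting the compression settings.
LOUDNESS_SCALE = {
    7: "Uncomfortably Loud",
    6: "Loud, But OK",
    5: "Comfortable, But Slightly Loud",
    4: "Comfortable",
    3: "Comfortable, But Slightly Soft",
    2: "Soft",
    1: "Very Soft",
}

# Target rating for continuous discourse at each presentation level (dB Leq).
LOUDNESS_TARGETS = {45: 2, 65: 4, 85: 6}


def describe_rating(rating):
    """Return the verbal label for a numeric loudness rating."""
    return LOUDNESS_SCALE[rating]


def targets_met(ratings):
    """True if the listener's rating at every target level equals the target."""
    return all(ratings.get(level) == target
               for level, target in LOUDNESS_TARGETS.items())


def wdrc_output_db(input_db_spl, linear_gain_db, ct_db_spl=55.0, cr=2.5):
    """Static output level (dB SPL) of a hard-knee compressor.

    Below the compression threshold the gain is linear; above it, each
    1 dB increase in input raises the output by only 1/CR dB.
    """
    if cr < 1.0:
        raise ValueError("compression ratio must be >= 1")
    if input_db_spl <= ct_db_spl:
        return input_db_spl + linear_gain_db
    knee_output = ct_db_spl + linear_gain_db
    return knee_output + (input_db_spl - ct_db_spl) / cr


# Hypothetical 30 dB linear gain applied to the three target speech levels.
outputs = {level: wdrc_output_db(level, 30.0) for level in LOUDNESS_TARGETS}
```

With these assumed values, the 40 dB range of input speech levels (45 to 85 dB) maps onto a 22 dB range of output levels (75 to 97 dB SPL); the loudness ratings were the means by which such range reduction was tuned for each listener.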

This hearing aid had a push button that allowed subjects to cycle through the three memories. The hearing aid gain was set to the lowest possible level in the third memory, which subjects perceived as the ‘reduced-volume’ setting. Subjects were instructed to use this ‘reduced-volume’ setting as an anchor to determine which of the three memories they were in.

Four subjects were fit with two-memory, two-channel, wide-dynamic-range compression, behind-the-ear hearing aids. These aids were fit with the same procedure as the single-channel aid; the CT and CR were adjusted in the same manner for each channel. The CT varied between 40 and 50 dB SPL, and the mean CR was 2.5:1 in each channel. This hearing aid had a switch that subjects moved up for memory one and down for memory two.

Regardless of the hearing aid used, the compression characteristics were adjusted to achieve automatic volume control settings with relatively high compression thresholds, relatively shallow compression ratios, and relatively long time constants. Real-ear aided responses (REARs) were measured with 50 and 70 dB SPL inputs, and outputs were compared at 500, 1000, and 2000 Hz. The gain differences between the two FGCs were consistent across these input levels (within 3 dB).

All of the experienced hearing aid users were fit binaurally. There was a concern that the new hearing aid users might have difficulty negotiating between the different memories, especially with the push-button memory control. If this occurred, the new hearing aid users might inadvertently set their two hearing aids to two different memories. For this reason, all new hearing aid users were fit monaurally.

Table 1 Subject Age, Years of Hearing Aid Use, and Unaided Threshold in the Aided Ears

Subject  Age (yrs)  Years HA use  Ear  250 Hz  500 Hz  1 kHz  2 kHz  4 kHz
S1       70         0             R*   40      35      30     35     70
                                  L    35      30      30     35     55
S2       76         0             R    35      45      50     50     45
                                  L*   40      50      50     60     45
S3       68         0             R    30      35      50     85     90
                                  L*   30      45      35     75     95
S4       81         0             R*   25      35      35     40     50
                                  L    20      30      35     40     55
S5       26         11            R*   30      40      50     60     60
                                  L*   40      50      55     65     60
S6       56         4             R*   5       0       5      35     60
                                  L*   5       5       5      25     60
S7       25         7             R*   35      35      45     50     60
                                  L*   35      45      50     60     60
S8       60         1             R*   25      25      30     35     40
                                  L*   25      30      30     30     45

* = aided ear.

Laboratory Studies

Speech Materials.

Speech quality and speech clarity judgments in the laboratory were made while subjects listened to continuous discourse on compact disc. The speech was presented in a sound-treated room at 65 dB Leq at the location of the subject’s head. Speech clarity is only one aspect of overall speech quality; other dimensions of speech quality may vary when speech clarity is held constant (Preminger & Van Tasell, 1995).

Speech intelligibility was measured objectively with the California Consonant Test (CCT) (Owens & Schubert, 1977) and the Hearing In Noise Test (HINT) (Nilsson, et al., 1994). The CCT was presented on tape at 65 dB Leq at the location of the subject’s head, and the full 100-item list was administered for each condition. The HINT was presented with the speech signal at 0° azimuth and speech-spectrum noise at 180° azimuth. The noise was held constant at 65 dB Leq at the location of the subject’s head. The speech level was adjusted in 2 dB steps: it was decreased after every correct response and increased after every incorrect response. The signal-to-noise ratio (SNR) for 50% correct sentence recognition was determined by comparing the speech level with the noise level over the last 15 sentences. Subjects completed one 20-sentence practice list before data collection began.
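The adaptive rule above can be sketched as a simple one-down/one-up track, which converges on the level giving about 50% correct. This is a simplified sketch that implements only the rules stated in the paragraph (2 dB steps, noise fixed at 65 dB Leq, SNR averaged over the last 15 sentences), not the full published HINT protocol; the 65 dB starting speech level, the function name, and the response pattern are hypothetical.

```python
NOISE_LEVEL_DB = 65.0  # fixed noise level (dB Leq) at the listener's head


def run_hint_track(responses, start_speech_db=65.0, step_db=2.0, n_scored=15):
    """Simulate the adaptive speech-level track described in the text.

    responses[i] is True if sentence i was repeated correctly.
    Returns (speech level presented for each sentence, SNR estimate in dB).
    """
    levels = []
    level = start_speech_db
    for correct in responses:
        levels.append(level)
        # Harder (quieter speech) after a correct response, easier after an error.
        level = level - step_db if correct else level + step_db
    if len(levels) < n_scored:
        raise ValueError("need at least n_scored sentences")
    scored = levels[-n_scored:]
    # Mean speech level over the scored sentences, expressed relative to the noise.
    snr_50 = sum(scored) / n_scored - NOISE_LEVEL_DB
    return levels, snr_50


# Hypothetical 20-sentence list: all correct except sentences 6, 9, and 14.
responses = [i not in (5, 8, 13) for i in range(20)]
levels, snr_50 = run_hint_track(responses)
```

A lower (more negative) SNR estimate means the listener needed less speech-over-noise advantage to reach 50% correct, i.e., better performance.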

FGC Selection.

In addition to the NAL-R setting initially fit in each hearing aid, four additional FGCs were selected: two with increased low-frequency gain and decreased high-frequency gain, and two with decreased low-frequency gain and increased high-frequency gain. The gain in each channel was varied in 4 dB steps. The overall gain of each FGC was adjusted to produce equal rms output for a 65 dB SPL speech-shaped noise in a 2-cc coupler. Subjective judgments were then made to ensure equal loudness for all five FGCs: subjects listened to continuous discourse (female speaker) in quiet at 65 dB Leq in the sound-treated room, and the overall gain of the hearing aid was adjusted until the subject rated each of the five FGCs as ‘4’, comfortable loudness.
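The coupler-based part of this procedure can be sketched for a two-channel aid as below. The sketch assumes that each tilt step moves the low- and high-frequency channel gains by 4 dB in opposite directions (the text does not say exactly how the steps were combined), and it uses hypothetical noise band levels and NAL-R gains. Equal rms output is obtained by power-summing the band output levels and adding one broadband offset; the subsequent subjective loudness adjustment is not modeled.

```python
import math

STEP_DB = 4.0  # per-step gain change reported in the text

# Hypothetical band levels (dB SPL) of a 65 dB SPL speech-shaped noise after
# splitting it into the low- and high-frequency channels of a two-channel aid.
NOISE_BANDS_DB = {"low": 63.0, "high": 60.0}


def overall_level_db(band_levels_db):
    """Power-sum band levels (dB) into a single overall rms level (dB)."""
    return 10.0 * math.log10(sum(10.0 ** (lvl / 10.0) for lvl in band_levels_db))


def output_level_db(gains_db):
    """Overall output level of the noise through an FGC given as per-band gains."""
    return overall_level_db(NOISE_BANDS_DB[band] + gains_db[band]
                            for band in NOISE_BANDS_DB)


def tilted_fgc(base_gains_db, steps):
    """steps > 0: more low- and less high-frequency gain; steps < 0: the reverse."""
    return {
        "low": base_gains_db["low"] + steps * STEP_DB,
        "high": base_gains_db["high"] - steps * STEP_DB,
    }


def equalize(gains_db, reference_gains_db):
    """Add one broadband offset so the rms output matches the reference FGC."""
    offset = output_level_db(reference_gains_db) - output_level_db(gains_db)
    return {band: g + offset for band, g in gains_db.items()}


def make_fgc_set(nal_r_gains_db):
    """NAL-R plus four tilted, output-equalized variants keyed by step count."""
    fgcs = {0: dict(nal_r_gains_db)}
    for steps in (-2, -1, 1, 2):
        fgcs[steps] = equalize(tilted_fgc(nal_r_gains_db, steps), nal_r_gains_db)
    return fgcs


# Hypothetical NAL-R channel gains (dB) for one subject.
fgcs = make_fgc_set({"low": 10.0, "high": 25.0})
```

Because the same offset is added to both channels, equalization changes overall level without undoing the spectral tilt that distinguishes the five FGCs.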

Subjects made speech clarity and quality ratings while listening to sentence pairs spoken by a male and a female speaker. Sentences varied in length from five to nine words. The sentences were digitized recordings (22,000 Hz sampling rate, 16-bit quantization). The average level of each speaker was 65 dB Leq. The speech was presented in noise at an SNR of +8 dB; the noise was speech babble on CD. The digitized speech signal was loaded onto a Micron Millenia 400 PC, converted to an analog signal with a digital-to-analog converter (Yamaha YMF-740 sound card), and routed to one channel of an audiometer (GSI 16). The babble noise was output from a CD player (Sony CDP-CE405) and routed to the second channel of the audiometer. The speech and noise were mixed and presented from a single loudspeaker (GSI) in the sound-treated room.
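Because the speech averaged 65 dB Leq and the SNR was +8 dB, the babble was presented at about 65 - 8 = 57 dB Leq. In this study the mixing was done in the audiometer; the digital sketch below only illustrates the underlying arithmetic by scaling a noise signal so that the rms speech-to-noise ratio equals a target. The toy sine-wave signals and function names are hypothetical stand-ins for the sentence recordings and the babble.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(speech, noise):
    """Speech-to-noise ratio in dB computed from the rms of each signal."""
    return 20.0 * math.log10(rms(speech) / rms(noise))


def scale_noise_for_snr(speech, noise, target_snr_db):
    """Return a copy of `noise` scaled so that snr_db(speech, result) equals the target."""
    gain = rms(speech) / (rms(noise) * 10.0 ** (target_snr_db / 20.0))
    return [gain * x for x in noise]


# Toy signals at the 22,000 Hz sampling rate mentioned in the text.
speech = [0.5 * math.sin(2 * math.pi * 440 * n / 22000) for n in range(2200)]
babble = [0.3 * math.sin(2 * math.pi * 317 * n / 22000) for n in range(2200)]
mixed_noise = scale_noise_for_snr(speech, babble, 8.0)
```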

Subjects rated the speech clarity of each of the five FGCs on a 10-point scale, with 1 equal to extremely poor clarity and 10 equal to excellent clarity. The FGCs were alternated in random order, and ratings were repeated to obtain at least three ratings per FGC (the number of replications was determined by the consistency of the ratings). The same procedure was used to measure speech quality for the five FGCs, also on a 10-point scale, with 1 equal to extremely poor speech quality and 10 equal to excellent speech quality.
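The rating procedure above can be sketched as a blocked randomization, one reasonable reading of ‘alternated in random order’ that is assumed here: each round presents all five FGCs once in shuffled order, so after three rounds every FGC has three ratings. The FGC labels and function names are hypothetical, and the study’s rule of adding replications until the ratings were consistent is not modeled.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical labels for the five FGCs (NAL-R plus four tilted versions).
FGCS = ["LF+2", "LF+1", "NAL-R", "HF+1", "HF+2"]


def presentation_order(fgcs, n_rounds=3, seed=None):
    """Blocked random order: every FGC appears once per round, shuffled."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_rounds):
        block = list(fgcs)
        rng.shuffle(block)
        order.extend(block)
    return order


def mean_ratings(trials):
    """Mean rating per FGC from (fgc, rating) pairs on the 1-10 scale."""
    by_fgc = defaultdict(list)
    for fgc, rating in trials:
        if not 1 <= rating <= 10:
            raise ValueError(f"rating out of range: {rating}")
        by_fgc[fgc].append(rating)
    return {fgc: mean(ratings) for fgc, ratings in by_fgc.items()}


order = presentation_order(FGCS, n_rounds=3, seed=7)
```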

Based on the laboratory clarity and quality ratings, two FGCs were selected: one with good speech clarity and good speech quality (the ‘Good FGC’) and one with fair speech clarity and good speech quality (the ‘Poor FGC’). The subjective measures were confirmed using the CCT and the HINT. If neither test showed a significant difference in speech intelligibility between the Good and Poor FGCs, new FGCs were selected and tested.

Field Studies

Subjective Ratings.

Speech clarity was judged in the field with either category ratings or paired comparisons. Clarity was used rather than quality or overall preference because clarity was the criterion measure in the laboratory tests. Subjects were asked to select three times each day when they were typically listening to speech and to make their three daily ratings at or near these times. If they were not listening to speech at a specified time, they were asked to wait one hour and try again. If they forgot to make a rating, they were asked to make an extra rating at an alternate time.

Subjects were given small notebooks that could easily fit in a purse or pocket. Each page in the notebook had a pre-written time and date at which ratings should be made. Subjects answered a series of questions on each page and then rated the hearing aid with either paired comparisons or category ratings. The answers to the questions about the listening environment were used as a covariate in the statistical analyses. An example page is shown in the Appendix.

When category ratings were the measure, the same FGC was programmed into all memories of each subject’s hearing aid. Subjects were asked to rate speech clarity on a 10-point scale, with 1 being extremely unclear and 10 being extremely clear. When paired comparisons were the measure, the Good FGC was programmed into one memory of the hearing aid and the Poor FGC into another. Subjects were instructed to consider the clarity of speech as they switched back and forth between the memories, as often as necessary, and then to select the memory that made speech sound clearer.
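The text does not say at this point how the paired-comparison choices were analyzed. One common approach, assumed here and not necessarily the authors’ method (which also used listening-environment answers as a covariate), is to decode each chosen memory into Good or Poor using that week’s memory assignment (for example, week 3 in Table 2 had the Poor FGC in memory 1 and the Good FGC in memory 2) and then apply an exact two-sided binomial (sign) test against chance, p = 0.5. The function names and choice data are hypothetical; 21 choices corresponds to three ratings a day for seven days.

```python
from math import comb


def count_good_choices(chosen_memories, memory_map):
    """Number of ratings in which the chosen memory held the Good FGC.

    memory_map gives the FGC in each memory that week, e.g. {1: "Poor", 2: "Good"}.
    """
    return sum(1 for m in chosen_memories if memory_map[m] == "Good")


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed count k under chance."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so exactly tied probabilities are counted.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical week-3 data: memory chosen as clearer at each of 21 ratings.
week3_map = {1: "Poor", 2: "Good"}
choices = [2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2]
k = count_good_choices(choices, week3_map)
p_value = binomial_two_sided_p(k, len(choices))
```

Decoding memories per week matters because the Good FGC switched memories between the two paired-comparison weeks; counting raw memory numbers would confound FGC preference with memory position.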

Self-Assessment Scales.

Hearing aid performance was measured with two commonly used self-assessment scales. The APHAB (Cox & Alexander, 1995) is a 24-item scale with four subscales: Ease of Communication (EC), Reverberation (RV), Background Noise (BN), and Aversiveness of sounds. The first three subscales measure speech understanding in a variety of typical listening situations. Only the results from the first three subscales were used in the present study, because their questions all relate to the criterion measure of speech clarity. The APHAB was administered only in the aided condition, so that hearing aid performance, rather than hearing aid benefit, was measured.
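To make the subscale scores concrete, the sketch below scores one APHAB subscale as the mean of item percentages. The letter-to-percentage mapping (A = 99% down to G = 1%) and its reversal for reverse-scored items are the ones commonly published for the APHAB and are assumed here; the item IDs are hypothetical, and the actual item-to-subscale assignments and reverse-scored items should be taken from the official APHAB scoring materials. Scores express how often problems occur, so lower aided scores indicate better performance.

```python
from statistics import mean

# Response letters and the percentages commonly used to score them (assumed;
# verify against the official APHAB scoring materials).
FORWARD = {"A": 99, "B": 87, "C": 75, "D": 50, "E": 25, "F": 12, "G": 1}
# Reverse-scored items use the same percentages in the opposite order.
REVERSED = dict(zip(FORWARD, reversed(list(FORWARD.values()))))


def subscale_score(responses, reverse_items=frozenset()):
    """Mean problem percentage for one subscale.

    responses: {item_id: letter}; reverse_items: item ids scored in reverse.
    """
    scores = [(REVERSED if item in reverse_items else FORWARD)[letter.upper()]
              for item, letter in responses.items()]
    return mean(scores)


# Hypothetical aided responses to three items of one subscale.
ec_score = subscale_score({1: "C", 2: "E", 3: "F"})
```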

In the COSI (Dillon, et al., 1997), the patient and the audiologist identify five specific listening situations in which the patient hopes to improve his or her listening abilities with a hearing aid. In the present study, only situations involving speech understanding were selected, again so that the results would relate to the criterion of speech clarity. For each identified listening situation, subjects give two responses. First, they compare how they hear with the hearing aid with how they heard without it, choosing one of five degrees of change from ‘worse’ to ‘much better’; these are referred to as the ranking results. Second, they estimate how much they can understand while wearing the hearing aid, choosing one of five percent-correct levels from 10% to 95%; these are referred to as the percent correct data from the COSI.
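The two COSI responses described above can be summarized per subject as a mean change rank and a mean percent-correct estimate, as in the sketch below. The three change labels between ‘worse’ and ‘much better’ and the intermediate percent levels (25%, 50%, 75%) are assumed from the published COSI form rather than stated in this paper, and averaging across situations is an illustrative choice, not necessarily how the authors analyzed the data.

```python
from statistics import mean

# Degree-of-change categories, ordered from 'worse' to 'much better' (assumed labels).
CHANGE_SCALE = ["Worse", "No Difference", "Slightly Better", "Better", "Much Better"]
# Percent-correct levels; only the 10% and 95% endpoints are given in the text.
PERCENT_LEVELS = (10, 25, 50, 75, 95)


def summarize_cosi(situations):
    """situations: list of (change_label, percent) pairs, one per speech situation.

    Returns the mean change rank (1 = Worse ... 5 = Much Better) and the
    mean percent-correct estimate across situations.
    """
    ranks, percents = [], []
    for change, percent in situations:
        if percent not in PERCENT_LEVELS:
            raise ValueError(f"unexpected percent level: {percent}")
        ranks.append(CHANGE_SCALE.index(change) + 1)
        percents.append(percent)
    return mean(ranks), mean(percents)


# Hypothetical responses for two speech situations.
rank_mean, percent_mean = summarize_cosi([("Better", 75), ("Much Better", 95)])
```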

Subjects were asked to wear their hearing aid(s) for at least eight hours each day; however, they did not report the number of hours the hearing aid(s) were actually worn each day. They made daily ratings for six one-week periods, returning to the laboratory weekly for reprogramming and re-instruction. During two of the weeks, the Good FGC was programmed into one memory and the Poor FGC into an alternate memory, and paired comparisons were made. During two of the weeks, the Good FGC was programmed into all memories of the hearing aid and category ratings were made. During the remaining two weeks, the Poor FGC was programmed into all memories of the hearing aid and category ratings were made. At the end of each category-rating week, both the APHAB and the COSI were completed. An example field study schedule is shown in Table 2.

Subjects were not informed about the purpose of the study. They were told that we were interested in learning how well they could understand speech with different programs in the hearing aid. Subjects were also not informed about how the hearing aid was programmed each week; they were told only that the hearing aid was adjusted a little differently each week.

RESULTS

Laboratory Studies

The purpose of the laboratory studies was to select two FGCs for the field study evaluation. The two FGCs had to have good speech quality ratings but different speech clarity ratings: the ‘Good’ FGC was rated as having better speech clarity than the ‘Poor’ FGC, and both had to have quality ratings of at least fair (better than ‘6’ on the 1-to-10 rating scale). The ratings for the Good and Poor FGCs are shown in Table 3. For every subject, the clarity ratings were significantly better for the Good FGC than for the Poor FGC at the .10 probability level. This less conservative criterion was considered permissible because all analyses were performed on individual data. There were no significant differences between the quality ratings for the two FGCs. The 8-frequency AI (Pavlovic, 1991) was calculated for each FGC for a 65 dB SPL input; these values are also shown in Table 3. In all cases, the AI was higher for the Good FGC than for the Poor FGC.
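The per-subject comparison in Table 3 is a matched-pairs t-test on the clarity (or quality) ratings for the two FGCs. The sketch below computes the paired t statistic and its degrees of freedom; it assumes that the i-th rating of one FGC is paired with the i-th rating of the other, since the text does not say how ratings were paired, and the example ratings are hypothetical. The p-value would then come from the t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`, and be compared with the .10 criterion.

```python
import math
from statistics import mean, stdev


def paired_t(good, poor):
    """Matched-pairs t statistic and degrees of freedom for two rating lists.

    good[i] and poor[i] are treated as a pair (an assumed pairing rule).
    """
    if len(good) != len(poor) or len(good) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [g - p for g, p in zip(good, poor)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    # Mean difference divided by its standard error.
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical clarity ratings for one subject (three ratings per FGC).
t_stat, df = paired_t([8, 9, 9], [7, 8, 7])
```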

In all cases, the Good FGC had more high-frequency gain and less low-frequency gain than the Poor FGC. The REARs for Subject 1 are shown in Figure 1; very similar Good and Poor FGC comparisons were selected for all subjects. These findings are consistent with the well-known observation that many listeners with hearing loss achieve better speech intelligibility scores with increased high-frequency amplification but may prefer the overall sound quality when low-frequency amplification is increased (Punch & Beck, 1980).

The results for the objective speech tests are shown in Table 4. A significant difference for the CCT was determined according to the binomial model described by Thornton and Raffin (1978). A significant difference for the HINT was considered to be a difference in scores greater than 1.5 dB (Nilsson et al., 1994). Each subject had significantly better performance for the Good FGC on at least one of the objective speech tests.
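As a minimal sketch of the HINT decision rule described above, the following Python applies the 1.5 dB criterion to the Good and Poor HINT scores transcribed from Table 4. The dictionary and function names are illustrative, not part of the study. The CCT decisions are not reproduced here, because they depend on the Thornton and Raffin (1978) binomial critical-difference tables.

```python
# HINT scores from Table 4: SNR (dB) for 50% correct sentence recognition.
# A lower (more negative) SNR means better speech understanding.
HINT_SCORES = {
    "S1": (-2.36, -0.35),
    "S2": (-2.70, -0.70),
    "S3": (3.00, 9.35),
    "S4": (-2.65, -1.00),
    "S5": (-1.53, -0.35),
    "S6": (-6.00, -2.94),
    "S7": (-0.18, 3.00),
    "S8": (-2.12, -1.06),
}

CRITERION_DB = 1.5  # difference treated as significant (Nilsson et al., 1994)


def hint_difference(good: float, poor: float) -> float:
    """Good minus Poor SNR; negative values favor the Good FGC."""
    return good - poor


def is_significant(diff_db: float, criterion: float = CRITERION_DB) -> bool:
    """Apply the 'greater than 1.5 dB' rule to an SNR difference."""
    return abs(diff_db) > criterion


if __name__ == "__main__":
    for subject, (good, poor) in HINT_SCORES.items():
        diff = hint_difference(good, poor)
        flag = "*" if is_significant(diff) else ""
        print(f"{subject}: diff = {diff:+.2f} dB{flag}")
```

Running it flags S1, S2, S3, S4, S6 and S7, which are the HINT differences marked with an asterisk in Table 4.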

Field Studies

Using the two FGCs selected during the laboratory studies, the utility of the subjective ratings and self-assessment scales was measured in the real world. The results of the field studies are presented in terms of three properties: validity, reliability and sensitivity. The results were sensitive if the field study measure could differentiate between the Good and the Poor FGC. The results were valid if the Good FGC was rated with better speech clarity than the Poor FGC. Finally, the results were reliable if the results measured during one week were replicated during a second week when the hearing aid was programmed in the same manner.

Table 2 Field Study Schedule for Subject S1

          Hearing Aid               Measurements          Measurements      Rating
Week      Memory 1     Memory 2     Made During Week      at End of Week    Week
1         Poor FGC     Poor FGC     Category Ratings      APHAB/COSI        Poor 1
2         Good FGC     Good FGC     Category Ratings      APHAB/COSI        Good 1
3         Poor FGC     Good FGC     Paired Comparisons    none              PC 1
4         Good FGC     Poor FGC     Paired Comparisons    none              PC 2
5         Good FGC     Poor FGC     Category Ratings      APHAB/COSI        Good 2
6         Poor FGC     Good FGC     Category Ratings      APHAB/COSI        Poor 2

APHAB = Abbreviated Profile of Hearing Aid Benefit; COSI = Client Oriented Scale of Improvement; FGC = frequency gain characteristic; PC = paired comparison.

Table 3 Mean Clarity and Quality Ratings and Probability Levels from Matched-Pairs t-Tests Comparing Ratings for the Good and Poor Frequency Gain Characteristics

          Clarity                 Quality                 AI
Subject   Good   Poor   p         Good   Poor   p         Good   Poor
S1        8.5    7.5    0.04*     7.7    7      0.26      .78    .73
S2        8.5    6.7    0.05+     7.8    6.8    0.13      .36    .33
S3        9      8.2    0.03*     7.8    7      0.11      .33    .29
S4        8.8    7.2    0.04*     7.8    7.3    0.21      .68    .64
S5        9.4    8.3    0.01*     8.3    8      0.21      .69    .61
S6        9      8.1    0.04*     9      7.9    0.11      .64    .61
S7        9.6    8.8    0.05+     8.3    7.7    0.23      .41    .36
S8        9.6    8.5    0.07+     9.5    9.8    0.11      .85    .81

The Articulation Index (AI) for an input level of 65 dB SPL is also included. * = significant at the .05 level; + = significant at the .10 level.


Subjective Ratings: Category Ratings

Each subject made category ratings during four one-week periods. For example, an individual may have made ratings for the Poor FGC during the first week; this would be ‘Poor 1’. Ratings made the next week for the Good FGC would be ‘Good 1’. During the fifth week the individual could have made ratings for the Poor FGC again, which would be ‘Poor 2’, and finally, ratings made during the sixth week for the Good FGC would be ‘Good 2’ (see the last column in Table 2). The purpose of this study was to determine whether the field studies were useful on an individual basis; therefore, analyses were completed for individual subjects. Twenty-one ratings were made during each one-week period, that is, for each combination of FGC and Week. The data for each subject were submitted to individual Analyses of Covariance (ANCOVAs) with main effects of Rating (21 ratings), FGC (Good vs. Poor) and Week (Week 1 vs. Week 2). Only the main effects of FGC and of Week were of interest. The covariate was the difficulty of the listening situation on a scale of 1 through 5, with 5 being very easy and 1 being extremely difficult. The covariate rating took into account the setting, the noise level and the subject’s impression of the listening situation (see ‘Listening Conditions’ in the Appendix). While it was not possible to completely eliminate the variability associated with real-world listening, it was hoped that this variability could be reduced by including the covariate in the statistical analysis. The results of the individual ANCOVAs are shown in Table 5.
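The covariate adjustment can be illustrated with a simplified, one-factor ANCOVA: each group mean is shifted by the pooled within-group regression slope multiplied by the distance between that group's mean covariate and the grand mean covariate. This Python sketch is an illustration under stated assumptions, not the study's analysis: the full model described above also included Rating and Week, and the ratings below are hypothetical.

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """One-factor ANCOVA adjusted means with a single covariate.

    groups maps a condition name to a list of (covariate, rating) pairs,
    e.g. (ease of listening situation 1-5, speech clarity rating).
    Returns the pooled within-group slope and the adjusted mean per group.
    """
    grand_cov = mean(x for pairs in groups.values() for x, _ in pairs)

    # Pooled within-group slope: within-group cross-products divided by
    # the within-group sum of squares of the covariate.
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx

    adjusted = {}
    for name, pairs in groups.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        # Move each group to the grand mean of the covariate.
        adjusted[name] = my - slope * (mx - grand_cov)
    return slope, adjusted


if __name__ == "__main__":
    # Hypothetical data: the Good FGC happened to be rated in harder
    # (lower ease score) situations than the Poor FGC.
    data = {
        "Good": [(2, 6), (4, 8)],
        "Poor": [(4, 7), (5, 8)],
    }
    slope, adj = ancova_adjusted_means(data)
    print(slope, adj)  # 1.0 {'Good': 7.75, 'Poor': 6.75}
```

In this made-up example the raw means favor the Poor FGC (7.5 vs. 7.0), but once the easier listening situations encountered with the Poor FGC are accounted for, the adjusted means favor the Good FGC.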

The individual category rating results, adjusted by the covariate ratings, are shown in Figure 2. These results are averaged across the 21 ratings made during each one-week period. The top graph shows the mean ratings for each FGC and the bottom graph shows the mean ratings for each Week (Week 1 vs. Week 2). Significant main effects from Table 5 are marked in Figure 2 (an asterisk for the .05 level and a plus sign for the .10 level). Five of the 8 subjects showed significant differences for the main effect FGC; this indicates fair sensitivity. The results for subject 3 were in the wrong direction, however; ratings for the Poor FGC were higher than for the Good FGC.

The bottom graph in Figure 2 shows the mean ratings for each week of ratings. This is a check of the reliability of the measure: if the measure is reliable, there should be no main effect for the factor Week. Unfortunately, this was not the case. Five subjects had significant main effects for the factor Week. Two listeners gave significantly higher ratings during the first field study weeks and three listeners gave significantly higher ratings during the second field study weeks.

Table 4 Results of the Laboratory Tests

          CCT                     HINT
Subject   Good   Poor   Diff      Good    Poor    Diff
S1        87     81     6         -2.36   -0.35   -2.01*
S2        60     44     16*       -2.7    -0.7    -2*
S3        40     39     1         3       9.35    -6.35*
S4        64     66     -2        -2.65   -1      -1.65*
S5        94     78     16*       -1.53   -0.35   -1.18
S6        96     94     2         -6      -2.94   -3.06*
S7        94     78     16*       -0.18   3       -3.18*
S8        86     79     7*        -2.12   -1.06   -1.06

The California Consonant Test (CCT) and Hearing In Noise Test (HINT) scores are shown along with the difference (Diff) scores between the Good and the Poor FGCs. The CCT scores are in percent correct. The HINT scores are the signal-to-noise ratio (dB) required for 50% correct sentence recognition; a lower HINT score indicates superior speech understanding. * = significant at the .05 level.

Figure 1. Real-Ear Aided Responses (REARs) for an input of 50 dB SPL for subject 1. The solid line represents the REAR for the Good FGC and the dashed line represents the REAR for the Poor FGC.


Reliability can also be addressed by looking at the interaction between FGC and Week (see Table 5). Six of the 8 subjects showed significant interactions. Follow-up testing was accomplished using the Tukey HSD test with a .05 level of significance. Results for two subjects are shown in Figure 3. The results for Subject 6 (top graph) show that the ratings for the first ‘Good’ week were significantly higher than for all other weeks. The results for Subject 8 (bottom graph) show that the ratings for the first ‘Poor’ week were significantly poorer than for all other weeks. In general, this measure was not reliable on an individual basis.
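Tukey's HSD compares every pair of means against a single critical difference, HSD = q x sqrt(MSE / n). The Python sketch below assumes equal numbers of ratings per week and a critical studentized-range value q supplied by the caller (SciPy 1.7 and later provide scipy.stats.studentized_range for that lookup). It ignores the covariate, so it approximates rather than reproduces the follow-up tests reported above; the weekly means, MSE and q are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd_pairs(means, mse, n_per_group, q_crit):
    """Return condition pairs whose means differ by more than Tukey's HSD.

    means       -- dict: condition label -> mean rating
    mse         -- error mean square from the ANOVA
    n_per_group -- ratings per condition (equal n assumed)
    q_crit      -- critical studentized-range value for the chosen alpha,
                   number of means, and error df (looked up separately)
    """
    hsd = q_crit * sqrt(mse / n_per_group)
    return [
        (a, b)
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > hsd
    ]


if __name__ == "__main__":
    # Hypothetical weekly means resembling a criterion shift in week 'Poor 1'.
    weekly = {"Poor 1": 5.0, "Good 1": 8.0, "Poor 2": 7.0, "Good 2": 7.5}
    print(tukey_hsd_pairs(weekly, mse=1.0, n_per_group=21, q_crit=3.7))
```

With these assumed values HSD is about 0.81 rating points, so 'Poor 1' differs from every other week, mirroring the kind of single-week shift described for Subject 8.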

Subjective Ratings: Paired Comparisons

The individual results for the paired comparisons are shown in Table 6. The binomial test was used to determine whether one FGC was selected significantly more often than the other FGC. Seven of the subjects selected the Good FGC as having better speech clarity significantly more often during the first week of testing (at the .10 level for subjects 3 and 7). The remaining subject (subject 2) selected the Poor FGC as having better speech clarity during the first week of testing. This is not the same subject who rated the Poor FGC with superior speech clarity for the category ratings (see Figure 2); that was subject 3. During the second week of paired comparison testing, 5 of the 8 subjects selected the Good FGC as having better speech clarity significantly more often than the Poor FGC. Across the two weeks of testing, the results for 5 of the 8 subjects were valid, sensitive and reliable.
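Table 6 reports percentages rather than trial counts, so the number of trials used below is a hypothetical assumption. This Python sketch shows an exact two-sided binomial test against a no-preference null hypothesis (p = .5), which is one standard way to decide whether one FGC was chosen significantly more often than the other.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    k -- number of trials on which the Good FGC was chosen
    n -- total number of paired-comparison trials
    p -- choice probability under the null hypothesis (no preference)
    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    tail = sum(pr for pr in probs if pr <= observed * (1 + 1e-7))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical: Good chosen on 16 of 20 trials (80%).
    print(round(binomial_two_sided_p(16, 20), 4))  # 0.0118
```

For a symmetric null (p = .5) this equals twice the one-tailed probability; the small tolerance factor only guards against floating-point ties between equally likely outcomes.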

Self-Assessment Scales

APHAB

The APHAB was given four times, once after each week in which category ratings were made (see the 5th column in Table 2). The results for the three subscales that measure speech clarity (ease of communication, reverberation and background noise) were compared. The results were submitted to individual ANOVAs with main effects of FGC and Week, and the results are shown in Table 7. The mean results for the three subscales are shown in Figure 4. The APHAB was measured only for the aided conditions, so ‘performance’ rather than ‘benefit’ was measured. Each graph shows percent problems; a lower score indicates superior speech clarity. The top graph shows the results for the main effect FGC. Only one significant difference was found, indicating poor sensitivity. The one significant difference was found for subject 4, who reported significantly fewer problems for the Poor FGC; this indicates poor validity. (This was not the subject who rated the Poor FGC with better clarity for category ratings or for paired comparisons.) The bottom graph shows the results for the main effect Week. Here three significant differences were measured. In addition, Subjects 2 and 8 had significant FGC x Week interactions. This indicates poor reliability.

Table 5 Repeated Measures ANCOVA for Individual Subjects for the Category Ratings

Subject   Source of Variation   Degrees of Freedom   F         p
S1        FGC                   1                    3.834     0.054
          Week                  1                    7.969     0.006
          FGC x Week            1                    8.190     0.005
S2        FGC                   1                    0.794     0.376
          Week                  1                    0.090     0.765
          FGC x Week            1                    0.684     0.411
S3        FGC                   1                    3.447     0.067
          Week                  1                    8.05      0.006
          FGC x Week            1                    3.067     0.084
S4        FGC                   1                    2.732     0.103
          Week                  1                    3.934     0.051
          FGC x Week            1                    0.195     0.660
S5        FGC                   1                    24.325    <0.001
          Week                  1                    0.038     0.585
          FGC x Week            1                    10.126    0.002
S6        FGC                   1                    18.123    <0.001
          Week                  1                    11.592    0.001
          FGC x Week            1                    17.11     <0.001
S7        FGC                   1                    1.632     0.205
          Week                  1                    0.122     0.728
          FGC x Week            1                    0.438     0.51
S8        FGC                   1                    302.58    <0.001
          Week                  1                    172.86    <0.001
          FGC x Week            1                    169.87    <0.001

Probability values significant at the .05 level are in bold and those significant at the .10 level are in italics. ANCOVA = analysis of covariance; FGC = frequency gain characteristic.

COSI

The COSI was administered at the same times as the APHAB. The results were examined for both the rank values (degree of change) and the percent correct estimations (final ability). While five listening situations were initially identified for each subject, not every subject rated all five listening situations at the conclusion of each week. Therefore, the data were analyzed for the four listening situations that each subject identified and rated every week.
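The selection rule above (analyze only the listening situations a subject rated in every week) amounts to a set intersection across weeks. A minimal Python sketch; the function name, situation labels and ratings are hypothetical.

```python
def situations_rated_every_week(weekly_ratings):
    """Return the listening situations that appear in every week's COSI ratings.

    weekly_ratings -- list with one dict per week, mapping a situation
                      label to that week's rating
    """
    if not weekly_ratings:
        return []
    common = set(weekly_ratings[0])
    for week in weekly_ratings[1:]:
        common &= set(week)
    return sorted(common)


if __name__ == "__main__":
    # Hypothetical situation labels and ratings for one subject.
    weeks = [
        {"car": 3, "church": 2, "phone": 4, "restaurant": 2, "tv": 5},
        {"car": 4, "phone": 4, "restaurant": 3, "tv": 4, "meeting": 2},
    ]
    print(situations_rated_every_week(weeks))  # ['car', 'phone', 'restaurant', 'tv']
```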

Table 6 Binomial Test Results for Individual Subjects for the Paired-Comparison Ratings

          Week 1                                       Week 2
Subject   % Selected Good   % Selected Poor   p        % Selected Good   % Selected Poor   p
S1        80                20                0.012    50                50                1.000
S2        20                80                0.012    45                55                0.824
S3        71                29                0.078    100               0                 <0.001
S4        94                6                 0.001    100               0                 <0.001
S5        90                10                <0.001   100               0                 <0.001
S6        94                6                 <0.001   94                6                 <0.001
S7        73                27                0.052    36                64                0.286
S8        100               0                 <0.001   100               0                 <0.001

Probability values significant at the .05 level are in bold and those significant at the .10 level are in italics.

Figure 3. Mean category ratings for each one-week period for an individual listener. The top graph shows the data for subject 6 and the bottom graph shows the data for subject 8.

Figure 2. Mean category ratings for the speech clarity criterion for individual subjects. The results are adjusted by the covariate ratings of the listening situation. The top graph shows the results for each FGC, collapsed across rating weeks; the black bars represent ratings for the Good FGC and the gray bars represent ratings for the Poor FGC. The bottom graph shows the results for each rating week, collapsed across FGC; the gray dotted bars represent ratings for the first two category rating weeks and the white dotted bars represent ratings for the second two category rating weeks. Significant main effects at the .05 level are shown by an asterisk and those at the .10 level by a plus sign.


The results were submitted to individual ANOVAs with main effects of FGC and Week. These results are shown in Tables 8 and 9. The results, averaged across all four listening conditions, are shown in Figures 5 and 6. The top graphs show the results for the main effect FGC. For rank values, three of the subjects ranked the Good FGC with better speech clarity than the Poor FGC (Figure 5). This indicates fair sensitivity and validity. There was also a significant difference between the rankings for the FGCs for Subject 3, but this was in the wrong direction. This same subject had significantly higher clarity ratings for the Poor FGC with the category rating technique (Figure 2) but significantly higher ratings for the Good FGC with the paired comparison technique (Table 6). Two of the eight subjects showed a main effect for Week, indicating fair reliability. The results in Figure 6 show only one significant difference between the percent correct estimations for the Good and Poor FGCs. This suggests that the COSI rank values were more sensitive than the percent correct estimations.
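The individual analyses with FGC and Week as factors (used for the APHAB in Table 7 and for the COSI in Tables 8 and 9) can be sketched as a balanced two-way ANOVA. This Python version is a simplification under an explicit assumption: the scores within each FGC x Week cell (for example, items or listening situations) are treated as independent replicates, whereas the tables are labeled repeated measures and the exact error term is not spelled out here. The data are hypothetical. A p value could then be obtained from the F distribution, for example with scipy.stats.f.sf(F, df_effect, df_error).

```python
from statistics import mean


def two_way_anova_f(cells):
    """F ratios for a balanced FGC x Week design.

    cells -- dict keyed by (fgc, week) with a list of scores per cell;
             every cell must contain the same number of scores.
    Scores within a cell are treated as independent replicates.
    Returns (F ratio per effect, error degrees of freedom).
    """
    fgcs = sorted({f for f, _ in cells})
    weeks = sorted({w for _, w in cells})
    n = len(next(iter(cells.values())))

    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    grand = mean(cell_mean.values())
    fgc_mean = {f: mean(cell_mean[(f, w)] for w in weeks) for f in fgcs}
    week_mean = {w: mean(cell_mean[(f, w)] for f in fgcs) for w in weeks}

    ss = {
        "FGC": len(weeks) * n * sum((m - grand) ** 2 for m in fgc_mean.values()),
        "Week": len(fgcs) * n * sum((m - grand) ** 2 for m in week_mean.values()),
        "FGC x Week": n * sum(
            (cell_mean[(f, w)] - fgc_mean[f] - week_mean[w] + grand) ** 2
            for f in fgcs
            for w in weeks
        ),
    }
    df = {
        "FGC": len(fgcs) - 1,
        "Week": len(weeks) - 1,
        "FGC x Week": (len(fgcs) - 1) * (len(weeks) - 1),
    }
    ss_error = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )
    df_error = len(fgcs) * len(weeks) * (n - 1)
    ms_error = ss_error / df_error
    return {effect: (ss[effect] / df[effect]) / ms_error for effect in ss}, df_error


if __name__ == "__main__":
    # Hypothetical scores, two per cell.
    data = {
        ("Good", 1): [8, 10], ("Good", 2): [6, 8],
        ("Poor", 1): [4, 6], ("Poor", 2): [4, 6],
    }
    f_ratios, df_err = two_way_anova_f(data)
    print(f_ratios, df_err)  # {'FGC': 9.0, 'Week': 1.0, 'FGC x Week': 1.0} 4
```

In this reading, a large F for FGC with small F values for Week and the interaction is the pattern the text treats as sensitive and reliable.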

DISCUSSION

To summarize the results of this study, it is helpful to look at the results across all of the field studies. Table 10 shows the results for the subjective ratings and the self-assessment scales for each subject. In the ‘validity/sensitivity’ columns, a ‘G’ is shown if the Good FGC was selected as having superior speech clarity compared to the Poor FGC, and a ‘P’ is shown if the Poor FGC was selected as having superior speech clarity. For the category ratings, the APHAB and the COSI, the entries are taken from the individual ANCOVA and ANOVA analyses discussed in the Results section. For the paired comparisons, a ‘P’ or ‘G’ was entered if either FGC was selected significantly more often during either of the two weeks in which paired comparisons were measured during the field study. For the COSI, only the results for the rank values are shown, because these were found to be more sensitive than the percent correct estimations. A ‘G’ in the table suggests that the measure was sensitive and valid; a ‘P’ suggests that the measure was sensitive but not valid.

Table 7 Repeated Measures ANOVA for Individual Subjects for the APHAB

Subject   Source of Variation   Degrees of Freedom   F         p
S1        FGC                   1                    0.240     0.637
          Week                  1                    0.001     0.981
          FGC x Week            1                    0.318     0.589
S2        FGC                   1                    2.530     0.150
          Week                  1                    5.917     0.041
          FGC x Week            1                    3.501     0.098
S3        FGC                   1                    0.190     0.675
          Week                  1                    0.190     0.675
          FGC x Week            1                    1.033     0.339
S4        FGC                   1                    4.127     0.077
          Week                  1                    2.596     0.146
          FGC x Week            1                    1.008     0.345
S5        FGC                   1                    2.602     0.145
          Week                  1                    5.797     0.043
          FGC x Week            1                    0.289     0.605
S6        FGC                   1                    0.162     0.698
          Week                  1                    0.066     0.804
          FGC x Week            1                    0.162     0.698
S7        FGC                   1                    1.026     0.334
          Week                  1                    3.635     0.093
          FGC x Week            1                    0.586     0.815
S8        FGC                   1                    1.719     0.226
          Week                  1                    1.277     0.291
          FGC x Week            1                    14.081    0.006

Probability values significant at the .05 level are in bold and those significant at the .10 level are in italics. ANOVA = analysis of variance; APHAB = Abbreviated Profile of Hearing Aid Benefit; FGC = frequency gain characteristics.


In Table 10, a letter was entered into the ‘reliability’ column only if the test was found to be sensitive for that subject; reliability is meaningless if the test cannot differentiate between the two memories. An ‘R’ is entered into the ‘reliability’ column if there was no main effect for Week for that subject, and an ‘NR’ is entered if there was a main effect for Week. For the paired comparison ratings, an ‘R’ was entered if the same results were found over each of the two weeks of testing.
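The Table 10 coding rules described above can be written as two small functions. This Python sketch is an illustration, not the authors' procedure: the .10 threshold is an assumption, chosen because the text counts some effects significant at that level, and values reported as "<0.001" are represented by 0.0005.

```python
def code_anova_measure(fgc_p, good_better, week_p, alpha=0.10):
    """Table 10 codes for a category-rating, APHAB or COSI analysis.

    fgc_p       -- p value for the FGC main effect
    good_better -- True if the Good FGC received the better score
    week_p      -- p value for the Week main effect
    alpha       -- significance threshold (assumed)
    Returns (validity/sensitivity code, reliability code); blanks mean
    the measure was not sensitive for this subject.
    """
    if fgc_p >= alpha:
        return "", ""
    mark = "G" if good_better else "P"
    reliability = "NR" if week_p < alpha else "R"
    return mark, reliability


def code_paired_comparisons(weeks, alpha=0.10):
    """Table 10 codes for paired comparisons.

    weeks -- one (p value, good_better) pair per paired-comparison week
    A letter is entered if either week is significant; 'R' only if every
    week is significant in the same direction.
    """
    marks = ["G" if good_better else "P" for p, good_better in weeks if p < alpha]
    if not marks:
        return "", ""
    same_every_week = len(marks) == len(weeks) and len(set(marks)) == 1
    return marks[0], "R" if same_every_week else "NR"


if __name__ == "__main__":
    # S1 category ratings (Table 5) and S2 paired comparisons (Table 6).
    print(code_anova_measure(0.054, True, 0.006))                  # ('G', 'NR')
    print(code_paired_comparisons([(0.012, False), (0.824, False)]))  # ('P', 'NR')
```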

Table 8 Repeated Measures ANOVA Collapsed across Subjects for the COSI Rank Data

Subject   Source of Variation   Degrees of Freedom   F         p
S1        FGC                   1                    0.035     0.848
          Week                  1                    0.882     0.6312
          FGC x Week            1                    2.859     0.1138
S2        FGC                   1                    0.2857    0.603
          Week                  1                    0.000     1.000
          FGC x Week            1                    2.5714    0.1348
S3        FGC                   1                    25.000    <0.001
          Week                  1                    25        <0.001
          FGC x Week            1                    25        <0.001
S4        FGC                   1                    1.191     0.297
          Week                  1                    0.429     0.525
          FGC x Week            1                    0.429     0.525
S5        FGC                   1                    17.78     0.001
          Week                  1                    0.61      0.45
          FGC x Week            1                    0.61      0.45
S6        FGC                   1                    4.765     0.049
          Week                  1                    1.471     0.249
          FGC x Week            1                    2.882     0.115
S7        FGC                   1                    4.188     0.063
          Week                  1                    4.188     0.063
          FGC x Week            1                    4.188     0.063
S8        FGC                   1                    0.000     1.000
          Week                  1                    2.000     0.183
          FGC x Week            1                    0.000     1.000

Probability values significant at the .05 level are in bold and those significant at the .10 level are in italics.

Figure 4. Mean percent problems for the APHAB averaged across the ease of communication, reverberation and background noise subscales. (A lower bar indicates superior speech clarity.) The top graph shows the results for each FGC, collapsed across rating weeks; the black bars represent ratings for the Good FGC and the gray bars represent ratings for the Poor FGC. The bottom graph shows the results for each rating week, collapsed across FGC; the gray dotted bars represent ratings for the first two weeks and the white dotted bars represent ratings for the second two weeks. Significant main effects at the .05 level are shown by an asterisk and those at the .10 level by a plus sign.


The best results were found for the paired comparison technique. This method was found to be valid and sensitive for 6 of the 8 subjects, and it was found to be reliable for 5 of these 6 subjects. Other researchers have found the paired comparison technique to be superior to the category rating technique in the laboratory: the paired comparison technique has been found to be more sensitive than the category rating technique in both adults and children (Eisenberg & Dirks, 1995; Eisenberg, Dirks, & Gornbein, 1997). It is not surprising that similar differences between these techniques are found in the field. Judgments in the field are more difficult than judgments in the laboratory because of the longer time period between ratings and the widely varying listening conditions. For the category rating technique, 4 of the 8 subjects showed good validity and sensitivity. Reliability was poor for 3 of these 4 subjects. One major problem with category ratings is that subjects can completely change their internal reference criterion from week to week. This was shown in Figure 3: Subject 8 completely changed her criterion for clarity during week Poor 1; ratings made during this week were significantly lower than ratings made during the other three weeks. This finding reflects a common problem with the category rating method. Filion and Margolis (1992) presented a clear example of this problem. They measured loudness discomfort levels (LDLs) for a group of listeners using a 9-point category rating scale both in the laboratory and in a nightclub. The LDLs measured in the laboratory over-predicted the LDLs measured in the nightclub; individual subjects’ criteria for what was too loud varied with the type of setting. The same type of criterion shift can occur for speech clarity or speech quality, depending on the listening environment or on other variables.

The APHAB was not found to be sensitive for the majority of subjects. Other studies have shown poor results when the APHAB has been used to differentiate between hearing aids. Stone et al. (1999) had 8 listeners compare 4 different compression schemes in a single hearing aid. Each subject completed an adjusted version of the APHAB and several objective speech tests after 2 weeks of wearing a single compression program. Individual analysis of the APHAB results showed that three of the subjects did have significant differences between the 4 programs; this was not shown for the objective speech tests. Three of the subjects then wore all four programs in the field and were able to compare the programs in the real world in a variation of paired comparison testing. One of these subjects had the same result for the comparison field testing as for the APHAB; the other two did not. While the APHAB is clearly sensitive to differences between unaided and aided performance, it has not been shown to be sensitive to differences among hearing aids.

Figure 5. Mean rank values (degree of change) for the COSI averaged across four conditions. The top graph shows the results for each FGC, collapsed across rating weeks; the black bars represent ratings for the Good FGC and the gray bars represent ratings for the Poor FGC. The bottom graph shows the results for each rating week, collapsed across FGC; the gray dotted bars represent ratings for the first two weeks and the white dotted bars represent ratings for the second two weeks. Significant main effects at the .05 level are shown by an asterisk and those at the .10 level by a plus sign.

Figure 6. Mean percent correct estimations (final ability) for the COSI averaged across four conditions. The top graph shows the results for each FGC, collapsed across rating weeks; the black bars represent ratings for the Good FGC and the gray bars represent ratings for the Poor FGC. The bottom graph shows the results for each rating week, collapsed across FGC; the gray dotted bars represent ratings for the first two weeks and the white dotted bars represent ratings for the second two weeks. Significant main effects at the .05 level are shown by an asterisk and those at the .10 level by a plus sign.

The COSI was valid and sensitive for 3 of the 8 subjects, but was reliable for only 2 of these 3 subjects. The COSI has the same limitations as the category rating technique and the APHAB: ratings are made over long periods of time and can be influenced by changes in an individual’s internal criterion and/or by changes in the environment. Given these limitations, it is difficult to maintain sensitivity and reliability.

Table 9 Repeated Measures ANOVA Collapsed across Subjects for the COSI Percent Correct Data

Subject   Source of Variation   Degrees of Freedom   F         p
S1        FGC                   1                    0.368     0.561
          Week                  1                    0.188     0.675
          FGC x Week            1                    0.101     0.753
S2        FGC                   1                    0.495     0.495
          Week                  1                    5.505     0.037
          FGC x Week            1                    0.881     0.367
S3        FGC                   1                    2.455     0.143
          Week                  1                    0.273     0.611
          FGC x Week            1                    0.273     0.611
S4        FGC                   1                    1.984     0.184
          Week                  1                    1.624     0.227
          FGC x Week            1                    3.279     0.095
S5        FGC                   1                    13.073    0.004
          Week                  1                    0.140     0.175
          FGC x Week            1                    0.762     0.399
S6        FGC                   1                    0.231     0.639
          Week                  1                    0.231     0.639
          FGC x Week            1                    0.231     0.639
S7        FGC                   1                    1.436     0.254
          Week                  1                    0.416     0.531
          FGC x Week            1                    0.077     0.787
S8        FGC                   1                    .0370     0.555
          Week                  1                    .0370     0.555
          FGC x Week            1                    6.517     0.025

Probability values significant at the .05 level are in bold and those significant at the .10 level are in italics. ANOVA = analysis of variance; COSI = Client Oriented Scale of Improvement; FGC = frequency gain characteristics.

Table 10 Sensitivity, Validity and Reliability for Each Evaluation Measure for Each Subject

          Category Rating              Paired Comparisons           APHAB                        COSI
Subject   Validity/Sens.  Reliability  Validity/Sens.  Reliability  Validity/Sens.  Reliability  Validity/Sens.  Reliability

S1 G NR G NR

S2 P NR

S3 P NR G R P NR

S4 G R P G

S5 G R G R G R

S6 G NR G R G R

S7 G NR

S8 G NR G R

In the Validity/Sensitivity columns: G = significantly better result for the Good FGC; P = significantly better result for the Poor FGC; no entry means that there was no significant difference between the Good and Poor FGCs. In the Reliability columns: R = no significant difference between test weeks; NR = significant difference between test weeks; data were entered in this column only if data were entered in the Validity/Sensitivity column for the same subject. APHAB = Abbreviated Profile of Hearing Aid Benefit; COSI = Client Oriented Scale of Improvement.


Of all the field study methods evaluated, the paired comparison technique was the only measure that allowed the hearing aid user to listen to each FGC at essentially the same time and in exactly the same listening situation. In this way, changes in internal criterion and variability in the listening situation are not factors. For these reasons, the paired comparison technique is the recommended technique for future field studies. The major limitation of the paired comparison technique is that it can be used only when two different programming strategies are compared within the same hearing aid; it cannot be used to compare two different hearing aids in the real world.

There are several reasons why the measures used in the present study may not have been sensitive. First, the AI differences between the FGCs shown in Table 3 are small, indicating that there were only minor differences between the Good and Poor FGCs. Based on transfer functions that relate AI to expected speech recognition performance (e.g., Dirks, Bell, Rossman & Kincaid, 1986), AI differences this small should not be expected to produce differences in speech recognition for all types of speech materials. These small AI differences were necessary, however, so that quality differences between the FGCs would be minimized. Because of the small AI differences, we would expect speech clarity differences only for the most sensitive tests, and only in more difficult listening situations (e.g., low-context listening situations).
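To make the size of these differences concrete, the following Python snippet computes the Good minus Poor AI for each subject from the values in Table 3; it adds no data beyond the table, and the names are illustrative.

```python
# Articulation Index values transcribed from Table 3 (65 dB SPL input).
AI_VALUES = {
    "S1": (0.78, 0.73),
    "S2": (0.36, 0.33),
    "S3": (0.33, 0.29),
    "S4": (0.68, 0.64),
    "S5": (0.69, 0.61),
    "S6": (0.64, 0.61),
    "S7": (0.41, 0.36),
    "S8": (0.85, 0.81),
}


def ai_differences(values):
    """Good minus Poor AI for each subject, rounded to table precision."""
    return {subject: round(good - poor, 2) for subject, (good, poor) in values.items()}


if __name__ == "__main__":
    diffs = ai_differences(AI_VALUES)
    print(diffs)
    print("range:", min(diffs.values()), "to", max(diffs.values()))  # range: 0.03 to 0.08
```

Every difference favors the Good FGC, but none exceeds .08 AI units.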

A second reason that may explain the lack of sensitivity is the validity of the tests. The criterion measure for this study was speech clarity measured in the laboratory. The subjective rating measures had high face validity; they were direct measures of speech clarity in the field. The self-assessment scales did assess speech clarity, but the questions on these scales were not limited to this area. The broader focus of the self-assessment scales may have contributed to their reduced sensitivity in this context.

Finally, the field study was limited to one week with each hearing aid setting. This short time period was used in order to limit the entire field study to a 6-week period. There is a concern that a one-week period with a new setting will not allow sufficient time for adjustment and accurate ratings. There is evidence, however, that listeners with hearing loss can make stable hearing aid ratings 7 days after fitting. Humes, Halling and Coughlin (1996) evaluated objective and subjective outcome measures in a group of 20 elderly listeners 7, 15, 30, 60, 90 and 180 days post-fitting. Most of the measures were found to be reliable and stable, on an individual basis, during this 6-month period.

It would be useful to continue to evaluate field study methods with larger numbers of subjects and with different types of hearing aid adjustments. It would also be worthwhile to determine whether individual subject characteristics such as age and hearing aid experience influence the psychometric properties of the field study techniques.

Acknowledgment: This project was funded by an American Academy of Audiology New Investigator award and the Mary and Mason Rudd Surgical Research Fund, Jewish Hospital Foundation, Louisville, KY. The authors would like to thank Jenna Dye for assistance with data collection, Tim Trine at Starkey Laboratories and Graham Naylor at Oticon A/S, Research Division, Denmark, for providing the experimental hearing aids, and two anonymous reviewers who provided extensive and thoughtful comments. Portions of this paper were presented at the American Academy of Audiology annual convention, Philadelphia, April 2002.

REFERENCES

Byrne, D., & Dillon, H. (1986). The National Acoustic Laboratories’ (NAL) new procedure for selecting the gain and frequency response of a hearing aid. Ear and Hearing, 7, 257-265.

Byrne, D., & Cotton, S. (1988). Evaluation of the National Acoustic Laboratories new hearing aid selection procedure. Journal of Speech and Hearing Research, 31, 178-186.

Cox, R., & McDaniel, D. (1984). Intelligibility ratings of continuous discourse: Application to hearing aid selection. Journal of the Acoustical Society of America, 76, 758-766.

Cox, R. M., & Rivera, I. M. (1992). Predictability and reliability of hearing aid benefit measured using the PHAB. Journal of the American Academy of Audiology, 3, 242-254.

Cox, R. M., & Alexander, G. C. (1995). The Abbreviated Profile of Hearing Aid Benefit. Ear and Hearing, 16, 176-186.

Dillon, H., James, A., & Ginis, J. (1997). Client Oriented Scale of Improvement and its relationship to several other measures of benefit and satisfaction provided by hearing aids. Journal of the American Academy of Audiology, 8, 27-43.

Dirks, D. D., Bell, T. S., Rossman, R. N., & Kincaid, G. E. (1986). Articulation index prediction of contextually dependent words. Journal of the Acoustical Society of America, 80, 82-92.

Eisenberg, L. S., & Levitt, H. (1991). Paired comparison judgments for hearing aid selection in children. Ear and Hearing, 12, 417-430.

Eisenberg, L. S., & Dirks, D. D. (1995). Reliability and sensitivity of paired comparisons and category ratings in children. Journal of Speech and Hearing Research, 38, 1157-1167.

Eisenberg, L. S., Dirks, D. D., & Gornbein, J. A. (1997). Subjective judgments of speech clarity measured by paired comparisons and category rating. Ear and Hearing, 18, 294-306.

Filion, P. R., & Margolis, R. H. (1992). Comparison of clinical and real-life judgments of loudness discomfort. Journal of the American Academy of Audiology, 3, 193-199.


Gabrielsson, A., Schenkman, B. N., & Hagerman, B. (1988).The effects of different frequency responses on sound qualityjudgments and speech intelligibility. Journal of Speech andHearing Research, 31, 166-177.

Gabrielsson, A., Hagerman, B., Bech-Kristensen, T., &Lundberg, G. (1990). Perceived sound quality reproductionswith different frequency responses and sound levels. Journalof the Acoustical Society of America, 88, 1359-1366.

Hagerman, B. & Gabrielsson, A. (1985). Questionnaires ondesirable properties of hearing aids. Scandinavian Audiology,14, 109-111.

Humes, L.E., Christensen, L.A., Bess, F.H., & Hedley-Williams, A. (1997). A comparison of the benefit provided bywell-fit linear hearing aids and instruments with automaticreduction of low-frequency gain. Journal of Speech andHearing Research, 40, 666-685.

Humes, L. E., Halling, D. & Coughlin, M. (1996). Reliabilityand stability of various hearing-aid outcome measures in agroup of elderly hearing aid wearers. Journal of Speech andHearing Research, 39, 923-935.

Kuk, F. K., & Pape, N. M. C. (1993). Relative satisfaction for frequency responses selected with a simplex procedure in different listening conditions. Journal of Speech and Hearing Research, 36, 168-177.

Leijon, A., Lindkvist, A., Ringdahl, A., & Israelsson, B. (1991). Sound quality and speech reception for prescribed hearing aid frequency responses. Ear and Hearing, 12, 251-260.

Newman, C. W., & Sandridge, S. A. (1998). Benefit from, satisfaction with, and cost-effectiveness of three different hearing aid technologies. American Journal of Audiology, 7, 115-128.

Nilsson, M., Soli, S. D., & Sullivan, J. A. (1994). Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. Journal of the Acoustical Society of America, 95, 1085-1099.

Owens, E., & Schubert, E. D. (1977). Development of the California Consonant Test. Journal of Speech and Hearing Research, 20, 463-474.

Pavlovic, C. (1991). Speech recognition and five articulation indexes. Hearing Instruments, 42, 20-23.

Preminger, J. E., & Van Tasell, D. J. (1995). Quantifying the relation between speech quality and speech intelligibility. Journal of Speech and Hearing Research, 38, 714-725.

Preminger, J. E., Neuman, A. C., Bakke, M. H., Walters, D., & Levitt, H. (2000). An examination of the practicality of the simplex procedure. Ear and Hearing, 21, 177-193.

Punch, J. L., & Beck, E. L. (1980). Low-frequency response of hearing aids and judgments of aided speech quality. Journal of Speech and Hearing Disorders, 45, 325-335.

Purdy, S. C., & Pavlovic, C. V. (1992). Reliability, sensitivity and validity of magnitude estimation, category scaling and paired-comparison judgments of speech intelligibility by older listeners. Audiology, 31, 254-271.

Speaks, C., Parker, B., Harris, C., & Kuhl, P. (1972). Intelligibility of connected discourse. Journal of Speech and Hearing Research, 15, 590-602.

Stone, M. A., Moore, B. C. J., Alcántara, J. I., & Glasberg, B. R. (1999). Comparison of different forms of compression using wearable digital hearing aids. Journal of the Acoustical Society of America, 106, 3603-3619.

Surr, R. K., Cord, T., & Walden, B. E. (1997). Comparison of linear and K-amp circuits. Ear and Hearing, 18, 140-146.

Thornton, A., & Raffin, M. J. M. (1978). Speech-discrimination scores modeled as a binomial variable. Journal of Speech and Hearing Research, 21, 507-518.

Valente, M., Sammeth, C. A., Potts, L. G., Wynne, M. K., Wagner-Escobar, M., & Coughlin, M. (1997). Differences in performance between Oticon MultiFocus Compact and ReSound BT2-E hearing aids. Journal of the American Academy of Audiology, 8, 280-293.

Valente, M., Fabry, D. A., Potts, L. G., & Sandlin, R. E. (1998). Comparing the performance of the Widex SENSO digital hearing aid with analog aids. Journal of the American Academy of Audiology, 9, 342-360.

Walden, B. E. (1997). Toward a model clinical-trials protocol for substantiating hearing aid user-benefit claims. American Journal of Audiology, 6, 13-24.

Journal of the American Academy of Audiology/Volume 14, Number 1, 2003


Field Studies/Preminger and Cunningham


KEY

LISTENING CONDITIONS

Setting: What is the setting of the listening situation? (Choose all that apply.)

SG = Small Group, LG = Large Group, Rst = Restaurant, Spkr = Speaker (lecture, religious service, etc.), Car, Out = Outdoors, TV, Radio, 1 to 1 conversation

Noise Level: What is the level of the background noise present?

Quiet = Q, Low-Level Noise = L, High-Level Noise = H

Difficulty: How difficult is the listening situation?

Very Easy = 5, Easy = 4, Moderate = 3, Difficult = 2, Extremely Difficult = 1

HEARING AID RATING

Clarity: How clear does speech sound through the hearing aid?

(On a scale of 1 to 10) Extremely Unclear = 1, Extremely Clear = 10

DAILY RECORD

Date: __________ Time: __________

Rate each of the following categories using the information provided on the rating key.

LISTENING CONDITIONS:

Setting (circle all that apply)

SG LG Rst Store Spkr Car Out TV Radio 1to1

Noise Level (circle one)

Q L H

Difficulty (1 – 5) __________

HEARING AID RATING:

Clarity (1 – 10) __________

APPENDIX. Front and Back Pages Used to Make Category Ratings (KEY on the back page; DAILY RECORD on the front page)
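The following minimal Python sketch, which is not part of the original study, illustrates how entries from the DAILY RECORD might be transcribed and checked against the codes and ranges defined on the KEY. The names `SETTING_CODES`, `NOISE_CODES`, `DailyRecord` and `mean_clarity_by_noise` are hypothetical. Averaging clarity by noise level is only an illustrative summary and is not the analysis reported for the category ratings.

```python
from dataclasses import dataclass
from statistics import mean

# Codes copied from the rating KEY and the DAILY RECORD. "Store" is printed on
# the DAILY RECORD but has no expansion on the KEY.
SETTING_CODES = {"SG", "LG", "Rst", "Store", "Spkr", "Car", "Out", "TV", "Radio", "1to1"}
NOISE_CODES = {"Q": "Quiet", "L": "Low-level noise", "H": "High-level noise"}


@dataclass(frozen=True)
class DailyRecord:
    """Hypothetical transcription of one DAILY RECORD entry."""
    settings: frozenset[str]  # "circle all that apply", so one or more codes
    noise: str                # "circle one": Q, L or H
    difficulty: int           # 1 = Extremely Difficult ... 5 = Very Easy
    clarity: int              # 1 = Extremely Unclear ... 10 = Extremely Clear

    def __post_init__(self) -> None:
        # Reject entries that fall outside the codes and ranges on the KEY.
        if not self.settings:
            raise ValueError("at least one setting must be circled")
        unknown = set(self.settings) - SETTING_CODES
        if unknown:
            raise ValueError(f"unknown setting code(s): {sorted(unknown)}")
        if self.noise not in NOISE_CODES:
            raise ValueError(f"noise level must be one of {sorted(NOISE_CODES)}")
        if not 1 <= self.difficulty <= 5:
            raise ValueError("difficulty must be rated from 1 to 5")
        if not 1 <= self.clarity <= 10:
            raise ValueError("clarity must be rated from 1 to 10")


def mean_clarity_by_noise(records: list[DailyRecord]) -> dict[str, float]:
    """Average clarity rating for each noise level that occurs in the records."""
    groups: dict[str, list[int]] = {}
    for record in records:
        groups.setdefault(record.noise, []).append(record.clarity)
    return {code: mean(ratings) for code, ratings in groups.items()}
```

Validating at construction time means a mistyped code or an out-of-range rating is caught when the paper form is transcribed rather than during analysis; grouping by setting code, or by the FGC worn that week, would follow the same pattern.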