9.3.1 Carrying out a Significance Test for μ

A company claimed to have developed a new AAA battery that lasts longer than its regular AAA batteries. Based on years of experience, the company knows that its regular AAA batteries last for 30 hours of continuous use, on average. An SRS of 15 new batteries lasted an average of 33.9 hours with a standard deviation of 9.8 hours. Do these data give convincing evidence that the new batteries last longer on average? To find out, we perform a test of H0: μ = 30 versus Ha: μ > 30, where μ is the true mean lifetime of the new deluxe AAA batteries.

Conditions
Three conditions should be met before performing inference about a population mean: Random, Normal, and Independent. As previously mentioned, the Normal condition for means is that the population distribution is Normal or the sample size is large (n ≥ 30). We often don’t know whether the population distribution is Normal. But if the sample size is large (n ≥ 30), we can safely carry out a significance test (due to the central limit theorem). If the sample size is small, we should examine the sample data for any obvious departures from Normality, such as skewness and outliers. Recall that the t procedures are quite robust against non-Normality of the population except when outliers or strong skewness are present. For significance tests, “robust” means that the stated P-value is pretty accurate.

Example – Better Batteries
The figure below shows a dotplot, boxplot, and Normal probability plot of the battery lifetimes for an SRS of 15 batteries. Check the conditions for carrying out a significance test of the company’s claim about its deluxe AAA batteries.


Post on 07-Aug-2020


Page 1: 9.3.1 Carrying out a Significance Test for mu A company ...teachers.dadeschools.net/rvancol/StatsNoteTaking... · Example – Better Batteries The battery company wants to test H



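Calculations: Test statistic and P-value
When performing a significance test, we do calculations assuming that the null hypothesis H0 is true. The test statistic measures how far the sample result diverges from the parameter value specified by H0, in standardized units. As before,

$$\text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$$

For a test of H0: μ = μ0, our statistic is the sample mean x-bar. Its standard deviation is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

In an ideal world, our test statistic would be

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Because the population standard deviation σ is usually unknown, we use the sample standard deviation sx in its place. The resulting test statistic has the standard error of x-bar in the denominator:

$$t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$$

As mentioned earlier, when the Normal condition is met, this statistic has a t distribution with n − 1 degrees of freedom.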
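The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `one_sample_t` and its parameter names are made up here, and the numbers plugged in are the battery summary statistics quoted earlier (x-bar = 33.9, sx = 9.8, n = 15, μ0 = 30).

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return (t, df) for a test of H0: mu = mu0 from summary statistics."""
    if n < 2:
        raise ValueError("need at least two observations")
    # Standard error of x-bar: the sample SD stands in for the unknown sigma.
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - mu0) / standard_error
    return t, n - 1

t, df = one_sample_t(sample_mean=33.9, mu0=30, sample_sd=9.8, n=15)
print(f"t = {t:.2f}, df = {df}")  # t = 1.54, df = 14
```

The function only standardizes the sample mean; it does not check the Random, Normal, and Independent conditions, which still have to be verified separately.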


Example – Better Batteries
The battery company wants to test H0: μ = 30 versus Ha: μ > 30 based on an SRS of 15 new AAA batteries with mean lifetime x-bar = 33.9 hours and standard deviation sx = 9.8 hours. The test statistic is

$$t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}} = \frac{33.9 - 30}{9.8 / \sqrt{15}} = 1.54$$

The P-value is the probability of getting a result this large or larger in the direction indicated by Ha, that is, P(t ≥ 1.54). The figure below shows this probability as an area under the t distribution curve with df = 15 − 1 = 14. We can find this P-value using the t-distribution table. Go to the df = 14 row. The t statistic falls between the values 1.345 and 1.761. If you look at the top of the corresponding columns in the t-distribution table, you’ll find that the “Upper-tail probability p” is between 0.10 and 0.05. (See the figure below.) Since we are looking for P(t ≥ 1.54), this is the probability we seek. That is, the P-value for this test is between 0.05 and 0.10.
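For readers who want a single number rather than a table range, the upper-tail area can be approximated numerically. The sketch below is an assumption-laden illustration, not the calculator procedure the notes refer to: `t_pdf` and `upper_tail` are hypothetical helper names, and the area is obtained by integrating the t density with Simpson's rule using only the Python standard library.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P(T >= t); steps must be even for Simpson's rule."""
    if t < 0:
        # Symmetry of the t distribution about 0.
        return 1.0 - upper_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # Half of the total area lies to the right of 0.
    return 0.5 - area_0_to_t

p_value = upper_tail(1.54, df=14)
print(f"P(t >= 1.54) with df = 14 is about {p_value:.4f}")
```

The result should land between 0.05 and 0.10, in agreement with the table. Because `math.gamma` overflows for large arguments, this sketch is only suitable for modest degrees of freedom.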


As you can see, the t-distribution table gives a range of possible P-values for a significance test. We can still draw a conclusion from the test in much the same way as if we had a single probability. In the case of the new AAA batteries, for instance, we don’t have enough evidence to reject H0: μ = 30 because the P-value exceeds our default α = 0.05 significance level. We can’t conclude that the company’s new AAA batteries last longer than 30 hours, on average.

The t-distribution table has other limitations for finding P-values. It includes probabilities only for t distributions with degrees of freedom from 1 to 30 and then skips to df = 40, 50, 60, 80, 100, and 1000. (The bottom row gives probabilities for df = ∞, which corresponds to the standard Normal curve.) In addition, the t-distribution table shows probabilities only for positive values of t. To find a P-value for a negative value of t, we use the symmetry of the t distributions. The next example shows how we deal with both of these issues.

Example – Two-Sided Tests, Negative t Values, and More
Using the t-distribution table wisely
What if you were performing a test of

H0: μ = 5
Ha: μ ≠ 5

based on a sample size of n = 37 and obtained t = −3.17? Since this is a two-sided test, you are interested in the probability of getting a value of t less than −3.17 or greater than 3.17. The figure below shows the desired P-value as an area under the t distribution curve with 36 degrees of freedom. Notice that P(t ≤ −3.17) = P(t ≥ 3.17) due to the symmetric shape of the density curve. Since the t-distribution table shows only positive t-values, we will focus on t = 3.17. Since df = 37 − 1 = 36 is not available on the table, use df = 30. You might be tempted to use df = 40, but doing so would result in a smaller P-value than you are entitled to with df = 36. (In other words, you’d be cheating!) Move across the df = 30 row, and notice that t = 3.17 falls between 3.030 and 3.385. The corresponding “Upper-tail probability p” is between 0.0025 and 0.001. (See the figure below.) For this two-sided test, the corresponding P-value would be between 2(0.001) = 0.002 and 2(0.0025) = 0.005.

Given the limitations of using the t-distribution table, our advice is to use technology to find P-values when carrying out a significance test about a population mean.

Learn Computing P-values from t distribution on the calculator
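The table-lookup reasoning in the two examples (round df down to the nearest tabled row, use |t| by symmetry, double the tail area for a two-sided test) can be written out as a small sketch. The names `T_TABLE`, `table_df`, and `bracket_p_value` are invented for illustration, and the table holds only the four critical values quoted in the text, not the full t table.

```python
# Excerpt of the t table: df -> list of (critical value, upper-tail probability).
# Only the entries quoted in the text are included.
T_TABLE = {
    14: [(1.345, 0.10), (1.761, 0.05)],
    30: [(3.030, 0.0025), (3.385, 0.001)],
}

def table_df(df, available):
    """Largest tabled df that does not exceed the actual df (conservative choice)."""
    candidates = [d for d in available if d <= df]
    if not candidates:
        raise ValueError("no tabled df at or below the requested df")
    return max(candidates)

def bracket_p_value(t, df, two_sided=False):
    """Return (low, high) bounds on the P-value from the table excerpt.

    For a one-sided test this assumes t lies in the direction of Ha.
    """
    row = T_TABLE[table_df(df, T_TABLE)]
    t = abs(t)  # symmetry: the table lists only positive t values
    for (t_lo, p_hi), (t_hi, p_lo) in zip(row, row[1:]):
        if t_lo <= t <= t_hi:
            factor = 2 if two_sided else 1
            return factor * p_lo, factor * p_hi
    raise ValueError("t falls outside the excerpted columns")

print(bracket_p_value(1.54, df=14))                   # (0.05, 0.1)
print(bracket_p_value(-3.17, df=36, two_sided=True))  # (0.002, 0.005)
```

Rounding df down rather than up is what keeps the reported range from understating the P-value, which is the point of the "you'd be cheating" remark.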


CHECK YOUR UNDERSTANDING
Does the job satisfaction of assembly-line workers differ when their work is machine-paced rather than self-paced? One study chose 18 subjects at random from a company with over 200 workers who assembled electronic devices. Half of the workers were assigned at random to each of two groups. Both groups did similar assembly work, but one group was allowed to pace themselves while the other group used an assembly line that moved at a fixed pace. After two weeks, all the workers took a test of job satisfaction. Then they switched work setups and took the test again after two more weeks. The response variable is the difference in satisfaction scores, self-paced minus machine-paced. The hypotheses are H0: µ = 0 versus Ha: µ ≠ 0, where µ is the mean difference in job satisfaction scores (self-paced − machine-paced) in the population of assembly-line workers at the company. Data from a random sample of 18 workers gave x-bar (value not shown) and sx = 60.
1. Calculate the test statistic. Show your work.
2. Use the t-distribution table to find the P-value. What conclusion would you draw?
3. Now use your calculator to find the P-value as described in the Technology Corner. Is your result consistent with the value you obtained in Question 2?


9.3.2 The One-sample t Test


Example – Healthy Streams
Performing a significance test about μ
The level of dissolved oxygen (DO) in a stream or river is an important indicator of the water’s ability to support aquatic life. A researcher measures the DO level at 15 randomly chosen locations along a stream. Here are the results in milligrams per liter (mg/l):

A dissolved oxygen level below 5 mg/l puts aquatic life at risk.
(a) Can we conclude that aquatic life in this stream is at risk? Carry out a test at the α = 0.05 significance level to help you answer this question.


(b) Given your conclusion in part (a), which kind of mistake, a Type I error or a Type II error, could you have made? Explain what this mistake would mean in context.

Learn One-sample t test on the calculator

AP EXAM TIP Remember: if you just give calculator results with no work, and one or more values are wrong, you probably won’t get any credit for the “Do” step. We recommend doing the calculation with the appropriate formula and then checking with your calculator. If you opt for the calculator-only method, name the procedure (t test) and report the test statistic (t = −0.94), degrees of freedom (df = 14), and P-value (0.1809).

CHECK YOUR UNDERSTANDING
A college professor suspects that students at his school are getting less than 8 hours of sleep a night, on average. To test his belief, the professor asks a random sample of 28 students, “How much sleep did you get last night?” Here are the data (in hours):

Do these data provide convincing evidence in support of the professor’s suspicion? Carry out a significance test at the α = 0.05 level to help answer this question.


9.3.3 Two-Sided Tests and Confidence Intervals

Example – Juicy Pineapple
A two-sided test
At the Hawaii Pineapple Company, managers are interested in the sizes of the pineapples grown in the company’s fields. Last year, the mean weight of the pineapples harvested from one large field was 31 ounces. A new irrigation system was installed in this field after the growing season. Managers wonder whether this change will affect the mean weight of future pineapples grown in the field. To find out, they select and weigh a random sample of 50 pineapples from this year’s crop. The Minitab output below summarizes the data.

(a) Determine whether there are any outliers. Show your work.
(b) Do these data suggest that the mean weight of pineapples produced in the field has changed this year? Give appropriate statistical evidence to support your answer.


(c) Can we conclude that the new irrigation system caused a change in the mean weight of pineapples produced? Explain your answer.

The significance test in the previous example concludes that the mean weight µ of the pineapples grown in the field this year differs from last year’s 31 ounces. Unfortunately, the test doesn’t give us an idea of what the actual value of µ is. For that, we need a confidence interval.

Example – Juicy Pineapple
Confidence intervals give more information
Minitab output for a significance test and confidence interval based on the pineapple data is shown below. The test statistic and P-value match what we got earlier (up to rounding).

The 95% confidence interval for the mean weight of all the pineapples grown in the field this year is 31.255 to 32.616 ounces. We are 95% confident that this interval captures the true mean weight μ of this year’s pineapple crop.

As with proportions, there is a link between a two-sided test at significance level α and a 100(1 − α)% confidence interval for a population mean μ. For the pineapples, the two-sided test at α = 0.05 rejects H0: μ = 31 in favor of Ha: μ ≠ 31. The corresponding 95% confidence interval does not include 31 as a plausible value of the parameter μ. In other words, the test and interval lead to the same conclusion about H0. But the confidence interval provides much more information: a set of plausible values for the population mean.

The connection between two-sided tests and confidence intervals is even stronger for means than it was for proportions. That’s because both inference methods for means use the standard error of x-bar in the calculations.
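The link between the two-sided test and the confidence interval can be checked directly with the pineapple interval quoted above. The function name `rejects_two_sided` is a hypothetical helper for this sketch; it relies on the fact that, for one-sample t procedures, the level-α two-sided test and the 100(1 − α)% interval use the same standard error.

```python
def rejects_two_sided(ci_low, ci_high, mu0):
    """True when mu0 falls outside the interval, i.e. the matching
    two-sided test rejects H0: mu = mu0."""
    return not (ci_low <= mu0 <= ci_high)

ci = (31.255, 32.616)  # 95% interval reported for the pineapple data
print(rejects_two_sided(*ci, mu0=31))  # True: 31 is not a plausible value

# The interval also recovers the point estimate and margin of error.
estimate = (ci[0] + ci[1]) / 2
margin = (ci[1] - ci[0]) / 2
print(f"estimate = {estimate:.4f} oz, margin of error = {margin:.4f} oz")
```

The same check with a null value inside the interval, for example 32 ounces, returns False, matching a failure to reject at α = 0.05.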


CHECK YOUR UNDERSTANDING
The health director of a large company is concerned about the effects of stress on the company’s middle-aged male employees. According to the National Center for Health Statistics, the mean systolic blood pressure for males 35 to 44 years of age is 128. The health director examines the medical records of a random sample of 72 male employees in this age group. The Minitab output below displays the results of a significance test and a confidence interval.
1. Do the results of the significance test allow us to conclude that the mean blood pressure for all the company’s middle-aged male employees differs from the national average? Justify your answer.
2. Interpret the 95% confidence interval in context. Explain how the confidence interval leads to the same conclusion as in Question 1.


9.3.4 Inference for Means: Paired Data

Paired data - Study designs that involve making two observations on the same individual, or one observation on each of two similar individuals, result in paired data.

Paired t procedures: When paired data result from measuring the same quantitative variable twice, we can make comparisons by analyzing the differences in each pair. If the conditions for inference are met, we can use one-sample t procedures to perform inference about the mean difference μd. These methods are sometimes called paired t procedures.

Example – Is Caffeine Dependence Real?
Paired data and one-sample t procedures
Researchers designed an experiment to study the effects of caffeine withdrawal. They recruited 11 volunteers who were diagnosed as being caffeine dependent to serve as subjects. Each subject was barred from coffee, colas, and other substances with caffeine for the duration of the experiment. During one two-day period, subjects took capsules containing their normal caffeine intake. During another two-day period, they took placebo capsules. The order in which subjects took caffeine and the placebo was randomized. At the end of each two-day period, a test for depression was given to all 11 subjects. Researchers wanted to know whether being deprived of caffeine would lead to an increase in depression. The table below contains data on the subjects’ scores on a depression test. Higher scores show more symptoms of depression.

(a) Why did researchers randomly assign the order in which subjects received placebo and caffeine?

(b) Carry out a test to investigate the researchers’ question.
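The mechanics of a paired t procedure, reducing each pair to a difference and then running a one-sample t calculation on the differences, can be sketched as follows. The function name `paired_t` is hypothetical, and the scores below are made up purely for illustration; they are not the caffeine study's data, which are not reproduced here.

```python
import math
import statistics

def paired_t(first, second, mu_d0=0.0):
    """One-sample t statistic computed on the pairwise differences first - second."""
    if len(first) != len(second):
        raise ValueError("paired data need equal-length lists")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences
    t = (mean_d - mu_d0) / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t, n - 1

# Made-up scores for five hypothetical subjects (illustration only).
placebo_scores = [10, 12, 9, 14, 11]
caffeine_scores = [8, 11, 9, 10, 9]

mean_d, sd_d, t, df = paired_t(placebo_scores, caffeine_scores)
print(f"mean difference = {mean_d:.2f}, sd = {sd_d:.3f}, t = {t:.2f}, df = {df}")
```

Note that the degrees of freedom come from the number of pairs, not the total number of measurements, because the analysis treats the differences as a single sample.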


9.3.5 Using Tests Wisely

Significance tests are widely used in reporting the results of research in many fields. New drugs require significant evidence of effectiveness and safety. Courts ask about statistical significance in hearing discrimination cases. Marketers want to know whether a new ad campaign significantly outperforms the old one, and medical researchers want to know whether a new therapy performs significantly better. In all these uses, statistical significance is valued because it points to an effect that is unlikely to occur simply by chance.

Carrying out a significance test is often quite simple, especially if you use a calculator or computer. Using tests wisely is not so simple. Here are some points to keep in mind when using or interpreting significance tests.

Statistical Significance and Practical Importance When a null hypothesis (“no effect” or “no difference”) can be rejected at the usual levels (α = 0.05 or α = 0.01), there is good evidence of a difference. But that difference may be very small. When large samples are available, even tiny deviations from the null hypothesis will be significant.

Example – Wound Healing Time
Significant doesn’t mean important
Suppose we’re testing a new antibacterial cream, “Formulation NS,” on a small cut made on the inner forearm. We know from previous research that with no medication, the mean healing time (defined as the time for the scab to fall off) is 7.6 days with a standard deviation of 1.4 days. The claim we want to test here is that Formulation NS speeds healing. We will use a 5% significance level.

Procedure: We cut a random sample of 25 college students and apply Formulation NS to the wounds. The mean healing time for these subjects is x-bar = 7.1 days and the standard deviation is sx = 1.4 days.

Discussion: We want to test a claim about the mean healing time μ in the population of college students whose cuts are treated with Formulation NS. Our hypotheses are H0: μ = 7.6 versus Ha: μ < 7.6. An examination of the data reveals no outliers or strong skewness, so the conditions for performing a one-sample t test are met. We carry out the test and find that t = −1.79 and P-value = 0.043. Since 0.043 is less than α = 0.05, we reject H0 and conclude that Formulation NS’s healing effect is statistically significant. However, this result is not practically important. Having your scab fall off half a day sooner is no big deal.
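A short sketch makes the gap between statistical and practical significance concrete. It assumes a sample mean of 7.1 days, the value implied by the reported t = −1.79 and the "half a day sooner" remark; the function name `t_statistic` and the larger sample sizes are hypothetical additions for illustration.

```python
import math

def t_statistic(sample_mean, mu0, sample_sd, n):
    """Standardized distance of the sample mean from mu0."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Assumed sample mean of 7.1 days (inferred from t = -1.79, not stated outright).
t_25 = t_statistic(7.1, 7.6, 1.4, 25)
print(f"n = 25: t = {t_25:.2f}")  # n = 25: t = -1.79

# The same half-day difference looks far more "significant" with more subjects.
for n in (100, 400):
    print(f"n = {n}: t = {t_statistic(7.1, 7.6, 1.4, n):.2f}")
```

The observed difference stays at half a day in every case; only the sample size changes, and with it the size of the test statistic.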

Remember the wise saying: Statistical significance is not the same thing as practical importance. The remedy for attaching too much importance to statistical significance is to pay attention to the actual data as well as to the P-value. Plot your data and examine them carefully. Are there outliers or other deviations from a consistent pattern? A few outlying observations can produce highly significant results if you blindly apply common significance tests. Outliers can also destroy the significance of otherwise-convincing data.

The foolish user of statistics who feeds the data to a calculator or computer without exploratory analysis will often be embarrassed. Is the difference you are seeking visible in your plots? If not, ask yourself whether the difference is large enough to be practically important. Give a confidence interval for the parameter in which you are interested. A confidence interval actually estimates the size of the difference rather than simply asking if it is too large to reasonably occur by chance alone. Confidence intervals are not used as often as they should be, whereas significance tests are perhaps overused.


Don’t Ignore Lack of Significance There is a tendency to infer that there is no difference whenever a P-value fails to attain the usual 5% standard. A provocative editorial in the British Medical Journal entitled “Absence of Evidence Is Not Evidence of Absence” deals with this issue.22 Here is one of the examples they cite.

Example – Reducing HIV Transmission
Interpreting lack of significance
In an experiment to compare methods for reducing transmission of HIV, subjects were randomly assigned to a treatment group and a control group. Result: the treatment group and the control group had the same rate of HIV infection. Researchers described this as an “incident rate ratio” of 1.00. A ratio above 1.00 would mean that there was a greater rate of HIV infection in the treatment group, while a ratio below 1.00 would indicate a greater rate of HIV infection in the control group. The 95% confidence interval for the incident rate ratio was reported as 0.63 to 1.58. Saying that the treatment has no effect on HIV infection is misleading. The confidence interval for the incident rate ratio indicates that the treatment may be able to achieve a 37% decrease in infection. It might also produce a 58% increase in infection. Clearly, more data are needed to distinguish between these possibilities.
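The percentages in this example come from reading each end of the interval as a change relative to a ratio of 1.00. The arithmetic can be sketched as below; `percent_change` and `includes_no_effect` are hypothetical helper names, not terms from the study.

```python
def percent_change(rate_ratio):
    """Percent change in infection rate implied by a rate ratio (1.00 = no change)."""
    return (rate_ratio - 1.0) * 100.0

def includes_no_effect(ci_low, ci_high, no_effect=1.0):
    """True when the interval contains the 'no effect' ratio."""
    return ci_low <= no_effect <= ci_high

low, high = 0.63, 1.58  # reported 95% interval for the incident rate ratio
print(f"{percent_change(low):+.0f}% to {percent_change(high):+.0f}%")  # -37% to +58%
print(includes_no_effect(low, high))  # True
```

The interval does contain 1.00, which is why the result is not significant, but it is wide enough that a sizeable benefit and a sizeable harm are both still plausible.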

The situation can be worse. Research in some fields has rarely been published unless significance at the 5% level is attained. For instance, a survey of four journals published by the American Psychological Association showed that of 294 articles using statistical tests, only 8 reported results that did not attain the 5% significance level.24 That’s too bad, because we can learn a great deal from studies that fail to find convincing evidence.

In some areas of research, small differences that are detectable only with large sample sizes can be of great practical significance. Data accumulated from a large number of patients taking a new drug may be needed before we can conclude that there are life-threatening consequences for a small number of people. When planning a study, verify that the test you plan to use has a high probability (power) of detecting a difference of the size you hope to find.

Statistical Inference Is Not Valid for All Sets of Data Badly designed surveys or experiments often produce invalid results. Formal statistical inference cannot correct basic flaws in the design. Each test is valid only in certain circumstances, with properly produced data being particularly important.

Example – Does Music Increase Worker Productivity?
When inference isn’t valid
You wonder whether background music would improve the productivity of the staff who process mail orders in your business. After discussing the idea with the workers, you add music and find a significant increase. You should not be impressed. In fact, almost any change in the work environment together with knowledge that a study is under way will produce a short-term productivity increase. This is the Hawthorne effect, named after the Western Electric manufacturing plant where it was first noted.

The significance test correctly informs you that an increase has occurred that is larger than would often arise by chance alone. It does not tell you what other than chance caused the increase. The most plausible explanation is that workers change their behavior when they know they are being studied. Your experiment was uncontrolled, so the significant result cannot be interpreted. A randomized comparative experiment would isolate the actual effect of background music and so make significance meaningful.

Hawthorne effect - The fact that almost any change in the work environment together with knowledge that a study is under way will produce a short-term productivity increase.


Significance tests and confidence intervals are based on the laws of probability. Random sampling and random assignment ensure that these laws apply. Always ask how the data were produced. Don’t be too impressed by P-values on a printout until you are confident that the data deserve a formal analysis.

Beware of Multiple Analyses Statistical significance ought to mean that you have found a difference that you were looking for. The reasoning behind statistical significance works well if you decide what difference you are seeking, design a study to search for it, and use a significance test to weigh the evidence you get. In other settings, significance may have little meaning.

Example – Cell Phones and Brain Cancer
Don’t search for significance
Might the radiation from cell phones be harmful to users? Many studies have found little or no connection between using cell phones and various illnesses. Here is part of a news account of one study:

A hospital study that compared brain cancer patients and a similar group without brain cancer found no statistically significant association between cell phone use and a group of brain cancers known as gliomas. But when 20 types of glioma were considered separately, an association was found between phone use and one rare form. Puzzlingly, however, this risk appeared to decrease rather than increase with greater mobile phone use.

Think for a moment. Suppose that the 20 null hypotheses for these 20 significance tests are all true. Then each test has a 5% chance of being significant at the 5% level. That’s what α = 0.05 means: results this extreme occur only 5% of the time just by chance when the null hypothesis is true. We expect about 1 of 20 tests to give a significant result just by chance. Running one test and reaching the α = 0.05 level is reasonably good evidence that you have found something; running 20 tests and reaching that level only once is not.
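The "about 1 of 20" figure is an expected count, and a related quantity is the chance of seeing at least one significant result among the 20 tests. The sketch below computes both; the second calculation assumes the 20 tests are independent, which the text does not state and which may not hold for the glioma subtypes. Both function names are hypothetical.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of significant results when every null hypothesis is true."""
    return num_tests * alpha

def prob_at_least_one(num_tests, alpha):
    """Chance of at least one significant result, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** num_tests

print(expected_false_positives(20, 0.05))     # 1.0
print(round(prob_at_least_one(20, 0.05), 4))  # 0.6415
```

Under that independence assumption, a lone significant result among 20 true null hypotheses is more likely than not, which is why a single hit in a batch of tests carries little weight.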