Words, more words … and statistics

To segment words, the brain could be using statistical methods

May 19, 2016

Picking out single words in a flow of speech is no easy task and, according to linguists, to succeed in doing it the brain might use statistical methods. A group of SISSA scientists has applied a statistics-based method for word segmentation and measured its efficacy on natural language, in 9 different languages, to discover that linguistic rhythm plays an important role. The study has just been published in the Journal of Developmental Science.

Have you ever racked your brains trying to make out even a single word of an uninterrupted flow
of speech in a language you hardly know at all? It is naïve to think that in speech there is even the smallest of pauses between one word and the next (like the space we conventionally insert between words in writing): in actual fact, speech is almost always a continuous stream of sound. However, when we listen to our native language, word “segmentation” is an effortless process. What are, linguists wonder, the automatic cognitive mechanisms underlying this skill? Clearly, knowledge of the vocabulary helps: memory of the sound of the single words helps us to pick them out. However, many linguists argue, there are also automatic, subconscious “low-level” mechanisms that help us even when we do not recognise the words or when, as in the case of very young children, our knowledge of the language is still only rudimentary. These mechanisms, they think, rely on the statistical analysis of the frequency (estimated based on past experience) of the syllables in each language.

One indicator that could contribute to segmentation processes is “transitional probability” (TP), which provides an estimate of the likelihood of two syllables co-occurring in the same word, based on the frequency with which they are found associated in a given language. In practice, if every time I hear the syllable “TA” it is invariably followed by the syllable “DA”, then the transitional probability for “DA”, given “TA”, is 1 (the highest). If, on the other hand, whenever I hear the syllable “BU” it is followed half of the time by the syllable “DI” and half of the time by “FI”, then the transitional probability of “DI” (and “FI”), given “BU”, is 0.5, and so forth. The cognitive system could be implicitly computing this value by relying on linguistic memory, from which it would derive the frequencies.

The study conducted by Amanda Saksida, research scientist at the International School for Advanced Studies (SISSA) in Trieste, with the collaboration of Alan Langus, SISSA research fellow, under the supervision of SISSA professor Marina Nespor, used TP to segment natural language, by using two different approaches.

Based on rhythm

Saksida’s study is based on work with corpora, that is, bodies of texts specifically collected for linguistic analysis. In the case at hand, the corpora consisted of transcriptions of the “linguistic sound environment” that infants are exposed to. “We wanted to have an example of the type of linguistic environment in which a child’s language develops”, explained Saksida. “We wondered whether a low-level mechanism such as transitional probability worked with real-life language cues, which are very different from the artificial cues normally used in the laboratory, which are more schematic and free of sources of ‘noise’. Furthermore, the question was whether the same low-level cue is equally efficient in different languages”.

Saksida and colleagues used corpora of no fewer than 9 different languages, and to each they applied two different TP-based models. First they calculated the TP values for each point of the language flow for all of the corpora, and then they “segmented” the flow using two different methods. The first was based on absolute thresholding: a certain fixed reference TP value was established, and a boundary was identified wherever the TP fell below it. The second method was based on relative thresholding: the boundaries corresponded to the local minima of the TP function.

In all cases, Saksida and colleagues found that transitional probability was an effective tool for segmentation (49% to 86% of words identified correctly) irrespective of the segmentation algorithm used, which confirms TP efficacy. Of note, while both models proved to be quite efficient, when one model was particularly successful with one language, the alternative model always performed significantly worse. “This cross-linguistic difference suggests that each model is better suited than the other for certain languages and vice versa. We therefore conducted further analyses to understand what linguistic features correlated with the better performance of one model over the other”, explains Saksida.

The crucial dimension proved to be linguistic rhythm. “We can divide European languages into two large groups based on rhythm: stress-timed and syllable-timed”. Stress-timed languages have fewer vowels and shorter words, and include English, Slovenian and German. Syllable-timed languages contain more vowels and longer words on average, and include Italian, Spanish and Finnish. A third rhythmic group, which does not exist in Europe, is based on “morae” (a part of the syllable) and includes languages such as Japanese. This group is known as “mora-timed” and contains even more vowels than syllable-timed languages. The absolute threshold model proved to work best on stress-timed languages, whereas relative thresholding was better for the mora-timed ones.

“It’s therefore possible that the cognitive system learns to use the segmentation algorithm that is best suited to one’s native language, and that this leads to difficulties segmenting languages belonging to another rhythmic category. Experimental studies will clearly be necessary to test this hypothesis. We know from the scientific literature that immediately after birth infants already use rhythmic information, and we think that the strategies used to choose the most appropriate segmentation could be one of the areas in which information about rhythm is most useful”.

The study is in fact unable to say whether the cognitive system (of both adults and children) really uses this type of strategy. “Our study clearly confirms that this strategy works across a wide range of languages”, concludes Saksida. “It will now serve as a guide for laboratory experiments.”
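To make the idea of transitional probability and the two thresholding methods more concrete, here is a minimal Python sketch. It is an illustration only, not the code used in the study: the function names, the toy syllable stream, the threshold value of 0.9 and the handling of ties are all assumptions made for this example. TP is estimated as the number of times a syllable pair occurs divided by the number of times its first syllable occurs, which reproduces the “TA”/“DA” and “BU”/“DI”/“FI” arithmetic given in the article.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate TP(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable has no successor, so it is left out of the denominator.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment_absolute(syllables, tp, threshold):
    """Absolute thresholding: place a boundary wherever TP falls below a fixed value.

    Assumes a non-empty syllable list; the threshold is a free parameter here.
    """
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


def segment_relative(syllables, tp):
    """Relative thresholding: place a boundary at each strict local minimum of TP.

    Assumes a non-empty syllable list; ties (equal neighbours) give no boundary.
    """
    values = [tp[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i, b in enumerate(syllables[1:]):
        left = values[i - 1] if i > 0 else float("inf")
        right = values[i + 1] if i + 1 < len(values) else float("inf")
        if values[i] < left and values[i] < right:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical stream built from three made-up words: tada, bidu, golapu.
stream = "ta da bi du go la pu bi du ta da go la pu ta da bi du".split()
tp = transitional_probabilities(stream)

print(tp[("ta", "da")])                     # within a word: always followed by "da"
print(segment_absolute(stream, tp, 0.9))    # threshold 0.9 is an arbitrary choice
print(segment_relative(stream, tp))
```

On this toy stream the within-word TPs are all 1, while the TPs across word boundaries lie between about 0.33 and 0.67, so both methods recover the eight made-up words. On real corpora the two methods can disagree, which is the contrast the study examines across languages.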

    USEFUL LINKS:

    • Original paper: http://goo.gl/cOk5VD

    IMAGES:

    • Credits: Jev55 (Flickr: https://goo.gl/yVVdJ3)

    Contact:

    Press office: [email protected]

    Tel: (+39) 0403787644 | (+39) 366-3677586
    via Bonomea, 265
    34136 Trieste

    More information about SISSA: www.sissa.it