Improvement of Log Pattern Extracting Algorithm Using Text Similarity
ZHAO Yining, Computer Network Information Center,
Chinese Academy of Sciences, in HPBDC18, 2018/05/21
Content
- CNGrid & LARGE
- Why Log Patterns & Extracting Algorithm
- Algorithm of Identical Word Rate
- Text Similarity Based Approach
  - Improved Extracting Formation & LCS
  - Experiment Result
- Modified Log Comparing Model
- Summary & Future Work
CNGrid & LARGE
- China National HPC Environment
  - 2 operating centers (Beijing/Hefei)
  - 19 sites (200 PF + 162 PB)
  - portal with micro-service architecture
  - application-oriented global scheduling & predicting
  - resource evaluation standard & comprehensive evaluation index
![Page 4: Improvement of Log Pattern Extracting Algorithm Using Text](https://reader030.vdocuments.mx/reader030/viewer/2022012719/61b1822029991c562377f2e6/html5/thumbnails/4.jpg)
CNGrid & LARGE
- Log Analyzing Framework in Grid Environment
Log Patterns & Extracting Algorithm
- We want to be alerted for logs in certain patterns, but…
  - there are too many logs for humans to read
  - patterns need to be summarized before alert rules can be defined
- The set of log patterns, in our context:
  - patterns are different from each other
  - the patterns cover all logs in the original set
  - the set is significantly smaller than the original set
- The process of using log patterns:
  - filter and remove frequent normal logs
  - use a log pattern extraction algorithm to get the set of patterns
  - manually check the set and pick out abnormal patterns
  - define rules to generate alerts for these patterns
Algorithm of Identical Word Rate
- Algorithm of identical word rate (IWR): a straightforward way
  - identical words
    - 2 words that are identical
    - and in the same position in the 2 original logs
  - identical word rate
    - (number of identical words) / (total words in a log)
    - predefined threshold t
    - if the IWR is greater than t, the two logs are in one pattern
- Process of the IWR algorithm
  - set the threshold t and an initially empty pattern set P
  - for each new incoming log, compute its IWR with each pattern in P
  - if a pattern matches, skip to the next log; if none matches, add the log to P
- Significant limitation
  - logs of different lengths have an IWR of ZERO!
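A minimal Python sketch of the IWR procedure above, assuming logs are split into words on whitespace and, following the stated limitation, that two logs of unequal length score 0. The function names `identical_word_rate` and `extract_patterns_iwr` are hypothetical.

```python
from typing import List


def identical_word_rate(log_a: str, log_b: str) -> float:
    """Fraction of positions at which both logs hold the same word.

    Assumption: logs of different lengths score 0.0, which is the
    limitation the slide points out.
    """
    words_a, words_b = log_a.split(), log_b.split()
    if len(words_a) != len(words_b) or not words_a:
        return 0.0
    same = sum(1 for a, b in zip(words_a, words_b) if a == b)
    return same / len(words_a)


def extract_patterns_iwr(logs: List[str], t: float) -> List[str]:
    """Greedy extraction: a log becomes a new pattern only if no existing
    pattern has an IWR greater than t with it."""
    patterns: List[str] = []
    for log in logs:
        # any() stops at the first matching pattern ("skip to next")
        if not any(identical_word_rate(log, p) > t for p in patterns):
            patterns.append(log)
    return patterns


logs = [
    "session opened for user root",
    "session opened for user alice",
    "session opened for user alice by uid 0",
]
print(extract_patterns_iwr(logs, t=0.5))
```

The first two logs share 4 of 5 positions (IWR 0.8), so they form one pattern. The third log only adds a trailing `by uid 0`, yet its IWR against every pattern is 0, so it starts a new pattern of its own.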
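Text Similarity Based Approach (1)
- Use text similarity to resolve the problem: $S = P \times O$
  - $S$: similarity; $P$: proportion of common words; $O$: order factor
- For two logs $l_1$ and $l_2$, with $L_1$ and $L_2$ their respective word sets:
  - define $P$: $P(l_1, l_2) = \frac{2\,|L_1 \cap L_2|}{|L_1| + |L_2|}$
  - define $O$: $O(l_1, l_2) = \frac{\mathrm{SeqSim}(l_1, l_2)}{|L_1 \cap L_2|}$
  - hence $S$: $S(l_1, l_2) = P \times O = \frac{2\,\mathrm{SeqSim}(l_1, l_2)}{|L_1| + |L_2|}$, since the $|L_1 \cap L_2|$ terms cancel
- By this, logs of different lengths can be compared: the denominator $|L_1| + |L_2|$ does not require the two logs to have the same number of words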
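To make the decomposition concrete, the sketch below computes $P$, $O$ and $S$ separately for two whitespace-tokenized logs, taking SeqSim to be the LCS length (the choice made on the next slide). Two assumptions are ours: $|L_1|$ and $|L_2|$ count words including repeats, and $L_1 \cap L_2$ is a multiset intersection. With plain sets, a log such as `a a a` compared with itself would give $S = 3$. The helper names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists
    (standard dynamic programme, keeping only two rows)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity_parts(l1: str, l2: str) -> Tuple[float, float, float]:
    """Return (P, O, S) for two logs, with SeqSim = LCS length."""
    w1, w2 = l1.split(), l2.split()
    total = len(w1) + len(w2)
    # multiset intersection: a repeated word counts min(n1, n2) times
    common = sum((Counter(w1) & Counter(w2)).values())
    if common == 0:
        return 0.0, 0.0, 0.0  # no shared words: O is undefined, S is 0
    p = 2 * common / total
    o = lcs_length(w1, w2) / common  # LCS never exceeds common, so O <= 1
    return p, o, p * o


print(similarity_parts("a b c", "c b a"))
print(similarity_parts("session opened for user root",
                       "session opened for user root by uid 0"))
```

For `a b c` versus `c b a`, every word is shared ($P = 1$) but only one survives in order ($O = 1/3$), so $S = 1/3$. The second pair differs in length, yet gets $P = 10/13$, $O = 1$ and $S = 10/13 \approx 0.77$, where IWR would give 0.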
Text Similarity Based Approach (2)
- Use the longest common subsequence (LCS) to define $\mathrm{SeqSim}(l_1, l_2)$:
  - $S(l_1, l_2) = \frac{2\,|\mathrm{LCS}(l_1, l_2)|}{|L_1| + |L_2|}$
  - $l_1$ and $l_2$ belong to the same pattern if $S(l_1, l_2) \ge t$, where $t$ is the predefined threshold
- The process of the improved log pattern extracting algorithm:
  - set the threshold value $t$, and set the initial log pattern set $P$ to be an empty set
  - for a new log $l$ appearing in the input log set $L$, compute $S_i(l, p_i)$ between $l$ and every $p_i \in P$ using an LCS algorithm
  - if there is no $S_i(l, p_i) \ge t$, add $l$ to $P$
  - after all logs in $L$ have been checked, return $P$
- Increases the time cost of a single comparison
  - but reduces the total number of comparisons
  - can be offset by choosing a better LCS algorithm
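A minimal sketch of the improved extraction loop, assuming whitespace tokenization and a textbook $O(mn)$ LCS dynamic programme; the slides do not say which LCS algorithm was used, and all names here are hypothetical. As in the process above, a log is added to $P$ only when no existing pattern reaches $S \ge t$; `any()` stops at the first pattern that does, which yields the same set $P$.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Word-level LCS length via the standard two-row dynamic programme."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(l1: str, l2: str) -> float:
    """S(l1, l2) = 2 * |LCS(l1, l2)| / (|L1| + |L2|), counting words."""
    w1, w2 = l1.split(), l2.split()
    total = len(w1) + len(w2)
    return 2 * lcs_length(w1, w2) / total if total else 0.0


def extract_patterns(logs: List[str], t: float) -> List[str]:
    """Return the pattern set P for the input logs L."""
    patterns: List[str] = []  # P starts empty
    for log in logs:
        if not any(similarity(log, p) >= t for p in patterns):
            patterns.append(log)  # no S(l, p_i) >= t: l becomes a pattern
    return patterns


logs = [
    "session opened for user root",
    "session opened for user alice",
    "session opened for user alice by uid 0",
    "disk quota exceeded on /home",
]
print(extract_patterns(logs, t=0.6))
```

With $t = 0.6$ the third log joins the first pattern ($S = 8/13 \approx 0.62$) even though its length differs. Raising $t$ to 0.9 splits all four logs into separate patterns, so $t$ trades the number of patterns against how loosely logs are grouped.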
![Page 9: Improvement of Log Pattern Extracting Algorithm Using Text](https://reader030.vdocuments.mx/reader030/viewer/2022012719/61b1822029991c562377f2e6/html5/thumbnails/9.jpg)
Text Similarity Based Approach (3)
- Experiment result: numbers of extracted patterns
![Page 10: Improvement of Log Pattern Extracting Algorithm Using Text](https://reader030.vdocuments.mx/reader030/viewer/2022012719/61b1822029991c562377f2e6/html5/thumbnails/10.jpg)
Text Similarity Based Approach (3)
- Experiment result: time costs of the candidate algorithms (in milliseconds)
Modified Pattern Comparing Model (1)
- The original model has a high time cost for searching patterns
  - it has to visit the patterns one by one until the matching one is found
- Use a hash map to accelerate the matching
  - divide the pattern set into subsets by initial word
  - skip the majority of patterns, which sit in irrelevant subsets
- Matching process:
  1. get the initial word of the log
  2. hash the word
  3. find the desired subset in the hash map
  4. compare with the patterns in the subset
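A sketch of the hash-map model, assuming Python's `dict` as the hash map (it hashes the key itself, covering steps 2 and 3), whitespace tokenization, and the LCS similarity $S \ge t$ as the comparison in step 4. `PatternIndex` and its `comparisons` counter are hypothetical, added only to show how many patterns are skipped.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def lcs_length(a: List[str], b: List[str]) -> int:
    """Word-level LCS length (two-row dynamic programme)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(l1: str, l2: str) -> float:
    w1, w2 = l1.split(), l2.split()
    total = len(w1) + len(w2)
    return 2 * lcs_length(w1, w2) / total if total else 0.0


class PatternIndex:
    """Pattern set split into subsets keyed by each pattern's first word."""

    def __init__(self, t: float) -> None:
        self.t = t
        self.subsets: Dict[str, List[str]] = defaultdict(list)
        self.comparisons = 0  # similarity computations performed so far

    def add(self, pattern: str) -> None:
        # assumes a non-empty pattern whose first word is fixed
        self.subsets[pattern.split()[0]].append(pattern)

    def match(self, log: str) -> Optional[str]:
        words = log.split()  # step 1: get the initial word
        if not words:
            return None
        # steps 2-3: the dict hashes the word and finds its subset;
        # .get() avoids inserting empty subsets for unseen words
        for pattern in self.subsets.get(words[0], []):
            self.comparisons += 1  # step 4: compare within the subset only
            if similarity(log, pattern) >= self.t:
                return pattern
        return None


index = PatternIndex(t=0.6)
for p in ["session opened for user root",
          "disk quota exceeded on /home",
          "disk full on /var"]:
    index.add(p)
print(index.match("session opened for user alice"), index.comparisons)
```

Matching the `session` log touches one pattern instead of three; the two `disk` patterns are never compared.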
Modified Pattern Comparing Model (2)
- This approach cannot deal with patterns whose initial word is not fixed
  - so we build an unfixed pattern set
- In the real system, we split the pattern set into 4 parts:
  - fixed alert pattern set
  - unfixed alert pattern set
  - fixed normal pattern set
  - unfixed normal pattern set
- When a new log comes in, it is compared against the 4 sets in turn to decide how it is processed
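One way to wire up the four sets, as a sketch: fixed sets are hash maps keyed by initial word, unfixed sets are plain lists scanned in full, and the sets are tried in the order listed above, with the first hit deciding whether the log is treated as an alert or as normal. The order, the `"unmatched"` fallback and every name below are our assumptions; the slides do not say what the real system does with a log that matches no set.

```python
from typing import Dict, List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Word-level LCS length (two-row dynamic programme)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(l1: str, l2: str) -> float:
    w1, w2 = l1.split(), l2.split()
    total = len(w1) + len(w2)
    return 2 * lcs_length(w1, w2) / total if total else 0.0


class FourSetMatcher:
    """Alert/normal patterns, each split into fixed- and unfixed-initial sets."""

    def __init__(self, t: float) -> None:
        self.t = t
        self.fixed: Dict[str, Dict[str, List[str]]] = {"alert": {}, "normal": {}}
        self.unfixed: Dict[str, List[str]] = {"alert": [], "normal": []}

    def add(self, pattern: str, kind: str, fixed_initial: bool) -> None:
        if fixed_initial:
            self.fixed[kind].setdefault(pattern.split()[0], []).append(pattern)
        else:  # first word varies (e.g. a host name), so no hash key
            self.unfixed[kind].append(pattern)

    def _hit(self, log: str, candidates: List[str]) -> bool:
        return any(similarity(log, p) >= self.t for p in candidates)

    def classify(self, log: str) -> str:
        words = log.split()
        first = words[0] if words else ""
        # order: fixed alert, unfixed alert, fixed normal, unfixed normal
        for kind in ("alert", "normal"):
            if self._hit(log, self.fixed[kind].get(first, [])):
                return kind
            if self._hit(log, self.unfixed[kind]):
                return kind
        return "unmatched"


m = FourSetMatcher(t=0.6)
m.add("disk quota exceeded on /home", "alert", fixed_initial=True)
m.add("node01 kernel: out of memory", "alert", fixed_initial=False)
m.add("session opened for user root", "normal", fixed_initial=True)
for log in ["node17 kernel: out of memory",
            "session opened for user alice",
            "cron job started"]:
    print(log, "->", m.classify(log))
```

Checking the alert sets first means a log that resembles both an alert pattern and a normal pattern still raises an alert; the unfixed lists are scanned in full, so keeping them small preserves most of the hash-map speed-up.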
![Page 13: Improvement of Log Pattern Extracting Algorithm Using Text](https://reader030.vdocuments.mx/reader030/viewer/2022012719/61b1822029991c562377f2e6/html5/thumbnails/13.jpg)
Modified Pattern Comparing Model (3)
- Real time cost comparison between the original and modified models
[Figure: time cost in milliseconds of the original model vs. the modified model, one panel per log file: cron, maillog, secure, messages]
Summary & Future Work
- Log patterns: used to build log recognition
- The IWR algorithm is not capable of matching logs of different lengths
- Using the idea of text similarity and LCS to improve the algorithm
- Modifying the log comparing model to accelerate the process
- Future work: log-pattern-based analyses in CNGrid
  - log pattern associations
  - log flow feature modeling