ainl 2016: kravchenko
Solution for workshop
AINL FRUCT: Artificial Intelligence and Natural
Language Conference
10-12 NOVEMBER 2016, SAINT-PETERSBURG
http://ainlconf.ru/
Paraphrase Detection using Semantic Similarity Algorithms
Dmitry Kravchenko
Ben-Gurion University of the Negev
Tasks description
Input:
Two XML files, each containing a list of sentence pairs in Russian:
a) training set
b) test set
Output:
Task 1:
The algorithm should classify each pair into one of three classes: Non-paraphrase, Near-paraphrase, Precise-paraphrase
Task 2:
The algorithm should classify each pair into one of two classes: Non-paraphrase, Paraphrase
Algorithm Data-Flow
Input
● Substitution of acronyms using an online dictionary: wiktionary.org
● Translation engines: Yandex, Microsoft
● Similarity toolkits: SEMILAR Toolkit, DKPro Similarity, Python difflib, NLTK WordNet, Swoogle, BLEU algorithms
● Gradient Boosting Classifier
Output
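The acronym substitution step can be sketched as follows. This is a minimal illustration only: the slides say the expansions come from wiktionary.org but do not describe the lookup, so the `ACRONYMS` table and the `expand_acronyms` function are hypothetical stand-ins.

```python
import re

# Hypothetical acronym table. In the described pipeline the expansions come
# from wiktionary.org; a plain dict stands in for that lookup here.
ACRONYMS = {
    "РФ": "Российская Федерация",
    "США": "Соединённые Штаты Америки",
}

def expand_acronyms(sentence, table=ACRONYMS):
    """Replace every whole word found in `table` with its expansion."""
    def replace(match):
        word = match.group(0)
        return table.get(word, word)
    # \w+ also matches Cyrillic words: Python 3 str patterns are Unicode-aware.
    return re.sub(r"\w+", replace, sentence)

expanded = expand_acronyms("Президент США посетил РФ")
print(expanded)
```

Expanding acronyms before translation gives the translation engines full words to work with instead of abbreviations they may not recognise.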
Classification algorithm
● Gradient Boosting Classifier
● Task 1:
– Feature vector containing 77 features:
● 18 features: 6 scores of SEMILAR toolkit * 3 translation engines
● 39 features: 13 scores of DKPro Similarity toolkit * 3 translation engines
● 3 features: 1 Python difflib similarity score * 3 translation engines
● 6 features: 2 sentence similarity scores (Yuhua Li, David McLean, et al.) * 3 translation engines
● 3 features: 1 score of Swoogle comparator * 3 translation engines
● 8 features: 8 BLEU scores on the source sentences (in Russian)
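The feature arithmetic can be checked with a short sketch. The per-toolkit counts are taken from the slide; the feature naming scheme and the `feature_names` helper are made up for illustration.

```python
# Scores per toolkit, as listed on the slide. Each one is computed on the
# output of every translation engine.
SCORES_PER_TOOLKIT = {
    "SEMILAR": 6,
    "DKPro": 13,
    "difflib": 1,
    "NLTK_WordNet": 2,
    "Swoogle": 1,
}
N_TRANSLATION_ENGINES = 3
N_BLEU_SCORES = 8  # computed on the Russian source, so not multiplied by 3

def feature_names(include_bleu):
    """Build one (hypothetical) name per feature in the vector."""
    names = [
        f"{toolkit}_score{i}_engine{e}"
        for toolkit, n_scores in SCORES_PER_TOOLKIT.items()
        for i in range(1, n_scores + 1)
        for e in range(1, N_TRANSLATION_ENGINES + 1)
    ]
    if include_bleu:
        names += [f"BLEU_score{i}_source" for i in range(1, N_BLEU_SCORES + 1)]
    return names

print(len(feature_names(include_bleu=True)))   # 77
print(len(feature_names(include_bleu=False)))  # 69
```

Dropping the eight BLEU features leaves the 69 translation-based features.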
Classification algorithm
● Task 2:
– Feature vector containing 69 features:
● 18 features: 6 scores of SEMILAR toolkit * 3 translation engines
● 39 features: 13 scores of DKPro Similarity toolkit * 3 translation engines
● 3 features: 1 Python difflib similarity score * 3 translation engines
● 6 features: 2 sentence similarity scores (Yuhua Li, David McLean, et al.) * 3 translation engines
● 3 features: 1 score of Swoogle comparator * 3 translation engines
● (without BLEU scores)
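The slides name a Gradient Boosting Classifier but not the library or settings used. To illustrate the idea only, here is a minimal two-class sketch that assumes squared loss and one-feature decision stumps as weak learners. The toy data, function names, and hyperparameters are all assumptions, and a real system would use a library implementation with many more features.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value

def fit_boosting(X, y, n_rounds=20, learning_rate=0.5):
    """Fit stumps to residuals; y holds +1 (paraphrase) or -1 (non-paraphrase)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [target - score for target, score in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [score + learning_rate * stump_predict(stump, row)
                  for score, row in zip(scores, X)]
    return learning_rate, stumps

def predict(model, row):
    learning_rate, stumps = model
    score = sum(learning_rate * stump_predict(s, row) for s in stumps)
    return 1 if score >= 0 else -1

# Toy feature vectors: two made-up similarity scores per sentence pair.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
y = [1, 1, 1, -1, -1, -1]
model = fit_boosting(X, y)
print([predict(model, row) for row in X])
```

Each round fits a weak learner to what the current ensemble still gets wrong, which is the core of gradient boosting.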
6 scores of SEMILAR toolkit
● greedyComparerWNLin
● optimumComparerLSATasa
● dependencyComparerWnLeskTanim
● cmComparer
● bleuComparer
● lsaComparer
greedyComparerWNLin
This score comes from a sentence-to-sentence similarity method that greedily aligns words between the two sentences. The word-to-word similarity used for the alignment is the WordNet-based measure proposed by Lin in 1998 in the article “An information-theoretic definition of similarity”.
Please refer to:
A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics
http://www.aclweb.org/website/old_anthology/W/W12/W12-20.pdf#page=175
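Greedy alignment can be sketched as below. This is one common formulation (each word takes its best match in the other sentence, averaged in both directions) and not SEMILAR's code. The `word_similarity` function is a stand-in built on character overlap, because Lin's WordNet measure needs WordNet data that is not included here.

```python
from difflib import SequenceMatcher

def word_similarity(a, b):
    # Stand-in for Lin's WordNet measure (assumption): character overlap ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def greedy_directional(source, target, sim):
    """Average, over the source words, of the best match found in target."""
    if not source or not target:
        return 0.0
    return sum(max(sim(w, t) for t in target) for w in source) / len(source)

def greedy_similarity(sentence1, sentence2, sim=word_similarity):
    """Symmetric score: mean of the two directional greedy scores."""
    words1, words2 = sentence1.split(), sentence2.split()
    return (greedy_directional(words1, words2, sim)
            + greedy_directional(words2, words1, sim)) / 2

print(greedy_similarity("the cat sleeps", "a cat is sleeping"))
```

Because each word picks its best partner independently, several words may align to the same target word; the optimal variant removes that freedom.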
optimumComparerLSATasa
Similar to greedyComparerWNLin, but the words are aligned optimally (as in the job assignment problem), and the word-to-word similarity method is based on LSA.
Article: Latent Semantic Analysis Models on Wikipedia and TASA
http://deeptutor2.memphis.edu/Semilar-Web/public/downloads/LSA-Models-LREC014/LSAModelsOnWikipediaAndTASADanEtAl-LREC014.pdf
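The difference between greedy and optimal alignment shows up on a small example. The sketch below finds the best one-to-one alignment by brute force; the similarity table, the `table_sim` helper, and the normalisation by the total word count are assumptions for illustration, not SEMILAR's implementation.

```python
from itertools import permutations

# Made-up word-to-word similarities for a two-word example.
TABLE = {("a", "x"): 0.9, ("a", "y"): 0.8, ("b", "x"): 0.8, ("b", "y"): 0.1}

def table_sim(w1, w2):
    """Symmetric lookup in the toy table; unknown pairs score 0."""
    return TABLE.get((w1, w2), TABLE.get((w2, w1), 0.0))

def optimal_alignment_score(words1, words2, sim=table_sim):
    """Best total similarity over all one-to-one alignments (brute force)."""
    short, long_ = sorted((words1, words2), key=len)
    if not short:
        return 0.0
    best = max(
        sum(sim(a, b) for a, b in zip(short, perm))
        for perm in permutations(long_, len(short))
    )
    # Normalise by the total number of words (an assumption).
    return 2 * best / (len(words1) + len(words2))

score = optimal_alignment_score(["a", "b"], ["x", "y"])
print(score)
```

A greedy pass would pair a with x (0.9) and leave b with y (0.1), for a total of 1.0, while the optimal assignment pairs a with y and b with x for a total of 1.6. Brute force is only workable for short sentences; for real input an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice.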
dependencyComparerWnLeskTanim
Please see:
● https://www.aaai.org/ocs/index.php/FLAIRS/2009/paper/viewFile/55/298
The word-to-word similarity method used is the WordNet-based method proposed by Lesk and Tanim.
cmComparer
Method proposed by Corley and Mihalcea, as described in the article “SEMILAR: The Semantic Similarity Toolkit”.
lsaComparer
The LSA-based word representations are summed for each sentence, and the similarity is calculated on the resulting sentence vectors.
● (The resultant-vector method is described in the article: NeRoSim: A System for Measuring and Interpreting Semantic Textual Similarity http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval030.pdf)
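A minimal sketch of the resultant-vector idea follows. It assumes cosine as the vector similarity. The three-dimensional `TOY_VECTORS` are made up; real LSA vectors come from a trained model and have hundreds of dimensions.

```python
import math

# Made-up 3-dimensional word vectors standing in for LSA representations.
TOY_VECTORS = {
    "cat": [1.0, 0.0, 0.2],
    "dog": [0.9, 0.1, 0.3],
    "sleeps": [0.0, 1.0, 0.1],
    "rests": [0.1, 0.9, 0.2],
    "stock": [0.0, 0.1, 1.0],
    "market": [0.1, 0.0, 0.9],
}

def sentence_vector(sentence, vectors=TOY_VECTORS):
    """Sum the vectors of all known words; unknown words contribute nothing."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for word in sentence.lower().split():
        for i, value in enumerate(vectors.get(word, [0.0] * dims)):
            total[i] += value
    return total

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def lsa_style_similarity(sentence1, sentence2):
    return cosine(sentence_vector(sentence1), sentence_vector(sentence2))

print(lsa_style_similarity("cat sleeps", "dog rests"))
print(lsa_style_similarity("cat sleeps", "stock market"))
```

With these toy vectors the first pair scores far higher than the second, even though neither pair shares a word.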
Word-to-word Similarity score
Article: NeRoSim: A System for Measuring and Interpreting Semantic Textual Similarity
13 scores of DKPro Similarity toolkit
● CosineSimilarity
● ExactStringMatchComparator
● GreedyStringTiling 2-gram
● GreedyStringTiling 4-gram
● JaroSecondStringComparator
● JaroWinklerSecondStringComparator
● normalized LevenshteinComparator
● LongestCommonSubsequenceNormComparator
● SubstringMatchComparator
● WordNGramContainmentMeasure
● WordNGramJaccardMeasure 2-gram
● WordNGramJaccardMeasure 3-gram
● WordNGramJaccardMeasure 4-gram
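Several of these measures are simple enough to sketch directly. The functions below are plain re-implementations of the underlying ideas (word n-gram Jaccard, word n-gram containment, normalised Levenshtein), not DKPro Similarity's API. Details such as tokenisation and normalisation are assumptions and may differ from the toolkit.

```python
def word_ngrams(text, n):
    """Set of word n-grams in a lowercased, whitespace-tokenised text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def ngram_jaccard(a, b, n):
    """Shared n-grams divided by all distinct n-grams of both texts."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0

def ngram_containment(a, b, n):
    """Share of a's n-grams that also occur in b."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a) if grams_a else 0.0

def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, and substitute."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]

def normalized_levenshtein_similarity(a, b):
    """1 minus the distance divided by the longer length (an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0

s1 = "the cat sat on the mat"
s2 = "the cat lay on the mat"
print(ngram_jaccard(s1, s2, 2), ngram_containment(s1, s2, 2))
print(levenshtein("kitten", "sitting"))
```

For the two example sentences, three of the seven distinct bigrams are shared, and three of the five bigrams of the first sentence occur in the second.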
The four remaining toolkits
● Python difflib comparator
● NLTK WordNet: sentence similarity scores (Yuhua Li, David McLean, et al.)
● Swoogle comparator
● BLEU scores (computed on the Russian source, so no English translation is needed): bleu def 1-gram, bleu def 2-gram, bleu def 3-gram, bleu def 4-gram, bleu lin 1-gram, bleu lin 2-gram, bleu lin 3-gram, bleu lin 4-gram
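Two of these scores can be illustrated with the standard library. The sketch assumes the difflib score is `SequenceMatcher.ratio()` on the raw strings (the slides do not say how difflib was applied). It also shows only the clipped n-gram precision at the core of BLEU, not the eight "def" and "lin" variants, whose exact definitions the slides do not give.

```python
from collections import Counter
from difflib import SequenceMatcher

def difflib_score(a, b):
    """Similarity in [0, 1] from matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()

def ngram_counts(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def clipped_ngram_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with each n-gram
    counted at most as often as it occurs in the reference."""
    cand_counts = ngram_counts(candidate.lower().split(), n)
    ref_counts = ngram_counts(reference.lower().split(), n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total

print(difflib_score("abcd", "bcde"))
print(clipped_ngram_precision("the the the", "the cat", 1))
```

Clipping is what stops a sentence that repeats one matching word from scoring a perfect precision.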
Results on Test Set
Task                   Accuracy   F1 macro   Place
First Task Standard    0.5695     0.5437     4 out of 11
Second Task Standard   0.7153     0.7853     6 out of 10
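For reference, the two metrics in the table can be computed as in the sketch below. It uses the usual definitions (accuracy as the share of correct labels, macro F1 as the unweighted mean of per-class F1); the workshop's official scorer may differ in details, and the labels and predictions here are toy data.

```python
def accuracy(y_true, y_pred):
    """Share of pairs whose predicted class equals the gold class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy gold labels and predictions for the two-class task.
gold = ["para", "para", "para", "non", "non", "non"]
pred = ["para", "para", "non", "non", "non", "para"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```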
What impact did each toolkit have?
[Bar chart: Accuracy and F1 macro, in percent (axis range 66.00 to 82.00), per toolkit: SEMILAR, DKPro Similarity, Swoogle, NLTK WordNet, Python difflib. Bar values as extracted: 80.13, 79.52, 78.94, 78.76, 75.92, 77.02, 75.78, 75.03, 75.02, 71.36]
5-fold cross-validation results on the training set, second task
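The 5-fold protocol can be sketched as below. The slides do not say whether the folds were shuffled or stratified, so the shuffling, the seed, and the `k_fold_indices` helper are assumptions.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(23))
for train, test in splits:
    print(len(train), len(test))
```

Every sentence pair is used for testing exactly once. The reported score is then typically the average over the five folds.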
Which Translation Engine is Better?
5-fold cross-validation results on the training set, second task
Symbols for toolkits on the X axis: 1: SEMILAR; 2: DKPro Similarity; 3: Python difflib; 4: NLTK WordNet; 5: Swoogle; 6: all 5 toolkits together
Conclusion
Using this algorithm, we can detect semantic similarity not only for Russian but also for any other language for which translation is available via translation engines.
Thank you!