ainl 2016: kravchenko

Solution for workshop

AINL FRUCT: Artificial Intelligence and Natural Language Conference

10-12 November 2016, Saint Petersburg

http://ainlconf.ru/

Paraphrase Detection using Semantic Similarity Algorithms

Dmitry Kravchenko

Ben-Gurion University of the Negev

Task Description

Input:

Two XML files, each containing a list of Russian sentence pairs:

a) training set

b) test set

Output:

Task 1:

The algorithm should classify each pair into one of three classes: Non-paraphrase, Near-paraphrase, Precise-paraphrase.

Task 2:

The algorithm should classify each pair into one of two classes: Non-paraphrase, Paraphrase.
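The slides do not say how the two label sets relate. A minimal sketch, assuming (not confirmed by the source) that Task 2 simply merges Near-paraphrase and Precise-paraphrase into a single Paraphrase class; the label strings are the class names listed above, and `to_task2` is a hypothetical helper:

```python
# Class names from the task description; their encoding in the XML files may differ.
TASK1_LABELS = ("Non-paraphrase", "Near-paraphrase", "Precise-paraphrase")


def to_task2(task1_label: str) -> str:
    """Collapse a three-class Task 1 label into the two Task 2 classes.

    Assumption: near and precise paraphrases both count as "Paraphrase".
    """
    if task1_label not in TASK1_LABELS:
        raise ValueError(f"unknown label: {task1_label!r}")
    return "Non-paraphrase" if task1_label == "Non-paraphrase" else "Paraphrase"


print([to_task2(label) for label in TASK1_LABELS])
# ['Non-paraphrase', 'Paraphrase', 'Paraphrase']
```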

Algorithm Data-Flow

[Figure: data-flow diagram. Components shown: Input; substitution of acronyms using an online dictionary (wiktionary.org); translation engines Google, Yandex and Microsoft; SEMILAR Toolkit, DKPro Similarity, Python difflib, NLTK WordNet, Swoogle; BLEU algorithms; Gradient Boosting Classifier; Output.]
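The first stages of the data flow, acronym substitution followed by translation with several engines, can be sketched as below. This is a hypothetical illustration: the acronym table is a tiny hand-written stand-in for the wiktionary.org lookup, the translators are stubs rather than real Google, Yandex or Microsoft API calls, and the stage order is assumed from the diagram.

```python
from typing import Callable, Dict

# Hypothetical acronym table standing in for the online dictionary lookup.
ACRONYMS: Dict[str, str] = {"МВД": "Министерство внутренних дел"}


def expand_acronyms(sentence: str, acronyms: Dict[str, str]) -> str:
    """Replace whitespace-separated tokens that are known acronyms.

    Tokens with attached punctuation (e.g. "МВД,") are not matched in this sketch.
    """
    return " ".join(acronyms.get(token, token) for token in sentence.split())


def translate_all(sentence: str, engines: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Run every translation engine on one sentence: {engine name: English text}."""
    return {name: translate(sentence) for name, translate in engines.items()}


# Stub engines; a real system would call each provider's translation API here.
ENGINES: Dict[str, Callable[[str], str]] = {
    "google": lambda s: f"[google] {s}",
    "yandex": lambda s: f"[yandex] {s}",
    "microsoft": lambda s: f"[microsoft] {s}",
}

source = expand_acronyms("МВД сообщило", ACRONYMS)
print(source)  # Министерство внутренних дел сообщило
translations = translate_all(source, ENGINES)
print(sorted(translations))  # ['google', 'microsoft', 'yandex']
```

Each similarity toolkit is then applied once per translation, which is why most feature counts are multiplied by 3 translation engines.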

Classification Algorithm

● Gradient Boosting Classifier

● Task 1: a feature vector containing 77 features:

● 18 features: 6 scores of the SEMILAR toolkit * 3 translation engines

● 39 features: 13 scores of the DKPro Similarity toolkit * 3 translation engines

● 3 features: 1 Python difflib similarity score * 3 translation engines

● 6 features: 2 sentence similarity scores (Yuhua Li, David McLean, et al.) * 3 translation engines

● 3 features: 1 Swoogle comparator score * 3 translation engines

● 8 features: 8 BLEU scores on the source sentences (in Russian)
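The feature counts add up as (6 + 13 + 1 + 2 + 1) * 3 = 69 per-translation features, plus 8 BLEU scores for Task 1. The sketch below checks that arithmetic and builds one possible column layout; the column order and names are hypothetical.

```python
from typing import List

# Scores computed on each English translation (counts from the list above).
PER_ENGINE_SCORES = {
    "SEMILAR": 6,
    "DKPro": 13,
    "difflib": 1,
    "LiMcLean": 2,
    "Swoogle": 1,
}
ENGINES = ("google", "yandex", "microsoft")
BLEU_SCORES = 8  # computed once, on the Russian source sentences


def feature_names(include_bleu: bool) -> List[str]:
    """Hypothetical column names: one per (engine, toolkit, score index)."""
    names = [
        f"{engine}:{tool}:{k}"
        for engine in ENGINES
        for tool, count in PER_ENGINE_SCORES.items()
        for k in range(count)
    ]
    if include_bleu:
        names += [f"ru:BLEU:{k}" for k in range(BLEU_SCORES)]
    return names


print(len(feature_names(include_bleu=True)))   # 77 (Task 1)
print(len(feature_names(include_bleu=False)))  # 69 (Task 2)
```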

Classification Algorithm

● Task 2: a feature vector containing 69 features:

● 18 features: 6 scores of the SEMILAR toolkit * 3 translation engines

● 39 features: 13 scores of the DKPro Similarity toolkit * 3 translation engines

● 3 features: 1 Python difflib similarity score * 3 translation engines

● 6 features: 2 sentence similarity scores (Yuhua Li, David McLean, et al.) * 3 translation engines

● 3 features: 1 Swoogle comparator score * 3 translation engines

● (no BLEU scores)

6 scores of the SEMILAR toolkit

● greedyComparerWNLin

● optimumComparerLSATasa

● dependencyComparerWnLeskTanim

● cmComparer

● bleuComparer

● lsaComparer

greedyComparerWNLin

This score comes from a sentence-to-sentence similarity method that greedily aligns words between the two sentences. Word-to-word similarity is computed with the WordNet-based measure proposed by Lin in 1998 (article: “An information-theoretic definition of similarity”).

Please refer to:

A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics

http://www.aclweb.org/website/old_anthology/W/W12/W12-20.pdf#page=175
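A generic greedy word alignment can be sketched as follows. It is not SEMILAR's code: the word-to-word similarity is a toy lookup table with invented numbers standing in for Lin's WordNet measure, and normalizing by the longer sentence's length is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

WordSim = Callable[[str, str], float]

# Invented word-to-word scores, standing in for a WordNet-based measure.
TOY_SCORES = {
    ("car", "automobile"): 0.9,
    ("car", "vehicle"): 0.8,
    ("truck", "automobile"): 0.7,
}


def toy_sim(a: str, b: str) -> float:
    """Symmetric toy similarity: 1.0 for identical words, else a table lookup."""
    if a == b:
        return 1.0
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


def greedy_align(
    s1: List[str], s2: List[str], sim: WordSim
) -> Tuple[List[Tuple[str, str, float]], float]:
    """Greedy one-to-one alignment: repeatedly take the best remaining pair."""
    candidates = sorted(
        ((sim(a, b), i, j) for i, a in enumerate(s1) for j, b in enumerate(s2)),
        reverse=True,
    )
    used1, used2 = set(), set()
    pairs = []
    for score, i, j in candidates:
        if score <= 0.0 or i in used1 or j in used2:
            continue  # skip unrelated words and words already aligned
        used1.add(i)
        used2.add(j)
        pairs.append((s1[i], s2[j], score))
    total = sum(score for _, _, score in pairs)
    return pairs, total / max(len(s1), len(s2), 1)


pairs, score = greedy_align(["car", "truck"], ["automobile", "vehicle"], toy_sim)
print(pairs)  # [('car', 'automobile', 0.9)]
print(score)  # 0.45
```

Greedy takes car/automobile first and leaves truck without a useful partner, even though pairing car/vehicle and truck/automobile would give a higher total (1.5 instead of 0.9).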

optimumComparerLSATasa

Similar to greedyComparerWNLin, but the words are aligned optimally (as in the assignment problem), and the word-to-word similarity is based on an LSA model trained on the TASA corpus.

Article: Latent Semantic Analysis Models on Wikipedia and TASA

http://deeptutor2.memphis.edu/Semilar-Web/public/downloads/LSA-Models-LREC014/LSAModelsOnWikipediaAndTASADanEtAl-LREC014.pdf
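Optimal alignment treats word matching as an assignment problem. The sketch below brute-forces all one-to-one alignments, which is only practical for short sentences; real implementations would typically use the Hungarian algorithm (for example SciPy's `linear_sum_assignment`). The toy scores stand in for LSA (TASA) word similarity and are invented; normalizing by the longer sentence is an assumption.

```python
from itertools import permutations
from typing import Callable, List, Tuple

WordSim = Callable[[str, str], float]

# Invented word-to-word scores standing in for LSA-based similarity.
TOY_SCORES = {
    ("car", "automobile"): 0.9,
    ("car", "vehicle"): 0.8,
    ("truck", "automobile"): 0.7,
}


def toy_sim(a: str, b: str) -> float:
    """Symmetric toy similarity: 1.0 for identical words, else a table lookup."""
    if a == b:
        return 1.0
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


def optimal_align(
    s1: List[str], s2: List[str], sim: WordSim
) -> Tuple[List[Tuple[str, str, float]], float]:
    """Best one-to-one alignment by exhaustive search (exponential; sketch only)."""
    swapped = len(s1) > len(s2)
    short, long_ = (s2, s1) if swapped else (s1, s2)
    best_total, best = -1.0, []
    # Each permutation gives every word of the shorter sentence a distinct partner.
    for chosen in permutations(range(len(long_)), len(short)):
        pairs = [(short[i], long_[j], sim(short[i], long_[j])) for i, j in enumerate(chosen)]
        total = sum(score for _, _, score in pairs)
        if total > best_total:
            best_total, best = total, pairs
    if swapped:  # report pairs as (word from s1, word from s2)
        best = [(b, a, score) for a, b, score in best]
    return best, max(best_total, 0.0) / max(len(s1), len(s2), 1)


pairs, score = optimal_align(["car", "truck"], ["automobile", "vehicle"], toy_sim)
print(pairs)  # [('car', 'vehicle', 0.8), ('truck', 'automobile', 0.7)]
print(score)  # 0.75
```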

dependencyComparerWnLeskTanim

Please see: https://www.aaai.org/ocs/index.php/FLAIRS/2009/paper/viewFile/55/298

The word-to-word similarity method used is the WordNet-based method proposed by Lesk and Tanim.

cmComparer

Method proposed by Corley and Mihalcea (described in the article “SEMILAR: The Semantic Similarity Toolkit”).
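As the Corley and Mihalcea approach is commonly formulated, each word is matched to its most similar word in the other sentence, the matches are weighted by inverse document frequency (idf), and the two directions are averaged. The sketch follows that formulation; the toy similarity and idf values are invented, and SEMILAR's cmComparer may differ in details such as how idf is estimated.

```python
from typing import Callable, Dict, List

WordSim = Callable[[str, str], float]


def toy_sim(a: str, b: str) -> float:
    """Invented similarity: identical words 1.0, car/automobile 0.9, else 0."""
    if a == b:
        return 1.0
    return 0.9 if {a, b} == {"car", "automobile"} else 0.0


def directional(src: List[str], tgt: List[str], sim: WordSim, idf: Dict[str, float]) -> float:
    """idf-weighted mean, over words in src, of each word's best match in tgt."""
    if not src or not tgt:
        return 0.0
    weights = [idf.get(w, 1.0) for w in src]  # unseen words get weight 1.0 (assumption)
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    matched = sum(max(sim(w, v) for v in tgt) * wt for w, wt in zip(src, weights))
    return matched / total_weight


def cm_similarity(t1: List[str], t2: List[str], sim: WordSim, idf: Dict[str, float]) -> float:
    """Average of the two directional scores."""
    return 0.5 * (directional(t1, t2, sim, idf) + directional(t2, t1, sim, idf))


idf = {"the": 0.1, "car": 2.0, "automobile": 2.0}
print(round(cm_similarity(["the", "car"], ["the", "automobile"], toy_sim, idf), 3))  # 0.905
```

Here both directions score (1.0 * 0.1 + 0.9 * 2.0) / 2.1, so the frequent word "the" contributes little compared with the content words.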

lsaComparer

The LSA-based word vectors of each sentence are summed, and the similarity is computed between the two resulting sentence vectors.

● The resultant-vector method is described in the article: NeRoSim: A System for Measuring and Interpreting Semantic Textual Similarity, http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval030.pdf
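The resultant-vector idea can be sketched as below. The 2-dimensional vectors are invented stand-ins for real LSA vectors, and using cosine similarity between the summed vectors is an assumption, since the slide does not name the similarity function.

```python
import math
from typing import Dict, List, Sequence


def resultant(words: Sequence[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Sum the vectors of all in-vocabulary words (unknown words are skipped)."""
    total = [0.0] * dim
    for word in words:
        vec = vectors.get(word)
        if vec is None:
            continue
        for k, x in enumerate(vec):
            total[k] += x
    return total


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy 2-dimensional "LSA" vectors with invented values.
VECTORS = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "fast": [0.0, 1.0],
    "quick": [0.1, 0.9],
}

u = resultant(["fast", "car"], VECTORS, dim=2)
v = resultant(["quick", "automobile"], VECTORS, dim=2)
w = resultant(["fast"], VECTORS, dim=2)
print(round(cosine(u, v), 3))  # 1.0
print(round(cosine(u, w), 3))  # 0.707
```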

Word-to-word Similarity score

Article: NeRoSim: A System for Measuring and Interpreting Semantic Textual Similarity

13 scores of DKPro Similarity toolkit

● CosineSimilarity,

● ExactStringMatchComparator,

● GreedyStringTiling 2-gram, GreedyStringTiling 4-gram,

● JaroSecondStringComparator,

● JaroWinklerSecondStringComparator,

● normalized LevenshteinComparator,

● LongestCommonSubsequenceNormComparator,

● SubstringMatchComparator,

● WordNGramContainmentMeasure,

● WordNGramJaccardMeasure 2-gram, WordNGramJaccardMeasure 3-gram, WordNGramJaccardMeasure 4-gram
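Several of these measures have simple textbook definitions. The sketch implements two of them, a word n-gram Jaccard measure and a normalized Levenshtein similarity, from those definitions; it is not DKPro Similarity's code, and DKPro's tokenization and normalization details may differ.

```python
from typing import List, Set, Tuple


def word_ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of contiguous word n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(a: List[str], b: List[str], n: int) -> float:
    """|A & B| / |A | B| over word n-gram sets; 0.0 if neither side has an n-gram."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def levenshtein(s: str, t: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def normalized_levenshtein(s: str, t: str) -> float:
    """1 - distance / length of the longer string (1.0 for two empty strings)."""
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))


a = "the cat sat on the mat".split()
b = "the cat sat on a mat".split()
print(round(ngram_jaccard(a, b, 2), 3))                        # 0.429
print(levenshtein("kitten", "sitting"))                        # 3
print(round(normalized_levenshtein("kitten", "sitting"), 3))   # 0.571
```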

The Remaining Four Toolkits

● Python difflib comparator

● NLTK WordNet sentence similarity scores (Yuhua Li, David McLean, et al.)

● Swoogle comparator

● BLEU scores (computed on the Russian sentences, so no English translation is needed): bleu def 1-gram, bleu def 2-gram, bleu def 3-gram, bleu def 4-gram, bleu lin 1-gram, bleu lin 2-gram, bleu lin 3-gram, bleu lin 4-gram
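The Python difflib score can be obtained with `difflib.SequenceMatcher.ratio()`, which returns 2*M/T, where M is the number of matched elements and T the total number of elements in both sequences. The slides do not say whether the comparison ran over characters or tokens, so the sketch shows both; `difflib_similarity` is a hypothetical wrapper name.

```python
from difflib import SequenceMatcher
from typing import Hashable, Sequence


def difflib_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """SequenceMatcher.ratio() on two strings (characters) or two token lists."""
    return SequenceMatcher(None, a, b).ratio()


print(difflib_similarity("abcd", "bcde"))  # 0.75

s1 = "the cat sat on the mat".split()
s2 = "the cat sat on a mat".split()
print(round(difflib_similarity(s1, s2), 3))  # 0.833
```

On the token lists, "the cat sat on" and "mat" match (5 tokens out of 12 in total), giving 10/12.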

Results on Test Set

Task                  Accuracy   F1 macro   Place
Task 1 (Standard)     0.5695     0.5437     4 out of 11
Task 2 (Standard)     0.7153     0.7853     6 out of 10
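For reference, accuracy and macro-averaged F1 can be computed as below. The exact F1 definition used by the shared task is not given in the slides; this sketch assumes the unweighted mean of per-class F1 over all labels that occur, and the example labels are invented.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of pairs whose predicted class equals the gold class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


gold = ["Paraphrase", "Paraphrase", "Non-paraphrase", "Non-paraphrase"]
pred = ["Paraphrase", "Non-paraphrase", "Non-paraphrase", "Non-paraphrase"]
print(accuracy(gold, pred))            # 0.75
print(round(macro_f1(gold, pred), 4))  # 0.7333
```

In this toy example Non-paraphrase has F1 0.8 and Paraphrase has F1 2/3, so the macro average is about 0.7333.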

What Impact Did Each Toolkit Have?

[Figure: bar chart of Accuracy and F1 macro (%, y-axis from 66 to 82) for SEMILAR, DKPro Similarity, Swoogle, NLTK WordNet and Python difflib; 5-fold cross-validation results on the training set, second task. Plotted values range from 71.36 to 80.13.]

Which Translation Engine Is Better?

[Figure: 5-fold cross-validation results on the training set, second task, per translation engine. Toolkit codes on the x-axis: 1: SEMILAR, 2: DKPro Similarity, 3: Python difflib, 4: NLTK WordNet, 5: Swoogle, 6: all five toolkits together.]

Conclusion

With this algorithm, semantic similarity can be detected not only for Russian but for any language that the translation engines support.

Thank you!
