Plagiarism Checker

Upload: hossam-nabih

Post on 18-Jul-2015

Category:

Data & Analytics


TRANSCRIPT

Page 1: Plagiarism checker

Plagiarism Checker

What is Plagiarism?

• to steal and pass off (the ideas or words of another) as one's own

• to use (another's production) without crediting the source

• to commit literary theft

• to present as new and original an idea or product derived from an existing source

Not just copying or borrowing

Types of Plagiarism

CLONE: Submitting another’s work, word-for-word, as one’s own

CTRL-C: Contains significant portions of text from a single source without alterations

FIND-REPLACE: Changing key words and phrases but retaining the essential content of the source

REMIX: Paraphrases from multiple sources, made to fit together

RECYCLE: Borrows generously from the writer’s previous work without citation

HYBRID: Combines perfectly cited sources with copied passages without citation

MASHUP: Mixes copied material from multiple sources

404 ERROR: Includes citations to non-existent or inaccurate information about sources

AGGREGATOR: Includes proper citation to sources but the paper contains almost no original work

RE-TWEET: Includes proper citation, but relies too closely on the text’s original wording and/or structure

Algorithm

How to do it practically

Document 1

• A document is a written, drawn, presented or recorded representation of thoughts. Originating from the Latin Documentum meaning lesson -the verb doceō means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence. In the computer age, a document is usually used to describe a primarily textual file, along with its structure and design, such as fonts, colors and additional images.

Document 2

• A document is a written, drawn, presented or recorded representation of thoughts. Originating from the Latin Documentum meaning lesson -the verb doceō means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence. In the computer age, a document is usually used to describe a primarily textual file, along with its structure and design, such as fonts, colors and additional images.

Threshold

• Algorithm 1 (document level)

• Algorithm 2 (paragraph level)

• Algorithm 3 (sentence level)

• Lexical semantics: Lesk, WordNet

Two input documents

• Input : DocA, DocB // Two input documents

• Output: similarity

• Begin

• DocMinSize = min (|DocA|, |DocB|)

• DocIntersectionSize = |DocA ∩ DocB|

• If (DocIntersectionSize >= DocMinSize*DocThreshold)

• Then

• //Possible similarity

• //Check similarity at paragraph level

• similarity = true

• Else

• similarity = false

• End
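
The document-level test above could be sketched in Python as follows. This is only an illustration, not the team's implementation: it assumes |Doc| means the number of distinct lowercase words in the document, and the names tokens and possibly_similar and the default threshold of 0.5 are hypothetical.

```python
import re

def tokens(text):
    """Set of distinct lowercase words (assumed meaning of |Doc| in the slide)."""
    return set(re.findall(r"\w+", text.lower()))

def possibly_similar(text_a, text_b, threshold=0.5):
    """Overlap test from the pseudocode: |A ∩ B| >= min(|A|, |B|) * threshold."""
    a, b = tokens(text_a), tokens(text_b)
    min_size = min(len(a), len(b))
    if min_size == 0:
        return False  # an empty document cannot match anything
    return len(a & b) >= min_size * threshold

# Hypothetical sample texts, loosely based on the example document in the slides.
doc_a = "A document is a written, drawn, presented or recorded representation of thoughts."
doc_b = "A document is a written or recorded representation of ideas."
doc_c = "Rivers carry sediment downstream toward the sea."

print(possibly_similar(doc_a, doc_b))  # shares 8 of the 9 distinct words in doc_b
print(possibly_similar(doc_a, doc_c))  # shares no words
```

Dividing by the smaller document keeps a short copied passage from being hidden inside a much longer text.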

Two input paragraphs

• Input : ParA, ParB // Two input paragraphs

• Output: similarity

• Begin

• ParMinSize = min (|ParA|, |ParB|)

• ParIntersectionSize = |ParA ∩ ParB|

• If (ParIntersectionSize >= ParMinSize*ParThreshold)

• Then

• //Possible similarity

• //Check similarity at sentence level

• similarity = true

• Else

• similarity = false

• End
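
The comments "Check similarity at paragraph level" and "Check similarity at sentence level" suggest a coarse-to-fine cascade: the cheap document test filters out unrelated pairs before paragraphs are compared. A minimal sketch of the first two stages, assuming paragraphs are separated by blank lines and sizes are counts of distinct words. All names and threshold values here are hypothetical.

```python
import re

def words(text):
    """Set of distinct lowercase words in a text."""
    return set(re.findall(r"\w+", text.lower()))

def overlaps(a, b, threshold):
    """Test shared by every level: |A ∩ B| >= min(|A|, |B|) * threshold."""
    smaller = min(len(a), len(b))
    return smaller > 0 and len(a & b) >= smaller * threshold

def suspicious_paragraph_pairs(doc_a, doc_b, doc_threshold=0.3, par_threshold=0.6):
    """Return (i, j) index pairs of paragraphs that pass the paragraph-level test."""
    if not overlaps(words(doc_a), words(doc_b), doc_threshold):
        return []  # document-level filter failed: skip the finer comparison
    pars_a = [p for p in doc_a.split("\n\n") if p.strip()]
    pars_b = [p for p in doc_b.split("\n\n") if p.strip()]
    return [
        (i, j)
        for i, par_a in enumerate(pars_a)
        for j, par_b in enumerate(pars_b)
        if overlaps(words(par_a), words(par_b), par_threshold)
    ]

# Hypothetical two-paragraph documents; only the second paragraphs resemble each other.
doc_a = "Plagiarism is literary theft.\n\nWordNet groups word senses into synonym sets."
doc_b = "The weather was mild today.\n\nWordNet groups the word senses into synsets."

print(suspicious_paragraph_pairs(doc_a, doc_b))
```

Each flagged pair would then be passed on to the sentence-level check.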

Sentence level

• Algorithm 3: Sentence level heuristic

• Input : SenA, SenB

• Output: similarity, similar substrings in SenA and SenB

• Begin

• SenMinSize = min(|SenA|, |SenB|)

• SenIntersectionSize = |SenA ∩ SenB|

• If (SenIntersectionSize >= SenMinSize*SenThreshold)

• Then

• //Similarity detected

• //Determine similar

• //substrings

• similarity = true

• Else

• similarity = false

• End
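
The slide does not say how the similar substrings are determined. One possible way, swapped in here purely for illustration, is Python's difflib.SequenceMatcher, which reports runs of words that appear in the same order in both sentences. The function name sentence_similarity, the min_run parameter, and the sample sentences are assumptions.

```python
import re
from difflib import SequenceMatcher

def sentence_similarity(sen_a, sen_b, threshold=0.5, min_run=2):
    """Return (similar, runs): the overlap verdict and the shared word runs."""
    a = re.findall(r"\w+", sen_a.lower())
    b = re.findall(r"\w+", sen_b.lower())
    smaller = min(len(set(a)), len(set(b)))
    if smaller == 0 or len(set(a) & set(b)) < smaller * threshold:
        return False, []
    # Swapped-in technique: difflib locates matching word runs in order.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    runs = [
        " ".join(a[m.a : m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_run  # ignore single shared words
    ]
    return True, runs

sen_a = "In the computer age, a document is usually a primarily textual file."
sen_b = "Nowadays a document is usually a primarily textual file on disk."

similar, runs = sentence_similarity(sen_a, sen_b)
print(similar, runs)
```

The returned runs are the kind of detail a final report could highlight for the reader.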

WordNet

• A very large lexical database of English:

– 117K nouns, 11K verbs, 22K adjectives, 4.5K adverbs

• Word senses grouped into synonym sets (“synsets”) linked into a conceptual-semantic hierarchy

– 82K noun synsets, 13K verb synsets, 18K adjective synsets, 3.6K adverb synsets

– Avg. # of senses: 1.23/noun, 2.16/verb, 1.41/adj, 1.24/adverb

• Conceptual-semantic relations

– hypernym/hyponym

Lesk algorithm

Compare the context with the dictionary definition of the sense

– Construct the signature of a word in context by the signatures of its senses in the dictionary

• Signature = set of context words (in examples/gloss or in context)

– Assign the dictionary sense whose gloss and examples are the most similar to the context in which the word occurs

• Similarity = size of intersection of context signature and sense signature
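
Building a sense signature could look like the sketch below. It is a simplification under stated assumptions: the stop-word list is a tiny hand-picked one, words are not lemmatized (so "accepts" stays "accepts"), and the function name signature is hypothetical.

```python
import re

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "he", "in", "into", "my",
              "of", "on", "that", "the", "they", "up"}

def signature(target, gloss, examples):
    """Set of content words from a sense's gloss and examples, minus the target word."""
    text = " ".join([gloss, *examples]).lower()
    found = re.findall(r"[a-z]+", text)
    return {w for w in found if w not in STOP_WORDS and w != target}

sig = signature(
    "bank",
    "a financial institution that accepts deposits and channels the money into lending activities",
    ["he cashed the check at the bank", "that bank holds the mortgage on my home"],
)
print(sorted(sig))
```

A lemmatizer would be needed to reduce the inflected forms to base forms such as "accept" and "deposit".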

Sense signatures

bank1

Gloss: a financial institution that accepts deposits and channels the money into lending activities

Examples: “he cashed the check at the bank”, “that bank holds the mortgage on my home”

bank2

Gloss: sloping land (especially the slope beside a body of water)

Examples: “they pulled the canoe up on the bank”, “he sat on the bank of the river and watched the current”

Signature(bank1) = {financial, institution, accept, deposit, channel, money, lend, activity, cash, check, hold, mortgage, home}

Signature(bank2) = {slope, land, body, water, pull, canoe, sit, river, watch, current}
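
With the two signatures for "bank" listed above, a simplified Lesk step reduces to counting shared words. The sketch below assumes the context words are already lemmatized in the same way as the signatures. The function name simplified_lesk and the sample contexts are hypothetical.

```python
# Sense signatures for the two senses of "bank", copied from the slide.
SIGNATURES = {
    "bank1": {"financial", "institution", "accept", "deposit", "channel", "money",
              "lend", "activity", "cash", "check", "hold", "mortgage", "home"},
    "bank2": {"slope", "land", "body", "water", "pull", "canoe", "sit",
              "river", "watch", "current"},
}

def simplified_lesk(context_words, signatures):
    """Pick the sense whose signature shares the most words with the context."""
    context = set(context_words)
    scores = {sense: len(context & sig) for sense, sig in signatures.items()}
    best = max(scores, key=scores.get)  # on a tie, the first listed sense wins
    return best, scores

# Assumed to be lemmatized already, like the signatures.
context = ["she", "deposit", "money", "in", "the", "bank", "to", "pay", "the", "mortgage"]
sense, scores = simplified_lesk(context, SIGNATURES)
print(sense, scores)
```

In a plagiarism checker, resolving the sense first lets a later step compare words by meaning, for example through WordNet synsets, rather than by spelling alone.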

Final Result

Unique

The result may also include a detailed report.

Team Members NLP

Eslam Hamouda

Ahmed Wahdan

Hossam Nabih

Mohamed Shalan

Demo

Thank You