Plagiarism Checker
What is Plagiarism?
• to steal and pass off (the ideas or words of another) as one's own
• to use (another's production) without crediting the source
• to commit literary theft
• to present as new and original an idea or product derived from an existing source
Not just copying or borrowing
Types of Plagiarism
• CLONE: Submitting another’s work, word-for-word, as one’s own
• CTRL-C: Contains significant portions of text from a single source without alterations
• FIND-REPLACE: Changing key words and phrases but retaining the essential content of the source
• REMIX: Paraphrases from multiple sources, made to fit together
• RECYCLE: Borrows generously from the writer’s previous work without citation
• HYBRID: Combines perfectly cited sources with copied passages without citation
• MASHUP: Mixes copied material from multiple sources
• 404 ERROR: Includes citations to non-existent or inaccurate information about sources
• AGGREGATOR: Includes proper citation to sources but the paper contains almost no original work
• RE-TWEET: Includes proper citation, but relies too closely on the text’s original wording and/or structure
Algorithm
How to do it practically
Document 1
• A document is a written, drawn, presented or recorded representation of thoughts. Originating from the Latin Documentum meaning lesson -the verb doceō means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence. In the computer age, a document is usually used to describe a primarily textual file, along with its structure and design, such as fonts, colors and additional images.
Document 2
• A document is a written, drawn, presented or recorded representation of thoughts. Originating from the Latin Documentum meaning lesson -the verb doceō means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence. In the computer age, a document is usually used to describe a primarily textual file, along with its structure and design, such as fonts, colors and additional images.
Threshold
• Algorithm 1 (document level)
• Algorithm 2 (paragraph level)
• Algorithm 3 (sentence level)
• Lexical semantics: Lesk algorithm, WordNet
Two input documents
• Input : DocA, DocB // Two input documents
• Output: similarity
• Begin
• DocMinSize = min (|DocA|, |DocB|)
• DocIntersectionSize = |DocA ∩ DocB|
• If (DocIntersectionSize >= DocMinSize*DocThreshold)
• Then
• //Possible similarity
• //Check similarity at paragraph level
• similarity = true
• Else
• similarity = false
• End
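A minimal Python sketch of the document-level check above. It assumes that |DocA| counts the distinct lowercase word tokens of a document, which the slides do not specify; the `word_set` helper, the regular-expression tokenizer, and the default `doc_threshold` of 0.5 are illustrative choices, not values from the presentation.

```python
import re

def word_set(text):
    """Assumed tokenization: lowercase the text and keep distinct word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def document_similar(doc_a, doc_b, doc_threshold=0.5):
    """Algorithm 1: True means 'possible similarity, check the paragraphs'."""
    words_a, words_b = word_set(doc_a), word_set(doc_b)
    doc_min_size = min(len(words_a), len(words_b))
    if doc_min_size == 0:
        return False  # an empty document gives nothing to compare
    doc_intersection_size = len(words_a & words_b)
    return doc_intersection_size >= doc_min_size * doc_threshold

doc_a = "A document is a written, drawn, presented or recorded representation of thoughts."
doc_b = "A document is a recorded representation of thoughts and ideas."
doc_c = "The river bank was covered in wild flowers."

print(document_similar(doc_a, doc_b))  # True: 7 shared words >= 9 * 0.5
print(document_similar(doc_a, doc_c))  # False: no shared words
```

Comparing against the smaller document means a short text copied into a much longer one is still flagged.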
Two input paragraphs
• Input : ParA, ParB // Two input paragraphs
• Output: similarity
• Begin
• ParMinSize = min (|ParA|, |ParB|)
• ParIntersectionSize = |ParA ∩ ParB|
• If (ParIntersectionSize >= ParMinSize*ParThreshold)
• Then
• //Possible similarity
• //Check similarity at sentence level
• similarity = true
• Else
• similarity = false
• End
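The paragraph-level step can be sketched as a search over paragraph pairs. The blank-line paragraph split, the function names, and the `par_threshold` default of 0.6 are assumptions made for illustration; only the min-size and intersection test comes from the algorithm above.

```python
import re

def word_set(text):
    """Assumed tokenization: distinct lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def paragraphs_similar(par_a, par_b, par_threshold=0.6):
    """Algorithm 2: True means 'possible similarity, check the sentences'."""
    words_a, words_b = word_set(par_a), word_set(par_b)
    par_min_size = min(len(words_a), len(words_b))
    if par_min_size == 0:
        return False
    return len(words_a & words_b) >= par_min_size * par_threshold

def candidate_paragraph_pairs(doc_a, doc_b, par_threshold=0.6):
    """Return (index_in_a, index_in_b) for every paragraph pair that passes."""
    # Assumption: paragraphs are separated by one or more blank lines.
    pars_a = [p for p in re.split(r"\n\s*\n", doc_a) if p.strip()]
    pars_b = [p for p in re.split(r"\n\s*\n", doc_b) if p.strip()]
    return [
        (i, j)
        for i, par_a in enumerate(pars_a)
        for j, par_b in enumerate(pars_b)
        if paragraphs_similar(par_a, par_b, par_threshold)
    ]

doc_a = "Plagiarism is literary theft.\n\nA document is a recorded representation of thoughts."
doc_b = "A document is a written representation of thoughts.\n\nBanks accept deposits."

print(candidate_paragraph_pairs(doc_a, doc_b))  # [(1, 0)]
```

Only the pairs returned here would need to be passed on to the sentence-level check.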
Sentence level
• Algorithm 3: Sentence level heuristic
• Input : SenA, SenB
• Output: similarity, similar substrings in SenA and SenB
• Begin
• SenMinSize = min(|SenA|, |SenB|)
• SenIntersectionSize = |SenA ∩ SenB|
• If (SenIntersectionSize >= SenMinSize*SenThreshold)
• Then
• //Similarity detected
• //Determine similar
• //substrings
• similarity = true
• Else
• similarity = false
• End
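A possible sketch of the sentence-level heuristic, including the "similar substrings" output. The slides do not say how the substrings are determined; this example assumes they are runs of consecutive matching words and finds them with the standard library's `difflib.SequenceMatcher`. The `sen_threshold` and `min_run` defaults are hypothetical.

```python
import re
from difflib import SequenceMatcher

def tokens(sentence):
    """Assumed tokenization: lowercase word tokens, order preserved."""
    return re.findall(r"\w+", sentence.lower())

def sentence_similarity(sen_a, sen_b, sen_threshold=0.7, min_run=2):
    """Algorithm 3: return (similarity, shared word runs of at least min_run words)."""
    words_a, words_b = tokens(sen_a), tokens(sen_b)
    set_a, set_b = set(words_a), set(words_b)
    sen_min_size = min(len(set_a), len(set_b))
    if sen_min_size == 0 or len(set_a & set_b) < sen_min_size * sen_threshold:
        return False, []
    # Similarity detected: collect the runs of consecutive matching words.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    runs = [
        " ".join(words_a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_run
    ]
    return True, runs

sen_a = "In the computer age, a document is a primarily textual file."
sen_b = "Nowadays a document is a primarily textual file on a computer."

print(sentence_similarity(sen_a, sen_b))
# (True, ['a document is a primarily textual file'])
```

The returned runs are the kind of detail that could be highlighted in a final report.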
WordNet
• A very large lexical database of English:
– 117K nouns, 11K verbs, 22K adjectives, 4.5K adverbs
• Word senses grouped into synonym sets (“synsets”) linked into a conceptual-semantic hierarchy
– 82K noun synsets, 13K verb synsets, 18K adjective synsets, 3.6K adverb synsets
– Avg. # of senses: 1.23/noun, 2.16/verb, 1.41/adj, 1.24/adverb
• Conceptual-semantic relations
– hypernym/hyponym
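To illustrate why synsets help against FIND-REPLACE plagiarism, here is a toy sketch. `TOY_SYNSETS` is a hand-made, hypothetical stand-in for WordNet (it is not the real database or its API), and how the presented system actually combines WordNet with the intersection test is an assumption.

```python
# Hypothetical mini-lexicon: word -> synset label (NOT real WordNet data access).
TOY_SYNSETS = {
    "car": "car.n.01", "automobile": "car.n.01",
    "big": "large.a.01", "large": "large.a.01",
    "buy": "buy.v.01", "purchase": "buy.v.01",
}

def normalize(words):
    """Map each word to its synset label, falling back to the word itself."""
    return {TOY_SYNSETS.get(w, w) for w in words}

original = ["he", "will", "buy", "a", "big", "car"]
reworded = ["he", "will", "purchase", "a", "large", "automobile"]

surface_overlap = len(set(original) & set(reworded))
synset_overlap = len(normalize(original) & normalize(reworded))

print(surface_overlap)  # 3: only "he", "will", "a" match literally
print(synset_overlap)   # 6: the three swapped synonyms now match too
```

This toy mapping gives every word exactly one sense. Real words are often ambiguous, which is the problem the Lesk algorithm addresses.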
Lesk algorithm
Compare the context with the dictionary definition of the sense
– Construct the signature of a word in context by the signatures of its senses in the dictionary
• Signature = set of context words (in examples/gloss or in context)
– Assign the dictionary sense whose gloss and examples are the most similar to the context in which the word occurs
• Similarity = size of intersection of context signature and sense signature
Sense signatures
bank1
Gloss: a financial institution that accepts deposits and channels the money into lending activities
Examples: “he cashed the check at the bank”, “that bank holds the mortgage on my home”
bank2
Gloss: sloping land (especially the slope beside a body of water)
Examples: “they pulled the canoe up on the bank”, “he sat on the bank of the river and watched the current”
Signature(bank1) = {financial, institution, accept, deposit, channel, money, lend, activity, cash, check, hold, mortgage, home}
Signature(bank2) = {slope, land, body, water, pull, canoe, sit, river, watch, current}
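The signature comparison can be sketched as a simplified Lesk in a few lines of Python. The two signatures are copied from the slide; the function name and the example contexts are illustrative, and the sketch assumes the context words are already lemmatized with stop words removed, as the slide's signatures are.

```python
# Sense signatures for the two senses of "bank", taken from the slide.
SIGNATURES = {
    "bank1": {"financial", "institution", "accept", "deposit", "channel", "money",
              "lend", "activity", "cash", "check", "hold", "mortgage", "home"},
    "bank2": {"slope", "land", "body", "water", "pull", "canoe", "sit",
              "river", "watch", "current"},
}

def simplified_lesk(context_words, signatures):
    """Pick the sense whose signature shares the most words with the context."""
    context = set(context_words)
    scores = {sense: len(context & sig) for sense, sig in signatures.items()}
    best = max(scores, key=scores.get)  # on a tie, the first listed sense wins
    return best, scores

# Assumed preprocessing: contexts are lemmatized content words.
money_context = ["go", "bank", "deposit", "money", "mortgage"]
river_context = ["sit", "bank", "watch", "river"]

print(simplified_lesk(money_context, SIGNATURES))
# ('bank1', {'bank1': 3, 'bank2': 0})
print(simplified_lesk(river_context, SIGNATURES))
# ('bank2', {'bank1': 0, 'bank2': 3})
```

The tie-breaking rule here is arbitrary; a fuller implementation might fall back to the most frequent sense instead.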
Final Result
• Unique
• May also include a report with details
Team Members NLP
Eslam Hamouda
Ahmed Wahdan
Hossam Nabih
Mohamed Shalan
Demo
Thank You