reputation network analysis for email filtering ravi emani ramesh ravindran
Reputation Network Analysis for Email Filtering
Ravi Emani, Ramesh Ravindran
Overview
An e-mail scoring mechanism based on a social network augmented with reputation ratings
An algorithm for inferring reputation ratings
Integration into a mail application: TrustMail
Preventing Spam
Goal: keep spam from ever reaching the user's mailbox
Methods:
- Whitelist filters
- Social networks
- Reputation networks connecting users
Whitelist Filters
Messages are accepted only from a list of approved addresses created by the user
Advantages
- No spam in the user's inbox
- Spam is filtered into a low-priority folder
Disadvantages
- Extra burden on the user, who must create and maintain the list
- Valid e-mail from senders who are not on the list is filtered as well
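A minimal sketch of the whitelist rule, assuming each message is a `(sender, subject)` pair and the approved list is a set of lowercase addresses; the function name `whitelist_filter` and the sample data are hypothetical. It also illustrates the second disadvantage: a legitimate message from a new sender lands in the low-priority folder.

```python
from typing import Iterable, List, Set, Tuple

Message = Tuple[str, str]  # (sender address, subject); hypothetical format


def whitelist_filter(messages: Iterable[Message],
                     approved: Set[str]) -> Tuple[List[Message], List[Message]]:
    """Split messages into (inbox, low_priority) using a user-made whitelist."""
    inbox: List[Message] = []
    low_priority: List[Message] = []
    for sender, subject in messages:
        # Only senders on the approved list reach the inbox.
        if sender.lower() in approved:
            inbox.append((sender, subject))
        else:
            low_priority.append((sender, subject))
    return inbox, low_priority


if __name__ == "__main__":
    approved = {"alice@example.com"}
    mail = [
        ("Alice@example.com", "Lunch?"),
        ("promo@spam.example", "You won!"),
        ("new.colleague@example.org", "Project kickoff"),  # valid, but unknown
    ]
    inbox, low = whitelist_filter(mail, approved)
    print([s for s, _ in inbox])  # ['Alice@example.com']
    print([s for s, _ in low])
```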
Social Networks
Proposed by Boykin and Roychowdhury
A social network is created from the messages received by the user
Messages are identified as spam, valid, or unknown using clustering thresholds and structural properties such as the propensity for local clustering
Classifies about 50% of a user's e-mail as either spam or valid; the rest remains unknown
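The local clustering property can be illustrated with the standard clustering coefficient of a node: the fraction of pairs of its neighbors that are themselves connected. This sketch assumes an undirected graph of e-mail addresses with an edge between addresses that appear together in a message's headers, and uses hypothetical thresholds `low` and `high`; the graph construction and thresholds in Boykin and Roychowdhury's method may differ. Nodes with fewer than two neighbors have no defined coefficient and stay unknown.

```python
from itertools import combinations
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]  # undirected: each edge stored in both directions


def clustering_coefficient(graph: Graph, node: str) -> Optional[float]:
    """Fraction of neighbor pairs of `node` that are directly linked."""
    neighbors = graph.get(node, set())
    k = len(neighbors)
    if k < 2:
        return None  # undefined for degree 0 or 1
    links = sum(1 for u, v in combinations(neighbors, 2)
                if v in graph.get(u, set()))
    return 2.0 * links / (k * (k - 1))


def classify(graph: Graph, node: str,
             low: float = 0.1, high: float = 0.5) -> str:
    """Hypothetical thresholds: tightly clustered -> valid, star-like -> spam."""
    c = clustering_coefficient(graph, node)
    if c is None:
        return "unknown"
    if c >= high:
        return "valid"
    if c <= low:
        return "spam"
    return "unknown"


if __name__ == "__main__":
    friends = ["alice", "bob", "carol", "dave"]
    g: Graph = {n: set(friends) - {n} for n in friends}  # everyone knows everyone
    g["spammer"] = {"x", "y", "z"}                         # hub with unrelated targets
    for victim in ("x", "y", "z"):
        g[victim] = {"spammer"}
    print(classify(g, "alice"), classify(g, "spammer"), classify(g, "x"))
    # valid spam unknown
```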
Optimization
An extension of whitelisting and social-network-based filtering
Uses a network that connects users
Users assign a 'reputation' or 'trust' score to the people they know
Results in a large reputation network connecting thousands of users
Each message in the inbox is shown with a score next to it, and messages can be sorted by that score
Overcomes the problems of whitelists
More reliable than whitelists, even though the user still bears the burden of creating an initial set of reputation ratings
Comparatively less work for the user
Creating the Reputation Network
Uses a distributed, web-based social network
Reputation ratings can be inferred from one user to another
Individuals are connected to each person they have rated
Results in a large, interconnected network of users
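The network described above is a directed, weighted graph: an edge runs from each individual to every person they rated, weighted by the rating. A minimal sketch, assuming each user publishes their own ratings as a separate document (modeled as a `(rater, {ratee: rating})` pair) on a 1 to 10 scale; the format, the scale and the helper `build_network` are all assumptions.

```python
from typing import Dict, Iterable, Tuple

Network = Dict[str, Dict[str, int]]  # network[rater][ratee] = rating


def build_network(published: Iterable[Tuple[str, Dict[str, int]]],
                  lo: int = 1, hi: int = 10) -> Network:
    """Merge per-user rating documents into one directed, weighted network."""
    network: Network = {}
    for rater, ratings in published:
        edges = network.setdefault(rater, {})
        for ratee, value in ratings.items():
            if not lo <= value <= hi:  # assumed rating scale
                raise ValueError(f"{rater} -> {ratee}: rating {value} out of range")
            edges[ratee] = value  # a later document overrides an earlier one
    return network


if __name__ == "__main__":
    docs = [
        ("alice", {"bob": 9}),     # e.g. fetched from alice's homepage
        ("bob", {"carol": 7}),     # e.g. fetched from bob's server
        ("alice", {"carol": 4}),   # alice's ratings may span several documents
    ]
    print(build_network(docs))
    # {'alice': {'bob': 9, 'carol': 4}, 'bob': {'carol': 7}}
```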
How Is It Related to the Semantic Web?
The only requirement is that individuals assert their reputation ratings for one another in the network
Individuals control their own data
Data is maintained in a distributed fashion
Data can be stored anywhere and integrated through a common foundation
Role of the Semantic Web
The Semantic Web and its component languages (RDF, RDFS, OWL) build on the existing web architecture
Supports distributed data management
Users create ontologies with classes and properties, and then instances of those classes
The instances of the classes describe data on the web
FOAF Project
The Friend-Of-A-Friend (FOAF) project, developed on the Semantic Web
An ontological vocabulary for describing people and their relationships
Extended with a mechanism for describing reputation relationships
Allows people to rate the reputation or trustworthiness of another person
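How ratings become Semantic Web data can be sketched with plain Python tuples standing in for RDF triples. Because a rating links three things (rater, ratee, value), this sketch assumes each rating is a blank node carrying three properties. The predicate names `ex:rater`, `ex:ratee` and `ex:value` are hypothetical placeholders, not the actual vocabulary of the FOAF reputation extension, and no RDF library is used.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical predicate names; the real FOAF reputation extension may differ.
RATER, RATEE, VALUE = "ex:rater", "ex:ratee", "ex:value"


def ratings_from_triples(triples: List[Triple]) -> Dict[str, Dict[str, int]]:
    """Turn reified rating statements into ratings[rater][ratee] = value.
    Other triples (such as foaf:knows) are ignored."""
    statements: Dict[str, Dict[str, str]] = defaultdict(dict)
    for s, p, o in triples:
        if p in (RATER, RATEE, VALUE):
            statements[s][p] = o          # group properties by blank node
    ratings: Dict[str, Dict[str, int]] = defaultdict(dict)
    for props in statements.values():
        if len(props) == 3:               # skip incomplete statements
            ratings[props[RATER]][props[RATEE]] = int(props[VALUE])
    return dict(ratings)


if __name__ == "__main__":
    data = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("_:r1", RATER, "ex:alice"), ("_:r1", RATEE, "ex:bob"), ("_:r1", VALUE, "9"),
        ("_:r2", RATER, "ex:bob"), ("_:r2", RATEE, "ex:carol"), ("_:r2", VALUE, "6"),
        ("_:r3", RATER, "ex:carol"),      # incomplete: no ratee or value
    ]
    print(ratings_from_triples(data))
    # {'ex:alice': {'ex:bob': 9}, 'ex:bob': {'ex:carol': 6}}
```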
Fig: The reputation network developed as part of the semantic web trust project at http://trust.mindswap.org.
Algorithms for Inferring Reputation between Individuals
Recommendations are made to one person (the source) about the reputation of another person (the sink)
The trust and reputation literature contains many different metrics
These metrics are categorized according to the perspective used for making calculations
Perspective in Reputation Inference Algorithms
Global metrics calculate a single value for each entity in the network
Local metrics calculate a personalized reputation rating for an individual from the perspective of a particular node
In a global system, an entity always has the same inferred rating
In a local system, an entity can be rated differently depending on the node for which the inference is made
Global metrics can be highly effective in situations where users' experiences are similar
Local metrics are appropriate where users' opinions about the same topic vary
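The difference between the two perspectives can be shown on a small hypothetical network in which B and C disagree about E, and A and D trust B and C in opposite ways. Both functions are deliberately simplified illustrations: the "global" one is a plain mean of received ratings, and the "local" one is a one-hop average weighted by the source's trust in each rater. Neither is the metric presented later in these slides.

```python
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # ratings[rater][ratee] = rating (1..10 assumed)

# Hypothetical network: A and D trust B and C in opposite ways,
# and B and C disagree about E.
RATINGS: Ratings = {
    "A": {"B": 10, "C": 1},
    "D": {"B": 1, "C": 10},
    "B": {"E": 10},
    "C": {"E": 1},
}


def global_rating(ratings: Ratings, target: str) -> Optional[float]:
    """Illustrative global metric: mean of all ratings `target` receives.
    The result is the same no matter who asks."""
    received = [r[target] for r in ratings.values() if target in r]
    return sum(received) / len(received) if received else None


def local_rating(ratings: Ratings, source: str, target: str) -> Optional[float]:
    """Illustrative one-hop local metric: the raters' opinions of `target`,
    weighted by how much `source` trusts each rater."""
    num = denom = 0.0
    for rater, trust in ratings.get(source, {}).items():
        opinion = ratings.get(rater, {}).get(target)
        if opinion is not None:
            num += trust * opinion
            denom += trust
    return num / denom if denom else None


if __name__ == "__main__":
    g = global_rating(RATINGS, "E")
    a = local_rating(RATINGS, "A", "E")
    d = local_rating(RATINGS, "D", "E")
    print(f"{g:.2f} {a:.2f} {d:.2f}")  # 5.50 9.18 1.82
```

E has one global value (5.5), while A, who trusts E's supporter B, sees E as highly reputable and D, who trusts E's critic C, does not.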
Fig: Example network of nodes A, B, C, D and E with edge ratings 10, 1, 9 and 10
Accurate Metrics for Inferring Reputation
The inferred rating from the source to the sink is a weighted average of the ratings that the source's neighbors give the sink
The reputation rating t from source i to sink s is written t_is
No inference is needed if the source is directly connected to the sink
Otherwise, the rating is calculated as a weighted average of the ratings returned for the sink by each of the source's n neighbors, where each neighbor j is weighted by the source's rating of j
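getRating(source, sink)
    mark source as seen
    if source has no rating for sink
        denom = 0
        num = 0
        for each j in neighbors(source)
            if j has not been seen
                j2sink = getRating(j, sink)
                if j2sink exists                                 // skip j if it cannot reach the sink
                    j2sink = min(rating(source, j), j2sink)      // never trust the sink more than j
                    denom += rating(source, j)                   // weight = source's rating of j
                    num += rating(source, j) * j2sink
                mark j unseen
        if denom > 0
            rating(source, sink) = num / denom
    return rating(source, sink)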
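A runnable Python sketch of getRating, assuming the network is a dict of dicts (`ratings[rater][ratee]`). Each neighbor's answer is capped with `min` and weighted by the source's rating of that neighbor, as the weighted-average description requires. Two choices here are assumptions rather than part of the slides: the function returns `None` when no path to the sink exists, and it does not write inferred ratings back into the network.

```python
from typing import Dict, Optional, Set

Ratings = Dict[str, Dict[str, float]]  # ratings[rater][ratee] = direct rating


def get_rating(ratings: Ratings, source: str, sink: str,
               seen: Optional[Set[str]] = None) -> Optional[float]:
    """Infer source's rating of sink; None if no path to the sink exists."""
    direct = ratings.get(source, {})
    if sink in direct:
        return direct[sink]          # directly connected: no inference needed
    if seen is None:
        seen = set()
    seen.add(source)                 # mark source as seen (prevents cycles)
    num = denom = 0.0
    for j, t_ij in direct.items():
        if j in seen:
            continue
        t_js = get_rating(ratings, j, sink, seen)
        seen.discard(j)              # mark j unseen so other paths may use it
        if t_js is None:
            continue                 # j cannot reach the sink
        num += t_ij * min(t_ij, t_js)  # never trust the sink more than j
        denom += t_ij                  # weight = source's rating of j
    return num / denom if denom > 0 else None


if __name__ == "__main__":
    ratings: Ratings = {"i": {"B": 10, "C": 2}, "B": {"s": 8}, "C": {"s": 10}}
    print(get_rating(ratings, "i", "s"))  # 7.0
```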
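The concise representation of how t_is is weighted is:

$$
t_{is} = \frac{\displaystyle\sum_{j=0}^{n} \begin{cases} t_{ij}\, t_{js} & \text{if } t_{ij} \ge t_{js} \\ t_{ij}^{2} & \text{if } t_{ij} < t_{js} \end{cases}}{\displaystyle\sum_{j=0}^{n} t_{ij}}
$$

Both cases equal t_ij · min(t_ij, t_js), so the numerator can also be written as the sum over j of t_ij · min(t_ij, t_js)
The condition in this formula caps each neighbor's contribution at the source's rating of that neighbor, so the source never trusts the sink more than the intermediate node through which the rating passes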
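A worked example with hypothetical ratings: source i rates its neighbors B and C as 10 and 2, and they rate the sink s as 8 and 10.

$$
t_{is} = \frac{t_{iB}\,t_{Bs} + t_{iC}^{2}}{t_{iB} + t_{iC}} = \frac{10 \cdot 8 + 2^{2}}{10 + 2} = \frac{84}{12} = 7
$$

Since t_iB = 10 ≥ t_Bs = 8, B contributes t_iB · t_Bs = 80; since t_iC = 2 < t_Cs = 10, C contributes only t_iC² = 4. C's high opinion of s counts for little because i trusts C only weakly.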
Reputation Metric Evaluation
To determine the accuracy of this metric:
For each individual i in the network, the direct reputation rating t_ij is recorded for each neighbor j
The connection from i to j is then removed, and the inferred rating t'_ij is recorded
Accuracy is measured by the difference |t_ij - t'_ij|; the smaller the difference, the more accurate the metric
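The leave-one-out procedure above can be sketched as follows. The block repeats a compact version of the inference rule so that it runs on its own; the helper names `infer` and `leave_one_out_errors`, the dict-of-dicts network and the sample ratings are hypothetical. Edges whose rating cannot be re-inferred once the direct link is removed are skipped, which is an assumption about how such pairs are handled.

```python
from statistics import mean
from typing import Dict, List, Optional, Set

Ratings = Dict[str, Dict[str, float]]


def infer(ratings: Ratings, source: str, sink: str,
          seen: Optional[Set[str]] = None) -> Optional[float]:
    """Weighted-average inference with the min() cap on each neighbor."""
    direct = ratings.get(source, {})
    if sink in direct:
        return direct[sink]
    seen = set() if seen is None else seen
    seen.add(source)
    num = denom = 0.0
    for j, t_ij in direct.items():
        if j in seen:
            continue
        t_js = infer(ratings, j, sink, seen)
        seen.discard(j)
        if t_js is not None:
            num += t_ij * min(t_ij, t_js)
            denom += t_ij
    return num / denom if denom > 0 else None


def leave_one_out_errors(ratings: Ratings) -> List[float]:
    """|t_ij - t'_ij| for every edge i -> j whose rating can be re-inferred
    after the direct edge is temporarily removed."""
    errors: List[float] = []
    for i, neighbors in ratings.items():
        for j, t_ij in list(neighbors.items()):
            del neighbors[j]                 # remove the connection i -> j
            try:
                inferred = infer(ratings, i, j)
            finally:
                neighbors[j] = t_ij          # restore it
            if inferred is not None:         # skip edges with no other path
                errors.append(abs(t_ij - inferred))
    return errors


if __name__ == "__main__":
    net: Ratings = {"i": {"B": 10, "C": 2, "s": 5}, "B": {"s": 8}, "C": {"s": 10}}
    errs = leave_one_out_errors(net)
    print(errs, mean(errs))  # [2.0] 2.0
```

Here only the edge i → s has an alternative path; its direct rating is 5 while the inferred rating is 7, giving an error of 2.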
TrustMail: A Prototype
Message scoring system
Adds reputation ratings to the folder view, next to each message
Lets the user sort messages by those ratings
Highlights important and relevant messages
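A minimal sketch of the folder-view behavior, assuming each sender's reputation score has already been computed (for example by the inference algorithm) and stored in a dict; the `Message` class, `score_and_sort` and the sample addresses are hypothetical and not TrustMail's actual code. Messages from senders with no score are kept but placed last.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    sender: str
    subject: str
    score: Optional[float] = None  # reputation of the sender, if known


def score_and_sort(messages: List[Message],
                   scores: Dict[str, float]) -> List[Message]:
    """Attach each sender's reputation score and order the folder view:
    highest score first, messages with no score at the end."""
    for m in messages:
        m.score = scores.get(m.sender)
    return sorted(messages,
                  key=lambda m: (m.score is None, -(m.score or 0.0)))


if __name__ == "__main__":
    inbox = [Message("a@example.com", "Status report"),
             Message("b@example.com", "Meeting moved"),
             Message("c@example.net", "Cheap pills")]
    # Hypothetical precomputed scores.
    scores = {"a@example.com": 4.0, "b@example.com": 9.5}
    for m in score_and_sort(inbox, scores):
        print(m.score, m.sender)
```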
Conclusion and Future Work
Our algorithm infers reputation relationships in a network
Benefit: valid e-mail from unknown people can receive high scores because of connections within the social network
Future work involves refining the algorithm for inferring reputation ratings
Future work may also involve developing and studying the TrustMail interface
The number of ratings received will change with the size of the network
Important issues to consider:
- Which techniques combine best with reputation filtering
- The percentage of messages that are accurately scored
References
Boykin, P. O., Roychowdhury, V. "Personal email networks: an effective anti-spam tool," http://www.arxiv.org/abs/cond-mat/0402143, 2004.
http://sites.wiwiss.fu-berlin.de/suhl/bizer/SWTSGuide/
RDFWeb. "FOAF: The Friend of a Friend Vocabulary," http://xmlns.com/foaf/0.1/
Golbeck, Jennifer, Bijan Parsia, James Hendler. "Trust Networks on the Semantic Web."
Richardson, Matthew, Rakesh Agrawal, Pedro Domingos. "Trust Management for the Semantic Web," Proceedings of the Second International Semantic Web Conference, Sanibel Island, Florida, 2003.