1
Lu Yang, Biplab Sarker, Virendrakumar C. Bhavsar and Harold Boley
[email protected]
Faculty of Computer Science
University of New Brunswick (UNB)
Fredericton, Canada
IICAI, December 20, 2005
Range Similarity Measures between Buyers and Sellers in e-Marketplaces
2
Agenda
• Motivation
• Partonomy Tree Similarity Algorithm
  • Tree representation
  • Partonomy similarity
  • Non-semantic matching on nodes
• Semantic Matching
  • Inner nodes vs. leaf nodes
  • Global similarity measure (for inner nodes)
    • Taxonomic class similarity
    • Encoding subtaxonomies into partonomy trees
  • Local similarity measures (for leaf nodes)
• Conclusion
3
Motivation
• e-business, e-learning …
• Buyer-Seller matching
• Metadata for buyers and sellers
  • Keywords/keyphrases
  • Trees
• Tree similarity
[Figure: e-Market architecture. Labels: User, Web Browser, Main Server, User Info, User Profiles, User Agents, Agents, Matcher 1 … Matcher n, To other sites (network)]
4
Partonomy Tree Similarity Algorithm─ Tree Representation
• Tree representation for product/service descriptions [Bhavsar et al. 2004]
• Characteristics of our trees
• Node-labeled, arc-labeled and arc-weighted
• Sibling arcs are labeled in lexicographical order
• Sibling arc weights sum to 1.0
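A simple example “Car” tree:
[Figure: tree with root node Car and three arcs labeled Color, Make, and Year leading to the leaf nodes Black, Ford, and 2002; the arc weights shown are 0.3, 0.2, and 0.5]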
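The tree characteristics listed on this slide can be sketched in Python as follows. This is only an illustration, not the authors' implementation: the `Node` class and its method names are hypothetical, and the assignment of the weights 0.3, 0.2, and 0.5 to the Make, Color, and Year arcs is an assumption, since the extracted figure does not show which weight belongs to which arc.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Node:
    """A labeled node; `arcs` maps an arc label to (arc weight, child node)."""
    label: str
    arcs: dict = field(default_factory=dict)

    def add_arc(self, arc_label, weight, child):
        self.arcs[arc_label] = (weight, child)
        return self

    def sorted_arcs(self):
        # Sibling arcs are visited in lexicographical order of their labels.
        return sorted(self.arcs.items(), key=lambda item: item[0])

    def is_valid(self):
        # A leaf is always valid; an inner node needs sibling weights summing to 1.0.
        if not self.arcs:
            return True
        total = sum(weight for weight, _ in self.arcs.values())
        return math.isclose(total, 1.0) and all(
            child.is_valid() for _, child in self.arcs.values()
        )


# Assumed weight assignment (hypothetical): Make 0.3, Color 0.2, Year 0.5.
car = Node("Car")
car.add_arc("Make", 0.3, Node("Ford"))
car.add_arc("Color", 0.2, Node("Black"))
car.add_arc("Year", 0.5, Node("2002"))

print([label for label, _ in car.sorted_arcs()])
print(car.is_valid())
```

Keeping the arcs in a dictionary and sorting on access is one simple way to guarantee the lexicographical sibling order regardless of insertion order.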
5
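Partonomy Tree Similarity Algorithm─ Similarity Algorithm
• Partonomy similarity [Bhavsar et al. 2004]
$\sum_i s_i (w_i + w'_i)/2$
$\sum_i A(s_i)(w_i + w'_i)/2$, with $A(s_i) \ge s_i$
where $s_i$ is the similarity of the $i$-th pair of subtrees and $w_i$, $w'_i$ are the corresponding arc weights in t and t′
[Figure: two learning object trees t and t′, both rooted at “lom”. t has arcs educational, general, and technical (weights 0.3334, 0.3333, 0.3333) leading to the nodes edu-set, gen-set, and tec-set; its leaves are language = en, title = “Introduction to Oracle”, format = HTML, and platform = WinXP, each on an arc of weight 0.5. t′ has arcs general and technical (weights 0.7 and 0.3) leading to gen-set and tec-set; its leaves are language = en, title = “Basic Oracle”, format = *, and platform = WinXP, with leaf arc weights 0.1, 0.9, 0.8, and 0.2.]
* : Don’t Care
Fragments of learning object trees [Boley et al. 2005] for learning object matching (http://www.cs.unb.ca/agentmatcher)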
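A minimal Python sketch of the weighted sum on this slide is given below. It is a simplification under stated assumptions, not the published algorithm: trees are plain `(label, arcs)` tuples, node labels are compared by exact string match, arcs missing from one tree contribute nothing, the “Don’t Care” label is not handled, and the adjustment function `adjust` (standing in for A) defaults to the identity. The function name and the second car tree are hypothetical.

```python
def tree_similarity(t1, t2, adjust=lambda s: s):
    """Similarity of two trees given as (label, {arc_label: (weight, subtree)})."""
    label1, arcs1 = t1
    label2, arcs2 = t2
    if label1 != label2:
        return 0.0  # different node labels: no similarity
    if not arcs1 and not arcs2:
        return 1.0  # two identical leaves
    total = 0.0
    # Only arcs present in both trees are compared (a simplification).
    for arc in sorted(arcs1.keys() & arcs2.keys()):
        w1, sub1 = arcs1[arc]
        w2, sub2 = arcs2[arc]
        s = tree_similarity(sub1, sub2, adjust)
        total += adjust(s) * (w1 + w2) / 2
    return total


def leaf(label):
    return (label, {})


# Hypothetical example trees; the weights are illustrative only.
car_a = ("Car", {"Color": (0.2, leaf("Black")),
                 "Make": (0.3, leaf("Ford")),
                 "Year": (0.5, leaf("2002"))})
car_b = ("Car", {"Color": (0.4, leaf("Red")),
                 "Make": (0.4, leaf("Ford")),
                 "Year": (0.2, leaf("2002"))})

score = tree_similarity(car_a, car_b)
print(round(score, 4))
```

The recursion goes down both trees in step and sums the weighted subtree similarities on the way back up, which mirrors the top-down traversal and bottom-up aggregation named in the conclusion.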
6
Partonomy Tree Similarity Algorithm─ Non-Semantic Matching
• Non-semantic matching on both inner and leaf nodes
  • Exact string matching
    binary result 0.0 or 1.0
  • Permutation of strings
    “Java Programming” vs “Programming in Java”
    $\text{similarity} = \dfrac{\text{number of identical words}}{\text{maximum length of the two strings (in words)}}$
Example 1:
For two node labels “a b c” and “a b d e”, their similarity is $2/4 = 0.5$
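The string-permutation measure can be sketched in a few lines of Python. The function name is hypothetical, and two details are assumptions not stated on the slide: words are compared case-sensitively, and repeated words are counted once.

```python
def word_overlap_similarity(a: str, b: str) -> float:
    """Number of identical words divided by the word count of the longer label."""
    words_a, words_b = a.split(), b.split()
    if not words_a or not words_b:
        return 0.0  # assumption: an empty label matches nothing
    identical = len(set(words_a) & set(words_b))
    return identical / max(len(words_a), len(words_b))


print(word_overlap_similarity("a b c", "a b d e"))
print(word_overlap_similarity("Java Programming", "Programming in Java"))
```

The first call reproduces Example 1 (2 shared words out of 4); the second shows that word order does not matter, since both labels share “Java” and “Programming”.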
7
Partonomy Tree Similarity Algorithm─ Non-Semantic Matching
Example 2: Node labels “electric chair” and “committee chair”
$1/2 = 0.5$: meaningful?
• Semantic matching techniques are needed for the above problems
8
Semantic Matching
• Inner nodes vs. leaf nodes
• Inner nodes: class-oriented
  • Inner node labels can be classes
  • Classes are located in a taxonomy tree
  • Taxonomic class similarity measure (global similarity measure)
• Leaf nodes: type-oriented
  • Address, currency, date, price and so on
  • Type similarity measures (local similarity measures)
9
Semantic Matching (Cont'd)
• Non-Semantic Matching
  • Exact String Matching (both inner and leaf nodes)
  • String Permutation (both inner and leaf nodes)
• Semantic Matching
  • Taxonomic Class Similarity (inner nodes)
  • Type Similarity (leaf nodes)
10
Semantic Matching ─ Global Similarity
• Global similarity measure (for inner nodes) [Yang et al. 2005]
[Figure: two partonomy trees t1 and t2. t1 has root “Distributed Programming” and arcs Credit, Duration, Textbook, and Tuition leading to the leaves 3, 2months, “Introduction to Distributed Programming”, and $800; the arc weights shown are 0.2, 0.1, 0.3, and 0.4. t2 has root “Object-Oriented Programming” and the same four arcs leading to the leaves 3, 3months, “Object-Oriented Programming Essentials”, and $1000; the arc weights shown are 0.1, 0.5, 0.2, and 0.2.]
11
Semantic Matching ─ A Taxonomy Tree
• The taxonomy tree of “Programming Techniques” according to the ACM Computing Classification System (http://www.acm.org/class/1998/ccs98.txt)
[Figure: taxonomy tree rooted at Programming Techniques with the child classes General, Applicative Programming, Automatic Programming, Concurrent Programming, Sequential Programming, and Object-Oriented Programming; Concurrent Programming has the child classes Distributed Programming and Parallel Programming. The eight arc weights shown are 0.6, 0.5, 0.8, 0.5, 0.9, 0.7, 0.7, and 0.5.]
12
Semantic Matching ─ Taxonomic Class Similarity
• The arc weights can be determined by human experts or machine learning algorithms [Singh 2005]
• Sibling arc weights do not need to add up to 1
• Three factors that affect the taxonomic class similarity
  • The shortest path length between two classes
  • Arc weights on the shortest path
  • Level difference of two classes
13
Semantic Matching ─ Taxonomic Class Similarity
• Taxonomic class similarity computation [Yang et al. 2005]
$TS(c_1, c_2) = \left(1 - \dfrac{N_s}{N_t}\right) \cdot M \cdot G^{|d_{c_1} - d_{c_2}|}$
where
TS(c1, c2) is the taxonomic class similarity of classes c1 and c2
Ns: the number of edges of the shortest path
Nt: the number of edges of the whole tree
M: the product of the arc weights on the shortest path
$G^{|d_{c_1} - d_{c_2}|}$: the level difference factor, where G’s value is in (0.0, 1.0) and $|d_{c_1} - d_{c_2}|$ is the absolute difference of the depths of classes c1 and c2 (we assume G = 0.5 here)
14
Semantic Matching ─ Taxonomic Class Similarity
Example
$TS(\text{Distributed Programming}, \text{Object-Oriented Programming}) = \left(1 - \dfrac{3}{8}\right) \cdot (0.7 \cdot 0.5 \cdot 0.7) \cdot 0.5^{|2 - 1|} = 0.0766$
[Figure: the “Programming Techniques” taxonomy tree from slide 11, with red arrows starting at Distributed Programming and at Object-Oriented Programming]
• The red arrows stop at the nearest common ancestor of the two classes
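The taxonomic class similarity can be checked numerically with a short Python sketch. The function and parameter names are hypothetical; the caller is assumed to have already found the shortest path and to pass its arc weights, the total edge count of the tree, and the depths of the two classes.

```python
import math


def taxonomic_similarity(path_weights, total_edges, depth1, depth2, g=0.5):
    """TS = (1 - Ns/Nt) * M * G^|d1 - d2| for two classes in a taxonomy tree."""
    n_s = len(path_weights)        # Ns: number of edges on the shortest path
    m = math.prod(path_weights)    # M: product of the arc weights on that path
    level_factor = g ** abs(depth1 - depth2)
    return (1 - n_s / total_edges) * m * level_factor


# Values from the worked example: a 3-edge path with weights 0.7, 0.5, 0.7
# in a tree of 8 edges, with the two classes at depths 2 and 1.
ts = taxonomic_similarity([0.7, 0.5, 0.7], total_edges=8, depth1=2, depth2=1)
print(round(ts, 4))
```

With an empty path (a class compared with itself) the sketch returns 1.0, because the product over no weights is 1 and the level difference is 0.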
15
Semantic Matching ─ Encoding Subtaxonomies
• Encoding subtaxonomy trees into partonomy trees
  • A converse task: computing the similarity of pairs of taxonomies, e.g. subtaxonomies of the background taxonomy, as required in our Teclantic project (http://teclantic.cs.unb.ca)
  • Allows the direct reuse of our partonomy similarity algorithm and permits weighted (or ‘fuzzy’) taxonomic subsumption with no added effort
16
Semantic Matching ─ Encoding Subtaxonomies
Background taxonomy tree of “Programming Techniques” for encoding
[Figure: tree with root node Programming Techniques. Arcs labeled Applicative Programming, Automatic Programming, Concurrent Programming, General, Object-Oriented Programming, and Sequential Programming lead to “*” nodes; below Concurrent Programming, arcs labeled Distributed Programming and Parallel Programming lead to “*” nodes. The root-level arc weights shown are 0.1, 0.15, 0.3, 0.1, 0.15, and 0.2; the two lower-level arc weights are 0.6 and 0.4.]
• Sibling arc weights must sum up to 1.0
• Classes are represented as arc labels (lexicographically ordered)
• All node labels except the root node label are changed into “Don’t Care”
17
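Semantic Matching ─ Encoding Subtaxonomies
Two course trees with encoded subtaxonomy trees
[Figure: two trees rooted at “course”, each with arcs Classification, Credit, Duration, Title, and Tuition. In both trees the Classification arc has weight 0.65 and leads to a “taxonomy” node, below which a Programming Techniques arc of weight 1.0 leads to the encoded subtaxonomy. First tree: Title “Distributed Programming”, Credit 3, Duration 2months, Tuition $800, with the other arc weights shown as 0.05, 0.1, 0.15, and 0.05; its subtaxonomy has arcs Concurrent Programming and Sequential Programming (weights shown: 0.7 and 0.3) and, below Concurrent Programming, arcs Distributed Programming and Parallel Programming (weights shown: 0.6 and 0.4). Second tree: Title “Object-Oriented Programming”, Credit 3, Duration 3months, Tuition $1000, with the other arc weights shown as 0.2, 0.05, 0.05, and 0.05; its subtaxonomy has arcs Object-Oriented Programming and Sequential Programming (weights shown: 0.8 and 0.2).]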
• Weight assignment in the "Classification" branch (two options)
• By human expert
• By machine learning
• Normalizes corresponding weights in the background taxonomy
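One possible reading of the last bullet is that the weights of the classes kept in a subtaxonomy are taken from the background taxonomy and rescaled so that the retained sibling arcs again sum to 1.0. The Python sketch below illustrates that reading only; the function name, the class names, and the weights in the example are hypothetical and are not taken from the slides.

```python
def normalize_sibling_weights(background, selected):
    """Rescale the background weights of the selected sibling classes to sum to 1.0."""
    total = sum(background[name] for name in selected)
    if total <= 0:
        raise ValueError("selected classes must have a positive total weight")
    return {name: background[name] / total for name in selected}


# Hypothetical sibling weights in a background taxonomy (they sum to 1.0).
background = {"Class A": 0.3, "Class B": 0.1, "Class C": 0.6}

# A subtaxonomy that keeps only two of the three sibling classes.
encoded = normalize_sibling_weights(background, ["Class A", "Class B"])
print(encoded)
```

Rescaling keeps the relative importance of the retained classes while restoring the sum-to-1.0 requirement of the partonomy similarity algorithm.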
18
Semantic Matching ─ Local Similarity
• Local similarity measures (for leaf nodes)
  Special-purpose similarity measures for various data types; each realizes the semantics of its type and is invoked when computing the similarity of any two instances of that type
• “Price” type
• “Date” type [Yang et al. 2005]
• . . .
19
Semantic Matching ─ Price Matching
• Price
  • Price is the omnipresent factor that determines buyers’ and sellers’ decision-making
  • Price similarity seems to be asymmetric for buyers and sellers
    e.g. buyer offers $800 and seller asks $1000: unsuccessful
         buyer offers $1000 and seller asks $800: successful
    The similarity of $800 and $1000 is different for the above cases
20
Semantic Matching ─ Price Matching
• Transform the asymmetry to symmetry
• Buyers and sellers always have price ranges in their minds: [Bpref, Bmax] and [Smin, Spref]
  Bpref : buyer’s preferred price
  Bmax : buyer’s maximum acceptable price
  Smin : seller’s minimum acceptable price
  Spref : seller’s preferred price
• Our price-range similarity measure is based on the intuition that the greater the overlap between the buyer’s and seller’s price ranges, the higher their similarity value
21
Semantic Matching ─ Price Matching Algorithm
• Pseudo code of the price-range similarity algorithm
PriceRangeSim ([Bpref, Bmax], [Smin, Spref])
Begin
  If Spref <= Bpref
    similarity = 1.0
  else if Bmax < Smin
    similarity = 0.0
  else if Bmax = Smin
    similarity = 0.005 / (MAX − MIN)
  else {
    MIN = min{MIN, Smin}
    MAX = max{MAX, Bmax}
    similarity = (Bmax − Smin) / (MAX − MIN)
  }
  return similarity
End.
• This algorithm can be easily adapted to the “price”-typed attributes
  e.g. “salary range” in job seeking and recruiting e-Market
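The pseudo code can be turned into a runnable Python sketch under one loudly stated assumption: the slide does not say how MIN and MAX are initialized, so the sketch starts them at the buyer’s preferred price and the seller’s preferred price unless the caller supplies other bounds. The function and parameter names are hypothetical, and valid input ranges (Bpref ≤ Bmax, Smin ≤ Spref) are assumed.

```python
def price_range_similarity(b_pref, b_max, s_min, s_pref, lo=None, hi=None):
    """Similarity of a buyer range [b_pref, b_max] and a seller range [s_min, s_pref]."""
    # ASSUMPTION: MIN and MAX default to the two preferred prices.
    lo = b_pref if lo is None else lo
    hi = s_pref if hi is None else hi

    if s_pref <= b_pref:
        return 1.0                   # seller's preferred price suits the buyer
    if b_max < s_min:
        return 0.0                   # the ranges do not overlap
    if b_max == s_min:
        return 0.005 / (hi - lo)     # the ranges touch in a single point
    lo = min(lo, s_min)
    hi = max(hi, b_max)
    return (b_max - s_min) / (hi - lo)   # overlap relative to the overall span


print(price_range_similarity(800, 1000, 900, 1100))   # overlapping ranges
print(price_range_similarity(1000, 1200, 700, 900))   # seller below buyer's preference
print(price_range_similarity(800, 900, 1000, 1100))   # no overlap
```

In the overlapping case the result is the overlap of 100 divided by the overall span of 300, which matches the intuition that a larger overlap yields a higher similarity.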
22
Semantic Matching ─ Date Matching
• “Date”-typed leaf node similarity measure
$DS(d_1, d_2) = \begin{cases} 0.0 & \text{if } |d_1 - d_2| \ge 365 \\ 1 - \dfrac{|d_1 - d_2|}{365} & \text{otherwise} \end{cases}$
where DS(d1, d2) is the date similarity of two dates d1 and d2
[Figure: two “Project” trees t1 and t2, each with arcs end_date and start_date of weight 0.5. t1 has start_date May 3, 2004 and end_date Nov 3, 2004; t2 has start_date Jan 20, 2004 and end_date Feb 18, 2005. The similarity shown is 0.74.]
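The date measure is simple enough to sketch directly in Python, assuming that |d1 − d2| is counted in days. The function name is hypothetical. The sketch computes only the two leaf-level values for the dates in the figure; it does not attempt to reproduce the tree-level value 0.74, which also depends on how the partonomy algorithm aggregates them.

```python
from datetime import date


def date_similarity(d1: date, d2: date) -> float:
    """DS(d1, d2): 0.0 for dates a year or more apart, else 1 - |d1 - d2| / 365."""
    diff_days = abs((d1 - d2).days)
    if diff_days >= 365:
        return 0.0
    return 1 - diff_days / 365


start_sim = date_similarity(date(2004, 5, 3), date(2004, 1, 20))   # 104 days apart
end_sim = date_similarity(date(2004, 11, 3), date(2005, 2, 18))    # 107 days apart
print(round(start_sim, 4), round(end_sim, 4))
```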
23
Conclusion
• Weighted trees for product/service descriptions
• Partonomy tree similarity algorithm
• Synchronously traverses trees top-down
• Aggregates intermediate similarity values bottom-up
• Semantic Global and Local Matching
• Taxonomic Class Similarity
• Encoding Subtaxonomies into Partonomies
• Leaf-Node Type Similarity Measures
• Future Work
• Improvement of Taxonomic Class Similarity
• Generalization of Local Similarity Measures
24
References
[1] Yang, L., Ball, M., Bhavsar, V.C., and Boley, H. Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Match-Making. Journal of Business and Technology (to appear).
[2] Boley, H., Bhavsar, V.C., Hirtle, D., Singh, A., Sun, Z., and Yang, L. A Match-Making System for Learners and Learning Objects. International Journal of Interactive Technology and Smart Education, August 2005, 2(3):171-178.
[3] Bhavsar, V.C., Boley, H., and Yang, L. A Weighted-Tree Similarity Algorithm for Multi-Agent Systems in e-Business Environments. Computational Intelligence, 2004, 20(4):584-602.
[4] Singh, A. LOMGenIE: A Weighted Tree Metadata Extraction Tool. Master Thesis, Faculty of Computer Science, University of New Brunswick, Fredericton, Canada, September 2005.
25
Thank you!
26
Seller Weights
• Advertisements on TV, the Internet, and in newspapers
  Sellers always emphasize specific product/service attributes to attract buyers
• Our match-making system is buyer-seller-centric
  Sellers also seek buyers having close preferences
27
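Seller Weights (Cont’d)
• Suppose sellers do not have weights
[Figure: buyer tree and seller tree, both rooted at Car with arcs Color, Make, and Year. Buyer tree: Color White (weight 0.1), Make Ford (0.1), Year 2002 (0.8). Seller tree: Color Red, Make Ford, Year 2002, with all arc weights 0.0.]
$\text{Similarity} = \tfrac{1}{2}(0.1 + 0.0) \cdot 1.0 \ \text{(for “Make”)} + \tfrac{1}{2}(0.8 + 0.0) \cdot 1.0 \ \text{(for “Year”)} = 0.45$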
28
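Seller Weights (Cont’d)
• Suppose sellers have identical weights
[Figure: buyer tree and seller tree, both rooted at Car with arcs Color, Make, and Year. Buyer tree: Color White (weight 0.1), Make Ford (0.1), Year 2002 (0.8). Seller tree: Color Red (0.3333), Make Ford (0.3333), Year 2002 (0.3334). The similarity shown is 0.7834.]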
29
Seller Weights (Cont’d)
• Sellers have arbitrary weights
[Figure: the buyer tree (Car: Color White 0.1, Make Ford 0.1, Year 2002 0.8) and three seller trees (Car: Color Red, Make Ford, Year 2002). Seller tree 1 has arc weights Color 0.05, Make 0.05, Year 0.9 and similarity 0.925; seller tree 2 has Color 0.2, Make 0.2, Year 0.6 and similarity 0.85; seller tree 3 has Color 0.6, Make 0.1, Year 0.3 and similarity 0.65.]
• All the seller trees above are identical except the arc weights
• The buyer prefers to negotiate with seller 1 because they have closer preferences on the car attributes
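The three similarity values on this slide can be recomputed, and the sellers ranked, with a small Python sketch. It handles only the flat, one-level car trees of this example and compares leaf values by exact match; the function name and the dictionary layout are hypothetical, and the per-arc weights of seller tree 3 follow the reading Color 0.6, Make 0.1, Year 0.3, which is consistent with the similarity 0.65 shown.

```python
def weighted_match(buyer, seller):
    """Sum of averaged arc weights over attributes whose leaf values match exactly."""
    total = 0.0
    for attribute in buyer.keys() & seller.keys():
        buyer_value, buyer_weight = buyer[attribute]
        seller_value, seller_weight = seller[attribute]
        if buyer_value == seller_value:
            total += (buyer_weight + seller_weight) / 2
    return total


buyer = {"Color": ("White", 0.1), "Make": ("Ford", 0.1), "Year": ("2002", 0.8)}
sellers = {
    "seller 1": {"Color": ("Red", 0.05), "Make": ("Ford", 0.05), "Year": ("2002", 0.9)},
    "seller 2": {"Color": ("Red", 0.2), "Make": ("Ford", 0.2), "Year": ("2002", 0.6)},
    "seller 3": {"Color": ("Red", 0.6), "Make": ("Ford", 0.1), "Year": ("2002", 0.3)},
}

scores = {name: weighted_match(buyer, tree) for name, tree in sellers.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the product descriptions are identical, the ranking is driven entirely by how closely each seller’s weights follow the buyer’s emphasis on Year.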
30
Seller Weights (Cont’d)
• Sellers can always select the averaged weights if they do not want to emphasize any attributes of their products/services
• Using seller weights, both buyers and sellers can find the most promising trading partners
• The negotiation space is decreased
31
Publications
[1] Lu Yang, Marcel Ball, Virendrakumar C. Bhavsar, and Harold Boley, "Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Match-Making", Journal of Business and Technology (to appear).
[2] Harold Boley, Virendrakumar C. Bhavsar, David Hirtle, Anurag Singh, Zhongwei Sun, and Lu Yang, "A Match-Making System for Learners and Learning Objects", International Journal of Interactive Technology and Smart Education, August 2005, 2(3):171-178.
[3] Jing Jin, Biplab K. Sarker, Virendrakumar C. Bhavsar, Harold Boley, and Lu Yang, "Towards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison", In Proceedings of the 8th International Conference on High Performance Computing in Asia Pacific Region, IEEE Computer Society, December 2005.
[4] Lu Yang, Marcel Ball, Virendrakumar C. Bhavsar, and Harold Boley, "Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Match-Making", In Proceedings of Workshop of Business Agents and the Semantic Web (BASeWEB'05), May 8, 2005, Victoria, British Columbia, Canada.
[5] Lu Yang, Biplab K. Sarker, Virendrakumar C. Bhavsar, and Harold Boley, "A Weighted-Tree Simplicity Algorithm for Similarity Matching of Partial Product Descriptions", In Proceedings of ISCA 14th International Conference on Intelligent and Adaptive Systems and Software Engineering, Toronto 2005, pp.55-60.
[6] Virendrakumar C. Bhavsar, Harold Boley, and Lu Yang, "A Weighted-Tree Similarity Algorithm for Multi-Agent Systems in e-Business Environments", Computational Intelligence, 2004, 20(4), pp.584-602.
[7] Riyanarto Sarno, Lu Yang, Virendrakumar C. Bhavsar, and Harold Boley, "The AgentMatcher Architecture Applied to Power Grid Transactions", In Proceedings of the First International Workshop on Knowledge Grid and Grid Intelligence, Halifax, 2003, pp.92-99.
[8] Virendrakumar C. Bhavsar, Harold Boley, and Lu Yang, "A Weighted-Tree Similarity Algorithm for Multi-Agent Systems in e-Business Environments", In Proceedings of 2003 Business Agents and the Semantic Web (BASeWEB'03) Workshop, Halifax, Canada, June 14, 2003.