1
Knowledge Discovery from Transportation Network Data
Paper Review
Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., and Banich, B. Knowledge Discovery from Transportation Network Data. In ICDE, 2005
2
Outline
● Background.
● Experiments.
-- Structurally Similar Routes
-- Temporally Repeated Routes
● Experiment results.
● Conventional techniques.
● New challenges.
3
A natural application area for Data Mining
● Transportation and logistics are an important sector of the economy.
--Transportation consumes 60% of oil worldwide
● Data mining has led to significant gains in other areas.
● Computer use is widespread in transportation and logistics.
--Inventory management, parcel tracking, and even on-truck location sensors
4
Existing Applications
Data Mining
● Mining with transactional characteristics of freight and events.
-- e.g. classification on safety/accident records might find that trucks are prone to accidents at 7:00 AM on east-west roads.
-- NO geometry of the network.
Network Structure
● Optimization
-- Finds a solution that minimizes cost.
5
Transportation Networks
● Graph problems
● Graph mining
-- e.g. finding the frequent sub-graphs
Algorithms
* WARMR
* AGM
* SUBDUE
* FSG
6
Dataset
● Six months of origin-destination (OD) data from a large third-party logistics company: 98,292 transactions.
● Represented as a directed graph by mapping locations to vertices.
● Each transaction can then be represented as the edge of an OD pair.
● The edges are labeled with the other attributes of the transaction: pickup date, delivery date, distance, hours, weight, and mode. (binning strategy)
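The mapping described on this slide can be sketched in a few lines of Python. This is only an illustration: the field names (`origin`, `destination`, `distance`), the bin cut points, and the `TD_` label prefix are assumptions, since the slides mention a binning strategy without giving its details.

```python
from collections import defaultdict

# Hypothetical bin upper bounds; the slides do not give the actual cut points.
DISTANCE_BINS = [100, 500, 1000]

def bin_label(value, edges, prefix):
    """Return a discrete label such as 'TD_1' for the bin containing value."""
    for i, upper in enumerate(edges):
        if value <= upper:
            return f"{prefix}_{i}"
    return f"{prefix}_{len(edges)}"

def build_od_graph(transactions):
    """Map locations to vertices and each transaction to a labeled directed edge."""
    graph = defaultdict(list)  # origin -> list of (destination, edge label)
    for t in transactions:
        label = bin_label(t["distance"], DISTANCE_BINS, "TD")
        graph[t["origin"]].append((t["destination"], label))
    return graph

# Made-up transactions, for illustration only.
transactions = [
    {"origin": "A", "destination": "B", "distance": 80},
    {"origin": "A", "destination": "C", "distance": 450},
    {"origin": "B", "destination": "C", "distance": 1200},
]
graph = build_od_graph(transactions)
```

Binning turns a continuous attribute into a small set of edge labels, which is what label-based graph miners need in order to treat two edges as the same.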
7
8
Mining Interests
● Structurally Similar Routes
--Identify structurally similar patterns that occur in many locations.
Methods
* SUBDUE
* FSG
● Temporally Repeated Routes
--Find patterns of routes repeated in time, rather than space.
Method
* FSG
9
Structurally Similar Routes
● We assign all vertices the same label.
● Three variants for edge labels: weight, distance, and time.
-- OD_TD : TOTAL-DISTANCE
-- OD_GW : GROSS-WEIGHT
-- OD_TH : MOVE-TRANSIT-HOURS
10
Experiments with SUBDUE (MDL principle)
SUBDUE: A substructure discovery system
Results:
● Took about 3.25 hours on a graph of 100 vertices and 561 edges to find the best 3 patterns with beam size 4.
● Would need 6 months on the complete graph.
● Results were trivial.
11
● Significant traffic from node 2 to node 4 via node 3, but not much return traffic (deadheading)
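A directional imbalance like this one can be checked by comparing the number of loads on each lane with the number on its reverse. The sketch below is not taken from the paper; the function name `lane_imbalance` and the shipment list are made up for illustration.

```python
from collections import Counter

def lane_imbalance(shipments):
    """Count loads per directed lane and subtract the loads on the reverse lane."""
    loads = Counter(shipments)
    result = {}
    for (origin, dest), forward in loads.items():
        backward = loads.get((dest, origin), 0)
        result[(origin, dest)] = forward - backward
    return result

# Made-up (origin, destination) pairs using the node numbers from the slide.
shipments = [(2, 3), (2, 3), (3, 4), (3, 4), (3, 4), (4, 3)]
imbalance = lane_imbalance(shipments)
```

A large positive value marks a lane where trucks are likely to return empty.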
12
Experiments with FSG
● FSG mines patterns across a set of graph transactions.
● The single graph is therefore divided into multiple distinct sub-graphs, and each sub-graph is treated as a separate transaction.
✔ Breadth first partitioning
✔ Depth first partitioning
✔ Both may result in patterns being broken across partitions
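A minimal sketch of breadth-first partitioning shows how edges, and any pattern that uses them, are lost at partition boundaries. It assumes that a partition is a block of at most `size` vertices taken in breadth-first order; the slides do not say how the partitions were actually formed, and `partition_edges` is a hypothetical helper.

```python
from collections import deque

def bfs_order(adjacency, start):
    """Return the vertices reachable from start in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adjacency.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

def partition_edges(edges, size):
    """Split a directed graph into sub-graphs of at most `size` vertices."""
    adjacency = {}
    for u, v in edges:
        # Traverse as undirected so edge direction does not limit reachability.
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    order, visited = [], set()
    for start in adjacency:
        if start not in visited:
            component = bfs_order(adjacency, start)
            visited.update(component)
            order.extend(component)
    block_of = {v: i // size for i, v in enumerate(order)}
    partitions, cut = {}, []
    for u, v in edges:
        if block_of[u] == block_of[v]:
            partitions.setdefault(block_of[u], []).append((u, v))
        else:
            cut.append((u, v))  # edge lost across a partition boundary
    return partitions, cut

edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
partitions, cut = partition_edges(edges, size=2)
```

Here the four-edge cycle is split into two one-edge sub-graphs and two edges are cut, so the cycle itself can never be reported as a pattern.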
13
Results
● Partition sizes: 400, 800, 1200 and 1600.
● Depth-first partitioning: 200 frequent patterns were found with a minimum support of 120.
● Breadth-first partitioning: 667 frequent patterns were found with a minimum support of 240.
● Had runtime and memory problems with lower supports on the breadth-first partitions.
● FSG is not an appropriate tool for mining recurring patterns in a large single graph.
14
15
Temporally Repeated Routes
● FSG
● Exploits the temporal nature of the transportation graph
● Partition each graph into a set of graph transactions based on date
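The date-based partitioning can be sketched as follows. The sketch assumes the pickup date is the partitioning key and counts support for single labeled edges only; FSG itself grows multi-edge sub-graph patterns, which this does not attempt. The field names are hypothetical.

```python
from collections import defaultdict

def graphs_by_date(transactions):
    """Group labeled OD edges into one graph transaction per pickup date."""
    daily = defaultdict(set)
    for t in transactions:
        daily[t["pickup_date"]].add((t["origin"], t["destination"], t["label"]))
    return daily

def frequent_edges(daily, min_support):
    """Keep edges that occur in at least min_support daily graph transactions."""
    support = defaultdict(int)
    for edge_set in daily.values():
        for edge in edge_set:
            support[edge] += 1
    return {e: s for e, s in support.items() if s >= min_support}

# Made-up records, for illustration only.
transactions = [
    {"pickup_date": "2004-01-05", "origin": "A", "destination": "B", "label": "TD_0"},
    {"pickup_date": "2004-01-06", "origin": "A", "destination": "B", "label": "TD_0"},
    {"pickup_date": "2004-01-06", "origin": "B", "destination": "C", "label": "TD_1"},
    {"pickup_date": "2004-01-07", "origin": "A", "destination": "B", "label": "TD_0"},
]
daily = graphs_by_date(transactions)
frequent = frequent_edges(daily, min_support=2)
```

An edge that is frequent across daily graphs corresponds to a route repeated in time rather than in space.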
16
Results
● Unable to run FSG on the entire data set due to insufficient memory / swap space.
● Most were small patterns. (The following is the biggest one)
17
Patterns Discovered by Using Conventional Mining Algorithms
● Mapped the dataset into a standard “transactional” representation.
● Used traditional data mining approaches.
● Used Weka for association rule mining, instance (tuple) classification and cluster analysis on the transportation data.
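To illustrate what association rule mining measures on the transactional representation, the sketch below computes the support and confidence of a single rule in plain Python. It does not use Weka, and the attribute names and values are invented for the example.

```python
def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    matches_a = [r for r in rows if antecedent.items() <= r.items()]
    matches_both = [r for r in matches_a if consequent.items() <= r.items()]
    support = len(matches_both) / len(rows)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence

# Made-up tuples: one row per shipment, with binned attribute values.
rows = [
    {"mode": "truck", "weight": "heavy"},
    {"mode": "truck", "weight": "heavy"},
    {"mode": "truck", "weight": "light"},
    {"mode": "rail", "weight": "heavy"},
]
support, confidence = rule_metrics(rows, {"mode": "truck"}, {"weight": "heavy"})
```

Each row is treated independently of the others, which is why this representation carries no information about how routes connect in the network.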
18
Evaluations of Conventional Algorithms
● Traditional data mining techniques have produced interesting and meaningful results to summarize our data.
● Further experimentation is required to explore the potential and limitations of these techniques on temporal transportation network data.
● They lose some of the insight available from the structural characteristics of the data.
19
Challenges for Data Mining Research
● Handling the temporal aspects of graphs (dynamic graphs).
● Incorporating the notion of events into a graph.
● Expanding graph mining techniques beyond data similar to molecular structures.
● Determining what makes a graph pattern interesting.