1 Knowledge Discovery from Transportation Network Data Paper Review Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., and Banich, B. Knowledge Discovery from Transportation Network Data. In ICDE, 2005

Upload: annabel-strickland

Post on 18-Jan-2016



Knowledge Discovery from Transportation Network Data

Paper Review

Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., and Banich, B. Knowledge Discovery from Transportation Network Data. In ICDE, 2005


Outline

● Background.

● Experiments.

-- Structurally Similar Routes

-- Temporally Repeated Routes

● Experiment results.

● Conventional techniques.

● New challenges.


A natural application area for Data Mining

● Transportation and logistics are an important sector of the economy.

--Transportation consumes 60% of oil worldwide

● Data mining has led to significant gains in other areas.

● Computer use is widespread in transportation and logistics.

--Inventory management, parcel tracking, and even on-truck location sensors


Existing Applications

Data Mining

● Mining the transactional characteristics of freight and events.

-- e.g. classification on safety/accident records might find that trucks are prone to accidents at 7:00 AM on east-west roads.

-- The geometry of the network is NOT used.

Network Structure

● Optimization

-- Finds a solution (minimizes cost)


Transportation Networks

● Graph problems

● Graph mining

-- e.g. finding frequent sub-graphs

Algorithms

* WARMR

* AGM

* SUBDUE

* FSG


Dataset

● Six months of origin-destination (OD) data from a large third-party logistics company: 98,292 transactions.

● Represented as a directed graph by mapping locations to vertices.

● Each transaction can then be represented as the edge of an OD pair.

● The edges are labeled with the other attributes of the transaction: pickup date, delivery date, distance, hours, weight, and mode (continuous values are discretized using a binning strategy).
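
The graph construction described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the bin boundaries in `DISTANCE_BINS`, the helper names, and the sample records are all assumptions, since the slides do not give the actual binning strategy.

```python
from collections import defaultdict

# Hypothetical bin upper bounds for distance; the slides do not state the real bins.
DISTANCE_BINS = [100, 500, 1000]

def bin_label(value, upper_bounds):
    """Return the index of the first bin whose upper bound is >= value."""
    for i, upper in enumerate(upper_bounds):
        if value <= upper:
            return i
    return len(upper_bounds)  # overflow bin for values above the last bound

def build_od_graph(transactions):
    """Map each (origin, destination, distance) record to a labeled directed edge."""
    graph = defaultdict(list)  # origin vertex -> list of (destination, distance_bin)
    for origin, dest, distance in transactions:
        graph[origin].append((dest, bin_label(distance, DISTANCE_BINS)))
    return dict(graph)

# Made-up sample transactions, for illustration only.
sample = [("A", "B", 80), ("A", "C", 450), ("B", "C", 1200)]
od_graph = build_od_graph(sample)
```

Each location becomes a vertex, each transaction becomes one directed edge, and the numeric attribute is replaced by a discrete bin label so that edges with similar values can match during pattern mining.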


Mining Interests

● Structurally Similar Routes

--Identify structurally similar patterns that occur in many locations.

Methods * SUBDUE

* FSG

● Temporally Repeated Routes

--Find patterns of routes repeated in time, rather than space.

Method * FSG


Structurally Similar Routes

● We assign all vertices the same label.

● Three variants for edge labels: weight, distance, and time.

-- OD_TD : TOTAL-DISTANCE

-- OD_GW : GROSS-WEIGHT

-- OD_TH : MOVE-TRANSIT-HOURS


Experiments with SUBDUE (MDL principle)

SUBDUE: A substructure discovery system

Results:

● Took about 3.25 hours to process a graph of 100 vertices and 561 edges and find the best 3 patterns with a beam size of 4.

● Would need 6 months on the complete graph.

● Results were trivial.


● Significant traffic from node 2 to node 4 via node 3, but not much return traffic (deadheading)
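
A pattern like this one (heavy traffic one way, little coming back) can also be checked directly on the OD counts. The sketch below is an assumption-laden illustration rather than anything from the paper: the function name `imbalanced_lanes`, the `min_ratio` threshold, and the shipment counts are hypothetical.

```python
from collections import Counter

def imbalanced_lanes(shipments, min_ratio=3.0):
    """Return (origin, dest, forward, backward) for lanes whose forward
    volume is at least min_ratio times the return volume."""
    volume = Counter(shipments)  # (origin, dest) -> number of shipments
    result = []
    for (origin, dest), forward in volume.items():
        backward = volume.get((dest, origin), 0)
        # max(backward, 1) avoids flagging lanes with almost no traffic at all.
        if forward >= min_ratio * max(backward, 1):
            result.append((origin, dest, forward, backward))
    return sorted(result)

# Made-up counts: balanced traffic between nodes 2 and 3,
# but far more traffic from 3 to 4 than from 4 back to 3.
shipments = [(2, 3)] * 6 + [(3, 4)] * 5 + [(4, 3)] + [(3, 2)] * 4
lanes = imbalanced_lanes(shipments)
```

Lanes returned by this check are candidates for deadheading, since trucks that deliver along them have little freight to carry on the way back.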


Experiments with FSG

● FSG mines patterns across a set of graph transactions.

● The single graph is therefore divided into multiple distinct sub-graphs, and each sub-graph is treated as a separate transaction.

✔ Breadth-first partitioning

✔ Depth-first partitioning

✔ Both may result in patterns being broken across partitions
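
One way to picture breadth-first partitioning is the sketch below, which grows edge-disjoint chunks of at most `max_edges` edges outward from a seed vertex. It is only an assumed reading of the slide: the paper's actual partitioning procedure is not given here, and the function name and toy graph are hypothetical.

```python
from collections import deque

def breadth_first_partitions(edges, max_edges):
    """Split a directed edge list into edge-disjoint chunks of at most
    max_edges edges, collecting edges in breadth-first order from each seed."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    unused = set(edges)          # edges not yet assigned to any partition
    partitions = []
    for seed in adjacency:
        queue = deque([seed])
        seen = {seed}
        current = []
        while queue:
            u = queue.popleft()
            for v in adjacency.get(u, []):
                if (u, v) in unused:
                    unused.discard((u, v))
                    current.append((u, v))
                    if len(current) == max_edges:
                        partitions.append(current)   # chunk is full, start a new one
                        current = []
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        if current:
            partitions.append(current)
    return partitions

toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
parts = breadth_first_partitions(toy_edges, max_edges=2)
```

In this toy graph the path A, B, D, E ends up spread over three different partitions, which illustrates the last point on the slide: a pattern that spans partition boundaries is no longer visible inside any single graph transaction.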


Results

● Partition sizes: 400, 800, 1200, and 1600.

● Depth-first partitioning: 200 frequent patterns were found with a minimum support of 120.

● Breadth-first partitioning: 667 frequent patterns were found with a minimum support of 240.

● Had runtime and memory problems with lower supports on the breadth-first partitions.

● FSG is not an appropriate tool for mining recurring patterns in a large single graph.


Temporally Repeated Routes

● FSG

● Exploits the temporal nature of the transportation graph

● Partition the graph into a set of graph transactions based on date
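
The date-based partitioning can be sketched as below. This is a simplified illustration under stated assumptions: it counts support only for single edges (the smallest possible pattern), whereas FSG searches for larger frequent sub-graphs, and the function names, dates, and locations are made up.

```python
from collections import defaultdict

def daily_transactions(shipments):
    """Group (date, origin, dest) records into one edge set per date."""
    by_date = defaultdict(set)
    for date, origin, dest in shipments:
        by_date[date].add((origin, dest))
    return dict(by_date)

def frequent_edges(transactions, min_support):
    """Return edges that appear in at least min_support daily graphs."""
    support = defaultdict(int)
    for edge_set in transactions.values():
        for edge in edge_set:
            support[edge] += 1
    return {edge: count for edge, count in support.items() if count >= min_support}

# Hypothetical records: (pickup date, origin, destination).
shipments = [
    ("2004-01-05", "A", "B"), ("2004-01-05", "B", "C"),
    ("2004-01-06", "A", "B"),
    ("2004-01-07", "A", "B"), ("2004-01-07", "B", "C"),
]
days = daily_transactions(shipments)
repeated = frequent_edges(days, min_support=2)
```

Here each date yields one graph transaction, and support is the number of dates on which an edge occurs, so a route that recurs over time shows up as a frequent pattern.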


Results

● Unable to run FSG on the entire data set due to insufficient memory / swap space.

● Most were small patterns. (The following is the biggest one)


Patterns Discovered by Using Conventional Mining Algorithms

● Mapped the dataset into a standard “transactional” representation.

● Used traditional data mining approaches.

● Used Weka for association rule mining, instance (tuple) classification and cluster analysis on the transportation data.
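
To make the "transactional" representation concrete, the sketch below treats each shipment as a flat row of attribute values and computes the support and confidence of a single association rule. It is plain Python, not Weka's API, and the attribute values, rows, and the helper `rule_stats` are hypothetical.

```python
def rule_stats(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent,
    where both sides are dicts of attribute -> required value."""
    def matches(row, condition):
        return all(row.get(key) == value for key, value in condition.items())

    n_antecedent = sum(matches(r, antecedent) for r in rows)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Made-up rows: one per shipment, with binned attribute values.
rows = [
    {"mode": "truck", "weight": "heavy", "distance": "short"},
    {"mode": "truck", "weight": "heavy", "distance": "long"},
    {"mode": "rail", "weight": "heavy", "distance": "long"},
    {"mode": "truck", "weight": "light", "distance": "short"},
]
support, confidence = rule_stats(rows, {"distance": "short"}, {"mode": "truck"})
```

Note that the rows carry no information about how origins and destinations connect, which is why this representation cannot capture the structural patterns that graph mining targets.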


Evaluations of Conventional Algorithms

● Traditional data mining techniques have produced interesting and meaningful results to summarize our data.

● Further experimentation is required to explore the potential and limitations of these techniques on temporal transportation network data.

● They lose some insights that come from the structural characteristics of the data.


Challenges for Data Mining Research

● Handling the temporal aspects of graphs (dynamic graphs).

● Incorporating the notion of events into a graph.

● Expanding graph mining techniques beyond data similar to molecular structures.

● Determining what makes a graph pattern interesting.