Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases
Pat Hanrahan, with Chris Stolte and Diane Tang, Computer Science Department, Stanford University



Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases

Pat Hanrahan, with Chris Stolte and Diane Tang

Computer Science Department, Stanford University


Motivation

Large databases have become very common:
• Corporate data warehouses: Amazon, Walmart, …
• Scientific projects: Human Genome Project, Sloan Digital Sky Survey

Need tools to extract meaning from these databases


Related Work

Formalisms for graphics: Bertin’s “Semiology of Graphics”, Mackinlay’s APT, Roth et al.’s Sage and SageBrush, Wilkinson’s “Grammar of Graphics”

Visual exploration of databases: DeVise, DataSplash/Tioga-2

Visualization and data mining: SGI’s MineSet, IBM’s Diamond


Formalism


Polaris Formalism

The UI is interpreted as a visual specification that defines:
• Table configuration
• Type of graphic in each pane
• Encoding of data as visual properties of marks
• Data transformations and queries
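To make the idea of the UI as a specification concrete, here is a minimal Python sketch, assuming a hypothetical VisualSpec record (the class, its field names, and the example values are illustrative and not taken from Polaris). One specification carries the table configuration, mark type, encodings, and filters, and the fields it references determine what a query must fetch.

```python
from dataclasses import dataclass, field


# Hypothetical structure: not the Polaris data model, just an illustration.
@dataclass
class VisualSpec:
    rows: list[str]                      # fields on the row shelf (table configuration)
    columns: list[str]                   # fields on the column shelf
    mark: str = "bar"                    # type of graphic drawn in each pane
    encodings: dict[str, str] = field(default_factory=dict)     # visual property -> field
    filters: dict[str, set[str]] = field(default_factory=dict)  # field -> allowed values

    def query_fields(self) -> set[str]:
        """Every field the specification references, i.e. what a query must fetch."""
        referenced = set(self.rows) | set(self.columns) | set(self.encodings.values())
        return referenced | set(self.filters)  # iterating a dict yields its keys


spec = VisualSpec(
    rows=["Product Type"],
    columns=["Quarter", "Profit"],
    mark="bar",
    encodings={"color": "Market"},
    filters={"Year": {"2001"}},  # illustrative value
)
print(sorted(spec.query_fields()))
# ['Market', 'Product Type', 'Profit', 'Quarter', 'Year']
```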


Schema

Ordinal fields (categorical): Market, State, Year, Quarter, Month, Product Type, Product

Quantitative fields (measures): Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...

Coffee chain data [Visual Insights]


Polaris Visual Encodings

Principle of Importance Ordering: Encode the most important information in the most effective way [Cleveland & McGill]
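One way to read the importance-ordering principle is as a greedy assignment: rank visual channels by how accurately people decode them, then give the most important field the best remaining channel. The sketch below assumes a simplified subset of Cleveland and McGill's ranking for quantitative data; the channel labels and the assign_channels helper are hypothetical, not Polaris's actual encoding logic.

```python
# Simplified subset of Cleveland & McGill's effectiveness ranking for
# quantitative data, most accurate channel first (labels are illustrative).
CHANNELS_BY_EFFECTIVENESS = [
    "position (common scale)",
    "position (non-aligned scales)",
    "length",
    "angle",
    "area",
    "color saturation",
]


def assign_channels(fields_by_importance: list[str]) -> dict[str, str]:
    """Pair each field (most important first) with the best channel still free."""
    if len(fields_by_importance) > len(CHANNELS_BY_EFFECTIVENESS):
        raise ValueError("more fields than available channels")
    return dict(zip(fields_by_importance, CHANNELS_BY_EFFECTIVENESS))


print(assign_channels(["Profit", "Sales", "Margin"]))
# {'Profit': 'position (common scale)', 'Sales': 'position (non-aligned scales)', 'Margin': 'length'}
```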


The Pivot Table Interface

Common interface to statistical packages/Excel
• Cross-tabulations

Simple interface based on drag-and-drop
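A cross-tabulation groups records by one ordinal field along the rows and another along the columns, and aggregates a measure in each cell. A minimal Python sketch, using made-up records and a hypothetical crosstab helper that sums the measure:

```python
from collections import defaultdict


def crosstab(records, row_field, col_field, measure):
    """Sum `measure` into cells keyed by (row value, column value)."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_field]][rec[col_field]] += rec[measure]
    return {row: dict(cols) for row, cols in table.items()}


records = [  # illustrative data only
    {"Quarter": "Qtr1", "Product Type": "Coffee", "Profit": 10.0},
    {"Quarter": "Qtr1", "Product Type": "Tea", "Profit": 4.0},
    {"Quarter": "Qtr1", "Product Type": "Coffee", "Profit": 5.0},
    {"Quarter": "Qtr2", "Product Type": "Tea", "Profit": 7.0},
]
print(crosstab(records, "Quarter", "Product Type", "Profit"))
# {'Qtr1': {'Coffee': 15.0, 'Tea': 4.0}, 'Qtr2': {'Tea': 7.0}}
```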


Data Cubes

Structure the relation as an n-dimensional cube

Each cell aggregates all measures for its combination of dimension values

Each cube axis corresponds to a dimension in the relation
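The cube view can be sketched by computing an aggregate for every subset of the dimensions, so each cell holds a total for one combination of dimension values and "*" marks a dimension summed away (similar in spirit to the subtotals a SQL GROUP BY CUBE query produces). This Python sketch handles a single measure; the helper name, the "*" marker, and the records are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marks a dimension that has been aggregated away


def data_cube(records, dimensions, measure):
    """Sum `measure` over every subset of `dimensions` (all 2**n group-bys)."""
    cube = defaultdict(float)
    for rec in records:
        for k in range(len(dimensions) + 1):
            for kept in combinations(dimensions, k):
                # Keep the record's value on retained axes, ALL elsewhere.
                key = tuple(rec[d] if d in kept else ALL for d in dimensions)
                cube[key] += rec[measure]
    return dict(cube)


records = [  # illustrative data only
    {"Quarter": "Qtr1", "Product Type": "Coffee", "Profit": 10.0},
    {"Quarter": "Qtr1", "Product Type": "Tea", "Profit": 4.0},
    {"Quarter": "Qtr2", "Product Type": "Coffee", "Profit": 6.0},
]
cube = data_cube(records, ["Quarter", "Product Type"], "Profit")
print(cube[("Qtr1", ALL)], cube[(ALL, "Coffee")], cube[(ALL, ALL)])  # 14.0 16.0 20.0
```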


Table Algebra: Operands

Ordinal fields: interpret domain as a set that partitions table into rows and columns:

Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}

Quantitative fields: treat domain as single element set and encode spatially as axes:

Profit = {(Profit)}
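Treating both kinds of operand as ordered sets of tuples lets the later operators handle them uniformly. A minimal Python sketch, assuming ordered lists of tuples stand in for the ordered sets (the helper names ordinal and quantitative are hypothetical):

```python
def ordinal(domain):
    """An ordinal field's domain becomes an ordered set of 1-tuples, one per value."""
    return [(value,) for value in domain]


def quantitative(name):
    """A quantitative field becomes a single-element set holding its name."""
    return [(name,)]


Quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
Profit = quantitative("Profit")
print(Quarter)  # [('Qtr1',), ('Qtr2',), ('Qtr3',), ('Qtr4',)]
print(Profit)   # [('Profit',)]
```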


Concatenation (+) Operator

Ordered union of two sets

Quarter + ProductType

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}+{(Coffee),(Espresso)}

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)}

Profit + Sales = {(Profit),(Sales)}
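Concatenation can be sketched the same way, with operands as ordered lists of tuples (an assumption for illustration): the left operand's elements come first, followed by any of the right operand's elements not already present, which keeps the result a set. The concat name is hypothetical.

```python
def concat(a, b):
    """Ordered union: elements of a, then elements of b not already in a."""
    return a + [t for t in b if t not in a]


Quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
ProductType = [("Coffee",), ("Espresso",)]
print(concat(Quarter, ProductType))
# [('Qtr1',), ('Qtr2',), ('Qtr3',), ('Qtr4',), ('Coffee',), ('Espresso',)]
print(concat([("Profit",)], [("Sales",)]))  # [('Profit',), ('Sales',)]
```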


Cross (×) Operator

Direct product of two sets

Quarter × ProductType = {(Qtr1, Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4, Tea)}

ProductType × Profit = {(Coffee, Profit), (Tea, Profit)}
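The direct product pairs every element of the left operand with every element of the right, varying the right operand fastest, which matches the ordering listed for Quarter and ProductType above. A Python sketch under the same list-of-tuples assumption, with a hypothetical cross helper; because a quantitative operand such as Profit is a one-element set, crossing with it simply appends the measure name to each tuple.

```python
from itertools import product


def cross(a, b):
    """Direct product: each tuple of a concatenated with each tuple of b."""
    return [ta + tb for ta, tb in product(a, b)]


Quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
ProductType = [("Coffee",), ("Tea",)]
Profit = [("Profit",)]
print(cross(Quarter, ProductType)[:3])
# [('Qtr1', 'Coffee'), ('Qtr1', 'Tea'), ('Qtr2', 'Coffee')]
print(cross(ProductType, Profit))  # [('Coffee', 'Profit'), ('Tea', 'Profit')]
```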


SQL Dataflow

Diagram: Relational Table → Tuples in Panes → Marks in Panes, with a Sort stage in the flow

Notes:
• Aggregation operators are applied after the sort
• Only one layer is shown; multiple layers need an additional z-sort
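As a rough illustration of how a table specification can become a database query, the Python sketch below builds one SQL statement that groups and aggregates by the row and column fields, runs it against an in-memory SQLite table through the standard sqlite3 module, and buckets the result rows into panes. The pane_query helper, table layout, and data are hypothetical. It lets SQL do grouping and ordering in a single statement, so it does not reproduce the exact sort-then-aggregate sequence in the notes above.

```python
import sqlite3


def pane_query(table, row_field, col_field, measure, agg="SUM"):
    """Build one GROUP BY query for a row-by-column table of aggregated marks.

    Field names are assumed trusted: quoting them is not escaping them.
    """
    return (
        f'SELECT "{row_field}", "{col_field}", {agg}("{measure}") FROM {table} '
        f'GROUP BY "{row_field}", "{col_field}" ORDER BY "{row_field}", "{col_field}"'
    )


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sales ("Quarter" TEXT, "ProductType" TEXT, "Profit" REAL)')
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Qtr2", "Tea", 7.0), ("Qtr1", "Coffee", 10.0), ("Qtr1", "Coffee", 5.0), ("Qtr1", "Tea", 4.0)],
)

panes = {}  # (row value, column value) -> aggregated tuples for that pane
for quarter, ptype, total in conn.execute(pane_query("sales", "Quarter", "ProductType", "Profit")):
    panes.setdefault((quarter, ptype), []).append(total)
print(panes)
# {('Qtr1', 'Coffee'): [15.0], ('Qtr1', 'Tea'): [4.0], ('Qtr2', 'Tea'): [7.0]}
```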


Multiscale Visualization


Hierarchical Structure

Challenge: these databases are very large
• Queries/visualizations should not require all the records

Augment the database with hierarchical structure
• Provide meaningful levels of abstraction
• Derived from domain or clustering
• Provides metadata (missing data for context)


Hierarchies and Data Cubes

Each dimension in the cube is structured as a tree

Each level in tree corresponds to level of detail
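If each record is keyed by its path from the root of a dimension tree, choosing a level of detail amounts to truncating every path to that depth and aggregating. A Python sketch for a hypothetical Year > Quarter > Month time hierarchy with made-up values; keying by the full path keeps Qtr1 of one year distinct from Qtr1 of another, as a tree requires.

```python
from collections import defaultdict

LEVELS = ("Year", "Quarter", "Month")  # root-to-leaf levels of the time tree


def roll_up(facts, level):
    """Aggregate leaf values to the given tree depth (1 = Year ... 3 = Month)."""
    if not 1 <= level <= len(LEVELS):
        raise ValueError(f"level must be between 1 and {len(LEVELS)}")
    totals = defaultdict(float)
    for path, value in facts.items():
        totals[path[:level]] += value  # the ancestor at this level of detail
    return dict(totals)


facts = {  # illustrative data only
    ("2001", "Qtr1", "Jan"): 3.0,
    ("2001", "Qtr1", "Feb"): 2.0,
    ("2001", "Qtr2", "Apr"): 4.0,
    ("2002", "Qtr1", "Jan"): 1.0,
}
print(roll_up(facts, 1))  # {('2001',): 9.0, ('2002',): 1.0}
print(roll_up(facts, 2))
# {('2001', 'Qtr1'): 5.0, ('2001', 'Qtr2'): 4.0, ('2002', 'Qtr1'): 1.0}
```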


Schema: Star Schema

Fact table: State, Month, Product, plus the measures Profit, Sales, Payroll, Marketing, Inventory, Margin, ...

Dimension hierarchies:
• Location: Market, State
• Time: Year, Quarter, Month
• Products: Product Type, Product Name

(Diagram labels: Fact table, Existence Table)

Generalizations:
• Snowflake schemas
• Lattices (DAGs)
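A miniature star schema can be tried out with Python's standard sqlite3 module: a fact table at the finest grain (State, Month, Product) holding a measure, and one dimension table per hierarchy. Rolling up to a coarser level is then a join plus GROUP BY. Table and column names and all rows are hypothetical, chosen only to mirror the slide, and only Profit is kept as a measure for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One dimension table per hierarchy (coarser levels as extra columns).
    CREATE TABLE location_dim (state TEXT PRIMARY KEY, market TEXT);
    CREATE TABLE time_dim (month TEXT PRIMARY KEY, quarter TEXT, year TEXT);
    CREATE TABLE product_dim (product_name TEXT PRIMARY KEY, product_type TEXT);
    -- Fact table at the finest grain, keyed by the leaf level of each hierarchy.
    CREATE TABLE facts (state TEXT, month TEXT, product TEXT, profit REAL);

    INSERT INTO location_dim VALUES ('California', 'West'), ('Oregon', 'West'), ('New York', 'East');
    INSERT INTO time_dim VALUES ('2001-01', 'Qtr1', '2001'), ('2001-04', 'Qtr2', '2001');
    INSERT INTO product_dim VALUES ('Colombian', 'Coffee'), ('Earl Grey', 'Tea');
    INSERT INTO facts VALUES
        ('California', '2001-01', 'Colombian', 10.0),
        ('Oregon', '2001-01', 'Earl Grey', 4.0),
        ('New York', '2001-04', 'Colombian', 6.0);
""")

# Profit rolled up to Market x Product Type.
by_market_type = conn.execute("""
    SELECT l.market, p.product_type, SUM(f.profit)
    FROM facts f
    JOIN location_dim l ON f.state = l.state
    JOIN product_dim p ON f.product = p.product_name
    GROUP BY l.market, p.product_type
    ORDER BY l.market, p.product_type
""").fetchall()
print(by_market_type)  # [('East', 'Coffee', 6.0), ('West', 'Coffee', 10.0), ('West', 'Tea', 4.0)]

# Profit rolled up to Quarter.
by_quarter = conn.execute("""
    SELECT t.quarter, SUM(f.profit) FROM facts f
    JOIN time_dim t ON f.month = t.month
    GROUP BY t.quarter ORDER BY t.quarter
""").fetchall()
print(by_quarter)  # [('Qtr1', 14.0), ('Qtr2', 6.0)]
```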


Categorical Hierarchies

Quarter × Month
• Direct product of two sets
• Would create twelve entries for each quarter, e.g. (Qtr1, December)

Quarter / Month
• Based on tuples in the database, not semantics
• Would only create three entries per quarter
• Can be expensive to compute

Quarter . Month
• Based on tuples in existence tables (not the database)
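The three ways of combining Quarter and Month can be contrasted in a short Python sketch. The cross helper enumerates every pairing, nest keeps only pairings that occur in the data records (which requires scanning them), and dot keeps only pairings listed in an existence table. The helper names, month abbreviations, and records are hypothetical, and the existence table is assumed to be a small set of valid (quarter, month) pairs.

```python
from itertools import product

QUARTERS = ["Qtr1", "Qtr2", "Qtr3", "Qtr4"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def cross(a, b):
    """Quarter x Month: every pairing, whether or not it is meaningful."""
    return list(product(a, b))


def nest(records, a_field, b_field):
    """Quarter / Month: distinct pairings found in the data (a full scan)."""
    return list(dict.fromkeys((r[a_field], r[b_field]) for r in records))


def dot(a, b, existence_table):
    """Quarter . Month: pairings listed in the existence table, in cross order."""
    return [pair for pair in product(a, b) if pair in existence_table]


# Hypothetical existence table: each month belongs to exactly one quarter.
EXISTENCE = {(QUARTERS[i // 3], m) for i, m in enumerate(MONTHS)}

records = [  # illustrative data only
    {"Quarter": "Qtr1", "Month": "Jan"},
    {"Quarter": "Qtr1", "Month": "Feb"},
    {"Quarter": "Qtr1", "Month": "Jan"},
    {"Quarter": "Qtr2", "Month": "Apr"},
]
print(len(cross(QUARTERS, MONTHS)))           # 48: twelve entries per quarter
print(nest(records, "Quarter", "Month"))      # [('Qtr1', 'Jan'), ('Qtr1', 'Feb'), ('Qtr2', 'Apr')]
print(len(dot(QUARTERS, MONTHS, EXISTENCE)))  # 12: three entries per quarter
```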


Cartographic Generalization

Figure: Canterbury and East Kent at map scales 1:50,000 and 1:625,000


Generalization: Techniques

Selection

Simplification

Exaggeration

Regularization

Displacement

Aggregation


Summary

Polaris
• Spreadsheet or table-based displays
• Simple drag-and-drop interface
• Built on a formalism that allows algebraic manipulation of the visual mapping of tuples to marks

Multiscale visualizations using data and visual abstraction

Connects to SQL/MDX servers

See http://www.graphics.stanford.edu/projects/polaris


Future Work

Articulate full-set of multiscale design patterns

Transition between levels of detail

Develop system infrastructure for browsing VLDB

Support layers/lenses/linking with tuple flow

Device independence through graphical encodings

Extend formalism to 3D

Couple scientific and information visualization

…