GIS AND BIG DATA:
THEORY AND BEST PRACTICE CASE STUDIES
Dr. Dave Schrader
Director – Strategy and Marketing, Teradata
October 2012 – University of Redlands
WHO IS TERADATA?
WHAT IS TERADATA’S STRATEGY?
HOW DO BIG DATA AND GEOSPATIAL FIT?
3 © Teradata 2012
TERADATA
• Founded 1979, first shipment 1984
• $2.4B a year in revenues, growing 22%
• Leading vendor of Enterprise-sized Data Warehouses (HW, SW, PS)
• Engineering HQ is in Rancho Bernardo
• We sell to the Global 3000, blue chip customer base
• Well-known to all database experts
• Moving from “back office” to “frontline” (Active), increasing # of data types
The Teradata Story – History of Big Data
1983: Teradata ships 1st system to Wells Fargo
Jan 1992: Walmart passes 1TB
Jan 2006: WMT loads 1B rows/day, 1 hr latency
June 2012: eBay loads 1TB/minute
More than 25 customers with >25,000 Terabytes at their fingertips
What Data is Driving Growth? … The W’s
• More detailed data comes from:
  > Detailed Customer Behavioral Data
    – “Where” in all industries: mobile and geospatial
    – “What and When” granularity, e.g., browsing on web, including non-clicks and non-transactions
    – Telco: all the detail behind each phone call (BSS, OSS): location
    – Social networking data: tweets, blogs
  > Detailed Operations Data
    – “How”: process data
    – Network congestion, goal planning
    – Transportation optimizations in real time
    – Manufacturing: sensor and test data
Purpose-Built Teradata Platform Family
• Data Mart Appliance (560)
  > Purpose: Test & Development, or Data Marts
  > Scalability: SMP, up to 12TB
• Extreme Data Appliance (1650)
  > Purpose: Strategic Analytics on Extreme Data Volumes
  > Scalability: MPP, up to 186PB
• Data Warehouse Appliance (2690)
  > Purpose: Data Warehouse, or Departmental Data Marts
  > Scalability: MPP, up to 315TB
• Extreme Performance Appliance (4600)
  > Purpose: Extreme Performance for Operational Analytics
  > Scalability: MPP, up to 18TB
• Active Intelligent Data Warehouse (66XX)
  > Purpose: Enterprise Scale Strategic & Operational Intelligence
  > Scalability: MPP, up to 92PB
[Figure axes: Active Users, Scalability, Flexibility]
TOP RATING BY GARTNER - DBMS
Why the TOP Rating for Data Warehousing?
Happy Customers!
Superior Technology!
Innovative Users!
The Next Generation of Analytics: Trends
• Transaction: Value to the business
• Interaction: EXPERIENCE with the business
• Consumer is CEO of the household
• Consumers making intelligent decisions based upon analytics & perfect economic information
• Format: Structured & MULTI-STRUCTURED Data
• Type: Web, social, location, device, channel
• VOLUME and VELOCITY
Teradata and its Acquisitions
Data Warehousing
• Teradata Integrated Data Warehouse
• Operational BI/Intelligence
• Platform Family
• Interoperability & Consulting
Big Data Analytics
• Aster Data
• Extreme Data Appliance
• Partnerships
Business Applications
• Aprimo Applications
• Strategic Partnerships
TERADATA + GEOSPATIAL
Temptation: Build Analytic Silos, Geospatial Silos
• Data Warehouse
• BIG DATA
• OLAP Cubes
• Agile Analytics
• Data Mining
• Geospatial
• Application Development
[Figure: sample report of account counts and balances by region (REG2), segment (SEG1), and period (M01 to M07)]
Analytics for Everyone
• Data Warehouse
• BIG DATA
• OLAP Cubes
• Agile Analytics
• Data Mining
• Geospatial
• Application Development
[Figure: sample report of account counts and balances by region (REG2), segment (SEG1), and period (M01 to M07)]
20-40%+ wasted moving data
Teradata Integrated Analytics
• Advanced Analytics: Optimized in-database data mining technology from leading vendors, open source and Teradata
• Temporal: Native temporal support to manage and update time dimension
• Geospatial: Native database geospatial data types and analytics
• Big Data Integration: Analytic platforms and partner tools to analyze unstructured and structured data
• Application Development: Tools and techniques to accelerate development of analytics
• Agile Analytics: In-database data labs to accelerate exploration of new data and ideas
Teradata Database
Teradata Open Parallel Framework
• Custom Services
• Embedded Services
• Virtual Machines
Teradata Purpose-Built Platform Family
Native Geospatial Data Types
Spatial Data Integrated with Non-Spatial Data
• Geospatial is a feature that allows us to store, process, and consume geospatial data
• Teradata Geospatial is based on the ST_Geometry data type
  > SQL/MM Standard
  > Like numeric or string types native to Teradata
  > Location is type ST_Geometry
    – Point (x y)
    – Line or curve (xy, xy, xy)
    – Polygon (xy, xy, xy, xy..)
Geocoded Customer Table Example:
Customer ID (Integer) | Customer Name (Char) | Customer Address (Char) | Customer Type (Char) | Location (ST_Geometry)
38327 | John Smith | 2110 Oak St. San Francisco, CA 94112 | C | Point (37.40113, 122.2091)
39234 | William White | 100 Broadway, Dearborn, MI 21002 | A | Point (42.153, -83.1078)
[Figure: point, line, polygon]
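The three geometry shapes listed above are commonly exchanged as well-known text (WKT). The following is a minimal Python sketch of that text form, not Teradata code: the function names are hypothetical, and it assumes coordinates are given as (x, y) pairs, which for geographic data means longitude first. The slide's table appears to list latitude first, so the order is swapped here.

```python
from typing import List, Tuple

# (x, y); for geographic data this sketch assumes (longitude, latitude).
Coord = Tuple[float, float]


def _fmt(coords: List[Coord]) -> str:
    """Join coordinate pairs in WKT style: 'x y, x y, ...'."""
    return ", ".join(f"{x!r} {y!r}" for x, y in coords)


def point_wkt(p: Coord) -> str:
    """Point (x y)."""
    return f"POINT ({_fmt([p])})"


def linestring_wkt(coords: List[Coord]) -> str:
    """Line (xy, xy, xy); needs at least two vertices."""
    if len(coords) < 2:
        raise ValueError("a line needs at least two points")
    return f"LINESTRING ({_fmt(coords)})"


def polygon_wkt(ring: List[Coord]) -> str:
    """Polygon (xy, xy, xy, xy..); the ring is closed if the caller left it open."""
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least three points")
    closed = list(ring) if ring[0] == ring[-1] else list(ring) + [ring[0]]
    return f"POLYGON (({_fmt(closed)}))"


if __name__ == "__main__":
    # Location of the second customer row, written longitude first.
    print(point_wkt((-83.1078, 42.153)))
    print(polygon_wkt([(0, 0), (4, 0), (4, 3)]))
```

Strings in this form are what a method such as ST_AsText is meant to return, which makes them a convenient way to move geometries between a database and client code.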
Teradata Geospatial Spatial Methods – sample
High Speed Big Data Analytics
• Measurements: ST_Area, ST_Distance, ST_SphericalDistance, ST_SpheroidalDistance, ST_Perimeter, ST_Length
• Spatial Relationships: ST_Intersects, ST_Overlaps, ST_Relate, ST_Touches, ST_Within, ST_Contains, ST_Disjoint, ST_Crosses, ST_Equals
• Attribute: ST_AsBinary, ST_AsText, ST_CoordDim, ST_Dimension, ST_GeometryType, ST_IsEmpty, ST_IsSimple, ST_IsClosed, ST_NumPoints, ST_SRID, …
• Spatial Operator: ST_Buffer, ST_Intersection, ST_Boundary, ST_Difference, ST_Envelope, ST_ExteriorRing, ST_GeometryN, ST_InteriorRingN, ST_Transform
Geospatial Queries
Answering ‘Where’
• ST_Geometry functions…
  > Measurements
    – Distance, surface, perimeter…
  > Relationship between two objects
    – Intersect, contains, within, adjacent…
  > Simplified Example - find top 100 customers by value within the store area boundaries and their distance from the store:

SELECT TOP 100 C.name, C.address, C.value,
       C.location.ST_Distance(S.location) AS Distance
FROM customers C, stores S, store_area SA
WHERE S.id = 1 AND S.id = SA.id
  AND C.location.ST_Within(SA.area)
ORDER BY 3 DESC;

[Figure: map showing Customer, Retail Outlet, Distance, Competitor outlet, Mail Campaign Targets, Store Area]
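To make the logic of the example query concrete, here is a minimal Python sketch of the same three steps: keep customers whose location falls within the store area polygon, compute each one's distance to the store, and return the top rows by value. It is an illustration only, not how Teradata evaluates the query. It assumes planar (x, y) coordinates and a simple polygon, uses a standard ray-casting test in place of ST_Within (points exactly on the boundary may be classified either way), and all function and field names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]


def point_in_polygon(pt: Coord, polygon: Sequence[Coord]) -> bool:
    """Ray casting: count how many polygon edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def top_customers_in_area(
    customers: List[Dict],
    store_location: Coord,
    store_area: Sequence[Coord],
    limit: int = 100,
) -> List[Dict]:
    """Customers inside store_area, with distance to the store, ordered by value descending."""
    rows = []
    for c in customers:
        if point_in_polygon(c["location"], store_area):
            rows.append(
                {
                    "name": c["name"],
                    "value": c["value"],
                    "distance": math.dist(c["location"], store_location),
                }
            )
    rows.sort(key=lambda r: r["value"], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    area = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical store area
    customers = [
        {"name": "A", "value": 300, "location": (5, 8)},
        {"name": "B", "value": 900, "location": (12, 5)},  # outside the area
        {"name": "C", "value": 500, "location": (1, 1)},
    ]
    for row in top_customers_in_area(customers, (5, 5), area, limit=2):
        print(row)
```

The point of doing this in the database instead is that the filter, the distance calculation, and the ordering all run next to the customer data, so no rows have to be exported to a separate geospatial tool first.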
Telco – Retail
Accelerates Analytics with Teradata
Find the 3 closest stores within 50 miles of each customer location.
  > Over 30 million customers
  > Over 2,200 stores
  > Target customers changing frequently
Manual Geospatial Analytics
• Calculate distance between each store and customer
  > Calculations based on complex trigonometric functions
  > Over 65 billion calculations
  > Filter results <= 50 miles
  > Retained 1 billion results
In-database Geospatial Analytics
• Teradata Geospatial functions
  > Set a 50 mile buffer (filter) for stores
  > Identify customers within the buffer
  > Calculate spherical distance for those customers
25 times faster
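The contrast between the two approaches can be sketched in Python. Pairing every customer with every store means roughly 30 million × 2,200 = 66 billion distance calculations, in line with the slide's "over 65 billion". The sketch below instead applies a cheap buffer test first and computes the spherical distance only for the survivors. It is an assumption-laden illustration, not Teradata's implementation: the buffer is approximated by a latitude/longitude bounding box around each store, distance uses the haversine formula with an assumed mean Earth radius, and the code ignores stores near the poles or the antimeridian. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_MILES = 3958.8   # assumed mean Earth radius


def spherical_distance_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def buffer_box(center: LatLon, radius_miles: float) -> Tuple[float, float, float, float]:
    """Bounding box (lat_min, lat_max, lon_min, lon_max) enclosing the circular buffer."""
    lat, lon = center
    r = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(r)
    ratio = math.sin(r) / math.cos(math.radians(lat))
    # If the buffer reaches a pole, every longitude is possible.
    dlon = 180.0 if ratio >= 1 else math.degrees(math.asin(ratio))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


def closest_stores(customer: LatLon, stores: Dict[str, LatLon],
                   max_miles: float = 50.0, k: int = 3) -> List[Tuple[str, float]]:
    """Up to k nearest stores within max_miles, using the box as a prefilter."""
    hits = []
    for store_id, loc in stores.items():
        lat_min, lat_max, lon_min, lon_max = buffer_box(loc, max_miles)
        if not (lat_min <= customer[0] <= lat_max and lon_min <= customer[1] <= lon_max):
            continue  # cheap rejection: no trigonometry on the customer side
        d = spherical_distance_miles(customer, loc)
        if d <= max_miles:
            hits.append((d, store_id))
    return [(store_id, d) for d, store_id in sorted(hits)[:k]]


if __name__ == "__main__":
    print(f"all pairs: {30_000_000 * 2_200:,}")
    stores = {"near": (42.1, -83.0), "far": (43.0, -83.0), "other": (41.8, -83.0)}
    print(closest_stores((42.0, -83.0), stores))
```

In a real workload the store boxes would be computed once rather than per customer, and a spatial index would avoid scanning every store; the sketch keeps the loop explicit so the filter-then-measure order is visible.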
Teradata Geospatial Analytics
• Integrated spatial and non-spatial data
• High speed processing of big data
• Innovation simplified via Data Labs
• Proven by industry leaders
Big Data - provides enormous insight…
Customer behavior, calling/browsing habits, their social network…
…keyword use…
… location, travel destinations…
…personal profiles…
…sensor data and metrics…
…Opportunity to move beyond traditional analytics!
A major Telco uses real-time analytics to find remedies for dropped mobile phone calls