Big Data + Big Sim: Query Processing over Unstructured CFD Models
![Page 1: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/1.jpg)
Bill Howe
Information School
Computer Science & Engineering
University of Washington
Big Data + Big Sim:
Query Processing over
Unstructured CFD Models
8/7/2017 Bill Howe, UW 1
Scott Moe
Applied Math
University of Washington
![Page 2: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/2.jpg)
This morning…
• Data-intensive science in oceanography
• Background on databases and query
algebras
• Regridding: Integrating ocean models using
a database-style algebra
• If time: Responsible data science
Motivation Algebraic Optimization Regridding End
![Page 3: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/3.jpg)
My position for this talk…
• Simulations are sources of data
• Analysis requires querying across
heterogeneous data sources, including
simulations
• The CS database community has the
right set of concepts and approaches
…but ultimately we’re just plumbers
![Page 4: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/4.jpg)
The Fourth Paradigm
1. Empirical + experimental
2. Theoretical
3. Computational
4. Data-Intensive
Jim Gray
![Page 5: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/5.jpg)
Nearly every field of discovery is transitioning
from “data poor” to “data rich”
Astronomy: LSST
Physics: LHC
Oceanography: OOI
Social Sciences
Biology: Sequencing
Economics
Neuroscience: EEG, fMRI
![Page 6: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/6.jpg)
Complex System
“Little linear windows”
Academic research
Practitioners
One view of “data science” is to streamline the discovery, interpretation,
and operationalization of semi-robust local patterns that have predictive
power for some task.
In general, these don’t exist. But in specific situations, they do.
![Page 7: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/7.jpg)
slide: John Delaney, UW
![Page 8: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/8.jpg)
Regional Scale Nodes
John
Delaney
10s of Gigabits/second from the ocean floor
![Page 9: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/9.jpg)
17 federal organizations named as partners
11 Regional Associations
“a strategy for incorporating observation systems from …
near shore waters as part of … a network of observatories.”
![Page 10: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/10.jpg)
Center for Coastal Margin
Observation and Prediction (CMOP)
Antonio
Baptista
![Page 11: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/11.jpg)
Virtual Mekong Basin
img src: Mark Stoermer, UW Center for Environmental Visualization
Jeff
Richey
![Page 12: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/12.jpg)
So what?
• Geosciences are transitioning from
expedition-based to observatory-based
science
• Enormous investments in integrating
sensors and models
• The big problem: ad hoc queries over
large, heterogeneous, distributed datasets
and models
![Page 13: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/13.jpg)
So what do we do about querying across
heterogeneous sources?
Raise the level of abstraction and let the
system handle the details
![Page 14: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/14.jpg)
Digression: Relational Database History
Pre-Relational: if your data changed, your application broke.
Early RDBMS were buggy and slow (and often reviled), but
required only 5% of the application code.
“Activities of users at terminals and most application programs should
remain unaffected when the internal representation of data is changed and
even when some aspects of the external representation are changed.”
-- Codd 1970
Key Idea: Programs that manipulate tabular data exhibit an algebraic
structure allowing reasoning and manipulation independently of physical
data representation
![Page 15: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/15.jpg)
Key Idea: An Algebra of Tables
Core operators: select, project, join
Other operators: aggregate, union, difference, cross product
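A minimal sketch of the table algebra, assuming tables are plain Python lists of dictionaries. The function names `select`, `project`, and `join` mirror the operators named on the slide; the toy data and column names are invented for illustration and do not come from the talk.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]

def select(pred: Callable[[Row], bool], t: Table) -> Table:
    """Keep only the rows that satisfy the predicate."""
    return [r for r in t if pred(r)]

def project(cols: List[str], t: Table) -> Table:
    """Keep only the named columns of every row."""
    return [{c: r[c] for c in cols} for r in t]

def join(left: Table, right: Table, key: str) -> Table:
    """Equijoin: pair up rows that agree on `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Hypothetical toy tables.
product = [{"pid": 1, "pname": "kayak", "price": 450},
           {"pid": 2, "pname": "paddle", "price": 60}]
purchase = [{"pid": 1, "cid": 7}, {"pid": 2, "cid": 7}]

# Every operator takes tables and returns a table, so operators compose freely.
expensive = project(["pname"],
                    select(lambda r: r["price"] > 100,
                           join(product, purchase, "pid")))
print(expensive)
```

Because each operator consumes and produces tables, an optimizer is free to reorder them without the calling program noticing, which is the property the rest of the talk relies on.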
![Page 16: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/16.jpg)
Review: Algebraic Optimization
N = ((4*2)+((4*3)+0))/1
Algebraic Laws:
1. (+) identity: x+0 = x
2. (/) identity: x/1 = x
3. (*) distributes: (n*x+n*y) = n*(x+y)
4. (*) commutes: x*y = y*x
Apply rules 1, 3, 4, 2: N = (2+3)*4
two operations instead of five, no division operator
Same idea works with very large tables, but the payoff is much higher
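The rewrite above can be sketched as a tiny rule-based optimizer. This is an illustrative toy, not the talk's system: expressions are assumed to be nested tuples `(op, left, right)`, and the helper names `rewrite`, `count_ops`, and `evaluate` are hypothetical.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}

def rewrite(e):
    """Apply the four algebraic laws in one bottom-up pass."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    a, b = rewrite(a), rewrite(b)
    if op == "+" and b == 0:                      # law 1: x + 0 = x
        return a
    if op == "/" and b == 1:                      # law 2: x / 1 = x
        return a
    if (op == "+" and isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] == "*" and a[1] == b[1]):
        # law 3: n*x + n*y = n*(x+y), written as (x+y)*n using law 4
        return ("*", ("+", a[2], b[2]), a[1])
    return (op, a, b)

def count_ops(e):
    """Number of operator nodes in the expression tree."""
    return 0 if not isinstance(e, tuple) else 1 + count_ops(e[1]) + count_ops(e[2])

def evaluate(e):
    """Evaluate an expression tree numerically."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    return OPS[op](evaluate(a), evaluate(b))

original = ("/", ("+", ("*", 4, 2), ("+", ("*", 4, 3), 0)), 1)
optimized = rewrite(original)
print(optimized, count_ops(original), count_ops(optimized))
```

The optimized tree computes the same value with two operators instead of five, and the rewrite never looked at the operand values, only at the shape of the expression.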
![Page 17: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/17.jpg)
Algebraic Optimization:
Find a better logical plan
Product(pid, name, price)
Purchase(pid, cid, store)
Customer(cid, name, city)
SELECT DISTINCT x.name, z.name
FROM Product x, Purchase y, Customer z
WHERE x.pid = y.pid and y.cid = z.cid and
x.price > 100 and z.city = 'Seattle'
Logical plan (top to bottom):
δ
Π x.name, z.name
σ price>100 and city='Seattle'
⋈ cid=cid
    ⋈ pid=pid
        Product
        Purchase
    Customer
![Page 18: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/18.jpg)
Algebraic Optimization:
Find a better logical plan
Query optimization =
finding cheaper,
equivalent expressions
SELECT DISTINCT x.name, z.name
FROM Product x, Purchase y, Customer z
WHERE x.pid = y.pid and y.cid = z.cid and
x.price > 100 and z.city = 'Seattle'
Logical plan (top to bottom), with the selections pushed below the joins:
δ
Π x.name, z.name
⋈ cid=cid
    ⋈ pid=pid
        σ price>100
            Product
        Purchase
    σ city='Seattle'
        Customer
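To make "cheaper, equivalent expressions" concrete, here is a small sketch that runs the query both ways on invented data. Tables are assumed to be lists of tuples in the schema order shown on the slide; the function names and the rows are hypothetical.

```python
# (pid, name, price), (pid, cid, store), (cid, name, city): toy rows only.
product = [(1, "kayak", 450), (2, "paddle", 60), (3, "tent", 300)]
purchase = [(1, 10, "A"), (2, 10, "A"), (3, 11, "B"), (1, 11, "B")]
customer = [(10, "Ann", "Seattle"), (11, "Bob", "Portland")]

def plan_filter_last():
    """Join all three tables first, then apply the selection."""
    joined = [(x, y, z)
              for x in product for y in purchase if x[0] == y[0]
              for z in customer if y[1] == z[0]]
    kept = [(x, y, z) for (x, y, z) in joined
            if x[2] > 100 and z[2] == "Seattle"]
    # A set gives DISTINCT; also report how many rows the joins produced.
    return {(x[1], z[1]) for (x, y, z) in kept}, len(joined)

def plan_pushed_down():
    """Apply each selection to its base table before joining."""
    pricey = [x for x in product if x[2] > 100]
    local = [z for z in customer if z[2] == "Seattle"]
    joined = [(x, y, z)
              for x in pricey for y in purchase if x[0] == y[0]
              for z in local if y[1] == z[0]]
    return {(x[1], z[1]) for (x, y, z) in joined}, len(joined)

result_a, rows_a = plan_filter_last()
result_b, rows_b = plan_pushed_down()
print(result_a == result_b, rows_a, rows_b)
```

Both plans return the same answer, but the pushed-down plan materializes fewer joined rows. On tables with millions of rows that difference dominates the running time.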
![Page 19: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/19.jpg)
Same logical expression, different physical
algorithms
Which is faster?
SELECT *
FROM Order o, Item i
WHERE o.order = i.order
Logical plan: join (o.order = i.order) over scan(Order o) and scan(Item i)
(N = number of records in Item, M = number of records in Order)
Option 1
for each record i in Item:                        O(N)
    for each record o in Order:                   O(M)
        if o.order = i.order:                     O(1)
            output (o, i)                         O(1)
overall: O(N*M)
Option 2
for each record i in Item:                        O(N)
    insert into hashtable                         O(1)
for each record o in Order:                       O(M)
    lookup corresponding records in hashtable     O(~1)
return matching pairs
overall: O(N+M)
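The two options can be written as runnable Python. This is a sketch under the assumption that records are dictionaries with an `order` field; the sample rows and the `sku` and `customer` fields are invented.

```python
from collections import defaultdict

def nested_loop_join(orders, items):
    """Option 1: compare every (item, order) pair, O(N*M) comparisons."""
    return [(o, i) for i in items for o in orders if o["order"] == i["order"]]

def hash_join(orders, items):
    """Option 2: build a hash table on Item, then probe it once per Order.
    Expected cost is O(N+M) plus the size of the output."""
    table = defaultdict(list)
    for i in items:                      # build phase
        table[i["order"]].append(i)
    return [(o, i)                       # probe phase
            for o in orders
            for i in table.get(o["order"], [])]

orders = [{"order": 1, "customer": "a"}, {"order": 2, "customer": "b"}]
items = [{"order": 1, "sku": "x"}, {"order": 1, "sku": "y"},
         {"order": 3, "sku": "z"}]

def pair_key(pair):
    # Sort key so the two result lists can be compared regardless of order.
    return (pair[0]["order"], pair[1]["sku"])

slow = sorted(nested_loop_join(orders, items), key=pair_key)
fast = sorted(hash_join(orders, items), key=pair_key)
print(slow == fast, len(fast))
```

Both functions implement the same logical join, so a query optimizer can pick whichever is cheaper for the table sizes at hand without changing the query.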
![Page 20: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/20.jpg)
3/12/09 Bill Howe, eScience Institute 20
H0 : (x,y,b) V0 : (z)
A
restrict(0, z >b)
B
color is depth
Algebraic Manipulation of Scientific Datasets,
B. Howe, D. Maier, VLDBJ 2005
H0 : (x,y,b) V0 : ( )
apply(0, z=(surf b) * )
bind(0, surf)
C
color is salinity
GridFields: An Algebra of Meshes
![Page 21: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/21.jpg)
Example (1)
H = Scan(context, "H")
rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H)
[Figure: the gridfields H and rH; color: bathymetry. Callouts label the Restrict arguments “predicate” and “dimension”.]
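A rough sketch of what a restrict on dimension 0 (the vertices) might do, written in plain Python rather than with the GridFields library. The data layout, the function name `restrict_vertices`, and the rule that a triangle survives only if all of its vertices survive are assumptions made for illustration; the bounding box is the one from the slide.

```python
def restrict_vertices(pred, vertices, cells):
    """Keep the vertices that satisfy `pred`, then keep only the cells
    whose vertices all survived (assumed semantics)."""
    kept = {vid: v for vid, v in vertices.items() if pred(v)}
    kept_cells = [c for c in cells if all(vid in kept for vid in c)]
    return kept, kept_cells

# Hypothetical horizontal grid H: vertex id -> attributes (x, y, bathymetry b).
vertices = {
    0: {"x": 320, "y": 290, "b": 5.0},
    1: {"x": 330, "y": 290, "b": 7.5},
    2: {"x": 340, "y": 295, "b": 9.0},
    3: {"x": 335, "y": 300, "b": 8.0},
}
cells = [(0, 1, 2), (1, 2, 3)]   # triangles as triples of vertex ids

def in_box(v):
    # Same predicate as the slide: (326<x) & (x<345) & (287<y) & (y<302)
    return 326 < v["x"] < 345 and 287 < v["y"] < 302

rH_vertices, rH_cells = restrict_vertices(in_box, vertices, cells)
print(sorted(rH_vertices), rH_cells)
```

The result is again a grid with data bound to it, so further operators can be chained onto it just as with relational tables.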
![Page 23: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/23.jpg)
Transect: Bad Query Plan
[Plan diagram; node labels: H(x,y,b), V(z), r(z>b), b(s), regrid, P, P V]
1) Construct full-size 3D grid
2) Construct 2D transect grid
3) Interpolate 1) onto 2)
![Page 24: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/24.jpg)
Transect: Optimized Plan
[Plan diagram; node labels: P, V, V(z), P, H(x,y,b), regrid, b(s), regrid]
1) Find 2D cells containing points
2) Create “stacks” of 2D cells carrying data
3) Create 2D transect grid
4) Interpolate 2) onto 3)
![Page 25: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/25.jpg)
1) Find cells containing points in P
![Page 26: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/26.jpg)
1) Find cells containing points in P
2) Construct “stacks” of cells
4) Interpolate
![Page 27: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/27.jpg)
Transect: Results
[Bar chart: run time in secs (axis 0 to 45) for vtk(3D), interpolate, simple, interp_o, simple_o; dataset: 800 MB (1 timestep)]
![Page 28: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/28.jpg)
Back to integrating models:
What is the right abstraction?
• Claim: Everything reduces to regridding
• Model-data comparison (skill assessment)?
Regrid observations onto model mesh
• Model-model comparison?
Regrid one model’s mesh onto the other’s
• Model coupling?
Regrid a meso-scale atmospheric model onto your regional ocean model
• Visualization?
Regrid onto a 3D mesh, or regrid onto a 2D array of pixels
![Page 29: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/29.jpg)
Status Quo
• “FTP + MATLAB”
• “Nascent Databases”
– File-based, format-specific API
– UniData’s NetCDF, HDF5
– Some IO optimization, some indexing
• “Data Servers”
– Same as file-based systems,
– but supports RPC
Hyrax
None of this scales
- up with data volumes
- up with number of sources
- down with developer expertise
![Page 30: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/30.jpg)
Summary so far
• “Integration” means “regridding”
– mesh to pixels, mesh to mesh, trajectory to mesh
– satellites to models, models to models, observations to models
• Regridding is hard
– Must be easy, tolerant of unusual grids, numerically conservative, efficient
Our goal
• Define a “universal regridding” operator with nice algebraic
properties
• Use it to implement efficient distributed data sharing applications,
parallel algorithms, and more
![Page 31: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/31.jpg)
What are some complexities we want to
hide?
• Unstructured Grids
• Numerical Conservation
• Choice of Algorithms
![Page 32: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/32.jpg)
![Page 33: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/33.jpg)
Washington
Oregon
Columbia River Estuary
![Page 34: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/34.jpg)
Washington
Oregon
Columbia River Estuary
![Page 35: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/35.jpg)
[Figure: axis running from “easy; good support” to “hard; poor support”; systems shown:]
SciDB
Hyrax
GridFields
ESMF
VTK/Paraview
![Page 36: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/36.jpg)
Structured grids are easy
The data model…
(Cartesian products of coordinate variables)
…immediately implies a representation,
(multidimensional arrays)
…an API,
(reading and writing subslabs)
…and an efficient implementation
(address calculation using array “shape”)
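A short sketch of the address calculation the last line refers to, assuming row-major (C-order) layout. The function names `strides` and `flat_index` are illustrative and not tied to any particular library.

```python
def strides(shape):
    """Row-major strides, counted in elements: the last axis varies fastest."""
    out, step = [], 1
    for n in reversed(shape):
        out.append(step)
        step *= n
    return tuple(reversed(out))

def flat_index(idx, shape):
    """Offset of element `idx` in the flat buffer, from the shape alone."""
    return sum(i * s for i, s in zip(idx, strides(shape)))

shape = (4, 3, 5)                 # e.g. (time, depth, node), chosen arbitrarily
print(strides(shape))             # elements to skip per step along each axis
print(flat_index((2, 1, 3), shape))
```

No search or index structure is needed: the position of any value follows from arithmetic on the shape. Unstructured grids have no such formula, which is why they are the hard case.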
![Page 37: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/37.jpg)
What are some complexities we want to
hide?
• Unstructured Grids
• Numerical Conservation
• Choice of Algorithms
![Page 38: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/38.jpg)
Naïve Method: Interpolation (Spatial Join)
For each vertex in the target grid,
Find containing cell in the source grid,
Evaluate the basis functions to interpolate
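The three steps above can be sketched for a 2D triangular mesh with linear basis functions, where evaluating the basis amounts to computing barycentric weights. This is a simplified illustration: it assumes linear (P1) elements, scans the triangles one by one instead of using a spatial index, and all names and data are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2

def interpolate(p, nodes, triangles, values, eps=1e-12):
    """Find the source triangle containing p and blend its nodal values."""
    for tri in triangles:
        w = barycentric(p, *(nodes[i] for i in tri))
        if all(wi >= -eps for wi in w):      # p lies inside (or on) tri
            return sum(wi * values[i] for wi, i in zip(w, tri))
    return None                              # p is outside the source grid

# Source grid: unit square split into two triangles, field f = x + 2y.
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
triangles = [(0, 1, 2), (1, 3, 2)]
values = [x + 2 * y for x, y in nodes]

# Target grid vertices to fill in.
targets = [(0.25, 0.25), (0.75, 0.75), (2.0, 2.0)]
print([interpolate(p, nodes, triangles, values) for p in targets])
```

Interpolation like this reproduces linear fields exactly, but nothing in it preserves integrals of the field, which motivates the conservative method on the following slides.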
![Page 39: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/39.jpg)
![Page 40: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/40.jpg)
Supermeshing [Farrell 10]
For each cell in the target grid,
Find overlapping cells in the source grid,
Compute their intersections
Derive new coefficients to minimize L2 norm
* Guaranteed Conservative
* Minimizes Error
But:
Domains must match exactly
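A deliberately simple sketch of the same idea in one dimension with piecewise-constant fields. In that special case the cell intersections are interval overlaps and the L2-optimal coefficient of a target cell is the overlap-weighted average of the source values. The general supermeshing method cited on the slide handles higher-order elements on 2D and 3D meshes; the function names and data here are hypothetical.

```python
def regrid_conservative(src_edges, src_vals, tgt_edges):
    """Project a piecewise-constant field from source cells to target cells.
    Both grids must cover exactly the same interval."""
    out = []
    for lo, hi in zip(tgt_edges, tgt_edges[1:]):
        total = 0.0
        for (a, b), v in zip(zip(src_edges, src_edges[1:]), src_vals):
            overlap = min(hi, b) - max(lo, a)   # length of the intersection
            if overlap > 0:
                total += v * overlap
        out.append(total / (hi - lo))           # average over the target cell
    return out

def integral(edges, vals):
    """Integral of a piecewise-constant field over its grid."""
    return sum(v * (b - a) for (a, b), v in zip(zip(edges, edges[1:]), vals))

src_edges, src_vals = [0.0, 1.0, 2.0, 4.0], [1.0, 3.0, 2.0]
tgt_edges = [0.0, 1.5, 4.0]

tgt_vals = regrid_conservative(src_edges, src_vals, tgt_edges)
print(tgt_vals, integral(src_edges, src_vals), integral(tgt_edges, tgt_vals))
```

The integral of the field is the same before and after regridding, which is what "conservative" means here. The requirement that the two domains match exactly is visible too: any part of a target cell not covered by the source would silently contribute zero.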
![Page 41: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/41.jpg)
![Page 42: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/42.jpg)
What are some complexities we want to
hide?
• Unstructured Grids
• Numerical Conservation
• Choice of algorithms
![Page 43: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/43.jpg)
![Page 44: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/44.jpg)
Finding mesh intersections
![Page 45: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/45.jpg)
![Page 46: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/46.jpg)
![Page 47: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/47.jpg)
Commutativity of Regrid and Restrict:
Restrict(Regrid(X,Y)) = Regrid(Restrict(X), Restrict(Y))
G0 = Regrid(Restrict0(X), Restrict0(Y))
G1 = Regrid(Restrict1(X), Restrict1(Y))
:
GN = Regrid(RestrictN(X), RestrictN(Y))
R = Stitch(G0, G1, ..., GN)
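The partition-and-stitch pattern can be checked on a toy 1D case. This sketch assumes X is the source grid and Y the target grid, uses piecewise-constant fields, and restricts the source to the cells that overlap each target partition. The names `regrid`, `restrict_tgt`, `restrict_src`, and `stitch` are hypothetical stand-ins for the operators on the slide.

```python
def regrid(src, tgt):
    """Overlap-weighted average of source cells (a, b, v) onto target cells (lo, hi)."""
    out = []
    for lo, hi in tgt:
        total = sum(v * max(0.0, min(hi, b) - max(lo, a)) for a, b, v in src)
        out.append((lo, hi, total / (hi - lo)))
    return out

def restrict_tgt(tgt, lo, hi):
    """Target cells whose left edge falls in the window [lo, hi)."""
    return [c for c in tgt if lo <= c[0] < hi]

def restrict_src(src, tgt_part):
    """Source cells that overlap the extent of the kept target cells."""
    if not tgt_part:
        return []
    lo, hi = tgt_part[0][0], tgt_part[-1][1]
    return [c for c in src if c[0] < hi and c[1] > lo]

def stitch(parts):
    """Concatenate the per-window results back into one grid."""
    return [cell for part in parts for cell in part]

src = [(0.0, 1.0, 1.0), (1.0, 2.0, 3.0), (2.0, 4.0, 2.0)]
tgt = [(0.0, 0.5), (0.5, 1.5), (1.5, 3.0), (3.0, 4.0)]
windows = [(0.0, 1.5), (1.5, 4.0)]

parts = []
for lo, hi in windows:                      # each window could run on its own worker
    t = restrict_tgt(tgt, lo, hi)
    parts.append(regrid(restrict_src(src, t), t))

whole = regrid(src, tgt)
pieces = stitch(parts)
print(all(abs(w[2] - p[2]) < 1e-12 for w, p in zip(whole, pieces)))
```

Because each window only needs the source cells that touch it, the work splits into independent tasks, which is what makes the operator parallelizable.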
![Page 48: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/48.jpg)
![Page 49: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/49.jpg)
“Lumping”
![Page 50: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/50.jpg)
![Page 51: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/51.jpg)
![Page 52: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/52.jpg)
Globally conservative
Parallelizable
Commutes with user-selected restrictions
Masking to handle mismatched domains
Todos:
• Characterize the error relative to plain supermeshing
• Universal Regridding-as-a-Service
![Page 53: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/53.jpg)
Outreach and Usage
• Code is available, but in transition to GitHub
– Search “gridfields” on Google Code
– http://code.google.com/p/gridfields/
– C++ with Python bindings
• Integrated into the Hyrax Data Server
– OPULS project funded by NOAA
– Server-side processing of unstructured grids
• Other users
– US Geological Survey
– NOAA
![Page 54: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/54.jpg)
• Screenshot of OPeNDAP demo
http://ec2-174-129-186-110.compute-1.amazonaws.com:8088/nc/test4.nc.nc?
ugrid_restrict(0,"Y>41.5&Y<42.75&X>-68.0&X<-66.0")
![Page 55: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/55.jpg)
Wrap up
• Integration of big data and big models is the game
• Database-style systems are about hiding complexity
and raising the level of abstraction
• A database-style query algebra for FEMs emphasizing
interpolation and regridding across data and models
made sense to us
• But more broadly: a richer infrastructure for comparing
and sharing model results and data
• One idea: “Virtual datasets” where the model is
executed in response to queries, perhaps with simpler
grids and relaxed assumptions
![Page 56: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/56.jpg)
ProPublica, May 2016
Motivation Regridding Supermeshing
Database Algebras Evaluation
Numerical conservation
Responsible Data Science
![Page 57: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/57.jpg)
The Special Committee on Criminal Justice Reform's
hearing of reducing the pre-trial jail population.
Technical.ly, September 2016
Philadelphia is grappling with the prospect of a racist computer algorithm
Any background signal in the
data of institutional racism is
amplified by the algorithm
operationalized by the algorithm
legitimized by the algorithm
“Should I be afraid of risk assessment tools?”
“No, you gotta tell me a lot more about yourself.
At what age were you first arrested?
What is the date of your most recent crime?”
“And what’s the culture of policing in the
neighborhood in which I grew up in?”
![Page 58: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/58.jpg)
Amazon Prime Now Delivery Area: Atlanta
Bloomberg, 2016
![Page 59: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/59.jpg)
Amazon Prime Now Delivery Area: Boston
Bloomberg, 2016
![Page 60: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/60.jpg)
Amazon Prime Now Delivery Area: Chicago
Bloomberg, 2016
![Page 61: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/61.jpg)
The way I think about this… (1)
First decade of Data Science research and practice:
What can we do with massive, noisy, heterogeneous datasets?
Next decade of Data Science research and practice:
What should we do with massive, noisy, heterogeneous datasets?
![Page 62: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/62.jpg)
The way I think about this… (2)
Decisions are based on two sources of information:
1. Past examples, e.g., “prior arrests tend to increase likelihood of future arrests”
2. Societal constraints, e.g., “we must avoid racial discrimination”
8/7/2017 Data, Responsibly / SciTech NW 62
We’ve become very good at automating the use of past examples
We’ve only just started to think about incorporating societal constraints
![Page 63: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/63.jpg)
The way I think about this… (3)
How do we apply societal constraints to algorithmic
decision-making?
Option 1: Rely on human oversight
Ex: EU General Data Protection Regulation requires that a
human be involved in legally binding algorithmic decision-making
Ex: Wisconsin Supreme Court says a human must review
algorithmic decisions made by recidivism models
Issues with scalability, prejudice
Option 2: Build systems to help enforce these constraints
This is the approach we are exploring
![Page 64: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/64.jpg)
The way I think about this… (4)
On transparency vs. accountability:
• For human decision-making, sometimes explanations are
required, improving transparency
– Supreme court decisions
– Employee reprimands/termination
• But when transparency is difficult, accountability takes over
– medical emergencies, business decisions
• As we shift decisions to algorithms, we lose both
transparency AND accountability
• “The buck stops where?”
![Page 65: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/65.jpg)
So what do we do about it?
Fides: A platform for responsible data science
joint with Stoyanovich [US], Abiteboul [FR], Miklau [US], Sahuguet [US], Weikum [DE]
Data Curation
novel features to support:
Fairness, Accountability, Transparency, Privacy, Reproducibility
![Page 66: Big Data + Big Sim: Query Processing over Unstructured CFD Models](https://reader031.vdocuments.mx/reader031/viewer/2022030317/5a66602a7f8b9aa92f8b4abb/html5/thumbnails/66.jpg)