big data visualization · big data file size is a useless statistic (giga, tera, peta, exa, zetta,...
TRANSCRIPT
![Page 1: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/1.jpg)
Chief Scientist, H2O
Adjunct Professor, UIC
[email protected] www.cs.uic.edu/~wilkinson
Big Data Visualization
Stats 285, Stanford November 11, 2019
Leland Wilkinson
![Page 2: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/2.jpg)
Big Data
File size is a useless statistic (giga, tera, peta, exa, zetta, …)
Graph data
  how many nodes and edges?
Rectangular data
  how many rows and columns?
  how long are the strings?
  how many distinct strings?
  what is the precision of the numbers?
  what is the file format?
Image data
  how many images?
  resolution of the images?
Text data
  how many words?
  what language?
![Page 3: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/3.jpg)
Big Data
Many big data problems can be attacked with software or hardware
  distributed file systems
  GPUs
  Columnar in-memory databases
And some big data problems can be attacked with models
  deep learning
  stacking
These are the kind of things computer scientists think are “solutions”
But the problems big data presents to visualization involve other things
  human factors (perception, cognition, …)
  display limitations (pixels, rendering, …)
  real-time performance (constrained by human in the loop)
So let’s look at some of these problems peculiar to visualization
![Page 4: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/4.jpg)
Big Data
Problems: Difficulties peculiar to big data visualization
Solutions:
  Architecture: Design of a big data visualization system
  Wrangling: Ways to make big data tractable for visualization
  Graphics: Graphics suited for big data exploration
![Page 5: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/5.jpg)
Problems
Complexity: Many functions are polynomial or exponential
Curse of Dimensionality: distances tend toward constant as p → ∞
Chokepoint: Cannot send big data over the wire
Real Estate: Cannot plot big data on the client
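The curse-of-dimensionality point can be checked numerically. The sketch below is an illustration, not part of the talk: it draws uniform random points (an assumed test distribution) and reports the relative contrast, (max − min) / min, of the distances from one query point to the rest. The function name `relative_contrast` and the sample sizes are hypothetical choices.

```python
import math
import random

def relative_contrast(n_points, p, rng):
    """(max - min) / min of the distances from one query point to all others."""
    points = [[rng.random() for _ in range(p)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    dists = [math.dist(query, other) for other in others]
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)  # fixed seed so the run is repeatable
contrast_by_p = {p: relative_contrast(200, p, rng) for p in (2, 10, 100, 1000)}
for p, contrast in contrast_by_p.items():
    print(f"p={p:5d}  relative contrast={contrast:.3f}")
```

As p grows the contrast shrinks, which is the sense in which distances "tend toward constant": nearest and farthest neighbors become hard to tell apart.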
![Page 6: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/6.jpg)
Solutions
Architecture: Design of a big data visualization system
Wrangling: Ways to make big data tractable for visualization
Graphics: Graphics suited for big data exploration
![Page 7: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/7.jpg)
Architecture
[Diagram panels: Old, New]
![Page 8: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/8.jpg)
Wrangling
Aggregate (big n)
  usually reduces accuracy when k < n
Reduce (big p)
  usually violates triangle inequality when d < p
We have some flexibility because of limited range of precision in visualization
But that’s not a hunting license
n: number of rows in dataset
p: number of columns in dataset
k: number of rows in aggregated dataset
d: number of columns in reduced dataset
![Page 9: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/9.jpg)
Aggregate
Do not aggregate if n is tractable: unless resolution demands it
Do not sample: unless using bootstrapped visualization to represent error
Do use different algorithms: 1D, 2D, nD
![Page 10: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/10.jpg)
Do Not Aggregate
Residual plots. Rossmann Stores Kaggle dataset (https://www.kaggle.com/c/rossmann-store-sales)
![Page 11: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/11.jpg)
Do Not Aggregate
Residual plots.
![Page 12: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/12.jpg)
Do Not Aggregate
Residual plots.
![Page 13: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/13.jpg)
Do Not Sample
Sampling
![Page 14: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/14.jpg)
Unless Bootstrapping
Gonnelli, S., Cepollaro, C., Montagnani, A., Monaci, G., Campagna, M.S., Franci, M.B., and Gennari, C. (1996). Bone alkaline phosphatase measured with a new immunoradiometric assay in patients with metabolic bone diseases. European Journal of Clinical Investigation, 26, 391–396.
![Page 15: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/15.jpg)
1D Aggregation/Quantization
Histogramming algorithm
  Choose small bin width (k = 100 bins works well for most display resolutions)
  When finished, average the values in each bin to get a single centroid
  Delete empty bins and return centroids and counts in each bin
Dot plot algorithm
  Sort data
  Choose dot size and set first stack to min(x) location
  For i = min(x) to max(x): add dot to stack at current stack, or start new stack at x_i if no collision
Wilkinson, L. (1999). Dot plots. The American Statistician, 53, 276–281.
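Both 1D procedures on this slide are short enough to sketch. The code below is a minimal reading of the slide text, not the author's implementation: `histogram_aggregate` and `dot_plot_stacks` are hypothetical names, equal-width bins are assumed, and a "collision" is taken to mean a value lying within one dot width of the current stack's location.

```python
from collections import defaultdict

def histogram_aggregate(values, n_bins=100):
    """Bin values into equal-width bins; return (centroid, count) for each nonempty bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v in values:
        b = min(int((v - lo) / width), n_bins - 1)  # max(x) goes in the last bin
        sums[b] += v
        counts[b] += 1
    # Empty bins never enter the dictionaries, so they are dropped automatically.
    return [(sums[b] / counts[b], counts[b]) for b in sorted(counts)]

def dot_plot_stacks(values, dot_size):
    """Greedy stacking: a value joins the current stack if it collides with it."""
    stacks = []  # each entry is [stack_location, count]
    for v in sorted(values):
        if stacks and v - stacks[-1][0] < dot_size:
            stacks[-1][1] += 1       # collision: add the dot to the current stack
        else:
            stacks.append([v, 1])    # no collision: start a new stack at this value
    return [(location, count) for location, count in stacks]

bins = histogram_aggregate([0.0, 0.1, 0.9, 1.0], n_bins=2)
stacks = dot_plot_stacks([1.0, 1.2, 1.4, 3.0, 3.1], dot_size=0.5)
print(bins)
print(stacks)
```

Either function returns locations with counts, so anything drawn or computed downstream can use the counts as frequency weights.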
![Page 16: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/16.jpg)
2D Aggregation
Histogramming/gridding algorithm
![Page 17: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/17.jpg)
2D Aggregation
Histogramming/gridding algorithm
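The slide names the gridding algorithm without spelling it out. One plausible version, assuming it mirrors the 1D histogramming steps (a rectangular grid, one centroid and one count per nonempty cell), is sketched below. The name `grid_aggregate` and the default of 100 bins per axis are assumptions made for illustration.

```python
from collections import defaultdict

def grid_aggregate(points, n_bins=100):
    """Aggregate (x, y) points on an n_bins x n_bins grid.

    Returns one (centroid_x, centroid_y, count) tuple per nonempty cell.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, y_lo = min(xs), min(ys)
    x_width = (max(xs) - x_lo) / n_bins or 1.0  # fall back to 1.0 for constant data
    y_width = (max(ys) - y_lo) / n_bins or 1.0
    cells = defaultdict(lambda: [0.0, 0.0, 0])  # running sum of x, sum of y, count
    for x, y in points:
        col = min(int((x - x_lo) / x_width), n_bins - 1)
        row = min(int((y - y_lo) / y_width), n_bins - 1)
        cell = cells[(col, row)]
        cell[0] += x
        cell[1] += y
        cell[2] += 1
    return [(sx / c, sy / c, c) for sx, sy, c in (cells[key] for key in sorted(cells))]

aggregated = grid_aggregate([(0.0, 0.0), (0.1, 0.1), (1.0, 1.0)], n_bins=2)
print(aggregated)
```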
![Page 18: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/18.jpg)
nD Aggregation
Leader Algorithm
Resembles a set cover (coresets)
In 1D, reduces to the Wilkinson dot plot algorithm
Disk (ball) radius r is a parameter that determines degree of aggregation
We want to end up with k disks (clusters)
![Page 19: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/19.jpg)
nD Aggregation
Leader Algorithm
Each disk is centered on an exemplar (real data point, not a centroid as in k-means)
Each disk contains m members
Disk (ball) radius r depends on distribution of points
  Simple strategy is to run algorithm with tiny r and then expand r in a golden search toward k = 500
Leader has worst case complexity O(n^2)
But if k << n (which is our usual case), it will be much faster
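A one-pass Leader sketch follows, written from the description on these two slides rather than from the author's code. Each point is compared with the exemplars found so far, so the work grows with n times the number of exemplars. The helper `leader_with_target` replaces the golden search mentioned above with a plain geometric expansion of r, which is a simplification; both function names and the growth factor are hypothetical.

```python
import math

def leader(points, r):
    """Assign each point to the first exemplar within distance r, else make it an exemplar."""
    exemplars = []  # exemplars are real data points, not centroids
    members = []    # members[j] lists the indices of points in disk j
    for i, point in enumerate(points):
        for j, exemplar in enumerate(exemplars):
            if math.dist(point, exemplar) <= r:
                members[j].append(i)
                break
        else:  # no existing disk covers this point
            exemplars.append(point)
            members.append([i])
    return exemplars, members

def leader_with_target(points, k_target, r=1e-6, growth=1.618):
    """Start with a tiny radius and expand it until at most k_target disks remain."""
    exemplars, members = leader(points, r)
    while len(exemplars) > k_target:
        r *= growth
        exemplars, members = leader(points, r)
    return r, exemplars, members

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
exemplars, members = leader(points, r=0.5)
radius, target_exemplars, _ = leader_with_target(points, k_target=2)
print(exemplars, members, radius)
```

The member lists give the count for each disk, which is what later plots and statistics need as frequency weights.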
![Page 20: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/20.jpg)
Aggregating Categorical Variables
Multiple Correspondence Analysis
For each categorical variable:
  dummy code (0/1) categories
  compute first principal component on covariance matrix of dummy codes
  numeric value is product of dummy codes of category and first principal component
Other categorical-to-continuous mappings could be used
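The three steps listed for each categorical variable can be sketched without any libraries. This is a literal reading of the slide, not a full Multiple Correspondence Analysis: it dummy-codes one variable, finds the leading eigenvector of the covariance matrix by power iteration (an assumed solver), and scores each category by its component on that eigenvector. The name `category_scores` is hypothetical, and the sign of the scores is arbitrary.

```python
import math

def category_scores(values, n_iter=200):
    """Map each category to its loading on the first principal component of the dummy codes."""
    cats = sorted(set(values))
    n, k = len(values), len(cats)
    # Step 1: dummy code (0/1) the categories.
    dummies = [[1.0 if v == c else 0.0 for c in cats] for v in values]
    means = [sum(col) / n for col in zip(*dummies)]
    centered = [[d - m for d, m in zip(row, means)] for row in dummies]
    # Step 2: covariance matrix of the dummy codes and its leading eigenvector.
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    vec = [1.0 / (j + 1) for j in range(k)]  # arbitrary non-constant starting vector
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # a single category has no variance to explain
            break
        vec = [x / norm for x in nxt]
    # Step 3: a dummy row times the component picks out that category's loading.
    return dict(zip(cats, vec))

scores = category_scores(["a", "a", "a", "b"])
print(scores)
```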
![Page 21: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/21.jpg)
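Aggregate
ALL statistics on aggregated data MUST include frequency weights

MOMENTS (Python)

    def weighted_moments(data, weights=None):
        """One-pass weighted count, sum, mean, and sum of squares about the mean."""
        xCount = 0
        xWeightedCount = xSum = xMean = xSS = 0.0
        for i, x in enumerate(data):
            wt = weights[i] if weights is not None else 1.0  # no weights: each row counts once
            if wt > 0 and x is not None:
                xCount += 1
                xWeightedCount += wt
                xSum += x * wt
                xd = (x - xMean) * wt
                xMean += xd / xWeightedCount  # running weighted mean
                xSS += (x - xMean) * xd       # running weighted sum of squares
        return xCount, xWeightedCount, xSum, xMean, xSS

LOESS (Java)

    // Fragment of the local regression loop; x, y, xi, i, left, right, denom, weights,
    // frequencies, tricube() and the sum* accumulators come from the enclosing routine.
    for (int k = left; k <= right; k++) {
        double xk = x[k];
        double yk = y[k];
        double dist = xk - xi;
        if (k < i)
            dist = xi - xk;  // keeps the distance non-negative when x is sorted ascending
        // The kernel weight is multiplied by the case weight and the frequency weight.
        double wt = tricube(dist * denom) * weights[k] * frequencies[k];
        double xkw = xk * wt;
        sumWeights += wt;
        sumX += xkw;
        sumXSquared += xk * xkw;
        sumY += yk * wt;
        sumXY += yk * xkw;
    }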
![Page 22: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/22.jpg)
Reduce
Projection of a set of points from R^p into R^d such that d < p
Principal Components, SVD: linear projection
Random: random projection
Discovering dimension d is problematic: don’t plan on looking for elbow
Feature Extraction: replace points with derived features
![Page 23: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/23.jpg)
Principal Components (SVD)
Singular Value Decomposition
Alternatively
Pick first k principal components
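The formulas on this slide did not survive extraction. For orientation, the standard relations are given below in assumed notation that is not taken from the source: X is the column-centered n × p data matrix, U and V have orthonormal columns, and D is the diagonal matrix of singular values. Reading "Alternatively" as the eigendecomposition of X^T X is also an assumption.

```latex
\begin{aligned}
X &= U D V^{\top} && \text{(singular value decomposition)} \\
X^{\top} X &= V D^{2} V^{\top} && \text{(alternatively, eigendecomposition of the cross-product matrix)} \\
Z_k &= X V_k = U_k D_k && \text{(scores on the first } k \text{ principal components)}
\end{aligned}
```

Here V_k, U_k, and D_k keep only the first k columns (and the leading k × k block of D), which is what "pick first k principal components" amounts to.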
![Page 24: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/24.jpg)
Random Projections
Replace principal components with random Gaussian elements
W is matrix of random Gaussians
W can be matrix of {1, 0, -1}
Achlioptas, D. (2001). Database-friendly random projections. In PODS ’01: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM, New York, 274–281.
Li, P., Hastie, T. J., and Church, K. W. (2006). Very sparse random projections. In KDD ’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 287–296.
Johnson, W. B. and Lindenstrauss, J. (1984). Lipschitz mapping into Hilbert space. Contemporary Mathematics 26, 189–206.
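A small sketch of both choices of W, assuming the usual scaling: entries with unit variance and a 1/sqrt(d) factor on the projected coordinates. The sparse variant uses the Achlioptas distribution, sqrt(3) times {+1, 0, −1} with probabilities 1/6, 2/3, 1/6. The function names and the sizes p = 500 and d = 200 are illustrative choices, not values from the talk.

```python
import math
import random

def gaussian_matrix(p, d, rng):
    """p x d matrix of independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(p)]

def achlioptas_matrix(p, d, rng):
    """p x d matrix with entries sqrt(3) * {+1, 0, -1}, probabilities 1/6, 2/3, 1/6."""
    scale = math.sqrt(3.0)
    return [[scale * rng.choice((1, 0, 0, 0, 0, -1)) for _ in range(d)] for _ in range(p)]

def project(rows, w):
    """Project each row (length p) onto the d columns of w, scaled by 1 / sqrt(d)."""
    d = len(w[0])
    scale = 1.0 / math.sqrt(d)
    return [[scale * sum(x * w_row[j] for x, w_row in zip(row, w)) for j in range(d)]
            for row in rows]

rng = random.Random(1)
p, d = 500, 200
a = [rng.random() for _ in range(p)]
b = [rng.random() for _ in range(p)]
original = math.dist(a, b)
ratios = {}
for name, make in (("gaussian", gaussian_matrix), ("sparse", achlioptas_matrix)):
    ya, yb = project([a, b], make(p, d, rng))
    ratios[name] = math.dist(ya, yb) / original  # near 1 when the distance is preserved
print(ratios)
```

The sparse matrix needs only additions and subtractions on about a third of the entries, which is the "database-friendly" point of the Achlioptas paper cited above.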
![Page 25: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/25.jpg)
Discovering dimensionality of embedding
Elbow test on scree plot doesn’t work on most real data
  Even piecewise regression with cutpoint as estimated parameter doesn’t work
Kaiser method doesn’t work either
  Factor correlation matrix
  Retain components with eigenvalues > 1
  Kaiser, H. (1960). The Application of Electronic Computers to Factor Analysis. Educational and Psychological Measurement, 20, 141–151.
Horn’s Parallel Analysis works better (but not always)
  Generate random data for problem of same size
  Compute eigendecomposition on random and real data
  Compute average eigenvalue over k samples of random data
  Retain real data components whose eigenvalues are greater than average of random eigenvalues
  Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185.
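The parallel analysis steps can be sketched with NumPy (an assumed dependency; the talk does not name a library). Two choices here are assumptions rather than statements from the slide: the eigenvalues come from the correlation matrix, and real eigenvalues are compared rank by rank with the averaged random ones, stopping at the first that fails. The function name and the synthetic two-factor data are hypothetical.

```python
import numpy as np

def parallel_analysis(x, n_samples=20, seed=0):
    """Return how many leading components beat the average eigenvalues of random data."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    # Eigenvalues of the real correlation matrix, largest first.
    real = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    random_mean = np.zeros(p)
    for _ in range(n_samples):
        noise = rng.standard_normal((n, p))  # random data of the same size
        random_mean += np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    random_mean /= n_samples
    keep = 0
    for real_ev, random_ev in zip(real, random_mean):
        if real_ev <= random_ev:
            break
        keep += 1
    return keep, real, random_mean

# Synthetic check: six columns driven by two latent factors plus noise.
data_rng = np.random.default_rng(1)
factors = data_rng.standard_normal((500, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
x = factors @ loadings + 0.5 * data_rng.standard_normal((500, 6))
n_keep, real_eigenvalues, random_eigenvalues = parallel_analysis(x)
print(n_keep)
```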
![Page 26: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/26.jpg)
Feature Selection
Scagnostics
![Page 27: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/27.jpg)
Scagnostics
An idea of John and Paul Tukey
Never published, but discussed in a JSM talk
Given many scatterplots (too many to view in a scatterplot matrix)
How can we identify unusual scatterplots?
Their approach involved expensive computations
  principal curves, kernels, etc.
![Page 28: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/28.jpg)
Scagnostics
We characterize a scatterplot with nine measures.
We base our measures on three geometric graphs.
  Convex Hull
  Alpha Shape
  Minimum Spanning Tree
Wilkinson L., Anand, A., and Grossman, R. (2006). High-Dimensional visual analytics: Interactive exploration guided by pairwise views of point distributions. IEEE Transactions on Visualization and Computer Graphics, 12(6) pp. 1363-1372.
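Of the three graphs, the minimum spanning tree is the simplest to compute directly. The sketch below uses Prim's algorithm on the complete Euclidean graph, which takes O(n^2) time and suits aggregated point sets. It is an illustration only: it does not compute any of the nine measures, and the function name is hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm; returns a list of (from_index, to_index, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known connection of each point to the tree
    best_from = [-1] * n        # which tree point provides that connection
    best_dist[0] = 0.0          # grow the tree from the first point
    edges = []
    for _ in range(n):
        i = min((j for j in range(n) if not in_tree[j]), key=best_dist.__getitem__)
        in_tree[i] = True
        if best_from[i] >= 0:
            edges.append((best_from[i], i, best_dist[i]))
        for j in range(n):
            if not in_tree[j]:
                dist = math.dist(points[i], points[j])
                if dist < best_dist[j]:
                    best_dist[j] = dist
                    best_from[j] = i
    return edges

edges = minimum_spanning_tree([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 5.0)])
total_length = sum(length for _, _, length in edges)
print(edges, total_length)
```

An unusually long edge, such as the one reaching the point at (2, 5), is the kind of feature that tree-based measures can pick up.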
![Page 29: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/29.jpg)
Scagnostics
Each geometric graph is a subset of the Delaunay triangulation
![Page 30: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/30.jpg)
Scagnostics
![Page 31: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/31.jpg)
Scagnostics
![Page 32: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/32.jpg)
Scagnostics
![Page 33: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/33.jpg)
Scagnostics
![Page 34: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/34.jpg)
Scagnostics
![Page 35: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/35.jpg)
Scagnostics
![Page 36: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/36.jpg)
Graphics on Aggregated Data
1D
  Box plots
  Histograms
  Dot plots
  Kernel densities
2D
  Scatterplots
nD
  Scatterplot matrices
  Parallel coordinates
  Projections
  Graph layout
![Page 37: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/37.jpg)
Box Plots
Not suitable for large n
  Tukey designed them for hand calculation on small batches
  Velleman and Hoaglin devised a computer program to plot them
  Tukey’s fences were based on fractiles of normal distribution
  Since n is not part of the algorithm, outliers explode with big n
Hofmann, H., Wickham, H. & Kafadar, K. (2017) Letter-Value Plots: Boxplots for Large Data, Journal of Computational and Graphical Statistics, 26:3, 469-477.
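The "outliers explode" remark can be made concrete with a short calculation, assuming normally distributed data and the usual fences at 1.5 interquartile ranges beyond the quartiles. The function name is hypothetical.

```python
from statistics import NormalDist

def expected_tukey_outliers(n):
    """Expected number of points outside the 1.5 * IQR fences for n normal observations."""
    std_normal = NormalDist()
    q1, q3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
    upper_fence = q3 + 1.5 * (q3 - q1)
    tail_fraction = 2.0 * (1.0 - std_normal.cdf(upper_fence))  # both tails, by symmetry
    return n * tail_fraction

for n in (100, 10_000, 1_000_000):
    print(n, round(expected_tukey_outliers(n), 1))
```

The flagged fraction is fixed at about 0.7 percent, so the number of plotted "outliers" grows linearly with n even when the data are perfectly normal.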
![Page 38: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/38.jpg)
Histograms
Intro-stat book algorithms will get you into trouble
Details
  Choose number of bins (or, equivalently, bin width)
  Align scale tick values with edges of bins, so scale choice depends on bin widths
  A histogram program should recognize when you feed it discrete values
Sturges (1926), Doane (1976), Scott (1979), Freedman and Diaconis (1981), Stone (1985), Yu & Speed (1991)
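Three of the cited rules are easy to state in code. The sketch below uses their common textbook forms: Sturges as ceil(log2 n) + 1 bins, Scott as a width of 3.49 · s · n^(−1/3), and Freedman-Diaconis as 2 · IQR · n^(−1/3). The function names are hypothetical, and the quartiles use the default method of Python's `statistics.quantiles`, which is one of several conventions.

```python
import math
import statistics

def sturges_bins(n):
    """Sturges: number of bins from the sample size alone."""
    return math.ceil(math.log2(n)) + 1

def scott_width(values):
    """Scott: bin width from the standard deviation."""
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)

def freedman_diaconis_width(values):
    """Freedman-Diaconis: bin width from the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2.0 * (q3 - q1) * len(values) ** (-1 / 3)

values = [float(v) for v in range(1, 1001)]
print(sturges_bins(len(values)), scott_width(values), freedman_diaconis_width(values))
print(sturges_bins(10**9))  # Sturges suggests very few bins even for a billion rows
```

The last line shows one way a textbook rule gets into trouble: the Sturges bin count grows only logarithmically with n.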
![Page 39: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/39.jpg)
Histograms
Change location
Change bin width
![Page 40: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/40.jpg)
Dot Plots
Originally for very small batches, but works on large
Facilitates brushing and linking
Beeswarm plots??
Wilkinson, L. (1999). Dot plots. The American Statistician, 53, 276–281.
![Page 41: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/41.jpg)
Kernel Density Plots
A convolution
Kernel is in red on lower right graphic (normal kernel), h is a bandwidth (smoothness) parameter
Parzen, E. (1962). "On Estimation of a Probability Density Function and Mode." The Annals of Mathematical Statistics. 33(3): 1065–1076.
Dot plots are similar to kernel density estimation with a counting kernel rather than a continuous probability density function (h is the dot width).
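The convolution formula itself is missing from the extracted slide. The standard estimator with a normal kernel, written so that aggregated data can carry counts as frequency weights, is f(x) = (1 / (N h)) Σ c_i K((x − m_i) / h), where N is the sum of the counts. A minimal sketch follows; the function name and argument layout are assumptions.

```python
import math

def gaussian_kde(centers, counts, h):
    """Return a density function for weighted points, normal kernel with bandwidth h."""
    total = sum(counts)
    norm = 1.0 / (total * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(c * math.exp(-0.5 * ((x - m) / h) ** 2)
                          for m, c in zip(centers, counts))

    return density

# Two aggregated points: 3 rows near 0.0 and 1 row near 4.0.
density = gaussian_kde([0.0, 4.0], [3, 1], h=1.0)
print(density(0.0), density(4.0))
```

Passing the centroids and counts from an aggregation step gives the same curve as running the estimator on a dataset in which each centroid is repeated count times.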
![Page 42: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/42.jpg)
Scatterplots
For aggregated data, we need to represent counts at each aggregated point
  color, size, shape, …
For size aesthetic, we sometimes call these bubble plots
  but goal is to clamp sizes so the result looks like a plot of the raw data
![Page 43: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/43.jpg)
Scatterplots
An alternative is to use hexbinning
  this example is from ggplot2
Yukio Kosugi, Jun Ikebe, Nobuyuki Shitara, and Kintomo Takakura (1986). Graphical Presentation of Multidimensional Flow Histogram Using Hexagonal Segmentation. Cytometry 7, 291-294.
Carr, D.B., Littlefield, R.J., Nicholson, W.L., and Littlefield, J.S. (1987). Scatterplot matrix techniques for large N. Journal of the American Statistical Association, 82, 424–436.
![Page 44: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/44.jpg)
Scatterplots
Or alpha blending
  again, ggplot2
![Page 45: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/45.jpg)
Scatterplot Matrices
Put anything in there
Hartigan, J.A. (1975). Printer graphics for clustering. Journal of Statistical Computation and Simulation, 4, 187–213.
Chambers, J.M., Cleveland, W.S., Kleiner, B., and Tukey, P.A. (1983). Graphical Methods for Data Analysis. Monterey, CA: Wadsworth.
![Page 46: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/46.jpg)
3D
Don’t be afraid
Dang, T. N., Wilkinson, L., and Anand, A. (2010). Stacking graphic elements to avoid over-plotting. Proceedings of the IEEE Symposium on Information Visualization 2010, October, 23-25, Salt Lake City, UT.
![Page 47: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/47.jpg)
Parallel Coordinates
Continuous and Categorical Variables
Inselberg, A. (2009). Parallel Coordinates: Visual Multidimensional Geometry and Its Applications. New York: Springer.
![Page 48: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/48.jpg)
Manifold Learning
tSNE
L.J.P. van der Maaten and G.E. Hinton (2008). Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9, 2579-2605. https://lvdmaaten.github.io/tsne/
MNIST data: http://yann.lecun.com/exdb/mnist/index.html.
[Figure panels: Sammon mapping, tSNE, ISOMAP]
![Page 49: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/49.jpg)
Heatmaps
Bohdan Bohdanovich Khomtchouk, Unpublished, Stanford.
Behrisch, M., Schreck, T., and Pfister, H.P. (2019). GUIRO: User-Guided Matrix Reordering, VisWeek, TVCG.
Wilkinson, L. and Friendly, M (2009). The History of the Cluster Heat Map. The American Statistician 63(2), 179-184.
![Page 50: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/50.jpg)
Heatmaps
Wilkinson, L. (2008). The Grammar of Graphics. New York: Springer.
![Page 51: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/51.jpg)
Heatmaps
![Page 52: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/52.jpg)
3D Spinning
Prim-9
Donoho, A.W., Donoho, D.L., and Gasko, M. (1988). MacSpin: Dynamic graphics on a desktop computer. In W.S. Cleveland and M.E. McGill, (Eds.), Dynamic Graphics for Statistics (pp. 331–351). Belmont, CA: Wadsworth.
![Page 53: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/53.jpg)
TablePlot
Malik, W.A. et al. (2010) An Interactive Graphical System for Visualizing Data Quality: Tableplot Graphics. In H. Loracek-Junge & C. Weihs (eds.), Classification as a Tool for Research, Proceedings of the 11th IFCS Conference. Berlin: Springer, 331-339.
Malik, Unwin, Gribov (2010).
![Page 54: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/54.jpg)
AutoVis
Graham Wills and Leland Wilkinson. 2010. AutoVis: Automatic visualization. Information Visualization 9, 1 (March 2010), 47-69.
![Page 55: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/55.jpg)
H2O AutoViz
![Page 56: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/56.jpg)
References
Donoho, D. (2017) 50 Years of Data Science. Journal of Computational and Graphical Statistics 26 (4), 745-766.
Tukey, J. W. (1962). The Future of Data Analysis. Ann. Math. Statist. 33 (1), 1-67.
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statist. Sci. 16 (3), 199-231.
Friedman, J. (2001). Data Mining and Statistics: What’s the connection? Proc. 29th Symposium on the Interface.
Cleveland, W.S. (2001). Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. International Statistical Review, 69 (1), 21-26.
![Page 57: Big Data Visualization · Big Data File size is a useless statistic (giga, tera, peta, exa, zetta, …) Graph data how many nodes and edges? Rectangular data how many rows and columns?](https://reader031.vdocuments.mx/reader031/viewer/2022040223/5e4dc955d124d3078c4190be/html5/thumbnails/57.jpg)
Thank You!