Introduction to Spark: Or how I learned to love 'big data' after all
![Page 1: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/1.jpg)
Introduction to Spark
Peadar Coyle @springcoil
Luxembourg - Early 2016
![Page 2: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/2.jpg)
Aims of this talk
- Explain what Spark is.
- I'm more a data scientist than an engineer...
![Page 3: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/3.jpg)
Who am I?
- Math and data nerd
- Interested in machine learning and data processing
- Speaker at PyData conferences and PyCons throughout Europe
![Page 4: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/4.jpg)
'Big data' so far
![Page 5: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/5.jpg)
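Why care?
- In-memory big data analytics
- Resilient Distributed Datasets (RDDs)
- Flexible programming model
- Complements Hadoop (for example, it can read from HDFS and run on YARN)
- Better performance than Hadoop MapReduce on many workloads, since intermediate results can stay in memory
- Code: https://github.com/springcoil/scalable_ml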
![Page 6: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/6.jpg)
Who uses it?
Current tech, not future tech!
![Page 7: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/7.jpg)
Supported Languages
![Page 8: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/8.jpg)
```scala
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)
```
`sc` is the SparkContext; `distData` is an RDD.
Code
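To see the RDD from this slide do some work, here is a minimal sketch, assuming `spark-core` is on the classpath and Spark runs in local mode (`local[2]`) rather than on a cluster. The names `ParallelizeSketch` and `run` are hypothetical. It splits the array into a chosen number of partitions and sums it with `reduce`, which adds up each partition and then combines the partial sums.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelizeSketch {
  /** Distributes `data` over `numSlices` partitions; returns (partition count, sum). */
  def run(data: Array[Int], numSlices: Int): (Int, Int) = {
    // Local mode with 2 worker threads: an assumption so the sketch runs without a cluster
    val conf = new SparkConf().setAppName("parallelize-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // One RDD, split into numSlices partitions that can be processed in parallel
      val distData = sc.parallelize(data, numSlices)
      val partitions = distData.partitions.length
      // Action: each partition is reduced locally, then the partial results are combined
      val total = distData.reduce(_ + _)
      (partitions, total)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run(Array(1, 2, 3, 4, 5), 2))
}
```

Because `_ + _` is associative and commutative, the result does not depend on how the array is split across partitions.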
![Page 9: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/9.jpg)
https://github.com/springcoil/scalable_ml/
```scala
package scalable_ml

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.DenseVector
import org.apache.spark.rdd.RDD
import breeze.linalg.{DenseVector => BDV}
import breeze.linalg.{DenseMatrix => BDM}

/** Ordinary least squares via the normal equations: solve (XᵀX) w = Xᵀy. */
class LeastSquaresRegression {
  def fit(dataset: RDD[LabeledPoint]): DenseVector = {
    val features = dataset.map { _.features }

    // XᵀX as the sum of per-row outer products (x is a 1 x n row matrix, so x.t * x is n x n)
    val covarianceMatrix: BDM[Double] = features.map { v =>
      val x = BDM(v.toArray)
      x.t * x
    }.reduce(_ + _)

    // Xᵀy as the sum of x * y over all rows
    val featuresTimesLabels: BDV[Double] = dataset.map { xy =>
      BDV(xy.features.toArray) * xy.label
    }.reduce(_ + _)

    // The small n x n system is solved locally on the driver
    val weight = covarianceMatrix \ featuresTimesLabels

    new DenseVector(weight.data)
  }
}
```
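The class above solves the normal equations XᵀX w = Xᵀy, building XᵀX = Σᵢ xᵢxᵢᵀ and Xᵀy = Σᵢ xᵢyᵢ with one `map` and one `reduce` each. As a hedged illustration of why that decomposition works, here is the same computation on a local `Seq`, with no Spark or Breeze. Restricting to two features (and, like the original, no intercept) and solving with Cramer's rule are simplifying assumptions; `NormalEquationsSketch` is a hypothetical name.

```scala
object NormalEquationsSketch {
  // Each row: ((x1, x2), y)
  type Row = ((Double, Double), Double)

  /** Fits y ≈ w1 * x1 + w2 * x2 by solving (XᵀX) w = Xᵀy. */
  def fit(rows: Seq[Row]): (Double, Double) = {
    // "map" each row to its outer product x xᵀ, stored as (a, b, c, d) of a 2 x 2 matrix,
    // then "reduce" by element-wise addition: the result is XᵀX
    val (a, b, c, d) = rows.map { case ((x1, x2), _) =>
      (x1 * x1, x1 * x2, x2 * x1, x2 * x2)
    }.reduce((p, q) => (p._1 + q._1, p._2 + q._2, p._3 + q._3, p._4 + q._4))

    // "map" each row to x * y, then "reduce" by addition: the result is Xᵀy
    val (e, f) = rows.map { case ((x1, x2), y) => (x1 * y, x2 * y) }
      .reduce((p, q) => (p._1 + q._1, p._2 + q._2))

    // Solve [[a, b], [c, d]] (w1, w2) = (e, f) with Cramer's rule
    val det = a * d - b * c
    require(math.abs(det) > 1e-12, "XᵀX is singular")
    ((e * d - b * f) / det, (a * f - c * e) / det)
  }
}
```

For example, rows generated exactly from w = (2, -1), such as `((1.0, 0.0), 2.0)`, `((0.0, 1.0), -1.0)`, `((1.0, 1.0), 1.0)` and `((2.0, 1.0), 3.0)`, recover those weights. Since every row contributes to both sums independently, the partial sums can be computed on separate partitions and added in any order, which is what `reduce(_ + _)` in the Spark version relies on.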
![Page 10: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/10.jpg)
Resilient Distributed Datasets (RDD)
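- Processed in parallel, partition by partition
- Two kinds of operations on RDDs: transformations (lazy, return a new RDD, e.g. `map`, `filter`) and actions (trigger computation and return a result, e.g. `count`, `reduce`)
- Persistence: memory, disk, or memory and disk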
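A minimal sketch of both kinds of operation plus persistence, assuming `spark-core` is on the classpath and Spark runs in local mode; `RddOpsSketch` is a hypothetical name. `map` and `filter` only record lineage; `count` and `reduce` are the actions that actually run jobs, and `persist(StorageLevel.MEMORY_AND_DISK)` lets the second action reuse the partitions computed by the first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddOpsSketch {
  /** Returns (count, sum) of the even squares of 1..n. */
  def run(n: Int): (Long, Long) = {
    // Local mode with 2 worker threads: an assumption so the sketch runs without a cluster
    val conf = new SparkConf().setAppName("rdd-ops-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Transformations: lazy, they only describe new RDDs (their lineage)
      val evenSquares = sc.parallelize(1 to n)
        .map(i => i.toLong * i)
        .filter(_ % 2 == 0)
        .persist(StorageLevel.MEMORY_AND_DISK) // keep in memory, spill partitions to disk if they do not fit

      // Actions: trigger computation and return results to the driver
      val count = evenSquares.count()       // first action computes the RDD and caches it
      val total = evenSquares.reduce(_ + _) // second action reuses the persisted partitions
      evenSquares.unpersist()
      (count, total)
    } finally {
      sc.stop()
    }
  }
}
```

If a cached partition is lost, Spark recomputes it from the lineage (`parallelize`, then `map`, then `filter`), which is the "resilient" part of the name.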
![Page 11: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/11.jpg)
Spark Ecosystem
- Spark Streaming
- Spark SQL: really, the creation of a DataFrame
- More to come soon; IBM and others are investing heavily in Spark.
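A sketch of the DataFrame side of Spark SQL, assuming Spark 2.x or later with `spark-sql` on the classpath, where `SparkSession` is the entry point (in Spark 1.6, current when this talk was given, `SQLContext` played that role). The speaker data and the `DataFrameSketch` name are made up for illustration. It runs the same aggregate once as SQL over a temporary view and once through the DataFrame API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object DataFrameSketch {
  /** Returns the average number of talks, computed (via SQL, via the DataFrame API). */
  def averageTalks(): (Double, Double) = {
    // Local mode with 2 threads: an assumption so the sketch runs without a cluster
    val spark = SparkSession.builder()
      .appName("dataframe-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical data: (speaker, number of talks)
      val df = Seq(("alice", 3), ("bob", 5), ("carol", 4)).toDF("speaker", "talks")

      // Path 1: plain SQL against a temporary view
      df.createOrReplaceTempView("speakers")
      val viaSql = spark.sql("SELECT AVG(talks) FROM speakers").first().getDouble(0)

      // Path 2: the same aggregate through the DataFrame API
      val viaApi = df.agg(avg($"talks")).first().getDouble(0)

      (viaSql, viaApi)
    } finally {
      spark.stop()
    }
  }
}
```

Both paths are planned by the same Catalyst optimizer, so choosing between SQL strings and DataFrame methods is mostly a matter of readability.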
![Page 12: Introduction to Spark: Or how I learned to love 'big data' after all](https://reader033.vdocuments.mx/reader033/viewer/2022051707/58ed89941a28ab46518b4645/html5/thumbnails/12.jpg)
Any questions?