TRANSCRIPT
Databricks Community Cloud
By: Robert Sanders
Page: 2
Databricks Community Cloud
• Free/Paid Standalone Spark Cluster
• Online Notebook
  • Python
  • R
  • Scala
  • SQL
• Tutorials and Guides
• Shareable Notebooks
Page: 3
Why is it useful?
• Learning about Spark
• Testing different versions of Spark
• Rapid Prototyping
• Data Analysis
• Saved Code
• Others…
Page: 4
Forums
https://forums.databricks.com/
Page: 5
Login/Sign Up
https://community.cloud.databricks.com/login.html
Page: 6
Home Page
Page: 7
Active Clusters
Page: 8
Create a Cluster - Steps
1. From the Active Clusters page, click the “+ Create Cluster” button
2. Fill in the cluster name
3. Select the version of Apache Spark
4. Click “Create Cluster”
5. Wait for the cluster to start up and be in a “Running” state
Page: 9
Create a Cluster
Page: 10
Active Clusters
Page: 11
Active Clusters – Spark Cluster UI - Master
Page: 12
Workspaces
Page: 13
Create a Notebook - Steps
1. Right click within a Workspace and click Create -> Notebook
2. Fill in the Name
3. Select the programming language
4. Select the running cluster you’ve created that you want to attach to the Notebook
5. Click the “Create” button
Page: 14
Create a Notebook
Page: 15
Notebook
Page: 16
Using the Notebook
Page: 17
Using the Notebook – Code Snippets
> sc
> sc.parallelize(1 to 5).collect()
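As a hedged extension of the snippet above (the value names and the specific transformations are illustrative, not from the original slides), the same `sc` handle supports chained transformations before `collect()`:

```scala
// Illustrative follow-up: build an RDD, transform it, and collect the result.
// Assumes `sc` is the SparkContext the notebook provides automatically.
val squares = sc.parallelize(1 to 5).map(n => n * n) // 1, 4, 9, 16, 25
val even    = squares.filter(_ % 2 == 0)             // keep 4 and 16
even.collect()                                       // Array(4, 16)
```

Nothing is computed until `collect()` runs; `map` and `filter` only describe the work, which is why the cluster must be attached and “Running” before the cell executes.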
Page: 18
Using the Notebook - Shortcuts
Shortcut          Action
Shift + Enter     Run Selected Cell and Move to Next Cell
Ctrl + Enter      Run Selected Cell
Option + Enter    Run Selected Cell and Insert Cell Below
Ctrl + Alt + P    Create Cell Above Current Cell
Ctrl + Alt + N    Create Cell Below Selected Cell
Page: 19
Tables
Page: 20
Create a Table - Steps
1. From the Tables section, click “+ Create Table”
2. Select the Data Source (the steps below assume you’re using File as the Data Source)
3. Upload a file from your local file system
   1. Supported file types: CSV, JSON, Avro, Parquet
4. Click “Preview Table”
5. Fill in the Table Name
6. Select the File Type and other Options depending on the File Type
7. Change Column Names and Types as desired
8. Click “Create Table”
Page: 21
Create a Table – Upload File
Page: 22
Create a Table – Configure Table
Page: 23
Create a Table – Review Table
Page: 24
Notebook – Access Table
Page: 25
Notebook – Access Table – Code Snippets
> sqlContext
> sqlContext.sql("show tables").collect()
> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
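A hedged alternative to the SQL string above, using the DataFrame API instead (the column name `Name` is an assumption about the uploaded file, not from the slides; `sqlContext.table` is the Spark 1.x lookup for tables registered through the UI):

```scala
// Illustrative alternative: read the registered table through the
// DataFrame API rather than a SQL string. Assumes `sqlContext` is
// the SQLContext the notebook provides automatically, and that the
// uploaded table has a "Name" column (hypothetical here).
val got = sqlContext.table("got")
got.select("Name").limit(10).collect()
```

Both forms return the same rows; the DataFrame version catches a misspelled table name at the `table(...)` call rather than inside a SQL string.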
Page: 26
Notebook – Display
Page: 27
Notebook – Data Cleaning for Charting
Page: 28
Notebook – Plot Options
Page: 29
Notebook – Charting
Page: 30
Notebook – Display and Charting – Code Snippets
> display(got)
> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()

> import org.apache.spark.sql.functions._
> val allegiancesCleanupUDF = udf[String, String](_.toLowerCase().replace("house ", ""))
> val isDeathUDF = udf { deathYear: Integer => if (deathYear != null) 1 else 0 }
> val gotCleaned = got.filter("Allegiances != \"None\"")
    .withColumn("Allegiances", allegiancesCleanupUDF($"Allegiances"))
    .withColumn("isDeath", isDeathUDF($"Death Year"))
> display(gotCleaned)
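The bodies of the two UDFs above are ordinary Scala functions, so their logic can be sanity-checked without a cluster; a minimal sketch (the standalone function names are invented here):

```scala
// The same logic as the two UDFs, written as plain functions so the
// behavior can be verified outside Spark.
def cleanAllegiance(a: String): String =
  a.toLowerCase().replace("house ", "")   // "House Stark" -> "stark"

def isDeath(deathYear: Integer): Int =
  if (deathYear != null) 1 else 0         // null Death Year means still alive
```

Wrapping these in `udf(...)` is what lets Spark apply them column-wise in `withColumn`; keeping the logic this small makes each row transformation easy to reason about.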
Page: 31
Publish Notebook - Steps
1. While in a Notebook, click “Publish” on the top right
2. Click “Publish” on the pop-up
3. Copy the link and send it out
Page: 32
Publish Notebook