spark cassandra connector dataframes
![Page 1: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/1.jpg)
Cassandra And Spark Dataframes
Russell Spitzer Software Engineer @ Datastax
![Page 6: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/6.jpg)
Tungsten Gives Dataframes OffHeap Power!
Can compare memory off-heap and bitwise! Code generation!
![Page 7: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/7.jpg)
The Core is the Cassandra Source
https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector/src/main/scala/org/apache/spark/sql/cassandra
/**
 * Implements [[BaseRelation]], [[InsertableRelation]] and [[PrunedFilteredScan]].
 * It inserts data into and scans a Cassandra table. If filterPushdown is true, it pushes down
 * some filters to CQL.
 */
DataFrame
source org.apache.spark.sql.cassandra
![Page 8: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/8.jpg)
DataFrame
CassandraSourceRelation
CassandraTableScanRDD
Configuration
![Page 9: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/9.jpg)
Configuration Can Be Done on a Per Source Level
clusterName:keyspaceName/propertyName
Example Changing Cluster/Keyspace Level Properties
val conf = new SparkConf()
  .set("ClusterOne/spark.cassandra.input.split.size_in_mb", "32")
  .set("default:test/spark.cassandra.input.split.size_in_mb", "128")
val lastdf = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "table" -> "words",
    "keyspace" -> "test",
    "cluster" -> "ClusterOne"
  ))
  .load()
![Page 12: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/12.jpg)
Namespace: default Keyspace: test
spark.cassandra.input.split.size_in_mb=128
Namespace: ClusterOne spark.cassandra.input.split.size_in_mb=32
![Page 13: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/13.jpg)
Configuration Can Be Done on a Per Source Level
val lastdf = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "table" -> "words",
    "keyspace" -> "test",
    "cluster" -> "default"
  ))
  .load()
Namespace: default Keyspace: test
spark.cassandra.input.split.size_in_mb=128
Namespace: ClusterOne spark.cassandra.input.split.size_in_mb=32
![Page 14: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/14.jpg)
Configuration Can Be Done on a Per Source Level
val lastdf = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "table" -> "words",
    "keyspace" -> "other",
    "cluster" -> "default"
  ))
  .load()
Namespace: default Keyspace: test
spark.cassandra.input.split.size_in_mb=128
Namespace: ClusterOne spark.cassandra.input.split.size_in_mb=32
Connector Default
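The lookup order these slides walk through can be sketched in plain Scala. This is a simplified model, not the connector's own code: the `ScopedConf` object and its `resolve` helper are hypothetical names, and the model assumes only the two scoped key shapes shown on the slides (`cluster:keyspace/property` and `cluster/property`). A result of `None` stands for "fall back to the connector default".

```scala
// Simplified, hypothetical model of per-source setting lookup.
// It is NOT the connector's implementation, only an illustration of the precedence.
object ScopedConf {
  /** Returns the most specific value set for `property`,
    * or None when nothing is set and the connector default would apply. */
  def resolve(conf: Map[String, String],
              cluster: String,
              keyspace: String,
              property: String): Option[String] =
    conf.get(s"$cluster:$keyspace/$property")   // keyspace-level entry wins
      .orElse(conf.get(s"$cluster/$property"))  // then the cluster-level entry

  val splitSize = "spark.cassandra.input.split.size_in_mb"

  // The two entries set on the SparkConf in the slides.
  val conf: Map[String, String] = Map(
    s"ClusterOne/$splitSize"   -> "32",
    s"default:test/$splitSize" -> "128"
  )

  def main(args: Array[String]): Unit = {
    println(resolve(conf, "ClusterOne", "test", splitSize)) // Some(32)
    println(resolve(conf, "default", "test", splitSize))    // Some(128)
    println(resolve(conf, "default", "other", splitSize))   // None
  }
}
```

The three calls in `main` mirror the three reads on the slides: cluster `ClusterOne`, cluster `default` with keyspace `test`, and cluster `default` with keyspace `other`.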
![Page 15: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/15.jpg)
Predicate Pushdown Is Automatic!
Select * From cassandraTable where clusteringKey > 100
![Page 18: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/18.jpg)
Predicate Pushdown Is Automatic!
Select * From cassandraTable where clusteringKey > 100
DataFrame DataFromC*
Filter clusteringKey > 100
Show
Catalyst
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/org/apache/spark/sql/cassandra/PredicatePushDown.scala
![Page 19: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/19.jpg)
Predicate Pushdown Is Automatic!
Select * From cassandraTable where clusteringKey > 100
DataFrame DataFromC* AND
add where clause to CQL
"clusteringKey > 100"
Show
Catalyst
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/org/apache/spark/sql/cassandra/PredicatePushDown.scala
![Page 20: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/20.jpg)
What can be pushed down?
1. Only push down non-partition-key column predicates with =, >, <, >=, or <=.
2. Only push down primary key column predicates with = or IN.
3. If there are regular columns in the pushdown predicates, they should have at least one EQ expression on an indexed column and no IN predicates.
4. All partition column predicates must be included in the predicates to be pushed down; only the last part of the partition key can be an IN predicate. For each partition column, only one predicate is allowed.
5. For cluster column predicates, only the last predicate can be a non-EQ predicate (including an IN predicate), and the preceding column predicates must be EQ predicates.
6. If there is only one cluster column predicate, it can be any non-IN predicate. No predicates are pushed down if there is any OR condition or NOT IN condition.
7. Multiple predicates for the same column cannot be pushed down if any of them is an equality or IN predicate.
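As an illustration of how one of these rules could be checked, here is a minimal sketch of rule 7 only. The `SameColumnRule` object, the `Predicate` case class, and the operator names are hypothetical and are not taken from the connector's PredicatePushDown.scala.

```scala
// Hypothetical sketch of rule 7: several predicates on one column may only be
// pushed down together when none of them is an equality or IN predicate.
object SameColumnRule {
  sealed trait Op
  case object Eq extends Op
  case object In extends Op
  case object RangeOp extends Op // any of >, <, >=, <=

  final case class Predicate(column: String, op: Op)

  /** True when no column carries multiple predicates that include an EQ or IN. */
  def allowed(predicates: Seq[Predicate]): Boolean =
    predicates.groupBy(_.column).values.forall { ps =>
      ps.size == 1 || ps.forall(_.op == RangeOp)
    }
}
```

Under this model, `clusteringKey > 100 AND clusteringKey <= 200` passes (two range predicates on one column), while `clusteringKey = 100 AND clusteringKey > 50` does not.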
![Page 21: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/21.jpg)
What can be pushed down?
If you could write it in CQL, it will get pushed down.
![Page 22: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/22.jpg)
What are we Pushing Down To?
CassandraTableScanRDD
All of the underlying code is the same as with sc.cassandraTable, so everything about reading and writing applies
![Page 23: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/23.jpg)
https://academy.datastax.com/ Watch me talk about this in the privacy of your own home!
![Page 24: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/24.jpg)
How the Spark Cassandra Connector
Reads Data
![Page 25: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/25.jpg)
Spark RDDs Represent a Large Amount of Data Partitioned into Chunks
[Diagram: an RDD made of partitions 1 through 9, shown next to Nodes 1 through 4]
![Page 26: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/26.jpg)
Spark RDDs Represent a Large Amount of Data Partitioned into Chunks
[Diagram: the RDD's partitions 1 through 9 spread across Nodes 1 through 4]
![Page 28: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/28.jpg)
Cassandra Data is Distributed By Token Range
![Page 29: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/29.jpg)
Cassandra Data is Distributed By Token Range
[Diagram: the token range with positions 0, 500, and 999 marked]
![Page 32: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/32.jpg)
Cassandra Data is Distributed By Token Range
[Diagram: the token range (positions 0 and 500 marked) divided among Nodes 1 through 4, without vnodes]
![Page 33: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/33.jpg)
Cassandra Data is Distributed By Token Range
[Diagram: the token range (positions 0 and 500 marked) divided among Nodes 1 through 4, with vnodes]
![Page 34: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/34.jpg)
The Connector Uses Information on the Node to Make Spark Partitions
Node 1 token ranges: 0-50, 120-220, 300-500, 780-830
spark.cassandra.input.split.size_in_mb 1
Reported density is 100 tokens per MB
![Page 35: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/35.jpg)
The Connector Uses Information on the Node to Make Spark Partitions
[Animation: Node 1's token ranges are assigned to Spark partitions of about 100 tokens (1 MB) each]
Spark partition 1: 120-220
Spark partition 2: 300-400 (the 300-500 range is split into 300-400 and 400-500)
Spark partition 3: 400-500
Spark partition 4: 0-50 and 780-830
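The partitioning walked through on these slides can be approximated with a small, self-contained sketch. It is a simplified model under stated assumptions, not the connector's actual splitter: `TokenPartitioner`, `TokenRange`, `split`, and `group` are hypothetical names, and the strategy (cut oversized ranges, then pack the pieces largest first) is just one way to reproduce the result shown.

```scala
// Hypothetical model of turning a node's token ranges into Spark partitions.
// Assumes a uniform density (tokens per MB) as reported on the slides.
object TokenPartitioner {
  final case class TokenRange(start: Long, end: Long) {
    def size: Long = end - start
  }

  /** Cuts a range into pieces of at most `maxTokens` tokens. */
  def split(r: TokenRange, maxTokens: Long): Seq[TokenRange] =
    (r.start until r.end by maxTokens)
      .map(s => TokenRange(s, math.min(s + maxTokens, r.end)))

  /** Packs ranges, largest first, into groups holding at most `maxTokens` tokens. */
  def group(ranges: Seq[TokenRange], maxTokens: Long): Seq[Seq[TokenRange]] =
    ranges.sortBy(-_.size).foldLeft(Vector.empty[Vector[TokenRange]]) { (groups, r) =>
      groups.lastOption match {
        case Some(last) if last.map(_.size).sum + r.size <= maxTokens =>
          groups.init :+ (last :+ r)   // still room in the current group
        case _ =>
          groups :+ Vector(r)          // start a new Spark partition
      }
    }

  /** One inner Seq per Spark partition. */
  def partitions(ranges: Seq[TokenRange],
                 splitSizeMb: Long,
                 tokensPerMb: Long): Seq[Seq[TokenRange]] = {
    val maxTokens = splitSizeMb * tokensPerMb
    group(ranges.flatMap(split(_, maxTokens)), maxTokens)
  }
}
```

With Node 1's ranges (0-50, 120-220, 300-500, 780-830), a split size of 1 MB, and a density of 100 tokens per MB, this yields four partitions: 120-220, 300-400, 400-500, and the pair 0-50 with 780-830.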
![Page 47: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/47.jpg)
spark.cassandra.input.page.row.size 50
Data is Retrieved Using the DataStax Java Driver
[Diagram: Spark partition 4 on Node 1, holding token ranges 0-50 and 780-830]
![Page 48: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/48.jpg)
spark.cassandra.input.page.row.size 50
Data is Retrieved Using the DataStax Java Driver
SELECT * FROM keyspace.table WHERE token(pk) > 780 AND token(pk) <= 830
SELECT * FROM keyspace.table WHERE token(pk) > 0 AND token(pk) <= 50
[Animation: for Spark partition 4 on Node 1, the query for each token range (780-830, then 0-50) is executed in turn, and its results arrive in pages of 50 CQL rows; five pages are shown per query]
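The per-range queries shown on these slides follow a simple pattern, sketched below. `TokenRangeQueries`, `cql`, and `pages` are hypothetical helpers written for illustration; the connector builds and pages its statements through the Java Driver rather than through string helpers like these.

```scala
// Hypothetical helpers that mirror the token-range queries on the slides.
object TokenRangeQueries {
  final case class TokenRange(start: Long, end: Long)

  /** Builds one token-range scan: start is exclusive, end is inclusive. */
  def cql(keyspace: String, table: String, partitionKey: String, r: TokenRange): String =
    s"SELECT * FROM $keyspace.$table " +
      s"WHERE token($partitionKey) > ${r.start} AND token($partitionKey) <= ${r.end}"

  /** Number of pages needed to fetch `rows` rows at `pageSize` rows per page. */
  def pages(rows: Int, pageSize: Int): Int = (rows + pageSize - 1) / pageSize
}
```

For example, `cql("keyspace", "table", "pk", TokenRange(780, 830))` reproduces the first statement above, and `pages(250, 50)` gives the five pages of 50 CQL rows that the animation shows for each query.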
![Page 65: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/65.jpg)
How The Spark Cassandra Connector
Writes Data
![Page 66: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/66.jpg)
Spark RDDs Represent a Large Amount of Data Partitioned into Chunks
[Diagram: an RDD made of partitions 1 through 9, shown next to Nodes 1 through 4]
![Page 68: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/68.jpg)
[Diagram: the RDD's partitions 1 through 9 spread across Nodes 1 through 4]
The Spark Cassandra Connector saveToCassandra method can be called on almost all RDDs
rdd.saveToCassandra("Keyspace","Table")
![Page 69: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/69.jpg)
A Java Driver connection is made to the local node and a prepared statement is built for the target table
[Diagram: Spark partition 1 on Node 1, connected through the Java Driver]
![Page 70: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/70.jpg)
Batches are built from data in Spark partitions
[Diagram: Spark partition 1 on Node 1 feeding rows to the Java Driver]
Rows in the partition: 1,1,1; 1,2,1; 2,1,1; 3,8,1; 3,2,1; 3,4,1; 3,5,1; 3,1,1; 1,4,1; 5,4,1; 2,4,1; 8,4,1; 9,4,1; 3,9,1
![Page 71: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/71.jpg)
By default these batches only contain CQL Rows which share the same partition key
[Diagram: Spark partition 1 on Node 1 feeding the same rows, plus a partially hidden row beginning 11,4, to the Java Driver]
spark.cassandra.output.batch.grouping.key partition
spark.cassandra.output.batch.size.rows 4
spark.cassandra.output.batch.buffer.size 3
spark.cassandra.output.concurrent.writes 2
spark.cassandra.output.throughput_mb_per_sec 5
![Page 72: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/72.jpg)
Node 11 Java Driver
1,1,11,2,1
2,1,1
3,8,1
3,2,1
3,4,1
3,5,1
3,1,1
1,4,1
5,4,1
2,4,1
8,4,1
9,4,1
11,4, spark.cassandra.output.batch.grouping.key partition spark.cassandra.output.batch.size.rows 4 spark.cassandra.output.batch.buffer.size 3 spark.cassandra.output.concurrent.writes 2 spark.cassandra.output.throughput_mb_per_sec 5
3,9,1
By default these batches only contain CQL Rows which share the same
partition key
PK=1
![Page 73: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/73.jpg)
Node 11
When an element is not part of an existing batch, a new batch is started
Java Driver
1,1,1 1,2,1
2,1,1
3,8,1
3,2,1
3,4,1
3,5,1
3,1,1
1,4,1
5,4,1
2,4,1
8,4,1
9,4,1
11,4,
spark.cassandra.output.batch.grouping.key partition spark.cassandra.output.batch.size.rows 4 spark.cassandra.output.batch.buffer.size 3 spark.cassandra.output.concurrent.writes 2 spark.cassandra.output.throughput_mb_per_sec 5
3,9,1
PK=1
![Page 74: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/74.jpg)
Node 11 Java Driver
1,1,1 1,2,1
2,1,1
3,8,1
3,2,1
3,4,1
3,5,1
3,1,1
1,4,1
5,4,1
2,4,1
8,4,1
9,4,1
11,4, spark.cassandra.output.batch.grouping.key partition spark.cassandra.output.batch.size.rows 4 spark.cassandra.output.batch.buffer.size 3 spark.cassandra.output.concurrent.writes 2 spark.cassandra.output.throughput_mb_per_sec 5
3,9,1
When an element is not part of an existing batch, a new batch is started
PK=1
PK=2
![Page 75: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/75.jpg)
Node 11 Java Driver
1,1,1 1,2,1
2,1,1
3,8,1
3,2,1
3,4,1
3,5,1
3,1,1
1,4,1
5,4,1
2,4,1
8,4,1
9,4,1
11,4, spark.cassandra.output.batch.grouping.key partition spark.cassandra.output.batch.size.rows 4 spark.cassandra.output.batch.buffer.size 3 spark.cassandra.output.concurrent.writes 2 spark.cassandra.output.throughput_mb_per_sec 5
3,9,1
When an element is not part of an existing batch, a new batch is started
PK=1
PK=2
![Page 76: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/76.jpg)
Node 11 Java Driver
1,1,1 1,2,1
2,1,1
3,8,13,2,1 3,4,1 3,5,1
3,1,1
1,4,1
5,4,1
2,4,1
8,4,1
9,4,1
11,4, spark.cassandra.output.batch.grouping.key partition spark.cassandra.output.batch.size.rows 4 spark.cassandra.output.batch.buffer.size 3 spark.cassandra.output.concurrent.writes 2 spark.cassandra.output.throughput_mb_per_sec 5
3,9,1
If a batch size reaches batch.size.rows or batch.size.bytes
it is executed by the driver
PK=1
PK=2
PK=3
![Page 77: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/77.jpg)
Node 11 Java Driver
1,1,1 1,2,1
2,1,1
3,8,13,2,1 3,4,1 3,5,1
3,1,1
1,4,1
5,4,1
2,4,1
8,4,1
9,4,1
11,4, spark.cassandra.output.batch.grouping.key partition spark.cassandra.output.batch.size.rows 4 spark.cassandra.output.batch.buffer.size 3 spark.cassandra.output.concurrent.writes 2 spark.cassandra.output.throughput_mb_per_sec 5
3,9,1
PK=1
PK=2
PK=3
If a batch size reaches batch.size.rows or batch.size.bytes
it is executed by the driver
![Page 78: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/78.jpg)
Node 11 Java Driver
1,1,1 1,2,1
2,1,1
1,4,1
5,4,1
2,4,1
8,4,1
9,4,1
11,4,3,9,1
3,1,1
spark.cassandra.output.batch.grouping.key partition spark.cassandra.output.batch.size.rows 4 spark.cassandra.output.batch.buffer.size 3 spark.cassandra.output.concurrent.writes 2 spark.cassandra.output.throughput_mb_per_sec 5
If a batch size reaches batch.size.rows or batch.size.bytes
it is executed by the driver
PK=1
PK=2
![Page 79: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/79.jpg)
When a batch reaches batch.size.rows rows or batch.size.bytes bytes, it is executed by the Java Driver.
[Diagram: batches PK=1 and PK=2, plus a new PK=3 batch]
![Page 80: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/80.jpg)
If more than batch.buffer.size batches are being built at once, the largest batch is executed by the Java Driver.
[Diagram: open batches PK=1, PK=2 and PK=3]
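The eviction rule on this slide can be sketched as follows. This is an illustration under stated assumptions rather than the connector's code: `add_row` is a hypothetical helper, "largest" is taken to mean the batch with the most rows, and the starting state is loosely modeled on the diagram.

```python
BATCH_BUFFER_SIZE = 3  # spark.cassandra.output.batch.buffer.size, demo value from the slides


def add_row(open_batches, row, buffer_size=BATCH_BUFFER_SIZE):
    """Add a row to its partition's batch and return an evicted batch, or None.

    If the row opens more than buffer_size batches, the largest open batch
    is removed and returned so that it can be handed to the driver.
    """
    open_batches.setdefault(row[0], []).append(row)
    if len(open_batches) > buffer_size:
        largest_pk = max(open_batches, key=lambda pk: len(open_batches[pk]))
        return open_batches.pop(largest_pk)
    return None


open_batches = {
    1: [(1, 1, 1), (1, 2, 1), (1, 4, 1)],
    2: [(2, 1, 1)],
    3: [(3, 1, 1)],
}
# A row for a fourth partition key arrives while three batches are open.
evicted = add_row(open_batches, (5, 4, 1))
```

Here the row for PK=5 opens a fourth batch, so the PK=1 batch (the largest) is evicted and PK=2, PK=3 and PK=5 remain open.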
![Page 81: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/81.jpg)
If more than batch.buffer.size batches are being built at once, the largest batch is executed by the Java Driver.
[Diagram: the PK=1 batch has been handed to the Java Driver; batches PK=2 and PK=3 remain open]
![Page 82: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/82.jpg)
If more than batch.buffer.size batches are being built at once, the largest batch is executed by the Java Driver.
[Diagram: batches PK=2 and PK=3, plus a new PK=5 batch]
![Page 83: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/83.jpg)
![Page 84: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/84.jpg)
If the number of batches being executed by the Java Driver would exceed concurrent.writes, we wait until one of the requests has completed.
[Diagram: batches PK=2, PK=3 and PK=5]
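One common way to cap in-flight requests like this is a counting semaphore. The sketch below is a generic Python illustration, not the connector's implementation: `ThrottledWriter`, `fake_execute`, and the thread pool size are all hypothetical, and `fake_execute` merely stands in for a driver call while recording how many writes overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CONCURRENT_WRITES = 2  # spark.cassandra.output.concurrent.writes, demo value from the slides


class ThrottledWriter:
    """Allow at most max_in_flight batches to be executing at once."""

    def __init__(self, execute_batch, max_in_flight=CONCURRENT_WRITES):
        self._execute_batch = execute_batch  # hypothetical stand-in for the driver call
        self._slots = threading.Semaphore(max_in_flight)
        self._pool = ThreadPoolExecutor(max_workers=8)  # arbitrary pool size

    def submit(self, batch):
        self._slots.acquire()  # blocks while max_in_flight writes are outstanding
        future = self._pool.submit(self._execute_batch, batch)
        # Free the slot once the write completes ("write acknowledged").
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def close(self):
        self._pool.shutdown(wait=True)


stats = {"in_flight": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_execute(batch):
    """Pretend to write a batch while tracking how many writes overlap."""
    with stats_lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.01)  # simulated network round trip
    with stats_lock:
        stats["in_flight"] -= 1
    return len(batch)


writer = ThrottledWriter(fake_execute)
futures = [writer.submit([(pk, 4, 1)]) for pk in range(6)]
results = [f.result() for f in futures]
writer.close()
```

Because `submit` blocks until a slot is free, `stats["peak"]` never exceeds the configured limit, no matter how many batches are queued.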
![Page 85: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/85.jpg)
If the number of batches being executed by the Java Driver would exceed concurrent.writes, we wait until one of the requests has completed.
[Diagram: batches PK=2, PK=3 and PK=5; one write is acknowledged]
![Page 86: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/86.jpg)
If the number of batches being executed by the Java Driver would exceed concurrent.writes, we wait until one of the requests has completed.
[Diagram: batches PK=2, PK=3 and PK=5]
![Page 87: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/87.jpg)
If the number of batches being executed by the Java Driver would exceed concurrent.writes, we wait until one of the requests has completed.
[Diagram: the PK=2 batch has been handed to the Java Driver; batches PK=3 and PK=5 remain open]
![Page 88: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/88.jpg)
If the number of batches being executed by the Java Driver would exceed concurrent.writes, we wait until one of the requests has completed.
[Diagram: a new PK=8 batch alongside batches PK=3 and PK=5]
![Page 89: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/89.jpg)
If the number of batches being executed by the Java Driver would exceed concurrent.writes, we wait until one of the requests has completed.
[Diagram: batches PK=8, PK=3 and PK=5]
![Page 90: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/90.jpg)
The last parameter, throughput_mb_per_sec, blocks further batches if we have written more than that many megabytes in the past second.
[Diagram: batches PK=8, PK=3 and PK=5]
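The throttle described on this slide can be approximated with a one-second sliding window. The sketch below is a hedged Python illustration, not the connector's rate limiter: `ThroughputLimiter` and `FakeClock` are hypothetical names, 1 MB is assumed to be 1024 * 1024 bytes, and the clock and sleep functions are injectable so the example runs without real waiting.

```python
import time
from collections import deque

MB = 1024 * 1024  # assumption: the slides do not say which megabyte is meant


class ThroughputLimiter:
    """Block a write while more than the limit was written in the past second."""

    def __init__(self, mb_per_sec, clock=time.monotonic, sleep=time.sleep):
        self._limit = mb_per_sec * MB
        self._clock = clock
        self._sleep = sleep
        self._window = deque()  # (timestamp, size) of writes in the past second
        self._bytes_in_window = 0

    def _expire(self, now):
        # Forget writes that are at least one second old.
        while self._window and now - self._window[0][0] >= 1.0:
            _, size = self._window.popleft()
            self._bytes_in_window -= size

    def acquire(self, size_bytes):
        """Wait until recent writes are within the limit, then record this one."""
        now = self._clock()
        self._expire(now)
        while self._bytes_in_window > self._limit:
            # Sleep until the oldest recorded write leaves the window.
            self._sleep(self._window[0][0] + 1.0 - now)
            now = self._clock()
            self._expire(now)
        self._window.append((now, size_bytes))
        self._bytes_in_window += size_bytes


class FakeClock:
    """Deterministic clock for the example: sleeping just advances the time."""

    def __init__(self):
        self.now = 0.0
        self.slept = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds


fake = FakeClock()
limiter = ThroughputLimiter(5, clock=fake.time, sleep=fake.sleep)
limiter.acquire(3 * MB)  # 3 MB in the window, under the limit
limiter.acquire(3 * MB)  # allowed, but the window now holds 6 MB
limiter.acquire(1 * MB)  # more than 5 MB in the past second: blocks first
```

The third call sleeps for one simulated second, after which the earlier writes have aged out of the window and the write proceeds.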
![Page 91: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/91.jpg)
The last parameter, throughput_mb_per_sec, blocks further batches if we have written more than that many megabytes in the past second.
[Diagram: batches PK=8, PK=3 and PK=5; one write is acknowledged]
![Page 92: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/92.jpg)
![Page 93: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/93.jpg)
![Page 94: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/94.jpg)
The last parameter, throughput_mb_per_sec, blocks further batches if we have written more than that many megabytes in the past second.
[Diagram: batches PK=8, PK=3 and PK=5; further writes are blocked]
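The five settings that drive this walkthrough are ordinary Spark configuration properties, so one way to supply them is as `--conf key=value` arguments to spark-submit. The Python sketch below only builds that argument list: `to_spark_submit_args` is a hypothetical helper, and the values are the demo values from the slides, not tuning recommendations.

```python
# Demo values copied from the slides; they are deliberately tiny.
WRITE_SETTINGS = {
    "spark.cassandra.output.batch.grouping.key": "partition",
    "spark.cassandra.output.batch.size.rows": "4",
    "spark.cassandra.output.batch.buffer.size": "3",
    "spark.cassandra.output.concurrent.writes": "2",
    "spark.cassandra.output.throughput_mb_per_sec": "5",
}


def to_spark_submit_args(settings):
    """Render settings as a flat list of spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_spark_submit_args(WRITE_SETTINGS)
```

The resulting list can be spliced into a spark-submit command line ahead of the application jar or script.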
![Page 95: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/95.jpg)
![Page 96: Spark Cassandra Connector Dataframes](https://reader034.vdocuments.mx/reader034/viewer/2022050613/589ff4271a28ab46598b52d5/html5/thumbnails/96.jpg)
Thanks for coming, and I hope you have a great time at C* Summit
http://cassandrasummit-datastax.com/agenda/the-spark-cassandra-connector-past-present-and-future/
Also ask these guys really hard questions
Jacek, Piotr, Alex