Data Federation with Apache Spark

Dan Marshall

[email protected]

06/13/2017

PostgreSQL - Spark
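The code for this slide is not preserved in the text, so the following is only a minimal sketch of how a PostgreSQL table could be loaded through Spark's built-in JDBC data source. The host, database, table name (`employee`), and credentials are hypothetical placeholders, and `spark` is assumed to be an existing SparkSession with the PostgreSQL JDBC driver on its classpath.

```python
def postgres_options(host, database, table, user, password, port=5432):
    """Build the option map for Spark's built-in JDBC data source."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def read_postgres(spark, options):
    """Load a PostgreSQL table as a DataFrame.

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical connection details, for illustration only.
pg_opts = postgres_options("localhost", "mydb", "employee", "spark_user", "secret")
```

With a live session, `read_postgres(spark, pg_opts)` would return a DataFrame that can later be joined with the other sources.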

HBase Source

hbase(main):004:0* create 'hb_dept','cf1'

=> Hbase::Table - hb_dept

hbase(main):008:0* put 'hb_dept','M1','cf1:dept_name','Maintenance'

hbase(main):009:0> put 'hb_dept','F1','cf1:dept_name','Entertainment'

hbase(main):010:0> put 'hb_dept','S2','cf1:dept_name','Sports'

hbase(main):012:0* scan 'hb_dept'

ROW COLUMN+CELL

F1 column=cf1:dept_name, timestamp=1496621309775, value=Entertainment

M1 column=cf1:dept_name, timestamp=1496621309741, value=Maintenance

S2 column=cf1:dept_name, timestamp=1496621309863, value=Sports

3 row(s) in 0.0590 seconds

HBase - Spark
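The slide's code is not preserved, and the deck does not say which HBase connector was used. As one possibility, the sketch below assumes the Hortonworks Spark-HBase Connector (SHC), which maps an HBase table to a DataFrame through a JSON catalog. The DataFrame column name `dept_id` for the row key is a hypothetical choice; the table `hb_dept`, column family `cf1`, and qualifier `dept_name` come from the shell session above.

```python
import json


def hbase_catalog(table, column_family, columns,
                  rowkey_column="dept_id", namespace="default"):
    """Build an SHC-style catalog mapping HBase cells to string columns."""
    catalog = {
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        # The row key is exposed as an ordinary DataFrame column.
        "columns": {rowkey_column: {"cf": "rowkey", "col": "key", "type": "string"}},
    }
    for col in columns:
        catalog["columns"][col] = {"cf": column_family, "col": col, "type": "string"}
    return json.dumps(catalog)


def read_hbase(spark, catalog_json):
    """Load the HBase table as a DataFrame (assumes SHC is on the classpath)."""
    return (spark.read
            .format("org.apache.spark.sql.execution.datasources.hbase")
            .options(catalog=catalog_json)
            .load())


catalog_json = hbase_catalog("hb_dept", "cf1", ["dept_name"])
```

A different connector would need a different format name and mapping, so treat the catalog layout as specific to this assumption.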

Join – HBase and PostgreSQL

Cassandra Source

Connected to Test Cluster at cassandra:9042.

[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help.

cqlsh> use mykeyspace;

cqlsh:mykeyspace> create table bonus_table (userid int primary key, bonus_amount decimal);

cqlsh:mykeyspace> insert into bonus_table (userid, bonus_amount) values (1, 500.00);

cqlsh:mykeyspace> insert into bonus_table (userid, bonus_amount) values (4, 1000.00);

cqlsh:mykeyspace> select * from bonus_table;

 userid | bonus_amount
--------+--------------
      1 |       500.00
      4 |      1000.00

(2 rows)

Cassandra – Spark
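The original code for this slide is missing from the text. A minimal sketch of reading `mykeyspace.bonus_table` into a DataFrame might look like the following, assuming the DataStax Spark Cassandra Connector is available and that the session was configured with `spark.cassandra.connection.host` pointing at the `cassandra` host shown in the cqlsh banner.

```python
def cassandra_options(keyspace, table):
    """Option map understood by the Spark Cassandra Connector data source."""
    return {"keyspace": keyspace, "table": table}


def read_cassandra(spark, options):
    """Load a Cassandra table as a DataFrame.

    `spark` is assumed to be an existing SparkSession created with
    spark.cassandra.connection.host set to the Cassandra node.
    """
    return (spark.read
            .format("org.apache.spark.sql.cassandra")
            .options(**options)
            .load())


# Keyspace and table names taken from the cqlsh session above.
cass_opts = cassandra_options("mykeyspace", "bonus_table")
```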

Use SQL on DataFrame from Cassandra Source

Join – HBase, PostgreSQL, Cassandra

JSON Source

{"dept":"F1"}

{"dept":"S2"}

JSON Code
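The slide's code is not preserved. The JSON source above is in JSON Lines form (one object per line), which `spark.read.json` reads by default. The sketch below writes the two records to a file and defines a reader; the helper names and the file path are hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
import json


def write_json_lines(records, path):
    """Write one JSON object per line, the layout spark.read.json expects."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_json(spark, path):
    """Load a JSON Lines file as a DataFrame with an inferred schema."""
    return spark.read.json(path)


# The two records shown on the JSON Source slide.
dept_records = [{"dept": "F1"}, {"dept": "S2"}]
```

With a live session, `read_json(spark, path)` would be expected to yield a single string column named `dept`.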

Join – HBase, PostgreSQL, Cassandra, JSON
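The join code itself is not preserved, so this is only one plausible shape for it: register each source DataFrame as a temporary view and join them in Spark SQL. The view names, the PostgreSQL columns (`userid`, `dept`), the HBase row-key column name (`dept_id`), and the use of inner joins are all assumptions; only `dept_name`, `userid`, `bonus_amount`, and the JSON field `dept` appear in the source.

```python
# Hypothetical query: view and PostgreSQL column names are illustrative.
FEDERATED_QUERY = """
SELECT e.userid, e.dept, d.dept_name, b.bonus_amount
FROM emp e
JOIN dept d ON e.dept = d.dept_id
JOIN bonus b ON e.userid = b.userid
JOIN dept_filter f ON e.dept = f.dept
"""


def join_sources(spark, emp_df, dept_df, bonus_df, filter_df):
    """Register the four source DataFrames as temp views and join them."""
    views = {
        "emp": emp_df,             # PostgreSQL
        "dept": dept_df,           # HBase
        "bonus": bonus_df,         # Cassandra
        "dept_filter": filter_df,  # JSON
    }
    for name, df in views.items():
        df.createOrReplaceTempView(name)
    return spark.sql(FEDERATED_QUERY)
```

Because every source is already a DataFrame at this point, Spark treats them uniformly regardless of where the data physically lives.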

SQL on Final Temp View

JDBC Write
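The write code from this slide is not preserved. A minimal sketch of saving the joined DataFrame back to PostgreSQL through the JDBC data source could look like this; the function name, target table, and the default `append` save mode are assumptions rather than the presenter's choices.

```python
def write_postgres(df, url, table, user, password, mode="append"):
    """Write a DataFrame to a PostgreSQL table over JDBC.

    `df` is assumed to be a Spark DataFrame; `mode` is passed to
    DataFrameWriter.mode (for example "append" or "overwrite").
    """
    (df.write
       .format("jdbc")
       .option("url", url)
       .option("dbtable", table)
       .option("user", user)
       .option("password", password)
       .option("driver", "org.postgresql.Driver")
       .mode(mode)
       .save())
```

A call such as `write_postgres(final_df, "jdbc:postgresql://localhost:5432/mydb", "enriched", "spark_user", "secret")` uses hypothetical names throughout; `overwrite` mode would replace the target table instead of appending to it.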

Enriched Table in PostgreSQL
