Data Federation with Apache Spark
PostgreSQL Source
postgres=# create table pg_employee (emp_id int, emp_name varchar,
emp_title varchar, emp_hire_date date, emp_dept_id varchar);
postgres=# select * from pg_employee;
 emp_id |     emp_name     | emp_title | emp_hire_date | emp_dept_id
--------+------------------+-----------+---------------+-------------
      1 | Fred Flinstone   | Quarryman | 2001-07-01    | M1
      2 | Donald Duck      | Fisherman | 2011-04-28    | F1
      3 | Larry Fitzgerald | Receiver  | 2005-11-12    | S2
      4 | Randy Johnson    | Pitcher   | 2008-01-11    | S2
(4 rows)
PostgreSQL - Spark
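A minimal PySpark sketch of this step, reading `pg_employee` through Spark's built-in JDBC data source. It assumes the PostgreSQL JDBC driver (`org.postgresql.Driver`) is on the Spark classpath. The helper name `read_pg_employee`, the connection URL, and the credentials are hypothetical placeholders, not values taken from the original demo.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame, SparkSession


def read_pg_employee(spark: "SparkSession", url: str, user: str, password: str) -> "DataFrame":
    """Load the PostgreSQL table pg_employee as a Spark DataFrame over JDBC.

    Hypothetical helper; url is expected to look like
    "jdbc:postgresql://<host>:5432/postgres".
    """
    return (
        spark.read.format("jdbc")
        .options(
            url=url,
            dbtable="pg_employee",
            user=user,
            password=password,
            driver="org.postgresql.Driver",  # assumes the driver jar is on the classpath
        )
        .load()
    )
```

Called as `read_pg_employee(spark, "jdbc:postgresql://localhost:5432/postgres", "postgres", "<password>")` against the database above, `.show()` on the result should list the same four employees.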
HBase Source
hbase(main):004:0* create 'hb_dept','cf1'
=> Hbase::Table - hb_dept
hbase(main):008:0* put 'hb_dept','M1','cf1:dept_name','Maintenance'
hbase(main):009:0> put 'hb_dept','F1','cf1:dept_name','Entertainment'
hbase(main):010:0> put 'hb_dept','S2','cf1:dept_name','Sports'
hbase(main):012:0* scan 'hb_dept'
ROW    COLUMN+CELL
 F1    column=cf1:dept_name, timestamp=1496621309775, value=Entertainment
 M1    column=cf1:dept_name, timestamp=1496621309741, value=Maintenance
 S2    column=cf1:dept_name, timestamp=1496621309863, value=Sports
3 row(s) in 0.0590 seconds
HBase - Spark
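One way to expose `hb_dept` to Spark, sketched under the assumption that the Apache HBase Spark connector (from `hbase-connectors`, data source name `org.apache.hadoop.hbase.spark`) and an `hbase-site.xml` are on the classpath. The original demo may have used a different HBase connector with its own configuration style. The output column names `dept_id` and `dept_name` and the helper name `read_hb_dept` are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame, SparkSession


def read_hb_dept(spark: "SparkSession") -> "DataFrame":
    """Load HBase table hb_dept with columns dept_id (row key) and dept_name."""
    return (
        spark.read.format("org.apache.hadoop.hbase.spark")
        .options(**{
            # Map the row key to dept_id and cf1:dept_name to dept_name.
            "hbase.columns.mapping": "dept_id STRING :key, dept_name STRING cf1:dept_name",
            "hbase.table": "hb_dept",
            # Build a fresh HBase connection from the classpath configuration
            # instead of reusing a previously created HBaseContext.
            "hbase.spark.use.hbasecontext": "false",
        })
        .load()
    )
```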
Join – HBase and PostgreSQL
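A sketch of the federated join, assuming the PostgreSQL DataFrame carries `emp_dept_id` and the HBase DataFrame exposes its row key as a column named `dept_id` (a hypothetical name). Both hold the same department codes (M1, F1, S2), so they can be matched directly.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame


def join_emp_dept(emp: "DataFrame", dept: "DataFrame") -> "DataFrame":
    """Inner-join employees (PostgreSQL) to department names (HBase)."""
    # Rename the HBase key so both sides share a column name; joining on a
    # column name (rather than an expression) keeps a single emp_dept_id column.
    dept_keyed = dept.withColumnRenamed("dept_id", "emp_dept_id")
    return emp.join(dept_keyed, on="emp_dept_id", how="inner")
```

With the sample data every `emp_dept_id` exists as a row key in `hb_dept`, so the inner join keeps all four employees and adds their `dept_name`.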
Cassandra Source
Connected to Test Cluster at cassandra:9042.
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> use mykeyspace;
cqlsh:mykeyspace> create table bonus_table (userid int primary key, bonus_amount decimal);
cqlsh:mykeyspace> insert into bonus_table (userid, bonus_amount) values (1, 500.00);
cqlsh:mykeyspace> insert into bonus_table (userid, bonus_amount) values (4, 1000.00);
cqlsh:mykeyspace> select * from bonus_table;
 userid | bonus_amount
--------+--------------
      1 |       500.00
      4 |      1000.00
(2 rows)
Cassandra – Spark
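A minimal sketch of loading `mykeyspace.bonus_table`, assuming the DataStax Spark Cassandra Connector is on the classpath (its DataFrame source name is `org.apache.spark.sql.cassandra`) and that the session was started with `spark.cassandra.connection.host` set to `cassandra`, the host shown in the cqlsh banner above (for example `--conf spark.cassandra.connection.host=cassandra`). The helper name is hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame, SparkSession


def read_bonus_table(spark: "SparkSession") -> "DataFrame":
    """Load mykeyspace.bonus_table through the Spark Cassandra Connector."""
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="mykeyspace", table="bonus_table")
        .load()
    )
```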
Use SQL on DataFrame from Cassandra Source
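Once the Cassandra rows are in a DataFrame, registering it as a temporary view lets Spark SQL query it like any table. The view name `bonus` and the query below are illustrative assumptions; any Spark SQL statement over the view works the same way.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame, SparkSession


def bonuses_via_sql(spark: "SparkSession", bonus: "DataFrame") -> "DataFrame":
    """Query the Cassandra-backed DataFrame with Spark SQL."""
    # Session-scoped view; replaced if a view with this name already exists.
    bonus.createOrReplaceTempView("bonus")
    return spark.sql("SELECT userid, bonus_amount FROM bonus ORDER BY userid")
```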
Join – HBase, PostgreSQL, Cassandra
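A sketch of adding the Cassandra bonuses to the employee/department DataFrame, assuming Cassandra's `userid` corresponds to PostgreSQL's `emp_id`. The join type is an assumption: with the default left join, employees without a bonus row (emp_id 2 and 3) are kept with a null `bonus_amount`; with `how="inner"` only Fred Flinstone (1) and Randy Johnson (4) remain.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame


def add_bonus(emp_dept: "DataFrame", bonus: "DataFrame", how: str = "left") -> "DataFrame":
    """Attach bonus_amount from Cassandra to each employee row."""
    # Align the key name so the join keeps a single emp_id column.
    keyed = bonus.withColumnRenamed("userid", "emp_id")
    return emp_dept.join(keyed, on="emp_id", how=how)
```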
JSON Source
{"dept":"F1"}
{"dept":"S2"}
JSON Code
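Spark's JSON reader expects JSON Lines by default (one object per line), which matches the `{"dept":"F1"}` / `{"dept":"S2"}` file above. This sketch supplies the schema as a DDL string so Spark skips the inference pass; the path `depts.json` and the helper name are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame, SparkSession


def read_dept_filter(spark: "SparkSession", path: str = "depts.json") -> "DataFrame":
    """Load the list of department codes from a JSON Lines file."""
    # Explicit schema: a single string column named dept.
    return spark.read.schema("dept STRING").json(path)
```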
Join – HBase, PostgreSQL, Cassandra, JSON
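Here the JSON file acts as a filter on departments. This sketch uses a left semi join (our choice, not necessarily the demo's), which keeps only employees whose `emp_dept_id` appears in the JSON list and adds no columns from it. Assuming the JSON DataFrame has the single column `dept`, an inner join returns the same rows because each code appears once in the file.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame


def keep_listed_depts(enriched: "DataFrame", depts: "DataFrame") -> "DataFrame":
    """Keep only rows whose department code is listed in the JSON source."""
    keyed = depts.withColumnRenamed("dept", "emp_dept_id")
    # left_semi: filter enriched by key membership; output has enriched's columns only.
    return enriched.join(keyed, on="emp_dept_id", how="left_semi")
```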
SQL on Final Temp View
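The fully joined DataFrame can be registered as a view and queried with plain SQL. The view name `enriched_employee` is hypothetical, and the query assumes the column names `dept_name` (from HBase) and `bonus_amount` (from Cassandra) survived the earlier joins.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame, SparkSession


def query_enriched(spark: "SparkSession", final_df: "DataFrame") -> "DataFrame":
    """Run Spark SQL over the federated result."""
    final_df.createOrReplaceTempView("enriched_employee")
    return spark.sql(
        "SELECT emp_id, emp_name, emp_title, dept_name, bonus_amount "
        "FROM enriched_employee ORDER BY emp_id"
    )
```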
JDBC Write
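A sketch of writing the federated result back to PostgreSQL with `DataFrameWriter.jdbc`. The target table name `pg_employee_enriched` is hypothetical, and `overwrite` is an assumed save mode; by default Spark drops and recreates the table in that mode, so `append` is the safer choice for a table that must be preserved.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, never at runtime
    from pyspark.sql import DataFrame


def write_enriched(df: "DataFrame", url: str, user: str, password: str,
                   table: str = "pg_employee_enriched") -> None:
    """Save the enriched DataFrame as a PostgreSQL table over JDBC."""
    props = {"user": user, "password": password, "driver": "org.postgresql.Driver"}
    # overwrite: replace any existing table with this DataFrame's contents.
    df.write.mode("overwrite").jdbc(url, table, properties=props)
```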
Enriched Table in PostgreSQL