dX/dT Tracking Amateur Radio Activity Insight Data Engineering 2015A Matthew Ho

Upload: hmat22

Post on 10-Aug-2015

dX/dT Tracking Amateur Radio Activity

Insight Data Engineering 2015A Matthew Ho

DXing

•  DXing – establishing contact with distant stations

•  Sometimes as far as half a world away

•  Takes skill to identify favorable propagation conditions and tune radio parameters

One Day…

CQ CQ CQ This is J28NC…

A Possible Solution

•  Reverse Beacon Network: a network of stations around the world that listens to and logs amateur radio traffic

•  One data file per day

•  Need to comb through thousands of files to find records corresponding to that callsign
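The brute-force search described above can be sketched in Python. This is only an illustration, assuming each daily file is a CSV with a header row that includes a `dx` column (as in the Data slide); the function names and the glob pattern are hypothetical and not part of the original project.

```python
import csv
import glob
import io


def find_spots(lines, callsign, column="dx"):
    """Yield CSV rows whose `column` field equals `callsign`."""
    for row in csv.DictReader(lines):
        if row.get(column) == callsign:
            yield row


def find_spots_in_files(pattern, callsign):
    """Scan every daily file matching `pattern` (a hypothetical glob)."""
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            yield from find_spots(handle, callsign)


# Tiny in-memory stand-in for one daily file (columns trimmed for brevity).
sample = io.StringIO("de,dx,band\nOL5Q,9M2RS,30m\nW0MU,F8IQS,40m\n")
matches = list(find_spots(sample, "9M2RS"))
```

Every lookup has to read every file from start to finish, which is the cost that motivates a keyed store later in the deck.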

Demo

Data

•  Callsign: Station ID

•  ~22.5 GB of data in CSV format over 4 years (2011-2014)

de     de_pfx  de_cont  freq     band  dx     dx_pfx  dx_cont  mode  db  date          speed  tx_mode
OL5Q   OK      EU       10106.6  30m   9M2RS  9M2     AS       CQ    3   1/12/12 0:00  27     CW
W0MU   K       NA       7021     40m   F8IQS  F       EU       CQ    8   1/12/12 0:00  21     CW

Fields of interest: DE station callsign (de), DX station callsign (dx), date

DE station callsign: OL5Q

DX station callsign: 9M2RS
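As a rough illustration of the record layout, the sketch below parses the first sample row into typed fields. The `Spot` class and `parse_spot` helper are hypothetical names, the comma-separated line is a reconstruction of the row shown above, and the date format `%m/%d/%y %H:%M` is an assumption (the sample value `1/12/12 0:00` is ambiguous between month-first and day-first).

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

# Column order as listed on the slide.
FIELDS = ["de", "de_pfx", "de_cont", "freq", "band", "dx", "dx_pfx",
          "dx_cont", "mode", "db", "date", "speed", "tx_mode"]


@dataclass
class Spot:
    de: str
    dx: str
    band: str
    freq: float
    db: int
    when: datetime
    tx_mode: str


def parse_spot(row, date_format="%m/%d/%y %H:%M"):
    """Convert one CSV row (a dict of strings) into a typed Spot.

    The date format is an assumption, not confirmed by the source.
    """
    return Spot(
        de=row["de"],
        dx=row["dx"],
        band=row["band"],
        freq=float(row["freq"]),
        db=int(row["db"]),
        when=datetime.strptime(row["date"], date_format),
        tx_mode=row["tx_mode"],
    )


line = "OL5Q,OK,EU,10106.6,30m,9M2RS,9M2,AS,CQ,3,1/12/12 0:00,27,CW"
row = next(csv.DictReader(io.StringIO(line), fieldnames=FIELDS))
spot = parse_spot(row)
```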

Pipeline

[Pipeline diagram. Labels: Reverse Beacon, QRZ (lat/long)]

Hive vs Pig

Hive

•  SQL-like

•  Ad hoc queries

•  Computing aggregate functions *

•  Large table → Small table

Pig

•  Pig Latin scripts

•  Cleaning and “reshuffling” data in multiple steps

•  FLATTEN operator was very useful

•  Large table → Large table

* Counting the number of DX logs for North America in January 2014 took ~48 seconds in Hive vs 90 seconds in Pig
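To make the "large table to small table" aggregation concrete, here is a plain-Python sketch of the kind of count mentioned in the footnote. It is not the Hive or Pig job from the talk. It assumes the continent is read from `dx_cont` and that dates follow `%m/%d/%y %H:%M`, and the sample rows are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def count_by_continent(rows, year, month, date_format="%m/%d/%y %H:%M"):
    """Count spots per dx_cont for one calendar month.

    Many input rows collapse into one small count per continent.
    """
    counts = Counter()
    for row in rows:
        when = datetime.strptime(row["date"], date_format)
        if (when.year, when.month) == (year, month):
            counts[row["dx_cont"]] += 1
    return counts


# Made-up rows, only to exercise the function.
rows = [
    {"dx_cont": "NA", "date": "1/5/14 3:10"},
    {"dx_cont": "NA", "date": "1/20/14 12:00"},
    {"dx_cont": "EU", "date": "1/20/14 12:01"},
    {"dx_cont": "NA", "date": "2/1/14 0:00"},
]
counts = count_by_continent(rows, 2014, 1)
```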

Candidate HBase Schema (Spots Lookup)

Row key: DX callsign (1A0AR, 1A0CA, 1A0CF, 2B89C)

Column: DE callsign (K8AZ, LA6TPA, EA4TX, XW4TB, WQ6G4, RT45V, WX3B9, MP49C)

Cell: list of spots for that DX/DE pair, for example

{ {DATE: 2014-01-01, dB: 8, band: 80m, tx_mode: CW}, {DATE: 2014-01-02, dB: 9, band: 40m, tx_mode: CW}, … }

{ {DATE: 2014-08-31, dB: 8, band: 80m, tx_mode: BPSK}, {DATE: 2014-11-11, dB: 15, band: 20m, tx_mode: CW}, … }

•  Indeterminate cell size

•  More downstream processing needed to extract metadata of interest
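The downstream-processing cost can be illustrated with a small in-memory model of the candidate layout. The nested dict stands in for the table, JSON is an assumed cell encoding, and pairing the sample cell with 1A0AR/K8AZ is an assumption made only for the example.

```python
import json

# One cell holds every spot for a DX/DE pair, so it grows without bound.
cell = json.dumps([
    {"DATE": "2014-01-01", "dB": 8, "band": "80m", "tx_mode": "CW"},
    {"DATE": "2014-01-02", "dB": 9, "band": "40m", "tx_mode": "CW"},
])

# table[dx_callsign][de_callsign] -> serialized list of spots
table = {"1A0AR": {"K8AZ": cell}}


def spots_on_band(table, dx, de, band):
    """Fetch a whole cell, decode it, then filter on the client side."""
    raw = table.get(dx, {}).get(de)
    if raw is None:
        return []
    return [spot for spot in json.loads(raw) if spot["band"] == band]


hits = spots_on_band(table, "1A0AR", "K8AZ", "80m")
```

Even a narrow question, such as spots on a single band, requires reading and decoding the entire cell first.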

Final HBase Schema (Spots Lookup)

Row key                           dB  band  freq    tx_mode  …
1A0AR_K8AZ_2014-12-29 06:53:22    13  80m   3510.0  CW       …
1A0AR_LA6TPA_2014-12-31 04:09:38  5   80m   3520.0  CW       …
1A0CA_EA4TX_2014-12-31 03:34:22   3   80m   3520.0  CW       …

•  Composite keys are powerful – keys can contain “values” as well

•  Amount of post-processing needed after grabbing data from HBase is minimized

•  Additional columns can easily be added to any row

Row key format: DX callsign, DE callsign, timestamp (joined with underscores)
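Because HBase stores rows sorted lexicographically by row key, a composite key lets a prefix scan answer "all spots of this DX station" or "all spots of this DX station by this DE station". The sketch below emulates that behaviour with a sorted Python list. `make_row_key` and `prefix_scan` are hypothetical helpers, not HBase client calls.

```python
import bisect


def make_row_key(dx, de, timestamp):
    """Build the DX_DE_Timestamp composite key shown on the slide."""
    return f"{dx}_{de}_{timestamp}"


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with `prefix` from a sorted key list.

    Emulates a range scan: jump to the first candidate, then read
    forward until the prefix no longer matches.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result


keys = sorted([
    make_row_key("1A0AR", "K8AZ", "2014-12-29 06:53:22"),
    make_row_key("1A0AR", "LA6TPA", "2014-12-31 04:09:38"),
    make_row_key("1A0CA", "EA4TX", "2014-12-31 03:34:22"),
])

all_1a0ar = prefix_scan(keys, "1A0AR_")
one_pair = prefix_scan(keys, "1A0AR_K8AZ_")
```

Putting the timestamp last keeps each DX/DE pair's spots adjacent and in time order, so no client-side filtering of a large cell is needed.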

Takeaways

•  Schema design should follow directly from the queries of interest

•  Small changes in the queries you want to support can result in significant changes in schema design

About Me

•  BS, MS, Electrical Engineering, Stanford
    •  Dynamic Systems and Optimization

•  MIT Lincoln Laboratory
    •  Advanced Sensor Techniques Group
    •  Signal Processing, Wireless Communications

•  Contact:
    •  KK6RHJ
    •  [email protected]
    •  https://github.com/typicalset

Data

HBase Schema (Trends)

Continent  201101  201102  201103  201104  …
AS         5       4       6       9       …
NA         43      47      41      40      …
EU         55      54      47      61      …
OC         3       4       3       5       …

Date      Count
20110101  108k
20110102  106k
20110103  80k
20110104  68k
…         …
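One way such trend tables could be filled is by bucketing each spot under a `YYYYMM` column and a `YYYYMMDD` row key. The sketch below shows only the bucketing step in Python. The function name and sample spots are made up, and it is an assumption that the tables hold simple spot counts.

```python
from collections import Counter
from datetime import datetime


def trend_counts(spots):
    """Bucket (continent, datetime) pairs by month and by day."""
    by_continent_month = Counter()
    by_day = Counter()
    for continent, when in spots:
        by_continent_month[(continent, when.strftime("%Y%m"))] += 1
        by_day[when.strftime("%Y%m%d")] += 1
    return by_continent_month, by_day


# Made-up spots, only to exercise the function.
spots = [
    ("NA", datetime(2011, 1, 1, 0, 5)),
    ("NA", datetime(2011, 1, 2, 9, 30)),
    ("EU", datetime(2011, 2, 1, 12, 0)),
]
monthly, daily = trend_counts(spots)
```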