dxdt
DXing
• DXing – establishing contact with distant stations
• Sometimes as far as half a world away
• Takes skill to identify favorable propagation conditions and tune radio parameters
A Possible Solution
• Reverse Beacon Network: a network of stations around the world that listens to and logs amateur radio traffic
• One data file per day
• Need to comb through thousands of files to find the records corresponding to a given callsign
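The file-combing step above can be sketched in Python. This is only an illustration, not the approach used in the talk: it assumes each daily file is comma-separated with a header row whose column names match the sample rows on the Data slide, and the helper name spots_for_callsign is hypothetical. The two in-memory "daily files" reuse the sample rows purely for demonstration.

```python
import csv
import io

def spots_for_callsign(files, callsign):
    """Yield every row whose `de` or `dx` field equals `callsign`.

    `files` is any iterable of open text files, one per day.
    Assumption: each file is comma-separated with a header row.
    """
    for f in files:
        for row in csv.DictReader(f):
            if callsign in (row["de"], row["dx"]):
                yield row

# Column names taken from the sample rows on the Data slide.
HEADER = "de,de_pfx,de_cont,freq,band,dx,dx_pfx,dx_cont,mode,db,date,speed,tx_mode\n"

# Two tiny in-memory stand-ins for daily files (illustration only).
day_a = io.StringIO(HEADER + "OL5Q,OK,EU,10106.6,30m,9M2RS,9M2,AS,CQ,3,1/12/12 0:00,27,CW\n")
day_b = io.StringIO(HEADER + "W0MU,K,NA,7021,40m,F8IQS,F,EU,CQ,8,1/12/12 0:00,21,CW\n")

matches = list(spots_for_callsign([day_a, day_b], "9M2RS"))
```

Every lookup of this kind rereads every file, which is the cost that a keyed store avoids.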
Data
• Callsign: Station ID
• ~22.5 GB of data in CSV format over 4 years (2011-2014)
de    de_pfx  de_cont  freq     band  dx     dx_pfx  dx_cont  mode  db  date          speed  tx_mode
OL5Q  OK      EU       10106.6  30m   9M2RS  9M2     AS       CQ    3   1/12/12 0:00  27     CW
W0MU  K       NA       7021     40m   F8IQS  F       EU       CQ    8   1/12/12 0:00  21     CW
• Columns of interest: de (DE station callsign), dx (DX station callsign), date
• In the first row, the DE station callsign is OL5Q and the DX station callsign is 9M2RS
Hive vs Pig
Hive
• SQL-like
• Ad hoc queries
• Computing aggregate functions *
• Large table → small table
Pig
• Pig Latin scripts
• Cleaning and “reshuffling” data in multiple steps
• FLATTEN operator was very useful
• Large table → large table
* Counting the number of DX logs for North America in January 2014 took ~48 seconds in Hive vs. 90 seconds in Pig
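To make the "large table → small table" pattern concrete, here is the footnoted count written in plain Python rather than in Hive or Pig. It is a sketch under several assumptions: "DX logs for North America" is read as rows whose dx_cont is NA, dates are taken to be month/day/year as in the sample rows, and both the sample rows and the function name count_dx_spots are invented for illustration.

```python
from datetime import datetime

# Assumption: dates look like "1/12/12 0:00" and are month/day/year.
DATE_FORMAT = "%m/%d/%y %H:%M"

def count_dx_spots(rows, dx_cont, year, month):
    """Count rows whose DX station is on `dx_cont` in the given month."""
    count = 0
    for row in rows:
        when = datetime.strptime(row["date"], DATE_FORMAT)
        if row["dx_cont"] == dx_cont and (when.year, when.month) == (year, month):
            count += 1
    return count

# Hypothetical rows, reduced to the two fields the query needs.
sample_rows = [
    {"dx_cont": "NA", "date": "1/5/14 3:10"},
    {"dx_cont": "EU", "date": "1/5/14 3:11"},
    {"dx_cont": "NA", "date": "1/31/14 23:59"},
    {"dx_cont": "NA", "date": "2/1/14 0:00"},
]

na_january_2014 = count_dx_spots(sample_rows, "NA", 2014, 1)
```

However many rows go in, a single number comes out, which is why this kind of query suits an aggregate-oriented tool.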
Candidate HBase Schema (Spots Lookup)
DX/DE   K8AZ  LA6TPA  EA4TX  XW4TB  WQ6G4  RT45V  WX3B9  MP49C
1A0AR
1A0CA
1A0CF
2B89C
Example cell contents:
{ {DATE: 2014-01-01, dB: 8, band: 80m, tx_mode: CW}, {DATE: 2014-01-02, dB: 9, band: 40m, tx_mode: CW}, … }
{ {DATE: 2014-08-31, dB: 8, band: 80m, tx_mode: BPSK}, {DATE: 2014-11-11, dB: 15, band: 20m, tx_mode: CW}, … }
• Indeterminate cell size
• More downstream processing needed to extract metadata of interest
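The downstream-processing cost can be illustrated with a nested Python dict standing in for the candidate table. This is a sketch, not HBase code: placing the example records in the (1A0AR, K8AZ) cell is an assumption made for illustration, and spots_on_date is a hypothetical helper.

```python
# Candidate layout: row = DX callsign, column = DE callsign,
# cell = list of spot records that keeps growing over time.
# Assumption: the slide does not say which cell holds these records.
candidate = {
    "1A0AR": {
        "K8AZ": [
            {"date": "2014-01-01", "dB": 8, "band": "80m", "tx_mode": "CW"},
            {"date": "2014-01-02", "dB": 9, "band": "40m", "tx_mode": "CW"},
        ],
    },
}

def spots_on_date(table, dx, de, date):
    """Fetch the whole cell, then filter it on the client side."""
    cell = table.get(dx, {}).get(de, [])
    return [spot for spot in cell if spot["date"] == date]

one_day = spots_on_date(candidate, "1A0AR", "K8AZ", "2014-01-02")
```

Even a query for a single day has to pull back and walk the entire cell, and the cell has no upper bound on its size.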
Final HBase Schema (Spots Lookup)
Row key                           dB  band  freq    tx_mode  …
1A0AR_K8AZ_2014-12-29 06:53:22    13  80m   3510.0  CW       …
1A0AR_LA6TPA_2014-12-31 04:09:38  5   80m   3520.0  CW       …
1A0CA_EA4TX_2014-12-31 03:34:22   3   80m   3520.0  CW       …
• Composite keys are powerful – keys can contain “values” as well
• Amount of post-processing needed after grabbing data from HBase is minimized
• Additional columns can easily be added to any row
• Row key: DX callsign, DE callsign, and timestamp, joined by underscores
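Because HBase keeps rows sorted by row key, a composite key turns "all spots of one DX station" or "all spots of one DX/DE pair" into a prefix scan. The sketch below imitates that with a sorted Python list and bisect rather than a real HBase client. The helper names row_key and prefix_scan are hypothetical, and the rows are the three examples from the slide.

```python
from bisect import bisect_left

def row_key(dx, de, timestamp):
    """Build the composite key DX_DE_Timestamp."""
    return f"{dx}_{de}_{timestamp}"

# The three example rows from the slide.
rows = {
    row_key("1A0AR", "K8AZ", "2014-12-29 06:53:22"):
        {"dB": 13, "band": "80m", "freq": 3510.0, "tx_mode": "CW"},
    row_key("1A0AR", "LA6TPA", "2014-12-31 04:09:38"):
        {"dB": 5, "band": "80m", "freq": 3520.0, "tx_mode": "CW"},
    row_key("1A0CA", "EA4TX", "2014-12-31 03:34:22"):
        {"dB": 3, "band": "80m", "freq": 3520.0, "tx_mode": "CW"},
}

# Stand-in for the store's lexicographic ordering of row keys.
sorted_keys = sorted(rows)

def prefix_scan(prefix):
    """Return (key, columns) pairs whose key starts with `prefix`."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append((key, rows[key]))
    return result

all_1a0ar = prefix_scan("1A0AR_")       # every spot of DX station 1A0AR
one_pair = prefix_scan("1A0AR_K8AZ_")   # spots of 1A0AR heard by K8AZ
```

Each matching row already carries its metadata as separate columns, so nothing needs to be unpacked after the scan.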
Takeaways
• Schema design should follow directly from the queries of interest
• Small changes in the queries you want to support can result in significant changes in schema design
About Me
• BS, MS, Electrical Engineering, Stanford
  • Dynamic Systems and Optimization
• MIT Lincoln Laboratory
  • Advanced Sensor Techniques Group
  • Signal Processing, Wireless Communications
• Contact:
  • KK6RHJ
  • [email protected]
  • https://github.com/typicalset