![Page 1: Data Acquisition - big-project.eubig-project.eu/sites/default/files/Data Acquisition Webinar Slides.pdf · The 9(?) Vs of Big Data Acquisition Volume Velocity Variety Vocabulary Variability](https://reader035.vdocuments.mx/reader035/viewer/2022062919/5ee0cca5ad6a402d666be750/html5/thumbnails/1.jpg)
Data Acquisition
Axel Ngonga
Lead Data Acquisition
BIG Data PPF
http://big-project.eu
Motivation
● Increasing amount of data
  ○ 4K new pictures on Instagram
  ○ 100K tweets
  ○ 800K new pieces of content on Facebook
  ○ …
Motivation
● Big data technologies for
  ○ Improved business intelligence
  ○ Secure decisions
  ○ Customized services
  ○ …
● Use Cases
  ○ Mission planning
  ○ Trade market
  ○ Customized services
  ○ Criminality prediction
  ○ ...
Definition
● Data acquisition stands for
  ○ Selection of data sources
  ○ Collection of information from these sources
  ○ Filtering and cleaning of data
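The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `SOURCES` registry, the sample records, and all function names are hypothetical and do not come from the slides.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical registry of data sources: name -> function returning raw records.
SOURCES: Dict[str, Callable[[], Iterable[str]]] = {
    "tweets": lambda: ["  Hello World ", "", "hello world", "Big Data!"],
    "logs": lambda: ["ERROR disk full", "   ", "INFO started"],
    "archive": lambda: ["old record"],
}


def select_sources(wanted: List[str]) -> Dict[str, Callable[[], Iterable[str]]]:
    """Step 1: keep only the sources that are relevant to the task."""
    return {name: SOURCES[name] for name in wanted if name in SOURCES}


def collect(sources: Dict[str, Callable[[], Iterable[str]]]) -> Iterator[Tuple[str, str]]:
    """Step 2: lazily pull raw records from every selected source."""
    for name, read in sources.items():
        for raw in read():
            yield name, raw


def filter_and_clean(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Step 3: normalize whitespace and case, drop empty and duplicate records."""
    seen = set()
    for name, raw in records:
        text = " ".join(raw.split()).lower()
        if not text or text in seen:
            continue
        seen.add(text)
        yield name, text


def acquire(wanted: List[str]) -> List[Tuple[str, str]]:
    """Run selection, collection, and cleaning in sequence."""
    return list(filter_and_clean(collect(select_sources(wanted))))
```

Calling `acquire(["tweets", "logs"])` skips the `archive` source entirely and returns only the cleaned, de-duplicated records. The `seen` set grows with the number of distinct records, so a real deployment would likely need a bounded or approximate duplicate check.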
Overview
[Figure: four data sources (DS), Processing (cleaning, classification), Storage]
More than 3 Vs
● The 9(?) Vs of Big Data Acquisition
  ○ Volume
  ○ Velocity
  ○ Variety
  ○ Vocabulary
  ○ Variability (security models, ownership)
  ○ Veracity (trustworthiness of data)
  ○ Visibility (integrated view of data)
  ○ Value (worth of data for data consumer)
  ○ Visualization
Requirements
● Extensibility of protocols
● High scalability of approaches
● Low memory consumption
● Parallelism
● Elasticity
● Fast ROI
● High throughput (real-time)
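Two of these requirements, low memory consumption and parallelism, can be illustrated together. The following is a minimal sketch, assuming records arrive as an iterable of strings; `batches`, `clean`, and `process` are hypothetical names, and the batch size and worker count are arbitrary defaults.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List


def batches(stream: Iterable, size: int) -> Iterator[List]:
    """Yield fixed-size batches so only one batch of input is held at a time."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def clean(record: str) -> str:
    """Placeholder per-record work: trim whitespace and lowercase."""
    return record.strip().lower()


def process(stream: Iterable[str], batch_size: int = 1000, workers: int = 4) -> Iterator[str]:
    """Clean a stream batch by batch, spreading each batch over worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(stream, batch_size):
            # pool.map returns results in the same order as the inputs.
            yield from pool.map(clean, batch)
```

Because `process` is a generator, results can be consumed as they are produced instead of being collected first. In CPython, threads mainly help when the per-record work waits on I/O; CPU-bound cleaning would need processes or a different runtime to run truly in parallel.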
Technology Overview
● Gathering
  ○ Advanced Message Queuing Protocol
    ■ Wire-level protocol
    ■ OASIS Standard since Oct. 2012
    ■ Large number of implementations incl. RabbitMQ, SwiftMQ, Apache ActiveMQ, Windows Azure Service Bus
  ○ JMS 2.0
  ○ Kestrel (Memcached)
  ○ Apache Kafka
  ○ Apache Flume (log data)
  ○ FB Scribe (log data)
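Message-queuing tools like those listed above decouple the components that produce data from the components that consume it. The sketch below shows that producer/consumer pattern with Python's standard-library `queue.Queue` as an in-process stand-in for a broker. It is not AMQP and uses none of the listed tools' APIs; `SENTINEL`, `producer`, `consumer`, and `run` are hypothetical names.

```python
import queue
import threading
from typing import Iterable, List

SENTINEL = None  # hypothetical end-of-stream marker


def producer(q: queue.Queue, messages: Iterable[str]) -> None:
    """Publish messages; put() blocks while the queue is full (back-pressure)."""
    for message in messages:
        q.put(message)
    q.put(SENTINEL)


def consumer(q: queue.Queue, out: List[str]) -> None:
    """Consume messages in arrival order until the end-of-stream marker."""
    while True:
        message = q.get()
        if message is SENTINEL:
            break
        out.append(message.upper())


def run(messages: Iterable[str], maxsize: int = 2) -> List[str]:
    """Wire one producer and one consumer together through a bounded queue."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    out: List[str] = []
    t_prod = threading.Thread(target=producer, args=(q, messages))
    t_cons = threading.Thread(target=consumer, args=(q, out))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return out
```

The bounded `maxsize` keeps a fast producer from filling memory when the consumer is slower, which is the same concern a real broker addresses with its own flow-control settings.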
Technology Overview
● Processing
  ○ Facebook Scribe (Aggregation)
  ○ Twitter Storm (Stream Data Processing, Analysis)
  ○ MOA (Massive Online Analysis, esp. classification)
  ○ Hadoop (Distributed Processing)
  ○ InfoSphere Streams (Analysis)
Technology Overview
● Storage
  ○ MongoDB (BSON)
  ○ Apache CouchDB (JSON)
  ○ Neo4j (Graph DB)
  ○ Oracle NoSQL
  ○ IBM DB2 NoSQL
● Holistic Frameworks
  ○ Oracle's Big Data Suite
  ○ IBM's Big Data Suite
  ○ Karmasphere
Tool Matrix
Simple Recipe
1. Which of the 9 Vs are important for me?
2. What are my sources?
  ○ Protocols
  ○ Velocity
  ○ Type of data (logs, XML, …)
  ○ ...
3. What’s my current storage architecture?
  ○ NoSQL?
  ○ Distributed?
Thank You!
Questions?
Axel Ngonga
University of Leipzig
AKSW Research [email protected]
http://aksw.org/AxelNgonga
http://big-project.eu
Questionnaire