TRANSCRIPT
![Page 1: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/1.jpg)
INTRODUCTION TO BIG DATA FOR (UNIVERSITY) SYSTEM ADMINISTRATOR
Asst. Prof. Natawut Nupairoj, Ph.D.
Mobile Application and System Services Research Group
Head of Department
Department of Computing Engineering
Chulalongkorn University
[email protected]
![Page 2: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/2.jpg)
![Page 3: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/3.jpg)
“Data will be as economically important as money and gold.” - World Economic Forum
“In 2020, all the data in the world will amount to 40 ZB, or 5.2 TB per person.” – IDC
“Only 3% of data is ready to be used, and only 1 in 6 of that ready-to-use data, or 0.5% of all data, can actually be analyzed.” – IDC
B | KB | MB | GB | TB | PB | EB | ZB
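As a quick sanity check on the IDC figures, the per-person share can be turned back into an implied world population. This is a minimal sketch that assumes decimal (SI) units, which the slide does not specify; the variable names are illustrative only.

```python
# Decimal (SI) byte units. This is an assumption: the slide does not say
# whether the units are decimal (1000-based) or binary (1024-based).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]
BYTES = {name: 1000 ** i for i, name in enumerate(UNITS)}

total_bytes = 40 * BYTES["ZB"]        # 40 ZB of data worldwide (IDC figure)
per_person_bytes = 5.2 * BYTES["TB"]  # 5.2 TB per person (IDC figure)

# Dividing the total by the per-person share gives the implied population.
implied_population = total_bytes / per_person_bytes
print(f"Implied population: {implied_population / 1e9:.1f} billion")
```

The result is roughly 7.7 billion people, so the two quoted figures are consistent with each other.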
![Page 4: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/4.jpg)
![Page 5: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/5.jpg)
![Page 6: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/6.jpg)
![Page 7: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/7.jpg)
![Page 8: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/8.jpg)
Characteristics of BIG DATA
Source: IBM
![Page 9: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/9.jpg)
A small example of a university's BIG DATA
![Page 10: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/10.jpg)
Storage size for 30 days = 13,000,000 events (2.1 TB)
![Page 11: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/11.jpg)
![Page 12: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/12.jpg)
MOBILE & DEVICES - COMPUTING EVERYWHERE
Thailand’s rate is 147% (smartphone = 49%)
Wearable device shipments will double in 4 years (from 72 million in 2015 to 155 million in 2019)
20% will be healthcare-related devices
![Page 13: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/13.jpg)
Source: http://www.wareable.com/wearable-watchlist/50-best-wearable-tech
Whistle
![Page 14: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/14.jpg)
INTRODUCING FDA-APPROVED INGESTIBLE SENSORS IN PILLS
http://www.forbes.com/sites/singularity/2012/08/09/no-more-skipping-your-medicine-fda-approves-first-digital-pill/
![Page 15: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/15.jpg)
![Page 16: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/16.jpg)
![Page 17: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/17.jpg)
Behavioral trend tracking – customize fitness program setup
Food intake tracking – visually recognize food intake
Environment factor tracking – modify fitness program recommendation
![Page 18: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/18.jpg)
Under Armour | Connected Life
![Page 19: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/19.jpg)
BIG DATA use cases
Bigger / Faster / More Up-to-Date Data Warehouse
Product Recommendation
Social Listening
Fraud Detection and Risk Management
Micro Customer Segmentation
Demand Sensing for Supply Chain
Precision Medicine
![Page 20: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/20.jpg)
Use cases in universities
Low-cost storage for large volumes of data
Log storage and analysis
Smart IDS
User experience analysis for web sites / mobile sites
Crowdsourcing information about Wi-Fi problems
Analysis of students' usage behavior on the LMS and online media
Precision Education
![Page 21: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/21.jpg)
![Page 22: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/22.jpg)
Source: collegestats.org
![Page 23: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/23.jpg)
Source: collegestats.org
![Page 24: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/24.jpg)
![Page 25: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/25.jpg)
(Primitive) Big Data Architecture
Data Source: Volume, Velocity, Variety
Data Ingestion: Gather, Filter, Deliver
Data Storage: NoSQL
Data Processing: MapReduce
Data Visualization
![Page 26: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/26.jpg)
Open-source software framework based on ideas from the Google search engine architecture
Emphasizes the use of commodity hardware
MapReduce makes it easy to write programs that run on a cluster without requiring expertise in parallel processing
Includes the Hadoop Distributed File System (HDFS) for reliable, inexpensive data storage
Users: Yahoo!, Facebook, Amazon, eBay, American Airlines, Apple, Google, HP, IBM, Microsoft, Netflix, New York Times, etc.
![Page 27: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/27.jpg)
A real-world example
World-class brand server (500,000 baht): Intel Xeon (up to 18 cores), RAM 512 GB
Intel NUC (24,500 baht each): Intel Core i5 (4 cores), RAM 16 GB
20 Intel NUC machines: 80 cores, RAM 320 GB
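The comparison on this slide comes down to simple multiplication. The sketch below reproduces the totals for the 20-machine Intel NUC cluster; pairing the 500,000 baht price with the branded server is an assumption based on the slide layout, and the constant names are illustrative.

```python
# Figures taken from the slide. Pairing the 500,000 baht price with the
# branded server is an assumption based on the slide layout.
NUC_PRICE_BAHT = 24_500
NUC_CORES = 4
NUC_RAM_GB = 16
NUC_COUNT = 20
SERVER_PRICE_BAHT = 500_000

# Scale a single NUC up to the whole cluster.
cluster_price = NUC_PRICE_BAHT * NUC_COUNT
cluster_cores = NUC_CORES * NUC_COUNT
cluster_ram_gb = NUC_RAM_GB * NUC_COUNT

print(f"20 NUCs: {cluster_price:,} baht, {cluster_cores} cores, {cluster_ram_gb} GB RAM")
print(f"Server : {SERVER_PRICE_BAHT:,} baht")
```

Under these assumptions the cluster costs about the same as the single server while offering 80 cores, at the price of less total RAM (320 GB versus 512 GB).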
![Page 28: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/28.jpg)
HARDWARE VS. SOFTWARE
Hardware: Reliable | Software: Easy
Hardware: Vulnerable | Software: ????
![Page 29: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/29.jpg)
The key issue in BIG DATA is I/O

| | A | B | C |
|---|---|---|---|
| Config | Single | RAID-10 | Parallel |
| Number of HDs | 1 | 8 | 16 |
| Speed | 100 MB/sec | 800 MB/sec | 1600 MB/sec |
| Time to read 200 GB | 30 min | 4 min | 2 min |
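The read times in this comparison follow from dividing the data size by the throughput. A minimal sketch, assuming decimal units (1 GB = 1000 MB) and sustained sequential reads at the quoted speeds; the function and dictionary names are illustrative.

```python
DATA_GB = 200

# Throughput in MB/sec for the three configurations on the slide.
configs = {
    "A (single disk)": 100,
    "B (RAID-10, 8 disks)": 800,
    "C (parallel, 16 disks)": 1600,
}

def read_minutes(size_gb, speed_mb_per_s):
    """Time to read size_gb at speed_mb_per_s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / speed_mb_per_s / 60

for name, speed in configs.items():
    print(f"{name}: {read_minutes(DATA_GB, speed):.1f} min")
```

The computed values are about 33, 4.2, and 2.1 minutes, which the slide rounds to 30, 4, and 2 minutes.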
![Page 30: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/30.jpg)
How MAPREDUCE works
1. Data is distributed across the machines
2. MAP – each machine processes its own data at the same time
3. REDUCE – the results are summarized back on the master machine
![Page 31: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/31.jpg)
Example – WORD COUNT
Count the frequency of words in a book
![Page 32: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/32.jpg)
WORD FREQ.: MAPREDUCE
[Diagram: five Map nodes, each storing a part of the data, are asked "With your data, please count."; a single Reduce node collects their results.]
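The word-frequency example can be sketched on a single machine in plain Python. This is not Hadoop code: the chunks, function names, and sample text are hypothetical, and the map step runs sequentially here whereas a cluster would run it on each node in parallel.

```python
from collections import Counter
from functools import reduce

def map_count(chunk):
    """MAP: count the words in one chunk of the text."""
    return Counter(chunk.lower().split())

def reduce_counts(left, right):
    """REDUCE: merge two partial word counts into one."""
    return left + right

# Each string stands for the part of the book stored on one machine.
chunks = ["big data is big", "data is everywhere", "big clusters process data"]

partials = [map_count(c) for c in chunks]            # one partial count per machine
totals = reduce(reduce_counts, partials, Counter())  # combined on the master
print(totals.most_common(3))
```

Because each partial count depends only on its own chunk, the map step needs no coordination between machines; only the small partial results travel to the reducer.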
![Page 33: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/33.jpg)
DISKS
Average lifetime: 1,200,000 hours
With 10,000 disks, one disk will fail every 5 days
Source: Google
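The failure interval follows directly from the average lifetime. A minimal sketch, assuming an average lifetime of 1,200,000 hours per disk and failures that are independent and evenly spread over time (a simplification); the constant names are illustrative.

```python
MEAN_LIFETIME_HOURS = 1_200_000  # average lifetime of one disk
DISK_COUNT = 10_000              # disks in the cluster

# With many disks running at once, the expected time between failures
# shrinks in proportion to the number of disks.
hours_between_failures = MEAN_LIFETIME_HOURS / DISK_COUNT
days_between_failures = hours_between_failures / 24

print(f"One failure every {hours_between_failures:.0f} hours "
      f"({days_between_failures:.0f} days)")
```

At this scale a disk failure is a routine event rather than an exception, which is why the storage layer has to tolerate it in software.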
![Page 34: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/34.jpg)
HADOOP HDFS
Rack-aware
3 copies
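Rack-aware placement of 3 copies can be illustrated with a simplified sketch. It follows the commonly documented HDFS default (first copy on the writer's node, second on a node in a different rack, third on another node in that same remote rack), but it is not HDFS code: the real policy chooses nodes randomly and considers load and free space. The function, rack, and node names are hypothetical.

```python
def place_replicas(topology, writer_node):
    """Pick 3 nodes for one block using simplified rack-aware placement.

    topology maps a rack name to the list of node names in that rack.
    """
    # Rack that contains the node writing the block.
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    # First other rack that has at least two nodes for the remaining copies.
    remote_rack = next(
        r for r in topology if r != local_rack and len(topology[r]) >= 2
    )
    second, third = topology[remote_rack][:2]
    return [writer_node, second, third]

topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
replicas = place_replicas(topology, "n1")
print(replicas)
```

Spreading the copies over two racks means the block survives the loss of a whole rack, while keeping two copies in one rack limits cross-rack traffic during the write.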
![Page 35: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/35.jpg)
How HADOOP works
![Page 36: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/36.jpg)
HADOOP ARCHITECTURE
![Page 37: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/37.jpg)
An in-memory data processing system from UC Berkeley
Extends MapReduce to support batch execution, interactive queries, and stream processing
Supports multiple languages, including Java, Python, Scala, and R, and provides analytic libraries (machine learning, graph processing)
Developed and supported by contributors worldwide
10-100 times faster than Hadoop
![Page 38: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/38.jpg)
SPARK performance
![Page 39: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/39.jpg)
NOSQL – NOT ONLY SQL
An alternative for storing large data with complex structure: a distributed, non-relational system that supports scale-out
• Column: Accumulo, Cassandra, HBase
• Document: Apache CouchDB, Couchbase, MongoDB
• Search Engine: ElasticSearch, Solr
• Key-value: CouchDB, Dynamo, MemcacheDB, Redis
• Graph: Allegro, Neo4J, InfiniteGraph, OrientDB
![Page 40: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/40.jpg)
SELECT array_agg(players), player_teams
FROM (
  SELECT DISTINCT t1.t1player AS players, t1.player_teams
  FROM (
    SELECT
      p.playerid AS t1id,
      concat(p.playerid, ':', p.playername, ' ') AS t1player,
      array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t1
  INNER JOIN (
    SELECT
      p.playerid AS t2id,
      array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t2 ON t1.player_teams = t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams
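To make the intent of this query easier to see, the same grouping can be sketched in plain Python: find groups of players who have played for exactly the same list of teams. The sample rows and player names are hypothetical, and players with no teams are ignored here, so this is an illustration of the idea rather than an exact equivalent of the SQL.

```python
from collections import defaultdict

# Hypothetical sample rows for the `plays` table: (playerid, teamid).
plays = [(1, 10), (1, 20), (2, 10), (2, 20), (3, 10), (4, 30), (5, 30)]
# Hypothetical sample rows for the `player` table: playerid -> playername.
players = {1: "Ann", 2: "Bob", 3: "Cid", 4: "Dee", 5: "Eve"}

# Step 1: collect each player's teams (the inner array_agg).
teams_by_player = defaultdict(list)
for player_id, team_id in plays:
    teams_by_player[player_id].append(team_id)

# Step 2: group players by their sorted team list (the outer GROUP BY).
players_by_teams = defaultdict(list)
for player_id, teams in sorted(teams_by_player.items()):
    key = tuple(sorted(teams))
    players_by_teams[key].append(f"{player_id}:{players[player_id]}")

# Step 3: keep only team lists shared by more than one player (the self-join).
shared = {teams: names for teams, names in players_by_teams.items() if len(names) > 1}
print(shared)
```

The self-join in the SQL exists only to discard players whose team list is unique, which is the single filter in step 3.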
![Page 41: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/41.jpg)
CAP THEOREM (BREWER’S THEOREM)
By Eric Brewer (University of California, Berkeley)
Any distributed system with multiple servers cannot have all three of the following properties at the same time
• Consistency: every machine has the same data at all times
• Availability: every data-management request from a client receives a response, whether or not it succeeds
• Partition tolerance: the system continues to operate even when the servers cannot send data to one another
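The trade-off can be illustrated with a toy model of two replicas. This is a sketch, not how any particular database behaves: the `Replica` class, the `write` function, and the `prefer` flag are all hypothetical, and a refused write is treated here as a loss of availability.

```python
class Replica:
    """One server holding a single value."""
    def __init__(self):
        self.value = None

def write(replicas, value, partitioned, prefer):
    """Write via replica 0; replication to replica 1 fails while partitioned.

    prefer="consistency": refuse the write during a partition (gives up A).
    prefer="availability": accept it locally and let replicas diverge (gives up C).
    """
    if partitioned and prefer == "consistency":
        return False
    replicas[0].value = value
    if not partitioned:
        replicas[1].value = value
    return True

cp = [Replica(), Replica()]
ap = [Replica(), Replica()]
cp_ok = write(cp, "v1", partitioned=True, prefer="consistency")
ap_ok = write(ap, "v1", partitioned=True, prefer="availability")

print("CP accepted:", cp_ok, "replicas equal:", cp[0].value == cp[1].value)
print("AP accepted:", ap_ok, "replicas equal:", ap[0].value == ap[1].value)
```

During the partition the first system stays consistent but rejects the write, while the second accepts the write and leaves the two replicas with different data.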
![Page 42: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/42.jpg)
CAP - NORMAL OPERATION – C+A
Source: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
![Page 43: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/43.jpg)
CAP - NETWORK PARTITION – only A is achievable
Source: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
![Page 44: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/44.jpg)
CAP THEOREM AND NOSQL
Source: http://blog.flux7.com/blogs/nosql/cap-theorem-why-does-it-matter
![Page 45: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/45.jpg)
Source: http://db-engines.com/en/ranking
![Page 46: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/46.jpg)
NOSQL example - MONGODB
Document-Oriented NoSQL database
BSON store (binary-format JSON)
Databases – Collections - Documents
Supports multiple schemas at the same time = documents in the same collection can have different structures (fields)
Uses JavaScript as the primary language for data access, with drivers for other languages such as Java and Python
Supports load balancing and replication
![Page 47: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/47.jpg)
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "height_cm": 167.6,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "office", "number": "646 555-4567" }
  ]
}
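A document like this maps naturally onto nested dictionaries and lists. The sketch below uses only Python's standard `json` module, not a MongoDB driver; the shortened document, the second document, and the list standing in for a collection are hypothetical and only illustrate that documents in one collection may have different fields.

```python
import json

# A shortened version of the document above, parsed from JSON text.
person = json.loads("""
{
  "firstName": "John",
  "address": {"city": "New York", "state": "NY"},
  "phoneNumbers": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "office", "number": "646 555-4567"}
  ]
}
""")

# A second, hypothetical document with a different set of fields.
other = {"firstName": "Jane", "email": "jane@example.com"}

# A plain list stands in for a collection holding both documents.
collection = [person, other]

# Nested fields are reached by walking the structure.
office = next(p["number"] for p in person["phoneNumbers"] if p["type"] == "office")

# Queries must tolerate missing fields, since there is no fixed schema.
with_phone = [d["firstName"] for d in collection if "phoneNumbers" in d]

print(office, with_phone)
```

Because no schema is enforced, the code that reads the data has to check for the fields it needs, as the `with_phone` filter does.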
![Page 48: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/48.jpg)
PREDICTIVE ANALYTICS
A tool for data scientists to analyze patterns in historical data in order to predict the future
Covers many techniques, including statistics, modeling, machine learning, data mining, time series analysis, deep learning, text analytics, image processing, location analytics, etc.
![Page 49: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/49.jpg)
Types of DATA ANALYTICS
![Page 50: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/50.jpg)
BIG DATA ARCHITECTURE in practice
Data Source (×4)
Data Ingestion
Fast Data Path: Data Stream Processors
Big Data Path: Data Lake (Landing Zone), Data Refinery / Data Analytics
Data Visualization
Traditional Data Warehouse / Reporting tools
![Page 51: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/51.jpg)
Comparing the BIG DATA ARCHITECTURE with a LOG system (GRAYLOG)
Data Source (×4)
Data Ingestion
Fast Data Path: Data Stream Processors
Big Data Path: Data Lake (Landing Zone), Data Refinery / Data Analytics
Data Visualization
Graylog components shown: Kafka, ElasticSearch, Graylog-Web, Graylog-Event
![Page 52: INTRODUCTION TO BIG DATA FOR (UNIVERSITY) … Data for System... · •Graph: Allegro, Neo4J, InfiniteGraph, OrientDB. SELECT array_agg(players), player_teams FROM (SELECT DISTINCT](https://reader033.vdocuments.mx/reader033/viewer/2022052708/5a71697d7f8b9aa7538cd2ca/html5/thumbnails/52.jpg)