Data Science Conference Belgrade
![Page 2: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/2.jpg)
Agenda
• Big Data
• Hadoop
• HDFS
• MapReduce
• YARN
• Spark
• Ecosystem
![Page 3: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/3.jpg)
Big Data
• Big Data refers to data that arrives at high velocity and in volumes that exceed the capabilities of traditional software for storing, processing, and managing data.
• Big Data is anything that does not fit in Excel.
![Page 4: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/4.jpg)
Big Data - Dimensions
Data complexity
Volume
Variety
Velocity
Veracity (data quality)
![Page 5: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/5.jpg)
Big Data - Data sources
• Social networks (Twitter, Facebook…)
• Email, HTML, click stream
• Images, video, logs, sensor data
• Relational databases
![Page 6: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/6.jpg)
![Page 7: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/7.jpg)
Big Data - Users
![Page 8: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/8.jpg)
Hadoop
• Hadoop is open-source software from the Apache Software Foundation.
• It is used to store and process large volumes of data.
• It is written in Java.
![Page 9: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/9.jpg)
Hadoop
• Hadoop Common
• HDFS
• MapReduce
• YARN
![Page 10: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/10.jpg)
Hadoop HDFS
![Page 11: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/11.jpg)
Hadoop HDFS
![Page 12: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/12.jpg)
Hadoop MapReduce
[Diagram: Data in HDFS → three MAP tasks → two REDUCE tasks → Results]
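The MAP and REDUCE stages on this slide can be illustrated with a word count, the usual introductory example. What follows is a minimal single-process sketch in plain Python, not Hadoop code. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration. On a real cluster the map and reduce tasks run in parallel on different nodes, and the framework performs the grouping step between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """MAP: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key so each reduce call sees one word's values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: collapse the values collected for one key into one result."""
    return key, sum(values)


def word_count(lines):
    # Run the map step over every line, then group, then reduce per key.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data hadoop", "hadoop spark", "big data"]  # made-up input
    print(word_count(sample))
```

Each input line is mapped independently of the others, which is what lets the map step be spread across many machines.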
![Page 13: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/13.jpg)
Hadoop YARN
• ResourceManager
  • Scheduler - resource allocation
  • ApplicationsManager - accepting job submissions …
• New kinds of applications on Hadoop (real-time, interactive…)
• Higher resource utilization
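The split between accepting jobs and allocating resources can be pictured with a toy model. The sketch below is not the YARN API. `ToyResourceManager`, `Job`, the node names, and the memory figures are all hypothetical, and the placement rule (first node with enough free memory) is a simplification chosen for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    memory_mb: int  # memory the job requests (hypothetical unit)


class ToyResourceManager:
    """Toy model only; it does not mirror YARN's real classes or protocol."""

    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # free memory per node
        self.queue = deque()              # jobs accepted but not yet placed
        self.placements = {}              # job name -> node name

    def submit(self, job):
        """ApplicationsManager role: accept a job and queue it."""
        self.queue.append(job)

    def schedule(self):
        """Scheduler role: try each queued job in submission order and
        place it on the first node with enough free memory."""
        waiting = deque()
        while self.queue:
            job = self.queue.popleft()
            node = next(
                (n for n, free in self.free.items() if free >= job.memory_mb),
                None,
            )
            if node is None:
                waiting.append(job)  # no capacity yet, so it stays queued
            else:
                self.free[node] -= job.memory_mb
                self.placements[job.name] = node
        self.queue = waiting
        return self.placements


if __name__ == "__main__":
    rm = ToyResourceManager({"node1": 4096, "node2": 2048})
    for job in (Job("etl", 3072), Job("report", 2048), Job("ml", 4096)):
        rm.submit(job)
    print(rm.schedule())                 # placed jobs
    print([j.name for j in rm.queue])    # jobs still waiting
```

In this run `etl` and `report` are placed, while `ml` stays queued because no node has 4096 MB free.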
![Page 14: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/14.jpg)
Spark
• Apache Spark is a platform for Big Data processing with built-in modules for machine learning, SQL, streaming, and graph processing.
• In-memory processing.
• 10x faster than MapReduce.
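One way to picture the in-memory processing point is to run the same two-step job twice: once writing the intermediate result to disk between steps, as a chain of MapReduce jobs typically does, and once handing it over in memory. The sketch below is plain Python, not Spark code. The function names and the `user,amount` sample data are assumptions made for illustration, and it demonstrates only the data flow, not any speed-up figure.

```python
import json
import os
import tempfile


def parse(lines):
    """Step 1: turn 'user,amount' text lines into (user, amount) records."""
    return [(user, int(amount))
            for user, amount in (line.split(",") for line in lines)]


def total_per_user(records):
    """Step 2: sum the amounts per user."""
    totals = {}
    for user, amount in records:
        totals[user] = totals.get(user, 0) + amount
    return totals


def run_via_disk(lines):
    """Disk-style chaining: step 1 writes a file, step 2 reads it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "step1.json")
        with open(path, "w") as f:
            json.dump(parse(lines), f)
        with open(path) as f:
            records = [tuple(r) for r in json.load(f)]
    return total_per_user(records)


def run_in_memory(lines):
    """In-memory chaining: step 2 consumes step 1's output directly."""
    return total_per_user(parse(lines))


if __name__ == "__main__":
    sample = ["ana,10", "marko,5", "ana,7"]  # made-up input
    print(run_via_disk(sample))
    print(run_in_memory(sample))
```

Both variants return the same totals. The difference is the extra write and read between steps, which grows with the number of chained steps and the size of the intermediate data.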
![Page 15: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/15.jpg)
Hadoop
• Hadoop is not a replacement for an RDBMS.
• Hadoop is not a database.
• Offline analytics.
• A single data center.
![Page 16: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/16.jpg)
Hadoop - Drawbacks
• Speed
• Complexity
• Support
• In-memory processing
• Streaming
![Page 17: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/17.jpg)
Ecosystem
• Hadoop can be extended with numerous tools that improve the capabilities and efficiency of data processing.
• They are grouped into tools for data transfer, data analysis, cluster management…
![Page 18: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/18.jpg)
Ecosystem - Some of the tools
• Data ingestion
  • Flume
  • Kafka
  • Sqoop
  • …
• Processing
  • Hive
  • Pig
  • Storm
  • …
• Cluster management
  • Ambari
  • …
![Page 19: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/19.jpg)
![Page 20: Data Science Conference Belgrade](https://reader034.vdocuments.mx/reader034/viewer/2022051101/587002541a28ab427f8b537d/html5/thumbnails/20.jpg)
Useful links
• Hadoop Srbija
• Hadoop
• Hortonworks
• Cloudera