Tatyana Matvienko, Senior Java Developer, Big Data Storages
Big Data Storages
Agenda
[Big] Data source: when does it become big?
What is a cluster? Horizontal and vertical scaling
[Big] Data storage challenges
Disadvantages
NoSQL = Not only SQL
Most popular and trendy
Tech example: Apache Cassandra architecture
Demo
Big Data Storage Concepts
Only stores facts (events), does not analyze them
Immutable
Time series data (based on timestamps and, possibly, origin)
Store everything, delete nothing
Where: messages (email, Twitter), social networks, sensor data (IoT), log files, locations
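
The concepts above (immutable facts, ordered by timestamp, never deleted) can be illustrated with a minimal in-memory sketch. The names `Event` and `EventLog` are hypothetical and chosen only for this illustration; a real big data store would persist and distribute the data rather than keep it in a list.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)  # frozen=True makes each event immutable after creation
class Event:
    timestamp: float  # when the fact happened (seconds since epoch)
    origin: str       # where it came from, e.g. a sensor id
    payload: str      # the raw fact, stored without interpretation


class EventLog:
    """Append-only store: events can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._timestamps: List[float] = []  # sorted, parallel to _events
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Insert at the position that keeps both lists sorted by timestamp.
        index = bisect.bisect_right(self._timestamps, event.timestamp)
        self._timestamps.insert(index, event.timestamp)
        self._events.insert(index, event)

    def range(self, start: float, end: float) -> List[Event]:
        # Return events with start <= timestamp < end, using binary search.
        low = bisect.bisect_left(self._timestamps, start)
        high = bisect.bisect_left(self._timestamps, end)
        return self._events[low:high]
```

The class deliberately exposes no update or delete method: the only write operation is `append`, which mirrors "store everything, delete nothing".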
Cluster. Horizontal and vertical scaling
What is a cluster?
Load balancer
Communication: master/slave architecture
Fault tolerance and replication factor
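
As a rough worked example of how the replication factor trades capacity for fault tolerance, the sketch below assumes that every item is stored on exactly `replication_factor` distinct nodes and that data is spread evenly. Both function names are hypothetical and exist only for this illustration.

```python
def tolerated_failures(replication_factor: int) -> int:
    """Number of replica nodes that can be lost without losing the data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # With RF copies of each item, the item survives while one copy remains.
    return replication_factor - 1


def usable_capacity(nodes: int, node_capacity_tb: float, replication_factor: int) -> float:
    """Capacity left for unique data when every item is stored RF times."""
    if replication_factor < 1 or replication_factor > nodes:
        raise ValueError("replication_factor must be between 1 and the node count")
    # Raw capacity grows with the node count (horizontal scaling),
    # but each item consumes RF times its own size.
    return nodes * node_capacity_tb / replication_factor
```

For instance, six nodes of 2 TB each with a replication factor of 3 give 4 TB of usable space, and each item survives the loss of two of its replicas.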
Big Data Storage Challenges
Size (keep and search huge amounts of data)
Speed (data acquisition, data search)
Availability (fault tolerance, partition tolerance)
Disadvantages of Big Data Storages
No transactions (ACID)
Less mature
Big variety of concepts, lack of standardization
No BI or analytics in queries
Administration
Distributed File storage
Amazon
Storages: Key-Value
Examples: Redis, DynamoDB, MemcacheDB, Riak KV, Aerospike, OrientDB
Storages: Document oriented
Examples: Apache CouchDB, Couchbase, MongoDB
Storages: Graphs
Examples: AllegroGraph, Neo4j, OrientDB, Titan
Storages: Column based
Examples: Cassandra, HBase, Accumulo, Vertica
Why Cassandra?
Apache Cassandra: basics
Masterless architecture with read/write anywhere design
All nodes are the same
No single point of failure
Zone support
Linear scalability
CQL (Cassandra Query Language)
Availability and Partition Tolerance but Eventual Consistency
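
Eventual consistency in Cassandra is tunable: the client chooses, per request, how many replicas must acknowledge a read or a write. The sketch below shows the usual rule of thumb that a read is guaranteed to see the latest acknowledged write when the read and write replica sets overlap. The function names are hypothetical; they only model the arithmetic and do not call Cassandra.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, as used by the QUORUM consistency level."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    # If R + W > RF, the two sets cannot be disjoint, so at least one
    # replica contacted by the read already holds the latest write.
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, `quorum(3)` is 2, so quorum writes combined with quorum reads overlap, while writing to one replica and reading from one replica does not and stays only eventually consistent.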
Partitioning and Replication
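
A minimal sketch of ring-based partitioning and replication, assuming one token per node and replica placement on the next nodes clockwise (similar in spirit to Cassandra's simplest replication strategy). The `Ring` class and `token` function are hypothetical, and MD5 is used here only as a convenient stand-in hash; Cassandra's default partitioner uses Murmur3.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    # Illustrative hash only: map a string to a 64-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Each (uniquely named) node owns one position on the ring.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str) -> List[str]:
        # First replica: the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if there is none.
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        # Remaining replicas: the following nodes clockwise.
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replication_factor)]
```

A call such as `Ring(["node-a", "node-b", "node-c", "node-d"], 3).replicas("user:42")` returns three distinct node names, and the same key always maps to the same nodes, so any node can compute where a row lives without asking a master.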
Data modeling
Demo