Infinispan, transactional key-value DataGrid and NoSQL database
Infinispan use case presentation made for BAKSIA meetup group.
11. April 2013 Alexander Petrov
Alexander Petrov
• Sr. Consultant at Inmeta Consulting
• Current project: Skatteetaten Grid POC
• Previous projects involving grid technologies:
• Mattilsynet food authority system.
• FrameSolution BPM framework, used at Lovisa National Court Authority (Norway) and Mattilsynet Food Authority
• Other noteworthy projects
• Coca Cola Basis ERP system – Coca Cola Bottler factories
• mPower Mobilitec: 300 million subscribers worldwide, delivering over 500,000 pieces of content every day.
Usage scenarios
• Big data. Databases are slow. Memory is FAST!
• Provides huge computing power.
• Tax calculation
• Financial organizations
• Government organizations use it for communication and data sharing between the different departments.
• Scientific computations
• MMORPG games
Agenda
• General terminology relevant to distributed caching
• Challenges related to introducing distributed caching to an existing system
• Metrics and tuning
Distributed Caching - Concepts
• Cache: JSR-107
• Java Data Grid: JSR-347
• In-memory data grid
• Cluster
• Distribution
• Node – a member of a cluster
• Transaction awareness
• Colocation
• Map / Reduce
• Consistency
Real World Use Case
Typical J2EE backend
Data access
• Transaction scope
• Locking/deadlocking
• Flushing policies
• Mixing the technology stack.
• Performance
Legacy Cache
Our end goal
• Wow, we did it!
Summary
• Our custom cache is super fast, but its cache hit ratio is rather low.
• Our custom cache tends to get dirty because updates to the shared data cannot be propagated. At the same time, the data regions are not fully separated.
• Marshaling is a rather slow and heavy process.
• We are facing a technological cocktail and we need to keep integrity.
Replication
• Write-through
• Write-behind
• Replication Queue
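The difference between write-through and write-behind can be sketched in plain Java. This is a minimal illustration, not Infinispan's implementation: BackingStore, WriteThroughCache and WriteBehindCache are hypothetical names, and the explicit flush() call stands in for whatever background thread or queue a real grid would use.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a slow backing store (e.g. a database table).
class BackingStore {
    final Map<String, String> rows = new HashMap<>();
    int writes = 0;

    void write(String key, String value) {
        rows.put(key, value);
        writes++;
    }
}

// Write-through: the store is updated synchronously as part of every put.
class WriteThroughCache {
    private final Map<String, String> memory = new HashMap<>();
    private final BackingStore store;

    WriteThroughCache(BackingStore store) { this.store = store; }

    void put(String key, String value) {
        memory.put(key, value);
        store.write(key, value); // the caller waits for the store
    }

    String get(String key) { return memory.get(key); }
}

// Write-behind: puts only mark the key as dirty; the store is updated later.
class WriteBehindCache {
    private final Map<String, String> memory = new HashMap<>();
    private final Set<String> dirtyKeys = new LinkedHashSet<>();
    private final BackingStore store;

    WriteBehindCache(BackingStore store) { this.store = store; }

    void put(String key, String value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    String get(String key) { return memory.get(key); }

    // Assumption: a real system would call this from a background thread.
    void flush() {
        for (String key : dirtyKeys) {
            store.write(key, memory.get(key)); // only the latest value is written
        }
        dirtyKeys.clear();
    }
}
```

Two updates to the same key cost two store writes with write-through but only one with write-behind, at the price of a window in which the store is stale.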
Invalidation
Distribution
More terminology
• Eviction
• Least Recently Used
• First In First Out
• LIRS
• Custom
• Expiration
• Invalidation
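As an illustration of the Least Recently Used policy listed above, here is a minimal sketch built on the JDK's LinkedHashMap in access order. LruCache is a hypothetical name and this is not how Infinispan implements eviction. Expiration (time-based removal) is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Evicts the least recently used entry once the cache grows past maxEntries.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, eldest first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }
}
```

With a capacity of two, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.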
Caching topologies – Mirrored Cache
• Ref. data vs. transactional
• Reference data: Good. Max 30,000 reads/sec (1 KB size).
• Transactional data: Good. Max 25,000 writes/sec (1 KB size).
Caching topologies – Replica Cache
• Reference data: Good. 30,000 reads/sec per server. Grows linearly by adding servers.
• Transactional data: Not so good. Max 20,000 writes/sec. Drops to 2,500 if you add a 3rd server.
Caching topologies – Partitioned Cache
• Ref. data vs. transactional
• Reference data: Good. Max 30,000 reads/sec (1 KB size).
• Transactional data: Good. Max 25,000 writes/sec (1 KB size).
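The idea behind a partitioned cache is that every key has exactly one owning node, so each node stores only a share of the data. The sketch below uses a simple hash-modulo rule to pick the owner. That rule is an assumption made for brevity (real grids typically use consistent hashing so that few keys move when nodes join or leave), and PartitionedCache is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each "node" is modelled as a local map; a key lives on exactly one node.
class PartitionedCache {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PartitionedCache(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Simplified ownership rule: hash of the key modulo the number of nodes.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, String value) {
        nodes.get(ownerOf(key)).put(key, value);
    }

    String get(String key) {
        return nodes.get(ownerOf(key)).get(key);
    }

    int entriesOn(int node) {
        return nodes.get(node).size();
    }
}
```

Reads and writes for a key always go to the same node, which is why both storage and write throughput are spread over the cluster.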
Caching topologies - Partitioned Replica
• Reference data (1 KB): Good. 30,000 reads/sec per server. Grows linearly by adding servers.
• Transactional data (1 KB): Good. 20,000 writes/sec per server. Grows linearly by adding servers.
How to define our topology
• What is the size of our cluster? Reads vs. writes
• Communication inside our grid
• UDP, TCP
• Synchronous vs. Asynchronous.
• What about the transaction isolation?
• Repeatable Reads vs. Read Committed
• What is the nature of our application?
• Read intensive data
• CMS systems
• Write Intensive Data
• Document Management System
Level 1 Cache / Near Cache
• Level 1 cache is supported only in distribution mode.
• Level 1 cache might have a performance impact in certain systems.
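A near cache keeps local copies of entries owned by other nodes, so repeated reads avoid a network hop. The cost is that every remote write must invalidate those copies. A minimal sketch, assuming a plain Map stands in for the remote owner and NearCache is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;

class NearCache {
    private final Map<String, String> owner;            // stand-in for the remote owning node
    private final Map<String, String> l1 = new HashMap<>();
    int remoteReads = 0;                                // counts simulated network round trips

    NearCache(Map<String, String> owner) {
        this.owner = owner;
    }

    String get(String key) {
        String value = l1.get(key);
        if (value == null) {
            remoteReads++;                              // L1 miss: go to the owner
            value = owner.get(key);
            if (value != null) {
                l1.put(key, value);                     // keep a local copy for next time
            }
        }
        return value;
    }

    // Must be called whenever the owner's copy changes, otherwise L1 serves stale data.
    void invalidate(String key) {
        l1.remove(key);
    }
}
```

In a write-heavy system the invalidation traffic can outweigh the saved reads, which is one plausible reading of the performance warning above.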
Cache stores and loaders
• Passivation
• Activation
• Hibernate
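Passivation and activation can be sketched as follows: when memory is full, the evicted entry is written to a cache store (passivation), and when it is read again it is loaded back into memory and removed from the store (activation). This is a simplified model under those assumptions, with a HashMap standing in for the store and PassivatingCache as a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PassivatingCache {
    final Map<String, String> store = new HashMap<>();  // stand-in for a disk or database store
    private final LinkedHashMap<String, String> memory;

    PassivatingCache(final int maxInMemory) {
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > maxInMemory) {
                    store.put(eldest.getKey(), eldest.getValue()); // passivation
                    return true;
                }
                return false;
            }
        };
    }

    void put(String key, String value) {
        store.remove(key);          // the in-memory copy is now the only copy
        memory.put(key, value);
    }

    String get(String key) {
        String value = memory.get(key);
        if (value == null && store.containsKey(key)) {
            value = store.remove(key);  // activation
            memory.put(key, value);     // may passivate another entry
        }
        return value;
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }
}
```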
Transactions, Isolation and Locking
• Long-running transactions need to be avoided.
• What is a long-running transaction? How long is actually long?
• Read Committed vs Repeatable Reads
Classic Deadlock situation
TX1 (wants to update A, B, C): begin, Update(A), Update(B), Update(C), Update(B)
TX2 (wants to update C, B, A): begin, Update(C), Update(B), Release(A), Lock(A)
A is locked by TX1
C is locked by TX2
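The deadlock arises because TX1 and TX2 lock the same keys in opposite order. One common remedy, sketched here with plain JDK locks, is to always acquire locks in a fixed global order (sorted by key). OrderedLocker is a hypothetical helper, and this is one general technique, not a description of how Infinispan handles deadlocks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class OrderedLocker {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs the update while holding the locks of all keys.
    // Locks are always taken in sorted key order, whatever order the caller asked for.
    void updateAll(List<String> keys, Runnable update) {
        List<String> ordered = new ArrayList<>(keys);
        Collections.sort(ordered);
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String key : ordered) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            update.run();
        } finally {
            // Release in reverse order of acquisition.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```

With this rule, a transaction asking for C, B, A still locks A first, so it can never hold C while waiting for A.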
Repeatable Read
TX1: begin, get(k), -, -, get(k)
TX2: begin, get(k), put(k, v2), commit
What is returned by the second get(k) in TX1?
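The answer depends on the isolation level. The sketch below models the two levels in the simplest possible way: under read committed every get sees the latest committed value, while under repeatable read the transaction remembers the first value it read for each key. Tx and its fields are hypothetical names, and the initial value "v1" used in the usage note is an assumption, since the slide does not name it.

```java
import java.util.HashMap;
import java.util.Map;

class Tx {
    enum Isolation { READ_COMMITTED, REPEATABLE_READ }

    private final Map<String, String> committed;             // shared committed state
    private final Isolation isolation;
    private final Map<String, String> firstReads = new HashMap<>();

    Tx(Map<String, String> committed, Isolation isolation) {
        this.committed = committed;
        this.isolation = isolation;
    }

    String get(String key) {
        if (isolation == Isolation.READ_COMMITTED) {
            return committed.get(key);                        // always the latest committed value
        }
        if (!firstReads.containsKey(key)) {
            firstReads.put(key, committed.get(key));          // remember the first read
        }
        return firstReads.get(key);                           // repeat it for the rest of the tx
    }
}
```

If k starts as v1 and TX2 commits v2 between the two reads, the second get(k) in TX1 returns v1 under repeatable read and v2 under read committed.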
Cache statistics
Remoting statistics
Locking statistics
Marshaling data
• Java serialization
• Java externalization
• Impact on performance
• Generic domain.
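To make the difference between Java serialization and externalization concrete, the sketch below writes the same two fields both ways and compares the resulting byte counts. It uses only JDK classes and is not Infinispan's marshaller. SerializablePerson, ExternalizablePerson and MarshalSize are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Default serialization: the stream carries a description of every field.
class SerializablePerson implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int age;

    SerializablePerson(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Externalization: the class writes its own fields, so no field metadata is sent.
class ExternalizablePerson implements Externalizable {
    private static final long serialVersionUID = 1L;
    String name;
    int age;

    public ExternalizablePerson() { }   // public no-arg constructor is required for reading

    ExternalizablePerson(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(age);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        name = in.readUTF();
        age = in.readInt();
    }
}

class MarshalSize {
    // Returns the number of bytes the JDK object stream produces for one object.
    static int sizeOf(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing MarshalSize.sizeOf(new SerializablePerson("Ann", 30)) with the same call for ExternalizablePerson shows the externalized form is smaller. The saving per object is modest, but it is paid on every entry that crosses the network.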
Real World Use Case
Data access
• Transaction scope
• Locking/deadlocking
• Flushing policies
• Mixing the technology stack.
• Performance
Our end goal
• Wow, we did it!
The End
• Thank you for your attention
Used sources
http://www.alachisoft.com/ncache/caching-topology.html
http://www.infoq.com/news/2011/10/java-data-grid
https://github.com/datagrids/spec/wiki
http://www.jboss.org/infinispan/documentation
http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking