Infinispan, transactional key-value DataGrid and NoSQL database

11. April 2013 Alexander Petrov

Alexander Petrov

• Sr. Consultant at Inmeta Consulting

• Current project: Skatteetaten Grid POC

• Previous projects involving grid technologies:

• Mattilsynet food authority system.

• FrameSolution BPM framework, used by the Lovisa National Court Authority (Norway) and the Mattilsynet Food Authority

• Other noteworthy projects

• Coca Cola Basis ERP system – Coca Cola Bottler factories

• mPower Mobilitec: 300 million subscribers worldwide, delivering over 500,000 pieces of content every day.

Usage scenarios

• Big data: databases are slow, memory is FAST!

• Provides huge computing power.

• Tax calculation

• Financial organizations

• Government organizations use it for communication and data sharing between the different departments.

• Scientific computations

• MMORPG games

Agenda

• General terminology relevant to distributed caching

• Challenges related to introducing distributed caching to an existing system

• Metrics and tuning

Distributed Caching - Concepts

• Cache: JSR-107

• Java Data Grid: JSR-347

• In memory Data Grid

• Cluster

• Distribution

• Node – a member of a cluster

• Transaction awareness

• Colocation

• Map / Reduce

• Consistency

Real World Use Case

Typical J2EE backend

Data access

• Transaction scope

• Locking / deadlocking

• Flushing policies

• Mixing the technology stack.

• Performance

Legacy Cache

Our end goal

• Wow, we did it!

Summary

• Our custom cache is super fast, but its cache hit ratio is rather low.

• Our custom cache tends to get dirty, because updates to the shared data cannot be propagated. At the same time, the data regions are not fully separated.

• Marshaling is a rather slow and heavy process.

• We are facing a technological cocktail and we need to keep integrity.

Replication

• Write through

• Write Behind

• Replication Queue
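The difference between write-through and write-behind can be sketched in plain Java. This is a toy model, not Infinispan's cache store API: the class `WriteModeSketch`, its fields, and the in-memory `store` map standing in for a database are all hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy cache in front of a "store" (a plain map standing in for a database). */
final class WriteModeSketch {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> store = new HashMap<>();   // hypothetical backing store
    final Queue<String> dirtyKeys = new ArrayDeque<>();  // keys waiting to be flushed

    /** Write-through: the store is updated before the call returns. */
    void putWriteThrough(String key, String value) {
        cache.put(key, value);
        store.put(key, value);
    }

    /** Write-behind: only the cache is updated; the key is queued for a later flush. */
    void putWriteBehind(String key, String value) {
        cache.put(key, value);
        dirtyKeys.add(key);
    }

    /** Flushes queued keys to the store, as a background thread would. Returns the number of writes. */
    int flush() {
        int written = 0;
        String key;
        while ((key = dirtyKeys.poll()) != null) {
            store.put(key, cache.get(key)); // always writes the latest cached value
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        WriteModeSketch sketch = new WriteModeSketch();
        sketch.putWriteThrough("a", "1");
        sketch.putWriteBehind("b", "2");
        System.out.println("store before flush: " + sketch.store);
        sketch.flush();
        System.out.println("store after flush:  " + sketch.store);
    }
}
```

With write-behind the caller never waits for the store, but anything still in the queue is lost if the node dies before the flush. That is the trade-off to weigh when picking a mode.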

Invalidation

Distribution

More terminology

• Eviction

• Least Recently Used

• First In First Out

• LIRS

• Custom

• Expiration

• Invalidation
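To make the eviction policies above concrete, here is a minimal sketch of size-bounded LRU and FIFO eviction built on the JDK's `LinkedHashMap`. It illustrates the policies only. The class name `EvictingCacheSketch` is hypothetical, and a real data grid uses its own, concurrent eviction implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Size-bounded map that evicts one entry whenever it grows past maxEntries. */
final class EvictingCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    /**
     * @param lru true = evict the least recently used entry,
     *            false = evict the oldest inserted entry (FIFO)
     */
    EvictingCacheSketch(int maxEntries, boolean lru) {
        super(16, 0.75f, lru); // third argument switches LinkedHashMap to access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // consulted by LinkedHashMap after every insert
    }

    public static void main(String[] args) {
        EvictingCacheSketch<String, String> lru = new EvictingCacheSketch<>(2, true);
        lru.put("a", "1");
        lru.put("b", "2");
        lru.get("a");      // "a" becomes the most recently used entry
        lru.put("c", "3"); // evicts "b"
        System.out.println(lru.keySet()); // prints [a, c]
    }
}
```

Passing `false` as the second constructor argument turns the same class into a FIFO cache: in the example above, "a" would be evicted instead of "b".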

Caching topologies – Mirrored Cache

• Ref. Data vs Transactional

• Reference data: Good. Max 30000 reads/sec (1 KB entries)

• Transactional data: Good. Max 25000 writes/sec (1 KB entries)

Caching topologies – Replica Cache

• Reference data: 30000 reads/sec per server. Grows linearly by adding servers.

• Transactional data: Not so good. Max 20000 writes/sec. Drops to 2500 if you add a 3rd server.

Caching topologies – Partitioned Cache

• Ref. Data vs Transactional

• Reference data: Max 30000 reads/sec (1 KB entries)

• Transactional data: Good. Max 25000 writes/sec (1 KB entries)

Caching topologies - Partitioned Replica

• Reference data (1 KB): Good. 30000 reads/sec per server.

• Transactional data (1 KB): Good. 20000 writes/sec per server.
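In a partitioned replica topology every key lives on a small, fixed number of nodes instead of on all of them. The sketch below shows that idea with simple hash-modulo placement. It is deliberately not the consistent hashing a real grid such as Infinispan uses, and `PartitionSketch`, `ownersOf`, `clusterSize` and `numOwners` are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Maps a key to the nodes that hold it: one primary owner plus backups. */
final class PartitionSketch {

    /**
     * Returns the indexes of the owning nodes, primary first.
     * Placement is plain hash-modulo, chosen for brevity.
     */
    static List<Integer> ownersOf(String key, int clusterSize, int numOwners) {
        if (clusterSize <= 0 || numOwners <= 0 || numOwners > clusterSize) {
            throw new IllegalArgumentException("need 0 < numOwners <= clusterSize");
        }
        int primary = Math.floorMod(key.hashCode(), clusterSize);
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i < numOwners; i++) {
            owners.add((primary + i) % clusterSize); // backups are the next nodes in the ring
        }
        return owners;
    }

    public static void main(String[] args) {
        // "a".hashCode() is 97, and 97 mod 4 is 1
        System.out.println(ownersOf("a", 4, 2)); // prints [1, 2]
    }
}
```

A write touches only `numOwners` nodes regardless of cluster size, which is why write throughput per server can stay flat as servers are added. The weakness of plain modulo placement is that almost every key moves when the cluster size changes, which consistent hashing is designed to avoid.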


How to define our topology

• What is the size of our cluster? Reads vs. Writes

• Communication inside our grid

• UDP, TCP

• Synchronous vs. Asynchronous.

• What about the transaction isolation?

• Repeatable Reads vs. Read Committed

• What is the nature of our application?

• Read intensive data

• CMS systems

• Write Intensive Data

• Document Management System

Level 1 Cache / Near Cache

• Level 1 cache is supported only in distribution mode

• Level 1 cache might have a performance impact in certain systems

Cache stores and loaders

• Passivation

• Activation

• Hibernate

Transactions, Isolation and Locking

• Long running transactions need to be avoided.

• What is a long running transaction? How long is actually long?

• Read Committed vs Repeatable Read

Classic Deadlock situation

TX1 (wants to update A, B, C): begin, Update(A), Update(B), Update(C), Update(B)

TX2 (wants to update C, B, A): begin, Update(C), Update(B), Release(A), Lock(A)

• C is locked by TX2

• A is locked by TX1
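One common way out of the situation above is to make every transaction acquire its locks in the same global order, so TX1 and TX2 can never each hold a lock the other needs. The following is a minimal sketch of that technique with JDK locks. The class `OrderedLockingSketch` and its methods are hypothetical and do not reflect how Infinispan's lock manager works internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Per-key locks that are always acquired in sorted key order. */
final class OrderedLockingSketch {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Sorted, de-duplicated keys: the one order every transaction must follow. */
    static List<String> lockOrder(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    /** Locks all keys in global order, runs the update, then unlocks in reverse. */
    void updateAll(List<String> keys, Runnable update) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String key : lockOrder(keys)) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            update.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    /** Runs TX1 (A, B, C) and TX2 (C, B, A) concurrently; returns the number of completed updates. */
    static int runOpposingTransactions(int iterations) {
        OrderedLockingSketch grid = new OrderedLockingSketch();
        int[] counter = {0}; // only touched while holding the locks on A, B and C
        Thread tx1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                grid.updateAll(Arrays.asList("A", "B", "C"), () -> counter[0]++);
            }
        });
        Thread tx2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                grid.updateAll(Arrays.asList("C", "B", "A"), () -> counter[0]++);
            }
        });
        tx1.start();
        tx2.start();
        try {
            tx1.join();
            tx2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(runOpposingTransactions(10_000)); // prints 20000
    }
}
```

Both transactions end up locking A first, so whichever one loses that race simply waits instead of deadlocking. Lock timeouts and deadlock detection are alternative strategies when the set of keys is not known up front.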

Repeatable Read

TX1: begin, get(k), -, -, get(k)

TX2: begin, get(k), put(k, v2), commit

What is returned by the second get(k) in TX1?
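The answer depends on the isolation level of TX1, which the single-threaded simulation below tries to show. It is a sketch under assumptions: the initial value `v1` is invented for the example (the slide only names `v2`), and `IsolationSketch` and `Tx` are hypothetical classes that mimic the visible behaviour rather than use a real transaction manager.

```java
import java.util.HashMap;
import java.util.Map;

/** Replays the TX1 / TX2 timeline under two isolation levels. */
final class IsolationSketch {
    enum Isolation { READ_COMMITTED, REPEATABLE_READ }

    /** Minimal transaction context over a shared map of committed values. */
    static final class Tx {
        private final Map<String, String> committed;
        private final Isolation isolation;
        private final Map<String, String> alreadyRead = new HashMap<>();   // values seen by this tx
        private final Map<String, String> pendingWrites = new HashMap<>(); // invisible to others until commit

        Tx(Map<String, String> committed, Isolation isolation) {
            this.committed = committed;
            this.isolation = isolation;
        }

        String get(String key) {
            if (pendingWrites.containsKey(key)) {
                return pendingWrites.get(key); // a tx always sees its own writes
            }
            if (isolation == Isolation.REPEATABLE_READ) {
                if (!alreadyRead.containsKey(key)) {
                    alreadyRead.put(key, committed.get(key)); // remember the first read
                }
                return alreadyRead.get(key);
            }
            return committed.get(key); // READ_COMMITTED: latest committed value
        }

        void put(String key, String value) {
            pendingWrites.put(key, value);
        }

        void commit() {
            committed.putAll(pendingWrites);
            pendingWrites.clear();
        }
    }

    /** Returns what TX1's second get(k) sees after TX2 has committed put(k, v2). */
    static String secondRead(Isolation tx1Isolation) {
        Map<String, String> committed = new HashMap<>();
        committed.put("k", "v1"); // assumed initial value
        Tx tx1 = new Tx(committed, tx1Isolation);
        Tx tx2 = new Tx(committed, Isolation.READ_COMMITTED);
        tx1.get("k");
        tx2.get("k");
        tx2.put("k", "v2");
        tx2.commit();
        return tx1.get("k");
    }

    public static void main(String[] args) {
        System.out.println(secondRead(Isolation.READ_COMMITTED));  // prints v2
        System.out.println(secondRead(Isolation.REPEATABLE_READ)); // prints v1
    }
}
```

Under Read Committed the second read picks up TX2's committed change. Under Repeatable Read TX1 keeps seeing the value it read first, at the cost of holding on to that value for the lifetime of the transaction.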

Cache statistics

Remoting statistics

Locking statistics

Marshaling data

• Java serialization

• Java externalization

• Impact on performance

• Generic domain.
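The gap between Java serialization and Java externalization is easy to measure with the JDK alone. The sketch below marshals the same two fields both ways and compares the byte counts. `MarshalingSketch`, `SerialCustomer` and `ExternCustomer` are hypothetical example classes, and the exact sizes depend on class and field names, so the code only compares them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class MarshalingSketch {

    /** Default serialization: the stream also describes every field by name and type. */
    static final class SerialCustomer implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String name;

        SerialCustomer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Externalization: the class writes and reads its own fields by hand. */
    static final class ExternCustomer implements Externalizable {
        private static final long serialVersionUID = 1L;
        long id;
        String name;

        public ExternCustomer() { } // public no-arg constructor is required for reading

        ExternCustomer(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(name);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            id = in.readLong();   // must read in the same order as writeExternal
            name = in.readUTF();
        }
    }

    static byte[] marshal(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Object unmarshal(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Serializable:   " + marshal(new SerialCustomer(42L, "Ola")).length + " bytes");
        System.out.println("Externalizable: " + marshal(new ExternCustomer(42L, "Ola")).length + " bytes");
    }
}
```

The externalized form is smaller because the stream no longer carries the field descriptors, and the saving grows with the number of fields. The price is hand-written read and write code that has to be kept in sync with the class.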

Real World Use Case

Data access

• Transaction scope

• Locking / deadlocking

• Flushing policies

• Mixing the technology stack.

• Performance

Our end goal

• Wow, we did it!

The End

• Thank you for your attention

Used sources

http://www.alachisoft.com/ncache/caching-topology.html

http://www.infoq.com/news/2011/10/java-data-grid

https://github.com/datagrids/spec/wiki

http://www.jboss.org/infinispan/documentation

http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking










