everything you wanted to know about velocity (but were afraid to … · 2010. 4. 12. · everything...

Everything you wanted to know about

Velocity

(but were afraid to cache)

Scott Colestock

[email protected]

Marcato Partners, LLC

What is it?

Velocity is a distributed in-memory key/value cache that provides .NET developers with a way to increase

performance and scalability when writing data-centric applications.

What is it? (2)

• The combined RAM available to all servers in a

Velocity cluster is presented to Velocity clients

as a unified whole

• Any serializable CLR object can be stored

– Actual location within cluster is transparent

– Client is a simple key/value API at heart

• Run as a service accessed across the network

• Additional servers can be added on demand

What we’ll cover

• What motivates this product/technology

• Terms / Pictures / Concepts

• Deploy / Install Process

• A lap around the API & Admin model

• Demos

• Gotchyas

Motivation

• Data-centric applications have been the norm for a long while

– Relational data

– More recently, “service-obtained” data

• Velocity is about increasing performance by bringing the data physically closer to the consumer

– Reduce pressure on underlying data stores/services

• Velocity can be about storing data in value-added form (logically closer to the consumer)

– Object graphs

– Output caching (not explicit in V1)

– Aggregated data in xml or other transformed formats

Motivation (2)

• Databases are always a point of high contention

as you scale out, and tuning is expensive

– Are your data retrieval sprocs getting harder to maintain - excessive sql chops required?

• Service calls for reference data (internal/external)

are often slow or intentionally throttled

• Caching has always been considered a solution

for these issues…

Motivation (3)

• Machine-local caching solutions (like Microsoft’s “Enterprise Library Caching Application Block”) can provide a partial answer

– Easy key/value API

– Flexible store (memory, disk-backed, etc.)

– Flexible expiration and eviction policy

• Limitations:

– Limited by the memory available to a single node…

– Application recycles typically mean you lose the cache

– In a load-balanced environment, a large data set means you will frequently “miss” when attempting to load from cache…

Motivation (4)

[Diagram: a load balancer in front of three machine-local caches, holding keys 3, 5, 23; keys 7, 11, 47; and keys 12, 16, 33.]

Machine-local caches wind up being sparsely populated when used with a load balancer (if the data set has many keys)

Motivation (5)

• Without a distributed cache, you have no central place to update/delete

• This means you can only cache data that can afford to be stale by some time period

– If the time period is short, you need a low TTL (time-to-live, aka expiration) which means more cache misses

• You can’t cache data that must have changes visible to the system in (near) real time

• With a distributed cache, you have one cache to shoot in the event of an update/delete

– Might be able to live with no expiration


Windows Server AppFabric Caching

• History: AppFabric caching was a separate component

– Public debut at TechEd 2008 (earlier?)

– Codename: Velocity

• “Dublin” was a separate effort, focused on providing a hosting and management environment around WCF/WF

• November 2009: Technologies grouped under heading of “Windows Server AppFabric”

Relationship to Windows Azure AppFabric

• Service bus: Handle communication and authentication for accessing applications

– Expose apps through firewalls, NAT gateways, etc.

– Assist cloud-based apps talking to on-premise apps

– Other composite app scenarios; pub/sub

• Access Control Service: Allow you to avoid setting up federated identity agreements just to grant partner/customer access to your cloud-based or on-premise apps.

• Today: Only common marketing/branding with Windows Server AppFabric.

• Later: Common services for both

Cache-Aside Pattern

• In the current version, the out-of-box support

is for the “cache-aside” pattern.

– Check cache

– If miss, retrieve data, then populate the cache

• Lots of other patterns you might contemplate

(and simulate) with what is provided

– Read-through/Write-through

– Refresh-ahead/Write-behind
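The check / miss / populate steps above can be sketched roughly as follows. This is a language-neutral simulation in Python, not the AppFabric client: `InMemoryCache`, `get_with_cache_aside`, and `load_product` are hypothetical names invented for illustration, and the "source of truth" is a stub function.

```python
from typing import Callable, Dict, Optional


class InMemoryCache:
    """Stand-in for a cache client; only get/put are modelled (assumption)."""

    def __init__(self) -> None:
        self._items: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self._items.get(key)  # None signals a miss

    def put(self, key: str, value: object) -> None:
        self._items[key] = value


def get_with_cache_aside(cache: InMemoryCache, key: str,
                         load_from_source: Callable[[str], object]) -> object:
    """Check the cache; on a miss, load from the source and populate the cache."""
    value = cache.get(key)
    if value is None:
        value = load_from_source(key)  # e.g. a database or service call
        cache.put(key, value)
    return value


source_calls = []


def load_product(key: str) -> object:
    """Hypothetical slow lookup; records each call so misses are visible."""
    source_calls.append(key)
    return {"id": key, "name": "Product " + key}


cache = InMemoryCache()
first = get_with_cache_aside(cache, "42", load_product)   # miss: goes to the source
second = get_with_cache_aside(cache, "42", load_product)  # hit: served from the cache
```

The application code owns the "aside" logic here; the cache itself never talks to the data source, which is what distinguishes this from read-through.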

Logical Hierarchy

[Diagram: a Cache Cluster of Server A, Server B, and Server C, each running a cache host (Cache Host A, Cache Host B, Cache Host C). Diagram labels: Named Cache: Product Catalog; Default Cache; Region: Sports; Region 1; Region 3.]

• Client apps work with a single logical unit of cache

• Server process is DistributedCacheService.exe

• Caches explicitly created with TTL, expiration, HA policy

• Regions represent a partition of data (subset of key/value pairs). Live on one node. Unit of replication/failover.

• Regions can be implicit or explicit. Use explicit only for bulk gets or searching.

Logical Hierarchy

[Diagram labels: Default Cache; Region: Sports; Region 1. Each region holds items of the form:]

ID (Key)    Payload (Value)    Tags/VersionInfo

1           Foo                …

2           Bar                …

3           Baz                …

Physical Layout

[Diagram: a load balancer in front of Web Server A, Web Server B, and Web Server C (each IIS 7.x), which talk to a Cache Cluster of Cache Server A, Cache Server B, and Cache Server C, each running a Cache Host.]

• Cache servers designed to run in a domain

• Caches can have access control applied…

• Consider the nature of data stored in cache, and secure appropriately (don’t let cache be weakest link)

Combined Deployment

[Diagram: a load balancer in front of Web Server A, Web Server B, and Web Server C; each web server runs both IIS 7.x and a Cache Host.]

Physical Layout

[Diagram: a load balancer in front of Web Server A, Web Server B, and Web Server C (each IIS 7.x); a Cache Cluster of Cache Server A, Cache Server B, and Cache Server C, each running a Cache Host; and a Config Store (file share or Sql Server) used by the cluster.]

• Configuration store contains cache policies and global partition map (how keys divide into regions, which servers have which regions)

• If Sql config store, servers will send heartbeat to Sql. Otherwise, heartbeat goes to one or more “lead hosts”

• Partition map used by “Global Partition Manager” (one node in the cluster, but auto failover) to communicate routing information to Velocity clients
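To make the partition map idea concrete, here is a toy model of "how keys divide into regions, which servers have which regions". It is a sketch under stated assumptions, not how Velocity actually hashes or places regions: the `PartitionMap` class, the crc32-based hashing, the round-robin placement, and the host names are all invented for illustration.

```python
import zlib
from typing import Dict, List


class PartitionMap:
    """Toy global partition map: key -> region -> cache host (hypothetical)."""

    def __init__(self, hosts: List[str], region_count: int) -> None:
        self.region_count = region_count
        # Assumption: regions are spread across hosts round-robin.
        self.region_to_host: Dict[int, str] = {
            region: hosts[region % len(hosts)] for region in range(region_count)
        }

    def region_for(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % self.region_count

    def host_for(self, key: str) -> str:
        """What a client needs from the map: which host to call for this key."""
        return self.region_to_host[self.region_for(key)]

    def fail_over(self, dead_host: str, survivors: List[str]) -> None:
        """Reassign every region owned by dead_host to the surviving hosts."""
        for region, host in self.region_to_host.items():
            if host == dead_host:
                self.region_to_host[region] = survivors[region % len(survivors)]


hosts = ["CacheHostA", "CacheHostB", "CacheHostC"]
pmap = PartitionMap(hosts, region_count=12)
owner_before = pmap.host_for("product:42")
pmap.fail_over("CacheHostA", ["CacheHostB", "CacheHostC"])
owner_after = pmap.host_for("product:42")
```

The key never changes region; only the region-to-host assignment moves, which is why the region is the natural unit of failover.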

Regions as unit of replication/failover (Global Partition Manager in action)

[Diagram: a Cache Cluster of Server A, Server B, and Server C, each running a cache host (Cache Host A, Cache Host B, Cache Host C). Diagram labels: Default Cache; Region: Sports; Region 1.]

Regions as unit of replication/failover (When using Secondaries)

[Diagram: a Cache Cluster of Server A, Server B, and Server C, each running a cache host (Cache Host A, Cache Host B, Cache Host C). Diagram labels: Default Cache; Region: Sports; Region 1; Sports secondary; Region 1 secondary.]

(Updates done synchronously)
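A rough sketch of what "secondaries with synchronous updates" buys you, simulated in Python. Everything here is an assumption made for illustration (the `Host` and `ReplicatedRegion` classes, the promotion rule, the host names); it only models one region with one primary and one secondary copy.

```python
from typing import Dict, Optional


class Host:
    """A cache host holding a copy of the region's data (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data: Dict[str, object] = {}


class ReplicatedRegion:
    """One region with a primary and a secondary copy on different hosts."""

    def __init__(self, primary: Host, secondary: Host) -> None:
        self.primary = primary
        self.secondary = secondary

    def _ensure_primary(self) -> None:
        # Promote the secondary if the primary's host has failed.
        if not self.primary.alive:
            if not self.secondary.alive:
                raise RuntimeError("region unavailable: both copies lost")
            self.primary, self.secondary = self.secondary, self.primary

    def put(self, key: str, value: object) -> None:
        # Synchronous update: both copies are written before put() returns.
        self._ensure_primary()
        self.primary.data[key] = value
        if self.secondary.alive:
            self.secondary.data[key] = value

    def get(self, key: str) -> Optional[object]:
        self._ensure_primary()
        return self.primary.data.get(key)


host_a, host_b = Host("CacheHostA"), Host("CacheHostB")
sports = ReplicatedRegion(primary=host_a, secondary=host_b)
sports.put("team:1", "Twins")
host_a.alive = False             # simulate losing the primary's host
survived = sports.get("team:1")  # served by the promoted secondary
```

Because the write reaches both copies before returning, a failover does not lose acknowledged updates; the cost is extra latency and memory for every write.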

Local Cache

[Diagram: a load balancer in front of Web Server A, Web Server B, and Web Server C (each IIS 7.x, each with its own Local Cache), which talk to a Cache Cluster of Cache Server A, Cache Server B, and Cache Server C, each running a Cache Host.]

• Local cache is an option that can be enabled when creating the cache client (DataCacheFactory)

• Allows a local cache to be populated that will prevent network hop (and serialization) if request can be satisfied locally

• Best when data set is (relatively) small, changes infrequently, and stale data is acceptable

• Can expire via TTL or notifications (which might be late/lost)

• Can specify max object count before evicting LRU
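The TTL expiry and "max object count before evicting LRU" behaviour can be illustrated with a small simulation. This is a sketch of the general technique in Python, not the local cache shipped with the product: `LocalCache`, its parameters, and the injectable clock are assumptions, and notification-based invalidation is left out.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class LocalCache:
    """In-process cache with a TTL per item and LRU eviction (hypothetical)."""

    def __init__(self, max_objects: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_objects = max_objects
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key: str) -> Optional[object]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]      # expired: treat as a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: object) -> None:
        self._items[key] = (self.clock() + self.ttl_seconds, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_objects:
            self._items.popitem(last=False)  # evict the least recently used


now = [0.0]  # fake clock so the example does not need to sleep
local = LocalCache(max_objects=2, ttl_seconds=30, clock=lambda: now[0])
local.put("a", 1)
local.put("b", 2)
local.get("a")            # touch "a" so "b" becomes least recently used
local.put("c", 3)         # over the limit: "b" is evicted
evicted = local.get("b")
kept = local.get("a")
now[0] = 31.0             # advance past the TTL
expired = local.get("a")
```

Until the TTL passes, a reader of this cache can see a value that has already changed in the cluster, which is why the slide limits it to data where staleness is acceptable.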

Data Types and Caching Considerations

• Reference Data: Product catalogs, “lookup” tables, other slow-moving content

– Safe to cache for a defined period of time because you probably live with staleness already

– “Local” cache option might be desirable for small data sets

• Activity Data: Shopping carts or other transient transaction state

– Accessed for read and write operations, but not shared. Low/No concurrency considerations – exclusive write.

– Safe to cache for reads and keep in cache for writes

• Resource Data: Inventory, Orders, and other core transactional data

– Accessed concurrently for read and write

– Caching will require a concurrency model to be chosen and managed


Deploy/Install Considerations

• Windows “Application Server” Role required

• Hotfix required for Vista/Win2k8; not for Win7/Win2k8R2

• You’ll need Powershell 2 (already in Win7/Win2k8R2)


• .NET3.5SP1 for cache clients; .NET4 for servers

• Windows XP cannot be a client…

• “Install” and “Configure” for AppFabric are two distinct steps (much like BizTalk)


• Primary screen of

interest is choosing your

configuration store:

– XML/File share

– Sql-Based

• File share avoids the

need for Sql Server, but

requires that some

nodes in the cache

cluster be special (“Lead

Hosts”)

• Using Sql as the

configuration store is

the better engineering

choice for production –

you may have other

reasons to avoid it.


• As you build out your Velocity Cache Cluster,

you will do “New Cluster” on the first node,

and “Join Cluster” on subsequent nodes

• Ultimately, all of Windows Server AppFabric is a set of features underneath the Application Server Role – so standard command line installations work.

– Setup.exe /i CacheAdmin,CacheService,CacheClient

AppFabric as Application Server

“Role Service”


• Can do a “Cache client” install for clients, or for internal apps, just incorporate client assemblies in your own build/deploy process

– Microsoft.ApplicationServer.Caching.Core.dll

– Microsoft.ApplicationServer.Caching.Client.dll

– Microsoft.WindowsFabric.Common.dll

– Microsoft.WindowsFabric.Data.Common.dll


Caching Classes

DataCacheFactory

– DataCacheFactory()

– DataCacheFactory(configuration)

– DataCache GetCache(string cache)

– GetDefaultCache()

DataCacheFactoryConfiguration (Can set these via configuration)

– LocalCacheProperties

– NotificationProperties

– SecurityProperties

– DataCacheServerEndpoint[] Servers

DataCache

– Add: Adds a new object to the cache. Exception if the item is already in the cache.

– Put: Adds a new object to the cache. Replaces if already in cache.

– Get: Returns an object from the cache.

– Remove: Removes an object from the cache.
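The difference between Add and Put is easy to miss, so here is a tiny simulation of the four operations as described above. It is a Python stand-in, not the .NET `DataCache` class: `ToyDataCache` and `ItemAlreadyExists` are hypothetical names, and the return values on a miss are assumptions of this sketch.

```python
class ItemAlreadyExists(Exception):
    """Raised by add() when the key is already cached (hypothetical type)."""


class ToyDataCache:
    """Minimal model of Add / Put / Get / Remove semantics."""

    def __init__(self):
        self._items = {}

    def add(self, key, value):
        # Add is "insert only": it refuses to overwrite an existing item.
        if key in self._items:
            raise ItemAlreadyExists(key)
        self._items[key] = value

    def put(self, key, value):
        # Put is "insert or replace".
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)  # None on a miss (assumption of this sketch)

    def remove(self, key):
        if key in self._items:
            del self._items[key]
            return True
        return False


cache = ToyDataCache()
cache.add("k", "v1")
try:
    cache.add("k", "v2")
    add_failed = False
except ItemAlreadyExists:
    add_failed = True
cache.put("k", "v3")  # replaces "v1"
```

Put is the usual choice in a cache-aside flow, since two callers that miss at the same time will both try to populate the same key.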

Caching Classes

DataCache with DataCacheItemVersion

• GetCacheItem: returns tags and version info

• GetIfNewer: lets you use that version info!

• Put and Remove have overloads that take version info

– Allows for an optimistic concurrency model

– Will only succeed if version information matches

what is current for the cached item
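The optimistic model above can be sketched as a compare-version-then-write. This Python simulation is only illustrative: `VersionedCache`, `VersionMismatch`, and the integer version counter are assumptions, and the method names merely echo the ones on the slide.

```python
class VersionMismatch(Exception):
    """The item changed since the caller read it (hypothetical type)."""


class VersionedCache:
    """Each put stamps the item with a new, increasing version number."""

    def __init__(self):
        self._items = {}  # key -> (version, value)
        self._next_version = 1

    def put(self, key, value, expected_version=None):
        current = self._items.get(key)
        if expected_version is not None:
            # Versioned put: only succeeds if the cached version is unchanged.
            if current is None or current[0] != expected_version:
                raise VersionMismatch(key)
        version = self._next_version
        self._next_version += 1
        self._items[key] = (version, value)
        return version

    def get_cache_item(self, key):
        return self._items.get(key)  # (version, value), or None on a miss

    def get_if_newer(self, key, known_version):
        current = self._items.get(key)
        if current is None or current[0] <= known_version:
            return None  # nothing newer than what the caller already has
        return current


cache = VersionedCache()
cache.put("stock:42", 10)
version_a, value_a = cache.get_cache_item("stock:42")  # writer A reads
version_b, value_b = cache.get_cache_item("stock:42")  # writer B reads
cache.put("stock:42", value_a - 1, expected_version=version_a)  # A wins
try:
    cache.put("stock:42", value_b - 1, expected_version=version_b)
    conflict = False
except VersionMismatch:
    conflict = True  # B must re-read and retry
```

No lock is ever held; the loser of the race pays with a retry, which suits data where conflicts are rare.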

DataCache and Locking

• GetAndLock: Allows you to lock a cache item

for a specified time period, even if not present

– (Will fail if already locked)

– public Object GetAndLock (string key, TimeSpan timeout, out DataCacheLockHandle lockHandle, bool forceLock)

• PutAndUnlock: Unlock an item, with given key

and lock handle

• Unlock: Explicitly unlock, optionally extend TTL
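The pessimistic model can be simulated in the same spirit. This is a Python sketch with invented names (`LockingCache`, `ItemLocked`, `InvalidLockHandle`), not the .NET API. As simplifying assumptions, it always allows locking a key that is not yet present (as if `forceLock` were true), uses integers as lock handles, and omits the optional TTL extension on unlock.

```python
import itertools
import time


class ItemLocked(Exception):
    """The item is already locked by someone else (hypothetical type)."""


class InvalidLockHandle(Exception):
    """The handle does not match the item's current lock (hypothetical type)."""


class LockingCache:
    def __init__(self, clock=time.monotonic):
        self._items = {}
        self._locks = {}  # key -> (handle, expires_at)
        self._handles = itertools.count(1)
        self._clock = clock

    def _active_lock(self, key):
        lock = self._locks.get(key)
        if lock is not None and self._clock() >= lock[1]:
            del self._locks[key]  # the lock timed out
            return None
        return lock

    def _check_handle(self, key, handle):
        lock = self._active_lock(key)
        if lock is None or lock[0] != handle:
            raise InvalidLockHandle(key)

    def get_and_lock(self, key, timeout_seconds):
        if self._active_lock(key) is not None:
            raise ItemLocked(key)
        handle = next(self._handles)
        self._locks[key] = (handle, self._clock() + timeout_seconds)
        return self._items.get(key), handle

    def put_and_unlock(self, key, value, handle):
        self._check_handle(key, handle)
        self._items[key] = value
        del self._locks[key]

    def unlock(self, key, handle):
        self._check_handle(key, handle)
        del self._locks[key]


now = [0.0]  # fake clock so lock timeouts can be shown without sleeping
cache = LockingCache(clock=lambda: now[0])
value, handle = cache.get_and_lock("cart:7", timeout_seconds=5)
try:
    cache.get_and_lock("cart:7", timeout_seconds=5)
    second_lock_failed = False
except ItemLocked:
    second_lock_failed = True
cache.put_and_unlock("cart:7", ["book"], handle)
value_after, handle2 = cache.get_and_lock("cart:7", timeout_seconds=5)
now[0] = 10.0  # handle2's lock has now timed out
value_late, handle3 = cache.get_and_lock("cart:7", timeout_seconds=5)
```

The timeout matters: without it, a client that crashes while holding a lock would block every other writer for that key indefinitely.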

DataCache and Tags/Regions

• Explicitly created regions live on a single

node…can create a hot spot for both call

volume and memory growth

• But they offer bulk retrieval and flexible tag-based retrieves

• Instead of regions: can simulate secondary

indexes with your own secondary-to-primary

mapping
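One way to read "secondary-to-primary mapping" is to store an extra cache entry per secondary value that lists the primary keys carrying it. The Python sketch below shows that idea on a plain dictionary standing in for the cache. The `IndexedCache` class and the `product:` / `category:` key prefixes are hypothetical, and it ignores the case where an item's category changes later.

```python
class IndexedCache:
    """Items stored by primary key, plus index entries keyed by category."""

    def __init__(self):
        self._store = {}  # stand-in for the distributed cache

    def put_product(self, product_id, product):
        self._store["product:" + product_id] = product
        # Maintain the secondary-to-primary mapping as its own cache entry.
        index_key = "category:" + product["category"]
        ids = set(self._store.get(index_key, ()))
        ids.add(product_id)
        self._store[index_key] = ids

    def get_by_category(self, category):
        # First hop: the index entry. Second hop: each primary item.
        ids = self._store.get("category:" + category, ())
        return [self._store["product:" + pid] for pid in sorted(ids)]


cache = IndexedCache()
cache.put_product("1", {"name": "Bat", "category": "sports"})
cache.put_product("2", {"name": "Ball", "category": "sports"})
cache.put_product("3", {"name": "Lamp", "category": "home"})
sports_names = [p["name"] for p in cache.get_by_category("sports")]
```

Because the index entries are ordinary keys, they spread across the cluster like any other item instead of pinning everything to one node. The trade-off is that the item and its index entry are written separately, so concurrent writers need one of the concurrency models described earlier.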

Administrative Model

• Administration for AppFabric Caching done purely through PowerShell

• Can administrate entire Cache Cluster from wherever administrative portion of install has been done – all nodes addressable from single command line location

• Use-CacheCluster points the shell at a particular cluster to administrate

• Remember: Get-CacheHelp ☺


Gotchyas

• Not a gotchya: AppFabric provides a SessionStoreProvider class that plugs into the ASP.NET session storage provider model

• Balance number of nodes in cluster with memory per node.

– Too many nodes = cluster overhead, too much memory per node = GC overhead

• If you don’t use Sql Config Store, you need to manually run Start-CacheHost after reboot

• Sql Config Store requires high Sql privileges right now at point of install

• Currently service runs as network service account

• Consider what you will do when cache is down

– You can go after source of truth

– How do you avoid leaving stale data in the cache?

Thank you -

Questions?
