Website Survival: Concealing Back-End Outages with Oracle Coherence and HotCache

Upload: hayfa-holland

Post on 03-Jan-2016


Page 1: Website Survival: Concealing Back-End Outages with Oracle Coherence and  HotCache

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Website Survival: Concealing Back-End Outages with Oracle Coherence and HotCache

Jim Xu, Senior Technology Architect, TELUS

Randy Stafford, Architect At-Large, Oracle Coherence Product Development

Presented with

Session Agenda

1. TELUS introduction and business challenge
2. Oracle technology addressing the challenge
3. Technical highlights of the implementation
4. HotCache whole product: transformation and more
5. Solution validation through business metrics

Introduction to TELUS

TELUS (TSX: T, NYSE: TU) - Canada’s fastest-growing national telecommunications company

• Headquarters: Burnaby, British Columbia, Canada
• Revenue: $11.7 billion
• EBITDA: $4.0 billion
• Customers: 13.4 million connections, including
  – 7.9 million wireless subscribers
  – 3.2 million wireline network access lines
  – 1.4 million Internet subscribers
  – 865,000 TV customers
• Website: www.telus.com

Business Challenges

Context:
• Digital experience serving several million customers

Challenges:
• 80% of clients researched online prior to purchase
• 85% of clients preferred to solve problems online
• Slow-responding web pages and frequent unplanned outages seriously degraded client experience
• Voice of Client indicated 39% of complaints were related to speed and stability
• Unreliable self-serve impacted web adoption and drove calls to call centers
• Subscriber growth considerably increased traffic and load

Goals:
• Under 3 seconds to render customer experience
• 99.99% uptime

Journey on Performance Improvement

High Availability and Resiliency Program was started in 2011

A number of enhancements reduced response time from 21 sec to 8 sec in 2012, then 6 sec in 2013

[Chart: Q1 to Q4 2012; series: East, West, National]

Tipping Point

Impossible to reach the 3 sec and 99.99% uptime targets without architecture redesign and new technologies:
• Extended outages (10-20 hours) during quarterly releases and maintenance windows
• Customer data is collected from multiple data sources across multiple data centers
• Legacy infrastructure requires frequent maintenance

Caching data is critical:
• Coherence 3.7 was introduced, but keeping cached data fresh was a challenge
• A custom cache updater was considered but later discarded due to complexity
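
To put the 99.99% target next to the 10-20 hour outages, the arithmetic can be sketched as below. This is only an illustration: the class and method names are made up for this note, and a 365-day year is assumed. Under that assumption the budget is about 52.6 minutes of downtime per year, so a single 10-hour outage uses up roughly 11 years of budget.

```java
/** Illustrative downtime-budget arithmetic; names here are hypothetical. */
final class UptimeBudget {

    /** Minutes of downtime allowed per year at the given availability (e.g. 0.9999). */
    static double allowedDowntimeMinutesPerYear(double availability) {
        double minutesPerYear = 365.0 * 24 * 60; // assumes a non-leap year
        return (1.0 - availability) * minutesPerYear;
    }

    /** How many years of downtime budget one outage of the given length consumes. */
    static double yearsOfBudgetConsumed(double outageHours, double availability) {
        return outageHours * 60 / allowedDowntimeMinutesPerYear(availability);
    }

    public static void main(String[] args) {
        System.out.printf("Budget at 99.99%%: %.2f min/year%n",
                allowedDowntimeMinutesPerYear(0.9999));
        System.out.printf("One 10-hour outage: %.1f years of budget%n",
                yearsOfBudgetConsumed(10, 0.9999));
    }
}
```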

The Solution

• Build In-Memory Data Grid with Coherence 12c
• Resolve cache data update issue with HotCache
• Conceal back-end outages to provide 24/7 service reliability
• Improve system performance and maintain consistent client experience

Technologies:
• Exalogic X3-2 and X4-2
• Coherence Data Grid edition 12.1.2
• Oracle Traffic Director
• GoldenGate
• WebLogic 12c

Stats:
• Cached raw data: 212 GB
• Number of objects: 821 million

Java EE Application Physical Tiering - and Scalability

[Diagram: Site 1 with a Web Tier (web servers), App Tier (app servers), Grid Tier (cache servers), and EIS Tier (Database, External Service, Legacy System)]

• These tiers can scale out: web, app
• The grid tier scales out!
• The EIS tier is hard to scale!

Coherence GoldenGate HotCache

• Push DB changes to Coherence
• Via GoldenGate and TopLink JPA
• Tables map to entities, caches
• Event-driven and efficient
• Solves stale cache problem when external apps write to shared DB
• Allows caching to be leveraged in that class of application

[Diagram labels: Coherence Application (Read / Write), External Application (Read / Write), Database, GoldenGate, GoldenGate HotCache, Coherence]
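
The stale cache problem and the push-based fix can be illustrated with a toy model. The sketch below is not the HotCache API: ToyDatabase, its onChange callback, and PushRefreshDemo are hypothetical stand-ins, with plain maps playing the database and the cache. It only shows why an external write leaves a cache stale unless committed changes are pushed to it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Toy stand-in for a shared database table that reports committed row changes. */
final class ToyDatabase {
    private final Map<Long, String> rows = new HashMap<>();
    private BiConsumer<Long, String> changeListener = (key, value) -> { };

    /** Registers the callback that plays the role of the change feed. */
    void onChange(BiConsumer<Long, String> listener) {
        this.changeListener = listener;
    }

    /** Every writer, including an external application, goes through here. */
    void write(long key, String value) {
        rows.put(key, value);
        changeListener.accept(key, value); // event-driven push, no polling
    }

    String read(long key) {
        return rows.get(key);
    }
}

final class PushRefreshDemo {

    /** Returns what the cache holds after an external application updates the row. */
    static String cachedValueAfterExternalWrite(boolean pushEnabled) {
        ToyDatabase db = new ToyDatabase();
        Map<Long, String> cache = new HashMap<>();
        if (pushEnabled) {
            db.onChange(cache::put); // changes flow from the database into the cache
        }
        db.write(1L, "plan=basic");
        cache.put(1L, db.read(1L));   // cache warmed by the caching application
        db.write(1L, "plan=premium"); // external application writes to the shared DB
        return cache.get(1L);
    }

    public static void main(String[] args) {
        System.out.println("without push: " + cachedValueAfterExternalWrite(false));
        System.out.println("with push:    " + cachedValueAfterExternalWrite(true));
    }
}
```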

Exalogic System Hardware Overview: Fast. Easy. Open.

Compute
• 2 socket, 12-core, 2.7 GHz Intel Xeon processors
• 256 GB of 1600 MHz DRAM
• (2) 400 GB SSDs
• Dual-port QDR InfiniBand HCA (PCIe)

I/O Fabric
• Between 2-4 InfiniBand Gateway Switches
  – (32) QDR InfiniBand ports
  – (8) 10GbE ports for datacenter connectivity
• 40 Gb/sec internal I/O backplane

Storage
• Enterprise-class, integrated Network Attached Storage
• 80 TB SAS disk, 6.4 TB read cache, 292 GB write cache
• Clones, snapshots, remote replication

Coherence on Exalogic: MessageBus and Exabus

• MessageBus: an asynchronous, binary, message-based, event-driven transport layer in Coherence, with pluggable implementations
• Exabus: a native RDMA implementation of MessageBus, bypassing the OS kernel and avoiding buffer copies
• Exabus preprocesses messages on I/O threads, avoiding the context switches between Coherence threads that were needed prior to Exabus
• A separate MessageBus per Coherence service, instead of all services sharing the same transport layer as they did prior to MessageBus, allows utilizing full IB bandwidth

Data Grid Server - Exalogic vs Commodity

[Charts: Latency, Throughput, Failover, CPU Utilization]

System Architecture

Data Consolidation

Benefits:
• Reduce data roundtrips
• Improve performance
• Less dependency on legacy data centers
• Canonical model across multiple source databases

[Diagram labels: Billing, Account, Customer, User Profile, GoldenGate, Data Grid]

Data Grid Geo-Redundancy

Benefits:
• Replicated infrastructure and data
• Active-Active to support production
• Data and Services closer to consumers

[Diagram labels: Global Traffic Manager; East and West sites, each with Data Services and a Data Grid]

Data Grid Synchronization (Current State)

Data Grid Synchronization (Next Stage)

Project Timeline on Data Grid

Aggressive timeline on launching Data Grid

Closely collaborated with Oracle to resolve any technical issues

Manage Object Relationships

Cached Data:
• Objects are independent in the grid
• But they are logically related
• Object traversal through keys
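
Traversal through keys can be sketched as follows. This is a minimal illustration, not the TELUS code: plain maps stand in for two separate caches, the Address class and its city field are hypothetical, and the BillingAccount fields are borrowed from the field names shown later in the deck. Each cached object stores the key of a related object rather than a reference, and a lookup follows that key into the other cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class KeyTraversalDemo {

    /** Cached value that holds the key of a related object, not a reference to it. */
    static final class BillingAccount {
        final int billingAccountNum;
        final long addressId;

        BillingAccount(int billingAccountNum, long addressId) {
            this.billingAccountNum = billingAccountNum;
            this.addressId = addressId;
        }
    }

    /** Hypothetical related object living in its own cache. */
    static final class Address {
        final long addressId;
        final String city;

        Address(long addressId, String city) {
            this.addressId = addressId;
            this.city = city;
        }
    }

    // Plain maps stand in for two independent caches in the grid.
    final Map<Integer, BillingAccount> accountCache = new HashMap<>();
    final Map<Long, Address> addressCache = new HashMap<>();

    /** Follows account -> addressId -> address; empty if either hop finds nothing. */
    Optional<String> cityForAccount(int billingAccountNum) {
        return Optional.ofNullable(accountCache.get(billingAccountNum))
                .map(account -> addressCache.get(account.addressId))
                .map(address -> address.city);
    }

    public static void main(String[] args) {
        KeyTraversalDemo grid = new KeyTraversalDemo();
        grid.addressCache.put(77L, new Address(77L, "Burnaby"));
        grid.accountCache.put(1001, new BillingAccount(1001, 77L));
        System.out.println(grid.cityForAccount(1001).orElse("unknown"));
    }
}
```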

Data Transformation with Coherence Live Events

[Diagram: Legacy Schema tables (AA, BC, PH) are loaded by HotCache through Object/Cache Mapping into key-value caches (AA, BC, PH) in the Coherence Data Grid; Live Events produce the Canonical Domain Model caches (Addr, BA, Cust, Svc)]

Live Events Use Cases in HotCache

• Project HotCache model into desired model
• Duplicate data for denormalization
• Ensure referential integrity in relationship implementations
  – Merge data from multiple databases
  – Pending Mutations pattern
• Refresh Aggregates when child tables are not replicated

Data Aggregation with Layered Caches

Stage Cache
• JPA entities
• Same data structure as the source database, to simplify the HotCache implementation
• Initial load is not required, and an object can be removed after the target object is updated
• Reduced data grid memory footprint

Target Cache
• Similar to a database view, populated with UI-optimised domain objects
• Denormalized/flattened objects to improve data retrieval performance
• Object dependencies are processed through Event Interceptors and Entry Processors

Update Dependent Objects with Event Interceptors

Stage Cache (each stage object passes through an Event Interceptor):

PaymentHistory
    private Double currentAccountBalance;
    private Date billingEffectiveDt;
    private Date paymentDueDt;
    ....

BillCycle
    private int billCycle;
    private Integer cycleCloseDay;
    private Integer nextCycleCloseDay;
    ....

AddressAssignment
    private Long addressAssignmentId;
    private Long entityId;
    private String entityTypeCd;
    private Long addressId;
    private Date effectiveEndDt;
    ....

Target Cache:

BillingAccount
    private Integer billingAccountNum;
    private String billingAccountTypeCd;
    private Double currentAccountBalance;
    private int billCycle;
    private Long addressId;
    ....
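
The stage-to-target flow can be sketched in plain Java. This is a toy model rather than the Coherence Live Events API: maps stand in for the two caches, onStageUpdate plays the role of an interceptor, and the billingAccountNum field on PaymentHistory is an assumed join key that the slide does not show. The sketch copies the balance into the flattened target object and then drops the stage entry, as the layered-cache slide describes.

```java
import java.util.HashMap;
import java.util.Map;

final class StageToTargetDemo {

    /** Stage object mirroring a source row. */
    static final class PaymentHistory {
        final Integer billingAccountNum; // assumed join key, not shown on the slide
        final Double currentAccountBalance;

        PaymentHistory(Integer billingAccountNum, Double currentAccountBalance) {
            this.billingAccountNum = billingAccountNum;
            this.currentAccountBalance = currentAccountBalance;
        }
    }

    /** Flattened target object that the UI reads. */
    static final class BillingAccount {
        final Integer billingAccountNum;
        Double currentAccountBalance;

        BillingAccount(Integer billingAccountNum) {
            this.billingAccountNum = billingAccountNum;
        }
    }

    // Plain maps stand in for the stage and target caches.
    final Map<Integer, PaymentHistory> stageCache = new HashMap<>();
    final Map<Integer, BillingAccount> targetCache = new HashMap<>();

    /** Plays the role of an interceptor fired when a stage entry arrives. */
    void onStageUpdate(PaymentHistory ph) {
        stageCache.put(ph.billingAccountNum, ph);
        BillingAccount target =
                targetCache.computeIfAbsent(ph.billingAccountNum, BillingAccount::new);
        target.currentAccountBalance = ph.currentAccountBalance;
        stageCache.remove(ph.billingAccountNum); // stage entry is no longer needed
    }

    public static void main(String[] args) {
        StageToTargetDemo grid = new StageToTargetDemo();
        grid.onStageUpdate(new PaymentHistory(1001, 42.5));
        System.out.println(grid.targetCache.get(1001).currentAccountBalance);
    }
}
```

Removing the stage entry after the target is updated is what keeps the stage cache small in this sketch; whether the real implementation removes entries immediately or later is not stated in the deck.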

Scaling HotCache via Parallel Data Flows

• DB schema must be amenable (related tables in same trail)
• One HotCache throughput: 700-3000 TPS depending on HW, configuration
• This approach has been tested to 18,000 TPS

[Diagram: Extract 1 → Trail 1 → HotCache 1, Extract 2 → Trail 2 → HotCache 2, ..., Extract N → Trail N → HotCache N, all feeding the Coherence Data Grid]
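
A rough sizing calculation for the number of parallel flows follows. It assumes that throughput adds up linearly across flows, which the slide does not claim, and the class and method names are hypothetical. Under that assumption, 18,000 TPS needs at least 6 flows at 3000 TPS each and 26 flows at 700 TPS each.

```java
/** Back-of-the-envelope sizing for parallel flows; names are hypothetical. */
final class ParallelFlowSizing {

    /**
     * Flows needed to reach a target rate, assuming each flow sustains
     * perFlowTps and rates add linearly (an assumption, not a measured result).
     */
    static int flowsNeeded(int targetTps, int perFlowTps) {
        if (targetTps <= 0 || perFlowTps <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        return (targetTps + perFlowTps - 1) / perFlowTps; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println("At 3000 TPS per flow: " + flowsNeeded(18_000, 3_000));
        System.out.println("At  700 TPS per flow: " + flowsNeeded(18_000, 700));
    }
}
```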

HotCache High Availability

• Coherence is already HA
• Oracle Clusterware manages redundant GoldenGate HotCache processes
• http://www.oracle.com/technetwork/middleware/goldengate/overview/ha-goldengate-whitepaper-128197.pdf

[Diagram: Oracle Clusterware invokes check(), stop(), and start() on an Active node and a Passive node, each with a GoldenGate Manager and HotCache]
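
The check/stop/start cycle in the diagram can be modelled with a toy monitor. The sketch below does not use any Clusterware or GoldenGate API: ManagedResource, ToyNode, and monitorOnce are hypothetical names, and a pair of boolean flags stands in for the GoldenGate Manager and HotCache processes. It only shows the active/passive handover logic.

```java
final class FailoverDemo {

    /** The three actions the diagram shows being invoked on a managed resource. */
    interface ManagedResource {
        boolean check();
        void start();
        void stop();
    }

    /** Toy node: flags stand in for the real processes on that node. */
    static final class ToyNode implements ManagedResource {
        final String name;
        boolean running;
        boolean healthy = true;

        ToyNode(String name) {
            this.name = name;
        }

        @Override public boolean check() { return running && healthy; }
        @Override public void start() { running = true; }
        @Override public void stop() { running = false; }
    }

    /** One monitoring pass; returns the node that is active afterwards. */
    static ToyNode monitorOnce(ToyNode active, ToyNode passive) {
        if (active.check()) {
            return active;  // nothing to do
        }
        active.stop();      // make sure the failed side is fully down
        passive.start();    // bring up the redundant processes
        return passive;
    }

    public static void main(String[] args) {
        ToyNode a = new ToyNode("node-a");
        ToyNode b = new ToyNode("node-b");
        a.start();
        a.healthy = false; // simulate a failure on the active node
        System.out.println("active after failover: " + monitorOnce(a, b).name);
    }
}
```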

Monitoring HotCache

Business Benefits

No more outages! Supported all major releases and infrastructure maintenance since the initial launch last November

Enhanced performance at the service level: 2-30x faster

Reduced dependency on legacy data centers and hardware footprint

Offered a single view of the customer with data from various legacy systems

Performance Metrics

Data grid-enabled Client Account WS response time: 20 ms vs 99-10,294 ms

Outage Mode – Portal overview page response time: 3.2s

Operational Mode – 48% performance gain from staging performance test
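
Read as a ratio, the Client Account WS figures work out as below, assuming both sides measure the same call. The helper name is hypothetical. With 20 ms on the data grid, the 99 ms and 10,294 ms ends of the back-end range correspond to speedups of about 5x and about 515x.

```java
/** Speedup as a simple ratio of response times; the name is hypothetical. */
final class Speedup {

    /** How many times faster the improved path is than the baseline. */
    static double speedup(double baselineMs, double improvedMs) {
        if (improvedMs <= 0) {
            throw new IllegalArgumentException("improvedMs must be positive");
        }
        return baselineMs / improvedMs;
    }

    public static void main(String[] args) {
        System.out.printf("Best case back end:  %.2fx%n", speedup(99, 20));
        System.out.printf("Worst case back end: %.1fx%n", speedup(10_294, 20));
    }
}
```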

[Chart: Average Daily webService; y-axis: MS (0 to 4500); x-axis: HRS (1 to 79); series: Data Grid, Web Service Call]

Questions?

Safe Harbor Statement

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Focus on Oracle Coherence

Session ID | Session Title | Date and Time | Location
CON7471 | Oracle Coherence 12c: Strategy and Roadmap | Monday, Sep. 29, 4:00-4:45 p.m. | Moscone South - 304
CON7895 | Synergy: Using Oracle WebLogic Server and Oracle Coherence in Combination | Monday, Sep. 29, 5:15-6:00 p.m. | Moscone South - 236
CON7875 | Write Once, Read Everywhere: Fast Multisite Data Access with Oracle Coherence | Tuesday, Sep. 30, 12:00-12:45 p.m. | Moscone South - 200
CON7029 | Oracle Fusion Middleware: Meet This Year’s Most Impressive Innovators | Tuesday, Sep. 30, 5:00-5:45 p.m. | Yerba Buena Center for the Arts - Theater

Focus on Oracle Coherence (continued)

Session ID | Session Title | Date and Time | Location
CON7942 | Lockdown! Security Practices for Oracle WebLogic Server and Oracle Coherence | Wednesday, Oct. 1, 10:15-10:00 a.m. | Moscone South - 304
CON7898 | Website Survival: Concealing Back-End Outages with Oracle Coherence and HotCache | Wednesday, Oct. 1, 2:00-2:45 p.m. | Moscone South - 200
CON7939 | Maximum Availability in the Cloud: Oracle WebLogic Server and Oracle Coherence | Wednesday, Oct. 1, 3:30-4:15 p.m. | Moscone South - 304
HOL9436 | Pushing Database Transactions to JCache with Oracle Coherence and Oracle GoldenGate | Wednesday, Oct. 1, 4:15-5:15 p.m. | Hotel Nikko - Nikko Ballroom II
CON7896 | Rapid Delivery of Innovative Real-Time Applications with Oracle Coherence | Thursday, Oct. 2, 9:30-10:15 a.m. | Moscone South - 304
