Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
Website Survival: Concealing Back-End Outages with Oracle Coherence and HotCache
Jim Xu, Senior Technology Architect, TELUS
Randy Stafford, Architect At-Large, Oracle Coherence Product Development
Session Agenda
1. TELUS introduction and business challenge
2. Oracle technology addressing the challenge
3. Technical highlights of the implementation
4. HotCache whole product: transformation and more
5. Solution validation through business metrics
TELUS Public
Introduction to TELUS
TELUS (TSX: T, NYSE: TU) - Canada’s fastest-growing national telecommunications company
Headquarters: Burnaby, British Columbia, Canada
Revenue: $11.7 billion
EBITDA: $4.0 billion
Customers: 13.4 million connections, including
• 7.9 million wireless subscribers
• 3.2 million wireline network access lines
• 1.4 million Internet subscribers
• 865,000 TV customers
Website: www.telus.com
Business Challenges
Context: Digital experience serving several million customers
Challenges:
• 80% of clients researched online prior to purchase
• 85% of clients preferred to solve problems online
• Slow-responding web pages and frequent unplanned outages seriously degraded the client experience
• Voice of Client indicated 39% of complaints were related to speed and stability
• Unreliable self-serve impacted web adoption and drove calls to call centers
• Subscriber growth increased considerably, along with traffic and load
Goals:
• Under 3 seconds to render the customer experience
• 99.99% uptime
Journey on Performance Improvement
High Availability and Resiliency Program was started in 2011
A number of enhancements reduced response time from 21 sec to 8 sec in 2012, then 6 sec in 2013
[Chart: Q1-Q4 2012; series: East, West, National]
Tipping Point
Impossible to reach the 3 sec and 99.99% uptime targets without architecture redesign and new technologies
• Extended outages (10-20 hours) during quarterly releases and maintenance windows
• Customer data is collected from multiple data sources across multiple data centers
• Legacy infrastructure requires frequent maintenance
Caching data is critical
• Coherence 3.7 was introduced, but keeping cached data fresh proved challenging
• A custom cache updater was considered but later discarded due to complexity
The Solution
• Build an in-memory data grid with Coherence 12c
• Resolve the cache data update issue with HotCache
• Conceal back-end outages to provide 24/7 service reliability
• Improve system performance and maintain a consistent client experience
Technologies:
• Exalogic X3-2 and X4-2
• Coherence Data Grid edition 12.1.2
• Oracle Traffic Director
• GoldenGate
• WebLogic 12c
Stats:
• Cached raw data: 212 GB
• Number of objects: 821 million
Java EE Application Physical Tiering - and Scalability
Site 1
• Web Tier (web servers) and App Tier (app servers): these tiers can scale out…
• Grid Tier (cache servers): the grid tier scales out!
• EIS Tier (Database, External Service, Legacy System): the EIS tier is hard to scale!
Coherence GoldenGate HotCache
• Push DB changes to Coherence
• Via GoldenGate and TopLink JPA
• Tables map to entities, caches
• Event-driven and efficient
• Solves stale cache problem when external apps write to shared DB
• Allows caching to be leveraged in that class of application
[Diagram: Coherence Application and External Application (each Read / Write), Database, GoldenGate, GoldenGate HotCache, Coherence]
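The push model on this slide can be pictured with a small, self-contained Java sketch. It is not the HotCache API: RowChange and ChangeApplier are hypothetical names, plain HashMaps stand in for Coherence caches, and a RowChange stands in for a change record delivered through GoldenGate. The table and column names in the usage are also made up. The sketch only illustrates the idea that each table maps to a cache, and that database inserts, updates and deletes are mirrored as cache puts and removes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical change record, standing in for one row change captured from the database. */
final class RowChange {
    enum Op { INSERT, UPDATE, DELETE }

    final Op op;
    final String table;
    final Object primaryKey;
    final Map<String, Object> columns; // may be null for DELETE

    RowChange(Op op, String table, Object primaryKey, Map<String, Object> columns) {
        this.op = op;
        this.table = table;
        this.primaryKey = primaryKey;
        this.columns = columns;
    }
}

/** Mirrors row changes into one cache per table (plain maps here, not Coherence caches). */
final class ChangeApplier {
    private final Map<String, Map<Object, Map<String, Object>>> caches = new HashMap<>();

    /** Returns the cache for a table, creating it on first use. */
    Map<Object, Map<String, Object>> cache(String table) {
        return caches.computeIfAbsent(table, t -> new HashMap<>());
    }

    /** Applies one change: inserts and updates become puts, deletes become removes. */
    void apply(RowChange change) {
        Map<Object, Map<String, Object>> cache = cache(change.table);
        switch (change.op) {
            case INSERT:
            case UPDATE:
                // Copy the row so later mutation of the event does not alter the cache.
                cache.put(change.primaryKey, new HashMap<>(change.columns));
                break;
            case DELETE:
                cache.remove(change.primaryKey);
                break;
        }
    }
}
```

Because the cache is driven by change events rather than by the application's own writes, a write made by an external application to the shared database would reach the cache the same way.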
Exalogic System Hardware Overview
Fast. Easy. Open
Compute:
• 2 socket, 12-core, 2.7 GHz Intel Xeon processors
• 256 GB of 1600 MHz DRAM
• (2) 400 GB SSDs
• Dual-port QDR InfiniBand HCA (PCIe)
I/O Fabric:
• Between 2-4 InfiniBand Gateway Switches
  - (32) QDR InfiniBand ports
  - (8) 10GbE ports for datacenter connectivity
• 40 Gb/sec internal I/O backplane
Storage:
• Enterprise-class, integrated Network Attached Storage
• 80 TB SAS disk, 6.4 TB read cache, 292 GB write cache
• Clones, snapshots, remote replication
Coherence on Exalogic
MessageBus and Exabus
• MessageBus: an asynchronous, binary, message-based, event-driven transport layer in Coherence, with pluggable implementations
• Exabus: a native RDMA implementation of MessageBus, bypassing the OS kernel and avoiding buffer copies
• Exabus preprocesses messages on I/O threads, avoiding the context switches between Coherence threads that occurred before Exabus
• A separate MessageBus per Coherence service (before MessageBus, all services shared the same transport layer) allows utilizing the full InfiniBand bandwidth
Data Grid Server - Exalogic vs Commodity
[Charts: Latency, Throughput, Failover, CPU Utilization]
System Architecture
Data Consolidation
Benefits:
• Reduce data roundtrips
• Improve performance
• Less dependency on legacy data centers
• Canonical model across multiple source databases
[Diagram: Billing, Account, Customer, User Profile; Golden Gate; Data Grid]
Data Grid Geo-Redundancy
Benefits:
• Replicated infrastructure and data
• Active-Active to support production
• Data and Services closer to consumers
[Diagram: Global Traffic Manager; East and West, each with Data Services and a Data Grid]
Data Grid Synchronization (Current State)
Data Grid Synchronization (Next Stage)
Project Timeline on Data Grid
Aggressive timeline on launching Data Grid
Closely collaborated with Oracle to resolve any technical issues
Manage Object Relationships
Cached Data:
• Objects are independent in the grid
• But they are logically related
• Object traversal through keys
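Traversal through keys can be sketched as follows. This is an illustration only: CachedAccount, CachedAddress and KeyTraversal are hypothetical names, and two plain maps stand in for two caches in the grid. The point is that a cached object holds the key of a related object (here addressId) rather than a reference to it, so following a relationship is a second lookup by key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Cached account that stores the key of its address, not a reference to it. */
final class CachedAccount {
    final Integer billingAccountNum;
    final Long addressId; // key into the address cache

    CachedAccount(Integer billingAccountNum, Long addressId) {
        this.billingAccountNum = billingAccountNum;
        this.addressId = addressId;
    }
}

/** Cached address, stored independently of any account that refers to it. */
final class CachedAddress {
    final Long addressId;
    final String city;

    CachedAddress(Long addressId, String city) {
        this.addressId = addressId;
        this.city = city;
    }
}

/** Two independent caches (plain maps here) linked only by keys. */
final class KeyTraversal {
    final Map<Integer, CachedAccount> accounts = new HashMap<>();
    final Map<Long, CachedAddress> addresses = new HashMap<>();

    /** Follows account -> addressId -> address; empty if any step is missing. */
    Optional<CachedAddress> addressOf(Integer billingAccountNum) {
        return Optional.ofNullable(accounts.get(billingAccountNum))
                .map(account -> account.addressId)
                .map(addresses::get);
    }
}
```

Since either object can be updated or removed without touching the other, the traversal has to tolerate a missing target, which is why the sketch returns an Optional.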
Data Transformation with Coherence Live Events
[Diagram: Legacy Schema (AA, BC, PH); HotCache Object/Cache Mapping into key/value caches AA, BC, PH in the Coherence Data Grid; Live Events into the Canonical Domain Model caches (Addr, BA, Cust, Svc)]
Live Events Use Cases in HotCache
• Project HotCache model into desired model
• Duplicate data for denormalization
• Ensure referential integrity in relationship implementations
  - Merge data from multiple databases
  - Pending Mutations pattern
• Refresh Aggregates when child tables are not replicated
Data Aggregation with Layered Caches
Stage Cache
• JPA entities
• Same data structure as the source database, to simplify the HotCache implementation
• Initial load is not required, and an object can be removed after the target object is updated
• Reduced data grid memory footprint
Target Cache
• Similar to a database view, populated with UI-optimised domain objects
• Denormalized/flattened objects to improve data retrieval performance
• Object dependencies are processed through Event Interceptors and Entry Processors
Update Dependent Objects with Event Interceptors
Stage Cache (each entity has its own Event Interceptor):
PaymentHistory
    private Double currentAccountBalance;
    private Date billingEffectiveDt;
    private Date paymentDueDt;
    ....
BillCycle
    private int billCycle;
    private Integer cycleCloseDay;
    private Integer nextCycleCloseDay;
    ....
AddressAssignment
    private Long addressAssignmentId;
    private Long entityId;
    private String entityTypeCd;
    private Long addressId;
    private Date effectiveEndDt;
    ....
Target Cache:
BillingAccount
    private Integer billingAccountNum;
    private String billingAccountTypeCd;
    private Double currentAccountBalance;
    private int billCycle;
    private Long addressId;
    ....
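The stage-to-target flow on this slide might look roughly like the sketch below. It uses plain Java stand-ins rather than Coherence's event interceptor and entry processor APIs: StageCache and LayeredCaches are hypothetical classes, and the billingAccountNum field on PaymentHistory is an assumed join key that the slide does not show. Only the PaymentHistory to BillingAccount path is modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Stage-side entity; billingAccountNum is an assumed join key, not shown on the slide. */
final class PaymentHistory {
    final Integer billingAccountNum;
    final Double currentAccountBalance;

    PaymentHistory(Integer billingAccountNum, Double currentAccountBalance) {
        this.billingAccountNum = billingAccountNum;
        this.currentAccountBalance = currentAccountBalance;
    }
}

/** Target-side object holding a subset of the slide's BillingAccount fields. */
final class BillingAccount {
    final Integer billingAccountNum;
    Double currentAccountBalance;

    BillingAccount(Integer billingAccountNum) {
        this.billingAccountNum = billingAccountNum;
    }
}

/** Minimal stage cache that invokes an interceptor callback after every put. */
final class StageCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final BiConsumer<K, V> interceptor;

    StageCache(BiConsumer<K, V> interceptor) {
        this.interceptor = interceptor;
    }

    void put(K key, V value) {
        entries.put(key, value);
        interceptor.accept(key, value);
    }

    void remove(K key) {
        entries.remove(key);
    }

    int size() {
        return entries.size();
    }
}

/** Wires one stage cache to the target cache. */
final class LayeredCaches {
    final Map<Integer, BillingAccount> target = new HashMap<>();
    final StageCache<Long, PaymentHistory> paymentStage;

    LayeredCaches() {
        paymentStage = new StageCache<Long, PaymentHistory>(this::onPaymentStaged);
    }

    /** Copies the staged balance onto the target object, then drops the staged entry. */
    private void onPaymentStaged(Long key, PaymentHistory staged) {
        BillingAccount account =
                target.computeIfAbsent(staged.billingAccountNum, BillingAccount::new);
        account.currentAccountBalance = staged.currentAccountBalance;
        paymentStage.remove(key);
    }
}
```

Removing the staged entry once the target is updated reflects the earlier point that stage objects need not stay in the grid, which keeps the memory footprint down.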
Scaling HotCache via Parallel Data Flows
• DB schema must be amenable (related tables in same trail)
• One HotCache throughput: 700-3000 TPS depending on HW, configuration
• This approach has been tested to 18,000 TPS
[Diagram: Extract 1, Trail 1, HotCache 1; Extract 2, Trail 2, HotCache 2; …; Extract N, Trail N, HotCache N; all feeding the Coherence Data Grid]
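The constraint that related tables share a trail, and the rough number of flows implied by the figures above, can be sketched as below. TrailPlanner is a hypothetical helper, not part of GoldenGate or HotCache, and the table names in the usage are invented. The flow count assumes throughput scales linearly with the number of flows, which the slide does not claim, so it is best read as a lower bound.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical planner for splitting tables across parallel extract/trail/HotCache flows. */
final class TrailPlanner {

    /**
     * Assigns each group of related tables to a single trail, spreading groups
     * round-robin over the available trails. Tables in one group never split.
     */
    static Map<String, Integer> assign(List<List<String>> relatedGroups, int trailCount) {
        Map<String, Integer> trailByTable = new LinkedHashMap<>();
        int next = 0;
        for (List<String> group : relatedGroups) {
            int trail = next++ % trailCount;
            for (String table : group) {
                trailByTable.put(table, trail);
            }
        }
        return trailByTable;
    }

    /** Smallest number of flows whose combined rate reaches the target, assuming linear scaling. */
    static int flowsNeeded(int targetTps, int perFlowTps) {
        return (targetTps + perFlowTps - 1) / perFlowTps; // integer ceiling
    }
}
```

Under that linear assumption, reaching 18,000 TPS would take at least 6 flows at 3000 TPS each and at least 26 flows at 700 TPS each.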
HotCache High Availability
• Coherence is already HA
• Oracle Clusterware manages redundant GoldenGate HotCache processes
• http://www.oracle.com/technetwork/middleware/goldengate/overview/ha-goldengate-whitepaper-128197.pdf
[Diagram: Oracle Clusterware calls check(), stop() and start() on an Active and a Passive node, each running GoldenGate Manager and HotCache]
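The active/passive arrangement in the diagram can be illustrated with a toy supervisor. This is not how Oracle Clusterware is configured or implemented: ManagedProcess, FakeProcess and FailoverSupervisor are hypothetical names, and the sketch only borrows the check(), stop() and start() roles from the slide to show the failover order.

```java
/** Hypothetical handle on one node's GoldenGate Manager and HotCache processes. */
interface ManagedProcess {
    boolean check();
    void start();
    void stop();
}

/** In-memory stand-in for a node, used to exercise the supervisor. */
final class FakeProcess implements ManagedProcess {
    boolean running;
    boolean healthy = true;

    public boolean check() { return running && healthy; }
    public void start() { running = true; }
    public void stop() { running = false; }
}

/** Keeps exactly one of two redundant processes running. */
final class FailoverSupervisor {
    private ManagedProcess active;
    private ManagedProcess passive;

    FailoverSupervisor(ManagedProcess active, ManagedProcess passive) {
        this.active = active;
        this.passive = passive;
        active.start();
    }

    /** Runs one monitoring cycle; returns true if a failover happened. */
    boolean monitorOnce() {
        if (active.check()) {
            return false;
        }
        // Stop the failed side first so both never run at once, then promote the standby.
        active.stop();
        passive.start();
        ManagedProcess failed = active;
        active = passive;
        passive = failed;
        return true;
    }

    ManagedProcess active() {
        return active;
    }
}
```

A caller would invoke monitorOnce() on a schedule. Stopping before starting matters here because two HotCache processes applying the same trail at once would be undesirable.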
Monitoring HotCache
Business Benefits
No more outages! Supported all major releases and infrastructure maintenance since the initial launch last November
Enhanced performance at the service level: 2-30x faster
Reduced dependency on legacy data centers and hardware footprint
Offered single view on customer with data from various legacy systems
Performance Metrics
Data-grid-enabled Client Account web service response time: 20 ms vs 99-10,294 ms
Outage Mode: Portal overview page response time of 3.2 s
Operational Mode: 48% performance gain measured in the staging performance test
[Chart: Average Daily webService; y-axis in ms, x-axis in hours; series: Data Grid, Web Service Call]
Questions?
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Focus on Oracle Coherence
Session ID | Session Title | Date and Time | Location
CON7471 | Oracle Coherence 12c: Strategy and Roadmap | Monday, Sep. 29, 4:00-4:45 p.m. | Moscone South - 304
CON7895 | Synergy: Using Oracle WebLogic Server and Oracle Coherence in Combination | Monday, Sep. 29, 5:15-6:00 p.m. | Moscone South - 236
CON7875 | Write Once, Read Everywhere: Fast Multisite Data Access with Oracle Coherence | Tuesday, Sep. 30, 12:00-12:45 p.m. | Moscone South - 200
CON7029 | Oracle Fusion Middleware: Meet This Year’s Most Impressive Innovators | Tuesday, Sep. 30, 5:00-5:45 p.m. | Yerba Buena Center for the Arts - Theater
Focus on Oracle Coherence (continued)
Session ID | Session Title | Date and Time | Location
CON7942 | Lockdown! Security Practices for Oracle WebLogic Server and Oracle Coherence | Wednesday, Oct. 1, 10:15-11:00 a.m. | Moscone South - 304
CON7898 | Website Survival: Concealing Back-End Outages with Oracle Coherence and HotCache | Wednesday, Oct. 1, 2:00-2:45 p.m. | Moscone South - 200
CON7939 | Maximum Availability in the Cloud: Oracle WebLogic Server and Oracle Coherence | Wednesday, Oct. 1, 3:30-4:15 p.m. | Moscone South - 304
HOL9436 | Pushing Database Transactions to JCache with Oracle Coherence and Oracle GoldenGate | Wednesday, Oct. 1, 4:15-5:15 p.m. | Hotel Nikko - Nikko Ballroom II
CON7896 | Rapid Delivery of Innovative Real-Time Applications with Oracle Coherence | Thursday, Oct. 2, 9:30-10:15 a.m. | Moscone South - 304