z/TPF Migration Experiences
@ KLM
Errol Smit – PM z/TPF project
TUG Scottsdale, Fall 2009
z/TPF Migration
Agenda
• KLM environment (see also TUG Spring 2008 SCP SC)
• z/TPF Project Scope
• z/TPF Project Experiences
• z/TPF Project Tools
• z/TPF Migration
• Q & A
TPF4.1 at KLM in 2007
• Level is PUT 19++. TPF4.1 has run at KLM since 1996
• Tightly coupled system
• Approximately 21,500 application sources in total
• Assembler application programs: approx. 67%
• C/C++ language application programs: approx. 33%
• Extensive use of TPFDF & MQ Series
• TPF connectivity:
  - Terminals and Printers: approx. 10K+ RES related
  - Message Switching
  - Host-to-Host Links: Altéa, AF, NW, WSP e-ticketing, etc.
What functionality?
• TPF at KLM hosts the following primary applications:
  - Corda (reservations, fares, ticketing, inventory)
  - Codeco (check-in, load & balance)
  - Cargoal (Cargo booking system)
  - Firda (Flight information)
• TPF at KLM is part of more than 50 service chains
  - Electronic Booking Tool, Self Service and Internet Check-In, etc.
• Peaks of around 1400 msg/sec are processed via TPF
• An availability of 99.998% over 2008 (65 minutes total downtime), incl. 19 planned & 1 unplanned downtimes
Moved to Amadeus Altéa RES early 2007
z/TPF Project Scope
• z/TPF EE 1.1 PUT4+
• KLM TPF Systems and Communications modifications
• Single Source updates TPF4.1 applications
• Introduction of a z/Linux Red Hat V5.0 environment, based upon z/VM V5.3, for z/TPF purposes, including GCC compilers
• Migrate Idefix software management system (based upon z/OS V1.8 and USS/HFS) to z/Linux based z/Idefix, including new GUI & interfacing with related tools (GCC compilers, loader, etc.)
• Migration of OLDF (Online Dump Facility) to z/ODF
• Migration of the KL Stress test tool (capture/playback) to z/TPF
z/TPF Project Milestones (originally planned vs. realized; realized dates, red on the slide, are the indented sub-items)
• 17 Sept. 2007: formal start of project
• 30 Jan. 2008: finish Global Designs & Single Source updates
  - 29 Feb. 2008 Global Designs
  - 18 July 2008 Single Source updates on production system
• Mid-Feb. 2008: z/Idefix environment ready for basic project usage
• 1 April 2008: vanilla z/TPF test system up and running
  - 18 July 2008 incl. all basic KLM mods
• 19 Sept. 2008: start integration tests
  - 10 Dec. 2008 incl. all KLM mods (systems, comms & appl.)
• 1 Jan. 2009: start user acceptance tests
  - 28 Feb. 2009 user involvement already during integration tests
• 21 March 2009: target cut-over date
  - 17 May 2009 realized cut-over date
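The gap between planned and realized dates can be worked out directly from the milestone list. The sketch below is illustrative only: it assumes each realized date pairs with the planned date of the bullet it appears under, and the milestone labels are shortened for readability.

```python
from datetime import date

# (milestone, originally planned date, realized date), read from the slide.
# The planned/realized pairing is an assumption based on the slide layout.
MILESTONES = [
    ("Global Designs", date(2008, 1, 30), date(2008, 2, 29)),
    ("Single Source updates", date(2008, 1, 30), date(2008, 7, 18)),
    ("z/TPF test system", date(2008, 4, 1), date(2008, 7, 18)),
    ("Integration tests", date(2008, 9, 19), date(2008, 12, 10)),
    ("User acceptance tests", date(2009, 1, 1), date(2009, 2, 28)),
    ("Cut-over", date(2009, 3, 21), date(2009, 5, 17)),
]


def slippage_days(planned, realized):
    """Number of calendar days between the planned and the realized date."""
    return (realized - planned).days


SLIPPAGE = {name: slippage_days(p, r) for name, p, r in MILESTONES}

if __name__ == "__main__":
    for name, days in SLIPPAGE.items():
        print(f"{name:<24}{days:>4} days after plan")
```

On these dates the cut-over itself moved by 57 days, from 21 March to 17 May 2009.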
z/TPF Project Effort & Investment
• Total project effort registered: 23437 hours
  - z/Linux & z/Idefix 19%
  - Applications 17%
  - Systems & Comms 56%
  - Project Management 6%
  - Miscellaneous (TM & SLM) 2%
• Lead time: 20 months excl. 2 months formal after-care period
• Hardware and software investments:
  - IFLs, z/VM, z/Linux, GCC support
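For a feel of what the percentages mean in hours, the effort split can be converted as follows. This is a rough sketch: the slide gives whole percentages only, so the per-category hours are rounded approximations rather than registered figures.

```python
TOTAL_HOURS = 23437

# Percentage share per category, as listed on the slide.
SHARES_PCT = {
    "z/Linux & z/Idefix": 19,
    "Applications": 17,
    "Systems & Comms": 56,
    "Project Management": 6,
    "Miscellaneous (TM & SLM)": 2,
}


def hours_by_category(total_hours, shares_pct):
    """Approximate hours per category, rounded to whole hours."""
    return {name: round(total_hours * pct / 100) for name, pct in shares_pct.items()}


HOURS = hours_by_category(TOTAL_HOURS, SHARES_PCT)

if __name__ == "__main__":
    for name, hours in HOURS.items():
        print(f"{name:<26}{hours:>6} h")
```

Systems & Comms alone comes to roughly 13,100 hours, more than all other categories combined.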
z/TPF Project Organisation
• Prince II Steering Group including VP IS Operations
• Project Manager
• Two Project Leaders (SYS/Comms & AD)
• Project team size varied between 8 and 25 people
• Test Management support for QC & weekly z/TPF test status mail
• Several End-Users representing the four major TPF applications and related non-TPF based systems and services
• Weekly conference call with the IBM TPF Lab
z/TPF Impact on HW infra
• CPU
  - Expected 10% increase in CPU production capacity required for z/TPF (see separate slide on performance)
  - 2 x IFL required for z/VM & z/Linux (implemented in z/OS Z9)
• Memory
  - Expected ± three (3) times the memory requirements of TPF4.1
  - TPF production increased from 1.5 GB to 10 GB (i.e. over 6 times)
  - Extra memory required for z/VM & z/Linux (IFLs: 8 GB)
• Database
  - Small impact on TPF DB.
  - Extra disk space (SAN) for z/VM (2*80 GB) & z/Linux (2*100 GB)
• (Virtual) Tape: No impact
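The memory and disk figures above reduce to two small calculations, sketched here. The variable names are illustrative, and the numbers are taken straight from the slide.

```python
# Memory: TPF4.1 production versus z/TPF production, in GB (from the slide).
TPF41_MEMORY_GB = 1.5
ZTPF_MEMORY_GB = 10.0
EXPECTED_FACTOR = 3  # the "three times" expectation stated on the slide

memory_factor = ZTPF_MEMORY_GB / TPF41_MEMORY_GB

# Extra SAN space: z/VM (2 * 80 GB) plus z/Linux (2 * 100 GB).
extra_san_gb = 2 * 80 + 2 * 100

if __name__ == "__main__":
    print(f"Memory growth factor: {memory_factor:.1f}x (expected about {EXPECTED_FACTOR}x)")
    print(f"Extra SAN space: {extra_san_gb} GB")
```

The realized memory growth is roughly double the expected factor of three.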
z/TPF Impact on SW
- z/TPF EE 1.1 PUT4+
  - PUT5 & PUT6 APAR monitoring (a total of over 180 APARs were pre-applied)
  - TPF GCC: unexpected Red Hat support contract required, GCC differences
  - HLASM for z/Linux: code page issues (z/OS 1047 versus z/Linux 500)
  - SST: not available anymore
- z/VM 5.3 (no real issues)
  - Performance toolkit
  - Dirmaint: not implemented yet
  - RACF: not implemented yet
  - VMBackup
  - VMTape
- z/Linux RHEL5.0 (no real issues)
  - DB2 Connect
  - TSM client
  - Tivoli Enterprise Console Agent
  - Control-M client: not used, solved with an exec
z/TPF Impact on Performance I
- Stress Test tools were used with the same set of messages on a TPF 4.1 and a z/TPF standalone test system (LSAS).
- Data Collection, ZTRAP and KLM resource logging were used to compare the results
- Initial results were much higher than the predicted 10% increase in CPU utilization
- With Branch Trace off, the difference was even 30-40+%
- Temporary CPU upgrade (Z9EC-506 -> 507) was implemented just before cut-over for safety reasons
- After c/o an average 13.5% increase was measured
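A back-of-the-envelope check of the temporary upgrade against the measured increase could look like the sketch below. It rests on two assumptions that are not stated on the slide: that the last two digits of the model numbers 506 and 507 give the number of CPs (6 and 7), and that capacity scales linearly with the number of CPs, which ignores multiprocessing effects.

```python
PREDICTED_INCREASE_PCT = 10.0  # predicted CPU utilization increase (from the slide)
MEASURED_INCREASE_PCT = 13.5   # average measured after cut-over (from the slide)

# Assumption: model numbers 506 -> 507 mean 6 -> 7 CPs at the same capacity setting.
CPS_BEFORE = 6
CPS_AFTER = 7

# How far the measurement ended up above the prediction, in percentage points.
overshoot_points = MEASURED_INCREASE_PCT - PREDICTED_INCREASE_PCT

# Nominal capacity gain of the upgrade, assuming linear scaling per CP.
nominal_capacity_gain_pct = (CPS_AFTER / CPS_BEFORE - 1) * 100

upgrade_covers_increase = nominal_capacity_gain_pct >= MEASURED_INCREASE_PCT

if __name__ == "__main__":
    print(f"Overshoot versus prediction: {overshoot_points:.1f} points")
    print(f"Nominal capacity gain: {nominal_capacity_gain_pct:.1f}%")
    print(f"Upgrade covers measured increase: {upgrade_covers_increase}")
```

Under those assumptions, one extra CP adds nominally about 16.7% capacity, which is in the same range as the measured 13.5% increase.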
z/TPF Impact on Performance II
- The following APAR(s) were implemented before cut-over
  - PJ35365 C-function trace and macro separated
  - PJ35517 remove shared defer list
  - PJ36022 correct CE1IST value
- The following APAR(s) were implemented after cut-over
  - PJ35509 / PK79078 DF performance enhancements; this saves an estimated additional 2-3% CPU utilization
z/TPF Critical APARs applied (selection of KL raised APARs)
ENTER/BACK errors
- PJ34357:5: BSO linkage problem not fixed by PJ33725 (CP branched to wrong stub routine)
- PJ33982:5: ENTRC from HLASM to C program does not pass back the return code in R15
- PJ35155:6: after PJ34340 infinite loop in CCENBK when allocating CRPA from free chain
- PJ35233:6: C program registers corrupted after ENTRC to several HLASM programs (incorrect use of stack by CP)
TPFDF errors
- PK72087:5: dfred() may fail due to old search addresses above 2GB being used in a keylist passed to dfkey().
- PK69938:5: TPFDF ADD gives unpredictable results after jumping into branch table at wrong location
- PK76740:6: need to check extended keylist inuse indicator before trying to use extended (>6 keylists) address.
- PK76740:6: dfred AREA= parameter ignored because option bit not set in the SW00SR.
OTHER
- PJ34628:5: fix wait state PSW loaded whilst software profiler EI collection active
- PJ35365:6: to allow C trace to be deactivated via ZSTRC option FUNCTR (performance requirement)
- PJ35503:6: system did not always return to norm state after Catastrophic due to corruption in critical record filing
z/Idefix Development Tool
• Application development tool for z/TPF, either integrated in IBM’s TPF Toolkit 3.0 or used stand-alone
• Windows XP based GUI called TPFfix (stand-alone)
• z/Linux based server application
• Defect/Feature control (Grips)
• Repository management (incl. keeping TPF4.1 & z/TPF in sync)
• Version management and promotion control
• Compiling and linking
• z/TPF load management (via GDS)
• Cross-reference and label search functionality
z/TPF Test Tools
• HP/Mercury Quality Center for:
  - Repository of Test Cases incl. status, assignments, sign-off
  - Project problem/defect registration during tests & cutover
• Several VPARS based test systems and a Live Sized Acceptance System (LSAS) which can also run natively
• TPF Debugger instead of SST
• Stress Test Tool used weekly (approx. 1-2 million messages) and for performance comparison tests using Data Collection, PMC and ZTRAP
• z/ODF Online Dump Facility
• ERREPA (KLM’s version of a SNAP dump)
z/Online Dump Facility
• To improve the ONLINE problem solving activities in TPF, KLM has implemented a tool called OnLine Dump Facility (OLDF).
• OLDF Rel 5.21 has been adapted for z/TPF -> z/ODF
• Provides the following BASIC capabilities:
  - Creation of an online displayable MINI-DUMP or SNAPSHOT-DUMP.
  - Creation of online displayable ERROR-REPORTS and DUMP SUMMARIES.
  - Registration of occurrences of 'NODUPL' situations in the ERROR-REPORT.
  - RTA/RTT tape information.
  - System UP/DOWN information.
• PLUS: Breakpoint facility (new)
• More on z/ODF in the Operations & Coverage SC …
z/TPF Test & Migration Strategy
• A freeze period (3 months) during Acceptance tests and cut-over was instituted for:
  - All TPF related software, hardware & related infrastructure
  - All TPF related services
• Main goal was to be aware of all related changes and define any special required activities, like setting up a special test environment
  - All general TPF test systems incl. links were already on z/TPF
• Only one of a total of 29 freeze exception requests was rejected
• End-User involvement during Integration Tests and Single Source check-out increased User Acceptance test efficiency
z/TPF Migration
• Cut-over Sunday 17th of May 02:00 AMS LT
• Point of no return Tuesday 19th of May 16:00 AMS LT
• 24 hour on-site support in shifts until Point of no return
• 3 IBM staff on-site available during & after cut-over for extra support
• During cut-over a copy of the console was projected on screens
• Actual downtime was 9 minutes
• Hourly status reports for Steering Group / Senior Management
• Before, during and after cut-over z/TPF status was published on Alfresco, AF/KL’s intranet site
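The length of the window between cut-over and the point of no return follows from the two timestamps. A minimal sketch, assuming the year 2009 (taken from the realized cut-over date in the milestones) and treating both times as the same local time zone:

```python
from datetime import datetime

# Both timestamps are Amsterdam local time (AMS LT); the year is an assumption
# based on the realized cut-over date of 17 May 2009.
CUT_OVER = datetime(2009, 5, 17, 2, 0)
POINT_OF_NO_RETURN = datetime(2009, 5, 19, 16, 0)

window = POINT_OF_NO_RETURN - CUT_OVER
window_hours = window.total_seconds() / 3600

if __name__ == "__main__":
    print(f"Hours between cut-over and point of no return: {window_hours:.0f}")
```

That is 62 hours during which on-site support ran in shifts around the clock.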
z/TPF Post Migration I
• Only one major problem directly after c/o: some outstations were not able to sign in. Solved during the Sunday morning. RCB related.
• One outage (looping ECBs) due to z/ODF database not being cleared. Solved after initializing the database.
• The number of post-migration defects was only 10% of the total. Usually this is 35-45%, which means z/TPF was very well tested!
z/TPF Post Migration II
• Two outages (3 weeks & 4 months after c/o) due to MQ handling. The MQ problems were caused by a mixture of errors.
• Those are mainly solved by the following APARs:
  - PJ33188 – Wrong MQ sweep logic can take the system into input list shutdown.
  - PJ36440 – Unexpected TO2 errors due to wrong Recoup index entry.
  - PJ36543 – A single MQ error causes all channels to fail.
  - PJ31218 – Problem in the checkpointing process.
z/TPF Project Issues
• Availability of very experienced End-Users
• Continuously growing number of APARs during the project
  - PUT 5 # 490; PUT 6 # 230 on 8th of May 2009
  - This extra effort caused a slight delay in the project
• Required project budget versus KLM’s overall business results
  - A significant budget over-run might have prematurely killed the project
• Sufficient communication to all related (non-TPF) groups and departments on potential z/TPF migration impact was not always easy due to unclear points of contact
Contacts & previous presentations
• Overall z/TPF project: [email protected]
  - TUG Spring 2008 SCP SC
• Application migration: [email protected]
  - TUG Spring 2008 AD SC
  - TUG Fall 2008 Member presentations
  - TUG Spring 2009 AD SC
• Performance: [email protected]
  - TUG Fall 2007 O&C SC
• z/ODF: [email protected]
  - TUG Fall 2008 Vendor’s presentations & this TUG O&C SC
Summary & Conclusion
• After the JAL Front-end, NYPD & VISA, KLM became the first airline system to have z/TPF fully in production!
• z/TPF runs stably, apart from some MQ related problems. The Debugger is unstable at times. The PUT5++ upgrade may help (1Q 2010)
• Great team effort between Systems, Comms, Application Development & End-Users contributed to the success.
• We received excellent support from the IBM TPF Lab
• QUESTIONS ?