mfs trouble shooting guide b9 ed11.pdf



  • ED 11 Release MFS Troubleshooting guide release B9

    EVOLIUM 3bk29092JAAAPWZZA-ed11rl.doc 03/03/2006

    3BK 29042 JAAA PWZZA 1/117

    Site

    VELIZY

    EVOLIUM SAS

    Originators

    MFS integration team

    MFS TROUBLESHOOTING GUIDE

    B9 RELEASE

    System : ALCATEL 900 / BSS

    Sub-system : MFS

    Document Category : USER GUIDE

    ABSTRACT

    This document constitutes the reference location for storing troubleshooting actions related to the operation of MFS B9. It is restricted to ALCATEL internal usage, notably for ALCATEL personnel providing on-site support at customer premises.

    This document will be updated each time a new problem occurs.

    Approvals

    Name App.

    A. WAZANA G. Acbard J-J BELLEGO

    Name App.

    REVIEW

    HISTORY

    Ed. 01 Proposal 01 Cancelled B8 chapters (FR close OUT, NRE, REL)

    Ed. 01 Proposal 02 01-11-2004 P.MENON Some clean up + synchronization with new tips from B8

    Ed. 01 Proposal 03 08-11-2004 P.MENON Suppress information redundant with MFS Installation, Configuration, and Software replacement guide

    Ed. 01 Proposal 04 16-11-2004 P.MENON Minor corrections

    Ed. 01 Proposal 05 16-02-05 P.MENON

    - Add Unix boot impossible (wrong default kernel)

    - Add check if backup Mib is not corrupted

    - Add How to get contents of unix patch BL

    - Add for Trace of unix patch installation

    Ed. 01 release 11-03-05 Release for B9 MR0

    Ed. 02 release 01-06-05 Release for B9 MR2 P.MENON - update Corrective action: second step (install_lsm)

    Ed. 03 release 02-06-05 Release for B9 MR2 P.MENON - S99trace_srv.ds is renamed to S99trace_server.ds since MFSAW10F

    Ed. 04 release 02-06-05 Release for B9 MR2 P.MENON - Add for Failure on Update Remote Inventory

    Ed. 05 release 09-06-05 Release for B9 MR2 P.MENON - Add for rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC

    Ed. 06 release 30-06-05 Release for B9 MR2 P.MENON -update Error at step 5/10 (Isolation) Check the full SCSI chain...

    Ed. 07 release 06-07-05 Release for B9 MR2 P.MENON

    - Add Connection by ftp from a MFS station to an external server is impossible FR 3BKA20FBR164817

    - Add TRACE_SERVER does not run FR 3BKA13FBR164932

    - Add GPU traces are not completed

    - Add Impossible to load patch GPU B8 on GPUs FR 3BKA13FBR164932

    - Add Unix patch installation from OMC stopped due to a network failure

    - Add After a roll-back it is impossible to open the IMT terminal FR 3BKA20FBR162930

    Ed. 08 Proposal 01 30-08-05 P.MENON - Add for Inall procedure stopped due to a station in "halt in" state FR 3BKA13FBR166921

    Ed. 08 Proposal 02 01-09-05 P.MENON - Add new Installation from a not English PC fails (FR 3BKA20FBR166358)

    Ed. 08 Proposal 03 09-09-05 P.MENON

    - Add new The trace server stops running after a while (FR 3BKA13FBR169218)

    - Add new Result of dupatch in B8 or B9 RC40 with BL24

    Ed. 08 Proposal 04 20-09-05 P.MENON Add new Control station reboots in loop with reset_code 214 after installation of BL22 (FR 3BKA13FBR170335)

    Ed. 08 Proposal 05 20-10-05 P.MENON

    - Update Control station reboots in loop with reset_code 214 after installation of BL22 (FR 3BKA13FBR170335)

    - Suppress yellow paragraph

    Ed. 08 Release 28-10-05 Release

    Ed. 09 Release 13-01-06 Release P.MENON

    - quality corrections

    - update Error at Step 2 (Creation)

    - Add new MFS UNIX patch installation makes Control Station unusable (B9 MR1 ED2) (FR 3BKA23FBR174370)

    - Update Error at step 5/10 (Isolation) (FR 3BKA13FBR175829)

    - Add new Reinstallation of the MFS and restoration of data from OMC

    - Add new How to restore the MIB without needing full reinstallation

    - Add new Sanity check script to prevent any potential problem on the MFS

    Ed. 10 Proposal 01 08-02-06 P.MENON

    - Add new GPU problem but alarm is "Failure of a JAET1 applique" (FR 3BKA13FBR177178)

    - Add new no more available disk space on /usr (FR 3BKA20FBR176683)

    Ed. 10 Proposal 02 09-02-06 P.MENON - update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689)

    Ed. 10 Proposal 03 13-02-06 P.MENON

    - update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689) after remarks

    - add Result of dupatch in B9 with BL22 since MR1 Edx (MFSSAW11E)

    Ed. 10 Proposal 04 16-02-06 P.MENON

    - update Error at step 5/10 (Isolation)

    - add System and Tomas (formerly named Nectar) traces

    Ed. 10 Proposal 05 22-02-06 P.MENON - add Wrong httpd.conf

    Ed. 10 Release 23-02-06 Release

    Ed. 11 Release 02-03-06 Release P.MENON

    - update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689)

    - add JBETI traces

    - update The trace server stops running after a while

    - update TRACE_SERVER does not run

    - add not enough space for Backup MIB

    - add new GPU switch over no more possible (FR 3BKA20FBR149993 and 3BKA20FBR151855)

    TABLE OF CONTENTS

    1 INTRODUCTION
        1.1.1 Document organisation
        1.1.2 Presentation

    2 GPU
      2.1 GPUs disappear from the IMT
        2.1.1 Reference FR: None.
        2.1.2 Problem description
        2.1.3 Corrective action
      2.2 GPU SO Impossible
        2.2.1 Reference FR: 3BKA20FBR108914
        2.2.2 Problem description
        2.2.3 Corrective action
      2.3 GPU reboots continuously
        2.3.1 Reference FR: 3BKA20FBR119782
        2.3.2 Problem description
        2.3.3 Corrective action
        2.3.4 Problem solved
      2.4 GPU connection problem
        2.4.1 Reference FR: none
        2.4.2 Problem description
        2.4.3 Corrective action
      2.5 GPU problem but alarm is "Failure of a JAETI1 applique"
        2.5.1 Reference FR: 3BKA13FBR177178
        2.5.2 Problem description
        2.5.3 Corrective action
      2.6 GPU switch over no more possible
        2.6.1 Reference FR: 3BKA20FBR149993 and 3BKA20FBR151855
        2.6.2 Problem description
        2.6.3 Preventive action
        2.6.4 Corrective action

    3 INSTALLATION
      3.1 Station restart
        3.1.1 Reference FR: None.
        3.1.2 Problem description
        3.1.3 Corrective action
      3.2 Impossible to rlogin/telnet to MFS as root
        3.2.1 Reference FR: None.
        3.2.2 Problem description
        3.2.3 Corrective action
      3.3 Unix boot impossible (wrong default kernel)
        3.3.1 Reference FR: None.
        3.3.2 Problem description
        3.3.3 Corrective action

    4 MFS BASED ON RC40
      4.1 Installation from a not English PC fails
        4.1.1 Reference FR: 3BKA20FBR166358
        4.1.2 Problem description
        4.1.3 Corrective action
      4.2 MFS installation failed
        4.2.1 Reference FR: none
        4.2.2 Problem description
        4.2.3 Corrective action

      4.3 Inall procedure stopped due to a station in "halt in" state
        4.3.1 Reference FR: 3BKA13FBR166921
        4.3.2 Problem description
        4.3.3 Corrective action
      4.4 Failure during the SWC from the OMC at step 1/10 (before file transfer)
        4.4.1 Reference FR: none
        4.4.2 Problem description
        4.4.3 Corrective action
      4.5 Unix boot impossible
        4.5.1 Reference FR: None.
        4.5.2 Problem description
        4.5.3 Corrective action

    5 AUTOMATIC SOFTWARE CHANGE
      5.1 Error during execution of ins_swcx.sh
        5.1.1 Reference FR: None.
        5.1.2 Problem description
        5.1.3 Corrective action
      5.2 rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC
        5.2.1 Reference FR: 3BKA13FBR163888
        5.2.2 Problem description
        5.2.3 Corrective action
      5.3 Error Temporary local directory error on IMT during step 0
        5.3.1 Reference FR: None.
        5.3.2 Problem description
        5.3.3 Corrective action
      5.4 Error "File Access Error" with dlv.bck always appears when doing SW replacement
        5.4.1 Reference FR: 3BKA20FBR150527
        5.4.2 Problem description
        5.4.3 Corrective action
      5.5 Error at step 2/10 (Creation)
        5.5.1 Reference FR: None.
        5.5.2 Problem description
        5.5.3 Corrective action
      5.6 Error at step 3/10 (Verify)
        5.6.1 Reference FR: 3BKA20FBR099035 = 3BKA13FBR102355
        5.6.2 Problem description
        5.6.3 Corrective action
      5.7 Error at step 5/10 (Isolation)
        5.7.1 Reference FR: 3BK - A13FBR096085 / 105356 / 112480 - A20FBR096035 / 105055 / 129810 / 139842 - A23FBR174097
        5.7.2 Save traces
        5.7.3 Problem description
        5.7.4 Specific case for 3BKA20FBR129810: Problem occurs while Backup Server is down.
        5.7.5 Specific case for 3BKA13FBR175829: broken shared disk
      5.8 Error at step 6/10 (Major version change)
        5.8.1 Reference FR: 3BKA13FBR107676
        5.8.2 Problem description
        5.8.3 Check if disks are shared correctly
        5.8.4 Corrective action
      5.9 Error at step 7/10 (Validation)
        5.9.1 Reference FR: None.
        5.9.2 Problem description
        5.9.3 Corrective action
      5.10 Control station reboots in loop with reset_code 214 after installation of BL22
        5.10.1 Reference FR: 3BKA13FBR170335
        5.10.2 Problem description
        5.10.3 Corrective action

      5.11 MFS UNIX patch installation makes Control Station unusable (B9 MR1 ED2)
        5.11.1 Reference FR: 3BKA23FBR174370
        5.11.2 Problem description
        5.11.3 Corrective action

    6 MFS RUNNING
      6.1 The stand-by station is not operational
        6.1.1 Reference FR: None.
        6.1.2 Problem description
        6.1.3 Corrective action
      6.2 Station not reachable
        6.2.1 Reference FR: none
        6.2.2 Problem description
        6.2.3 Corrective action
        6.2.4 Problem solved
      6.3 System console not reachable
        6.3.1 Reference FR: none
        6.3.2 Problem description
        6.3.3 Corrective action
      6.4 A process is looping
        6.4.1 Reference FR: 3BKA45FBR119174
        6.4.2 Problem description
        6.4.3 Corrective action
        6.4.4 impacts
      6.5 Reboots in loop on MFS reset due to bad IP address
        6.5.1 Reference FR: 3BKA20FBR079434 - 3BKA20FBR081233 (close NIP)
        6.5.2 Problem description
        6.5.3 Corrective action
      6.6 Reboots in loop due to no more disk space
        6.6.1 Reference FR: None
        6.6.2 Problem description
        6.6.3 Corrective action
      6.7 OMC-MFS link problem at different interface cases
      6.8 Ethernet connection problem
        6.8.1 Reference FR: none
        6.8.2 Problem description
        6.8.3 Corrective Action
      6.9 Sleeping cells
        6.9.1 Alerter definition
      6.10 DS10 servers don't come up automatically after power off/power on
        6.10.1 Reference FR: 3BKA45FBR17363, 3BKA20FBR135619
        6.10.2 Problem description
        6.10.3 Corrective action
      6.11 Failure on Update Remote Inventory
        6.11.1 Reference FR: none
        6.11.2 Problem description
        6.11.3 Corrective Action
      6.12 Connection by ftp from a MFS station to an external server is impossible
        6.12.1 Reference FR: 3BKA20FBR164817
        6.12.2 Problem description
        6.12.3 Corrective action
      6.13 The trace server stops running after a while
        6.13.1 Reference FR: 3BKA13FBR169218
        6.13.2 Problem description
        6.13.3 Corrective action
      6.14 TRACE_SERVER does not run
        6.14.1 Reference FR: 3BKA13FBR164932
        6.14.2 Problem description
        6.14.3 Corrective action
      6.15 GPU traces are not completed

        6.15.1 Reference FR: none
        6.15.2 Problem description
        6.15.3 Corrective action
      6.16 Impossible to load patch GPU B8 on GPUs
        6.16.1 Reference FR: 3BKA13FBR164932
        6.16.2 Problem description
        6.16.3 Corrective action
      6.17 Unix patch installation from OMC stopped due to a network failure
        6.17.1 Reference FR: none
        6.17.2 Problem description
        6.17.3 Corrective action
      6.18 After a roll-back it is impossible to open the IMT terminal
        6.18.1 Reference FR: 3BKA20FBR162930
        6.18.2 Problem description
        6.18.3 Corrective action
      6.19 Telnet access from Windows
        6.19.1 Reference FR: none
        6.19.2 Problem description
        6.19.3 Corrective Action
      6.20 no more available disk space on /usr
        6.20.1 Reference FR: 3BKA20FBR176683
        6.20.2 Problem description
        6.20.3 Corrective action
      6.21 Wrong httpd.conf
        6.21.1 Reference FR: none
        6.21.2 Problem description
        6.21.3 Corrective action
      6.22 not enough space for Backup MIB
        6.22.1 Reference FR: none
        6.22.2 Problem description
        6.22.3 Corrective action

    7 CRASH/TRACES
      7.1 Determine crash cause
      7.2 Save traces
      7.3 O&M trace
        7.3.1 SCIM (RTA)
        7.3.2 Q3
        7.3.3 RETIX
        7.3.4 UNIX
      7.4 GPU trace
        7.4.1 Trace level
        7.4.2 Which level to activate
        7.4.3 How to modify size of mfs_trace_p_XX file?
      7.5 JBETI trace
      7.6 Traces of unix patch installation
      7.7 Problems

    7.7.1 GPU traces.................................................................................................................... 75 7.7.2 Trace Server.................................................................................................................. 75 7.7.3 Disk quota ..................................................................................................................... 75 7.7.4 mfs_trace_p_XX traces location ................................................................................... 75

    7.8 System and Tomas (Nectar was the name in a former time) traces .................... 76 7.8.1 system traces (if required)............................................................................................. 76 7.8.2 Advfs traces................................................................................................................... 76 7.8.3 TOMAS traces............................................................................................................... 77

    8 VARIOUS INFORMATION ............................................................................................................. 78 8.1 User count creation via IMT on MFS........................................................................ 78

    8.1.1 Reference FR: 3BKA45FBR144680 ............................................................................. 78 8.1.2 Problem description ...................................................................................................... 78

      8.1.3 Corrective action
    8.2 Update disk usage information
      8.2.1 Problem description
      8.2.2 Action
    8.3 Shared disks access
      8.3.1 Problem description
      8.3.2 Action
    8.4 How to get MFS component versions
      8.4.1 Problem description
      8.4.2 Action
    8.5 How to know how many IMT are open at same time ?
      8.5.1 Reference FR: none
      8.5.2 Problem description
      8.5.3 Corrective action
    8.6 How to update time from OMC
      8.6.1 Reference FR: 3BKA13FBR141970
      8.6.2 Problem description
      8.6.3 Corrective action
      8.6.4 Problem solved
    8.7 MFS restoration problem
      8.7.1 Problem description
      8.7.2 Corrections description
    8.8 Check if backup Mib is corrupted
      8.8.1 Reference FR
      8.8.2 Problem description
      8.8.3 Correction description
    8.9 Reinstallation of the MFS and restauration of data from OMC
      8.9.1 Reference FR: 3BKA13CBR177682
      8.9.2 Problem description
      8.9.3 Correction description
    8.10 How to get contents of Unix patch BL
      8.10.1 Problem description
      8.10.2 Action
    8.11 How to restore the MIB without needing full reinstallation
      8.11.1 Reference FR: 3BKA13CBR177682
      8.11.2 Problem description
      8.11.3 Correction description
    8.12 Sanity check script to prevent any potential problem on the MFS
      8.12.1 Reference FR: 3BKA13CBR176689
      8.12.2 Return codes explanation
      8.12.3 Corrective action
      8.12.4 Example on AS800 (based on Tomas RC23)
      8.12.5 Example on DS10 (based on Tomas RC23)
      8.12.6 Example on DS10 (based on Tomas RC40)

    9 GLOSSARY AND ABBREVIATIONS

    A HW SETTINGS OF ENVIRONMENTAL VARIABLES (FW)

    INTERNAL REFERENCED DOCUMENTS

    Not applicable

    REFERENCED DOCUMENTS

    [ 1 ] MFS B9 installation user guide, reference 3BK 09679 JAAA RJZZA

    [ 2 ] EVOLIUM A9135 MFS MAINTENANCE HANDBOOK, reference 3BK 20935 AAAA PCZZA

    [ 3 ] B8/B9 A9135 MFS SOFTWARE MIGRATION Release B9, reference 3BK 17422 0202 RJZZA

    RELATED DOCUMENTS

    PMU logging messages description and principles release B6.2 3BK 09850 FCAD PWZZA

    OPEN POINTS / RESTRICTIONS

    No open points or restrictions have been identified.

    1 INTRODUCTION

    1.1.1 Document organisation

    This document is organized as follows:

    1) This chapter

    2) Problems originating from the GPU, most of which have a Quality Alert attached

    3) Problems occurring at installation time

    4) Problems occurring at SW change time, depending on the SWC phase

    5) Problems occurring when the MFS is started

    6) What to do in case of a crash, and which information to keep

    7) How to set and get traces

    8) General information, such as disk usage

    Plus appendices for specific information:

    A) IOLAN configuration

    B) HW setting of environmental variables

    1.1.2 Presentation

    Each chapter is introduced by a table summarising the problems addressed, their origin and their fix.

    Very few chapters can be shown to the customer. They are highlighted in green.

    Commands are presented in grey rectangles.

    2 GPU

    What/behavior | Trouble origin | Fix
    1) GPUs disappear from IMT | 1 or more GPU with bad components | Change GPU
    2) Impossible GPU switch over | JAE1 applique mistake | Change JAE1
    3) GPU reboots continuously | GPU FW mistake | Change the GPU
    4) GPU connection problem | Connection, ethernet | Check Ethernet, extract and re-plug the board
    5) GPU problem but alarm is "Failure of a JAETI1 applique" | Faulty GPU | Change faulty GPU
    6) GPU switch over no more possible | JBETI becomes blocked | Reset the active JBETI

    2.1 GPUs disappear from the IMT

    2.1.1 Reference FR: None.

    2.1.2 Problem description

    One or more (up to all) GPUs in a subrack disappear from time to time from the Craft terminal (IMT), as if they had been unplugged.

    GSM and GPRS remain available, but it is impossible to perform any remote action on these GPUs (download or modify the configuration, switch over, reset data, lock).

    A hardware reset (=> telecom outage, GSM + GPRS) solves the problem for a short time (< 1 day).

    2.1.3 Corrective action

    At least one GPU in the subrack may have bad hardware components, so all GPUs of the subrack must be checked.

    To check a GPU, unplug it (=> telecom outage, GSM + GPRS).

    Then compare the references of the 5 components marked XXX on the following pictures:

    FB2041 is the good reference

    FBL2041 is a wrong reference!

    Any bad component must be changed.

    [Pictures: JBGPU board, with the 5 components to check marked XXX]

    2.2 GPU SO Impossible

    2.2.1 Reference FR: 3BKA20FBR108914

    2.2.2 Problem description

    Switch-over of one GPU (from the Craft Terminal) to the spare GPU is not possible: the spare GPU begins to load its telecom configuration but, a few seconds later, the board is blocked and the alarm "On free run mode" appears. The traffic is stopped.

    When a switch-back of the GPU board is done, the traffic comes back and everything is normal.

    To confirm the problem, swap some GPUs and some appliques.

    The problem is due to a difference between 2 variants of the appliques in the technology of the redundancy bus transceivers: the AxABxx version is equipped with component FB2041BB (running with VCC = 5 V), and the AxAAxx version is equipped with FBL2041BB (running with VCC = 3.3 V).

    A hardware correction is under study for the 3BK08231AxABxx PCM appliques.

    2.2.3 Corrective action

    It has been demonstrated that the PCM applique with the reference number 3BK08231AxABxx causes the problem. The PCM applique 3BK08231AxAAxx is expected to be fully operational.

    JAE1C boards (75 PCM): check the PCM applique reference:

    3BK08231ABAAxx : good board

    3BK08231ABABxx : faulty board; replace it with a good JAE1C

    JAE1 boards (120 PCM): check the PCM applique reference:

    3BK08231AAAAxx : good board

    3BK08231AAABxx : faulty board; replace it with a good JAE1
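
    The reference check above can be sketched as a small shell helper. This is only an illustration: the function name classify_applique is hypothetical (it is not part of the MFS tools), and the patterns simply encode the four references listed above, with xx standing for any two characters.

```shell
# Hypothetical helper (not part of the MFS tools): classify a PCM applique
# from the reference printed on the board.
classify_applique() {
    case "$1" in
        3BK08231A[AB]AA??) echo "good" ;;     # AxAAxx variant
        3BK08231A[AB]AB??) echo "faulty" ;;   # AxABxx variant: replace the board
        *)                 echo "unknown" ;;  # reference not covered by this section
    esac
}
```

    For example, classify_applique 3BK08231ABAB01 reports a faulty JAE1C applique, while any reference outside the 3BK08231 family is reported as unknown rather than guessed.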

    2.3 GPU reboots continuously

    2.3.1 Reference FR: 3BKA20FBR119782

    2.3.2 Problem description

    After GPRS has been configured and the GPU and GPRS have been unlocked, the GPU reboots continuously. When a switchover is performed, the same problem occurs. In the internal GPU traces (file mfs_trace_p_XX), the following traces indicate a failure in the PMU package initialisations:

    DATA_ERR : T: 200 : rrmswcomp.cpp : 160 : Cell Traffic Package init failure...
    DATA_ERR : T: 200 : rrmswcomp.cpp : 172 : Bss Management Package init failure...

    Then check whether the GPU reference (on the front side of the board) is GPU 3BK08064ABAC01.
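
    Searching a saved trace file for these two messages can be sketched as follows. The function name find_pmu_init_failures is hypothetical, and the pattern is deliberately loose because the exact spacing of the trace lines is not guaranteed here.

```shell
# Hypothetical helper: list the PMU package initialisation failures found
# in one or more saved GPU trace files (mfs_trace_p_XX).
# Returns a non-zero status when no such line is found (grep behaviour).
find_pmu_init_failures() {
    grep 'DATA_ERR.*Package init failure' "$@"
}
```

    A typical call would be find_pmu_init_failures mfs_trace_p_XX, run on a copy of the trace file.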

    2.3.3 Corrective action

    If the GPU reference is GPU 3BK08064ABAC01 and the behaviour is as described above, contact the Local TAC, who has to change the GPU and send the faulty GPU to the Alcatel Repair Center, where a fix will be applied.

    The problem is due to a bad detection of the remote inventory by the GPU firmware: in the remote inventory, the firmware checks the combination of functional variant (VF) and realization variant (VR) against ABAA. This is a bug: it should check the (VF) field AB only and ignore the (VR) field, which is AC on this board. As ABAA is not found, the GPU board is not detected as a JBGPU2 (with 128 MB of PPC memory) but, by default, as a JBGPU (with 64 MB of PPC memory). This explains why some PMU packages cannot initialise their memory allocation.
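
    The difference between the described check and the expected one can be illustrated with a short sketch. It is not the firmware code: the function names are hypothetical, and the position of the VF and VR letters (characters 9 to 12 of the reference, as in 3BK08064ABAC01) is an assumption inferred from the reference quoted above.

```shell
# Illustration only; this is not the GPU firmware code.
# Assumption: VF is characters 9-10 and VR is characters 11-12 of the
# reference, e.g. 3BK08064ABAC01 -> VF=AB, VR=AC.
variant_vf() { printf '%s\n' "$1" | cut -c9-10; }
variant_vr() { printf '%s\n' "$1" | cut -c11-12; }

# Behaviour described above: JBGPU2 only when VF+VR is exactly ABAA.
detect_board_as_described() {
    if [ "$(variant_vf "$1")$(variant_vr "$1")" = "ABAA" ]; then
        echo "JBGPU2"
    else
        echo "JBGPU"
    fi
}

# Expected behaviour: only the functional variant (VF) is checked.
detect_board_expected() {
    if [ "$(variant_vf "$1")" = "AB" ]; then
        echo "JBGPU2"
    else
        echo "JBGPU"
    fi
}
```

    With the reference 3BK08064ABAC01, the first function answers JBGPU and the second JBGPU2, which matches the wrong memory size described above.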

    2.3.4 Problem solved

    A hardware correction is under study.

    2.4 GPU connection problem

    2.4.1 Reference FR: none

    2.4.2 Problem description

    The GPU stays in the initial/idle state (Craft terminal site view) and does not connect to the MFS. The LED can be either steady or blinking orange.

    2.4.3 Corrective action

    1. Check that at least one Ethernet link is plugged for that GPU into one of the switches.

    2. Launch a console on that GPU: plug a cable between the debug output of the applique and a COM port (CTRL uu to enter the GPU menu). Type help to list the available commands. The ve / vi commands display the MAC / IP addresses.

    3. If the GPU initialization is stopped at boot request (the GPU does not know its IP address) => there is no connection between the GPU and the control station. Check that the UDP packets corresponding to the boot request are actually sent through one of the interfaces (tu1 or tu2):

    Set up tcpdump on the net:

    cd /dev
    ./MAKEDEV pfilt
    pfconfig +p +c tu1
    tcpdump -i tu1 udp port 68

    (if necessary: lan_config -i tu1 -s 10 -x 0 -a 0   # Set output to 10 Mega)

    Packets sent through port 68 are bootpc packets (client = GPU). Port 67 packets are the bootps answers (server = control station).

    Check the file /etc/bootptab: it should contain a line giving the board IP address according to the Ethernet address:

    gpu1_lg0: tc=DS.default: ha=00809F090804: ip=1.1.1.50: bf=Loader.hex:\

    ha is the Ethernet address; check with the console that the GPU gives the right address.

    4. If the GPU initialisation is stopped at BNP init (on the GPU console, the following message is printed: Wait for answer from GEM since x seconds) => there is communication between the GPU and the control station (it is not an Ethernet problem). It is a known bug (see FR:A13/90904). Workaround: extract the board and plug it in again. (This may have to be done several times.)
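
    The /etc/bootptab check of step 3 can be sketched as a lookup of the IP address declared for a given Ethernet address. The function name bootptab_ip_for_mac is hypothetical; the sketch only assumes the ha= and ip= fields shown in the example line above.

```shell
# Hypothetical helper: print the IP address that a bootptab file assigns
# to a given Ethernet address (ha= field). Separators in the address
# (":", "-", ".") are ignored and the comparison is case-insensitive.
# Prints nothing when no entry matches.
bootptab_ip_for_mac() {
    file=$1
    mac=$(printf '%s' "$2" | tr -d ':.-')
    grep -i "ha=$mac" "$file" | sed -n 's/.*ip=\([0-9.]*\).*/\1/p'
}
```

    For example, bootptab_ip_for_mac /etc/bootptab 00:80:9F:09:08:04 should print the address declared for that board; an empty result means that the Ethernet address displayed by the GPU console has no entry in the file.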

    2.5 GPU problem but alarm is "Failure of a JAETI1 applique"

    2.5.1 Reference FR: 3BKA13FBR177178

    2.5.2 Problem description

    The origin of this issue seems to be a real HW problem (faulty GPU), but the alarm is reported on the wrong board (the problem occurred in B8 MR5 Ed4).

    The GPU's part number is 3BK08064ACAB06 and it is not impacted by any known Quality Alert.

    2.5.3 Corrective action

    Unplug the problematic GPU, then reset the JBET1 on either the left-hand or the right-hand side.

    2.6 GPU switch over no more possible

    2.6.1 Reference FR: 3BKA20FBR149993 and 3BKA20FBR151855

    2.6.2 Problem description

    Sometimes the JBETI becomes blocked, so that it no longer handles any request (remote inventory, GPU reset, GPU switchover), and alarms are neither cleared nor raised, while all LEDs on the JBETI are green: a switchover to the spare GPU is done but no telecom traffic is possible.

    This situation can arise for the following reasons:

    - After a GPU crash: on the GPU software crash, O&M detects the loss of supervision of this GPU board and sends a reset order to this GPU through the JBETI, but as the JBETI is blocked the GPU does not reset/reboot.

    Then, after a while (about 3 minutes), as O&M does not see the GPU rebooting, it concludes that the GPU is failing. O&M therefore sends a GPU switchover order (to route the PCM signal from the applique to the spare GPU) and sends the telecom configuration to the spare GPU. But as the JBETI is blocked, it does not handle the GPU switchover order, so no traffic is possible on the spare GPU.

    - After a manual GPU switchover command sent from the IMT: O&M sends a GPU switchover order (to route the PCM signal from the applique to the spare GPU) and sends the telecom configuration to the spare GPU. But as the JBETI is blocked, it does not handle the GPU switchover order, so no traffic is possible on the spare GPU.

    To confirm that the JBETI is blocked: a remote inventory command from the IMT will fail on time-out.

    2.6.3 Preventive action

    None

    2.6.4 Corrective action

    When the JBETI is suspected to be blocked, reset the active JBETI.

    2.6.4.1 Problem solved

    This problem should be corrected in B9 MR3.

    3 INSTALLATION

    What/behavior | Trouble origin | Fix
    1) Both stations restart | Wrong address declaration | Modify address
    2) rlogin is refused | Impossible rlogin/telnet as root | Modify securettys file
    3) Unix boot impossible | Wrong default kernel | Modify boot_file variable in Firmware

    3.1 Station restart

    3.1.1 Reference FR: None.

    3.1.2 Problem description

    In some cases, when trying to restart one station, both of them restart. This may be because they are declared with a wrong address.

    3.1.3 Corrective action

    Check (and if necessary modify) the firmware configuration of the control stations, which can be accessed through the system console.

    1. From any terminal connected to one of the control stations (STATION_A or STATION_B, by telnet or rlogin), the system console of either control station can be reached by typing either:

    STATION_x> telnet 1.1.1.20 10002     (for the STATION_A system console)

    STATION_x> telnet 1.1.1.20 10003     (for the STATION_B system console)

    2. Type some to get the prompt ; then :

    1) The UNIX login or the shell prompt is displayed: log in as root if necessary, then halt the station cleanly down to the firmware by typing the following command:

    STATION_x> init 0

    When the firmware prompt is displayed ( >>> ), go to step 6).

    2) The machine does not react and the display does not change: force the machine to stop by typing the keystroke sequence:

    rmc

    3) Then, when the RMC prompt is available : RMC>halt in

    4) Then again:

    rmc

    5) Then, when the RMC prompt is available:

    RMC>halt out

    The firmware prompt should then be available.

    6) Type the following command, which gives the values of all firmware variables:

    >>>show *

    (Refer to Appendix B for a list of currently advised values depending on the hardware configuration)

    If values are erroneous, especially pka0_host_id, pkb0_host_id, pkc0_host_id and auto_action, modify them. For example:

    >>>set pkc0_host_id 6

    When all checks and modifications are done, type:

    >>>init

    The machine should now reboot automatically.

    Now release the system console (as it is used from time to time by NECTAR Hardware Management):

    - On a Sun station, by typing:

    ]

    (control and closing square bracket simultaneously), then:

    telnet>quit

    - If the console is accessed from the other station through a PC/NT X terminal, simply close the window.

    (Another method to release the system console is to restart the iolan (see other chapter) from another session.)
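
    When the output of the show * command of step 6) has been saved to a file, the variables named above can be extracted with a small filter. This is a sketch under assumptions: the function name firmware_vars_to_check is hypothetical, and the capture is assumed to hold one "name value" pair per line.

```shell
# Hypothetical helper: from a saved copy of the ">>>show *" output read on
# standard input, keep only the variables this section asks to verify.
# Assumes one "name value" pair per line.
firmware_vars_to_check() {
    awk '$1 ~ /^pk[abc]0_host_id$/ || $1 == "auto_action" { print $1 "=" $2 }'
}
```

    The printed values still have to be compared by hand with the advised values of the appendix; the sketch does not know which values are correct for a given hardware configuration.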

    3.2 Impossible to rlogin/telnet to MFS as root

    3.2.1 Reference FR: None.

    3.2.2 Problem description

    When trying to rlogin to the control station as root, the action is refused by the control station (access denied).

    3.2.3 Corrective action

    The file /etc/securettys is incorrect: it must include a line ptys to allow root login from another terminal.

    Log in as admin on one of the control stations.

    Type telnet 1.1.1.20 10002 (or 10003) to gain access to the system console (see 3.1.3 for more details).

    Log in as root.

    Type echo ptys >> /etc/securettys

    This adds the line ptys to securettys.

    Perform the same action on the other control station.

    Release the terminal or (in case of problem) reboot the iolan (telnet 1.1.1.20, return, su, iolan, reboot).
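
    Running the echo command twice adds the line twice. A variant that only appends the line when it is missing can be sketched as follows; the function name ensure_ptys is hypothetical, and the sketch assumes a grep that supports the POSIX -q and -x options.

```shell
# Hypothetical helper: add the "ptys" line only if it is not already there.
# The file defaults to /etc/securettys; another path can be given to try
# the function on a copy first.
ensure_ptys() {
    file=${1:-/etc/securettys}
    grep -qx 'ptys' "$file" 2>/dev/null || echo ptys >> "$file"
}
```

    Calling ensure_ptys with no argument, as root, has the same effect as the command above on a file that does not yet contain the line, and no effect otherwise.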

    3.3 Unix boot impossible (wrong default kernel)

    3.3.1 Reference FR: None.

    3.3.2 Problem description

    UNIX cannot boot because it cannot open the default kernel 'vmunix.pre_capmn'.

    The following should be displayed at the console:

    ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.f3.f2.f1.f0.ef.df.ee.f4.
    probing hose 0, PCI
    probing PCI-to-EISA bridge, bus 1
    probing PCI-to-PCI bridge, bus 2
    bus 0, slot 5 -- pka -- QLogic ISP10x0
    bus 0, slot 6 -- vga -- S3 Trio64/Trio32
    bus 2, slot 0 -- ewa -- DE500-BA Network Controller
    bus 2, slot 1 -- ewb -- DE500-BA Network Controller
    bus 2, slot 2 -- ewc -- DE500-BA Network Controller
    bus 2, slot 3 -- ewd -- DE500-BA Network Controller
    bus 0, slot 12, function 0 -- pkb -- NCR 53C875
    bus 0, slot 12, function 1 -- pkc -- NCR 53C875
    ed.ec.*** keyboard not plugged in...
    eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
    V5.8-24, built on Jul 11 2001 at 10:57:51
    Memory Testing and Configuration Status
    512 Meg of System Memory
    Bank 0 = 512 Mbytes(128 MB Per DIMM) Starting at 0x00000000
    Bank 1 = No Memory Detected

    CPU 0 booting

    waiting for pkb0.6.0.12.0 to poll...
    (boot dka0.0.0.5.0 -file vmunix.pre_capmn -flags S)
    block 0 of dka0.0.0.5.0 is a valid boot block
    reading 16 blocks from dka0.0.0.5.0
    bootstrap code read in
    Building FRU table
    FRU table size = 0xbed
    base = 1d2000, image_start = 0, image_bytes = 2000
    initializing HWRPB at 2000
    initializing page table at 1ffce000
    initializing machine state
    setting affinity to the primary CPU
    jumping to bootstrap code

    Digital UNIX boot - Mon Nov 1 17:21:23 EST 1999

    can't open vmunix.pre_capmn

    Enter [option_1 ... option_n] Hit to boot default kernel 'vmunix.pre_capmn':

    This is due to a wrong value of the boot_file variable at Firmware level:

    >>>show boot*_file
    boot_file vmunix.pre_capmn
    booted_file vmunix.pre_capmn

    3.3.3 Corrective action

    Modify the boot_file variable:

    >>>set boot_file vmunix

    Verify the boot_file variable:

    >>>show boot_file
    boot_file vmunix

    4 MFS BASED ON RC40

    What/behavior | Trouble origin | Fix
    4) Installation fails during ftp phase | expect does not recognize non-English words | Use a PC configured in English
    5) Installation stops during the inconf phase | Incorrect additional disk configuration | Clear disk label
    6) Inall procedure is stopped in inpatch step | "Halt Button is IN, BOOT NOT POSSIBLE" | At the ">>>" prompt type "boot", then from the PC, in the Expect session, type "inall"
    7) Popup window "perl failed" at step 1 of SWC | /tmp partition is full | Free disk space
    8) Vmunix file access impossible | Impossible UNIX boot | With UNIX CD

    4.1 Installation from a not English PC fails

    4.1.1 Reference FR: 3BKA20FBR166358

    4.1.2 Problem description

    During the ftp phase, the installation PC returns its messages in its own language (Portuguese in the reported case). They are not recognized by the expect script, which only recognizes the standard ftp messages in English and in French.

    4.1.3 Corrective action

    Use a PC configured in English

    4.2 MFS installation failed

    Incorrect additional disk configuration: HP must provide the additional disk without any OS already installed. The internal additional disk must be: not formatted, not partitioned, not labelled.

    This requirement is mandatory for the first factory installation.

    4.2.1 Reference FR: none

    4.2.2 Problem description

    The automatic MFS RC40 installation stops during the inconf phase. This problem can be identified by reading the inconf_STATION_A.log (or inconf_STATION_B.log) log file on the PC used for the installation, in the directory /expect/bin/log.

    If the problem occurs, the following sequence of lines appears in the log file:

    Error: partition /dev/nfm/vol0a and overlapping partition(s) are marked in use in the disklabel.
    Use "disklabel -e" to fix the disklabel if it is improperly labeled.
    start Actif FMA retcode -1 errno 0

    Jun 8 21:16:20 STATION_A FM_Agent_stdalone[63148]: start_active: mount /omcxchg failed ret 256
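
    Looking for this signature in a copy of the log can be sketched as follows, on any machine that has a POSIX shell. The function name inconf_disklabel_error is hypothetical; only the message text comes from the log extract above.

```shell
# Hypothetical helper: succeed (status 0) when an inconf log contains the
# disklabel error described above, fail otherwise.
inconf_disklabel_error() {
    grep -q 'marked in use in the disklabel' "$1"
}
```

    A possible use is: if inconf_disklabel_error inconf_STATION_A.log; then echo "disklabel problem, see 4.2.3"; fi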

    4.2.3 Corrective action

    Here is the way to fix the problem.

    A) Check whether the second disk is formatted with UNIX 4.2BSD by using the command:

    disklabel dsk1

    B) You should have information like:

    # /dev/rdisk/dsk1c:
    type: EIDE
    disk: 6E040L0
    label:
    flags: dynamic_geometry
    bytes/sector: 512
    0 ( 24, 41) FDX
    sectors/track: 63
    tracks/cylinder: 16
    sectors/cylinder: 1008
    cylinders: 16383
    sectors/unit: 78165360
    rpm: 4500
    interleave: 1
    trackskew: 0
    cylinderskew: 0
    headswitch: 0 # milliseconds
    track-to-track seek: 0 # milliseconds
    drivedata: 0

    8 partitions:
    #        size    offset  fstype  fsize  bsize  cpg  # ~Cyl values
    a:     131072         0  unused      0      0       # 0 - 130*
    b:     262144    131072  unused      0      0       # 130*- 390*
    c:   78165360         0  4.2BSD   1024   8192   16  # 0 - 77544
    d:          0         0  unused      0      0       # 0 - 0
    e:          0         0  unused      0      0       # 0 - 0
    f:          0         0  unused      0      0       # 0 - 0
    g:   38886072    393216  unused      0      0       # 390*- 38967*
    h:   38886072  39279288  unused      0      0       # 38967*- 77544

    The fstype column shows that 4.2BSD is present on partition c.

    C) Clear the disk label by using the command:

    disklabel -z dsk1

    D) Set the standard label on dsk1:

    disklabel -wr dsk1

    E) Re-check the result:

    disklabel dsk1

    You should have information like:

    /dev/rdisk/dsk1c:
    type: EIDE
    disk: 6E040L0
    label:
    flags: dynamic_geometry
    bytes/sector: 512
    0 ( 24, 41) FDX
    sectors/track: 63
    tracks/cylinder: 16
    sectors/cylinder: 1008
    cylinders: 16383
    sectors/unit: 78165360
    rpm: 4500
    interleave: 1
    trackskew: 0
    cylinderskew: 0
    headswitch: 0 # milliseconds
    track-to-track seek: 0 # milliseconds
    drivedata: 0

    8 partitions:
    #        size    offset  fstype  fsize  bsize  cpg  # ~Cyl values
    a:     131072         0  unused      0      0       # 0 - 130*
    b:     262144    131072  unused      0      0       # 130*- 390*
    c:   78165360         0  unused   1024   8192   16  # 0 - 77544
    d:          0         0  unused      0      0       # 0 - 0
    e:          0         0  unused      0      0       # 0 - 0
    f:          0         0  unused      0      0       # 0 - 0
    g:   38886072    393216  unused      0      0       # 390*- 38967*
    h:   38886072  39279288  unused      0      0       # 38967*- 77544

    Now the partition c is unused.

    F) Restart the installation from the beginning. On the PC used for the installation, open a DOS session and type:

    cd C:\expect\bin
    tclsh80 clear
    tclsh80 inall
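
    The checks of steps B) and E) amount to reading the fstype column of the partition table. They can be sketched as a filter applied to saved disklabel output; the function name partitions_in_use is hypothetical, and the column layout is assumed to be the one shown above (partition, size, offset, fstype).

```shell
# Hypothetical helper: read disklabel output on standard input and print
# the letter of every partition whose fstype (4th column) is not "unused".
partitions_in_use() {
    awk '$1 ~ /^[a-h]:$/ && $4 != "unused" { sub(":", "", $1); print $1 }'
}
```

    Applied to the output of step B) the filter prints c; applied to the output of step E) it prints nothing, which is the state expected before the installation is restarted.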

    4.3 Inall procedure stopped due to a station in "halt in" state

    The inall procedure stops on station A during the inpatch step.

    4.3.1 Reference FR: 3BKA13FBR166921

    4.3.2 Problem description

    Inpatch step proceeds to a boot of the station. This boot is refused with the following message displayed on screen (also in log file inpatchSTATIONA.log) ">>>boot Halt Button is IN, BOOT NOT POSSIBLE".

    4.3.3 Corrective action

    To continue the installation, the following has been applied successfully:

    - log on to station A through the Iolan
    - at the ">>>" prompt, type "boot"
    - when the station is at UNIX level (the "login:" prompt is displayed), type "inall" in the Expect session on the PC

    4.4 Failure during the SWC from the OMC at step 1/10 (before file transfer)

    4.4.1 Reference FR: none

    4.4.2 Problem description

    Sometimes the /tmp partition is full and the migration is stopped by an error popup on the IMT with the message "perl failed".

    On the OMC:

    - in the log file (for example /alcatel/var/home/axadmin/alcatel/debug/s_Thu May 13 16:24:10 CEST 2004.out), the following message should appear:

    SCGui - NewFTPSoftChange () - IOException raised: java.io.IOException: Not enough space

    - in /var/adm/messages, the following message should appear:

    On May 13 18:03:02 omcr08 unix: WARNING: /tmp: File system full, swap space limit exceeded

    4.4.3 Corrective action

    On the OMC, log in as root and check the available disk space with the 'df -k' command, especially in the /tmp and /alcatel partitions, and clean up if needed.
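The disk space check above can be sketched as a small filter over 'df -k' output. This is only an illustration: the function name full_filesystems and the 90% limit in the usage line are assumptions, not values taken from this guide, and the filter assumes the percentage is the field just before the mount point.

```shell
#!/bin/sh
# Print every file system whose usage is at or above a limit (in percent).
# Reads `df -k` style output on stdin, so it can be rehearsed on saved output.
# Assumption: the percentage is the field just before the mount point,
# which is the last field of each line.
full_filesystems() {
    limit="$1"
    awk -v limit="$limit" '
        NR > 1 && NF >= 2 {
            pct = $(NF - 1)
            sub(/%$/, "", pct)
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0)
                print $NF, pct "%"
        }'
}
```

Possible usage on the OMC: df -k /tmp /alcatel | full_filesystems 90. An empty result means that no listed partition has reached the limit.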

    4.5 Unix boot impossible

    4.5.1 Reference FR: None.

    4.5.2 Problem description

    Unix cannot boot because it cannot access the vmunix file.

    4.5.3 Corrective action

    The following actions have to be performed:

    Insert the Unix 4.0F CDROM.

    (Warning: do not use an MFS+UNIX INSTALL CDROM, which reformats the disks automatically.)

    Then type (use the boot command matching the machine type):

    boot dka400 (AS800)
    boot dqb0 (DS10)
    cd /dev
    ./MAKEDEV rz0
    cd /etc/fdmns
    touch .adfslock_root_domain
    mkdir root_domain
    cd root_domain
    ln -s /dev/rz0a .
    cd /
    mount root_domain#root /mnt

    5 AUTOMATIC SOFTWARE CHANGE

    Note: Prerequisites for SWC are described in the Installation user guide, reference [1].

    What/behavior | Trouble origin | Fix

    1) Bad exec of ins_swcx.sh | Files created by wrong owner | Clean up and restart
    2) rmdir fails during execution of ins_swcx.sh | cygwin is installed on the PC | Rename /usr/bin/rmdir.exe
    3) Error "Temporary local directory error" at starting time | Bad login | Log in as admin and restart
    4) Step 2: "File Access Error" for the /DELIV/dlv.bck file | One /nfs partition not seen on standby CS | Relaunch standby CS with BUI command restart
    5) Step 2: Creation | Root file system (/) of active CS is full | Free disk space
    6) Step 3: Verify | Various origin | Check chapter 5.6
    7) Step 5: Isolation | Various origin | Check chapter 5.7
    8) Step 6: Major version change | New active CS reboots | Check shared disk state
    9) Step 7: strange IMT display | Old version not deleted | Delete old version
    10) CS reboots in loop with reset-code 214 | /etc/sysconfigtab corrupted | Add missing lines

    5.1 Error during execution of ins_swcx.sh

    5.1.1 Reference FR: None.

    5.1.2 Problem description

    A previous software change was performed with the wrong user ID. This creates files owned by the wrong user and prevents the automatic software change from creating its own files.

    All software changes from the OMC must be performed with username = axadmin on the OMC and admin on the IMT; otherwise errors can occur during the SW change.

    5.1.3 Corrective action

    Logged in as root on the OMC, remove the following files and directories if they exist:

    /var/tmp/cw323mt.dll
    /var/tmp/indus_ngp_del_desc_file.pl
    /var/tmp/install.pl
    /var/tmp/paexr.exe
    /var/tmp/perl.dll
    /var/tmp/perl.exe
    /alcatel/var/home/axadmin/alcatel/tmp_mfs (directory => rm -rf)
    /alcatel/tmp_mfs (directory => rm -rf)

    Then log in as axadmin on the OMC and perform the preinstallation (ins_swcx.sh) again.

    Thereafter, reopen the IMT with username = admin.
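Before removing anything, it can help to see which of the items listed in this corrective action are actually present and who owns them. The sketch below only lists; it deletes nothing. The function name list_swc_leftovers and its optional prefix argument are hypothetical and serve only to rehearse the check on a copy of the tree.

```shell
#!/bin/sh
# List which leftover items named in this section exist, with their owner.
# Nothing is deleted. The optional argument is a prefix directory
# (hypothetical, for rehearsal); without it the real paths are checked.
list_swc_leftovers() {
    prefix="${1:-}"
    for item in \
        /var/tmp/cw323mt.dll \
        /var/tmp/indus_ngp_del_desc_file.pl \
        /var/tmp/install.pl \
        /var/tmp/paexr.exe \
        /var/tmp/perl.dll \
        /var/tmp/perl.exe \
        /alcatel/var/home/axadmin/alcatel/tmp_mfs \
        /alcatel/tmp_mfs
    do
        if [ -e "$prefix$item" ]; then
            # Third column of `ls -ld` is the owner.
            owner=$(ls -ld "$prefix$item" | awk '{ print $3 }')
            echo "$item owner=$owner"
        fi
    done
}
```

Any item reported with an owner other than axadmin is a likely trace of a software change run under the wrong user.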

    5.2 rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC

    5.2.1 Reference FR: 3BKA13FBR163888

    5.2.2 Problem description

    rmdir fails with the following error:

    rmdir: option invalide -- q
    Pour en savoir davantage, faites: `rmdir --help'.

    (In English: "rmdir: invalid option -- q. Try `rmdir --help' for more information.")

    5.2.3 Corrective action

    Rename /usr/bin/rmdir.exe to /usr/bin/rmdir.exe.sav, then launch ins_swcx.sh again.
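A cautious way to do the rename from a cygwin shell is sketched below. The helper name sideline_binary is hypothetical; it takes the path as an argument so that it can be tried on a scratch file first, and it refuses to overwrite an existing .sav copy.

```shell
#!/bin/sh
# Rename a binary to <name>.sav, refusing to overwrite an existing backup.
sideline_binary() {
    target="$1"
    if [ ! -e "$target" ]; then
        echo "not found: $target" >&2
        return 1
    fi
    if [ -e "$target.sav" ]; then
        echo "backup already exists: $target.sav" >&2
        return 1
    fi
    mv "$target" "$target.sav"
}
```

Usage for this case: sideline_binary /usr/bin/rmdir.exe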

    5.3 Error "Temporary local directory error" on IMT during step 0

    5.3.1 Reference FR: None.

    5.3.2 Problem description

    When trying to start the automatic software change, the error "Temporary local directory error" is displayed.

    5.3.3 Corrective action

    Log in as the admin user on the IMT and OMC-R in order to perform the automatic software change.

    5.4 Error "File Access Error" with dlv.bck always appears when doing SW replacement

    5.4.1 Reference FR: 3BKA20FBR150527

    5.4.2 Problem description

    When performing a SW replacement, step 1/10 of the procedure completes, but at step 2/10 the IMT displays the error message "File Access Error" for the file /DELIV/dlv.bck.

    The cause is that one /nfs partition is not seen on the standby station, so /DELIV cannot be seen on both stations.

    5.4.3 Corrective action

    Connect to the standby station and type:

    df -k

    The output must contain the following lines for the xxx.nfs partitions:

    secure_serveur.100:/var/nse/mnt/secure_serveur/RESERVED 102400 33 97736 1% /var/nse/mnt/secure_serveur/RESERVED.nfs

    secure_serveur.100:/var/nse/mnt/secure_serveur/BACKUP 102400 16 95008 1% /var/nse/mnt/secure_serveur/BACKUP.nfs

    secure_serveur.100:/var/nse/mnt/secure_serveur/DELIV 512000 101921 403624 21% /var/nse/mnt/secure_serveur/DELIV.nfs

    secure_serveur.100:/var/nse/mnt/secure_serveur/RESULT 307200 5372 295144 2% /var/nse/mnt/secure_serveur/RESULT.nfs

    secure_serveur.100:/var/nse/mnt/secure_serveur/omcxchg 102400 585 95680 1% /var/nse/mnt/secure_serveur/omcxchg.nfs

    secure_serveur.100:/var/nse/mnt/secure_serveur/spdata 65536 7434 52232 13% /var/nse/mnt/secure_serveur/spdata.nfs

    If these lines are missing from the output, you must relaunch the standby station.

    On the IMT, in the BUI->request window, type the following command:

    if the standby station is STATION_A: action sta [PILOT/A] (restart());

    if the standby station is STATION_B: action sta [PILOT/B] (restart());

    Then check in the Nectar view that the standby station has come up.

    Roll back to step 1 and try the SW replacement again.
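The comparison of the 'df -k' output with the expected list can be sketched as follows. The six mount points are those shown in the expected result above; the function name missing_nfs_mounts is hypothetical.

```shell
#!/bin/sh
# Report which of the expected *.nfs mount points are absent from
# `df -k` output read on stdin. No output means all six are mounted.
missing_nfs_mounts() {
    df_output=$(cat)
    for name in RESERVED BACKUP DELIV RESULT omcxchg spdata; do
        mount_point="/var/nse/mnt/secure_serveur/$name.nfs"
        if ! printf '%s\n' "$df_output" | grep -F -q "$mount_point"; then
            echo "$mount_point"
        fi
    done
}
```

Possible usage on the standby station: df -k | missing_nfs_mounts. If DELIV.nfs is printed, the standby station does not see /DELIV and has to be relaunched as described above.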

    5.5 Error at step 2/10 (Creation)

    5.5.1 Reference FR: None.

    5.5.2 Problem description

    During the software replacement, the CREATION phase is stopped by errors displayed in a popup window with the following text:

    Generic error enca_ope_failed ensw file check error PILOT/A - The file copy from delivery fileset to target fileset failed : fsync error

    This means that the root file system ( / ) of the active station is probably full (100%).

    5.5.3 Corrective action

    First open an xterm and type df -k. Check that the /usr and /var directories are not full (less than 85% used). If they are, go to:

    /usr/mfs/log => remove all big trace files (*.old, and TraceGOM if it exists)

    /var/adm/crash => remove vmunix and vmcore files (\rm vm*)

    /var/adm/nectar/crash => remove all Dump files and core files (\rm core*, \rm Dump*)

    Then check the remaining space again with the df -k command. Do not begin the migration if the remaining space is too small.

    See the following directories:

    /var/adm/nectar/log

    /usr/mfs/log

    /var/adm/nectar/crash

    /RESULT

    Also run the quotacheck command to report discrepancies between the calculated and recorded disk quotas:

    On the active Control Station:

    quotacheck -v /var
    quotacheck -v /usr
    quotacheck -v /
    quotacheck -v /DELIV
    quotacheck -v /spdata
    quotacheck -v /omcxchg
    quotacheck -v /RESULT

    On the standby Control Station:

    quotacheck -v /var
    quotacheck -v /usr
    quotacheck -v /
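To see what the cleanup of section 5.5.3 would touch before removing anything, the candidate files can be listed with their size. This is a sketch only: the function name list_cleanup_candidates and its optional prefix argument are hypothetical, and the pattern TraceGOM* is an assumption about how the TraceGOM file is named on disk.

```shell
#!/bin/sh
# List, without deleting, the files named as cleanup candidates in this
# section, largest first, with their size in kilobytes.
# The optional argument is a prefix directory (hypothetical, for rehearsal).
list_cleanup_candidates() {
    root="${1:-}"
    {
        find "$root/usr/mfs/log" -type f \( -name '*.old' -o -name 'TraceGOM*' \) 2>/dev/null
        find "$root/var/adm/crash" -type f -name 'vm*' 2>/dev/null
        find "$root/var/adm/nectar/crash" -type f \( -name 'core*' -o -name 'Dump*' \) 2>/dev/null
    } | while IFS= read -r file; do
        du -k "$file"
    done | sort -rn
}
```

Run as root on the station concerned, then remove the reported files by hand once the list has been reviewed.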

    5.6 Error at step 3/10 (Verify)

    5.6.1 Reference FR: 3BKA20FBR099035 = 3BKA13FBR102355

    5.6.2 Problem description

    The IMT pops up an alert window with the following text: "Error occur , see log file".

    This means that the software replacement is stopped due to errors found at the VERIFY phase.

    5.6.3 Corrective action

    Open the BUI reception view on the IMT to see more details.

    Only the four most common cases are described below:

    5.6.3.1 Many errors found

    The installation was probably performed incorrectly. The best course in this case is to remove and destroy the version by clicking the BACK button on the IMT several times, then to perform the automatic software change again.

    5.6.3.2 bad state error

    Example:

    > --- Software management error ---
    > Failed on request: action version[MFSSAT05_06A](verify());
    > Message for request #63 => ACTION_RSP version [MFSSAT05_06A] ( verify(),
    /* Errors : ***************/
    generic_err = ENCA_MAJOR_ERROR : A major error occurred during the action ...,
    specific_err = ENSW_CHECKSUM_ERROR : component checksum error,
    text_err = "PILOT/A - /usr/mfs/bin/mfsQ3Agt" ) ;

    > _____ Abortive session for request #63 => ACTION_RSP version [MFSSAT05_06A] ( verify(),
    /* Errors : ***************/
    generic_err = ENCA_OPE_FAILED : the operation cannot be executed,
    specific_err = ENCM_PF_VERSION_BAD_STATE : The specified version is in a bad state for this request,
    text_err = "PILOT/A - /usr/mfs/bin/mfsQ3Agt" ) ;
    > --- Software management error end ------

    Roll back to step two of the software change, then perform the software change again.

    5.6.3.3 Checksum errors on the MIB files

    These files are located in the /spdata directory.

    This procedure is to be used only with MIB files (i.e. files located in the /spdata directory), as the final purpose is to get rid of the MIB checksum (when platforms are installed with version MFSSAT05.05L and later).

    Example:

    Message for request #3 => ACTION_RSP version [MFSSAT05_05L] ( verify(),
    /* Errors : ***************/
    generic_err = ENCA_MAJOR_ERROR : A major error occurred during the action ...,
    specific_err = ENSW_CHECKSUM_ERROR : component checksum error,
    text_err = "PILOT/B - /spdata/OAMMFSB701AT0511A/nom_mib/CM_cfg.mib checksum is f775f6be3ae87e18 " ) ;

    Assume that the VALID version (the currently running version) is MFSSAT05_05J.

    1) Open a terminal on STATION_B (corresponding to PILOT/B) and log in as root.

    2) Find the version descriptor of the valid version: cd /usr/opt/MFSSAT05_05J

    3) Edit the version descriptor (you can save a copy of the original file): vi vdesc.mfs

    4) Locate the erroneous component in the descriptor:

    /CM_cfg.mib

    5) Look at the "record" (closed by a ";") dedicated to the component:

    /* Component record for "/spdata/nom_mib/CM_cfg.mib" */
    /* category */ OFF_SITE,
    /* targetName */ "/spdata/OAMMFSB701AT0510A/nom_mib/CM_cfg.mib",
    /* dispatchingMode */ LOCAL,
    /* identification "/main/mfsb7/mfsb7.01/8" */ ,
    /* checksum */ "6f8b7c9cab54fedf",
    /* adminName */ ,
    /* writingMode */ NOT_ATOMIC,
    /* appliCategory */ ,
    /* usageName */ "/spdata/nom_mib/CM_cfg.mib" ;

    You can check that the checksum value in the checksum field does not match the one computed by Software Management (the value given in the error message).

    6) Remove the checksum value in the checksum field, here the string "6f8b7c9cab54fedf", so that the record now looks like the following (do not remove any existing trailing ^M):

    /* Component record for "/spdata/nom_mib/CM_cfg.mib" */
    /* category */ OFF_SITE,
    /* targetName */ "/spdata/OAMMFSB701AT0510A/nom_mib/CM_cfg.mib",
    /* dispatchingMode */ LOCAL,
    /* identification "/main/mfsb7/mfsb7.01/8" */ ,
    /* checksum */ ,
    /* adminName */ ,
    /* writingMode */ NOT_ATOMIC,
    /* appliCategory */ ,
    /* usageName */ "/spdata/nom_mib/CM_cfg.mib" ;

    7) Save and exit the file:

    :x

    Note that the modifications must be performed for every affected component and every affected station.

    8) Click twice on the BACK button of the version change follow-up window, until the version is removed.

    9) Restart the version change from the first IMT step:

    1. Click on the menu bar option "software change", list option "version change". A version change window appears displaying:

    "do you want to install MFSSAT05.05L version ?"

    2. Click Yes.

    3. Then follow the standard version change procedure.
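When several components or stations are affected, the station, file and computed checksum can be pulled out of the error text instead of being copied by hand. The sketch below assumes that the text_err field always has the shape shown in the example of this section; the function name parse_checksum_error is hypothetical.

```shell
#!/bin/sh
# Extract station, file path and computed checksum from the text_err
# field of an ENSW_CHECKSUM_ERROR message read on stdin.
# Assumption: the field has the shape shown in the example:
#   text_err = "PILOT/<X> - <path> checksum is <hex> "
parse_checksum_error() {
    sed -n 's|.*text_err = "\(PILOT/[A-Z]\) - \([^ ]*\) checksum is \([0-9a-f]*\) *".*|\1 \2 \3|p'
}
```

The third field of the result is the value computed by Software Management; it is the one to compare with the checksum field of the component record in vdesc.mfs.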

    5.6.3.4 Checksum errors on local files

    Example:

    Message for request #3 => ACTION_RSP version [MFSSAT05_05L] ( verify(),
    /* Errors : ***************/
    generic_err = ENCA_MAJOR_ERROR : A major error occurred during the action ...,
    specific_err = ENSW_CHECKSUM_ERROR : component checksum error,
    text_err = "PILOT/A - /usr/opt/OAM31A/mfs/bin/mfsQ3Agt checksum is 5a03d81363ec510a " ) ;

    Abortive session for request #3 => ACTION_RSP version [MFSSAT05_05L] ( verify(),
    /* Errors : ***************/
    generic_err = ENCA_OPE_FAILED : OperationFailed,
    specific_err = ENSW_CHECKSUM_ERROR : component checksum error,
    text_err = "PILOT/A - /usr/opt/OAM31A/mfs/bin/mfsQ3Agt checksum is 5a03d81363ec510a " ) ;

    In this example, there is a checksum error on the Q3 agent binary on STATION_A. The correct binary can then be restored from STATION_B to STATION_A.

    It is also possible to get it from the /DELIV partition, where it has previously been downloaded by the software change process.

    Find it using (for example):

    find /DELIV -name mfsQ3Agt

    To know where to replace the erroneous file of the new version, type the following commands:

    cd /usr/opt/new_version_name
    grep mfsQ3Agt vdesc.mfs

    On a line beginning with the keyword targetName (written /* targetName */), the searched file (here mfsQ3Agt) appears as the last element of an absolute path. This absolute path is the location where the current file must be replaced by the new one. A sample line follows:

    /* targetName */ "/usr/opt/OAMMFSB701AT0513A/mfs/bin/mfsQ3Agt",
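The lookup of the target path can be sketched as a filter over the version descriptor. It assumes that each targetName entry has the shape of the sample line above; the function name target_paths_for is hypothetical.

```shell
#!/bin/sh
# Print the targetName path(s) whose last element is the given file name,
# from a version descriptor read on stdin.
# Assumption: each entry looks like the sample line,
#   /* targetName */ "<absolute path>",
target_paths_for() {
    wanted="$1"
    grep -F '/* targetName */' |
        sed -n 's|.*/\* targetName \*/ *"\([^"]*\)".*|\1|p' |
        while IFS= read -r path; do
            case "$path" in
                */"$wanted") echo "$path" ;;
            esac
        done
}
```

Possible usage: target_paths_for mfsQ3Agt < /usr/opt/new_version_name/vdesc.mfs, where new_version_name stands for the directory of the new version as in the commands above.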

    5.7 Error at step 5/10 (Isolation)

    5.7.1 Reference FR: 3BK - A13FBR096085 / 105356 / 112480 - A20FBR096035 / 105055 / 129810 / 139842 - A23FBR174097

    5.7.2 Save traces

    On both stations, save all userfiles, nse logs and crash files related to this problem (refer to chapter 7.8) for analysis if required (clean up previous crash files if needed).

    5.7.3 Problem description

    The IMT displays the following error message: "see logs for more details". On the IMT, click on BUI->ReceptionView.

    A window opens displaying the reason why the command was aborted, which can be:

    Generic error: ENCA_OPE_FAILED : the operation cannot be executed
    Specific error: ENCM_PF_FM_SPLIT : Split of the shared mirrored disks refused

    The ISOLATE phase cannot be performed because of a bad state of the shared disk.

    5.7.4 Specific case for 3BKA20FBR129810 : Problem occurs while Backup Server is down.

    When the BUI reception view shows the error logs described in the section "Problem description", type this command on the master Control Station:

    ps -ef | grep mfs

    (IMT shows the master Control Station : Connected to: STATION_X, Station X is master)

    The result should be:

    root 749 1 0.0 15:18:07 ?? 0:00.02 /usr/mfs/bin/DUMP_SRV
    root 949 1 0.2 15:18:22 ?? 1:53.37 sh /usr/mfs/bin/start_http start
    root 976 1 0.0 15:18:24 ?? 0:01.23 /usr/mfs/bin/TRACE_SRV
    root 1927 1319 0.0 15:23:11 ?? 0:00.08 /usr/mfs/bin/GAM 6300
    root 1940 1319 0.0 15:23:18 ?? 0:00.55 /usr/mfs/bin/GEM 6298
    root 1941 1319 0.0 15:23:18 ?? 0:00.05 /usr/mfs/bin/SCA 6297
    root 2046 1319 0.0 15:23:21 ?? 0:05.49 /usr/mfs/bin/GOM 6296
    root 2047 1319 0.0 15:23:21 ?? 0:00.35 /usr/mfs/bin/GPM 6295
    root 2065 1319 0.0 15:23:28 ?? 0:00.04 /usr/mfs/bin/GLM 6294
    root 2075 1319 0.0 15:23:30 ?? 0:01.67 /usr/mfs/bin/mfsQ3Agt 6291 -compkg SOCKET
    root 2082 1319 0.0 15:23:30 ?? 0:00.03 /usr/mfs/bin/CRAFT_SUP 6290
    root 5432 1 0.0 15:44:05 ?? 0:00.02 /usr/mfs/bin/PatchSrv
    root 5452 1 0.0 15:44:05 ?? 0:00.06 /usr/mfs/bin/BckpRstr

    If the line

    root 5452 1 0.0 15:44:05 ?? 0:00.06 /usr/mfs/bin/BckpRstr

    is missing, the Backup Server is down on your master Control Station.

    In that case, perform the operation from step "Corrective action: first step (switch-over)" to step "Corrective action: second step (install_lsm)" (see next chapter). Then, if necessary, relaunch the MFS SW replacement from step 2/10.
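The presence of the Backup Server can be tested without reading the whole process list. The sketch below matches the binary path shown in the listing above as a whole field, so the grep command itself is not counted; the function name backup_server_running is hypothetical.

```shell
#!/bin/sh
# Succeed if the Backup Server process appears in `ps -ef` output read
# on stdin; fail otherwise.
backup_server_running() {
    awk '
        { for (i = 1; i <= NF; i++) if ($i == "/usr/mfs/bin/BckpRstr") found = 1 }
        END { exit(found ? 0 : 1) }'
}
```

Possible usage on the master Control Station: ps -ef | backup_server_running || echo "Backup Server is down"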

    5.7.5 Specific case for 3BKA13FBR175829: broken shared disk

    If userfile.log contains something like this:

    Nov 9 02:14:42 STATION_A /usr/nectar/bin/UR_GCM[1727]: 769:NEC_GP:10001:GCM:10000:platform:NECTAR:10006:isolate:ncagadispa.c:843:
    Nov 9 02:14:42 STATION_A [1517]: NEC_CM:dm:NOTICE:61:DM SUSPEND: executed without delay
    Nov 9 02:17:43 STATION_A /usr/nectar/bin/CMA[1562]: 173:NEC_GP:10000:8:ncma_fsub1.c:2992:ncm_fsubs_fmsplit : error 60 in svcSendMsg - CMA_EXPORT_DEV - couple = 0 - hostname = STATION_B *** the timeout has expired ***
    Nov 9 02:17:43 STATION_A /usr/nectar/bin/UR_GCM[1727]: 173:NEC_GP:10000:14:ncmg_VCHANGE_SPLIT.c:585:ncm_VCHANGE_SPLIT_SPLITRSP/585/Impossible FM operation on shared mirrored disks (split)

    And:

    a shared disk is seen as failed in the Nectar view (diskA (dudisk1Bus2) in the example)

    And:

    the disk_recover action cannot be executed correctly.

    That means a shared disk is probably broken (diskA / rz16 in the example) and must be replaced.

    If disk_recover is refused after the shared disk has been changed, you have to perform an install_lsm (see the later chapter "Corrective action: second step (install_lsm)").

    5.7.5.1 Check if a shared disk re-synchronization is in progress

    Type the following command on the active station: ps -ef | grep fsgen

    If no synchronization process is in progress, the command returns nothing (except the grep process itself).

    If an fsgen process is displayed, wait and retry the operation a few minutes later.

    This operation can take as long as 30 minutes.

    If no synchronization process has been observed, go to the next section.
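The wait can be scripted as a simple polling loop. This is a sketch under assumptions: the function names are hypothetical, the process is recognised only by the string fsgen as in the command above, and the delay and number of polls are chosen by the operator.

```shell
#!/bin/sh
# Succeed if `ps -ef` output read on stdin shows an fsgen process,
# ignoring the grep command itself.
resync_in_progress() {
    grep fsgen | grep -v grep | grep -q .
}

# Poll until no fsgen process is left, or give up after max_tries polls.
# $1: command producing the process list (normally "ps -ef")
# $2: delay between polls, in seconds
# $3: maximum number of polls
wait_for_resync() {
    list_cmd="$1"; delay="$2"; max_tries="$3"
    tries=0
    while $list_cmd | resync_in_progress; do
        tries=$((tries + 1))
        [ "$tries" -ge "$max_tries" ] && return 1
        sleep "$delay"
    done
    return 0
}
```

For example, wait_for_resync "ps -ef" 60 30 polls once a minute for up to 30 minutes, which matches the maximum duration given above, and returns non-zero if fsgen is still running at the end.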

    5.7.5.2 Check the LSM configuration

    Type the following command on the active control station (the length of diskA and diskB may differ from the values below, but must be identical on both logical disks):

    /usr/sbin/volprint -Agts

    It should return the following answers:

    a) On an AS800 machine:

    Disk group: rootdg

    TYPE NAME ASSOC KSTATE LENGTH COMMENT
    dg rootdg rootdg - -
    dm rz0h rz0h - 1024

    Disk group: nsedg

    TYPE NAME ASSOC KSTATE LENGTH COMMENT
    dg nsedg nsedg - -
    dm diskA rz9 - 8379936
    dm diskB rz17 - 8379936
    sd bcl1 pl1 - 1
    sd bcl2 pl2 - 1
    sd bcl3 pl3 - 1
    sd bcl4 pl4 - 1
    sd bcl5 pl5 - 1
    sd bcl6 pl6 - 1
    sd sd1 pl1 - 131072
    sd sd2 pl2 - 204800
    sd sd3 pl3 - 614400
    sd sd4 pl4 - 1024000
    sd sd5 pl5 - 204800
    sd sd6 pl6 - 204800
    sd bcl1m pl1m - 1
    sd bcl2m pl2m - 1
    sd bcl3m pl3m - 1
    sd bcl4m pl4m - 1

    sd bcl5m pl5m - 1
    sd bcl6m pl6m - 1
    sd sd1m pl1m - 131072
    sd sd2m pl2m - 204800
    sd sd3m pl3m - 614400
    sd sd4m pl4m - 1024000
    sd sd5m pl5m - 204800
    sd sd6m pl6m - 204800
    plex pl1 vol1 ENABLED 131072 vol1
    plex pl1m vol1 ENABLED 131072 vol1
    plex pl2 vol2 ENABLED 204800 vol2
    plex pl2m vol2 ENABLED 204800 vol2
    plex pl3 vol3 ENABLED 614400 vol3
    plex pl3m vol3 ENABLED 614400 vol3
    plex pl4 vol4 ENABLED 1024000 vol4
    plex pl4m vol4 ENABLED 1024000 vo