
Generated by Jive SBS on 2012-03-12-06:00

Rapid increasing er_bad_os at 8 Gbit...

AndyAtEon 10 posts since

Jul 12, 2005

Hi,

I have seen a very fast increasing error counter, er_bad_os, on storage ports. Changing the fill word mode does not fix the issue. The storage vendor recommends setting idle as the fill word. We are using FOS 6.2 and 6.3.

I would like to understand what is going wrong and what has to be changed to fix the issue: switch or HBA firmware.

Thanks

Tags: 8gbit, dcx-4s, storage, dcx

hemant 422 posts since Mar 3, 2010 1. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 9, 2010 10:21 AM

This is also a media-related issue. Change the fibre cable, SFP or HBA...

andreas.bergelt 600 posts since Apr 12, 2010 2. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 12, 2010 1:17 AM

in response to: hemant

Thanks hemant,

in our case it is not related to media, cable or SFP.

It is related to 8 Gbit compatibility issues. But I don't understand what is going wrong between switch and device. I would like to get a more detailed description of why neither idle nor ARBFF works correctly.


Andreas



in response to: andreas.bergelt

Hi,

Please do a portstatsclear and check with portstatsshow and porterrshow whether the counter is still increasing. You may have to upgrade the driver and firmware version of the HBAs.

andreas.bergelt 600 posts since Apr 12, 2010 4. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 13, 2010 12:56 PM


Thanks,

for your help in finding a solution, but my question was to get an explanation for this behavior. By the way, an HBA firmware update on a storage array is not possible. The storage array runs the latest code.

I would like to know if other customers have had the same issue with 8 Gbit storage ports on 8 Gbit Brocade switches.



This parameter is platform/port specific. Did you do a portstatsclear and check whether it is increasing or not? Check the compatibility of the HBA driver and firmware version with the storage microcode. You have to look at other parameters as well, such as er_enc_out; if all these values are increasing after a portstatsclear, then you have to change the cable and SFP. BTW, which storage is this with 8 Gbps CHA ports? There is no such compatibility issue. The SAN switch works in auto-negotiation (AN) mode.






All other error counters are at zero, so there is no cabling issue. If I configure the storage port fixed to 4 Gbit everything is fine; only at 8 Gbit speed can I see the increasing er_bad_os. Hitachi storage is affected: USP-V and AMS 2500 with 8 Gbit ports. As mentioned above, FOS 6.3.1a is running on the switch and the latest code on the array. Only the switch port where the array is connected is affected. We have EMULEX LPe12000 and Brocade 815 on the server side without any issues. Data traffic can pass the storage without problems. But I can see the counter increase very quickly...

xwdees02_a1_ds:FID128:A15710> portcfgshow 1/9
Area Number: 9
Speed Level: AUTO(HW)
Fill Word: 0(Idle-Idle)

....

xwdees02_a1_ds:FID128:A15710> statsclear
xwdees02_a1_ds:FID128:A15710> portstatsshow 1/9
stat_wtx 0 4-byte words transmitted
stat_wrx 0 4-byte words received
stat_ftx 0 Frames transmitted
stat_frx 0 Frames received
stat_c2_frx 0 Class 2 frames received
stat_c3_frx 0 Class 3 frames received
stat_lc_rx 0 Link control frames received
stat_mc_rx 0 Multicast frames received
stat_mc_to 0 Multicast timeouts
stat_mc_tx 0 Multicast frames transmitted
tim_rdy_pri 0 Time R_RDY high priority
tim_txcrd_z 0 Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc 0- 3: 0 0 0 0
tim_txcrd_z_vc 4- 7: 0 0 0 0
tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0
er_enc_in 0 Encoding errors inside of frames
er_crc 0 Frames with CRC errors
er_trunc 0 Frames shorter than minimum
er_toolong 0 Frames longer than maximum
er_bad_eof 0 Frames with bad end-of-frame
er_enc_out 0 Encoding error outside of frames
er_bad_os 409610422 Invalid ordered set
er_rx_c3_timeout 0 Class 3 receive frames discarded due to timeout
er_tx_c3_timeout 0 Class 3 transmit frames discarded due to timeout
er_c3_dest_unreach 0 Class 3 frames discarded due to destination unreachable
er_other_discard 0 Other discards
er_type1_miss 0 frames with FTB type 1 miss
er_type2_miss 0 frames with FTB type 2 miss
er_type6_miss 0 frames with FTB type 6 miss
er_zone_miss 0 frames with hard zoning miss
er_lun_zone_miss 0 frames with LUN zoning miss
er_crc_good_eof 0 Crc error with good eof
er_inv_arb 0 Invalid ARB
open 0 loop_open
transfer 0 loop_transfer
opened 0 FL_Port opened
starve_stop 0 tenancies stopped due to starvation
fl_tenancy 0 number of times FL has the tenancy
nl_tenancy 0 number of times NL has the tenancy
zero_tenancy 0 zero tenancy

Wait some seconds...

xwdees02_a1_ds:FID128:A15710> portstatsshow 1/9
stat_wtx 0 4-byte words transmitted
stat_wrx 0 4-byte words received
stat_ftx 0 Frames transmitted
stat_frx 0 Frames received
stat_c2_frx 0 Class 2 frames received
stat_c3_frx 0 Class 3 frames received
stat_lc_rx 0 Link control frames received
stat_mc_rx 0 Multicast frames received
stat_mc_to 0 Multicast timeouts
stat_mc_tx 0 Multicast frames transmitted
tim_rdy_pri 0 Time R_RDY high priority
tim_txcrd_z 0 Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc 0- 3: 0 0 0 0
tim_txcrd_z_vc 4- 7: 0 0 0 0
tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0
er_enc_in 0 Encoding errors inside of frames
er_crc 0 Frames with CRC errors
er_trunc 0 Frames shorter than minimum
er_toolong 0 Frames longer than maximum
er_bad_eof 0 Frames with bad end-of-frame
er_enc_out 0 Encoding error outside of frames
er_bad_os 716822618 Invalid ordered set
er_rx_c3_timeout 0 Class 3 receive frames discarded due to timeout
er_tx_c3_timeout 0 Class 3 transmit frames discarded due to timeout
er_c3_dest_unreach 0 Class 3 frames discarded due to destination unreachable
er_other_discard 0 Other discards
er_type1_miss 0 frames with FTB type 1 miss
er_type2_miss 0 frames with FTB type 2 miss
er_type6_miss 0 frames with FTB type 6 miss
er_zone_miss 0 frames with hard zoning miss
er_lun_zone_miss 0 frames with LUN zoning miss
er_crc_good_eof 0 Crc error with good eof
er_inv_arb 0 Invalid ARB
open 0 loop_open
transfer 0 loop_transfer
opened 0 FL_Port opened
starve_stop 0 tenancies stopped due to starvation
fl_tenancy 0 number of times FL has the tenancy
nl_tenancy 0 number of times NL has the tenancy
zero_tenancy 0 zero tenancy
xwdees02_a1_ds:FID128:A15710>



As you can see everything is fine.
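For anyone who wants to see which counters actually move between two samples, here is a minimal Python sketch. It assumes the portstatsshow output has been saved as text with one counter per line; the helper names parse_portstats and counter_deltas are made up for illustration and are not part of FOS. It does not handle counter wrap-around.

```python
def parse_portstats(text):
    """Return {counter_name: value} for lines shaped like 'name value description'."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip prompts, blank lines and per-VC rows such as 'tim_txcrd_z_vc 0- 3: 0 0 0 0'.
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def counter_deltas(before, after):
    """Return only the counters whose value changed between two samples."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] != before[name]}


# Two shortened samples using the er_bad_os values posted in this thread.
first = parse_portstats("""
er_enc_out 0 Encoding error outside of frames
er_bad_os 409610422 Invalid ordered set
tim_txcrd_z_vc 0- 3: 0 0 0 0
""")
second = parse_portstats("""
er_enc_out 0 Encoding error outside of frames
er_bad_os 716822618 Invalid ordered set
tim_txcrd_z_vc 0- 3: 0 0 0 0
""")
print(counter_deltas(first, second))
```

Dividing the delta by the number of seconds between the two samples gives the rate; the thread only says "some seconds", so no rate is computed here.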

If I create some load on the port there are also no increasing error counts. These "er_bad_os" errors are not visible to the server and also not visible to the storage array. Hitachi advised us to configure the ports with the settings shown above.

We have no transport errors currently.

I am looking for an explanation of this behavior.

Thanks,

Andreas



Yes, it is definitely strange. Have you tried moving the connection from this port to other ports? You are saying that putting load on the port does not increase the value. Is the server HBA idle? What else did the HITACHI people say? We can also ignore this value unless you face congestion. This parameter increases only due to a server reboot, bad cable or SFP. Also do a portstatsclear, not only a statsclear. Is there any error in porterrshow? What is showing in errdump? I do not think this is a compatibility issue at the CHA port or switch port. You also say that when you set the port to 4 Gbps, it is OK. Check portshow and portloginshow as well.



Have you tried moving the connection from this port to other ports --> Yes, the same issue.

You are saying that putting load on the port does not increase the value --> I am sorry, my English is not very good. The er_bad_os counter increases at the same speed on the affected storage port. The data flow had no problems.

Is there any error in porterrshow --> No.

You also say that when you set the port to 4 Gbps, it is OK --> Yes, correct.

Check portshow and portloginshow as well --> No problems, everything is fine.




What else did the HITACHI people say --> Ignore the counter.

But this does not look like a well tested and compatible product combination. It looks like Brocade is not talking to the rest of the world to make sure that new technology works as planned...

Can you ask Brocade engineering what is going wrong? I think you are working for Brocade, correct?

I would like to understand what the problem is and who can fix it.

Thanks,

Andreas



Hi,

No, I am not working for Brocade, but I am BCFP, BCSD, BCFD and BCSM (4 & 8 Gbps) certified and have been working on Brocade director-class products, with a huge installation of 4000 SAN switch ports, for 4.6 years. I have seen these things. I can only say: ignore this.

Let me describe:

In portstatsshow we see the port hardware statistics counters. Some counters are platform and port specific and are displayed only on those platforms and ports.




This parameter indicates that some configuration parameter is not set correctly. That can have several reasons...

I have seen my friends facing the same: if we change the speed from 8 to 4 Gbit the counter stops; the other solution is to change the fill word to ARBFF (ARBFF in link init, ARBFF as fill word).

Since ordered sets do not contain data, this has nothing to do with the data flow. So we can ignore these.

Remember, ordered sets are purely within the SAN; the OS will never see them.

Have you checked the compatibility matrices between your server/HBA and the switch/FOS level, and between the server/HBA and the storage?

About ordered sets: The round trip delay is measured by transmitting a particular Primitive Signal. A Primitive Signal is an Ordered Set used to indicate an event. An Ordered Set is a 4-byte Transmission Word which has the Special Character as its first Transmission Character. An Ordered Set may be a Frame Delimiter, a Primitive Signal, or a Primitive Sequence. Ordered Sets are used to distinguish Fibre Channel control information from data. A Transmission Word is a string of four consecutive Transmission Characters. A Transmission Character is a (valid or invalid) 10-bit character transmitted serially over the fibre. Valid Transmission Characters are determined by the 8B/10B encoding specification. The Special Character is a special 10-bit Transmission Character which does not have a corresponding 8-bit value, but is still considered valid. The Special Character is used to indicate that a particular Transmission Word is an Ordered Set. The Special Character is the only Transmission Character to have five 1's or 0's in a row. The Special Character is also referred to as K28.5 when using K/D format. For additional explanation of these various terms, one may refer to the Fibre Channel standards, particularly FC-PH, which is ANSI publication X3.230, and is hereby incorporated by reference.
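To make the "five 1's or 0's in a row" property concrete, here is a small Python sketch. It uses the two standard 10-bit 8b/10b encodings of K28.5 (one per running disparity); the helper name longest_run is just for illustration.

```python
# The two 10-bit encodings of K28.5 in 8b/10b, one per running disparity,
# written in transmission order (abcdei fghj).
K28_5_RD_NEG = "0011111010"
K28_5_RD_POS = "1100000101"


def longest_run(bits):
    """Length of the longest run of identical bits in a bit string."""
    best = current = 1
    for prev, cur in zip(bits, bits[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best


for name, bits in (("RD-", K28_5_RD_NEG), ("RD+", K28_5_RD_POS)):
    print(name, bits, "longest run:", longest_run(bits))
```

Both encodings contain a run of five identical bits, which is what lets a receiver find the word boundary in the serial bit stream.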

We have also seen this value increase when there is no data transmission between the HBA and the storage port.



If HITACHI people have said to ignore this then they must have queried Brocade.

TechHelp24 2,605 posts since Feb 23, 2004 10. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 16, 2010 12:32 AM


Andreas, they already gave themselves the answer...

--->>>in our case it is not related to media, cable or SFP.

"It is related to 8 Gbit compatibility issues."

TechHelp24 2,605 posts since Feb 23, 2004 11. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 16, 2010 12:33 AM

Andreas, they already gave themselves the answer...

--->>>in our case it is not related to media, cable or SFP.

"It is related to 8 Gbit compatibility issues."


in response to: TechHelp24

Simply tell them to change the CHA board, or wait for the microcode upgrade.

pmescher 1 posts since May 7, 2009 13. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 16, 2010 3:33 PM

No need to replace hardware. Simply upgrade to a FOS level with the portcfgfillword command. (I think it was introduced somewhere in 6.2.) Brocade defaults to ARBff; some storage devices still expect six IDLEs between frames, and their state machines fail if those IDLEs aren't received. portcfgfillword <port number>, 0 and you will be all fixed.

This problem CAN cause data flow issues due to excess interrupts in HBAs. I've seen it on some QL models.

Note that the fill word was changed for good reasons (ARBff improves signal timing), so the latest FOS versions allow you to send six IDLEs to satisfy the state machine, and then send ARBffs. So you get the IDLEs for devices that require them, and you get the improved signal characteristics of ARBff.


in response to: pmescher

I do not know anything about portcfgfillword because I have not used it. But at the HITACHI level a microcode upgrade may resolve the issue.

ploufg 31 posts since Apr 16, 2007 15. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 4, 2010 9:01 AM

Hi

I had the same problem with my DCX and 48000 (FC8-48), and I suggest not ignoring these counters, even though they are not related to data transfer but rather to link signal and sync. I contacted my support provider (HP), and the only way they resolved it was to reconfigure the port fill word mode from 0 to 3.

portcfgfillword slot/port, 0   idle

portcfgfillword slot/port, 3   ARBFF; if that fails, use idle for devices that expect this signal

Note: if you run this command it will reset the port (disable-enable), so make sure that your servers have more than one path to the target.


This configuration supports devices that use ARBFF as primitive signal as well as devices that expect the idle primitive signal (like an auto-negotiation).

As for the technical reason, it is related to electromagnetic interference (EMI) and protocol (see the updated T11 documentation).

From T11 org

The following FC-FS-2 proposal is for the purpose of reducing EMI with 8G and higher serial links. It replaces IDLE with the use of ARB(FF), which has a lower transition density. This allows the reduction of EMI without more significant changes that would involve randomizing the data pattern.

I suggest that you contact your support provider.

If I may add a comment: I have used er_bad_os to identify bad devices like SFP, cable, lostdb, (fc analyzer), or any

Hope this helps.

andreas.bergelt 600 posts since Apr 12, 2010 16. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 5, 2010 1:57 AM

in response to: ploufg

Hello,

thanks for your answer. But in our case the affected storage vendor doesn't support ARB(FF) on the switch side. I assume that the FC ASIC on the storage side is not well coded and tested.

Regards,

Andreas

a.adamson 53 posts since Jun 24, 2009 17. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 9, 2010 1:37 AM


Hi,

I have had the same problem. After making sure I had the correct fill word (according to the manufacturer of the arrays in question), I found the problem disappeared when I replaced the cabling that was going via a patch panel with a brand-new direct attached cable.

I agree the errors do not necessarily suggest a cabling problem, but that is how we solved it.

Alastair

Mahendran 22 posts since Apr 6, 2010 18. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 12, 2010 4:57 AM

in response to: a.adamson

I am facing similar issues on a 48000, whereby er_bad_os is increasing and the ports are flapping as well. Some other ports are facing the same issues. Any idea why this is happening?

Firmware : v6.2.2a

Speed: N4 Online (4Gbps)

portstatsshow 4/1

stat_wtx 915040180 4-byte words transmitted

stat_wrx 2406319128 4-byte words received

stat_ftx 12732318 Frames transmitted

stat_frx 683736493 Frames received


stat_c2_frx 0 Class 2 frames received

stat_c3_frx 683736493 Class 3 frames received

stat_lc_rx 0 Link control frames received

stat_mc_rx 0 Multicast frames received

stat_mc_to 0 Multicast timeouts

stat_mc_tx 0 Multicast frames transmitted

tim_rdy_pri 0 Time R_RDY high priority

tim_txcrd_z 318 Time BB credit zero (2.5Us ticks)

er_enc_in 0 Encoding errors inside of frames

er_crc 0 Frames with CRC errors

er_trunc 0 Frames shorter than minimum

er_toolong 0 Frames longer than maximum

er_bad_eof 0 Frames with bad end-of-frame

er_enc_out 84 Encoding error outside of frames

er_bad_os 11245 Invalid ordered set

er_c3_timeout 2366 Class 3 frames discarded due to timeout

er_c3_dest_unreach 0 Class 3 frames discarded due to destination unreachable

er_other_discard 0 Other discards

er_zone_discard 0 Class 3 frames discarded due to zone mismatch

er_crc_good_eof 0 Crc error with good eof

er_inv_arb 0 Invalid ARB

open 0 loop_open

transfer 0 loop_transfer

opened 0 FL_Port opened

starve_stop 0 tenancies stopped due to starvation

fl_tenancy 0 number of times FL has the tenancy

nl_tenancy 0 number of times NL has the tenancy

portcfgshow 4/1

Area Number: 49

Speed Level: AUTO(HW)

Fill Word: 0(Idle-Idle)



AL_PA Offset 13: OFF

Trunk Port ON

Long Distance OFF

VC Link Init OFF

Locked L_Port OFF

Locked G_Port OFF

Disabled E_Port OFF

ISL R_RDY Mode OFF

RSCN Suppressed OFF

Persistent Disable OFF

NPIV capability ON

QOS E_Port OFF

Port Auto Disable: OFF

Mirror Port OFF

F_Port Buffers OFF

sfpshow 4/1

Identifier: 3 SFP

Connector: 7 LC

Transceiver: 150c402001000000 100,200,400_MB/s M5,M6 sw Inter_dist

Encoding: 1 8B10B

Baud Rate: 42 (units 100 megabaud)

Length 9u: 0 (units km)

Length 9u: 0 (units 100 meters)

Length 50u: 15 (units 10 meters)

Length 62.5u:7 (units 10 meters)

Length Cu: 0 (units 1 meter)



Vendor Name: FINISAR CORP.

Vendor OUI: 00:90:65

Vendor PN: FTLF8524P2BNV

Vendor Rev: A

Wavelength: 850 (units nm)

Options: 0032 Loss_of_Sig,Tx_Disable

BR Max: 0

BR Min: 0

Serial No: UA80ZN9

Date Code: 060821

Temperature: 34 Centigrade

Current: 6.228 mAmps

Voltage: 3291.9 mVolts

RX Power: -5.5 dBm (281.5 uWatts)

TX Power: -4.3 dBm (372.3 uWatts)


in response to: Mahendran

In your case I would say you have some issues with discards in your fabric.

Check your ports for this counter: er_c3_timeout.

You will see that servers on affected ports have I/O errors and performance issues.

I think it is a serious issue to have discards in the fabric.






As an update: with FOS 6.3.2 Brocade shows the "discards direction", i.e. whether it is on the TX or RX side.

This is cool :-)

Regards,

Andreas



Yes. We are seeing discards on the ISL links, and we have upgraded from 6.1.2 --> 6.2.2, but it seems it is not helping. We have done everything from SFP replacements to replacing cables and moving the cables to other ports. No clue what to do next. Every day there are ports with enc out increasing and flapping. Sometimes TX/RX would be 0.0 watts and we replaced the SFPs, but at times the SFP would be just fine and the port would still be flapping.



I assume that you originally came from a 5.3 version before you updated to 6.x, correct?

If so, you can reduce the discards on the ISL and in the fabric if you add additional ISLs. In our case that fixed the issue.

I have also disabled QoS on all ISLs, but this didn't fix the issue. It was only a recommendation from the OEM.

I suggest that you fix your ISL issues first and then take the next step.

Regards,

Andreas


Hi Andreas,

The previous version was 6.1.2. We haven't tried to add additional ISLs, but this is something worth a try. We also have issues with replication ports flapping, and there are discards on the MPR 7500 SAN router connecting the primary and secondary site. We are still waiting to upgrade the firmware on the secondary site, and I think we will proceed further from there.

Regards,

Mahen



Hi Andreas,

Apart from that, we are also experiencing enc out on storage ports which belong to the same USP. Could it be that the ports are set to auto-negotiation?

Regards,

Mahen



Hello Mahen,


Auto-negotiation is not an issue on a Brocade SAN. USP with 4 Gbit and USP-V with 8 Gbit feature cards have no problem with auto.

I prefer to set the ports on both sides fixed to the maximum speed.

I assume that you may have a bigger issue in the SAN.

Try to fix your discard issue first.

If you talk about a 7500 router, can you check whether the router links are maybe overloaded?

It could be possible that this is the reason why you see dropped frames in your SAN.

But finally I would say that this thread has now taken a completely different direction compared to the initial question which I raised some time ago....

Regards,

Andreas



Yeah agree. Maybe I should open a new thread.


Leave it here as it is.

What are your next steps to solve your issues?

Andreas



Hi Andreas,

I have no clue what to do next. I am actually waiting for all the switches in the primary site and secondary site to be upgraded, and will then proceed to troubleshoot as advised by the vendor. I am open to any suggestions from you all.

Regards,

Mahendran

hemant 422 posts since Mar 3, 2010 28. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 14, 2010 3:33 AM


Why don't you check from the HBA side? Are you seeing the value increase only on ports connected to HBAs, or on storage ports as well? Have you done a portstatsclear and then porterrshow and portstatsshow again and again? Have you then checked whether the error is increasing or not?




Hi Hemant,

Let me give an overview of the infra:

We have 3 sites :

Site 1

Fabric 1 = 20 switches


Site 2


Fabric 2= 18 switches

Sites 3






------------------------------------------------------------------------

We are experiencing discards on the ISL between 2 director 48000 switches (Site 1; Fabric 1), and quite a number of ports are generating enc out and disc c3 on most of the switches connected to those director switches. We have replaced a lot of SFPs and cables but nothing is solving this issue. Every day there are ports flapping. Firmware was upgraded from 6.1.0c ---> 6.2.2a but this does not seem to solve the issue. We have about 6 more switches to be upgraded.

On your suggestion to replace the HBA: we did replace the HBA, but it seems the paths on the server still go offline even after the replacement.

We have a few hundred servers facing intermittent paths going offline at the moment.

I have no clue where to start again.

Regards,

Mahen



Hi,


If you are facing the issue on the ISL path between 2 switches, then connect 2 more cables to create another trunk. Or, if you have adjacent ports available, add 2 more cables; once these 2 new cables form a trunk, observe the errors. If you do not get any errors on these new cables, then remove the old cables and observe the traffic through the new trunk.

One question: you say that you observe errors on the ISL between 2 switches, so are the servers showing intermittent path offline connected only to these 2 switches? Try to localize the servers and storage as well, meaning both HBA and controller should be on the same switch.

Also, if these are HITACHI controllers, check the HDLM version on the hosts; you may have to upgrade the HDLM version, with auto failback on and extended I/O settings.

If they are not HITACHI arrays, then you should log a call with your vendor, who will log a call in the backend with Brocade.

One more thing: you have upgraded FOS from 6.1 to 6.2, but then you have to upgrade all the switches in that fabric. Do not keep it in a mixed state like this.



Hi Hemant,

I think I will propose adding ISL links between these 2 director switches. Those servers are connected: servers ---> 32port_switch ---> dir1 ----> dir3 ----> USP. So all the servers connected via these switches are facing the same problem.

Regards,

Mahendran

andreas.bergelt 600 posts since Apr 12, 2010 32. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 14, 2010 2:01 PM


Mahendran,

you are on the right track if you add ISLs. Try to create a bigger trunk rather than adding more trunks, to simplify the routing table and avoid a mesh or ring topology.

The FOS code will manage the balancing well by itself.

Do you have old servers on the 32-port switch running at 1 or 2 Gbit speed with a very old PCI bus infrastructure, and do they have zones to 8 Gbit storage ports? If so, this can cause back pressure on the ISL, which will end in discards somewhere in the fabric. The storage port can overload the server.

Did your problems come up after your FOS code update?

Do you have an error history from the time before the update, or did you only start error monitoring once you had the issue?

If the errors came up after your update, then the FOS code may be causing the discards, which result in I/O errors on the servers that you can also see as LUN resets on the storage ports.

I have seen the same in my own environment: nothing changed, only the FOS code. After adding ISLs everything went fine as before.




Don't waste your time looking at HDLM or HBA firmware. Fix the DISCARDS on the ISL!

Andreas



Hi,

Otherwise, you can localize the servers and storage, i.e. connect the HBAs to the switch where the storage is connected. Eliminate the hop. That will solve the issue.

a.adamson 53 posts since Jun 24, 2009 34. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 16, 2010 8:38 AM


Mahendran,

Mahendran wrote:

I think I will propose adding ISL links between these 2 director switches. Those servers are connected: servers ---> 32port_switch ---> dir1 ----> dir3 ----> USP. So all the servers connected via these switches are facing the same problem.

I am confused about your setup. You say a few posts back that you have three dual-fabric SANs. In your second post you mention an MPR 7500. So are all three SANs in a meta-SAN? You later say the discards are between two 48000 directors. Are these on the same fabric in the same SAN, or across SANs in the meta-SAN via the routers?

In your first post you show details of a specific port, presumably an E_Port. Is this connected to the 7500? What is the distance between sites? Are the inter-site links all in the backbone fabrics on the 7500s?


Thanks,

Alastair

pierre.cornet 1 posts since Jan 18, 2010 35. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 16, 2010 11:43 AM


This error can't still be noticed in a non-isl environment regardless of GBIC speed.


in response to: pierre.cornet

In this case you can just ignore the error, because it will not harm your data.

aart.aalberts1 1 posts since Apr 20, 2010 37. Re: Rapid increasing er_bad_os at 8 Gbit speed Sep 25, 2010 2:59 AM

We have the same ISL issue between a 48k and a DCX (v6.3.1a) with C3 discarded frames. Setting portcfgfillword on the 8 Gbps ports brought no luck. It seems to me to be some internal timing issue between 4 and 8 Gbps SAN ports; beyond that I have no clue, and neither has EMC.

erwin.vanlonden 61 posts since Jul 27, 2009 38. Re: Rapid increasing er_bad_os at 8 Gbit speed Dec 27, 2010 6:13 PM

I saw this one popping by. The below might shed some light on fill words and "invalid ordered sets". I wrote this explanation in an HDS internal communiqué, but it seems most of you will benefit as well. Let me emphasize that when you see this phenomenon it has nothing to do with hardware troubles whatsoever.

-----------------------------


This goes back to the change in Fibre Channel protocol requirements for 8G and higher line speeds. On 4G and lower a so-called IDLE fill word is used, which starts with a K28.5 and is followed by three data characters (D21.4 D21.5 D21.5). This fill word is used to maintain bit and word synchronisation between two N_Ports. Due to the higher baud rate at 8G speeds and the specific bit pattern of this IDLE fill word, it is known to increase the emission of high-frequency radiation that might result in electrical interference with other equipment.

To circumvent that, another fill word was adopted which was already defined in the FC-AL protocol, called ARB(ff) (K28.5 D20.4 D31.7 D31.7). This is a similar fill word but has a better bit pattern to prevent this radiation emission. These fill words are called ordered sets (a K28.5 and three data characters are an ordered set). The standard defines that during word synchronisation (that is, after speed negotiation on 8G and bit/character synchronisation) the ports shall send 6 IDLEs upon entering the Active state to obtain word synchronisation and then switch to ARB(ff) as fill word. (I spare you the entire protocol definition on link state changes.)
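As a small illustration of the two fill words, the Python sketch below converts the K/D character names into their unencoded 8-bit values. The helper names char_to_byte and ordered_set_hex are made up for this example. Note that it shows the byte values before 8b/10b encoding; the transition density that matters for EMI is a property of the encoded 10-bit stream, which this sketch does not compute.

```python
def char_to_byte(name):
    """Convert 'D21.4' or 'K28.5' notation to the unencoded 8-bit value.

    In Zxx.y notation, xx is the low five bits and y is the high three bits.
    """
    xx, y = name[1:].split(".")
    return (int(y) << 5) | int(xx)


def ordered_set_hex(chars):
    """Render a four-character ordered set as space-separated hex bytes."""
    return " ".join(f"{char_to_byte(c):02X}" for c in chars)


IDLE = ["K28.5", "D21.4", "D21.5", "D21.5"]
ARB_FF = ["K28.5", "D20.4", "D31.7", "D31.7"]

print("IDLE   ", ordered_set_hex(IDLE))
print("ARB(FF)", ordered_set_hex(ARB_FF))
```

Both words start with the same K28.5 special character and differ only in the three data characters that follow.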

If however the speed is negotiated at 8G during this init/transition sequence and only one port switches to ARB(ff) as fill word, you will see the er_bad_os counter increase very fast. Be aware that even though no actual frame is sent from an HBA, the HBA and switch port still send these fill words constantly at the negotiated line speed. Besides the word synchronisation, the ports also use actual frames to sync their clock rate by looking at SOF and EOF delimiters. If however an HBA is not sending any frames and the port is not able to determine a sync state within a certain period of time, it will do a link reset (LR) and will go through the sync process again.

Brocade FOS pre 6.3.1 had only mode 0 and 1 (either IDLE or ARBff). This meant that if one device was very strict about the standard but another was not, it would sync up if the switch port was configured as mode 0, but it would never switch to ARBff as required by the standard. On the other hand, if the switch port was configured as mode 1 and the HBA lived by the standard, it could never get into a sync state because the switch would only transmit ARBff as fill words and the HBA would only use IDLEs.

Fill words can be replaced by other ordered sets (primitive signals or sequences). One of those is very important for buffer credit organisation and is called R_RDY. If you lose R_RDY signals, the sending device has no knowledge of whether these buffers on the other side have been cleared. This may lead to performance problems etc. I’ll spare you the details.
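A toy model may help to show why lost R_RDY signals hurt. The Python sketch below assumes a sender that starts with a fixed number of buffer credits, spends one per frame, and gets one back per R_RDY. The class and function names are invented for this illustration, and it deliberately ignores credit recovery through a link reset.

```python
# Toy model of buffer-to-buffer credit on one link direction.
# Assumption: no credit recovery (link reset) ever takes place.

class CreditedSender:
    def __init__(self, credits: int):
        # Number of receive buffers advertised by the other side.
        self.credits = credits

    def send_frame(self) -> bool:
        """Spend one credit; refuse to send when none are left."""
        if self.credits == 0:
            return False
        self.credits -= 1
        return True

    def receive_r_rdy(self) -> None:
        """The receiver freed a buffer, so one credit comes back."""
        self.credits += 1


def frames_sent(total_frames: int, credits: int, lost_r_rdy_every: int) -> int:
    """Count frames sent before the sender stalls.

    Every lost_r_rdy_every-th R_RDY is dropped; 0 means none are lost.
    """
    sender = CreditedSender(credits)
    sent = 0
    for i in range(1, total_frames + 1):
        if not sender.send_frame():
            break  # out of credits, the sender has to wait
        sent += 1
        if lost_r_rdy_every == 0 or i % lost_r_rdy_every != 0:
            sender.receive_r_rdy()
    return sent


if __name__ == "__main__":
    print(frames_sent(1000, credits=8, lost_r_rdy_every=0))
    print(frames_sent(1000, credits=8, lost_r_rdy_every=10))
```

With 8 credits and every tenth R_RDY lost, the sender in this model stalls after 80 frames, whereas without loss it sends all 1000.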

As you can see, these fill words are used between frames. FLOGI and PLOGI are frames, so to answer your question: no, changing fill words has nothing to do with failed FLOGIs or PLOGIs. PLOGIs from initiator to target devices might sometimes get dropped, like any other frame in class 3 service, for numerous reasons. Physical errors or congestion on ISLs are among the most likely causes. A FLOGI is one frame going from an N_Port to an F_Port controller on a switch, which registers it in the fabric controller. That is the only reason why a FLOGI is needed: to obtain a 24-bit fabric address. After the PLOGI and name server registration an RSCN is sent out to all devices in its zone and the other end-to-end queries and registrations begin.

The fun becomes even more apparent next year with 16 and 32G speeds, where we switch from 8b/10b to 64b/66b encoding. This encoding mechanism is already used on 10G FC, which is the reason it’s not interoperable with 1/2/4/8G speeds.

In short, if you have 8G ports on HBAs and storage and have Brocade 8G ports with FOS >= 6.3.1, use mode 2. All other line speeds (1/2/4G) still use IDLE fill words and require mode 0. If you use FOS < 6.3.1, it depends a bit on the implementation of the HBA/storage vendors. The recommendation is to upgrade to the latest supported firmware levels to be able to adhere to the standard.
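The rule of thumb above can be written down as a small Python helper. This is only a sketch of the summary in this post, not a Brocade tool: the function names are made up, and the version parser assumes FOS version strings of the form "6.4.1b".

```python
import re

# Sketch of the rule of thumb in this post. Function names are hypothetical.

def parse_fos(version: str):
    """Turn 'v6.4.1b' into (6, 4, 1); patch letters are ignored."""
    nums = [int(n) for n in re.findall(r"\d+", version)][:3]
    return tuple(nums + [0] * (3 - len(nums)))


def suggest_fill_word_mode(fos_version: str, speed_gbit: int):
    """Return the suggested fill word mode, or None if it depends on the device."""
    if speed_gbit < 8:
        return 0  # 1/2/4G links still use IDLE fill words
    if parse_fos(fos_version) >= (6, 3, 1):
        return 2  # IDLE during link init, then ARBff
    # Older FOS only offers modes 0 and 1, so the right choice depends
    # on how the HBA or storage port implements the standard.
    return None


if __name__ == "__main__":
    for fos, speed in [("6.4.1b", 8), ("6.3.0b", 8), ("6.4.1b", 4)]:
        print(fos, speed, suggest_fill_word_mode(fos, speed))
```

Returning None for 8G on older FOS levels is a deliberate choice here, because the post gives no single answer for that case.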

You may find that, especially on long-distance connections over DWDM where either transponders or TDM multiplexers are used, these devices have in some cases not yet adhered to the Fibre Channel standard. You should consult the DWDM provider about upgrading the firmware in those devices to get a reliable connection.

I hope this explains these changes a bit.

------------------------------------



Cheers

E

andreas.bergelt 600 posts since Apr 12, 2010 39. Re: Rapid increasing er_bad_os at 8 Gbit speed Dec 27, 2010 2:46 AM

in response to: erwin.vanlonden

Hello Erwin,

many thanks for these details and the very good explanation of ARBff and IDLE.

What happens if you fix the switch port and the storage port of a Hitachi array to 8 Gbit? I assume that both devices have to start with ARBff right from the beginning. Is this right?

Why do the Hitachi arrays then have difficulties getting in sync with the switch port? I have seen switch ports change to state faulty.

Is this related to misbehavior of the storage port firmware on the arrays?

From a user point of view it is very confusing if some devices work with IDLE and some with ARBff at 8 Gbit speed.

Andreas

erwin.vanlonden 61 posts since Jul 27, 2009 40. Re: Rapid increasing er_bad_os at 8 Gbit speed Dec 27, 2010 6:10 PM



Andreas,

Hitachi arrays follow the standard. Very strict. End of story. The problem Brocade switches (or rather their administrators) have is that they have options. :-) If you choose the incorrect one, it doesn't work with any vendor's array.

According to the standard, during the init state the ports still have to use IDLE primitives irrespective of speed. Upon entering the Active state both ports have to switch to ARBff primitives after sending at least 6 IDLEs and receiving at least 2. (This is also the normal process in the FC-FS part.) So only one mode (2) adheres to that. The reason Brocade came up with mode 3 is that some vendors had 8G implementations while some issues in the standard weren't totally fleshed out. There was one issue where there could be a deadlock during the init phase in which both ports could never come online. One port would send a NOS primitive, which started off the OLS/LR/LRR primitive sequence. This would then still end up in the deadlock situation. For this reason mode 3 was implemented, where the Brocade switch waits for a NOS and then switches to ARBff.
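The mode 2 behaviour described above (IDLE during init, then ARBff once at least 6 IDLEs have been sent and at least 2 received in the Active state) can be sketched as a tiny state machine in Python. This is a simplified illustration under those stated thresholds only; the class and method names are invented, and it does not model the NOS/OLS/LR/LRR sequence or mode 3.

```python
# Simplified sketch of the fill word choice described above for mode 2.
# Names are hypothetical; link init primitives are not modelled.

class FillWordSelector:
    MIN_IDLES_SENT = 6
    MIN_IDLES_RECEIVED = 2

    def __init__(self):
        self.active = False
        self.idles_sent = 0
        self.idles_received = 0

    def enter_active(self) -> None:
        """Called when the port enters the Active state."""
        self.active = True
        self.idles_sent = 0
        self.idles_received = 0

    def on_receive(self, word: str) -> None:
        """Count IDLEs received from the peer while Active."""
        if self.active and word == "IDLE":
            self.idles_received += 1

    def next_fill_word(self) -> str:
        """Pick the fill word to transmit next."""
        if not self.active:
            return "IDLE"  # init state: always IDLE, whatever the speed
        if (self.idles_sent >= self.MIN_IDLES_SENT
                and self.idles_received >= self.MIN_IDLES_RECEIVED):
            return "ARBff"
        self.idles_sent += 1
        return "IDLE"


if __name__ == "__main__":
    port = FillWordSelector()
    port.enter_active()
    for _ in range(8):
        port.on_receive("IDLE")
        print(port.next_fill_word())
```

If the peer only ever sends ARBff, this selector never sees its two IDLEs and keeps sending IDLE, which is a simplified picture of the mismatch discussed in this thread.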

As you can see, all 4 modes serve their purpose because some vendors had deviations in their implementations while this part of the standard was being created. Brocade just gave them the options. Only mode 2 adheres strictly to the standard.

Hope this explains some of the reasons behind these options as well as some internals.

Cheers

E

BryanO 1 posts since Sep 21, 2010 41. Re: Rapid increasing er_bad_os at 8 Gbit speed Jan 20, 2011 10:20 AM


Mahendran,


Any update on if you were able to resolve your issues?

Thanks

sai.nikhil 4 posts since Dec 17, 2010 42. Re: Rapid increasing er_bad_os at 8 Gbit speed May 13, 2011 8:30 AM

I have the same issue, but on the 8G switch port where the VMware host is connected. Does this problem have a fix?

Can I try to peg down the speed of the port to 4G and see if it resolves the issue?

Thanks,

SK

andreas.bergelt 600 posts since Apr 12, 2010 43. Re: Rapid increasing er_bad_os at 8 Gbit speed May 13, 2011 8:46 AM

in response to: sai.nikhil

Depends on your FOS code.

Check the fill word setting on the SAN switch port and have a play with it.

Try mode 2 or 1. Changing this setting will disrupt I/O due to a link reset.

Andreas



Thank you for the prompt reply, Andreas.

The interesting point I note is that this is happening on the switch ports in both fabrics the host is connected to.

One switch is at 6.4.1b and other side is on 6.3.0b.

Ok. Does this change prove to solve the issue?



Currently I would say this is not a real issue.

Traffic will flow through the switch without a big performance issue.

I assume that on FOS 6.3.0b this mode is not present.

So change it on FOS 6.4.1b and check if everything is OK on the server side.

Andreas



v6.3.0b has only 2 modes


0 - idle-idle

1 - arbff-arbff

v6.4.1b has 4 modes

0 - idle-idle

1 - arbff-arbff

2 - idle-arbff

3 - aa-then-ia

Technically I believe this problem can be fixed if the host is moved to some existing 4G connections. Am I right?



And, to add to that regarding performance: the VM team is seeing an impact in the form of backups running slow. I'm not completely sure if this is the reason, but at the moment I cannot find any other errors.

Thanks,

SK



In my case it didn't have an impact on performance.


The switch expects different fill words than the HBA is sending, and this causes the error counter to increase.

Between data frames the switch expects ARBff fill words but the HBA is sending IDLE fill words. This is due to different implementations of the 8 Gbit FC standard.

Brocade found out that ARBff is better than IDLE at 8 Gbit speed. If you slow the port down to 4 Gbit, everything is fine because both devices are sending IDLEs.

IDLE or ARBff fill words should not cause a performance issue.

I suggest to look somewhere else to find the performance issue.

Andreas
