Hi all
Hoping for some pointers on where to look, as we have been experiencing repeated datastore connectivity issues.
Environment is:
ESXi 5.5 (latest version as of this week)
HP Blade ProLiant BL460c Gen8
Brocade FC switches
VNX5400 storage
Everything is at the current patch level (HP February SPP, latest drivers etc. from the latest VMware/HP Recipe Book).
Every night we see messages like the following:
Lost access to volume 52e00703-7702b882-d845-0017a4779402 (DataStore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
07/05/2014 07:50:16
Successfully restored access to volume 52e0072b-0adb69f0-f97b-0017a4779402 (DataStore) following connectivity issues.
In vmkernel.log I can see these messages:
2014-05-07T02:44:14.860Z cpu0:33743)lpfc: lpfc_scsi_cmd_iocb_cmpl:2157: 0:(0):3271: FCP cmd x2a failed <2/4> sid x0d0505, did x0d0100, oxid xd4 Abort Requested Host Abort Req
2014-05-07T02:44:14.860Z cpu0:32813)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x2a (0x412e80450900, 32805) to dev "naa.60060160fd503600ea66bd8acb82e311" on path "vmhba0:C0:T2:L4" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2014-05-07T02:44:14.860Z cpu0:32813)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60060160fd503600ea66bd8acb82e311" state in doubt; requested fast path state update...
2014-05-07T02:44:14.860Z cpu0:32813)ScsiDeviceIO: 2337: Cmd(0x412e80450900) 0x2a, CmdSN 0x299 from world 32805 to dev "naa.60060160fd503600ea66bd8acb82e311" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2014-05-07T02:44:15.859Z cpu0:32881)lpfc: lpfc_scsi_cmd_iocb_cmpl:2157: 0:(0):3271: FCP cmd x2a failed <2/4> sid x0d0505, did x0d0100, oxid xe7 Abort Requested Host Abort Req
2014-05-07T02:44:15.860Z cpu0:32813)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60060160fd503600ea66bd8acb82e311" state in doubt; requested fast path state update...
2014-05-07T02:44:15.860Z cpu0:32813)ScsiDeviceIO: 2337: Cmd(0x413682f6e580) 0x2a, CmdSN 0x29a from world 32805 to dev "naa.60060160fd503600ea66bd8acb82e311" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2014-05-07T02:44:20.025Z cpu4:36428)World: 14296: VC opID 4CAEF736-0000009D-0-fb maps to vmkernel opID 4f50e1ac
2014-05-07T02:44:20.397Z cpu1:34426)Fil3: 15408: Max retries (10) exceeded for caller Fil3_FileIO (status 'IO was aborted by VMFS via a virt-reset on the device')
2014-05-07T02:44:20.397Z cpu1:34426)BC: 2288: Failed to write (uncached) object '.iormstats.sf': Maximum kernel-level retries exceeded
2014-05-07T02:44:23.013Z cpu8:34236)HBX: 2692: Waiting for timed out [HB state abcdef02 offset 4161536 gen 179 stampUS 1629072438 uuid 536997d8-04a1ea9d-6c9a-0017a4779402 jrnl <FB 2656233> drv 14.60] on vol 'uk1-san01:Production DataStore 3'
2014-05-07T02:44:29.645Z cpu4:33106)lpfc: lpfc_scsi_cmd_iocb_cmpl:2157: 0:(0):3271: FCP cmd xa3 failed <3/4> sid x0d0505, did x0d0000, oxid xeb Abort Requested Host Abort Req
2014-05-07T02:44:29.645Z cpu20:33044)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:651: Path "vmhba0:C0:T3:L4" (UP) command 0xa3 failed with status Timeout. H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2014-05-07T02:44:29.755Z cpu7:32857)HBX: 255: Reclaimed heartbeat for volume 52e0072b-0adb69f0-f97b-0017a4779402 (uk1-san01:Production DataStore 3): [Timeout] Offset 4161536
2014-05-07T02:44:29.755Z cpu7:32857)[HB state abcdef02 offset 4161536 gen 179 stampUS 1632947371 uuid 536997d8-04a1ea9d-6c9a-0017a4779402 jrnl <FB 2656233> drv 14.60]
2014-05-07T02:44:31.808Z cpu0:32807)lpfc: lpfc_scsi_cmd_iocb_cmpl:2157: 0:(0):3271: FCP cmd x2a failed <2/4> sid x0d0505, did x0d0100, oxid x112 Abort Requested Host Abort Req
2014-05-07T02:44:31.808Z cpu0:32813)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60060160fd503600ea66bd8acb82e311" state in doubt; requested fast path state update...
2014-05-07T02:44:31.808Z cpu0:32813)ScsiDeviceIO: 2337: Cmd(0x412e8349b000) 0x2a, CmdSN 0x2ab from world 32805 to dev "naa.60060160fd503600ea66bd8acb82e311" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2014-05-07T02:44:32.407Z cpu1:34426)Fil3: 15408: Max retries (10) exceeded for caller Fil3_FileIO (status 'IO was aborted by VMFS via a virt-reset on the device')
2014-05-07T02:44:32.407Z cpu1:34426)BC: 2288: Failed to write (uncached) object '.iormstats.sf': Maximum kernel-level retries exceeded
2014-05-07T02:44:33.541Z cpu16:32855)HBX: 255: Reclaimed heartbeat for volume 52e0072b-0adb69f0-f97b-0017a4779402 (uk1-san01:Production DataStore 3): [Timeout] Offset 4161536
2014-05-07T02:44:33.541Z cpu16:32855)[HB state abcdef02 offset 4161536 gen 179 stampUS 1646119268 uuid 536997d8-04a1ea9d-6c9a-0017a4779402 jrnl <FB 2656233> drv 14.60]
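To check whether the aborts cluster on one path or one storage port, a rough Python sketch like the one below can summarise a vmkernel.log. It assumes the line formats shown above. The opcode and host-status tables are partial, and the host-status names (e.g. 0x5 read as an abort) should be checked against VMware's documentation. Reading `<2/4>` in the lpfc lines as target/LUN is an inference from the matching `vmhba0:C0:T2:L4` path, and the function names are hypothetical.

```python
import re
import sys
from collections import Counter

# SCSI opcodes seen in the log above (SCSI command set standards).
OPCODES = {0x2A: "WRITE(10)", 0xA3: "MAINTENANCE IN (e.g. REPORT TARGET PORT GROUPS)"}

# Partial vmkernel host-status table; an assumption to verify against VMware docs.
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY",
               0x3: "TIMEOUT", 0x5: "ABORT", 0x8: "RESET"}

# lpfc driver lines: opcode, <target/lun> (inferred), source and destination FC IDs.
LPFC_RE = re.compile(
    r"FCP cmd x(?P<op>[0-9a-f]+) failed <(?P<tgt>\d+)/(?P<lun>\d+)> "
    r"sid x(?P<sid>[0-9a-f]+), did x(?P<did>[0-9a-f]+)"
)
# NMP and SATP lines that name a path and carry an H: (host status) field.
PATH_RE = re.compile(r'(?:on path|Path) "(?P<path>[^"]+)".*?H:(?P<h>0x[0-9a-f]+)')


def fc_id(hex_id: str) -> str:
    """Split a 24-bit Fibre Channel address into its domain/area/port bytes."""
    value = int(hex_id, 16)
    return (f"domain 0x{value >> 16:02x} area 0x{(value >> 8) & 0xFF:02x} "
            f"port 0x{value & 0xFF:02x}")


def summarise(lines):
    """Count lpfc aborts per (target FC ID, opcode) and failures per (path, host status)."""
    aborts = Counter()
    path_failures = Counter()
    for line in lines:
        m = LPFC_RE.search(line)
        if m:
            aborts[(m["did"], int(m["op"], 16))] += 1
        m = PATH_RE.search(line)
        if m:
            path_failures[(m["path"], int(m["h"], 16))] += 1
    return aborts, path_failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarise_vmkernel.py /var/log/vmkernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        aborts, paths = summarise(fh)
    for (did, op), n in aborts.most_common():
        print(f"{n:4d} aborted {OPCODES.get(op, hex(op))} to target {fc_id(did)}")
    for (path, h), n in paths.most_common():
        print(f"{n:4d} failures on {path} with host status {HOST_STATUS.get(h, hex(h))}")
```

On the excerpt above, every lpfc abort comes from the same initiator (sid x0d0505) and goes to two target FC IDs (x0d0100 and x0d0000). A summary over a whole night's log would show whether that pattern holds more widely.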
Any help much appreciated.
Thanks