Hello - I am a vSphere newbie and have been reading through all the documentation.
I ran into a crash situation on an ESXi 5.1 host yesterday:
1. NetApp storage on the backend is direct-connected to a SAN switch. The NetApp filer heads are set up to partner with each other; the primary IP is 172.28.4.160/24 (no gateway). The primary IP stays available even if a filer head goes down (similar to a cluster IP).
2. Allocated NIC2 and NIC6 from each ESXi host for iSCSI.
3. Direct connect from the ESXi host NICs to the SAN switch. Defined iSCSI VLAN 240 on the physical switch (switchport mode trunk).
4. Created a dvSwitch and set its MTU to 9000. Created 2 dvUplinks and assigned the ESXi host's NIC2 to dvUplinkStorage and NIC6 to dvUplinkStorageR. Created 2 dvPortgroups, dvPgStorage (VLAN 240) and dvPgStorageR (redundant, VLAN 240), and set the teaming policy so each portgroup has only its own uplink active and every other uplink "Unused": dvPgStorage uses dvUplinkStorage and dvPgStorageR uses dvUplinkStorageR. (ESXi Shell equivalents for steps 4-11 are sketched right after this list.)
5. Created 2 VMkernel ports on the ESXi host: vmk1 with IP 172.28.4.90/24, MTU 9000, and vmk2 with IP 172.28.4.100/24, MTU 9000. Verified jumbo frames were enabled and working via "vmkping -4 -d -I vmk1 -s 8200 172.28.4.160".
6. Created the software iSCSI adapter on the ESXi host (vmhba34). Added VMkernel vmk1 to the vmhba34 network configuration (port binding). Added iSCSI send target 172.28.4.160:3260.
7. In NetApp System Manager, created three LUNs (IDs 1, 2, and 3) and an igroup containing the ESXi host's IQN (taken from the vmhba34 properties).
8. Returned to the vSphere Client and verified that all three LUNs (1, 2, and 3) showed up for the ESXi host under Storage Adapters.
9. Under the ESXi host's Storage tab, added a datastore on iSCSI LUN 2.
10. Provisioned a VM to that datastore and started it. No problems - provisioning was fast (at least 150 MB/s by an informal measurement of the 1GbE NIC via dvSwitch portgroup monitoring), and the VM started as fast as from local disk.
11. To enable failover, added the *second* VMkernel, vmk2, to the vmhba34 network configuration (port binding).
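For reference, here are the ESXi Shell equivalents I'd use to verify the network side (steps 2-5). The vmnic numbering is my assumption (NIC2/NIC6 = vmnic2/vmnic6), and the 8972-byte payload is the full jumbo test (9000 minus 28 bytes of IP/ICMP headers); my original test used -s 8200, which proves jumbo frames pass but not the full 9000 MTU:

    # Confirm both iSCSI uplinks (assumed vmnic2/vmnic6) are up at the expected speed
    esxcli network nic list
    # Confirm the dvSwitch reports MTU 9000
    esxcli network vswitch dvs vmware list
    # Confirm vmk1/vmk2 exist with MTU 9000 on the right dvPortgroups
    esxcli network ip interface list
    esxcli network ip interface ipv4 get
    # Full end-to-end jumbo-frame test against the filer's primary IP
    vmkping -4 -d -I vmk1 -s 8972 172.28.4.160
    vmkping -4 -d -I vmk2 -s 8972 172.28.4.160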
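Step 7 was done in System Manager; if it matters, I believe the Data ONTAP 7-Mode command-line equivalent is roughly the following (the volume path, igroup name, LUN sizes, and IQN below are placeholders, not my actual values):

    filer> igroup create -i -t vmware ig_esx01 iqn.1998-01.com.vmware:esx01
    filer> lun create -s 500g -t vmware /vol/vol1/lun1
    filer> lun create -s 500g -t vmware /vol/vol1/lun2
    filer> lun create -s 500g -t vmware /vol/vol1/lun3
    filer> lun map /vol/vol1/lun1 ig_esx01 1
    filer> lun map /vol/vol1/lun2 ig_esx01 2
    filer> lun map /vol/vol1/lun3 ig_esx01 3
    filer> lun show -m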
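Steps 6, 8, and 9 map to these esxcli commands (vmhba34 is as reported by my host; everything else is standard):

    # Enable the software iSCSI initiator (this is what creates vmhba34)
    esxcli iscsi software set --enabled=true
    # Bind the first VMkernel port to the adapter (step 6)
    esxcli iscsi networkportal add --adapter=vmhba34 --nic=vmk1
    # Add the send target (step 6)
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba34 --address=172.28.4.160:3260
    # Rescan and confirm LUNs 1-3 appear (step 8)
    esxcli storage core adapter rescan --adapter=vmhba34
    esxcli storage core device list
    # After creating the datastore on LUN 2 (step 9), confirm the VMFS extent
    esxcli storage vmfs extent list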
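And step 11, the binding that immediately preceded the failure described below, is equivalent to:

    # Bind the second VMkernel port for multipath failover --
    # this is the exact point where things went wrong for me
    esxcli iscsi networkportal add --adapter=vmhba34 --nic=vmk2
    # Both bindings should now show up here
    esxcli iscsi networkportal list --adapter=vmhba34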
Step 11 is where the crash occurred: I immediately lost access to the running VM.
I was able to reconnect to the failed ESXi host right away. Under Storage Adapters for the ESXi host I saw that LUN 2 was marked "dead" (grayed out). Interestingly enough, the Storage Adapter still showed both VMkernels and everything else (including all other LUNs) with healthy, green icons.
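If it helps, I can pull more state from the shell; I assume these are the right commands to show what the host thinks of the dead LUN (the naa ID below is a placeholder for LUN 2's device identifier):

    # All paths currently known to the host
    esxcli storage core path list
    # Per-device multipathing state (the dead LUN's paths should show up here)
    esxcli storage nmp device list
    # Detailed state for the affected device (placeholder naa ID)
    esxcli storage core device list --device=naa.xxxxxxxxxxxxxxxx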
Thanks for any advice. I've attached the /var/log/vmkernel.log for reference (failures up to the point where I performed a reboot).