
Software iSCSI HBA, NetApp, and NIC failover (*crash*)


Hello - I am a vSphere newbie, reading through all the documentation.

 

I ran into a crash situation on an ESXi 5.1 host yesterday:

 

1. NetApp storage on the backend, direct-connected to the SAN switch. The NetApp filer heads are set up to partner with each other; the primary IP is 172.28.4.160/24 (no gateway). The primary IP stays available even if a filer head goes down (sounds similar to a cluster IP).

 

2. Allocated NIC2 and NIC6 from each ESXi host for iSCSI.

 

3. Direct connections from the ESXi host NICs to the SAN switch. Defined iSCSI VLAN 240 on the physical switch (switchport mode trunk).

 

4. Created dvSwitch and set MTU to 9000. Created 2 dvUplinks for the dvSwitch and assigned ESXi host's NIC2 to dvUplinkStorage and NIC6 to dvUplinkStorageR. Created 2 dvPortgroups: dvPgStorage (VLAN 240) and dvPgStorageR (redundant, VLAN 240). Ensured that teaming policy for each portgroup has only the appropriate active dvUplink and that all other uplinks are set to "Unused". dvPgStorage has dvUplinkStorage and dvPgStorageR has dvUplinkStorageR.

 

5. Created 2 VMkernels on the ESXi host. vmk1 has IP of 172.28.4.90/24, MTU 9000. vmk2 has IP of 172.28.4.100/24, MTU 9000. Verified jumbo frames enabled and working via "vmkping -4 -d -I vmk1 -s 8200 172.28.4.160".

 

6. Created the iSCSI software HBA on the ESXi host (vmhba34). Added VMkernel vmk1 to the vmhba34 network adapter bindings. Added iSCSI target 172.28.4.160:3260. (The equivalent esxcli commands are sketched after this list.)

 

7. In NetApp System Manager, created three LUNs (ID 1, 2, and 3) and an iGroup with the ESXi host IQN added (from vmhba34 properties).

 

8. Returned to VIC and verified that all LUNs (1, 2, and 3) showed up for the ESXi host under Storage Adapters.

 

9. Under the ESXi host's Storage tab, added a datastore for iSCSI LUN #2.

 

10. Provisioned a VM to that datastore and started the VM. No problems - provisioning was fast (at least 150MB throughput, from an informal measurement of the 1GbE NIC using dvSwitch portgroup monitoring) and the VM started as quickly as from local disk.

 

11. To enable failover, added a *second* VMkernel (vmk2) to the vmhba34 network adapter bindings (see the esxcli sketch below).
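For reference, here is roughly the equivalent of steps 6 and 11 from the ESXi shell - just a sketch, not exactly what I clicked in the VIC; vmhba34, vmk1, and vmk2 are the names from my setup:

# Enable the software iSCSI initiator (this is what creates vmhba34)
esxcli iscsi software set --enabled=true
# Step 6: bind the first VMkernel port to the software HBA and add the target
esxcli iscsi networkportal add --adapter=vmhba34 --nic=vmk1
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba34 --address=172.28.4.160:3260
# Step 11: bind the second VMkernel port (the step that triggered the crash)
esxcli iscsi networkportal add --adapter=vmhba34 --nic=vmk2
# Verify which VMkernel ports are bound to the adapter
esxcli iscsi networkportal list --adapter=vmhba34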

 

That last step is where the crash occurred. Immediately, I could no longer reach the running VM.

 

I was able to connect back to the failed ESXi host immediately. I went to the Storage Adapters view for the ESXi host and saw that LUN 2 was marked as "dead" (grayed out). Interestingly enough, the Storage Adapter still showed both VMkernels and everything else (including all other LUNs) with healthy, green icons.
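For anyone who wants to look at the same thing from the ESXi shell instead of the VIC, I believe these commands show the session and path state (again just a sketch; vmhba34 is my software HBA):

# Active iSCSI sessions on the software HBA
esxcli iscsi session list --adapter=vmhba34
# Path state per device (a dead path shows State: dead)
esxcli storage core path list
# Multipathing (NMP) view of each device
esxcli storage nmp device list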

 

I first removed the second VMkernel (vmk2, the one for dvPgStorageR) and then rebooted the failed ESXi host.
On reboot the iSCSI storage adapter was fine - all three LUNs reported healthy, including the one that had shown up "dead" before. Unfortunately the iSCSI datastore mount hadn't persisted, but I simply remounted LUN #2 and - tada - was able to restart the failed VM with no discernible ill effects.
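In case it helps anyone else recovering from the same state, the rescan and remount can also be done from the shell; a rough sketch (the volume label "iscsi_lun2" is just a placeholder for whatever the datastore was actually named):

# Rescan the software iSCSI adapter so the LUNs are rediscovered
esxcli storage core adapter rescan --adapter=vmhba34
# List VMFS volumes, including any that are present but unmounted
esxcli storage filesystem list
# Mount the existing VMFS volume by label (placeholder label)
esxcli storage filesystem mount --volume-label=iscsi_lun2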
So why did everything crash, especially since I had followed all of the rules?
I think I found the answer: the vSphere Storage documentation, on page 117, speaks to NetApp storage systems:
<cut>
When you set up multipathing between two iSCSI HBAs and multiple ports on a NetApp
storage system, give each HBA a different iSCSI initiator name.
The NetApp storage system only permits one connection for each target and each initiator.
Attempts to make additional connections cause the first connection to drop. Therefore, a
single HBA should not attempt to connect to multiple IP addresses associated with the same
NetApp target.
</cut>
I suspect this was the problem: I can't simply add a second VMkernel to the software iSCSI HBA with a NetApp backend, because the ESXi host will try to establish sessions for *each VMkernel* for multipathing.
Since there is only one software iSCSI HBA, there is only one initiator name. To NetApp it looks like duplicate sessions from the same initiator, so the first session is closed - BAM - the iSCSI session is reset, I lose my LUN mount, and so on.
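A quick way to see this from the ESXi shell (again just a sketch, using my adapter name): there is exactly one initiator IQN on the software HBA no matter how many VMkernel ports are bound to it:

# The software HBA reports a single initiator IQN
esxcli iscsi adapter get --adapter=vmhba34
# ...while each bound VMkernel port produces its own session attempt to the target
esxcli iscsi networkportal list --adapter=vmhba34
esxcli iscsi session list --adapter=vmhba34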
Am I correct? If so, how can I perform failover at the iSCSI level with NetApp using software iSCSI on the ESXi host? You can't create *two* software iSCSI HBAs - only one.

 

Thanks for any advice. I've attached the /var/log/vmkernel.log for reference (failures up to the point where I performed a reboot).

 

 

 
