So, I have two ESXi 5.5 hosts connected to some iSCSI storage. Both hosts are identical in every respect, including the dual-port Intel X540-T2 10GbE NIC in the same slots. The storage array has two dual-port X540-T2 NICs. I'm connecting the hosts to the storage using CAT6A crossover cables. The hosts are using the standard ESXi software iSCSI adapter. There are no errors on either host, and the vCenter Appliance is happily talking to both hosts with HA/DRS enabled.
I needed to performance-tune the iSCSI connection to the storage. Along with some basic modifications to the iSCSI software adapter on each host to match the storage array, I wanted to switch to 9000-byte jumbo frames.
I made the jumbo frame setting changes on the storage, then made the changes on the ESXi hosts. The first host went perfectly - a quick rescan of the HBAs, and the storage was working fine - much quicker as well. Then I made the IDENTICAL change on the second host, and it went into a complete fit. First, it completely lost connection to the storage, then I couldn't view any Network Adapter, Storage or Storage Adapter settings. The config pages wouldn't load, and I eventually lost connection from the VI Client. SSH still seemed to work, but no commands would be accepted. The console was also unresponsive, so I had to do a hard reboot. The host started to boot, but was VERY slow when scanning for iSCSI volumes. It did finish booting after about 10 minutes though.
At this stage, I could get back into the config settings. The iSCSI volumes weren't showing, even though the iSCSI LUNs were showing as available on the host's iSCSI HBA! I tried to force mount the volumes, and the host went into meltdown again (same symptoms). Another hard reboot. After about an hour of fighting with this, I changed the MTU back to 1500 on the vSwitch connected to the storage array. Almost instantly, the storage re-appeared and re-mounted. I changed the MTU back to 9000 - same fit. I triple-checked everything - all settings looked OK. I switched the host storage connection to an unused 10GbE port on the storage array - same problem. I changed the crossover cable - same problem. Nothing I could do would make the host use a 9000-byte MTU, but the other host is happily using it!
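One check that would narrow this down is a don't-fragment ping from each host to the array, sized to fill a single frame at the configured MTU. Below is a minimal sketch of the arithmetic, assuming IPv4 with no header options. The interface name vmk1 and the address 10.0.0.10 are placeholders for illustration only, not values from this setup.

```python
# Header sizes that count against the MTU for an IPv4 ping.
IPV4_HEADER_BYTES = 20  # IPv4 header, assuming no options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(mtu: int, vmk: str, target_ip: str) -> str:
    """Build a vmkping command line for the ESXi shell.

    -d sets the don't-fragment bit, -s sets the payload size,
    -I selects the outgoing VMkernel interface.
    """
    return f"vmkping -d -s {max_ping_payload(mtu)} -I {vmk} {target_ip}"


if __name__ == "__main__":
    # vmk1 and 10.0.0.10 are hypothetical placeholders.
    for mtu in (1500, 9000):
        print(vmkping_command(mtu, "vmk1", "10.0.0.10"))
```

If the 1472-byte ping gets replies and the 8972-byte one does not, something in the path (vmknic, vSwitch, physical NIC, or the array port) is not actually passing 9000-byte frames, whatever the settings pages say.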
Beats me. Got to try and get this sorted before this system goes into production.
Anyone got any ideas?