Channel: VMware Communities : Discussion List - VMware ESXi 5

Cluster Networking Issues

Hi All,

Just wanted to share a 'fun' situation that has been ongoing for the last few months. This case has bounced back and forth between VMware and HP support, and I think we've finally reached the end of our tether! Below is a summary of the problem:

"We recently experienced a loss of network communications from a shared uplink set presented to our c7000-based ESXi cluster. This uplink set carries all of our virtual machine traffic across multiple VLANs, so we lost all communications to our virtual machines. See below for a list of hardware/software:

c7000 blade chassis (OA firmware 3.32)
10x BL460c G7 hosts
2x VC 8Gb 24-Port FC Module (firmware 3.30)
2x VC FlexFabric 10Gb/24-Port Module (firmware 1.04)
Emulex NC553i 10Gb 2-port FlexFabric Converged Adapter (firmware 4.0.360.15, be2net driver 4.0.355.1)

In terms of connectivity, we use the following:
A single shared 10GbE uplink for all virtual machine traffic (per module)
A single dedicated 1GbE uplink for ESXi management traffic (per module)
Both of the above are delivered via the VC FlexFabric 10Gb/24-Port Modules
Dual 8Gb FC uplinks for storage traffic, delivered to the blades via mezzanine cards (per module)

During this incident, all management ports on the ESXi hosts remained available, but the VMs could not be reached from machines outside the cluster.
The VMs were still running and accessible from the vSphere console, but they could not talk to other networked services on different subnets.
VMs could still talk to other VMs in the same cluster that were on the same host and the same subnet, because that traffic never leaves the Distributed Virtual Switch (DvS).
In the end, we could only restore normal operation by powering down the entire chassis."

HP's advice was the stock standard "Update your OA/VC/blade enclosure firmware to the latest release" (we also updated the ESXi drivers to those in HP's most recent September release of ESXi 5.0 U1).

We applied these updates as a pilot at our secondary site, which runs the same hardware and software and, more importantly, had never experienced the above issue. Two days later, the secondary site hit exactly the same issue. HP faffed around and batted the case to VMware, who batted it straight back; then we ran out of troubleshooting time and had to restore services...

In my mind, this looks like a Virtual Connect issue, but at this point I am not ruling anything out.

IMHO, our setup is a pretty common one, so I am curious whether anyone has seen or experienced anything similar, or can give us some pointers on where to go next.

At the moment, I feel like we're sitting on a ticking time bomb. We have no confidence in either site because we have not traced the cause of the fault, so we have no idea when the next outage will occur. It's worth noting that the primary site had been running normally for 6 months before this ugly issue reared its head. In that time, no major updates were applied to ESXi (only minor patches), and no driver or firmware updates were applied to the enclosure.

At this point, we're desperate for any help or advice!

Thanks!