Our system dates back to ESX 3.5 and was designed so that vMotion and Management traffic share the same VLAN. Each of our ESXi hosts has two physical NICs configured to carry this traffic.
On the virtual side, we have a vSwitch with VMkernel ports for Management and vMotion traffic. The physical NICs are configured active/active at the vSwitch level and active/standby at the VMkernel port group level, where one NIC is active for Management and the other is active for vMotion.
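In case it helps to see the setup concretely, here is roughly how that per-port-group teaming override looks if scripted with pyVmomi. The vCenter name, credentials, port group names, and vmnic assignments are placeholders standing in for ours, not the actual values:

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def set_active_standby(net_sys, pg_name, active, standby):
    """Override the vSwitch-level teaming order on one port group."""
    for pg in net_sys.networkInfo.portgroup:
        if pg.spec.name != pg_name:
            continue
        spec = pg.spec
        if spec.policy is None:
            spec.policy = vim.host.NetworkPolicy()
        teaming = vim.host.NetworkPolicy.NicTeamingPolicy()
        teaming.nicOrder = vim.host.NetworkPolicy.NicOrderPolicy(
            activeNic=active, standbyNic=standby)
        spec.policy.nicTeaming = teaming
        net_sys.UpdatePortGroup(pgName=pg_name, portgrp=spec)

ctx = ssl._create_unverified_context()           # lab shortcut; validate certs normally
si = SmartConnect(host='vcenter.example.com',    # placeholder vCenter
                  user='administrator', pwd='...', sslContext=ctx)
try:
    view = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.HostSystem], True)
    for esx in view.view:
        net_sys = esx.configManager.networkSystem
        # Management active on vmnic0, vMotion active on vmnic1 (names are examples)
        set_active_standby(net_sys, 'Management Network', ['vmnic0'], ['vmnic1'])
        set_active_standby(net_sys, 'VMotion', ['vmnic1'], ['vmnic0'])
    view.Destroy()
finally:
    Disconnect(si)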
We are running around 300 View desktops in this environment, and the issue I would like to solve is this: if we place a loaded host in Maintenance Mode, most, if not all, of the environment becomes noticeably slower, almost to the point of being unusable. It returns to normal after a few minutes.
I don't have a test environment in which to replicate the issue and gather data, and I can't use the production environment for that purpose. So, as much as possible, I want to configure the system for redundancy and good performance in the event of a failure of a host or of one of the vMotion or Management NICs.
It has been suggested that I implement Network I/O Control if I can't separate the vMotion and Management traffic. I plan on configuring Traffic Shaping on the vMotion and Management VMkernel port groups. Is this the best place to do it? Also, I realize the "right" answer depends on our environment, but is there a suggested starting point for Average Bandwidth, Peak Bandwidth, and Burst Size on a 1 Gb NIC for vMotion and Management traffic?
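To make the question concrete, this is a sketch of the shaping policy I have in mind, again via pyVmomi (net_sys and the port group name as in the sketch above). The numbers are placeholder guesses I'd like corrected, not values I'm confident in; as I understand it, the API takes average/peak bandwidth in bits per second and burst size in bytes:

shaping = vim.host.NetworkPolicy.TrafficShapingPolicy()
shaping.enabled = True
shaping.averageBandwidth = 500 * 1000 * 1000   # e.g. 500 Mbit/s average on a 1 Gb link
shaping.peakBandwidth = 1000 * 1000 * 1000     # e.g. allow bursts up to line rate
shaping.burstSize = 100 * 1024 * 1024          # e.g. 100 MB burst allowance

for pg in net_sys.networkInfo.portgroup:
    if pg.spec.name == 'VMotion':              # placeholder port group name
        spec = pg.spec
        spec.policy.shapingPolicy = shaping
        net_sys.UpdatePortGroup(pgName='VMotion', portgrp=spec)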
My other question concerns the optimal allocation of the NICs in the VMkernel port groups. If I configure the NICs as active/active rather than active/standby, does that double the bandwidth of the VMkernel port group, or is its bandwidth still limited to that of a single port? In the event of a failure of one of the NICs, does the VMkernel port go down, or does it continue on the remaining link?
Thank you