Hello everyone,
I maintain a single VMware host for a mid-size charity. It currently runs vSphere 5.5 (ESXi) Update 2 in the HP OEM version.
The hardware in use:
HP ProLiant DL380e Gen8 (bought brand new in August 2014), HP SmartArray B320i storage controller, HP H222 host bus adapter (only an HP Ultrium4 tape drive is connected to it), HP Intel 4-port NIC 366i, 32GB RAM, 2 quad-core Intel Xeon E5-2407 CPUs
The box was initially installed and configured in August using the HP OEM vSphere 5.5 Update 1 installation CD. vSphere is installed on the RAID array configured on the B320i controller. A VMware Essentials license is also installed and in use.
It's running 3 Windows 2008 R2 VMs (DC, Exchange 2010 and a backup server with Backup Exec 2010 R3 [I know this is not a recommended/supported configuration, but it worked with 5.5 U1 without issues]) plus 2 Debian Linux VMs.
Two weeks ago, during weekend maintenance, I first installed the latest HP SPP (Service Pack for ProLiant) from Sept. 2014, which provided several firmware updates, e.g. for the B320i and the 366i NIC.
After that I performed an upgrade installation of the HP OEM version of vSphere 5.5 Update 2, which HP also released at the beginning of September.
All those setup/update procedures went through without any issues, error messages or crashes.
The host ran fine for 3 days and then suddenly crashed with a PSOD stating: PCPU 0: no heartbeat (2/2 IPIs received) [unfortunately I did not take a screenshot]
I reset/rebooted the host through the iLO 4 console and kept an eye on the server over the following days.
The first PSOD took place during daily (nightly) backup on the connected tape drive.
On the following Friday/Saturday night (about 2 days later) it crashed again with the following PSOD - again with PCPU 0: no heartbeat (2/2 IPIs received):
![PSOD1.PNG]()
So I started investigating. I found some hints here in the VMware communities pointing to recommended BIOS settings for HP ProLiant servers, checked the actual settings and changed them to the recommended values. The server then ran without glitches for about 16 hours before crashing again with this PSOD:
![PSOD2.PNG]()
I continued investigating and took an especially close look at the power management settings in the BIOS, in vSphere and in the Windows VMs.
I also checked the installed firmware versions of the storage controllers and the NIC, and the driver versions in use. All OK there (as recommended in the HP VMware recipe of Sept. 2014).
After the reboot the server ran fine for about a week, then there was another PSOD early this morning at about 3 a.m.:
![PSOD3.PNG]()
The server/VMs were mostly idle at this time, no heavy I/O activity.
The first two PSODs happened during backup, but not at a consistent time (one at about 10 p.m., the other early in the morning between 2 and 3 a.m.).
I have read through tons of hints about faulty NIC drivers/firmware, BIOS configurations etc., but nothing helps; in fact, everything is already configured exactly as in the HP recommendations for vSphere 5.x.
For the BIOS settings I followed this list/table: Recommended BIOS Settings on HP ProLiant DL580 G7 for VMware vSphere | Boerlowie's Blog
vSphere is configured for "High Performance Mode", and so are the Windows VMs.
I'm rather stuck now, so maybe someone here has a good hint for me?
If you need any further hardware/software/configuration/whatever details, just ask.
Cheers and thanks in advance for any help,
cykVM