June 3, 2024

Hosts out of sync after restoring vCenter

Background

After working with VMware Support on a case, we were asked to install a special patch on the vCenter server. It turned out that this patch broke some unrelated functionality we needed: remounting RDM disks on a VM that already had 35 RDM disks attached. The script that does the remounting runs at night, so the next day we decided to roll back vCenter to the previous day's backup, which had been taken just before the patch was installed.

Problem

Some of our ESXi hosts started showing symptoms of being out of sync: all their stats went blank, and instead of alarms we only got two blue informational messages:

Cannot synchronize host servername
Quick stats on servername is not up-to-date

Trying to reconfigure HA, however, did trigger alarms.



vCenter changes the password of the vpxuser account on each host every 30 days by default. A restored vCenter only knows the passwords it had at backup time, so every host whose vpxuser password was rotated between the backup and the rollback ends up out of sync. With many hosts in vCenter that can be quite a few, depending on how much time passed between the backup and the rollback. VMware has a list of things to consider when doing a restore, but the problem we experienced is not on it.

The rotation interval is controlled by the vCenter advanced setting VirtualCenter.VimPasswordExpirationInDays, which you can check with PowerCLI:

Get-AdvancedSetting -Entity $global:DefaultVIServer -Name VirtualCenter.VimPasswordExpirationInDays

When you have 150 ESXi hosts in your vCenter, it can be time consuming to go through each host manually to find which ones were affected by the rollback.
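To avoid checking hosts one by one, one option is to search vCenter's own events for the "Cannot synchronize host" message. This is a minimal PowerCLI sketch, not what we actually did. It assumes that the message is also recorded as a vCenter event (an assumption, not something checked against this incident) and that Connect-VIServer has already been run. The function name Get-OutOfSyncHostName is hypothetical. It filters events on their FullFormattedMessage text and returns the distinct host names.

```powershell
# Hypothetical helper: returns the distinct host names that have
# "Cannot synchronize host" events. Assumption: the vSphere Client message
# is also logged as a vCenter event with a Host argument.
function Get-OutOfSyncHostName {
    param(
        # vCenter events, for example the output of Get-VIEvent
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]]$InputEvent
    )
    begin {
        # Case-insensitive set so each host is reported only once
        $names = [System.Collections.Generic.HashSet[string]]::new(
            [System.StringComparer]::OrdinalIgnoreCase)
    }
    process {
        foreach ($e in $InputEvent) {
            # Keep only sync-failure messages that reference a host
            if ($e.FullFormattedMessage -like 'Cannot synchronize host*' -and $e.Host) {
                [void]$names.Add($e.Host.Name)
            }
        }
    }
    end {
        # Emit the host names in alphabetical order
        $names | Sort-Object
    }
}
```

Usage, looking back one day (a large -MaxSamples value can make Get-VIEvent slow on a busy vCenter):

Get-VIEvent -Start (Get-Date).AddDays(-1) -MaxSamples 100000 | Get-OutOfSyncHostName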

Searching through the logs on one of the affected hosts revealed little to indicate that it was having problems.

Solution

Looking at the logs in Splunk, we found a vCenter log entry that started showing up in large numbers after the restore:

Exception occurred during host sync; Got method fault

With that message, we could use Splunk to list the affected hosts.

Then we right-clicked each host in the vSphere Client and chose Disconnect, and then Connect.

Disconnect + Connect
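We did the reconnect by hand, but the same step can be scripted. This is a minimal PowerCLI sketch, assuming an existing Connect-VIServer session and a list of host names taken from the Splunk results. The function name Restart-VMHostConnection is hypothetical. It uses Set-VMHost -State to disconnect and then reconnect each host. If vCenter asks for host credentials on reconnect, the sketch will fail for that host, and the vSphere Client route above is the fallback.

```powershell
# Hypothetical helper: disconnects and reconnects each named ESXi host so that
# vCenter re-establishes its connection. Assumes PowerCLI and Connect-VIServer.
function Restart-VMHostConnection {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        # Host names as they appear in vCenter, e.g. from the Splunk results
        [Parameter(Mandatory)]
        [string[]]$HostName
    )
    foreach ($name in $HostName) {
        # Honors -WhatIf / -Confirm before touching the host
        if ($PSCmdlet.ShouldProcess($name, 'Disconnect and reconnect')) {
            $vmhost = Get-VMHost -Name $name -ErrorAction Stop
            Set-VMHost -VMHost $vmhost -State Disconnected -Confirm:$false | Out-Null
            Set-VMHost -VMHost $vmhost -State Connected -Confirm:$false | Out-Null
            # Report the resulting connection state
            Get-VMHost -Name $name | Select-Object Name, ConnectionState
        }
    }
}
```

Running it with -WhatIf first (the host names are placeholders) shows which hosts would be touched without changing anything:

Restart-VMHostConnection -HostName 'esx01.example.com', 'esx02.example.com' -WhatIf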

After reconnecting the hosts, everything worked again and the recurring error messages in Splunk stopped.
