Background
Versions involved:
VMware ESXi, 7.0.1, 17325551, DEL-ESXi-701_17325551-A01
vCenter 7.0U1 Build 17491160
vCenter and the ESXi hosts were upgraded from 6.7U3 to 7.0U1c, and the vSAN disk format was upgraded to version 13.
Problem
After upgrading many clusters from 6.7U3 to 7.0U1c and upgrading the vSAN disk format to version 13, we saw a health warning on the clusters.
The error message in Skyline Health was "vSAN critical alert regarding a potential data inconsistency".
For almost all clusters this error would clear by itself within 60 minutes of the upgrade (typically much sooner).
On the one cluster where it did not, trying to put a host in maintenance mode would fail after 1 hour. Before failing, the task would stall at a high percentage, anywhere from 80% up to 100%, with the message "Objects Evacuated xxx of yyy. Data Evacuated xxx MB of yyy MB".
It's worth mentioning that this cluster had an active Horizon environment running during the upgrade, and we suspect that its constant creation and removal of VMs contributed to the problem.
Solution
We found a KB article with a similar error message, even though we haven't changed the storage policy of any VMs for a long time (but Horizon might have done something like that behind the scenes): https://kb.vmware.com/s/article/82383
The article states that this is a rare issue, but we did find a Korean page referring to the same issue. The VMware KB article includes a Python script that you need to run on each host involved. After running the Python script we were able to put hosts in maintenance mode and do 7.x single image patching.
We asked VMware support whether it was a good idea to have changed the VSAN.DeltaComponent setting, and their response was "Yes, if you want the DeltaComponent functionality going forward then please change it back to 1. The delta component makes a temporary component when there are maintenance mode issues."
Because of this we decided to change the value back, and wrote a PowerShell (PowerCLI) script that does it for a whole cluster instead of running a Python script on each host:
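# Requires VMware PowerCLI and an existing Connect-VIServer session.
param (
    [string]$ClusterName = $( Read-Host "Enter cluster name" )
)
# Set VSAN.DeltaComponent back to 1 on every host in the cluster.
Get-Cluster -Name $ClusterName | Get-VMHost |
    Get-AdvancedSetting -Name "VSAN.DeltaComponent" |
    Set-AdvancedSetting -Value 1 -Confirm:$false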
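To confirm that the value really is back to 1 everywhere, something like the following read-only check can be run before and after the change. This is only a sketch: it assumes PowerCLI is installed and a Connect-VIServer session is already open, and the output column names (VMHost, Value) are just labels chosen here for illustration.

```powershell
# Sketch: list the current VSAN.DeltaComponent value for each host in a cluster.
# Assumes an existing Connect-VIServer session; nothing is modified.
param (
    [string]$ClusterName = $( Read-Host "Enter cluster name" )
)

Get-Cluster -Name $ClusterName | Get-VMHost | Sort-Object Name | ForEach-Object {
    # Read the advanced setting from this host.
    $setting = Get-AdvancedSetting -Entity $_ -Name "VSAN.DeltaComponent"
    [pscustomobject]@{
        VMHost = $_.Name          # illustrative column name
        Value  = $setting.Value   # empty if the setting was not found on the host
    }
}
```

Any host that does not show 1 after the change is worth a second look before relying on delta components during the next maintenance window.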
As we've only found a single article on this issue (in Korean) I guess this issue is indeed quite rare, but if it happens again we now know what to do.