Issue with Host Based Replication can cause hostd to panic

I’m currently working on an SRM 5.0 project. This is the end of week 5, and up until yesterday everything was going perfectly.

A little background: this environment is running vSphere 5.0 Update 1b. vSphere 5.1 isn’t an option because Symantec doesn’t yet support it with NetBackup. The client wants to use vSphere Replication, AKA Host Based Replication.

Yesterday the client noticed some of his hosts disconnecting from vCenter. The hosts were still reachable via SSH and all VMs were still running perfectly fine. Restarting the management agents had no effect; the hosts would not reconnect to vCenter. At random times the hosts would disconnect and later reconnect on their own. Looking through the hostd log, we discovered the issue:

2012-10-18T16:25:54.241Z [296C5B90 panic 'Default']
-->
--> Panic: Assert Failed: "_quiescedType == quiescedType" @ bora/vim/hostd/hbrsvc/ReplicationGroup.cpp:3505
--> Backtrace:
--> [00] rip 1bfabb43
--> [01] rip 1be035be
--> [02] rip 1bfa1b00
--> [03] rip 1bfa1c12
--> [04] rip 1bd9c036
--> [05] rip 057ddcc5
--> [06] rip 057ddf49
--> [07] rip 057de526
--> [08] rip 057b1f20
--> [09] rip 057b26ba
--> [10] rip 057b9f96
--> [11] rip 1bd7e78a
--> [12] rip 1bd78b1c
--> [13] rip 1bd79556
--> [14] rip 1bd7ba28
--> [15] rip 052d8501
--> [16] rip 1bfce3e1
--> [17] rip 1bfc9533
--> [18] rip 1bfca0d8
--> [19] rip 052d8501
--> [20] rip 1bfbe679
--> [21] rip 1c676852
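
If you want to check a host for the same panic without paging through the whole log, something like the following might help. It is only a rough sketch: the function name find_hbr_panics is my own, and it assumes you have copied the hostd log (typically /var/log/hostd.log on ESXi 5.x) somewhere you can run Python. It looks for the "Panic: Assert Failed" line and keeps only asserts raised from the hbrsvc code path, which is where this one comes from.

```python
import re

# Matches the assert line of a hostd panic. Straight and curly quotes are both
# accepted because log text pasted through a web page often gets smart quotes.
PANIC_RE = re.compile(
    r'Panic: Assert Failed: ["\u201c](?P<expr>.+?)["\u201d] @ (?P<file>\S+):(?P<line>\d+)'
)


def find_hbr_panics(log_text):
    """Return (expression, source_file, line_number) for each hbrsvc assert panic."""
    hits = []
    for line in log_text.splitlines():
        match = PANIC_RE.search(line)
        # Keep only asserts raised from the host based replication service code.
        if match and "hbrsvc" in match.group("file"):
            hits.append(
                (match.group("expr"), match.group("file"), int(match.group("line")))
            )
    return hits
```

To use it, read the copied log into a string (for example with open(path, errors="replace").read()) and pass it to find_hbr_panics. An empty list means no matching panic was found in that file.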

After opening a case with GSS, we learned the cause. If a replicated VM has multiple vmdks and those vmdks are in different states (for example, one disk has finished replicating while another has not), and a state change occurs on the VM, such as a power cycle or a snapshot create or delete, the replication manager on the host incorrectly assumes all vmdks are in the same state. When they are not, hostd panics. The condition continues until replication of the VM completes or you reboot the host. In this environment, the state change triggering the panic is the snapshots created by NetBackup. This issue is not present in ESXi 5.1 and will be fixed in ESXi 5.0 in a future update.

There isn’t a KB on this (yet), so I wanted to let anyone who may be seeing this issue know that it’s a known issue and is being worked on.

 

The KB on this issue is now live: http://kb.vmware.com/kb/2030515

Sam

We have seen this issue also, with SRM and using snapshots for backups. VMware support told us the same thing.

Appreciate this blog post – copy/paste of the panic message into Google brought it up. 🙂
