I'm trying to drain cloudvirts for reboots and live migration is behaving badly.
Some VMs migrate just fine.
Most of the time, though, the VM is only partially migrated: it gets started on the destination host (I can see it there in virsh and ps), but the VM on the old host is never stopped and nova reports 'migrating' status forever. This produces a whole lot of warnings about the VM running where it doesn't belong.
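For anyone trying to spot the affected VMs, here is a rough sketch of the check I mean: given the list of running domains on each hypervisor (e.g. collected by hand from virsh on each host), report any domain that shows up on more than one host. The host names and domain names below are made up for illustration, and the function is just a helper sketch, not anything from nova or the cookbooks.

```python
from collections import defaultdict


def find_duplicated_domains(domains_by_host):
    """Return {domain: sorted list of hosts} for domains seen on more than one host."""
    hosts_by_domain = defaultdict(set)
    for host, domains in domains_by_host.items():
        for domain in domains:
            hosts_by_domain[domain].add(host)
    # Keep only domains that are running in more than one place.
    return {
        domain: sorted(hosts)
        for domain, hosts in hosts_by_domain.items()
        if len(hosts) > 1
    }


# Hypothetical data: host and domain names are placeholders.
example = {
    "cloudvirt-a": ["i-0001", "i-0002"],
    "cloudvirt-b": ["i-0002", "i-0003"],
}
print(find_duplicated_domains(example))
```

Any domain this reports is a candidate for a half-finished migration, assuming the per-host lists were gathered at roughly the same time.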
Possible contributing factors:
- I restarted all nodes in the control plane earlier
Attempted fixes:
- Did a complete reset and restart of rabbitmq
- Restarted all openstack services several times
- Noticed that the restart --all cookbook does not restart cinder-api, so I restarted that by hand (and made a patch: https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1115509)