Cancel Remove Snapshot Task

December 18, 2014 By Allan Kjaer

* Disclaimer: I'm not sure this is supported by VMware. It requires a deep knowledge of how VMware snapshots work, and the details can vary between versions.

A customer contacted me with a problem. They had shut down a virtual machine because it had a huge snapshot (~1.7 TB), and afterwards they had started the remove snapshot task. After 2 hours it was only at 10%. A quick calculation says the whole job would take about 20 hours, and they could not wait that long to power on the machine again, so we had to do something. But you can't cancel the task from the vSphere Client.
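The quick calculation can be written down as a small shell function. This is only a sketch: `estimate_hours` is a name made up for this example, and it assumes the progress is roughly linear, which a snapshot removal does not guarantee.

```shell
# Estimate total and remaining run time from elapsed hours and progress in percent.
# Assumption: progress is linear, so this is a rough guide only.
estimate_hours() {
    elapsed_hours=$1   # whole hours since the task started
    progress_pct=$2    # progress reported by the task (1-100)
    if [ "$progress_pct" -le 0 ]; then
        echo "progress must be greater than 0" >&2
        return 1
    fi
    total=$(( elapsed_hours * 100 / progress_pct ))
    remaining=$(( total - elapsed_hours ))
    echo "total=${total}h remaining=${remaining}h"
}

# The numbers from this case: 2 hours elapsed, 10% done.
estimate_hours 2 10
```

With 2 hours elapsed at 10%, this gives about 20 hours in total, of which 18 were still left.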

The first step was to find the host where the machine was located, since that host was doing the snapshot removal. I logged in to the ESXi host using SSH and ran the following command:

~ # vim-cmd vimsvc/task_list
(ManagedObjectReference) [
   'vim.Task:haTask-311-vim.vm.Snapshot.remove-268537832',
   'vim.Task:haTask-ha-host-vim.HostSystem.acquireCimServicesTicket-268543629'
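On a busy host the task list can be long. A minimal sketch for picking out the snapshot tasks is shown here; `snapshot_tasks` is a helper name invented for this example, and it simply assumes that the task name contains the word "Snapshot", as it does in the output above.

```shell
# Read `vim-cmd vimsvc/task_list` output on stdin and print only the IDs
# of tasks whose name contains "Snapshot" (assumption based on the output above).
snapshot_tasks() {
    sed -n "s/.*'vim\.Task:\([^']*Snapshot[^']*\)'.*/\1/p"
}

# On the ESXi host this would be used as:
#   vim-cmd vimsvc/task_list | snapshot_tasks
```

The printed ID is the value that the task_info and task_cancel commands expect as their argument.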

The vim.Task:haTask-311-vim.vm.Snapshot.remove-268537832 entry indicates that this host is removing a snapshot (if a delete all is running, the task name indicates that instead). Next I wanted to be sure that it was the correct machine, by running this:

~ # vim-cmd vimsvc/task_info haTask-311-vim.vm.Snapshot.remove-268537832
(vim.TaskInfo) {
   dynamicType = <unset>,
   key = "haTask-311-vim.vm.Snapshot.remove-268537832",
   task = 'vim.Task:haTask-311-vim.vm.Snapshot.remove-268537832',
   description = (vmodl.LocalizableMessage) null,
   name = "vim.vm.Snapshot.remove",
   descriptionId = "vm.Snapshot.remove",
   entity = 'vim.VirtualMachine:311',
   entityName = "SERVER01",
   state = "running",
   cancelled = false,
   cancelable = true,
   error = (vmodl.MethodFault) null,
   result = <unset>,
   progress = 10,
   reason = (vim.TaskReasonUser) {
      dynamicType = <unset>,
      userName = "vpxuser",
   },
   queueTime = "2014-12-17T18:10:17.285736Z",
   startTime = "2014-12-17T18:10:17.286068Z",
   completeTime = <unset>,
   eventChainId = 268537832,
   changeTag = <unset>,
   parentTaskKey = <unset>,
   rootTaskKey = <unset>,
}

In the output above, the important information is the machine name (entityName), the progress, the start time (startTime), and that the task can be canceled (cancelable = true).
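To avoid reading the whole dump, the interesting fields can be filtered out. This is a sketch only; `task_summary` is a made-up helper name, and the field names are the ones shown in the output above.

```shell
# Read `vim-cmd vimsvc/task_info <task>` output on stdin and keep only the
# fields needed to decide whether this is the right task and can be canceled.
task_summary() {
    grep -E '^ *(entityName|state|cancelable|progress|startTime) = ' |
        sed 's/^ *//; s/,$//'
}

# On the ESXi host this would be used as:
#   vim-cmd vimsvc/task_info haTask-311-vim.vm.Snapshot.remove-268537832 | task_summary
```

If cancelable is not true, there is little point in trying the cancel command.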

Then I issued a Cancel Task with this command:

~ # vim-cmd vimsvc/task_cancel haTask-311-vim.vm.Snapshot.remove-268537832

After a few seconds the task was canceled.

Afterward we tried to power on the machine, but it returned the following error in the vSphere Client:

An error was received from the ESX host while powering on VM SERVER01.
Failed to start the virtual machine.
Module DiskEarly power on failed. 
Cannot open the disk 'SERVER01-000001.vmdk' or one of the snapshot disks it depends on. 
The system cannot find the file specified

To investigate this problem I had to look at the VMDK files for the VM, by running the following command:

~ # ls -lh /vmfs/volumes/Datastore1/SERVER01/*.vmdk
-rw-------    1 root     root       40.0G Dec 17 18:12 /vmfs/volumes//Datastore1/SERVER01/SERVER01-flat.vmdk
-rw-------    1 root     root         501 Dec 17  2013 /vmfs/volumes//Datastore1/SERVER01/SERVER01.vmdk
-rw-------    1 root     root     1000.0G Dec 17 20:28 /vmfs/volumes//Datastore1/SERVER01/SERVER01_1-flat.vmdk
-rw-------    1 root     root         494 Dec 17  2013 /vmfs/volumes//Datastore1/SERVER01/SERVER01_1.vmdk
-rw-------    1 root     root     1000.0G Dec 17 20:28 /vmfs/volumes//Datastore1/SERVER01/SERVER01_1-000001-delta.vmdk
-rw-------    1 root     root         494 Nov 22 11:54 /vmfs/volumes//Datastore1/SERVER01/SERVER01_1-000001.vmdk
-rw-------    1 root     root      522.1G Dec 17 18:01 /vmfs/volumes//Datastore1/SERVER01/SERVER01_2-flat.vmdk
-rw-------    1 root     root         494 Dec 17  2013 /vmfs/volumes//Datastore1/SERVER01/SERVER01_2.vmdk
-rw-------    1 root     root      617.8G Dec 17 18:01 /vmfs/volumes//Datastore1/SERVER01/SERVER01_2-000001-delta.vmdk
-rw-------    1 root     root         494 Nov 22 11:54 /vmfs/volumes//Datastore1/SERVER01/SERVER01_2-000001.vmdk
-rw-------    1 root     root     1000.0G Dec 17 18:01 /vmfs/volumes//Datastore1/SERVER01/SERVER01_3-flat.vmdk
-rw-------    1 root     root         494 Dec 17  2013 /vmfs/volumes//Datastore1/SERVER01/SERVER01_3.vmdk
-rw-------    1 root     root      567.3G Dec 17 18:01 /vmfs/volumes//Datastore1/SERVER01/SERVER01_3-000001-delta.vmdk
-rw-------    1 root     root         494 Nov 22 11:54 /vmfs/volumes//Datastore1/SERVER01/SERVER01_3-000001.vmdk
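The listing can also be cross-checked against the VMX file, to see which descriptor files the VM configuration expects and which of them are missing. The following is a sketch under assumptions: `check_vmx_disks` is a name invented here, and it only handles `.fileName` lines of the form seen in this VM's VMX file.

```shell
# For every disk referenced in a VMX file, report whether the descriptor exists.
# Arguments: VM folder, VMX file name (both are example values when called).
check_vmx_disks() {
    vm_dir=$1
    vmx=$2
    # Lines look like: scsi0:0.fileName = "SERVER01-000001.vmdk"
    sed -n 's/^[a-z]*[0-9]*:[0-9]*\.fileName = "\(.*\.vmdk\)"$/\1/p' "$vm_dir/$vmx" |
    while read -r disk; do
        # Relative names live in the VM folder; absolute paths are used as-is.
        case "$disk" in
            /*) path=$disk ;;
            *)  path=$vm_dir/$disk ;;
        esac
        if [ -e "$path" ]; then
            echo "OK      $disk"
        else
            echo "MISSING $disk"
        fi
    done
}

# Example call on the ESXi host:
#   check_vmx_disks /vmfs/volumes/Datastore1/SERVER01 SERVER01.vmx
```

Any disk reported as MISSING is a candidate for the kind of VMX mismatch described in this post.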

The file that the error says is missing is indeed not there. But I could see that SERVER01-flat.vmdk was last updated at Dec 17 18:12, which was after the snapshot removal was started, and that SERVER01_1-flat.vmdk was last updated at Dec 17 20:28. This means that SERVER01-flat.vmdk was fully consolidated and the task had apparently moved on to the next disk, but the VMX file did not reflect this. So I just edited one line in the VMX file:

Before (part of the VMX file):

scsi0:0.fileName = "SERVER01-000001.vmdk"
scsi0:0.mode = "persistent"

After (part of the VMX file):

scsi0:0.fileName = "SERVER01.vmdk"
scsi0:0.mode = "persistent"
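The same edit can be scripted so that a backup of the VMX file is always kept. This is a minimal sketch, not a supported procedure: `repoint_vmx_disk` is a hypothetical helper, and it assumes the VM is powered off and that the file names contain no `/` or other characters that are special to sed.

```shell
# Point a disk entry in a VMX file at a different descriptor, keeping a backup.
# Arguments: VMX file, old descriptor name, new descriptor name.
repoint_vmx_disk() {
    vmx=$1
    old=$2
    new=$3
    # Keep the original next to the edited file as <name>.bak.
    cp "$vmx" "$vmx.bak" || return 1
    # Only touch .fileName lines, and only the quoted old name on them.
    sed "/\.fileName = /s/\"$old\"/\"$new\"/" "$vmx.bak" > "$vmx"
}

# Example call, run from the VM folder on the ESXi host:
#   repoint_vmx_disk SERVER01.vmx SERVER01-000001.vmdk SERVER01.vmdk
```

If the host does not pick up the change, reloading the VM configuration with `vim-cmd vmsvc/reload 311` may help (311 is the VM ID from the task output earlier).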

Afterward I could power on the machine without any problems, and then initiate a Consolidate Snapshots to delete the remaining snapshots while the machine is running.

This procedure can be different if there are multiple snapshots on the virtual machine.

* I would strongly advise against keeping a snapshot for more than 1-3 days, or letting it grow to this size.
