Check for null host before proceeding with VM volume operations in managed storage while restoring VM#12879
Conversation
…naged storage while restoring VM
|
@blueorangutan package |
|
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 4.22 #12879 +/- ##
============================================
- Coverage 17.61% 17.61% -0.01%
Complexity 15665 15665
============================================
Files 5917 5917
Lines 531461 531464 +3
Branches 64977 64978 +1
============================================
- Hits 93608 93607 -1
- Misses 427295 427298 +3
- Partials 10558 10559 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Fixes a restore-VM failure mode in managed storage where vm.getLastHostId() points to a deleted host, causing host-dependent volume operations to NPE and leaving an extra ROOT volume behind.
Changes:
- Add a null-host guard in managed-storage handling during VM restore to safely skip host-dependent operations when the host record no longer exists.
- Minor formatting-only updates (whitespace/brace style) across
UserVmManagerImpl.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } catch (Exception ex) { | ||
| throw new CloudRuntimeException("Router start failed due to" + ex); |
There was a problem hiding this comment.
The exception message concatenation is missing a space ("due to" + ex) and the thrown CloudRuntimeException drops the original cause/stack trace. Consider including a separating space and passing the caught exception as the cause so callers/logs preserve details.
| // default findById() won't search entries with removed field not null | ||
| Host host = _hostDao.findById(hostId); | ||
| if (host == null) { | ||
| logger.warn("Host {} not found", hostId); |
There was a problem hiding this comment.
This new log line only prints the hostId; when debugging restore/re-image issues it would be helpful to include the VM and root volume identifiers (e.g., vm UUID/id and root volume UUID/id) in the warning so operators can correlate the skip with the affected resources.
| logger.warn("Host {} not found", hostId); | |
| logger.warn("Host {} not found for vm id: {}, uuid: {}, root volume id: {}, uuid: {}", hostId, vm.getId(), vm.getUuid(), root.getId(), root.getUuid()); |
| if (hostId != null) { | ||
| VolumeInfo volumeInfo = volFactory.getVolume(root.getId()); | ||
| // default findById() won't search entries with removed field not null | ||
| Host host = _hostDao.findById(hostId); | ||
| if (host == null) { | ||
| logger.warn("Host {} not found", hostId); | ||
| return; | ||
| } |
There was a problem hiding this comment.
There are unit tests for restoreVirtualMachine() in server/src/test/java/com/cloud/vm/UserVmManagerImplTest.java, but none appear to cover the managed-storage restore path where vm.getLastHostId() is set and _hostDao.findById(hostId) returns null (deleted host). Adding a test for this scenario would help prevent regressions (e.g., ensure restore proceeds without leaving an extra ROOT volume in Allocated state).
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17228 |
|
@blueorangutan test |
|
@kiranchavala a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
kiranchavala
left a comment
There was a problem hiding this comment.
LGTM
Steps to reproduce the issue
- Deploy vm with root volume on powerflex
- Stop vm
- Remove the host where last deployed
- Restore vm - should see NPE before fix
Before fix
2026-03-26 06:13:16,254 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90 ctx-82378625) (logid:2a5a5f3a) Invocation exception, caused by: java.lang.NullPointerException
2026-03-26 06:13:16,255 INFO [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90 ctx-82378625) (logid:2a5a5f3a) Rethrow exception java.lang.NullPointerException
2026-03-26 06:13:16,254 DEBUG [c.c.a.ApiServlet] (qtp1052967153-21:ctx-11e2f3f3 ctx-5b70f743) (logid:6d155dc4) ===END=== 10.0.3.251 -- GET jobId=2a5a5f3a-343a-4d33-8c2b-5acb34860d77&command=queryAsyncJobResult&response=json&sessionkey=pcTUuEaUc9SM5K-7M3zJ9bLiyRk
2026-03-26 06:13:16,257 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90) (logid:2a5a5f3a) Done with run of VM work job: com.cloud.vm.VmWorkRestore for VM 7, job origin: 89
2026-03-26 06:13:16,258 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90) (logid:2a5a5f3a) Unable to complete AsyncJob {"accountId":2,"cmd":"com.cloud.vm.VmWorkRestore","cmdInfo":"rO0ABXNyABpjb20uY2xvdWQudm0uVm1Xb3JrUmVzdG9yZQK3-6IUa1sTAgAEWgAHZXhwdW5nZUwAB2RldGFpbHN0AA9MamF2YS91dGlsL01hcDtMABJyb290RGlza09mZmVyaW5nSWR0ABBMamF2YS9sYW5nL0xvbmc7TAAKdGVtcGxhdGVJZHEAfgACeHIAE2NvbS5jbG91ZC52bS5WbVdvcmufmbZW8CVnawIABEoACWFjY291bnRJZEoABnVzZXJJZEoABHZtSWRMAAtoYW5kbGVyTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO3hwAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAAHdAAZVmlydHVhbE1hY2hpbmVNYW5hZ2VySW1wbABzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAAdwgAAAAQAAAAAHhwc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAE","cmdVersion":0,"completeMsid":null,"created":"Thu Mar 26 06:13:16 UTC 2026","id":90,"initMsid":32987261436360,"instanceId":null,"instanceType":null,"lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"8802ec86-f177-4d58-afa9-3ce49aba5357"}, job origin:89
java.lang.NullPointerException
at com.cloud.vm.UserVmManagerImpl.handleManagedStorage(UserVmManagerImpl.java:8152)
at com.cloud.vm.UserVmManagerImpl.restoreVirtualMachine(UserVmManagerImpl.java:7929)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proc
After fix
2026-03-26 05:49:33,242 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c619ae28) Add job-42 into job monitoring
2026-03-26 05:49:33,247 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c12e730d) Executing AsyncJob {"accountId":2,"cmd":"com.cloud.vm.VmWorkRestore","cmdInfo":"rO0ABXNyABpjb20uY2xvdWQudm0uVm1Xb3JrUmVzdG9yZQK3-6IUa1sTAgAEWgAHZXhwdW5nZUwAB2RldGFpbHN0AA9MamF2YS91dGlsL01hcDtMABJyb290RGlza09mZmVyaW5nSWR0ABBMamF2YS9sYW5nL0xvbmc7TAAKdGVtcGxhdGVJZHEAfgACeHIAE2NvbS5jbG91ZC52bS5WbVdvcmufmbZW8CVnawIABEoACWFjY291bnRJZEoABnVzZXJJZEoABHZtSWRMAAtoYW5kbGVyTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO3hwAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAADdAAZVmlydHVhbE1hY2hpbmVNYW5hZ2VySW1wbABzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAAdwgAAAAQAAAAAHhwc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAE","cmdVersion":0,"completeMsid":null,"created":"Thu Mar 26 05:49:32 UTC 2026","id":42,"initMsid":32988419064358,"instanceId":null,"instanceType":null,"lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"22c19480-5b1a-4406-ab99-11b30cc962b2"}
2026-03-26 05:49:33,247 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c12e730d) Run VM work job: com.cloud.vm.VmWorkRestore for VM 3, job origin: 41
2026-03-26 05:49:33,251 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Execute VM work job: com.cloud.vm.VmWorkRestore{"templateId":4,"details":{},"expunge":false,"userId":2,"accountId":2,"vmId":3,"handlerName":"VirtualMachineManagerImpl"}
2026-03-26 05:49:33,255 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Restoring vm 3 with templateId : 4 diskOfferingId : null details : {}
2026-03-26 05:49:33,269 INFO [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) VM cannot be configured to be dynamically scalable if any of the service offering's dynamic scaling property, template's dynamic scaling property or global setting is false
2026-03-26 05:49:33,275 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Updating resource Type = volume count for Account = 2 Operation = increasing Amount = 1
2026-03-26 05:49:33,278 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Updating resource Type = primary_storage count for Account = 2 Operation = increasing Amount = (8.00 GB) 8589934592
2026-03-26 05:49:33,286 WARN [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Host 1 not found
2026-03-26 05:49:33,322 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184 ctx-dec3ba0a) (logid:c12e730d) Updating resource Type = volume count for Account = 2 Operation = decreasing Amount = 1
2026-03-26 05:49:33,325 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184 ctx-dec3ba0a) (logid:c12e730d) Updating resource Type = primary_storage count for Account = 2 Operation = decreasing Amount = (8.00 GB) 8589934592
2026-03-26 05:49:33,336 DEBUG [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Restore of VM VM instance {"id":3,"instanceName":"i-2-3-VM","state":"Stopped","type":"User","uuid":"1fa38e66-f36a-4ee0-bee0-22b771f771bb"} done successfully
2026-03-26 05:49:33,336 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Done executing VM work job: com.cloud.vm.VmWorkRestore{"templateId":4,"details":{},"expunge":false,"userId":2,"accountId":2,"vmId":3,"handlerName":"VirtualMachineManagerImpl"}
2026-03-26 05:49:33,337 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Complete async job-42, jobStatus: SUCCEEDED, resultCode: 0, result: rO0ABXNyABFqYXZhLnV0aWwuSGFzaE1hcAUH2sHDFmDRAwACRgAKbG9hZEZhY3RvckkACXRocmVzaG9sZHhwP0AAAAAAAAx3CAAAABAAAAABc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAADcHg
|
[SF] Trillian test result (tid-15744)
|
Description
This PR checks for null host before proceeding with VM volume operations in managed storage while restoring VM.
During restore VM, when VM last host id returns null when the Host was deleted, the VM ends up with additional ROOT Volume in
Allocatedstate and the later re-image operation will be failing with validation error:Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?