
Check for null host before proceeding with VM volume operations in managed storage while restoring VM #12879

Merged
nvazquez merged 1 commit into apache:4.22 from shapeblue:fix-npe-handle-managed-storage
Mar 26, 2026


@sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented Mar 24, 2026

Description

This PR checks for a null host before proceeding with VM volume operations on managed storage while restoring a VM.

During VM restore, if the host the VM last ran on has been deleted, looking up that host by the VM's last host ID returns null. The restore then fails with a NullPointerException, the VM is left with an additional ROOT volume in Allocated state, and a later re-image operation fails with this validation error:

InvalidParameterValueException ex = new InvalidParameterValueException("There are " + rootVols.size() + " root volumes for VM " + vm.getUuid());
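The null-host guard described above can be sketched in plain Java. This is a simplified, self-contained model, not the actual `UserVmManagerImpl` code: `Host` and `HostDao` are hypothetical stand-ins, and the method name is borrowed from the stack trace only for orientation. The lookup is assumed to return null for a deleted host, as the PR notes for the default `findById()`.

```java
import java.util.HashMap;
import java.util.Map;

public class ManagedStorageGuardSketch {

    // Hypothetical stand-ins for CloudStack's Host and HostDao; not the real interfaces.
    record Host(long id, String name) {}

    interface HostDao {
        // Assumed to behave like the default findById() noted in the PR:
        // a host whose "removed" field is set is not returned (null instead).
        Host findById(long id);
    }

    /**
     * Simplified model of the guarded path. Returns true if host-dependent
     * volume operations would run, false if they were skipped.
     */
    static boolean handleManagedStorage(HostDao hostDao, Long lastHostId) {
        if (lastHostId == null) {
            return false; // no last host recorded, nothing host-specific to do
        }
        Host host = hostDao.findById(lastHostId);
        if (host == null) {
            // Host was deleted: log and skip instead of dereferencing null (the pre-fix NPE)
            System.out.println("WARN Host " + lastHostId + " not found");
            return false;
        }
        // Host-dependent volume operations would use host.id() here
        return true;
    }

    public static void main(String[] args) {
        Map<Long, Host> liveHosts = new HashMap<>();
        liveHosts.put(2L, new Host(2L, "kvm-host-2"));
        HostDao hostDao = id -> liveHosts.get(id); // host 1 has been deleted

        System.out.println(handleManagedStorage(hostDao, 1L)); // prints the WARN line, then false
        System.out.println(handleManagedStorage(hostDao, 2L)); // true
    }
}
```

Returning early mirrors the `return;` in the merged change: the restore continues without the host-dependent steps rather than aborting.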

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 5.40541% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.61%. Comparing base (bce5594) to head (2d51a71).
⚠️ Report is 4 commits behind head on 4.22.

Files with missing lines Patch % Lines
.../src/main/java/com/cloud/vm/UserVmManagerImpl.java 5.40% 32 Missing and 3 partials ⚠️
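The percentages in this report can be checked with the ratio Codecov documents, hits / (hits + misses + partials). In the sketch below, the 2 patch hits are inferred rather than reported: 35 uncovered lines (32 missing plus 3 partials) at 5.40541% coverage implies a 37-line patch. The project totals come from the head column of the coverage diff.

```java
import java.util.Locale;

public class PatchCoverage {

    // Codecov-style ratio: partially covered lines count against coverage.
    static double coveragePercent(long hits, long misses, long partials) {
        long total = hits + misses + partials;
        return total == 0 ? 0.0 : 100.0 * hits / total;
    }

    public static void main(String[] args) {
        // Patch: 2 hits (inferred), 32 missing, 3 partials
        System.out.printf(Locale.ROOT, "patch: %.5f%%%n", coveragePercent(2, 32, 3));
        // Project head: 93607 hits, 427298 misses, 10559 partials (531464 lines)
        System.out.printf(Locale.ROOT, "head: %.2f%%%n", coveragePercent(93607, 427298, 10559));
    }
}
```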
Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #12879      +/-   ##
============================================
- Coverage     17.61%   17.61%   -0.01%     
  Complexity    15665    15665              
============================================
  Files          5917     5917              
  Lines        531461   531464       +3     
  Branches      64977    64978       +1     
============================================
- Hits          93608    93607       -1     
- Misses       427295   427298       +3     
- Partials      10558    10559       +1     
Flag Coverage Δ
uitests 3.70% <ø> (ø)
unittests 18.68% <5.40%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


Contributor

Copilot AI left a comment


Pull request overview

Fixes a restore-VM failure mode in managed storage where vm.getLastHostId() points to a deleted host, causing host-dependent volume operations to throw a NullPointerException and leaving an extra ROOT volume behind.

Changes:

  • Add a null-host guard in managed-storage handling during VM restore to safely skip host-dependent operations when the host record no longer exists.
  • Minor formatting-only updates (whitespace/brace style) across UserVmManagerImpl.


Comment on lines +1211 to 1212
} catch (Exception ex) {
throw new CloudRuntimeException("Router start failed due to" + ex);

Copilot AI Mar 24, 2026


The exception message concatenation is missing a space ("due to" + ex) and the thrown CloudRuntimeException drops the original cause/stack trace. Consider including a separating space and passing the caught exception as the cause so callers/logs preserve details.
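The two fixes Copilot suggests (a separating space, and chaining the caught exception as the cause) look like this in a self-contained sketch. `StandInRuntimeException` is a hypothetical substitute for `CloudRuntimeException`, and `startRouter()` is a placeholder that always fails; only the `(String, Throwable)` constructor pattern from `java.lang.RuntimeException` is relied on.

```java
public class ExceptionChainingSketch {

    // Hypothetical stand-in for CloudRuntimeException.
    static class StandInRuntimeException extends RuntimeException {
        StandInRuntimeException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Placeholder for the real router start call; always fails here.
    static void startRouter() {
        throw new IllegalStateException("agent unreachable");
    }

    static void startRouterOrFail() {
        try {
            startRouter();
        } catch (Exception ex) {
            // Space after "due to", and ex passed as the cause so its stack trace is preserved
            throw new StandInRuntimeException("Router start failed due to " + ex.getMessage(), ex);
        }
    }

    public static void main(String[] args) {
        try {
            startRouterOrFail();
        } catch (StandInRuntimeException e) {
            System.out.println(e.getMessage()); // Router start failed due to agent unreachable
            System.out.println(e.getCause().getClass().getSimpleName()); // IllegalStateException
        }
    }
}
```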

// default findById() won't search entries with removed field not null
Host host = _hostDao.findById(hostId);
if (host == null) {
logger.warn("Host {} not found", hostId);

Copilot AI Mar 24, 2026


This new log line only prints the hostId; when debugging restore/re-image issues it would be helpful to include the VM and root volume identifiers (e.g., vm UUID/id and root volume UUID/id) in the warning so operators can correlate the skip with the affected resources.

Suggested change
logger.warn("Host {} not found", hostId);
logger.warn("Host {} not found for vm id: {}, uuid: {}, root volume id: {}, uuid: {}", hostId, vm.getId(), vm.getUuid(), root.getId(), root.getUuid());

Comment on lines 9221 to +9227
if (hostId != null) {
VolumeInfo volumeInfo = volFactory.getVolume(root.getId());
// default findById() won't search entries with removed field not null
Host host = _hostDao.findById(hostId);
if (host == null) {
logger.warn("Host {} not found", hostId);
return;
}

Copilot AI Mar 24, 2026


There are unit tests for restoreVirtualMachine() in server/src/test/java/com/cloud/vm/UserVmManagerImplTest.java, but none appear to cover the managed-storage restore path where vm.getLastHostId() is set and _hostDao.findById(hostId) returns null (deleted host). Adding a test for this scenario would help prevent regressions (e.g., ensure restore proceeds without leaving an extra ROOT volume in Allocated state).
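The regression scenario in this comment can be modeled without CloudStack. This is not the JUnit/Mockito test against `UserVmManagerImpl` that the comment asks for; it is a simplified model under an assumed ordering (new ROOT volume persisted before the managed-storage step, old ROOT volume removed after it). `HostLookup`, `restore()`, and the volume names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class RestoreRootVolumeSketch {

    // Hypothetical host lookup; returns null for a deleted host, like the DAO call in the PR.
    interface HostLookup {
        Object findById(long id);
    }

    /**
     * Simplified, assumed model of restore: the new ROOT volume is persisted before the
     * managed-storage step, and the old ROOT volume is removed only after it succeeds.
     */
    static void restore(List<String> rootVolumes, Long lastHostId, HostLookup hosts, boolean withNullGuard) {
        rootVolumes.add("new-root"); // persisted (Allocated) before host-dependent work
        if (lastHostId != null) {
            Object host = hosts.findById(lastHostId);
            if (host == null && withNullGuard) {
                System.out.println("WARN Host " + lastHostId + " not found");
            } else {
                // Stands in for the pre-fix code, which used the host without a null check
                Objects.requireNonNull(host, "host");
            }
        }
        rootVolumes.remove("old-root"); // reached only if the step above did not throw
    }

    public static void main(String[] args) {
        HostLookup deletedHost = id -> null;

        List<String> before = new ArrayList<>(List.of("old-root"));
        try {
            restore(before, 1L, deletedHost, false);
        } catch (NullPointerException e) {
            System.out.println("without guard: " + before); // [old-root, new-root]
        }

        List<String> after = new ArrayList<>(List.of("old-root"));
        restore(after, 1L, deletedHost, true);
        System.out.println("with guard: " + after); // [new-root]
    }
}
```

Under this model, the unguarded run leaves two ROOT volumes, which is the state the re-image validation rejects; the guarded run leaves exactly one.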

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17228

@kiranchavala
Member

@blueorangutan test

@blueorangutan

@kiranchavala a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Member

@kiranchavala kiranchavala left a comment


LGTM

Steps to reproduce the issue

  1. Deploy a VM with its ROOT volume on PowerFlex (managed storage).
  2. Stop the VM.
  3. Remove the host the VM was last deployed on.
  4. Restore the VM; before the fix, this fails with an NPE.

Before fix

2026-03-26 06:13:16,254 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90 ctx-82378625) (logid:2a5a5f3a) Invocation exception, caused by: java.lang.NullPointerException
2026-03-26 06:13:16,255 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90 ctx-82378625) (logid:2a5a5f3a) Rethrow exception java.lang.NullPointerException
2026-03-26 06:13:16,254 DEBUG [c.c.a.ApiServlet] (qtp1052967153-21:ctx-11e2f3f3 ctx-5b70f743) (logid:6d155dc4) ===END===  10.0.3.251 -- GET  jobId=2a5a5f3a-343a-4d33-8c2b-5acb34860d77&command=queryAsyncJobResult&response=json&sessionkey=pcTUuEaUc9SM5K-7M3zJ9bLiyRk
2026-03-26 06:13:16,257 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90) (logid:2a5a5f3a) Done with run of VM work job: com.cloud.vm.VmWorkRestore for VM 7, job origin: 89
2026-03-26 06:13:16,258 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90) (logid:2a5a5f3a) Unable to complete AsyncJob {"accountId":2,"cmd":"com.cloud.vm.VmWorkRestore","cmdInfo":"rO0ABXNyABpjb20uY2xvdWQudm0uVm1Xb3JrUmVzdG9yZQK3-6IUa1sTAgAEWgAHZXhwdW5nZUwAB2RldGFpbHN0AA9MamF2YS91dGlsL01hcDtMABJyb290RGlza09mZmVyaW5nSWR0ABBMamF2YS9sYW5nL0xvbmc7TAAKdGVtcGxhdGVJZHEAfgACeHIAE2NvbS5jbG91ZC52bS5WbVdvcmufmbZW8CVnawIABEoACWFjY291bnRJZEoABnVzZXJJZEoABHZtSWRMAAtoYW5kbGVyTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO3hwAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAAHdAAZVmlydHVhbE1hY2hpbmVNYW5hZ2VySW1wbABzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAAdwgAAAAQAAAAAHhwc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAE","cmdVersion":0,"completeMsid":null,"created":"Thu Mar 26 06:13:16 UTC 2026","id":90,"initMsid":32987261436360,"instanceId":null,"instanceType":null,"lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"8802ec86-f177-4d58-afa9-3ce49aba5357"}, job origin:89
java.lang.NullPointerException
	at com.cloud.vm.UserVmManagerImpl.handleManagedStorage(UserVmManagerImpl.java:8152)
	at com.cloud.vm.UserVmManagerImpl.restoreVirtualMachine(UserVmManagerImpl.java:7929)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proc

After fix

2026-03-26 05:49:33,242 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c619ae28) Add job-42 into job monitoring
2026-03-26 05:49:33,247 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c12e730d) Executing AsyncJob {"accountId":2,"cmd":"com.cloud.vm.VmWorkRestore","cmdInfo":"rO0ABXNyABpjb20uY2xvdWQudm0uVm1Xb3JrUmVzdG9yZQK3-6IUa1sTAgAEWgAHZXhwdW5nZUwAB2RldGFpbHN0AA9MamF2YS91dGlsL01hcDtMABJyb290RGlza09mZmVyaW5nSWR0ABBMamF2YS9sYW5nL0xvbmc7TAAKdGVtcGxhdGVJZHEAfgACeHIAE2NvbS5jbG91ZC52bS5WbVdvcmufmbZW8CVnawIABEoACWFjY291bnRJZEoABnVzZXJJZEoABHZtSWRMAAtoYW5kbGVyTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO3hwAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAADdAAZVmlydHVhbE1hY2hpbmVNYW5hZ2VySW1wbABzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAAdwgAAAAQAAAAAHhwc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAE","cmdVersion":0,"completeMsid":null,"created":"Thu Mar 26 05:49:32 UTC 2026","id":42,"initMsid":32988419064358,"instanceId":null,"instanceType":null,"lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"22c19480-5b1a-4406-ab99-11b30cc962b2"}
2026-03-26 05:49:33,247 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c12e730d) Run VM work job: com.cloud.vm.VmWorkRestore for VM 3, job origin: 41
2026-03-26 05:49:33,251 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Execute VM work job: com.cloud.vm.VmWorkRestore{"templateId":4,"details":{},"expunge":false,"userId":2,"accountId":2,"vmId":3,"handlerName":"VirtualMachineManagerImpl"}
2026-03-26 05:49:33,255 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Restoring vm 3 with templateId : 4 diskOfferingId : null details : {}
2026-03-26 05:49:33,269 INFO  [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) VM cannot be configured to be dynamically scalable if any of the service offering's dynamic scaling property, template's dynamic scaling property or global setting is false
2026-03-26 05:49:33,275 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Updating resource Type = volume count for Account = 2 Operation = increasing Amount = 1
2026-03-26 05:49:33,278 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Updating resource Type = primary_storage count for Account = 2 Operation = increasing Amount = (8.00 GB) 8589934592
2026-03-26 05:49:33,286 WARN  [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Host 1 not found
2026-03-26 05:49:33,322 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184 ctx-dec3ba0a) (logid:c12e730d) Updating resource Type = volume count for Account = 2 Operation = decreasing Amount = 1
2026-03-26 05:49:33,325 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184 ctx-dec3ba0a) (logid:c12e730d) Updating resource Type = primary_storage count for Account = 2 Operation = decreasing Amount = (8.00 GB) 8589934592
2026-03-26 05:49:33,336 DEBUG [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Restore of VM VM instance {"id":3,"instanceName":"i-2-3-VM","state":"Stopped","type":"User","uuid":"1fa38e66-f36a-4ee0-bee0-22b771f771bb"} done successfully
2026-03-26 05:49:33,336 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Done executing VM work job: com.cloud.vm.VmWorkRestore{"templateId":4,"details":{},"expunge":false,"userId":2,"accountId":2,"vmId":3,"handlerName":"VirtualMachineManagerImpl"}
2026-03-26 05:49:33,337 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Complete async job-42, jobStatus: SUCCEEDED, resultCode: 0, result: rO0ABXNyABFqYXZhLnV0aWwuSGFzaE1hcAUH2sHDFmDRAwACRgAKbG9hZEZhY3RvckkACXRocmVzaG9sZHhwP0AAAAAAAAx3CAAAABAAAAABc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAADcHg 

Contributor

@nvazquez nvazquez left a comment


LGTM

@nvazquez nvazquez merged commit 84676af into apache:4.22 Mar 26, 2026
29 of 30 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.22.1 Mar 26, 2026
@nvazquez nvazquez deleted the fix-npe-handle-managed-storage branch March 26, 2026 10:58
@blueorangutan

[SF] Trillian test result (tid-15744)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 49202 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12879-t15744-kvm-ol8.zip
Smoke tests completed. 148 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestClusterDRS>:setup Error 0.00 test_cluster_drs.py


Projects

Status: Done


5 participants