Check for null host before proceeding with VM volume operations in managed storage while restoring VM by sureshanaparti · Pull Request #12879 · apache/cloudstack

sureshanaparti · 2026-03-24T09:09:57Z

Description

This PR checks for null host before proceeding with VM volume operations in managed storage while restoring VM.

During restore VM, when VM last host id returns null when the Host was deleted, the VM ends up with additional ROOT Volume in Allocated state and the later re-image operation will be failing with validation error:

InvalidParameterValueException ex = new InvalidParameterValueException("There are " + rootVols.size() + " root volumes for VM " + vm.getUuid());

Types of changes

Breaking change (fix or feature that would cause existing functionality to change)
New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Enhancement (improves an existing feature and functionality)
Cleanup (Code refactoring and cleanup, that may add test cases)
Build/CI
Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

Major
Minor

Bug Severity

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

…naged storage while restoring VM

sureshanaparti · 2026-03-24T09:15:47Z

@blueorangutan package

blueorangutan · 2026-03-24T09:18:05Z

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

codecov · 2026-03-24T09:18:28Z

Codecov Report

❌ Patch coverage is 5.40541% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.61%. Comparing base (bce5594) to head (2d51a71).
⚠️ Report is 4 commits behind head on 4.22.

Files with missing lines	Patch %	Lines
.../src/main/java/com/cloud/vm/UserVmManagerImpl.java	5.40%	32 Missing and 3 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               4.22   #12879      +/-   ##
============================================
- Coverage     17.61%   17.61%   -0.01%     
  Complexity    15665    15665              
============================================
  Files          5917     5917              
  Lines        531461   531464       +3     
  Branches      64977    64978       +1     
============================================
- Hits          93608    93607       -1     
- Misses       427295   427298       +3     
- Partials      10558    10559       +1

Flag	Coverage Δ
uitests	`3.70% <ø> (ø)`
unittests	`18.68% <5.40%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

Fixes a restore-VM failure mode in managed storage where vm.getLastHostId() points to a deleted host, causing host-dependent volume operations to NPE and leaving an extra ROOT volume behind.

Changes:

Add a null-host guard in managed-storage handling during VM restore to safely skip host-dependent operations when the host record no longer exists.
Minor formatting-only updates (whitespace/brace style) across UserVmManagerImpl.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-24T09:20:43Z

server/src/main/java/com/cloud/vm/UserVmManagerImpl.java

+            } catch (Exception ex) {
                throw new CloudRuntimeException("Router start failed due to" + ex);


The exception message concatenation is missing a space ("due to" + ex) and the thrown CloudRuntimeException drops the original cause/stack trace. Consider including a separating space and passing the caught exception as the cause so callers/logs preserve details.

Copilot · 2026-03-24T09:20:43Z

server/src/main/java/com/cloud/vm/UserVmManagerImpl.java

+                // default findById() won't search entries with removed field not null
                Host host = _hostDao.findById(hostId);
+                if (host == null) {
+                    logger.warn("Host {} not found", hostId);


This new log line only prints the hostId; when debugging restore/re-image issues it would be helpful to include the VM and root volume identifiers (e.g., vm UUID/id and root volume UUID/id) in the warning so operators can correlate the skip with the affected resources.

Suggested change

logger.warn("Host {} not found", hostId);

logger.warn("Host {} not found for vm id: {}, uuid: {}, root volume id: {}, uuid: {}", hostId, vm.getId(), vm.getUuid(), root.getId(), root.getUuid());

Copilot · 2026-03-24T09:20:43Z

server/src/main/java/com/cloud/vm/UserVmManagerImpl.java

            if (hostId != null) {
-                VolumeInfo volumeInfo = volFactory.getVolume(root.getId());
+                // default findById() won't search entries with removed field not null
                Host host = _hostDao.findById(hostId);
+                if (host == null) {
+                    logger.warn("Host {} not found", hostId);
+                    return;
+                }


There are unit tests for restoreVirtualMachine() in server/src/test/java/com/cloud/vm/UserVmManagerImplTest.java, but none appear to cover the managed-storage restore path where vm.getLastHostId() is set and _hostDao.findById(hostId) returns null (deleted host). Adding a test for this scenario would help prevent regressions (e.g., ensure restore proceeds without leaving an extra ROOT volume in Allocated state).

blueorangutan · 2026-03-24T10:15:01Z

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17228

kiranchavala · 2026-03-26T06:17:48Z

@blueorangutan test

blueorangutan · 2026-03-26T06:20:03Z

@kiranchavala a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

kiranchavala

LGTM

Steps to reproduce the issue

Deploy vm with root volume on powerflex
Stop vm
Remove the host where last deployed
Restore vm - should see NPE before fix

Before fix

2026-03-26 06:13:16,254 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90 ctx-82378625) (logid:2a5a5f3a) Invocation exception, caused by: java.lang.NullPointerException
2026-03-26 06:13:16,255 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90 ctx-82378625) (logid:2a5a5f3a) Rethrow exception java.lang.NullPointerException
2026-03-26 06:13:16,254 DEBUG [c.c.a.ApiServlet] (qtp1052967153-21:ctx-11e2f3f3 ctx-5b70f743) (logid:6d155dc4) ===END===  10.0.3.251 -- GET  jobId=2a5a5f3a-343a-4d33-8c2b-5acb34860d77&command=queryAsyncJobResult&response=json&sessionkey=pcTUuEaUc9SM5K-7M3zJ9bLiyRk
2026-03-26 06:13:16,257 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90) (logid:2a5a5f3a) Done with run of VM work job: com.cloud.vm.VmWorkRestore for VM 7, job origin: 89
2026-03-26 06:13:16,258 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-24:ctx-03609cb1 job-89/job-90) (logid:2a5a5f3a) Unable to complete AsyncJob {"accountId":2,"cmd":"com.cloud.vm.VmWorkRestore","cmdInfo":"rO0ABXNyABpjb20uY2xvdWQudm0uVm1Xb3JrUmVzdG9yZQK3-6IUa1sTAgAEWgAHZXhwdW5nZUwAB2RldGFpbHN0AA9MamF2YS91dGlsL01hcDtMABJyb290RGlza09mZmVyaW5nSWR0ABBMamF2YS9sYW5nL0xvbmc7TAAKdGVtcGxhdGVJZHEAfgACeHIAE2NvbS5jbG91ZC52bS5WbVdvcmufmbZW8CVnawIABEoACWFjY291bnRJZEoABnVzZXJJZEoABHZtSWRMAAtoYW5kbGVyTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO3hwAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAAHdAAZVmlydHVhbE1hY2hpbmVNYW5hZ2VySW1wbABzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAAdwgAAAAQAAAAAHhwc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAE","cmdVersion":0,"completeMsid":null,"created":"Thu Mar 26 06:13:16 UTC 2026","id":90,"initMsid":32987261436360,"instanceId":null,"instanceType":null,"lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"8802ec86-f177-4d58-afa9-3ce49aba5357"}, job origin:89
java.lang.NullPointerException
	at com.cloud.vm.UserVmManagerImpl.handleManagedStorage(UserVmManagerImpl.java:8152)
	at com.cloud.vm.UserVmManagerImpl.restoreVirtualMachine(UserVmManagerImpl.java:7929)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proc

After fix

2026-03-26 05:49:33,242 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c619ae28) Add job-42 into job monitoring
2026-03-26 05:49:33,247 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c12e730d) Executing AsyncJob {"accountId":2,"cmd":"com.cloud.vm.VmWorkRestore","cmdInfo":"rO0ABXNyABpjb20uY2xvdWQudm0uVm1Xb3JrUmVzdG9yZQK3-6IUa1sTAgAEWgAHZXhwdW5nZUwAB2RldGFpbHN0AA9MamF2YS91dGlsL01hcDtMABJyb290RGlza09mZmVyaW5nSWR0ABBMamF2YS9sYW5nL0xvbmc7TAAKdGVtcGxhdGVJZHEAfgACeHIAE2NvbS5jbG91ZC52bS5WbVdvcmufmbZW8CVnawIABEoACWFjY291bnRJZEoABnVzZXJJZEoABHZtSWRMAAtoYW5kbGVyTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO3hwAAAAAAAAAAIAAAAAAAAAAgAAAAAAAAADdAAZVmlydHVhbE1hY2hpbmVNYW5hZ2VySW1wbABzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAAdwgAAAAQAAAAAHhwc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAE","cmdVersion":0,"completeMsid":null,"created":"Thu Mar 26 05:49:32 UTC 2026","id":42,"initMsid":32988419064358,"instanceId":null,"instanceType":null,"lastPolled":null,"lastUpdated":null,"processStatus":0,"removed":null,"result":null,"resultCode":0,"status":"IN_PROGRESS","userId":2,"uuid":"22c19480-5b1a-4406-ab99-11b30cc962b2"}
2026-03-26 05:49:33,247 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42) (logid:c12e730d) Run VM work job: com.cloud.vm.VmWorkRestore for VM 3, job origin: 41
2026-03-26 05:49:33,251 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Execute VM work job: com.cloud.vm.VmWorkRestore{"templateId":4,"details":{},"expunge":false,"userId":2,"accountId":2,"vmId":3,"handlerName":"VirtualMachineManagerImpl"}
2026-03-26 05:49:33,255 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Restoring vm 3 with templateId : 4 diskOfferingId : null details : {}
2026-03-26 05:49:33,269 INFO  [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) VM cannot be configured to be dynamically scalable if any of the service offering's dynamic scaling property, template's dynamic scaling property or global setting is false
2026-03-26 05:49:33,275 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Updating resource Type = volume count for Account = 2 Operation = increasing Amount = 1
2026-03-26 05:49:33,278 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Updating resource Type = primary_storage count for Account = 2 Operation = increasing Amount = (8.00 GB) 8589934592
2026-03-26 05:49:33,286 WARN  [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Host 1 not found
2026-03-26 05:49:33,322 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184 ctx-dec3ba0a) (logid:c12e730d) Updating resource Type = volume count for Account = 2 Operation = decreasing Amount = 1
2026-03-26 05:49:33,325 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184 ctx-dec3ba0a) (logid:c12e730d) Updating resource Type = primary_storage count for Account = 2 Operation = decreasing Amount = (8.00 GB) 8589934592
2026-03-26 05:49:33,336 DEBUG [c.c.v.UserVmManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Restore of VM VM instance {"id":3,"instanceName":"i-2-3-VM","state":"Stopped","type":"User","uuid":"1fa38e66-f36a-4ee0-bee0-22b771f771bb"} done successfully
2026-03-26 05:49:33,336 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Done executing VM work job: com.cloud.vm.VmWorkRestore{"templateId":4,"details":{},"expunge":false,"userId":2,"accountId":2,"vmId":3,"handlerName":"VirtualMachineManagerImpl"}
2026-03-26 05:49:33,337 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-6:ctx-1d65c227 job-41/job-42 ctx-7a37d184) (logid:c12e730d) Complete async job-42, jobStatus: SUCCEEDED, resultCode: 0, result: rO0ABXNyABFqYXZhLnV0aWwuSGFzaE1hcAUH2sHDFmDRAwACRgAKbG9hZEZhY3RvckkACXRocmVzaG9sZHhwP0AAAAAAAAx3CAAAABAAAAABc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAADcHg

nvazquez

LGTM

blueorangutan · 2026-03-26T20:52:03Z

[SF] Trillian test result (tid-15744)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 49202 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12879-t15744-kvm-ol8.zip
Smoke tests completed. 148 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test	Result	Time (s)	Test File
ContextSuite context=TestClusterDRS>:setup	`Error`	0.00	test_cluster_drs.py

Check for null host before proceeding with VM volume operations in ma…

2d51a71

…naged storage while restoring VM

sureshanaparti added this to the 4.22.1 milestone Mar 24, 2026

sureshanaparti added this to Apache CloudStack 4.22.1 Mar 24, 2026

github-project-automation bot moved this to Todo in Apache CloudStack 4.22.1 Mar 24, 2026

sureshanaparti moved this from Todo to In Progress in Apache CloudStack 4.22.1 Mar 24, 2026

sureshanaparti requested review from Copilot and nvazquez March 24, 2026 09:15

Copilot started reviewing on behalf of sureshanaparti March 24, 2026 09:17 View session

Copilot AI reviewed Mar 24, 2026

View reviewed changes

kiranchavala approved these changes Mar 26, 2026

View reviewed changes

kiranchavala added the status:ready-for-merge label Mar 26, 2026

nvazquez approved these changes Mar 26, 2026

View reviewed changes

nvazquez merged commit 84676af into apache:4.22 Mar 26, 2026
29 of 30 checks passed

github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.22.1 Mar 26, 2026

nvazquez deleted the fix-npe-handle-managed-storage branch March 26, 2026 10:58

		} catch (Exception ex) {
		throw new CloudRuntimeException("Router start failed due to" + ex);

	logger.warn("Host {} not found", hostId);
	logger.warn("Host {} not found for vm id: {}, uuid: {}, root volume id: {}, uuid: {}", hostId, vm.getId(), vm.getUuid(), root.getId(), root.getUuid());

Conversation

sureshanaparti commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Types of changes

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

Bug Severity

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

Uh oh!

sureshanaparti commented Mar 24, 2026

Uh oh!

blueorangutan commented Mar 24, 2026

Uh oh!

codecov bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

blueorangutan commented Mar 24, 2026

Uh oh!

kiranchavala commented Mar 26, 2026

Uh oh!

blueorangutan commented Mar 26, 2026

Uh oh!

kiranchavala left a comment

Choose a reason for hiding this comment

Uh oh!

nvazquez left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

blueorangutan commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

sureshanaparti commented Mar 24, 2026 •

edited

Loading

codecov bot commented Mar 24, 2026 •

edited

Loading