
Handle non-OvsFetchInterfaceAnswer in OVS tunnel manager #12860

Open
dheeraj12347 wants to merge 1 commit into apache:4.22 from dheeraj12347:4.22

Conversation


@dheeraj12347 dheeraj12347 commented Mar 19, 2026

When starting a GRE isolated network on XCP-ng, the management server can fail with a ClassCastException like UnsupportedAnswer cannot be cast to OvsFetchInterfaceAnswer in OvsTunnelManagerImpl.handleFetchInterfaceAnswer. This happens when the agent returns an UnsupportedAnswer (or another Answer type) for OvsFetchInterfaceCommand, but the code unconditionally casts the first Answer to OvsFetchInterfaceAnswer.

Root cause

handleFetchInterfaceAnswer assumed answers[0] is always an OvsFetchInterfaceAnswer and directly cast it without checking for null, array length, or actual runtime type. When the hypervisor agent responds with a different Answer implementation (e.g. UnsupportedAnswer), this results in a ClassCastException and the GRE tunnel setup fails.

Solution

Add null and length checks for the Answer[] to handle missing or empty responses.

Check answers[0] with instanceof OvsFetchInterfaceAnswer before casting.

If the answer is not an OvsFetchInterfaceAnswer, log a clear warning with the actual type and details, and return null instead of throwing a ClassCastException.

Preserve the existing success path when a successful OvsFetchInterfaceAnswer with a non-empty IP address is returned.
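The guarded handling can be sketched as follows. This is a minimal, self-contained illustration of the checks described above, not the actual patch: the `Answer` classes here are simplified stand-ins for CloudStack's real `com.cloud.agent.api` types, and `handleFetchInterfaceAnswer` is reduced to its control flow (the real method also creates the `OvsTunnelInterfaceVO` record and logs through the manager's logger).

```java
// Simplified stand-ins for the CloudStack Answer hierarchy (illustration only).
class Answer {
    private final boolean result;
    private final String details;
    Answer(boolean result, String details) { this.result = result; this.details = details; }
    boolean getResult() { return result; }
    String getDetails() { return details; }
}

class UnsupportedAnswer extends Answer {
    UnsupportedAnswer(String details) { super(false, details); }
}

class OvsFetchInterfaceAnswer extends Answer {
    private final String ip;
    OvsFetchInterfaceAnswer(boolean result, String ip) { super(result, null); this.ip = ip; }
    String getIp() { return ip; }
}

public class GuardSketch {
    // Mirrors the guarded handling: null/length checks, instanceof before the
    // cast, and a null return (plus a warning) for unexpected Answer types.
    static String handleFetchInterfaceAnswer(Answer[] answers, long hostId) {
        if (answers == null || answers.length == 0 || answers[0] == null) {
            return null; // missing or empty response from the agent
        }
        if (!(answers[0] instanceof OvsFetchInterfaceAnswer)) {
            System.err.printf("Expected OvsFetchInterfaceAnswer from host %d but got %s%n",
                    hostId, answers[0].getClass().getSimpleName());
            return null; // previously this path threw ClassCastException
        }
        OvsFetchInterfaceAnswer ans = (OvsFetchInterfaceAnswer) answers[0];
        if (!ans.getResult() || ans.getIp() == null || ans.getIp().isEmpty()) {
            return null; // unsuccessful answer or empty IP
        }
        return ans.getIp(); // success path preserved
    }

    public static void main(String[] args) {
        // An UnsupportedAnswer no longer causes a ClassCastException:
        System.out.println(handleFetchInterfaceAnswer(
                new Answer[] { new UnsupportedAnswer("Unsupported command issued") }, 1L)); // prints: null
        // A successful OvsFetchInterfaceAnswer still yields the IP:
        System.out.println(handleFetchInterfaceAnswer(
                new Answer[] { new OvsFetchInterfaceAnswer(true, "10.0.0.5") }, 1L)); // prints: 10.0.0.5
    }
}
```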

Testing

Local build on 4.22:

```bash
mvn -pl api,server -am -DskipTests clean install
```

Addresses #12815

@weizhouapache
Member

@dheeraj12347
I believe this PR handles the error.
However, we'd better know what caused the issue and fix the root cause; otherwise, the feature is still non-functional.

@weizhouapache
Member

@dheeraj12347
are you able to reproduce the issue and verify your fix?

@codecov

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.61%. Comparing base (27bce46) to head (8486a89).
⚠️ Report is 4 commits behind head on 4.22.

Files with missing lines Patch % Lines
...va/com/cloud/network/ovs/OvsTunnelManagerImpl.java 0.00% 13 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #12860      +/-   ##
============================================
- Coverage     17.61%   17.61%   -0.01%     
+ Complexity    15662    15661       -1     
============================================
  Files          5917     5917              
  Lines        531415   531438      +23     
  Branches      64973    64974       +1     
============================================
+ Hits          93588    93589       +1     
- Misses       427271   427293      +22     
  Partials      10556    10556              
Flag Coverage Δ
uitests 3.70% <ø> (ø)
unittests 18.68% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@DaanHoogland DaanHoogland added this to the 4.22.1 milestone Mar 19, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR hardens the OVS tunnel manager’s handling of agent responses for OvsFetchInterfaceCommand to prevent management server failures (e.g., ClassCastException) when the agent returns a non-OvsFetchInterfaceAnswer (such as UnsupportedAnswer) during GRE isolated network setup on XCP-ng.

Changes:

  • Add null/empty checks for the Answer[] returned by the agent.
  • Guard the cast to OvsFetchInterfaceAnswer with an instanceof check and log a clear warning when the returned answer type is unexpected.
  • Minor cleanup of the IP string empty-check (isEmpty()).


```java
OvsTunnelInterfaceVO ti = createInterfaceRecord(ans.getIp(),
        ans.getNetmask(), ans.getMac(), hostId, ans.getLabel());
return ti.getIp();
```
@dheeraj12347
Author

@dheeraj12347 are you able to reproduce the issue and verify your fix?

I don’t currently have a full CloudStack 4.22 + XCP‑ng 8.3 + OVS/GRE environment to reproduce this end‑to‑end. If you think it’s important that I verify it myself, I can try to set up such a lab, but it may take some time given my hardware/resources.

@weizhouapache
Copy link
Member

@dheeraj12347

Normally, PR authors test their changes before requesting peer review. However, I understand that in some cases it may not be possible to reproduce the issue due to differences in hardware or environment, or because the issue itself is difficult to reproduce.

That said, I don’t think it’s necessary to acquire new hardware to reproduce and verify the fix in this case. It would be helpful if @UAnton could test the changes.

However, as mentioned in my previous comment, I don’t think this PR will fix the issue #12815. We need to understand why the expected OvsFetchInterfaceAnswer is not being returned, and why an UnsupportedAnswer is returned instead.

@DaanHoogland DaanHoogland linked an issue Mar 20, 2026 that may be closed by this pull request
@UAnton

UAnton commented Mar 20, 2026

@dheeraj12347 @DaanHoogland @weizhouapache
Hi,
If you show me what I should do, then I will test it, of course...
P.S. I figured it out! I'm building deb packages.

@dheeraj12347
Author

@dheeraj12347

Normally, PR authors test their changes before requesting peer review. However, I understand that in some cases it may not be possible to reproduce the issue due to differences in hardware or environment, or because the issue itself is difficult to reproduce.

That said, I don’t think it’s necessary to acquire new hardware to reproduce and verify the fix in this case. It would be helpful if @UAnton could test the changes.

However, as mentioned in my previous comment, I don’t think this PR will fix the issue #12815. We need to understand why the expected OvsFetchInterfaceAnswer is not being returned, and why an UnsupportedAnswer is returned instead.

Thanks for the review and explanation. This PR mainly makes the management server handle the UnsupportedAnswer safely and log a clear warning instead of throwing a ClassCastException. Right now I don’t have a CloudStack 4.22 + XCP‑ng 8.3 + OVS/GRE lab, so I can’t fully reproduce and verify the GRE isolation behavior myself. If you feel it’s important, I can try to build such an environment, and I’m also happy to support @UAnton who can test it in their setup.

@dheeraj12347
Author

@dheeraj12347 @DaanHoogland @weizhouapache Hi, If you show me what I should do, then I will test it, of course... P.S. I figured it out! I'm building deb packages.

Hi @UAnton,
thanks a lot for offering to test. To verify this change with your XCP‑ng 8.3 setup, could you please:

1. Install/upgrade your CloudStack 4.22 management server with the deb package built from PR #12860.

2. Use the same GRE + XCP‑ng 8.3 environment where you saw the error before.

3. Create a GRE network offering, then create a GRE isolated network from it and deploy a VM in that network so the virtual router starts.

4. Check the management server logs:
  • The ClassCastException: UnsupportedAnswer cannot be cast to OvsFetchInterfaceAnswer should no longer appear.
  • Instead, there should be a warning saying that a non‑OvsFetchInterfaceAnswer (e.g. UnsupportedAnswer) was returned.

Please let me know what you see in the logs and whether the router/network behaves any differently.

@UAnton

UAnton commented Mar 23, 2026

@dheeraj12347 Hi,
After update, SystemVMs don't start.

2026-03-23 20:42:50,122 DEBUG [c.c.h.x.r.XcpServer83Resource] (DirectAgent-21:[ctx-3f8a2f62]) (logid:4e6085c7) Trying to connect to 169.254.133.184 attempt 11 of 100
2026-03-23 20:42:50,309 INFO  [c.c.h.x.r.w.x.CitrixStartCommandWrapper] (DirectAgent-21:[ctx-3f8a2f62]) (logid:4e6085c7) Connected to SystemVM: v-2-VM
2026-03-23 20:42:50,557 DEBUG [c.c.h.x.r.XcpServer83Resource] (DirectAgent-20:[ctx-8cb63fc3]) (logid:950316e3) Executing command in VR: /opt/cloud/bin/router_proxy.sh patched.sh 169.254.249.62 /opt/cloud/bin/keystore*
2026-03-23 20:42:50,651 ERROR [c.c.h.x.r.XcpServer83Resource] (DirectAgent-21:[ctx-3f8a2f62]) (logid:4e6085c7) Failed to scp file agent.zip required for patching the systemVM
2026-03-23 20:42:50,651 DEBUG [c.c.h.x.r.XcpServer83Resource] (DirectAgent-21:[ctx-3f8a2f62]) (logid:4e6085c7) VR Config files at /opt/xensource/packages/resources/ got created in VR, IP: 169.254.133.184
2026-03-23 20:42:50,651 DEBUG [c.c.h.x.r.XcpServer83Resource] (DirectAgent-21:[ctx-3f8a2f62]) (logid:4e6085c7) Executing command in VR: /opt/cloud/bin/router_proxy.sh patched.sh 169.254.133.184 /opt/cloud/bin/keystore*
2026-03-23 20:42:50,716 INFO  [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:[ctx-3144a88e]) (logid:2d6eb3d9) No inactive management server node found
2026-03-23 20:42:50,716 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:[ctx-3144a88e]) (logid:2d6eb3d9) Peer scan is finished. profiler: Done. Duration: 1ms , profilerQueryActiveList: Done. Duration: 0ms, , profilerSyncClusterInfo: Done. Duration: 0ms, profilerInvalidatedNodeList: Done. Duration: 0ms, profilerRemovedList: Done. Duration: 0ms,, profilerNewList: Done. Duration: 0ms, profilerInactiveList: Done. Duration: 0ms
2026-03-23 20:42:50,985 ERROR [c.c.u.s.SshHelper] (DirectAgent-20:[ctx-8cb63fc3]) (logid:950316e3) SSH execution of command /opt/cloud/bin/router_proxy.sh patched.sh 169.254.249.62 /opt/cloud/bin/keystore* has an error status code in return. Result output: 
2026-03-23 20:42:51,062 ERROR [c.c.u.s.SshHelper] (DirectAgent-21:[ctx-3f8a2f62]) (logid:4e6085c7) SSH execution of command /opt/cloud/bin/router_proxy.sh patched.sh 169.254.133.184 /opt/cloud/bin/keystore* has an error status code in return. Result output: 
2026-03-23 20:42:51,799 DEBUG [o.a.c.h.H.HAManagerBgPollTask] (BackgroundTaskPollManager-6:[ctx-2271c7ed]) (logid:707fbbc6) HA health check task is running...
2026-03-23 20:42:51,986 DEBUG [c.c.h.x.r.XcpServer83Resource] (DirectAgent-20:[ctx-8cb63fc3]) (logid:950316e3) Executing command in VR: /opt/cloud/bin/router_proxy.sh patched.sh 169.254.249.62 /opt/cloud/bin/keystore*
2026-03-23 20:42:52,062 DEBUG [c.c.h.x.r.XcpServer83Resource] (DirectAgent-21:[ctx-3f8a2f62]) (logid:4e6085c7) Executing command in VR: /opt/cloud/bin/router_proxy.sh patched.sh 169.254.133.184 /opt/cloud/bin/keystore*

@dheeraj12347
Author

@dheeraj12347 Hi, After update, SystemVMs don't start.


Hi @UAnton,
thanks a lot for testing and sharing the logs. The agent.zip / router_proxy.sh patched.sh ... keystore* errors look like a separate SystemVM patching/environment issue, not directly related to the GRE UnsupportedAnswer cannot be cast to OvsFetchInterfaceAnswer fix. If it turns out any of this is caused by my change I’ll definitely investigate and update the PR. Once your SystemVMs are starting normally again, it would be great if you could retry the GRE case and let me know if the original ClassCastException is gone and the new warning about UnsupportedAnswer appears.

@UAnton

UAnton commented Mar 26, 2026

Hi @dheeraj12347
I deployed a new lab on 4.22 and upgraded to a 4.22 SNAPSHOT. Now the router is starting up and GRE is working, but only if the router and the VMs are on one node. If a VM is on node1 and the router on node2, the network does not work.

P.S. Network providers are broken
[Screenshot: 2026-03-26 at 09:26:06]

@dheeraj12347
Author

Hi @dheeraj12347 I deployed a new lab on 4.22 and upgraded to a 4.22 SNAPSHOT. Now the router is starting up and GRE is working, but only if the router and the VMs are on one node. If a VM is on node1 and the router on node2, the network does not work.

P.S. Network providers are broken

Hi @UAnton,
thanks a lot for rebuilding the lab and testing again, it’s really helpful. It’s great that the router now starts and GRE works when the router and VMs are on the same node. The issues you’re seeing with cross‑node traffic and the broken network providers in the screenshot suggest there is more going on in the multi‑host GRE/OVS and provider configuration.
Could you please share a bit more detail on your setup (number of hosts and GRE/OVS layout) and a log snippet from the management server around the time you test VMs on different nodes (especially any OvsFetchInterfaceCommand / UnsupportedAnswer lines)? That will help me look deeper into what might be going wrong.

@UAnton

UAnton commented Mar 26, 2026

@dheeraj12347,
My lab - 2 hosts with 4 network interfaces (MGMT, Public, Guest, Storage). Default installation of xcp-ng.
```
root@cloudstack:/srv# grep -n "OvsFetchInterface" /var/log/cloudstack/management/management-server.log
145333:2026-03-26 08:03:37,161 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Wait time setting on com.cloud.agent.api.OvsFetchInterfaceCommand is 1800 seconds
145335:2026-03-26 08:03:37,162 DEBUG [c.c.a.t.Request] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Seq 1-7402510412513544295: Sending { Cmd , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
145336:2026-03-26 08:03:37,162 DEBUG [c.c.a.t.Request] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Seq 1-7402510412513544295: Executing: { Cmd , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
145340:2026-03-26 08:03:37,241 DEBUG [c.c.a.t.Request] (DirectAgent-161:[ctx-d16b0a75]) (logid:7dad1999) Seq 1-7402510412513544295: Processing: { Ans: , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
145343:2026-03-26 08:03:37,241 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Expected OvsFetchInterfaceAnswer from host 1 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
146192:2026-03-26 08:04:54,392 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Wait time setting on com.cloud.agent.api.OvsFetchInterfaceCommand is 1800 seconds
146194:2026-03-26 08:04:54,393 DEBUG [c.c.a.t.Request] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Seq 2-2107684625609393625: Sending { Cmd , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
146195:2026-03-26 08:04:54,393 DEBUG [c.c.a.t.Request] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Seq 2-2107684625609393625: Executing: { Cmd , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
146199:2026-03-26 08:04:54,447 DEBUG [c.c.a.t.Request] (DirectAgent-328:[ctx-a017872f]) (logid:7dad1999) Seq 2-2107684625609393625: Processing: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
146202:2026-03-26 08:04:54,447 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Expected OvsFetchInterfaceAnswer from host 2 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
148346:2026-03-26 08:07:22,971 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (Work-Job-Executor-20:[ctx-38a32219, job-119/job-120, ctx-2d13d180]) (logid:cbd2ee98) Wait time setting on com.cloud.agent.api.OvsFetchInterfaceCommand is 1800 seconds
148348:2026-03-26 08:07:22,972 DEBUG [c.c.a.t.Request] (Work-Job-Executor-20:[ctx-38a32219, job-119/job-120, ctx-2d13d180]) (logid:cbd2ee98) Seq 2-2107684625609393638: Sending { Cmd , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
148349:2026-03-26 08:07:22,973 DEBUG [c.c.a.t.Request] (Work-Job-Executor-20:[ctx-38a32219, job-119/job-120, ctx-2d13d180]) (logid:cbd2ee98) Seq 2-2107684625609393638: Executing: { Cmd , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
148353:2026-03-26 08:07:23,026 DEBUG [c.c.a.t.Request] (DirectAgent-171:[ctx-18ee3eb2]) (logid:cbd2ee98) Seq 2-2107684625609393638: Processing: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
148356:2026-03-26 08:07:23,026 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-20:[ctx-38a32219, job-119/job-120, ctx-2d13d180]) (logid:cbd2ee98) Expected OvsFetchInterfaceAnswer from host 2 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
148824:2026-03-26 08:07:32,993 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Wait time setting on com.cloud.agent.api.OvsFetchInterfaceCommand is 1800 seconds
148826:2026-03-26 08:07:32,994 DEBUG [c.c.a.t.Request] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Seq 2-2107684625609393642: Sending { Cmd , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
148827:2026-03-26 08:07:32,994 DEBUG [c.c.a.t.Request] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Seq 2-2107684625609393642: Executing: { Cmd , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
148831:2026-03-26 08:07:33,055 DEBUG [c.c.a.t.Request] (DirectAgent-23:[ctx-fd0cee9d]) (logid:cbd2ee98) Seq 2-2107684625609393642: Processing: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
148834:2026-03-26 08:07:33,055 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Expected OvsFetchInterfaceAnswer from host 2 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
149205:2026-03-26 08:07:39,981 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Wait time setting on com.cloud.agent.api.OvsFetchInterfaceCommand is 1800 seconds
149207:2026-03-26 08:07:39,982 DEBUG [c.c.a.t.Request] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Seq 1-7402510412513544325: Sending { Cmd , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
149208:2026-03-26 08:07:39,982 DEBUG [c.c.a.t.Request] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Seq 1-7402510412513544325: Executing: { Cmd , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.OvsFetchInterfaceCommand":{"label":"cloudbr3","wait":"0","bypassHostMaintenance":"false"}}] }
149212:2026-03-26 08:07:40,035 DEBUG [c.c.a.t.Request] (DirectAgent-174:[ctx-01d61c0b]) (logid:cbd2ee98) Seq 1-7402510412513544325: Processing: { Ans: , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
149215:2026-03-26 08:07:40,036 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Expected OvsFetchInterfaceAnswer from host 1 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?

root@cloudstack:/srv# grep -n "UnsupportedAnswer" /var/log/cloudstack/management/management-server.log
145340:2026-03-26 08:03:37,241 DEBUG [c.c.a.t.Request] (DirectAgent-161:[ctx-d16b0a75]) (logid:7dad1999) Seq 1-7402510412513544295: Processing: { Ans: , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
145342:2026-03-26 08:03:37,241 DEBUG [c.c.a.t.Request] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Seq 1-7402510412513544295: Received: { Ans: , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 110, { UnsupportedAnswer } }
145343:2026-03-26 08:03:37,241 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Expected OvsFetchInterfaceAnswer from host 1 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
146199:2026-03-26 08:04:54,447 DEBUG [c.c.a.t.Request] (DirectAgent-328:[ctx-a017872f]) (logid:7dad1999) Seq 2-2107684625609393625: Processing: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
146201:2026-03-26 08:04:54,447 DEBUG [c.c.a.t.Request] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Seq 2-2107684625609393625: Received: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, { UnsupportedAnswer } }
146202:2026-03-26 08:04:54,447 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-19:[ctx-ff3a26f7, job-116/job-117, ctx-78d7fd53]) (logid:7dad1999) Expected OvsFetchInterfaceAnswer from host 2 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
148353:2026-03-26 08:07:23,026 DEBUG [c.c.a.t.Request] (DirectAgent-171:[ctx-18ee3eb2]) (logid:cbd2ee98) Seq 2-2107684625609393638: Processing: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
148355:2026-03-26 08:07:23,026 DEBUG [c.c.a.t.Request] (Work-Job-Executor-20:[ctx-38a32219, job-119/job-120, ctx-2d13d180]) (logid:cbd2ee98) Seq 2-2107684625609393638: Received: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, { UnsupportedAnswer } }
148356:2026-03-26 08:07:23,026 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-20:[ctx-38a32219, job-119/job-120, ctx-2d13d180]) (logid:cbd2ee98) Expected OvsFetchInterfaceAnswer from host 2 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
148831:2026-03-26 08:07:33,055 DEBUG [c.c.a.t.Request] (DirectAgent-23:[ctx-fd0cee9d]) (logid:cbd2ee98) Seq 2-2107684625609393642: Processing: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
148833:2026-03-26 08:07:33,055 DEBUG [c.c.a.t.Request] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Seq 2-2107684625609393642: Received: { Ans: , MgmtId: 345051273784, via: 2(xcp-ng-01), Ver: v1, Flags: 110, { UnsupportedAnswer } }
148834:2026-03-26 08:07:33,055 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Expected OvsFetchInterfaceAnswer from host 2 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
149212:2026-03-26 08:07:40,035 DEBUG [c.c.a.t.Request] (DirectAgent-174:[ctx-01d61c0b]) (logid:cbd2ee98) Seq 1-7402510412513544325: Processing: { Ans: , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 110, [{"com.cloud.agent.api.UnsupportedAnswer":{"result":"false","details":"Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?","wait":"0","bypassHostMaintenance":"false"}}] }
149214:2026-03-26 08:07:40,035 DEBUG [c.c.a.t.Request] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Seq 1-7402510412513544325: Received: { Ans: , MgmtId: 345051273784, via: 1(xcp-ng-02), Ver: v1, Flags: 110, { UnsupportedAnswer } }
149215:2026-03-26 08:07:40,036 WARN [c.c.n.o.OvsTunnelManagerImpl] (Work-Job-Executor-21:[ctx-87357eba, job-119/job-121, ctx-bd188ee1]) (logid:cbd2ee98) Expected OvsFetchInterfaceAnswer from host 1 but got UnsupportedAnswer with details: Unsupported command issued: com.cloud.agent.api.OvsFetchInterfaceCommand. Are you sure you got the right type of server?
```

@weizhouapache
Member

weizhouapache commented Mar 26, 2026

@UAnton
can you check if cloudbr3 exists and whether it has a static IP?

@UAnton

UAnton commented Mar 26, 2026

@dheeraj12347
[Screenshot: 2026-03-26 at 12:16:07]
[Screenshot: 2026-03-26 at 12:16:29]

@dheeraj12347
Author

Thanks for the detailed logs and the screenshots, @UAnton – very helpful.
I see the agent is still returning UnsupportedAnswer for OvsFetchInterfaceCommand on both hosts with cloudbr3. I’m interested to see what you find when checking cloudbr3 on each XCP‑ng host (existence and static IP), as Weizhou suggested; once that’s clear, I can adjust the code if we spot anything on the CloudStack side.

@weizhouapache
Member

@UAnton
can you please check the logs on host 2?

  • /var/log/cloud/cloud.log
  • /var/log/cloud/ovstunnel.log


Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

GRE isolation + XCP-NG

6 participants