CloudStack's Orchestration Engine is the core module of the whole IaaS system.
Every VM lifecycle operation, such as deployVirtualMachine, startVM, rebootVM, or migrateVM, eventually enters the same Workflow + StateMachine + Manager call chain.

This article walks through the whole VM deployment flow, based on the actual directory layout, class names, and call stacks in CloudStack 4.2.2.

1. Main modules of the Orchestration Engine

engine/
  └ orchestration/
       ├── src/com/cloud/vm/VirtualMachineManagerImpl.java
       ├── src/com/cloud/vm/VirtualMachineGuru.java
       ├── src/com/cloud/deploy/
       │      ├── DeploymentPlanningManagerImpl.java
       │      ├── FirstFitPlanner.java
       │      └── DeploymentPlan.java
       ├── src/com/cloud/network/NetworkOrchestrator.java
       ├── src/com/cloud/storage/VolumeManagerImpl.java
       └── workflow/
             ├── VMOperationListener.java
             └── orchestrateDeployVM

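Module layers:

  • VM Orchestrator: controls the overall VM lifecycle
  • Deploy Planner: selects the physical host
  • Network Orchestrator: prepares the VM's networks, NICs, and virtual router (VR)
  • Volume/Storage Orchestrator: prepares Root/Data volumes, templates, and storage pools
  • Guru: encapsulates the implementation differences between hypervisors
  • Workflow Engine: executes the steps in order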

2. From API to Orchestration: the complete call chain

The user calls:

deployVirtualMachine

Corresponding class:

org.apache.cloudstack.api.command.user.vm.DeployVMCmd

Its execute() method calls:

UserVm result = _userVmService.deployVirtualMachine(this);

Following _userVmService leads to:

UserVmManagerImpl.deployVirtualMachine()

From there the call enters the Orchestration Engine:

VirtualMachineManagerImpl.orchestrateDeployVM()

The complete call stack:

DeployVMCmd.execute()
 → UserVmManagerImpl.deployVirtualMachine()
   → VirtualMachineManagerImpl.orchestrateDeployVM()
     → DeploymentPlanningManagerImpl.plan()
     → NetworkOrchestrator.prepare()
     → VolumeManagerImpl.prepare()
     → Guru.finalizeDeployment()
     → send(StartCommand) to Agent
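
The call stack above amounts to running a fixed sequence of steps and stopping at the first one that fails. The following is a minimal Java sketch of that idea only. The class DeployPipelineSketch and its methods are hypothetical names chosen for illustration and are not part of CloudStack.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: runs named steps in insertion order and stops at the first failure.
class DeployPipelineSketch {
    private final Map<String, Supplier<Boolean>> steps = new LinkedHashMap<>();
    private final List<String> completed = new ArrayList<>();

    // Registers a step; the supplier returns true on success.
    DeployPipelineSketch step(String name, Supplier<Boolean> action) {
        steps.put(name, action);
        return this;
    }

    // Returns the name of the step that failed, or null if every step succeeded.
    String run() {
        for (Map.Entry<String, Supplier<Boolean>> e : steps.entrySet()) {
            if (!e.getValue().get()) {
                return e.getKey();
            }
            completed.add(e.getKey());
        }
        return null;
    }

    // Names of the steps that finished successfully, in order.
    List<String> completed() {
        return completed;
    }
}
```

Registering steps named plan, network, volume, and start in that order mirrors the stack above. If the volume step fails, the start step is never reached, which is roughly why a storage problem surfaces before any StartCommand is sent.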

3. DeploymentPlanningManager: the core of host selection

Location:

engine/orchestration/src/com/cloud/deploy/DeploymentPlanningManagerImpl.java

Core method:

@Override
public DeployDestination plan(VirtualMachineProfile vmProfile, DeploymentPlan plan, ExcludeList avoid)
        throws InsufficientServerCapacityException {
    DataCenter dc = _dcDao.findById(plan.getDataCenterId());
    List<Cluster> clusters = _clusterDao.listByDcId(dc.getId());
    for (Cluster cluster : clusters) {
        // Skip clusters that cannot satisfy the VM's requirements.
        if (suitable(cluster, vmProfile)) {
            // Resolve the pod that owns this cluster (assumes a _podDao field).
            Pod pod = _podDao.findById(cluster.getPodId());
            Host host = findHost(cluster, vmProfile);
            StoragePool pool = findStorage(cluster, vmProfile);
            return new DeployDestination(dc, pod, cluster, host, pool);
        }
    }
    // No cluster qualified.
    throw new InsufficientServerCapacityException();
}

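Key steps:

  1. Fetch the zone/pod/cluster list
  2. Iterate over the clusters and check whether each one satisfies the VM's requirements
  3. Select a Host inside the cluster
  4. Select a suitable storage pool inside the cluster
  5. Assemble the DeployDestination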
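
The steps above can be sketched as a loop that accepts a cluster only when it offers both a host and a storage pool, and otherwise moves on to the next cluster. This is a simplified illustration under assumptions. PlanSketch and its nested Host, Pool, and Cluster classes are hypothetical stand-ins, not CloudStack's real types, and capacity is reduced to three numbers.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of the planning loop; not CloudStack's real classes.
class PlanSketch {
    static class Host {
        final String name; final int freeCpuMhz; final int freeRamMb;
        Host(String name, int freeCpuMhz, int freeRamMb) {
            this.name = name; this.freeCpuMhz = freeCpuMhz; this.freeRamMb = freeRamMb;
        }
    }

    static class Pool {
        final String name; final long freeGb;
        Pool(String name, long freeGb) { this.name = name; this.freeGb = freeGb; }
    }

    static class Cluster {
        final String name; final List<Host> hosts; final List<Pool> pools;
        Cluster(String name, List<Host> hosts, List<Pool> pools) {
            this.name = name; this.hosts = hosts; this.pools = pools;
        }
    }

    // Returns "cluster/host/pool" for the first cluster that has both a fitting host and a fitting pool.
    static Optional<String> plan(List<Cluster> clusters, int cpuMhz, int ramMb, long diskGb) {
        for (Cluster c : clusters) {
            Optional<Host> host = c.hosts.stream()
                    .filter(h -> h.freeCpuMhz >= cpuMhz && h.freeRamMb >= ramMb)
                    .findFirst();
            Optional<Pool> pool = c.pools.stream()
                    .filter(p -> p.freeGb >= diskGb)
                    .findFirst();
            if (host.isPresent() && pool.isPresent()) {
                return Optional.of(c.name + "/" + host.get().name + "/" + pool.get().name);
            }
        }
        // Nothing fits; a caller would translate this into an insufficient-capacity error.
        return Optional.empty();
    }
}
```

Returning an empty Optional instead of throwing keeps the sketch short; it corresponds to the point where the planner would raise InsufficientServerCapacityException.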

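4. FirstFitPlanner: CloudStack's default scheduler

Location:

engine/orchestration/src/com/cloud/deploy/FirstFitPlanner.java

Main strategy:

  • The host must have enough CPU and memory
  • StoragePool capacity is considered at the same time
  • Clusters with lower load are preferred
  • Hosts and pools in the avoid list are filtered out

Example source: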

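List<Host> hosts = _hostDao.listAllUpAndEnabledByCluster(clusterId);
for (Host host : hosts) {
    // First fit: return the first host whose CPU and RAM cover the request.
    // >= so that a host with exactly the requested capacity still qualifies.
    if (host.getCpu() >= vmCpu && host.getRam() >= vmMem) {
        return host;
    }
}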
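
To make the strategy bullets concrete, here is a small sketch that checks free capacity (total minus used), skips hosts in an avoid set, and tries less loaded candidates first. It is an illustration under assumptions, not the FirstFitPlanner source. FirstFitSketch and its Host class are hypothetical, and the load ordering is applied to hosts rather than clusters to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of first-fit host selection with an avoid list.
class FirstFitSketch {
    static class Host {
        final long id;
        final int totalCpuMhz; final int usedCpuMhz;
        final long totalRamMb; final long usedRamMb;

        Host(long id, int totalCpuMhz, int usedCpuMhz, long totalRamMb, long usedRamMb) {
            this.id = id;
            this.totalCpuMhz = totalCpuMhz; this.usedCpuMhz = usedCpuMhz;
            this.totalRamMb = totalRamMb; this.usedRamMb = usedRamMb;
        }

        // A host fits when its remaining CPU and RAM both cover the request.
        boolean fits(int cpuMhz, long ramMb) {
            return totalCpuMhz - usedCpuMhz >= cpuMhz && totalRamMb - usedRamMb >= ramMb;
        }

        double cpuLoad() {
            return (double) usedCpuMhz / totalCpuMhz;
        }
    }

    // Returns the id of the first fitting host, least loaded first, or -1 if none fits.
    static long chooseHost(List<Host> hosts, Set<Long> avoid, int cpuMhz, long ramMb) {
        List<Host> ordered = new ArrayList<>(hosts);
        ordered.sort(Comparator.comparingDouble(Host::cpuLoad));
        for (Host h : ordered) {
            if (!avoid.contains(h.id) && h.fits(cpuMhz, ramMb)) {
                return h.id;
            }
        }
        return -1;
    }
}
```

Passing a previously failed host id in the avoid set makes the next call fall through to the next candidate, which is the role the avoid list plays in the strategy above.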

5. NetworkOrchestrator: preparing the network environment

Location:

engine/orchestration/src/com/cloud/network/NetworkOrchestrator.java

Network preparation flow when deploying a VM:

prepare(vm, destination)
 → allocateNIC()
 → implementNetwork()
 → configureVirtualRouter()

5.1 NIC allocation

NicProfile nic = new NicProfile();
nic.setIPv4Address(ip);
nic.setNetwork(network);

5.2 implementNetwork()

This calls a NetworkGuru implementation:

BridgeGuru
PodBasedNetworkGuru
OvsGuestNetworkGuru

The Guru decides:

  • The network type (Isolated/Shared)
  • VLAN allocation
  • The structure of the broadcast domain
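
One way to picture the Guru's role is a list of candidates that are asked in turn, where the first one that accepts the request produces the network design. The sketch below assumes this "first guru that accepts wins" selection. All names in it (NetworkGuruSketch, IsolatedVlanGuru, SharedGuru, NetworkDesign) and the broadcast URI strings are hypothetical and only illustrate the pattern.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of guru selection; not CloudStack's NetworkGuru interface.
class NetworkGuruSketch {
    enum GuestType { ISOLATED, SHARED }

    static class NetworkDesign {
        final GuestType type; final String broadcastUri;
        NetworkDesign(GuestType type, String broadcastUri) {
            this.type = type; this.broadcastUri = broadcastUri;
        }
    }

    interface Guru {
        // Returns a design if this guru handles the requested type, otherwise empty.
        Optional<NetworkDesign> design(GuestType requested, int vlan);
    }

    static class IsolatedVlanGuru implements Guru {
        public Optional<NetworkDesign> design(GuestType requested, int vlan) {
            if (requested != GuestType.ISOLATED) {
                return Optional.empty();
            }
            return Optional.of(new NetworkDesign(requested, "vlan://" + vlan));
        }
    }

    static class SharedGuru implements Guru {
        public Optional<NetworkDesign> design(GuestType requested, int vlan) {
            if (requested != GuestType.SHARED) {
                return Optional.empty();
            }
            return Optional.of(new NetworkDesign(requested, "vlan://untagged"));
        }
    }

    // Asks each guru in order and returns the first design offered.
    static NetworkDesign implement(List<Guru> gurus, GuestType requested, int vlan) {
        for (Guru g : gurus) {
            Optional<NetworkDesign> d = g.design(requested, vlan);
            if (d.isPresent()) {
                return d.get();
            }
        }
        throw new IllegalStateException("no guru can handle " + requested);
    }
}
```

Keeping the type check inside each guru means a new network type can be supported by adding one more class to the list, without touching the orchestrator loop.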

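6. Storage/Volume Orchestrator: volume preparation flow

Before starting a VM, CloudStack makes sure that:

  • The Root Volume has been created
  • The Template has been copied from Secondary Storage to Primary Storage
  • The Data Volume is ready

Location:

engine/storage/VolumeManagerImpl.java

Key flow: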

createVolumeFromTemplate()
chooseStoragePool()
copyTemplateToPool()

Example source:

StoragePool pool = _storagePoolAllocator.allocateToPool(template, vm);
VolumeVO vol = new VolumeVO(...);
vol.setPoolId(pool.getId());
_volumeDao.persist(vol);
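
The flow above (choose a pool, copy the template to it, create the volume) can be sketched as follows. This is an illustration under assumptions. VolumePrepSketch and its Pool class are hypothetical, the template copy is represented by a counter, and the space the template itself occupies is ignored.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of root-volume preparation; not CloudStack's VolumeManagerImpl.
class VolumePrepSketch {
    static class Pool {
        final String name;
        long freeGb;
        int templateCopies = 0;                              // how many template copies were made
        final Set<String> templates = new HashSet<>();       // templates already on this pool

        Pool(String name, long freeGb) { this.name = name; this.freeGb = freeGb; }
    }

    // Picks the first pool with enough free space for the volume.
    static Pool choosePool(List<Pool> pools, long sizeGb) {
        for (Pool p : pools) {
            if (p.freeGb >= sizeGb) {
                return p;
            }
        }
        throw new IllegalStateException("no pool with enough free space");
    }

    // Copies the template to the pool at most once, then reserves space for the root volume.
    static String createRootVolume(List<Pool> pools, String templateId, long sizeGb) {
        Pool pool = choosePool(pools, sizeGb);
        if (pool.templates.add(templateId)) {
            pool.templateCopies++;   // stands in for the Secondary -> Primary Storage copy
        }
        pool.freeGb -= sizeGb;
        return "ROOT@" + pool.name;
    }
}
```

The point of the Set is that a second VM from the same template on the same pool reuses the existing copy, so only the first deployment pays for the transfer from Secondary Storage.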

7. Guru layer: hypervisor-specific logic

Location:

com.cloud.vm.VirtualMachineGuru

Implementations for different hypervisors:

  • KVMGuru
  • XenServerGuru
  • VMwareGuru

The Guru decides:

  • VM name generation
  • Volume attachment
  • NIC attachment
  • finalizeDeployment()

Example:

@Override
public void finalizeVirtualMachineProfile(VirtualMachineProfile profile, DeployDestination dest) {
    profile.setBootLoader(BootloaderType.CD);
}
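
The Guru layer is essentially a lookup from hypervisor type to an object that adjusts the VM profile. The sketch below shows that dispatch in isolation. HypervisorGuruSketch, its Guru interface, and the profile keys and values (diskBus, bootLoader, nicAdapter) are all hypothetical and chosen only to make the per-hypervisor difference visible.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-hypervisor dispatch; not CloudStack's Guru classes.
class HypervisorGuruSketch {
    enum Hypervisor { KVM, XENSERVER, VMWARE }

    interface Guru {
        // Adds hypervisor-specific settings to the profile.
        void finalizeProfile(Map<String, String> profile);
    }

    private static final Map<Hypervisor, Guru> GURUS = new EnumMap<>(Hypervisor.class);
    static {
        // The keys and values below are illustrative only.
        GURUS.put(Hypervisor.KVM, p -> p.put("diskBus", "virtio"));
        GURUS.put(Hypervisor.XENSERVER, p -> p.put("bootLoader", "PyGrub"));
        GURUS.put(Hypervisor.VMWARE, p -> p.put("nicAdapter", "E1000"));
    }

    // Returns a copy of the base profile with the matching guru's settings applied.
    static Map<String, String> finalizeProfile(Hypervisor hypervisor, Map<String, String> base) {
        Guru guru = GURUS.get(hypervisor);
        if (guru == null) {
            throw new IllegalArgumentException("no guru for " + hypervisor);
        }
        Map<String, String> profile = new HashMap<>(base);
        guru.finalizeProfile(profile);
        return profile;
    }
}
```

Because the orchestrator only sees the Guru interface, the rest of the deployment flow stays the same regardless of which hypervisor the destination host runs.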

8. StartCommand: the final VM start

The VM is actually started on the hypervisor host through the Agent.

Call chain:

VirtualMachineManagerImpl.startVirtualMachine()
 → Commands cmds = new Commands(new StartCommand(...))
 → agentMgr.send(hostId, cmds)

StartCommand example:

public class StartCommand extends Command {
    private String vmName;
    private List<DiskTO> disks;
    private NicTO[] nics;
}

After receiving the StartCommand, the Agent:

  • Builds the libvirt XML (KVM)
  • Or calls XenAPI / the VMware API
  • Starts the VM

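9. State machine: the core of the VM lifecycle

A VM moves through the following states:

Created → Starting → Running → Stopping → Stopped → Destroyed → Expunging

The state machine lives in:

engine/schema/src/com/cloud/vm/VirtualMachineState.java

A state transition is made with:

_stateMachine.transitTo(vm, Event.StartRequested, State.Starting)

and is persisted to the database with a conditional update, so the write only succeeds if the VM is still in the expected previous state: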

UPDATE vm_instance SET state='Starting' WHERE id=? AND state='Created'
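
The WHERE clause in the UPDATE above behaves like a compare-and-set: the new state is written only if the old state is still the one the caller saw. A minimal in-memory sketch of that idea follows, using a transition table plus AtomicReference.compareAndSet. The class VmStateMachineSketch, the event names, and the exact transition table are assumptions for illustration; the table covers only the linear path listed above plus a restart from Stopped.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a table-driven state machine with compare-and-set updates.
class VmStateMachineSketch {
    enum State { CREATED, STARTING, RUNNING, STOPPING, STOPPED, DESTROYED, EXPUNGING }
    enum Event { START_REQUESTED, OPERATION_SUCCEEDED, STOP_REQUESTED, DESTROY_REQUESTED, EXPUNGE_REQUESTED }

    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    private static void add(State from, Event event, State to) {
        TABLE.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(event, to);
    }

    static {
        add(State.CREATED, Event.START_REQUESTED, State.STARTING);
        add(State.STARTING, Event.OPERATION_SUCCEEDED, State.RUNNING);
        add(State.RUNNING, Event.STOP_REQUESTED, State.STOPPING);
        add(State.STOPPING, Event.OPERATION_SUCCEEDED, State.STOPPED);
        add(State.STOPPED, Event.START_REQUESTED, State.STARTING);
        add(State.STOPPED, Event.DESTROY_REQUESTED, State.DESTROYED);
        add(State.DESTROYED, Event.EXPUNGE_REQUESTED, State.EXPUNGING);
    }

    private final AtomicReference<State> state = new AtomicReference<>(State.CREATED);

    State current() {
        return state.get();
    }

    // Applies the event if the table allows it and the state has not changed in the meantime.
    boolean transitTo(Event event) {
        State seen = state.get();
        Map<Event, State> row = TABLE.get(seen);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            return false;                           // transition not allowed from this state
        }
        return state.compareAndSet(seen, next);     // analogous to "... WHERE state = <seen>"
    }
}
```

If two callers race to start the same VM, only one compareAndSet succeeds; the other gets false, just as the second UPDATE would match zero rows.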

10. Complete sequence diagram for VM startup

User Request
  |
  v
DeployVMCmd.execute()
  |
  v
UserVmManagerImpl.deployVirtualMachine()
  |
  v
VirtualMachineManagerImpl.orchestrateDeployVM()
  |
  +--> DeploymentPlanningManager.plan()
  |        |
  |        +--> FirstFitPlanner.chooseHost()
  |
  +--> NetworkOrchestrator.prepare()
  |
  +--> VolumeManager.prepare()
  |
  +--> Guru.finalizeDeployment()
  |
  +--> agentMgr.send(StartCommand)
  |
  v
VM Running

11. Common deployment failure points

11.1 No usable host

DeploymentPlanningManager throws:

InsufficientServerCapacityException

11.2 Network failure (Guru)

NetworkOrchestrator logs:

Unable to implements network

This is usually related to VLAN problems or a VR that failed to start.

11.3 StartCommand failure

The Agent's Answer is missing or reports failure:

answer == null || !answer.getResult()

Log location:

/var/log/cloudstack/agent/agent.log
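
The check in 11.3 distinguishes two cases: no reply at all, and a reply that reports failure. A small sketch of how a caller might turn that into a readable failure reason is shown below. AnswerCheckSketch, its simplified Answer class, and the message wording are hypothetical; only the null-or-false condition comes from the text above.

```java
// Hypothetical sketch of evaluating an Agent reply; the Answer class here is a simplified stand-in.
class AnswerCheckSketch {
    static class Answer {
        private final boolean result;
        private final String details;

        Answer(boolean result, String details) {
            this.result = result;
            this.details = details;
        }

        boolean getResult() { return result; }
        String getDetails() { return details; }
    }

    // Returns null when the start succeeded, otherwise a short reason for the failure.
    static String failureReason(Answer answer, String vmName) {
        if (answer == null) {
            return "no answer from agent for " + vmName;
        }
        if (!answer.getResult()) {
            return "agent failed to start " + vmName + ": " + answer.getDetails();
        }
        return null;
    }
}
```

Separating the two cases helps when reading the agent log: a missing answer points at connectivity or timeouts, while a negative answer usually carries the hypervisor's own error text in its details.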

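12. Summary

The CloudStack Orchestration Engine combines:

  • Planner (scheduling)
  • Orchestrators (orchestration)
  • Guru (hypervisor-specific logic)
  • StateMachine (lifecycle)
  • Agent (low-level execution)

into a highly modular, replaceable, and extensible architecture.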