CloudStack 代码阅读(十)—— 调度器与资源分配(Host / Storage / Network Allocator)
调度(Scheduler / Allocator)是 CloudStack 的核心组件,负责决定:
- VM 应该运行在哪台 Host 上
- Volume 应该存储在哪个 StoragePool
- 网络资源是否可用
- 如何回避不健康节点、资源不足节点
- 如何配合 Orchestration Engine 执行复杂部署
1. 调度体系架构总览(源码定位)
CloudStack 调度器被拆成三层:
1. DeploymentPlanningManager(负责 orchestrateDeployVM 的整体调度入口)
2. Host Allocator(挑选 Host)
3. StoragePool Allocator(挑选存储池)
源码路径:
engine/orchestration/src/com/cloud/deploy/
├── DeploymentPlanningManagerImpl.java
├── FirstFitPlanner.java
├── ClusterBasedPlanner.java
server/src/com/cloud/agent/manager/allocator/
├── HostAllocator.java
├── FirstFitAllocator.java
engine/storage/src/com/cloud/storage/allocator/
├── StoragePoolAllocator.java
├── FirstFitStoragePoolAllocator.java
2. DeploymentPlanningManager:调度入口
2.1 调用链
VM 调度链路:
orchestrateDeployVM()
→ DeploymentPlanningManager.plan()
→ Planner.design()
→ HostAllocator.getHostsToUse()
→ StoragePoolAllocator.select()
关键方法:
@Override
public DeployDestination plan(VirtualMachineProfile vmProfile, DeploymentPlan plan, ExcludeList avoid) {
DataCenter dc = _dcDao.findById(plan.getDataCenterId());
List<Planner> planners = getPlanners(offering, vmProfile);
for (Planner planner : planners) {
DeployDestination dest = planner.plan(vmProfile, plan, avoid);
if (dest != null) return dest;
}
throw new InsufficientServerCapacityException(...);
}
3. Planner(第一层调度)
Planner 决定使用何种算法挑选资源。
CloudStack 默认 Planner:
- FirstFitPlanner(最常用)
- ClusterBasedPlanner
- ImplicitPlanner
3.1 FirstFitPlanner 关键逻辑
位置:
engine/orchestration/com/cloud/deploy/FirstFitPlanner.java
源码:
public DeployDestination plan(...) {
List<Cluster> clusters = listAllClusters();
for (Cluster c : clusters) {
Host host = findHost(c, vmProfile);
StoragePool pool = findPool(c, vmProfile);
if (host != null && pool != null) {
return new DeployDestination(dc, pod, c, host, pool);
}
}
return null;
}
FirstFit = 先找到第一个满足 CPU/MEM/Storage 条件的 Cluster。
Planner 决定整体方向,而真正提供候选资源的,是下面的 Allocator。
4. HostAllocator(主机选择)
位置:
server/src/com/cloud/agent/manager/allocator/HostAllocator.java
主要实现:
- FirstFitAllocator
- UserConcentratedPodAllocator(按用户集中度选 host)
- RandomAllocator(随机)
核心接口:
List<Host> allocateTo(VirtualMachineProfile vm, DeployDestination dest, ExcludeList avoid);
4.1 FirstFitAllocator 实现
关键方法:
hosts = _hostDao.listAllUpAndEnabledByCluster(clusterId);
for (Host host : hosts) {
if (checkHostCapacity(host, vmProfile) &&
!avoid.shouldAvoid(host)) {
return Arrays.asList(host);
}
}
return null;
检查 CPU 和内存:
host.getTotalMemory() - host.getUsedMemory() > vmMemRequired
host.getCpus() * host.getCpuSpeed() > vmCpuRequired
另外还会检查:
- host 是否维护模式
- host 是否兼容 hypervisor
- 网络是否能从该 host 路由
5. StoragePoolAllocator(存储池选择)
存储调度同样插件化:
StoragePoolAllocator
├── FirstFitStoragePoolAllocator
├── LocalStoragePoolAllocator
└── ClusterScopeStorageAllocator
关键接口:
List<StoragePool> select(VirtualMachineProfile vm, Long dataCenterId, Long podId, Long clusterId);
5.1 FirstFitStoragePoolAllocator
源码逻辑:
for (StoragePoolVO pool : availablePools) {
if (pool.getStatus() == Status.Up &&
pool.getAvailableSpace() > volumeSize &&
checkTags(pool, vm)) {
return Arrays.asList(pool);
}
}
调度因素:
- 池容量
- 池可达性(网络拓扑)
- 存储标签(Storage Tags)
- 模板是否存在于 pool 中
6. Network Allocator(网络可达性验证)
调度还需验证网络是否能在目标 Host 运行。
调用链:
DeploymentPlanningManager.plan()
→ _networkModel.isVmNetworksPresentOnHost(vm, host)
网络匹配逻辑:
List<NicProfile> nics = vm.getNics();
for (NicProfile nic : nics) {
if (!networkAvailableOnHost(nic.getNetwork(), host)) return false;
}
return true;
网络可达性包括:
- BroadcastDomain 类型匹配
- VLAN 在目标 host 是否 trunk
- VR 是否存在且能为该 host 服务
7. 限制与回避(ExcludeList)
ExcludeList 是调度中非常关键但常被忽视的结构。
它用于记录不应使用的资源:
hostsToAvoid
poolsToAvoid
clustersToAvoid
podsToAvoid
调度失败时:
avoid.addHost(hostId);
avoid.addPool(poolId);
调度重试会根据 ExcludeList 避开问题资源。
8. 调度完整时序图
orchestrateDeployVM
|
v
DeploymentPlanningManager.plan()
|
+--> FirstFitPlanner.plan()
| |
| +--> HostAllocator.allocateTo()
| | |
| | +--> check CPU/MEM/capacity
| | +--> avoid list
| |
| +--> StoragePoolAllocator.select()
| |
| +--> check space/tags/pool status
|
+--> Build DeployDestination(dc,pod,cluster,host,pool)
9. 调度失败点
9.1 Host 无可用资源
InsufficientServerCapacityException
原因:
- CPU/MEM 不足
- host 处于维护模式
- AvoidList 避开了所有 host
9.2 StoragePool 满
No suitable storage pool
检查:
storage_pool.used_bytes
storage_pool.capacity_bytes
9.3 网络不可达
Network is not available on host
检查 VLAN trunk、VR 状态。
10. Host / Storage Capacity 表结构
10.1 op_host_capacity
存储 host 的容量信息:
host_id
capacity_type
total_capacity
used_capacity
VM 启动后更新容量:
UPDATE op_host_capacity SET used += vm_mem
10.2 storage_pool 表
id
available_bytes
capacity_bytes
status
scope
11. 调度关键类小结
| 层级 | 类 | 作用 |
|---|---|---|
| 调度入口 | DeploymentPlanningManager | 调用 Planner |
| Planner | FirstFitPlanner | 整体调度策略 |
| HostAllocator | FirstFitAllocator | 主机选择 |
| StorageAllocator | FirstFitStoragePoolAllocator | 存储池选择 |
| 网络验证 | NetworkModelImpl | VLAN/trunk/VR 检查 |
| 回避列表 | ExcludeList | 动态避免 bad host/pool |
12. 总结
CloudStack 的调度体系不是单层算法,而是多层协同结构:
- Planner → Allocators → NetworkModel → ExcludeList
- CPU/MEM/Storage/Network 联合决策
- 插件化的 HostAllocator / StoragePoolAllocator
- ExcludeList 提供“动态学习”能力