We need to reliably spin up dozens of VMs, some with vGPU, all with consistent networking and storage, so manual provisioning is no longer sustainable.
The management server hosts a lightweight orchestration system that automates the creation, customization, and launch of KVM VMs. This system uses a set of template files and three shell scripts to generate fully configured VMs (with or without vGPU support) using the base Ubuntu 22.04 image created earlier.
Each VM is assigned static IP addresses across multiple 10 GbE/40 GbE networks, receives deterministic network configuration, and is provisioned using virt-customize before being launched via libvirt.
Note: Prerequisites
- A DNS entry for the guest's vmname/hostname must be pre-configured, and nslookup for it must succeed.
- A block device must exist at /dev/$vmname.
Note: These scripts are deliberately basic. The gateway IP, DNS server IP, and domain name are all hard-coded. We need this stopgap until the DCM (Data Center Manager) application is ready for use.
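Both prerequisites are easy to verify up front. A minimal pre-flight check (a sketch only; launch.sh performs equivalent checks inline) could look like:
#!/bin/bash
# Pre-flight check (illustrative): verify the DNS entry and backing block
# device for a VM before provisioning it.
vmname="$1"
# The DNS entry must resolve; launch.sh later derives the last octet from it.
if ! nslookup "$vmname" >/dev/null 2>&1; then
  echo "ERROR: DNS lookup failed for $vmname" >&2
  exit 1
fi
# A block device (or udev symlink to one) must exist at /dev/$vmname.
if [ ! -b "/dev/$vmname" ]; then
  echo "ERROR: /dev/$vmname is not a block device" >&2
  exit 1
fi
echo "OK: $vmname resolves and /dev/$vmname is present."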
This post documents all template files and the core orchestration scripts.
1. Template Directory Structure
The templates for Ubuntu 22.04 are stored under:
/root/kvm-local/ubuntu22/
Example contents:
hosts
resolved.conf
base.qcow2
50-cloud-init.yaml
Each of these files contains placeholders (e.g., {{ip1}}, {{hostname}}, {{dns}}) which the orchestration script replaces before injecting them into the VM disk image.
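The substitution itself is plain sed. For example, rendering the hosts template for a hypothetical VM named registry at 10.0.1.164 amounts to:
# Render a template by replacing its {{...}} tokens (values are examples).
cp ubuntu22/hosts /tmp/hosts
sed -i -e "s/{{hostname}}/registry/g" -e "s/{{ip1}}/10.0.1.164/g" /tmp/hosts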
2. Template File: hosts
Path: ubuntu22/hosts
127.0.0.1 localhost
{{ip1}} {{hostname}}
Variables replaced:
- {{hostname}} – VM hostname
- {{ip1}} – management network IP (10.0.1.x)
3. Template File: systemd-resolved
Path: ubuntu22/resolved.conf
[Resolve]
DNS={{dns}}
FallbackDNS=8.8.8.8
Domains={{domain}}
DNSStubListener=no
Variables replaced:
- {{dns}} – DNS server (10.0.0.1)
- {{domain}} – domain suffix (yourdomain.com)
4. Template File: Netplan Configuration
Path: ubuntu22/50-cloud-init.yaml
This file assigns six interfaces, one for each bridged network:
network:
  version: 2
  ethernets:
    eth0:
      addresses: [{{ip1}}/16]
      routes:
        - to: default
          via: {{gateway}}
      mtu: 9000
    eth1:
      addresses: [{{ip2}}/24]
      mtu: 9000
    eth2:
      addresses: [{{ip3}}/24]
      mtu: 9000
    eth3:
      addresses: [{{ip4}}/24]
      mtu: 9000
    eth4:
      addresses: [{{ip5}}/24]
      mtu: 9000
    eth5:
      addresses: [{{ip6}}/24]
      mtu: 9000
Variables replaced:
- {{ip1}} to {{ip6}} – per-network addresses
- {{gateway}} – default gateway
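One optional safeguard, not part of the scripts, is to parse the rendered YAML with netplan before baking it into the image; the --root-dir flag keeps the check away from the host's own configuration (the myvm path below is illustrative):
# Sanity-check a rendered netplan file in a throwaway tree.
mkdir -p /tmp/npcheck/etc/netplan
cp /root/kvm-local/myvm/50-cloud-init.yaml /tmp/npcheck/etc/netplan/
netplan generate --root-dir /tmp/npcheck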
The Orchestration Scripts
Three main scripts automate VM lifecycle operations:
- deletevm.sh — Remove an existing VM completely
- preparevm.sh — Create the customized QCOW image from base
- launch.sh — Define and start the VM using libvirt
Each script is documented below.
5. Script: deletevm.sh
Purpose: Fully delete an existing VM, including domain definition, autostart entries, runtime XML, and stale QEMU processes.
#!/bin/bash
vmname=""
while getopts ":v:" opt; do
  case $opt in
    v) vmname="$OPTARG" ;;
    \?) echo "Invalid option: -$OPTARG" >&2; exit 1 ;;
    :) echo "Option -$OPTARG requires an argument." >&2; exit 1 ;;
  esac
done
if [ -z "$vmname" ]; then
  echo "VM name not provided."
  exit 1
fi
echo "Deleting server5:${vmname} (if existing)"
# Check if domain exists
if virsh dominfo "${vmname}" &>/dev/null; then
  echo "Stopping ${vmname}..."
  virsh destroy "${vmname}" &>/dev/null || virsh shutdown "${vmname}" &>/dev/null
  sleep 2
fi
# Make sure no QEMU process is left
PID=$(pgrep -f "qemu.*${vmname}") || true
if [ -n "$PID" ]; then
  echo "Force killing leftover QEMU process (PID: $PID)..."
  kill -9 $PID
fi
# Now undefine completely (including nvram, snapshots, etc.)
if virsh dominfo "${vmname}" &>/dev/null; then
  echo "Undefining domain ${vmname}..."
  virsh undefine "${vmname}" --nvram --remove-all-storage &>/dev/null || true
else
  # In case it's transient and not listed
  virsh undefine "${vmname}" --nvram &>/dev/null || true
fi
# Remove leftover autostart files
AUTOSTART_XML="/etc/libvirt/qemu/autostart/${vmname}.xml"
if [ -f "$AUTOSTART_XML" ]; then
  echo "Removing autostart link..."
  rm -f "$AUTOSTART_XML"
fi
# Remove stale runtime XML (transient domain)
RUNTIME_XML="/var/run/libvirt/qemu/${vmname}.xml"
if [ -f "$RUNTIME_XML" ]; then
  echo "Removing stale runtime XML..."
  rm -f "$RUNTIME_XML"
fi
# Optional: clean up device symlinks (if any)
if [ -b "/dev/${vmname}" ]; then
  echo "Releasing block device /dev/${vmname} ..."
  dmsetup remove "/dev/${vmname}" 2>/dev/null || true
  losetup -d "/dev/${vmname}" 2>/dev/null || true
fi
exit 0
Key behaviors:
- Checks whether the VM exists via virsh dominfo
- Stops the domain (virsh destroy, falling back to virsh shutdown)
- Kills leftover QEMU processes associated with the VM
- Undefines the domain (including NVRAM and storage)
- Removes autostart and runtime XML
- Cleans up stale device symlinks if necessary
This ensures that VM recreation always starts cleanly.
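Usage is a single flag (VM name illustrative); the script is also safe to run when the VM does not exist:
# Remove the VM "myvm" and all of its libvirt state.
./deletevm.sh -v myvm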
6. Script: preparevm.sh
Note: Prerequisite: a block device (or partition) mapped to /dev/$host (i.e., the VM name) must already exist.
Purpose:
- Clone base QCOW2
- Replace placeholders in templates
- Inject updated templates and hostname
- Embed timezone and firstboot behavior
- Convert QCOW2 → raw and write the result directly to the VM's block device (/dev/$diskname)
#!/bin/bash
# Args: $1=host $2=vcpu $3=ram $4=diskname $5=last-octet $6=withgpu (optional)
host=$1
vcpu=$2
ram=$3
diskname=$4
ip1="10.0.1.$5"
ip2="10.0.2.$5"
ip3="10.0.3.$5"
ip4="10.0.4.$5"
ip5="10.0.5.$5"
ip6="10.0.6.$5"
flavor="ubuntu22"
withgpu=false
if [ "$6" == "true" ]; then
  withgpu=true
fi
domain="yourdomain.com"
dns=10.0.0.1
gateway=10.0.0.1
base=/root/kvm-local
templates=$base/$flavor
target=$base/$host
# Clone base QCOW2 to the target instance folder
mkdir -p $target
rm -f $target/*
echo "Copying base image to instance folder..."
echo " "
rm -f $target/$host.qcow2
rsync -ah --progress $base/$flavor/base.qcow2 $target/$host.qcow2
echo " "
# Clone and update templates
echo "Copying $templates/resolved.conf to $target/resolved.conf"
cp $templates/resolved.conf $target/resolved.conf
# For now only ubuntu22
intfile=$target/50-cloud-init.yaml
echo "Copying $templates/50-cloud-init.yaml $intfile"
cp $templates/50-cloud-init.yaml $intfile
echo "copying $templates/hosts $target/hosts"
cp $templates/hosts $target/hosts
echo "Customizing vm...."
resolvefile=$target/resolved.conf
hostfile=$target/hosts
# Update resolved.conf
sed -i "s/{{dns}}/$dns/g" $resolvefile
sed -i "s/{{domain}}/$domain/g" $resolvefile
#Update netplan file
echo "Updating netplan file $intfile"
sed -i "s/{{ip1}}/$ip1/g" $intfile
sed -i "s/{{ip2}}/$ip2/g" $intfile
sed -i "s/{{ip3}}/$ip3/g" $intfile
sed -i "s/{{ip4}}/$ip4/g" $intfile
sed -i "s/{{ip5}}/$ip5/g" $intfile
sed -i "s/{{ip6}}/$ip6/g" $intfile
sed -i "s/{{gateway}}/$gateway/g" $intfile
sed -i "s/{{hostname}}/$host/g" $hostfile
sed -i "s/{{domain}}/$domain/g" $hostfile
sed -i "s/{{ip1}}/$ip1/g" $hostfile
echo "================================"
datestr=`date`
echo "$datestr : Baking customization to base file."
virt-customize -a $target/$host.qcow2 --copy-in $intfile:/etc/netplan/ --copy-in $resolvefile:/etc/systemd/ --copy-in $hostfile:/etc/ --hostname $host --timezone Asia/Kolkata --firstboot-command '/usr/local/bin/resizedisk'
datestr=`date`
echo "$datestr : Exporting base file to /dev/$diskname"
qemu-img convert -O raw $target/$host.qcow2 /dev/$diskname
sync
Key steps inside preparevm.sh
Generate all IPs
Based on the last octet:
10.0.1.x
10.0.2.x
10.0.3.x
10.0.4.x
10.0.5.x
10.0.6.x
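The script spells these six assignments out longhand; the same addresses could equally be derived in a loop, e.g.:
# Loop-form equivalent (sketch): derive ip1..ip6 from the last octet ($5).
octet="$5"
for n in 1 2 3 4 5 6; do
  eval "ip$n=10.0.$n.$octet"
done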
Clone base.qcow2
rsync -ah --progress base.qcow2 $target/$host.qcow2
Copy & customize templates
All of these are updated with sed:
- /etc/netplan/50-cloud-init.yaml
- /etc/systemd/resolved.conf
- /etc/hosts
Inject customized files into the VM image
Using virt-customize:
virt-customize -a $target/$host.qcow2 --copy-in ...
Also sets hostname, timezone, and firstboot resizing logic.
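The /usr/local/bin/resizedisk helper lives in the base image and is not shown in this post. A minimal sketch of what such a firstboot script typically does, assuming the root filesystem is ext4 on /dev/vda2 (as the df output later suggests), is:
#!/bin/bash
# Hypothetical sketch of /usr/local/bin/resizedisk: grow the root partition
# to fill the raw block device, then grow the filesystem to match.
growpart /dev/vda 2
resize2fs /dev/vda2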
Export to block device
Raw image is written directly to the VM disk:
qemu-img convert -O raw ... /dev/$diskname
This lets VMs run from fast, RAID-backed raw partitions instead of QCOW2 files.
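After the conversion finishes, the freshly written device can be inspected from the host, for example with libguestfs (device name illustrative):
# Optional check: list the filesystems now present on the device.
virt-filesystems -a /dev/myvm --long --all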
7. Script: launch.sh
Purpose:
Coordinate deletion, customization, guest XML generation, and VM start.
#!/bin/bash
vcpu="4"
ram="8"
host="server5"
ip=""
vmname=""
vgpu="none"
flavor="ubuntu22"
diskname=""
while getopts ":c:v:r:g:" opt; do
  case $opt in
    v) vmname="$OPTARG" ;;
    c) vcpu="$OPTARG" ;;
    r) ram="$OPTARG" ;;
    g) vgpu="$OPTARG" ;;
    \?) echo "Invalid option: -$OPTARG" >&2; exit 1 ;;
    :) echo "Option -$OPTARG requires an argument." >&2; exit 1 ;;
  esac
done
if [ -z "$vmname" ]; then
  echo "VM name not provided."
  exit 1
fi
datestr=`date`
echo " "
echo "Start : $datestr"
echo " "
# Extract the last octet of the management IP from the DNS answer
# (grep -v "#53" drops the DNS server's own address line).
ip=`nslookup "$vmname" | grep "Address:" | grep -v "#53" | cut -d " " -f2 | cut -d "." -f4`
if [ -z "$ip" ]; then
  echo "IP address lookup failed for $vmname."
  exit 1
else
  echo " "
  echo "IP look up succeeded : 10.0.1.$ip"
  echo " "
fi
diskname=$vmname
if [ -e /dev/$vmname ]; then
  echo "/dev/$vmname exists. Proceeding..."
else
  echo "ERROR: /dev/$vmname does not exist!"
  exit 1
fi
remote_uri="qemu+ssh://root@$host/system"
echo "==========================================================================================="
echo "Server : $host"
echo "VM Name : $vmname"
echo "vCPUs : $vcpu"
echo "Memory : $ram GB"
echo "Disk : /dev/$diskname"
echo "Management IP : 10.0.1.$ip"
echo "VGPU ID : $vgpu"
echo "==========================================================================================="
if [ "$vgpu" == "none" ]; then
  withvgpu="false"
else
  withvgpu="true"
fi
datestr=`date`
echo "$datestr : Deleting $host:$vmname (if existing)"
./deletevm.sh -v $vmname
base=/root/kvm-local
domtemplates=/root/kvm-local
datestr=`date`
echo " "
echo "$datestr : Customizing"
echo " "
./preparevm.sh $vmname $vcpu $ram $diskname $ip $withvgpu
datestr=`date`
echo " "
domfile="$base/$vmname/$vmname.xml"
if [ "$withvgpu" == "false" ]; then
  echo "Creating template (without VGPU) : Copying $domtemplates/node-novgpu.xml to $domfile"
  cp $domtemplates/node-novgpu.xml $domfile
else
  echo "Creating template (with VGPU) : Copying $domtemplates/node-gpu.xml to $domfile"
  cp $domtemplates/node-gpu.xml $domfile
fi
echo " "
sed -i "s/{{vmname}}/$vmname/g" $domfile
sed -i "s/{{ram}}/$ram/g" $domfile
sed -i "s/{{vcpus}}/$vcpu/g" $domfile
sed -i "s/{{diskname}}/$diskname/g" $domfile
sed -i "s/{{vgpuid}}/$vgpu/g" $domfile
datestr=`date`
echo " "
echo "$datestr : Launching"
echo " "
virsh -c "$remote_uri" define $domfile
virsh -c "$remote_uri" start $vmname
virsh -c "$remote_uri" autostart $vmname
datestr=`date`
echo " "
echo "$datestr : Deleting temporary custom image"
echo " "
echo "Deleting generated custom image..."
rm -f $base/$vmname/$vmname.qcow2
datestr=`date`
echo " "
echo "$datestr : Completed"
echo " "
Key responsibilities
- Accepts vCPU, RAM, and vGPU options; the management IP is resolved via DNS and the disk name is derived from the VM name
- Calls deletevm.sh
- Calls preparevm.sh
- Selects the appropriate libvirt template: node-gpu.xml if a vGPU ID is provided, node-novgpu.xml otherwise
- Updates placeholders in the XML: {{vmname}}, {{vcpus}}, {{ram}}, {{diskname}}, {{vgpuid}}
Launch sequence
virsh -c "$remote_uri" define $domfile
virsh -c "$remote_uri" start $vmname
virsh -c "$remote_uri" autostart $vmname
Finally, removes the temporary QCOW2 image used during customization.
8. Libvirt Templates
Two templates are used: one for GPU-enabled VMs and one without.
node-gpu.xml
- Adds a <hostdev> block with the vGPU UUID
- Six bridged interfaces (br1–br6)
- Raw block device attached as /dev/{{diskname}}
The XML supports mdev/vfio-based vGPU passthrough.
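The exact stanza depends on how the mediated device was created on the host, but a representative mdev <hostdev> block, with {{vgpuid}} standing in for the mediated-device UUID, looks like this:
<!-- Representative mdev vGPU stanza (node-gpu.xml only); UUID is a placeholder. -->
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='{{vgpuid}}'/>
  </source>
</hostdev>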
node-novgpu.xml
Same as above, but without the <hostdev> section.
<domain type='kvm'>
  <name>{{vmname}}</name>
  <vcpu placement='static'>{{vcpus}}</vcpu>
  <memory unit='GiB'>{{ram}}</memory>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'/>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <controller type='usb' model='qemu-xhci'/>
    <controller type='pci' model='pcie-root'/>
    <controller type='sata' index='0'/>
    <controller type='virtio-serial' index='0'/>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/{{diskname}}'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br1'/>
      <model type='virtio'/>
      <driver name='vhost' queues='1'/>
    </interface>
    <interface type='bridge'>
      <source bridge='br2'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
    </interface>
    <interface type='bridge'>
      <source bridge='br3'/>
      <model type='virtio'/>
      <driver name='vhost' queues='{{vcpus}}'/>
    </interface>
    <interface type='bridge'>
      <source bridge='br4'/>
      <model type='virtio'/>
      <driver name='vhost' queues='{{vcpus}}'/>
    </interface>
    <interface type='bridge'>
      <source bridge='br5'/>
      <model type='virtio'/>
      <driver name='vhost' queues='{{vcpus}}'/>
    </interface>
    <interface type='bridge'>
      <source bridge='br6'/>
      <model type='virtio'/>
      <driver name='vhost' queues='{{vcpus}}'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='vga' vram='16384' heads='1' primary='yes'/>
    </video>
    <memballoon model='virtio'/>
  </devices>
</domain>
Both templates use:
- machine='pc-q35-6.2'
- cpu mode='host-passthrough'
- virtio drivers
- SPICE graphics access for console
- Memory ballooning
9. End-to-End Flow
- launch.sh invoked
- deletevm.sh ensures a clean state
- preparevm.sh clones base image & customizes it
- libvirt XML template generated
- VM is defined, started, and set to autostart
- Temporary QCOW2 deleted
- VM ready with deterministic networking and full configuration
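Because the whole lifecycle hangs off launch.sh, scaling to the dozens of VMs mentioned at the start reduces to a loop (VM names below are hypothetical; each needs its DNS entry and /dev/<name> device prepared first):
# Batch provisioning sketch over a list of hypothetical VM names.
for vm in node1 node2 node3; do
  ./launch.sh -v "$vm" -c 4 -r 8
done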
Usage Example: Creating a New VM (“registry”)
This example demonstrates how the orchestration scripts work together to provision a new VM called registry: verifying its udev-provided block device (/dev/registry), configuring its networking, and launching it with libvirt.
1. Confirm Existing Running VMs
root@server5:~/kvm-local# virsh list
Id Name State
----------------------
2 blog running
3 mdb running
2. DNS Lookup for the New Host
root@server5:~/kvm-local# nslookup registry
Server: 10.0.0.1
Address: 10.0.0.1#53
Name: registry.yourdomain.com
Address: 10.0.1.164
3. Confirm the Backing Block Device
root@server5:~/kvm-local# ls -ltr /dev/registry
lrwxrwxrwx 1 root root 4 Nov 23 20:19 /dev/registry -> sdb6
The /dev/registry symlink is created from udev rules and always points to the correct RAID-backed partition.
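The udev rules themselves are outside the scope of this post; an illustrative rule (with a placeholder partition UUID) that would create such a symlink is:
# /etc/udev/rules.d/99-vm-disks.rules (illustrative; UUID is a placeholder)
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="placeholder-uuid", SYMLINK+="registry"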
4. Launch the VM via orchestrator
root@server5:~/kvm-local# ./launch.sh -v registry -c 4 -r 8
- VM Name: registry
- vCPUs: 4
- RAM: 8 GB
- Management IP: 10.0.1.164 (resolved via DNS)
- Disk: /dev/registry (derived from the VM name)
- VGPU: none
5. Sample Output
root@server5:~/kvm-local# ./launch.sh -v registry -c 4 -r 8
Start : Mon Nov 24 08:09:41 AM IST 2025
IP look up succeeded : 10.0.1.164
/dev/registry exists. Proceeding...
===========================================================================================
Server : server5
VM Name : registry
vCPUs : 4
Memory : 8 GB
Disk : /dev/registry
Management IP : 10.0.1.164
VGPU ID : none
===========================================================================================
Mon Nov 24 08:09:41 AM IST 2025 : Deleting server5:registry (if existing)
Deleting server5:registry (if existing)
Mon Nov 24 08:09:42 AM IST 2025 : Customizing
Copying base image to instance folder...
sending incremental file list
base.qcow2
1.35G 100% 328.75MB/s 0:00:03 (xfr#1, to-chk=0/1)
Copying /root/kvm-local/ubuntu22/resolved.conf to /root/kvm-local/registry/resolved.conf
Copying /root/kvm-local/ubuntu22/50-cloud-init.yaml /root/kvm-local/registry/50-cloud-init.yaml
copying /root/kvm-local/ubuntu22/hosts /root/kvm-local/registry/hosts
Resizing /root/kvm-local/registry/registry.qcow2 to ...
Customizing vm....
Updating netplan file /root/kvm-local/registry/50-cloud-init.yaml
================================
Mon Nov 24 08:09:46 AM IST 2025 : Baking customization to base file.
[ 0.0] Examining the guest ...
[ 4.7] Setting a random seed
[ 4.8] Setting the machine ID in /etc/machine-id
[ 4.8] Copying: /root/kvm-local/registry/50-cloud-init.yaml to /etc/netplan/
[ 4.8] Copying: /root/kvm-local/registry/resolved.conf to /etc/systemd/
[ 4.8] Copying: /root/kvm-local/registry/hosts to /etc/
[ 4.8] Setting the hostname: registry
[ 6.5] Setting the timezone: Asia/Kolkata
[ 6.5] Installing firstboot command: /usr/local/bin/resizedisk
[ 6.7] Finishing off
Mon Nov 24 08:09:53 AM IST 2025 : Exporting base file to /dev/registry
Creating template (without VGPU) : Copying /root/kvm-local/node-novgpu.xml to /root/kvm-local/registry/registry.xml
Mon Nov 24 08:10:14 AM IST 2025 : Launching
Domain 'registry' defined from /root/kvm-local/registry/registry.xml
Domain 'registry' started
Domain 'registry' marked as autostarted
Mon Nov 24 08:10:19 AM IST 2025 : Deleting temporary custom image
Deleting generated custom image...
Mon Nov 24 08:10:19 AM IST 2025 : Completed
6. Verify the VM is Running
root@server5:~/kvm-local# virsh list
Id Name State
--------------------------
2 blog running
3 mdb running
5 registry running
7. SSH into the Newly Created VM
root@server5:~/kvm-local# ssh registry
Verify swap, CPU, memory, storage:
Memory
root@registry:~# free
total used free shared buff/cache available
Mem: 8132220 165212 7759452 1156 207556 7723692
Swap: 0 0 0
CPU
root@registry:~# lscpu
Architecture: x86_64
CPU(s): 4
Hypervisor vendor: KVM
Disk
root@registry:~# df -m
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/vda2 705419 3219 673410 1% /
Everything is configured exactly as intended:
- vCPU, RAM, hostname, IPs
- Netplan configuration
- systemd-resolved
- udev-backed raw block device
- zero swap
- correct cloud template applied
Conclusion
This example demonstrates the full lifecycle of provisioning a VM using the orchestration framework:
- Resolve DNS and udev-backed disk
- Delete old VM
- Prepare new image via virt-customize
- Define and start the VM
- Enable autostart
- Clean up temporary data
- Confirm operation via SSH
The process is deterministic and repeatable, making it suitable for provisioning fleets of VMs in production.