Orchestrating VMs using scripts and templates

We need to reliably spin up dozens of VMs, some with vGPU, all with consistent networking and storage, so manual provisioning is no longer sustainable.

The management server hosts a lightweight orchestration system that automates the creation, customization, and launch of KVM VMs. This system uses a set of template files and three shell scripts to generate fully configured VMs (with or without vGPU support) using the base Ubuntu 22.04 image created earlier.

Each VM is assigned static IP addresses across multiple 10 GbE/40 GbE networks, receives deterministic network configuration, and is provisioned using virt-customize before being launched via libvirt.

Note: Prerequisites

  • A DNS entry for the guest's VM name/hostname must be pre-configured, and nslookup for it must succeed.
  • A block device at /dev/$vmname must exist (see the preflight sketch after this list).
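
A minimal preflight sketch (a hypothetical helper, not one of the three scripts documented below) can verify both prerequisites up front:

#!/bin/bash
# preflight.sh (hypothetical) -- verify both prerequisites for a VM name.
vmname="$1"
[ -z "$vmname" ] && { echo "usage: $0 <vmname>"; exit 1; }

# 1. The DNS entry must resolve.
if ! nslookup "$vmname" >/dev/null 2>&1; then
    echo "ERROR: DNS lookup for $vmname failed."
    exit 1
fi

# 2. The backing block device must exist (-b follows the udev symlink).
if [ ! -b "/dev/$vmname" ]; then
    echo "ERROR: block device /dev/$vmname not found."
    exit 1
fi
echo "Preflight OK for $vmname."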

Note: These scripts are deliberately basic. The gateway IP, DNS server IP, and domain name are all hard-coded. This stopgap is needed until the DCM (Data Center Manager) application is ready for use.

This post documents all template files and the core orchestration scripts.


1. Template Directory Structure

The templates for Ubuntu 22.04 are stored under:

/root/kvm-local/ubuntu22/

Example contents:

hosts
resolved.conf
base.qcow2
50-cloud-init.yaml

Each of these files contains placeholders (e.g., {{ip1}}, {{hostname}}, {{dns}}) which the orchestration script replaces before injecting them into the VM disk image.
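
The substitution itself is plain sed. A generic helper in the same spirit (hypothetical; the scripts below simply inline these sed calls) would look like:

# render <template> <output> key=value [key=value ...]  (hypothetical helper)
# Replaces each {{key}} placeholder in the template with its value.
# Values must not contain '/' because sed's default delimiter is used.
render() {
    local src="$1" dst="$2" kv
    shift 2
    cp "$src" "$dst"
    for kv in "$@"; do
        sed -i "s/{{${kv%%=*}}}/${kv#*=}/g" "$dst"
    done
}

# Example:
# render ubuntu22/hosts /root/kvm-local/registry/hosts hostname=registry ip1=10.0.1.164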


2. Template File: hosts

Path: ubuntu22/hosts

127.0.0.1       localhost
{{ip1}}         {{hostname}}

Variables replaced:

  • {{hostname}} – VM hostname
  • {{ip1}} – management network IP (10.0.1.x)

3. Template File: systemd-resolved

Path: ubuntu22/resolved.conf

[Resolve]
DNS={{dns}}
FallbackDNS=8.8.8.8
Domains={{domain}}
DNSStubListener=no

Variables replaced:

  • {{dns}} – DNS server (10.0.0.1)
  • {{domain}} – Domain suffix (yourdomain.com)

4. Template File: Netplan Configuration

Path: ubuntu22/50-cloud-init.yaml

This file assigns static addresses to six interfaces, one per bridged network, with jumbo frames (MTU 9000) on every link:

network:
  version: 2
  ethernets:
    eth0:
      addresses: [{{ip1}}/16]
      routes:
        - to: default
          via: {{gateway}}
      mtu: 9000
    eth1:
      addresses: [{{ip2}}/24]
      mtu: 9000
    eth2:
      addresses: [{{ip3}}/24]
      mtu: 9000
    eth3:
      addresses: [{{ip4}}/24]
      mtu: 9000
    eth4:
      addresses: [{{ip5}}/24]
      mtu: 9000
    eth5:
      addresses: [{{ip6}}/24]
      mtu: 9000

Variables replaced:

  • {{ip1}} to {{ip6}}
  • {{gateway}}
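
For example, with last octet 164 (the registry VM provisioned later in this post) and the hard-coded gateway 10.0.0.1, the rendered file begins:

network:
  version: 2
  ethernets:
    eth0:
      addresses: [10.0.1.164/16]
      routes:
        - to: default
          via: 10.0.0.1
      mtu: 9000
    eth1:
      addresses: [10.0.2.164/24]
      mtu: 9000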

The Orchestration Scripts

Three main scripts automate VM lifecycle operations:

  1. deletevm.sh — Remove an existing VM completely
  2. preparevm.sh — Create the customized QCOW image from base
  3. launch.sh — Define and start the VM using libvirt

Each is documented below.


5. Script: deletevm.sh

Purpose: Fully delete an existing VM, including domain definition, autostart entries, runtime XML, and stale QEMU processes.

#!/bin/bash
vmname=""
while getopts ":v:" opt; do
  case $opt in
    v) vmname="$OPTARG" ;;
    \?) echo "Invalid option: -$OPTARG" >&2; exit 1 ;;
    :) echo "Option -$OPTARG requires an argument." >&2; exit 1 ;;
  esac
done
if [ -z "$vmname" ]; then
    echo "VM name not provided."
    exit 1
fi

echo "Deleting server5:${vmname} (if existing)"

# Check if domain exists
if virsh dominfo "${vmname}" &>/dev/null; then
    echo "Stopping ${vmname}..."
    virsh destroy "${vmname}" &>/dev/null || virsh shutdown "${vmname}" &>/dev/null
    sleep 2
fi

# Make sure no QEMU process is left
PID=$(pgrep -f "qemu.*${vmname}") || true
if [ -n "$PID" ]; then
    echo "Force killing leftover QEMU process (PID: $PID)..."
    kill -9 $PID
fi

# Now undefine completely (including nvram, snapshots, etc.)
if virsh dominfo "${vmname}" &>/dev/null; then
    echo "Undefining domain ${vmname}..."
    virsh undefine "${vmname}" --nvram --remove-all-storage &>/dev/null || true
else
    # In case it's transient and not listed
    virsh undefine "${vmname}" --nvram &>/dev/null || true
fi

# Remove leftover autostart files
AUTOSTART_XML="/etc/libvirt/qemu/autostart/${vmname}.xml"
if [ -f "$AUTOSTART_XML" ]; then
    echo "Removing autostart link..."
    rm -f "$AUTOSTART_XML"
fi

# Remove stale runtime XML (transient domain)
RUNTIME_XML="/var/run/libvirt/qemu/${vmname}.xml"
if [ -f "$RUNTIME_XML" ]; then
    echo "Removing stale runtime XML..."
    rm -f "$RUNTIME_XML"
fi

# Optional: clean up device symlinks (if any)
if [ -b "/dev/${VMNAME}" ]; then
    echo "Releasing block device /dev/${VMNAME} ..."
    dmsetup remove "/dev/${VMNAME}" 2>/dev/null || true
    losetup -d "/dev/${VMNAME}" 2>/dev/null || true
fi
exit 0

Key behaviors:

  • Checks if VM exists via virsh dominfo
  • Stops the VM with virsh destroy, falling back to virsh shutdown
  • Kills leftover QEMU processes associated with the VM
  • Undefines domain (including NVRAM and storage)
  • Removes autostart and runtime XML
  • Cleans up stale device symlinks if necessary

This ensures that VM recreation always starts cleanly.
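
Typical standalone invocation (launch.sh calls it the same way):

./deletevm.sh -v registry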


6. Script: preparevm.sh

Note: Prerequisite: a block device (or partition) mapped to /dev/$host (i.e., the VM name) must exist.

Purpose:

  • Clone base QCOW2
  • Replace placeholders in templates
  • Inject updated templates and hostname
  • Embed timezone and firstboot behavior
  • Convert QCOW → RAW and write directly to the VM’s block device (/dev/{{diskname}})

#!/bin/bash
# Args: $1=host  $2=vcpu  $3=ram  $4=diskname  $5=IP last octet  [$6=vgpu flag]
host=$1
vcpu=$2
ram=$3
diskname=$4
ip1="10.0.1.$5"
ip2="10.0.2.$5"
ip3="10.0.3.$5"
ip4="10.0.4.$5"
ip5="10.0.5.$5"
ip6="10.0.6.$5"
flavor="ubuntu22"
withgpu=false

if [ -n "$6" ]; then
   withgpu=true
fi

domain="yourdomain.com"
dns=10.0.0.1
gateway=10.0.0.1
base=/root/kvm-local
templates=$base/$flavor
target=$base/$host

# Clone the base QCOW2 image into the instance folder
mkdir -p $target
rm -f $target/*
echo "Copying base image to instance folder..."
echo " "
rm -f $target/$host.qcow2
rsync -ah --progress $base/$flavor/base.qcow2 $target/$host.qcow2
echo " "

# Clone and update templates
echo "Copying $templates/resolved.conf to $target/resolved.conf"
cp $templates/resolved.conf $target/resolved.conf

# For now only ubuntu22
intfile=$target/50-cloud-init.yaml
echo "Copying $templates/50-cloud-init.yaml $intfile"
cp $templates/50-cloud-init.yaml $intfile

echo "copying $templates/hosts $target/hosts"
cp $templates/hosts $target/hosts

echo "Customizing vm...."
resolvefile=$target/resolved.conf
hostfile=$target/hosts

# Update resolved.conf
sed -i "s/{{dns}}/$dns/g" $resolvefile
sed -i "s/{{domain}}/$domain/g" $resolvefile

#Update netplan file

echo "Updating netplan file $intfile"
sed -i "s/{{ip1}}/$ip1/g" $intfile
sed -i "s/{{ip2}}/$ip2/g" $intfile
sed -i "s/{{ip3}}/$ip3/g" $intfile
sed -i "s/{{ip4}}/$ip4/g" $intfile
sed -i "s/{{ip5}}/$ip5/g" $intfile
sed -i "s/{{ip6}}/$ip6/g" $intfile

sed -i "s/{{gateway}}/$gateway/g" $intfile
sed -i "s/{{hostname}}/$host/g" $hostfile
sed -i "s/{{domain}}/$domain/g" $hostfile
sed -i "s/{{ip1}}/$ip1/g" $hostfile
echo "================================"
datestr=`date`

echo "$datestr : Baking customization to base file."

virt-customize -a $target/$host.qcow2 --copy-in $intfile:/etc/netplan/ --copy-in $resolvefile:/etc/systemd/ --copy-in $hostfile:/etc/ --hostname $host --timezone Asia/Kolkata --firstboot-command '/usr/local/bin/resizedisk'

datestr=`date`

echo "$datestr : Exporting base file to /dev/$diskname"
qemu-img convert -O raw $target/$host.qcow2 /dev/$diskname
sync

Key steps inside preparevm.sh

Generate all IPs

Based on the last octet:

10.0.1.x
10.0.2.x
10.0.3.x
10.0.4.x
10.0.5.x
10.0.6.x

Clone base.qcow2

rsync -ah --progress base.qcow2 $target/$host.qcow2

Copy & customize templates

All of these are updated with sed:

  • /etc/netplan/50-cloud-init.yaml
  • /etc/systemd/resolved.conf
  • /etc/hosts

Inject customized files into the VM image

Using virt-customize:

virt-customize -a $target/$host.qcow2 --copy-in ...

Also sets hostname, timezone, and firstboot resizing logic.
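
The /usr/local/bin/resizedisk firstboot script ships in the base image and is not reproduced in this post. A minimal sketch, assuming an ext4 root on partition 2 of /dev/vda (matching the df output shown later) and growpart from cloud-guest-utils:

#!/bin/bash
# resizedisk (sketch) -- on first boot, grow the root partition to fill
# the raw block device, then grow the ext4 filesystem to match.
growpart /dev/vda 2
resize2fs /dev/vda2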

Export to block device

Raw image is written directly to the VM disk:

qemu-img convert -O raw ... /dev/$diskname

This allows VMs to use fast, RAID-backed raw partitions instead of QCOW.
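
Standalone invocation (launch.sh normally supplies these positional arguments):

# host vcpu ram diskname last-octet [vgpu]
./preparevm.sh registry 4 8 registry 164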


7. Script: launch.sh

Purpose:
Coordinate deletion, customization, guest XML generation, and VM start.

#!/bin/bash
vcpu="4"
ram="8"
host="server5"
ip=""
vmname=""
vgpu="none"
flavor="ubuntu22"
diskname=""
while getopts ":c:v:r:g:" opt; do
    case $opt in
        v) vmname="$OPTARG" ;;
        c) vcpu="$OPTARG" ;;
        r) ram="$OPTARG" ;;
        g) vgpu="$OPTARG" ;;
        \?) echo "Invalid option: -$OPTARG" >&2; exit 1 ;;
        :) echo "Option -$OPTARG requires an argument." >&2; exit 1 ;;
    esac
done
if [ -z "$vmname" ]; then
    echo "VM name not provided."
    exit 1
fi

datestr=`date`
echo " "
echo "Start : $datestr"
echo " "
# Extract the last octet of the resolved address (e.g. 10.0.1.164 -> 164)
ip=`nslookup "$vmname" | grep "Address:" | grep -v "#53" | cut -d " " -f2 | cut -d "." -f4`
if [ -z "$ip" ]; then
    echo "IP address lookup failed for $vmname."
    exit 1
else
    echo " "
    echo "IP look up succeeded : 10.0.1.$ip"
    echo " "
fi
diskname=$vmname
if [ -e /dev/$vmname ]; then
    echo "/dev/$vmname exists. Proceeding..."
else
    echo "ERROR: /dev/$vmname does not exist!"
    exit 1
fi
remote_uri="qemu+ssh://root@$host/system"

echo "==========================================================================================="
echo "Server        : $host"
echo "VM Name       : $vmname"
echo "vCPUs         : $vcpu"
echo "Memory        : $ram GB"
echo "Disk          : /dev/$diskname"
echo "Management IP : 10.0.1.$ip"
echo "VGPU ID       : $vgpu"
echo "==========================================================================================="
if [ "$vgpu" == "none" ]; then
    withvgpu="false"
else
    withgpu="true"
fi
exit 0
datestr=`date`
echo "$datestr : Deleting $host:$vmname (if existing)"
./deletevm.sh -v $vmname
base=/root/kvm-local
domtemplates=/root/kvm-local/
datestr=`date`
echo " "
echo "$datestr : Customizing"
echo " "
./preparevm.sh $vmname $vcpu $ram $diskname $ip $withvgpu
datestr=`date`
echo " "
domfile="$base/$vmname/$vmname.xml"
if [ "$withvgpu" == "false" ]; then
    echo "Creating template (without VGPU) : Copying $domtemplates/node-novgpu.xml to $domfile"
    cp $domtemplates/node-novgpu.xml $domfile
else
    echo "Creating template (with VGPU) : Copying $domtemplates/node-gpu.xml to $domfile"
    cp $domtemplates/node-gpu.xml $domfile
fi
echo " "
sed -i "s/{{vmname}}/$vmname/g" $domfile
sed -i "s/{{ram}}/$ram/g" $domfile
sed -i "s/{{vcpus}}/$vcpu/g" $domfile
sed -i "s/{{diskname}}/$diskname/g" $domfile
sed -i "s/{{vgpuid}}/$vgpu/g" $domfile
datestr=`date`
echo " "
echo "$datestr : Launching"
echo " "
virsh -c "$remote_uri" define $domfile
virsh -c "$remote_uri" start $vmname
virsh -c "$remote_uri" autostart $vmname
datestr=`date`
echo " "
echo "$datestr : Deleting temporary custome image : $datestr"
echo " "
echo "Deleting generated custom image..."
rm -f $base/$vmname/$vmname.qcow2
datestr=`date`
echo " "
echo "$datestr : Completed"
echo " "

Key responsibilities

  • Accepts VM name, vCPU, RAM, and vGPU options (-v, -c, -r, -g); the management IP is resolved via DNS and the disk name defaults to the VM name
  • Calls deletevm.sh
  • Calls preparevm.sh
  • Selects appropriate libvirt template:
    • node-gpu.xml if vGPU ID provided
    • node-novgpu.xml otherwise
  • Updates placeholders in XML:
    • {{vmname}}
    • {{vcpus}}
    • {{ram}}
    • {{diskname}}
    • {{vgpuid}}

Launch sequence

virsh -c "$remote_uri" define $vmname.xml
virsh -c "$remote_uri" start $vmname
virsh -c "$remote_uri" autostart $vmname

Finally, removes the temporary QCOW2 image used during customization.


8. Libvirt Templates

Two templates are used: one for GPU-enabled VMs and one without.

node-gpu.xml

  • Adds <hostdev> block with the vGPU UUID
  • Six bridged interfaces (br1–br6)
  • Raw block device attached as /dev/{{diskname}}

The XML supports mdev/vfio-based vGPU passthrough.
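
The attachment follows libvirt's standard mdev hostdev form; a representative block (a sketch; the exact node-gpu.xml is not reproduced here) is:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
  <source>
    <address uuid='{{vgpuid}}'/>
  </source>
</hostdev>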

node-novgpu.xml

Same as above, but without the <hostdev> section.

<domain type='kvm'>
  <name>{{vmname}}</name>
  <vcpu placement='static'>{{vcpus}}</vcpu>
  <memory unit='GiB'>{{ram}}</memory>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'/>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>

    <emulator>/usr/bin/qemu-system-x86_64</emulator>

    <controller type='usb' model='qemu-xhci'/>
    <controller type='pci' model='pcie-root'/>
    <controller type='sata' index='0'/>
    <controller type='virtio-serial' index='0'/>


    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/{{diskname}}'/>
      <target dev='vda' bus='virtio'/>
    </disk>

    <interface type='bridge'>
      <source bridge='br1'/>
      <model type='virtio'/>
      <driver name='vhost' queues='1'/>
    </interface>

    <interface type='bridge'>
      <source bridge='br2'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
    </interface>

    <interface type='bridge'>
      <source bridge='br3'/>
      <model type='virtio'/>
      <driver name='vhost' queues='{{vcpus}}'/>
    </interface>

    <interface type='bridge'>
      <source bridge='br4'/>
      <model type='virtio'/>
      <driver name='vhost' queues='{{vcpus}}'/>
    </interface>

    <interface type='bridge'>
      <source bridge='br5'/>
      <model type='virtio'/>
      <driver name='vhost' queues='{{vcpus}}'/>
    </interface>

    <interface type='bridge'>
      <source bridge='br6'/>
      <model type='virtio'/>
      <driver name='vhost' queues='{{vcpus}}'/>
    </interface>

    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='keyboard' bus='ps2'/>

    <graphics type='spice' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='vga' vram='16384' heads='1' primary='yes'/>
    </video>

    <memballoon model='virtio'/>
  </devices>
</domain>

Both templates use:

  • machine='pc-q35-6.2'
  • cpu mode='host-passthrough'
  • virtio drivers
  • SPICE graphics access for console
  • Memory ballooning
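
As an optional sanity check (not part of launch.sh), the rendered XML can be validated against the libvirt schema before defining it:

virt-xml-validate /root/kvm-local/registry/registry.xml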

9. End-to-End Flow

  1. launch.sh invoked
  2. deletevm.sh ensures a clean state
  3. preparevm.sh clones base image & customizes it
  4. libvirt XML template generated
  5. VM is defined, started, and set to autostart
  6. Temporary QCOW2 deleted
  7. VM ready with deterministic networking and full configuration
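
Because launch.sh first tears down any existing VM of the same name, provisioning a batch reduces to a loop. A sketch, assuming a hypothetical vms.txt with one "name vcpus ram" triple per line:

# Hypothetical batch driver around launch.sh.
while read -r name vcpus ram; do
    ./launch.sh -v "$name" -c "$vcpus" -r "$ram"
done < vms.txt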

Usage Example: Creating a New VM (“registry”)

This example demonstrates how the orchestration scripts work together to provision a new VM called registry, allocate it a block device via udev (/dev/registry), configure networking, and launch it with libvirt.


1. Confirm Existing Running VMs

root@server5:~/kvm-local# virsh list
 Id   Name   State
----------------------
 2    blog   running
 3    mdb    running

2. DNS Lookup for the New Host

root@server5:~/kvm-local# nslookup registry
Server:         10.0.0.1
Address:        10.0.0.1#53

Name:   registry.yourdomain.com
Address: 10.0.1.164

3. Confirm the Backing Block Device

root@server5:~/kvm-local# ls -ltr /dev/registry
lrwxrwxrwx 1 root root 4 Nov 23 20:19 /dev/registry -> sdb6

The /dev/registry symlink is created by udev rules and always points to the correct RAID-backed partition.
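
An illustrative rule producing such a symlink (the actual rules were written when the RAID-backed partitions were carved out, and are not shown in this post):

# /etc/udev/rules.d/99-vm-disks.rules (hypothetical example)
KERNEL=="sdb6", SYMLINK+="registry"

Matching on a stable identifier such as ENV{ID_PART_ENTRY_UUID} instead of the kernel name would survive device reordering across reboots.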


4. Launch the VM via orchestrator

root@server5:~/kvm-local# ./launch.sh -v registry -c 4 -r 8

  • VM Name: registry
  • vCPUs: 4
  • RAM: 8 GB
  • Management IP: 10.0.1.164 (resolved via DNS)
  • Disk: /dev/registry (derived from the VM name)
  • VGPU: none (default)

5. Sample Output

root@server5:~/kvm-local# ./launch.sh -v registry -c 4 -r 8

Start : Mon Nov 24 08:09:41 AM IST 2025


IP look up succeeded : 10.0.1.164

/dev/registry exists. Proceeding...
===========================================================================================
Server        : server5
VM Name       : registry
vCPUs         : 4
Memory        : 8 GB
Disk          : /dev/registry
Management IP : 10.0.1.164
VGPU ID       : none
===========================================================================================
Mon Nov 24 08:09:41 AM IST 2025 : Deleting server5:registry (if existing)
Deleting server5:registry (if existing)

Mon Nov 24 08:09:42 AM IST 2025 : Customizing

Copying base image to instance folder...

sending incremental file list
base.qcow2
          1.35G 100%  328.75MB/s    0:00:03 (xfr#1, to-chk=0/1)

Copying /root/kvm-local/ubuntu22/resolved.conf to /root/kvm-local/registry/resolved.conf
Copying /root/kvm-local/ubuntu22/50-cloud-init.yaml /root/kvm-local/registry/50-cloud-init.yaml
copying /root/kvm-local/ubuntu22/hosts /root/kvm-local/registry/hosts
Resizing /root/kvm-local/registry/registry.qcow2 to ...
Customizing vm....
Updating netplan file /root/kvm-local/registry/50-cloud-init.yaml
================================
Mon Nov 24 08:09:46 AM IST 2025 : Baking customization to base file.
[   0.0] Examining the guest ...
[   4.7] Setting a random seed
[   4.8] Setting the machine ID in /etc/machine-id
[   4.8] Copying: /root/kvm-local/registry/50-cloud-init.yaml to /etc/netplan/
[   4.8] Copying: /root/kvm-local/registry/resolved.conf to /etc/systemd/
[   4.8] Copying: /root/kvm-local/registry/hosts to /etc/
[   4.8] Setting the hostname: registry
[   6.5] Setting the timezone: Asia/Kolkata
[   6.5] Installing firstboot command: /usr/local/bin/resizedisk
[   6.7] Finishing off
Mon Nov 24 08:09:53 AM IST 2025 : Exporting base file to /dev/registry

Creating template (without VGPU) : Copying /root/kvm-local//node-novgpu.xml to /root/kvm-local/registry/registry.xml


Mon Nov 24 08:10:14 AM IST 2025 : Launching

Domain 'registry' defined from /root/kvm-local/registry/registry.xml

Domain 'registry' started

Domain 'registry' marked as autostarted


Mon Nov 24 08:10:19 AM IST 2025 : Deleting temporary custom image

Deleting generated custom image...

Mon Nov 24 08:10:19 AM IST 2025 : Completed

6. Verify the VM is Running

root@server5:~/kvm-local# virsh list
 Id   Name       State
--------------------------
 2    blog       running
 3    mdb        running
 5    registry   running

7. SSH into the Newly Created VM

root@server5:~/kvm-local# ssh registry

Verify swap, CPU, memory, storage:

Memory

root@registry:~# free
               total        used        free      shared  buff/cache   available
Mem:         8132220      165212     7759452        1156      207556     7723692
Swap:              0           0           0

CPU

root@registry:~# lscpu
Architecture:             x86_64
CPU(s):                   4
Virtualization:           KVM

Disk

root@registry:~# df -m
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/vda2         705419  3219    673410   1% /

Everything is configured exactly as intended:

  • vCPU, RAM, hostname, IPs
  • Netplan configuration
  • systemd-resolved
  • udev-backed raw block device
  • zero swap
  • correct cloud template applied

Conclusion

This example demonstrates the full lifecycle of provisioning a VM using the orchestration framework:

  1. Resolve DNS and udev-backed disk
  2. Delete old VM
  3. Prepare new image via virt-customize
  4. Generate libvirt XML
  5. Define and start the VM
  6. Autostart enablement
  7. Clean-up of temporary data
  8. Confirm operation via SSH

The process is deterministic and repeatable, and it serves well for day-to-day VM provisioning until the DCM application takes over.