Custom Cloud Image for AI Workloads

This document describes the process for creating a reusable, deterministic Ubuntu 24.04 QCOW2 base image optimized for GPU-accelerated AI workloads. The resulting image is intended to be cloned and customized by automation/orchestration tooling.


1. Base OS Installation

  • Operating System: Ubuntu Server 24.04 LTS
  • Kernel: GA kernel (6.8.0) — avoid HWE
  • Disk Size: 50 GB (future-proof; base image only)

Why avoid the HWE kernel in the base image

  • GA kernel (6.8.0) provides maximum stability and predictable behavior.
  • Reduces DKMS rebuild failures during NVIDIA driver installation.
  • Avoids kernel churn across clones.

2. System Update

apt update -y
apt upgrade -y

3. SSH Configuration (Remote Root Access)

Update SSH daemon configuration

Edit /etc/ssh/sshd_config and ensure the following are enabled:

PermitRootLogin yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2
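
For automated builds, the three directives above can be applied without opening an editor. The following is a minimal sketch, not the method used to build the original image; `set_sshd_option` is a hypothetical helper name, and the sketch assumes GNU sed and that option values contain no `|` or `&` characters.

```shell
#!/bin/bash
# Hypothetical helper: set "KEY VALUE" in an sshd_config-style file.
# Rewrites any existing line for KEY (commented out or not); appends otherwise.
set_sshd_option() {
    local file="$1" key="$2" value="$3"
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        # Replace every line that starts with KEY, ignoring a leading "#"
        sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        # KEY is not mentioned at all, so add it at the end of the file
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}
```

A typical call would be `set_sshd_option /etc/ssh/sshd_config PermitRootLogin yes`, followed by `sshd -t` to validate the file before restarting the SSH service.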

Update SSH client configuration

Edit /etc/ssh/ssh_config:

StrictHostKeyChecking no

Set root password

passwd

Generate SSH key (optional, for automation)

ssh-keygen

4. Clean Login Noise (MOTD)

Disable MOTD messages

Edit /etc/pam.d/sshd and comment out:

#session optional pam_motd.so motd=/run/motd.dynamic
#session optional pam_motd.so noupdate
#session optional pam_mail.so standard noenv

Disable MOTD news

Edit /etc/default/motd-news:

ENABLED=0

5. Remove Snap and snapd (and plymouth)

snap list
snap remove lxd
snap remove core20
snap remove snapd
apt purge -y snapd
rm -rf /root/snap/
apt remove --purge plymouth

6. Disable Swap (AI-friendly, deterministic memory behavior)

systemctl list-units | grep swap
systemctl stop swap.img.swap swap.target
systemctl disable swap.img.swap swap.target
systemctl mask swap.img.swap swap.target
swapoff -a
rm -f /swap.img

Remove swap entries from /etc/fstab.
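
The fstab edit can also be scripted. The sketch below comments out swap entries rather than deleting them, which keeps the change easy to review; `disable_swap_entries` is a hypothetical helper name, and the sketch assumes GNU sed and the standard fstab layout in which the third field is the filesystem type.

```shell
#!/bin/bash
# Hypothetical helper: comment out every active swap entry in an fstab-style file.
# A line counts as a swap entry when its third field (filesystem type) is "swap".
# Lines that are already commented out are left untouched.
disable_swap_entries() {
    local file="$1"
    sed -i -E '/^[[:space:]]*[^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap([[:space:]]|$)/ s/^/#/' "$file"
}
```

Calling `disable_swap_entries /etc/fstab` is safe to repeat, because a line that already starts with `#` no longer matches.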


8. Cloud-Init Networking Cleanup

rm -f /etc/cloud/cloud.cfg.d/90-installer-network.cfg

Disable cloud-init networking entirely:

echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg

9. Limit Systemd Journal Size

sed -i 's/#SystemMaxFileSize=.*/SystemMaxFileSize=512M/' /etc/systemd/journald.conf
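
The sed command above only changes the file if the commented-out SystemMaxFileSize line is still present. As an alternative sketch, the same limit can be placed in a journald drop-in file, which does not depend on the contents of journald.conf. The helper name and the file name `size-limit.conf` are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper: write a journald drop-in that caps the journal file size.
# The directory argument defaults to the standard drop-in location.
write_journald_dropin() {
    local dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$dir"
    cat > "$dir/size-limit.conf" <<'EOF'
[Journal]
SystemMaxFileSize=512M
EOF
}
```

After running `write_journald_dropin` as root, `systemctl restart systemd-journald` applies the new limit.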

10. systemd-resolved Configuration

Edit /etc/systemd/resolved.conf:

DNS=<Your DNS Server IP>
FallbackDNS=8.8.8.8
Domains=<Your domain>
DNSStubListener=no

Apply:

ln -fs /run/systemd/resolve/resolv.conf /etc/resolv.conf
systemctl restart systemd-resolved

11. Install Base Utilities

apt install -y \
  net-tools rsyslog bc fio iperf3 gnupg2 \
  software-properties-common lvm2 nfs-common jq

12. Disable Unwanted Timers

systemctl list-units | grep timer

Disable:

systemctl stop apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
               motd-news.timer update-notifier-download.timer update-notifier-motd.timer

systemctl disable apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
                  motd-news.timer update-notifier-download.timer update-notifier-motd.timer

systemctl mask apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
             motd-news.timer update-notifier-download.timer update-notifier-motd.timer
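
The stop, disable, and mask commands above repeat the same unit list three times. A loop keeps the list in one place. This is a sketch only: `retire_units` and the `DRY_RUN` variable are hypothetical names, and the dry-run mode exists so the command sequence can be reviewed before anything is changed.

```shell
#!/bin/bash
# Hypothetical helper: stop, disable, and mask each systemd unit passed as an argument.
# With DRY_RUN=1 the systemctl commands are printed instead of executed.
retire_units() {
    local unit
    for unit in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl disable --now $unit"
            echo "systemctl mask $unit"
        else
            # --now stops the unit in the same step; "|| true" tolerates
            # units that are not present on a particular install.
            systemctl disable --now "$unit" || true
            systemctl mask "$unit" || true
        fi
    done
}

# Same timer list as in the commands above
TIMERS="apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer motd-news.timer update-notifier-download.timer update-notifier-motd.timer"
```

Running `DRY_RUN=1; retire_units $TIMERS` prints twelve commands for review; unsetting `DRY_RUN` and running the same call as root applies them. `$TIMERS` is left unquoted on purpose so that it splits into one argument per unit.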

13. Disable Unattended Services

systemctl stop unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
systemctl disable unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
systemctl mask unattended-upgrades.service apparmor ufw ubuntu-advantage-tools

14. GRUB updates

Edit /etc/default/grub and update the following lines

GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0 biosdevname=0 cpufreq.default_governor=performance"
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"

net.ifnames=0 & biosdevname=0: Disables predictable naming to force the classic eth0 convention. This ensures that manually injected Netplan configurations “hit” the correct interface without needing to probe for hardware-specific names (like enp0s3) during orchestration.

cpufreq.default_governor=performance: Eliminates CPU frequency scaling latency. The VM operates at maximum clock speed immediately upon boot, which is critical for consistent AI workload performance.

GRUB_CMDLINE_LINUX="": Kept empty to ensure the kernel parameters remain modular and easily overridable via the default string.

GRUB_TERMINAL="console serial": Dual-routes the bootloader output to both virtual VGA and the serial port. This allows orchestration logs to be captured via virsh console even if the VM is headless.

GRUB_SERIAL_COMMAND: Standardizes the serial interface at 115200 baud, ensuring that host-side monitoring scripts can reliably parse boot and kernel messages from the start.
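
Once GRUB has been regenerated and the VM rebooted, it is worth confirming that the kernel actually received these parameters. The sketch below checks a command-line file for each expected parameter; `check_cmdline` and the `CMDLINE_FILE` override are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report which expected kernel parameters are missing.
# Reads /proc/cmdline of the running kernel unless CMDLINE_FILE points elsewhere.
# Returns 0 when every parameter is present, 1 otherwise.
check_cmdline() {
    local file="${CMDLINE_FILE:-/proc/cmdline}" param missing=0
    for param in "$@"; do
        # -w: whole-word match, -F: treat the parameter as a literal string
        if ! grep -qwF -- "$param" "$file"; then
            echo "missing kernel parameter: $param"
            missing=1
        fi
    done
    return "$missing"
}
```

For this image the call would be `check_cmdline net.ifnames=0 biosdevname=0 cpufreq.default_governor=performance`. No output and a zero exit status mean all three parameters are active.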

Disable Nouveau (Blacklist)

Even if the driver isn’t fully active, Nouveau can sometimes “touch” the hardware during boot, which interferes with the NVIDIA driver installation or VFIO binding later.

Create a blacklist file: sudo nano /etc/modprobe.d/blacklist-nouveau.conf

Add these lines:

blacklist nouveau
options nouveau modeset=0

Load VFIO Modules at Boot

To ensure the guest can properly handle the passed-through hardware, load the VFIO modules into the kernel early.

Open the modules file: sudo nano /etc/modules

Append these lines:

vfio
vfio_iommu_type1
vfio_pci
# vfio_virqfd is not listed: since kernel 6.2 it is part of the vfio module
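
When the image build is scripted, appending to /etc/modules with a plain `>>` adds duplicate lines on every run. The following sketch appends a module name only if it is not already listed; `ensure_module_line` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: append a module name to a modules file unless it is already there.
ensure_module_line() {
    local file="$1" module="$2"
    # -x: match the whole line, -F: treat the module name as a literal string
    grep -qxF -- "$module" "$file" 2>/dev/null || echo "$module" >> "$file"
}
```

A call such as `ensure_module_line /etc/modules vfio_pci` can be repeated for each module name without changing the file a second time.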

Final Image Sync

Since you’ve modified modules and blacklists, you must rebuild the initramfs and update GRUB to ensure these settings are baked into the early boot process:

sudo update-initramfs -u
sudo update-grub

reboot


15. Kernel Headers and Toolchain

apt install -y linux-headers-$(uname -r) build-essential dkms gcc make libboost-program-options-dev cmake ninja-build

16. GPU Drivers

Driver selection

  • Use the Ubuntu-recommended NVIDIA driver (via nvidia-detector)
  • Driver version 580 chosen for:
    • NVIDIA L4 compatibility
    • Stability on kernel 6.8
    • Forward CUDA compatibility

Install driver and utilities

apt install -y nvidia-driver-580-server nvidia-utils-580-server

CUDA runtime is intentionally not installed system-wide.

17. Python Runtime, IO utility

Based on trial-and-error with dependencies (specifically flash-attn/vllm), CUDA Toolkit 12.8 was identified as an ideal candidate.

# 1. Download the Keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb

# 2. Install it
dpkg -i cuda-keyring_1.1-1_all.deb

# 3. Update apt cache
apt update -y
apt upgrade -y

apt install -y cuda-toolkit-12-8

Add the following to /etc/bash.bashrc, then log out and log back in once so that the change takes effect:

export CUDA_HOME=/usr/local/cuda-12.8

# Update PATH: Put CUDA first to override system defaults
export PATH=${CUDA_HOME}/bin:${PATH}
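
Because /etc/bash.bashrc is read by every interactive shell, the plain export above adds the CUDA directory to PATH again in each nested shell. A guarded variant avoids that. This is an optional sketch, and `path_prepend` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: put a directory at the front of PATH unless it is already in PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;      # not present, put it first
    esac
}
```

In /etc/bash.bashrc the export line would then become `path_prepend "${CUDA_HOME}/bin"` followed by `export PATH`.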

apt install -y python3 python3-pip poppler-utils libaio-dev

18. Add Environment Variables, Create Virtual Env

Create directories used by Hugging Face

mkdir -p /var/lib/huggingface/{hub,transformers}

chmod -R 755 /var/lib/huggingface

Add the following to /etc/environment

HF_HOME=/var/lib/huggingface
TRANSFORMERS_CACHE=/var/lib/huggingface/transformers
HF_HUB_CACHE=/var/lib/huggingface/hub
TOKENIZERS_PARALLELISM=false

Create venv

# 1. Install the venv tool
apt install -y python3-venv

# 2. Create the environment
python3 -m venv /opt/ai-env

# Add activation to the end of the root bash profile
echo 'source /opt/ai-env/bin/activate' >> ~/.bashrc

# Reload the profile for the current session
source ~/.bashrc

19. Install llama server

cd /opt
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build-cuda -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="89" -DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_SERVER=ON
cmake --build build-cuda --config Release -j $(nproc)
cmake --install build-cuda
sudo ldconfig
cd ~
rm -rf /opt/llama.cpp

20. Disk Resize on First Boot

Create /usr/local/bin/resizedisk:

#!/bin/bash
growpart /dev/vda 2
partx --update /dev/vda2
resize2fs /dev/vda2
systemctl stop guestfs-firstboot.service
systemctl disable guestfs-firstboot.service
rm -f /root/*.log

Set permissions:

chmod +x /usr/local/bin/resizedisk
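
The script above hardcodes /dev/vda and partition 2, which matches a virtio disk with the root filesystem on the second partition. If clones may boot from a differently named disk, the disk and partition number can be derived from the root device instead. The sketch below covers only the string handling; `split_partition` is a hypothetical helper name, and the approach assumes the root filesystem sits directly on a partition (not on LVM).

```shell
#!/bin/bash
# Hypothetical helper: split a partition device path into "<disk> <partition number>".
split_partition() {
    local part="$1" disk num
    # Trailing digits are the partition number
    num=$(printf '%s' "$part" | sed -E 's/.*[^0-9]([0-9]+)$/\1/')
    disk=${part%"$num"}
    # NVMe and MMC devices put a "p" between the disk name and the partition number
    case "$disk" in
        *[0-9]p) disk=${disk%p} ;;
    esac
    printf '%s %s\n' "$disk" "$num"
}
```

Inside resizedisk this could be used as `root_part=$(findmnt -no SOURCE /)`, then `set -- $(split_partition "$root_part")`, then `growpart "$1" "$2"` and `resize2fs "$root_part"`.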

21. GPU Temperature Monitoring Script

The script writes the current GPU temperature to /opt/nvidia/gputemp.txt on the host. A script on the host reads this value periodically and adjusts the fan speeds to keep the temperature within limits. Note that passwordless SSH to the host must be enabled before the crontab entry that runs this script is added.

Create /opt/nvidia/updatetemp.sh

#!/bin/bash

# Define variables
REMOTE_HOST="serverxxxx"
REMOTE_FILE="/opt/nvidia/gputemp.txt"

# Get the temperature
# nounits removes the 'C' so you just get the number (easier for parsing later)
TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | head -n1)

# Send to remote server
# The quotes around 'cat > ...' ensure the redirection happens on the REMOTE server, not locally.
# '>' overwrites the remote file, so it always holds only the latest reading.
echo "$TEMP" | ssh "$REMOTE_HOST" "cat > $REMOTE_FILE"
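
If nvidia-smi fails or the GPU is not visible, TEMP can be empty or contain text such as an error message, and the host-side fan script would then read a value that is not a number. A small guard before the ssh line avoids sending such values. This is a sketch; `is_valid_temp` is a hypothetical helper name, and the three-digit limit is an assumption about plausible readings in degrees Celsius.

```shell
#!/bin/bash
# Hypothetical guard: accept only a plain integer of one to three digits.
is_valid_temp() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "${#1}" -le 3 ]
}
```

In updatetemp.sh the guard would sit just before the ssh line, for example `is_valid_temp "$TEMP" || exit 1`, so that the remote file keeps its last good reading when a query fails.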

22. Filesystem Optimization and Compaction

e4defrag /
fstrim -av
dd if=/dev/zero of=/zero.fill bs=1M status=progress
rm -f /zero.fill
fstrim -av

23. Final Cleanup and Shutdown

Reset the machine ID so that every clone generates its own. This is essential for DHCP, because many servers use this ID rather than the MAC address to assign IPs.

truncate -s 0 /etc/machine-id

history -c
shutdown -h now

24. Export Base QCOW2 Image

Compress and finalize the base image:

virt-sparsify --compress \
  /var/lib/libvirt/images/ubuntu24g.qcow2 \
  /root/kvm-local/ubuntu24/baseg.qcow2