Custom Cloud Image for AI Workloads

This document describes the process for creating a reusable, deterministic Ubuntu 24.04 QCOW2 base image optimized for GPU-accelerated AI workloads. The resulting image is intended to be cloned and customized by automation/orchestration tooling.


1. Base OS Installation

  • Operating System: Ubuntu Server 24.04 LTS
  • Kernel: GA kernel (6.8.0) — avoid HWE
  • Disk Size: 50 GB (future-proof; base image only)

Why avoid the HWE kernel in the base image

  • GA kernel (6.8.0) provides maximum stability and predictable behavior.
  • Reduces DKMS rebuild failures during NVIDIA driver installation.
  • Avoids kernel churn across clones.
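A quick sanity check that the image really is on the GA kernel series (package names below assume a stock Ubuntu Server 24.04 install):

uname -r                                # expect a 6.8.0-xx-generic kernel
dpkg -l 'linux-generic*' | grep '^ii'   # only the GA meta package should appear, no -hwe variants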

2. System Update

apt update -y
apt upgrade -y

3. SSH Configuration (Remote Root Access)

Update SSH daemon configuration

Edit /etc/ssh/sshd_config and ensure the following are enabled:

PermitRootLogin yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2
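If the image build is scripted, a sed-based equivalent (a sketch that assumes the stock Ubuntu sshd_config, where these directives exist but are commented out) is:

sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
sed -i 's/^#\?PubkeyAuthentication.*/PubkeyAuthentication yes/' /etc/ssh/sshd_config
systemctl restart ssh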

Update SSH client configuration

Edit /etc/ssh/ssh_config:

StrictHostKeyChecking no

Set root password

passwd

Generate SSH key (optional, for automation)

ssh-keygen
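For unattended builds, the key can be generated without prompts, for example:

ssh-keygen -t ed25519 -N "" -f /root/.ssh/id_ed25519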

4. Clean Login Noise (MOTD)

Disable MOTD messages

Edit /etc/pam.d/sshd and comment out the following lines:

#session optional pam_motd.so motd=/run/motd.dynamic
#session optional pam_motd.so noupdate
#session optional pam_mail.so standard noenv
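A scripted equivalent (a sketch, assuming the stock pam_motd/pam_mail lines shown above) is:

sed -i -E 's/^(session\s+optional\s+pam_motd\.so.*)/#\1/' /etc/pam.d/sshd
sed -i -E 's/^(session\s+optional\s+pam_mail\.so.*)/#\1/' /etc/pam.d/sshd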

Disable MOTD news

Edit /etc/default/motd-news:

ENABLED=0

5. Remove Snap and snapd

snap list
snap remove lxd
snap remove core20
snap remove snapd
apt purge -y snapd
rm -rf /root/snap/

6. Disable Swap (AI-friendly, deterministic memory behavior)

systemctl list-units | grep swap
systemctl stop swap.img.swap swap.target
systemctl disable swap.img.swap swap.target
systemctl mask swap.img.swap swap.target
swapoff -a
rm -f /swap.img

Remove swap entries from /etc/fstab.
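One way to do that non-interactively (a sketch; review /etc/fstab afterwards):

cp /etc/fstab /etc/fstab.bak
sed -i '/\sswap\s/d' /etc/fstab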


8. Cloud-Init Networking Cleanup

rm -f /etc/cloud/cloud.cfg.d/90-installer-network.cfg

Disable cloud-init networking entirely:

echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg

9. Limit Systemd Journal Size

sed -i 's/#SystemMaxFileSize=.*/SystemMaxFileSize=512M/' /etc/systemd/journald.conf
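The new limit takes effect after restarting the journal daemon; current usage can be confirmed afterwards:

systemctl restart systemd-journald
journalctl --disk-usage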

10. systemd-resolved Configuration

Edit /etc/systemd/resolved.conf:

DNS=<Your DNS Server IP>
FallbackDNS=8.8.8.8
Domains=<Your domain>
DNSStubListener=no

Apply:

ln -fs /run/systemd/resolve/resolv.conf /etc/resolv.conf
systemctl restart systemd-resolved
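To confirm the VM is using the intended resolver and search domain:

resolvectl status
cat /etc/resolv.conf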

11. Install Base Utilities

apt install -y \
  net-tools rsyslog bc fio iperf3 gnupg2 \
  software-properties-common lvm2 nfs-common jq

12. Disable Unwanted Timers

systemctl list-units | grep timer

Disable:

systemctl stop apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
               motd-news.timer update-notifier-download.timer update-notifier-motd.timer

systemctl disable apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
                  motd-news.timer update-notifier-download.timer update-notifier-motd.timer

systemctl mask apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
             motd-news.timer update-notifier-download.timer update-notifier-motd.timer

13. Disable Unattended Services

systemctl stop unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
systemctl disable unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
systemctl mask unattended-upgrades.service apparmor ufw ubuntu-advantage-tools

14. GRUB Updates

Edit /etc/default/grub and update the following lines:

GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0 biosdevname=0 cpufreq.default_governor=performance"
GRUB_CMDLINE_LINUX=""
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"

net.ifnames=0 & biosdevname=0: Disables predictable naming to force the classic eth0 convention. This ensures that manually injected Netplan configurations “hit” the correct interface without needing to probe for hardware-specific names (like enp0s3) during orchestration.

cpufreq.default_governor=performance: Eliminates CPU frequency scaling latency. The VM operates at maximum clock speed immediately upon boot, which is critical for consistent AI workload performance.

GRUB_CMDLINE_LINUX="": Kept empty to ensure the kernel parameters remain modular and easily overridable via the default string.

GRUB_TERMINAL="console serial": Dual-routes the bootloader output to both virtual VGA and the serial port. This allows orchestration logs to be captured via virsh console even if the VM is headless.

GRUB_SERIAL_COMMAND: Standardizes the serial interface at 115200 baud, ensuring that host-side monitoring scripts can reliably parse boot and kernel messages from the start.

Disable Nouveau (Blacklist)

Even if the driver isn’t fully active, Nouveau can sometimes “touch” the hardware during boot, which interferes with the NVIDIA driver installation or VFIO binding later.

Create a blacklist file: sudo nano /etc/modprobe.d/blacklist-nouveau.conf

Add these lines:

blacklist nouveau
options nouveau modeset=0
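For scripted builds, the same file can be written in one step:

cat <<'EOF' > /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF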

Load VFIO Modules at Boot

For the guest to properly handle the passed-through hardware, ensure the VFIO modules are loaded into the kernel early.

Open the modules file: sudo nano /etc/modules

Append these lines:

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
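For scripted builds, the same entries can be appended in one step. Note that on recent kernels (including 6.8) the virqfd support is built into the core vfio module, so a "module not found" message for vfio_virqfd at boot is generally harmless.

cat <<'EOF' >> /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
EOF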

Final Image Sync

Since you’ve modified modules and blacklists, you must rebuild the initramfs and update GRUB to ensure these settings are baked into the early boot process:

sudo update-initramfs -u
sudo update-grub

reboot


15. Kernel Headers and Toolchain

apt install -y linux-headers-$(uname -r) build-essential dkms gcc make

16. GPU Drivers

Driver selection

  • Use the Ubuntu-recommended NVIDIA driver (via nvidia-detector)
  • Driver version 580 chosen for:
    • NVIDIA L4 compatibility
    • Stability on kernel 6.8
    • Forward CUDA compatibility

Install driver and utilities

apt install -y nvidia-driver-580-server nvidia-utils-580-server
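After a reboot (so the new kernel modules load), a quick check confirms the driver sees the GPU, assuming a GPU is actually attached to the build VM:

nvidia-smi    # should list the GPU (e.g. an L4) with a 580.x driver version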

The CUDA runtime is intentionally not installed system-wide at this step; the CUDA toolkit required for source builds (e.g. flash-attn) is added later in the Python runtime step.

17. CUDA Toolkit, Python Runtime, and I/O Utilities

Based on trial and error with dependencies (specifically flash-attn / vllm), CUDA Toolkit 12.8 was identified as the ideal candidate.

# 1. Download the Keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb

# 2. Install it
dpkg -i cuda-keyring_1.1-1_all.deb

# 3. Update apt cache
apt update -y
apt upgrade -y

apt install -y cuda-toolkit-12-8

Add the following to /etc/bash.bashrc (log out and log back in once after adding these lines):

export CUDA_HOME=/usr/local/cuda-12.8

# Update PATH: Put CUDA first to override system defaults
export PATH=${CUDA_HOME}/bin:${PATH}
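After logging back in, confirm the toolkit resolves ahead of any system defaults:

nvcc --version     # should report CUDA release 12.8
echo $CUDA_HOME    # /usr/local/cuda-12.8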

apt install -y python3 python3-pip poppler-utils libaio-dev

18. Add Environment Variables and Create a Virtual Environment

Create directories used by Hugging Face

mkdir -p /var/lib/huggingface/{hub,transformers}

chmod -R 755 /var/lib/huggingface

Add the following to /etc/environment

HF_HOME=/var/lib/huggingface
TRANSFORMERS_CACHE=/var/lib/huggingface/transformers
HF_HUB_CACHE=/var/lib/huggingface/hub
TOKENIZERS_PARALLELISM=false
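For scripted builds, the same entries can be appended in one step:

cat <<'EOF' >> /etc/environment
HF_HOME=/var/lib/huggingface
TRANSFORMERS_CACHE=/var/lib/huggingface/transformers
HF_HUB_CACHE=/var/lib/huggingface/hub
TOKENIZERS_PARALLELISM=false
EOF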

Create venv

# 1. Install the venv tool
apt install -y python3-venv

# 2. Create the environment
python3 -m venv /opt/ai-env

# Add activation to the end of the root bash profile
echo 'source /opt/ai-env/bin/activate' >> ~/.bashrc

# Reload the profile for the current session
source ~/.bashrc

19. Core Deep Learning Framework

PyTorch (CUDA 12.8 runtime via wheels)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

  • PyTorch wheels ship the CUDA runtime internally.
  • Avoids system-level CUDA toolkit dependency.
  • Decouples framework upgrades from the OS image.
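A quick check that the wheel-bundled CUDA runtime is functional (requires a GPU attached to the VM):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"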

20. Scientific Python Stack

pip install numpy scipy pandas

21. Hugging Face Ecosystem

pip install transformers tokenizers safetensors peft vllm datasets

22. GPU-Aware Utilities

pip install packaging ninja wheel
pip install nvidia-cufile-cu12 accelerate bitsandbytes torch-c-dlpack-ext deepspeed

# Add to the end of /etc/bash.bashrc (for interactive shells)

# 1. Define the library path for the AI Virtual Environment
export CUFILE_LIB="/opt/ai-env/lib/python3.12/site-packages/nvidia/cufile/lib"

# 2. Add it to LD_LIBRARY_PATH so vLLM finds the GDS drivers
export LD_LIBRARY_PATH="${CUFILE_LIB}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

Compile Flash Attention. This build is very slow; MAX_JOBS controls build concurrency. For this image the build VM was allocated 24 cores and 128 GB of RAM so MAX_JOBS could be raised to around 10; on smaller machines keep it low (the example below uses 1) to avoid exhausting memory.

export MAX_JOBS=1
pip install flash-attn
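A minimal import check confirms the extension built against the installed torch/CUDA stack:

python -c "import flash_attn; print('flash-attn OK')"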

Configure Accelerate. Run the configuration wizard:

accelerate config 

Below are the key responses for a single-node setup (single GPU or multi-GPU); distributed training and DeepSpeed are left disabled here.

(ai-env) root@gpunode:~# accelerate config
/opt/ai-env/lib/python3.12/site-packages/torch/cuda/__init__.py:1064: UserWarning: Can't initialize NVML
  raw_cnt = _raw_device_count_nvml()
In which compute environment are you running?
This machine
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]: NO
Do you wish to optimize your script with torch dynamo? [yes/NO]: NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]: all
Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). [yes/NO]: NO
Do you wish to use mixed precision?
bf16
accelerate configuration saved at /var/lib/huggingface/accelerate/default_config.yaml
(ai-env) root@gpunode:~#

23. Inference Server Clients

Install the standard client library to communicate with local inference engines.

pip install openai

24. I/O Utilities / Data Processing

pip install pdf2image pillow

25. Web & API Infrastructure

pip install fastapi "uvicorn[standard]" python-multipart pydantic

26. Disk Resize on First Boot

Create /usr/local/bin/resizedisk:

#!/bin/bash
growpart /dev/vda 2
partx --update /dev/vda2
resize2fs /dev/vda2
systemctl stop guestfs-firstboot.service
systemctl disable guestfs-firstboot.service
rm -f /root/*.log

Set permissions:

chmod +x /usr/local/bin/resizedisk
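The script is meant to run exactly once on the first boot of a clone. One way to wire that up, consistent with the guestfs-firstboot.service referenced above (shown here as an assumption, since the orchestration side is out of scope), is a host-side virt-customize step when preparing a clone:

# Hypothetical host-side step when preparing a clone
virt-customize -a /path/to/clone.qcow2 --firstboot-command '/usr/local/bin/resizedisk'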

27. Filesystem Optimization and Compaction

e4defrag /
fstrim -av
# Zero-fill free space so the final virt-sparsify/compression step can reclaim it;
# dd stops with "No space left on device", which is expected here.
dd if=/dev/zero of=/zero.fill bs=1M status=progress
rm -f /zero.fill
fstrim -av

28. Final Cleanup and Shutdown

Reset the machine ID (essential for DHCP: many DHCP servers assign leases based on this ID rather than the MAC address):

truncate -s 0 /etc/machine-id

Remove the SSH host keys so each clone generates its own unique fingerprint on first boot:

rm -f /etc/ssh/ssh_host_*

cat <<EOF > /etc/rc.local
#!/bin/bash
if [ ! -f /etc/ssh/ssh_host_rsa_key ]; then
    ssh-keygen -A
    systemctl restart ssh
fi
EOF
chmod +x /etc/rc.local

history -c
shutdown -h now

29. Export Base QCOW2 Image

Compress and finalize the base image:

virt-sparsify --compress \
  /var/lib/libvirt/images/ubuntu24g.qcow2 \
  /root/kvm-local/ubuntu24/baseg.qcow2
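The exported image can be inspected to confirm its virtual size and on-disk footprint:

qemu-img info /root/kvm-local/ubuntu24/baseg.qcow2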