This document describes the process for creating a reusable, deterministic Ubuntu 24.04 QCOW2 base image optimized for GPU-accelerated AI workloads. The resulting image is intended to be cloned and customized by automation/orchestration tooling.
1. Base OS Installation
- Operating System: Ubuntu Server 24.04 LTS
- Kernel: GA kernel (6.8.0) — avoid HWE
- Disk Size: 50 GB (future-proof; base image only)
Why avoid the HWE kernel in the base image
- GA kernel (6.8.0) provides maximum stability and predictable behavior.
- Reduces DKMS rebuild failures during NVIDIA driver installation.
- Avoids kernel churn across clones.
2. System Update
apt update -y
apt upgrade -y
3. SSH Configuration (Remote Root Access)
Update SSH daemon configuration
Edit /etc/ssh/sshd_config and ensure the following are enabled:
PermitRootLogin yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2
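The stock Ubuntu sshd_config includes /etc/ssh/sshd_config.d/*.conf, so the same three settings can also live in a drop-in file instead of being edited in place. The following is a minimal sketch under that assumption; the helper name write_sshd_dropin and the file name 10-base-image.conf are illustrative and not part of the original procedure.

```shell
# write_sshd_dropin [DIR]
# Write the root-login settings to a drop-in file under DIR
# (default: /etc/ssh/sshd_config.d). The helper name and the file name
# 10-base-image.conf are assumptions made for this sketch.
write_sshd_dropin() {
    dir="${1:-/etc/ssh/sshd_config.d}"
    mkdir -p "$dir"
    cat > "$dir/10-base-image.conf" <<'EOF'
PermitRootLogin yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2
EOF
}

# Usage (as root): write_sshd_dropin && sshd -t && systemctl restart ssh
```

Running sshd -t before the restart validates the configuration, which helps avoid locking yourself out of a remote build VM.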
Update SSH client configuration
Edit /etc/ssh/ssh_config:
StrictHostKeyChecking no
Set root password
passwd
Generate SSH key (optional, for automation)
ssh-keygen
4. Clean Login Noise (MOTD)
Disable MOTD messages
Edit /etc/pam.d/sshd and comment out:
#session optional pam_motd.so motd=/run/motd.dynamic
#session optional pam_motd.so noupdate
#session optional pam_mail.so standard noenv
Disable MOTD news
Edit /etc/default/motd-news:
ENABLED=0
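For automation, the PAM edit can be scripted rather than done by hand. This is a sketch only: the helper name silence_login_noise is hypothetical, GNU sed is assumed, and the pattern targets the session lines listed above.

```shell
# silence_login_noise FILE
# Comment out the active pam_motd and pam_mail "session optional" lines in FILE.
# Lines that are already commented do not match, so re-running is harmless.
# The helper name is an assumption; GNU sed is assumed for the -i flag.
silence_login_noise() {
    sed -i -E '/^[[:space:]]*session[[:space:]]+optional[[:space:]]+pam_(motd|mail)\.so/ s/^/#/' "$1"
}

# Usage (as root): silence_login_noise /etc/pam.d/sshd
```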
5. Remove Snap and snapd
snap list
snap remove lxd
snap remove core20
snap remove snapd
apt purge --remove snapd
rm -rf /root/snap/
6. Disable Swap (AI-friendly, deterministic memory behavior)
systemctl list-units | grep swap
systemctl stop swap.img.swap swap.target
systemctl disable swap.img.swap swap.target
systemctl mask swap.img.swap swap.target
swapoff -a
rm -f /swap.img
Remove swap entries from /etc/fstab.
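The fstab cleanup can be scripted as well. The sketch below comments the swap entries out instead of deleting them, which keeps the change easy to review; the helper name comment_out_swap is hypothetical and GNU sed is assumed.

```shell
# comment_out_swap FILE
# Comment out every active fstab line whose filesystem type field is "swap".
# Already-commented lines start with "#" and are left alone.
# The helper name is an assumption made for this sketch.
comment_out_swap() {
    sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Usage (as root): comment_out_swap /etc/fstab
```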
8. Cloud-Init Networking Cleanup
rm -f /etc/cloud/cloud.cfg.d/90-installer-network.cfg
Disable cloud-init networking entirely:
echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
9. Limit Systemd Journal Size
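sed -i 's/#SystemMaxFileSize=.*/SystemMaxFileSize=512M/' /etc/systemd/journald.conf
Note that SystemMaxFileSize caps each individual journal file; total journal disk usage is governed by SystemMaxUse.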
10. systemd-resolved Configuration
Edit /etc/systemd/resolved.conf:
DNS=<Your DNS Server IP>
FallbackDNS=8.8.8.8
Domains=<Your domain>
DNSStubListener=no
Apply:
ln -fs /run/systemd/resolve/resolv.conf /etc/resolv.conf
systemctl restart systemd-resolved
11. Install Base Utilities
apt install -y \
net-tools rsyslog bc fio iperf3 gnupg2 \
software-properties-common lvm2 nfs-common jq
12. Disable Unwanted Timers
systemctl list-units | grep timer
Disable:
systemctl stop apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
motd-news.timer update-notifier-download.timer update-notifier-motd.timer
systemctl disable apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
motd-news.timer update-notifier-download.timer update-notifier-motd.timer
systemctl mask apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
motd-news.timer update-notifier-download.timer update-notifier-motd.timer
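The separate stop, disable, and mask passes can be folded into a single loop. The following sketch assumes a hypothetical helper named quiesce_units and a hypothetical RUN variable that turns the call into a dry run; systemctl disable --now stops and disables the unit in one step.

```shell
# quiesce_units UNIT...
# Stop, disable, and mask each unit given as an argument.
# RUN is a hypothetical hook: set RUN=echo to print the commands
# instead of executing them.
quiesce_units() {
    for unit in "$@"; do
        ${RUN:-} systemctl disable --now "$unit"
        ${RUN:-} systemctl mask "$unit"
    done
}

# Dry run:  RUN=echo quiesce_units apt-daily.timer apt-daily-upgrade.timer
# Real run: quiesce_units apt-daily.timer apt-daily-upgrade.timer
```

The same helper would also cover the service list in the next section.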
13. Disable Unattended Services
systemctl stop unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
systemctl disable unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
systemctl mask unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
14. GRUB updates
Edit /etc/default/grub and update the following lines:
GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0 biosdevname=0 cpufreq.default_governor=performance"
GRUB_CMDLINE_LINUX=""
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
net.ifnames=0 and biosdevname=0: Disables predictable naming to force the classic eth0 convention. This ensures that manually injected Netplan configurations “hit” the correct interface without needing to probe for hardware-specific names (like enp0s3) during orchestration.
cpufreq.default_governor=performance: Eliminates CPU frequency scaling latency. The VM operates at maximum clock speed immediately upon boot, which is critical for consistent AI workload performance.
GRUB_CMDLINE_LINUX="": Kept empty to ensure the kernel parameters remain modular and easily overridable via the default string.
GRUB_TERMINAL="console serial": Dual-routes the bootloader output to both virtual VGA and the serial port. This allows orchestration logs to be captured via virsh console even if the VM is headless.
GRUB_SERIAL_COMMAND: Standardizes the serial interface at 115200 baud, ensuring that host-side monitoring scripts can reliably parse boot and kernel messages from the start.
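When the image build is scripted, these variables can be set without an interactive editor. The sketch below drops any existing (or commented-out) assignment of a key and appends the new one; /etc/default/grub is sourced as a shell fragment, so the position of the line should not matter for independent variables. The helper name set_grub_var is hypothetical.

```shell
# set_grub_var FILE KEY VALUE
# Replace any existing or commented-out KEY= assignment in FILE with
# KEY="VALUE". The trailing "=" in the pattern keeps GRUB_CMDLINE_LINUX
# from matching GRUB_CMDLINE_LINUX_DEFAULT.
# The helper name is an assumption made for this sketch.
set_grub_var() {
    file="$1"; key="$2"; value="$3"
    tmp=$(mktemp)
    # grep -v exits non-zero when nothing is left to print; that is fine here.
    grep -Ev "^#?[[:space:]]*${key}=" "$file" > "$tmp" || true
    printf '%s="%s"\n' "$key" "$value" >> "$tmp"
    # Write back in place so the original file's mode and owner are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root):
#   set_grub_var /etc/default/grub GRUB_TERMINAL "console serial"
#   update-grub
```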
Disable Nouveau (Blacklist)
Even if the driver isn’t fully active, Nouveau can sometimes “touch” the hardware during boot, which interferes with the NVIDIA driver installation or VFIO binding later.
Create a blacklist file: sudo nano /etc/modprobe.d/blacklist-nouveau.conf
Add these lines:
blacklist nouveau
options nouveau modeset=0
Load VFIO Modules at Boot
For the guest to properly handle the passed-through hardware, ensure the VFIO modules are loaded into the kernel early.
Open the modules file: sudo nano /etc/modules
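Append these lines (one module per line):
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Note: on kernels 6.2 and later, vfio_virqfd has been folded into the vfio module, so that entry can be omitted on the 6.8 GA kernel.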
Final Image Sync
Since you’ve modified modules and blacklists, you must rebuild the initramfs and update GRUB to ensure these settings are baked into the early boot process:
sudo update-initramfs -u
sudo update-grub
reboot
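After the reboot, one way to confirm that the blacklist took effect is to look for the module in /proc/modules. The sketch below assumes a hypothetical helper named module_loaded; the optional second argument exists only so the check can be pointed at a saved copy of the file. Modules compiled into the kernel never appear in /proc/modules, so a missing vfio entry is not by itself a failure.

```shell
# module_loaded NAME [MODULES_FILE]
# Succeed if NAME appears as a loaded module in MODULES_FILE
# (default: /proc/modules). Matching is exact on the first field.
# The helper name is an assumption made for this sketch.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Usage:
#   module_loaded nouveau && echo "nouveau is still loaded"
```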
15. Kernel Headers and Toolchain
apt install -y linux-headers-$(uname -r) build-essential dkms gcc make
16. GPU Drivers
Driver selection
- Use the Ubuntu-recommended NVIDIA driver (via nvidia-detector)
- Driver version 580 chosen for:
  - NVIDIA L4 compatibility
  - Stability on kernel 6.8
  - Forward CUDA compatibility
Install driver and utilities
apt install -y nvidia-driver-580-server nvidia-utils-580-server
The CUDA runtime is intentionally not installed alongside the driver; the CUDA toolkit is added separately in the next section.
17. CUDA Toolkit, Python Runtime, IO utility
Trial and error with dependency builds (specifically flash-attn and vllm) showed CUDA Toolkit 12.8 to be the best fit.
# 1. Download the Keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
# 2. Install it
dpkg -i cuda-keyring_1.1-1_all.deb
# 3. Update apt cache
apt update -y
apt upgrade -y
apt install -y cuda-toolkit-12-8
Add the following to /etc/bash.bashrc (after adding it, log out and log back in once):
export CUDA_HOME=/usr/local/cuda-12.8
# Update PATH: Put CUDA first to override system defaults
export PATH=${CUDA_HOME}/bin:${PATH}
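Because /etc/bash.bashrc is read by every interactive shell, nested shells would otherwise prepend the CUDA directory again each time. A minimal sketch of a guarded prepend follows; the helper name path_prepend is hypothetical. It leaves PATH untouched when the directory is already present anywhere in it, so it does not move an existing entry to the front.

```shell
# path_prepend DIR
# Put DIR at the front of PATH unless PATH already contains it.
# The helper name is an assumption made for this sketch.
path_prepend() {
    case ":${PATH}:" in
        *":$1:"*) ;;                          # already present, leave PATH unchanged
        *) PATH="$1${PATH:+:${PATH}}" ;;
    esac
}

export CUDA_HOME=/usr/local/cuda-12.8
path_prepend "${CUDA_HOME}/bin"
export PATH
```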
apt install -y python3 python3-pip poppler-utils libaio-dev
18. Add Environment Variables, Create Virtual Env
Create directories used by Hugging Face
mkdir -p /var/lib/huggingface/{hub,transformers}
chmod -R 755 /var/lib/huggingface
Add the following to /etc/environment
HF_HOME=/var/lib/huggingface
TRANSFORMERS_CACHE=/var/lib/huggingface/transformers
HF_HUB_CACHE=/var/lib/huggingface/hub
TOKENIZERS_PARALLELISM=false
Create venv
# 1. Install the venv tool
apt install -y python3-venv
# 2. Create the environment
python3 -m venv /opt/ai-env
# Add activation to the end of the root bash profile
echo 'source /opt/ai-env/bin/activate' >> ~/.bashrc
# Reload the profile for the current session
source ~/.bashrc
19. Core Deep Learning Framework
PyTorch (CUDA 12.8 runtime via wheels)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
- PyTorch wheels ship the CUDA runtime internally.
- Keeps PyTorch itself independent of the system-level CUDA toolkit.
- Decouples framework upgrades from the OS image.
20. Scientific Python Stack
pip install numpy scipy pandas
21. Hugging Face Ecosystem
pip install transformers tokenizers safetensors peft vllm datasets
22. GPU-Aware Utilities
pip install packaging ninja wheel
pip install nvidia-cufile-cu12 accelerate bitsandbytes torch-c-dlpack-ext deepspeed
# Add to the end of /etc/bash.bashrc (for interactive shells)
# 1. Define the library path for the AI Virtual Environment
export CUFILE_LIB="/opt/ai-env/lib/python3.12/site-packages/nvidia/cufile/lib"
# 2. Add it to LD_LIBRARY_PATH so vLLM finds the GDS drivers
export LD_LIBRARY_PATH="${CUFILE_LIB}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
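The CUFILE_LIB path above hard-codes python3.12, which would break if the venv were ever rebuilt on a different Python minor version. The sketch below locates the directory with a glob instead; the helper name find_cufile_lib is hypothetical, and the nvidia/cufile/lib layout is taken from the path shown above.

```shell
# find_cufile_lib [VENV]
# Print the cuFile library directory inside VENV (default: /opt/ai-env),
# whatever the Python minor version is. Fails if no such directory exists.
# The helper name is an assumption made for this sketch.
find_cufile_lib() {
    venv="${1:-/opt/ai-env}"
    for d in "$venv"/lib/python3.*/site-packages/nvidia/cufile/lib; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Usage in /etc/bash.bashrc:
#   CUFILE_LIB="$(find_cufile_lib /opt/ai-env)" && export CUFILE_LIB
```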
Compile Flash Attention: This operation takes a long time. To reduce it, I allocated 24 full cores and 128 GB of RAM to the build VM so that I could set a high concurrency of 10.
export MAX_JOBS=10
pip install flash-attn --no-build-isolation
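The concurrency value is a trade-off between core count and memory, since each parallel compile job can use several gigabytes of RAM. As a rough sketch of that arithmetic, the hypothetical helper below takes the smaller of the core count and the number of jobs that fit in memory. The 12 GB per job figure in the usage line is purely an illustrative assumption, not a measured value; with it, 24 cores and 128 GB yield 10 jobs.

```shell
# pick_max_jobs CORES MEM_GB GB_PER_JOB
# Print min(CORES, MEM_GB / GB_PER_JOB), never less than 1.
# The helper name and the per-job memory budget are assumptions
# made for this sketch.
pick_max_jobs() {
    cores="$1"; mem_gb="$2"; gb_per_job="$3"
    by_mem=$(( mem_gb / gb_per_job ))
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1
    fi
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

# Usage: export MAX_JOBS="$(pick_max_jobs "$(nproc)" 128 12)"
```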
Configure Accelerate
Run the configuration wizard.
accelerate config
Below are the key responses for a single-machine setup with no distributed training and DeepSpeed disabled.
(ai-env) root@gpunode:~# accelerate config
/opt/ai-env/lib/python3.12/site-packages/torch/cuda/__init__.py:1064: UserWarning: Can't initialize NVML
raw_cnt = _raw_device_count_nvml()
--------------------------------------------------------------------------------
In which compute environment are you running?
This machine
--------------------------------------------------------------------------------
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]:NO
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:all
Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). [yes/NO]: NO
--------------------------------------------------------------------------------
Do you wish to use mixed precision?
bf16
accelerate configuration saved at /var/lib/huggingface/accelerate/default_config.yaml
(ai-env) root@gpunode:~#
23. Inference Server Clients
Install the standard client library to communicate with local inference engines.
pip install openai
24. I/O Utility / Data Processing
pip install pdf2image pillow
25. Web & API Infrastructure
pip install fastapi "uvicorn[standard]" python-multipart pydantic
26. Disk Resize on First Boot
Create /usr/local/bin/resizedisk:
#!/bin/bash
growpart /dev/vda 2
partx --update /dev/vda2
resize2fs /dev/vda2
systemctl stop guestfs-firstboot.service
systemctl disable guestfs-firstboot.service
rm -f /root/*.log
Set permissions:
chmod +x /usr/local/bin/resizedisk
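The script above hard-codes /dev/vda and partition 2. If the image might be attached through a different bus, the disk and partition number can be derived from the root device instead. The sketch below shows only the string handling; the helper name split_partition is hypothetical, and it covers plain partition names (such as vda2 or sda1) and the p-separated names used by NVMe and MMC devices, not LVM or device-mapper paths.

```shell
# split_partition DEVICE
# Print "DISK PARTNUM" for a partition device node, for example
# "/dev/vda 2" for /dev/vda2. Prints nothing if DEVICE does not end
# in a partition number. The helper name is an assumption.
split_partition() {
    case "$1" in
        /dev/nvme*|/dev/mmcblk*)
            # Disk names ending in a digit use a "p" before the partition number.
            printf '%s\n' "$1" | sed -nE 's/^(.*[0-9])p([0-9]+)$/\1 \2/p' ;;
        *)
            printf '%s\n' "$1" | sed -nE 's/^(.*[^0-9])([0-9]+)$/\1 \2/p' ;;
    esac
}

# Usage: split_partition "$(findmnt -n -o SOURCE /)"
```

The two fields map directly onto the arguments of growpart, which takes the disk and the partition number separately.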
27. Filesystem Optimization and Compaction
e4defrag /
fstrim -av
dd if=/dev/zero of=/zero.fill bs=1M status=progress
rm -f /zero.fill
fstrim -av
28. Final Cleanup and Shutdown
Reset the machine ID (essential for DHCP; many servers use this ID rather than the MAC address to assign IPs):
truncate -s 0 /etc/machine-id
Remove the SSH host keys (ensures each VM generates its own unique fingerprint on first boot):
rm /etc/ssh/ssh_host_*
cat <<EOF > /etc/rc.local
#!/bin/bash
if [ ! -f /etc/ssh/ssh_host_rsa_key ]; then
ssh-keygen -A
systemctl restart ssh
fi
EOF
chmod +x /etc/rc.local
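Before clearing the history and shutting down, it can be worth verifying that the image no longer carries per-machine identity. The sketch below checks the two items handled in this section: an empty /etc/machine-id and the absence of SSH host keys. The helper name image_is_generic is hypothetical, and the optional root prefix is there so the check can also be run against a mounted copy of the image.

```shell
# image_is_generic [ROOT]
# Succeed if ROOT/etc/machine-id exists and is empty and no
# ROOT/etc/ssh/ssh_host_* files remain. ROOT defaults to the live system.
# The helper name is an assumption made for this sketch.
image_is_generic() {
    root="${1:-}"
    # An empty machine-id file lets systemd generate a fresh ID on first boot.
    [ -f "$root/etc/machine-id" ] && [ ! -s "$root/etc/machine-id" ] || return 1
    for key in "$root"/etc/ssh/ssh_host_*; do
        # An unmatched glob stays literal and fails the -e test.
        [ -e "$key" ] && return 1
    done
    return 0
}

# Usage: image_is_generic || echo "image still has machine-specific state"
```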
history -c
shutdown -h now
29. Export Base QCOW2 Image
Compress and finalize the base image:
virt-sparsify --compress \
/var/lib/libvirt/images/ubuntu24g.qcow2 \
/root/kvm-local/ubuntu24/baseg.qcow2