This document describes the process for creating a reusable, deterministic Ubuntu 24.04 QCOW2 base image optimized for GPU-accelerated AI workloads. The resulting image is intended to be cloned and customized by automation/orchestration tooling.
1. Base OS Installation
- Operating System: Ubuntu Server 24.04 LTS
- Kernel: GA kernel (6.8.0) — avoid HWE
- Disk Size: 50 GB (future-proof; base image only)
Why avoid the HWE kernel in the base image
- GA kernel (6.8.0) provides maximum stability and predictable behavior.
- Reduces DKMS rebuild failures during NVIDIA driver installation.
- Avoids kernel churn across clones.
2. System Update
apt update -y
apt upgrade -y
3. SSH Configuration (Remote Root Access)
Update SSH daemon configuration
Edit /etc/ssh/sshd_config and ensure the following are enabled:
PermitRootLogin yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2
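Restart the SSH daemon so the changes take effect (Ubuntu's unit is named ssh):
systemctl restart ssh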
Update SSH client configuration
Edit /etc/ssh/ssh_config:
StrictHostKeyChecking no
Set root password
passwd
Generate SSH key (optional, for automation)
ssh-keygen
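For unattended automation, the key can be generated non-interactively; a minimal sketch (the key type and path are illustrative choices, not from the original steps):
ssh-keygen -t ed25519 -N "" -f /root/.ssh/id_ed25519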
4. Clean Login Noise (MOTD)
Disable MOTD messages
Edit /etc/pam.d/sshd and comment out:
#session optional pam_motd.so motd=/run/motd.dynamic
#session optional pam_motd.so noupdate
#session optional pam_mail.so standard noenv
Disable MOTD news
Edit /etc/default/motd-news:
ENABLED=0
5. Remove Snap and snapd
snap list
snap remove lxd
snap remove core20
snap remove snapd
apt purge -y snapd
rm -rf /root/snap/
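Optionally, hold the package so a later apt upgrade cannot reinstall it (an extra precaution, not in the original steps):
apt-mark hold snapd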
6. Disable Swap (AI-friendly, deterministic memory behavior)
systemctl list-units | grep swap
systemctl stop swap.img.swap swap.target
systemctl disable swap.img.swap swap.target
systemctl mask swap.img.swap swap.target
swapoff -a
rm -f /swap.img
Remove swap entries from /etc/fstab.
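A minimal sketch, assuming the default installer-created /swap.img entry:
sed -i '/\/swap\.img/d' /etc/fstab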
7. Deterministic Network Interface Naming/CPU Governor Configuration
Edit /etc/default/grub:
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 cpufreq.default_governor=performance"
Apply and reboot:
update-grub
reboot
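After the reboot, both settings can be verified from the kernel command line and the sysfs governor entry:
cat /proc/cmdline
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor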
8. Cloud-Init Networking Cleanup
rm -f /etc/cloud/cloud.cfg.d/90-installer-network.cfg
Disable cloud-init networking entirely:
echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
9. Limit Systemd Journal Size
sed -i 's/#SystemMaxFileSize=.*/SystemMaxFileSize=512M/' /etc/systemd/journald.conf
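Restart journald for the new limit to take effect:
systemctl restart systemd-journald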
10. systemd-resolved Configuration
Edit /etc/systemd/resolved.conf:
DNS=<Your DNS Server IP>
FallbackDNS=8.8.8.8
Domains=<Your domain>
DNSStubListener=no
Apply:
ln -fs /run/systemd/resolve/resolv.conf /etc/resolv.conf
systemctl restart systemd-resolved
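The resulting DNS configuration can be checked with:
resolvectl status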
11. Install Base Utilities
apt install -y \
net-tools rsyslog bc fio iperf3 gnupg2 \
software-properties-common lvm2 nfs-common jq
12. Disable Unwanted Timers
systemctl list-units | grep timer
Disable:
systemctl stop apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
motd-news.timer update-notifier-download.timer update-notifier-motd.timer
systemctl disable apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
motd-news.timer update-notifier-download.timer update-notifier-motd.timer
systemctl mask apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
motd-news.timer update-notifier-download.timer update-notifier-motd.timer
13. Disable Unattended Services
systemctl stop unattended-upgrades.service apparmor ufw ubuntu-advantage
systemctl disable unattended-upgrades.service apparmor ufw ubuntu-advantage
systemctl mask unattended-upgrades.service apparmor ufw ubuntu-advantage
14. Kernel Headers and Toolchain
apt install -y linux-headers-$(uname -r) build-essential dkms gcc make
15. GPU Drivers
Driver selection
- Use the Ubuntu-recommended NVIDIA driver (via nvidia-detector)
- Driver version 580 chosen for:
- NVIDIA L4 compatibility
- Stability on kernel 6.8
- Forward CUDA compatibility
Install driver and utilities
apt install -y nvidia-driver-580-server nvidia-utils-580-server
The CUDA toolkit is intentionally not installed at this step; a pinned version (12.6) is added in Section 16 for dependencies that must compile against it.
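After a reboot, the driver can be verified with nvidia-smi, which should report driver 580 and the L4 GPU:
nvidia-smi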
16. CUDA Toolkit, Python Runtime, I/O Utilities
Based on trial and error with dependency builds (specifically flash-attn and vLLM), CUDA Toolkit 12.6 was identified as the ideal candidate.
# 1. Download the Keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
# 2. Install it
dpkg -i cuda-keyring_1.1-1_all.deb
# 3. Update apt cache
apt update -y
apt upgrade -y
apt install -y cuda-toolkit-12-6
Add the following to /etc/bash.bashrc (then log out and log back in once):
export CUDA_HOME=/usr/local/cuda-12.6
# Update PATH: Put CUDA first to override system defaults
export PATH=${CUDA_HOME}/bin:${PATH}
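After re-login, verify the toolkit is on the PATH:
nvcc --version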
apt install -y python3 python3-pip poppler-utils libaio-dev
17. Add Environment Variables, Create Virtual Environment
Create directories used by Hugging Face
mkdir -p /var/lib/huggingface/{hub,transformers}
chmod -R 755 /var/lib/huggingface
Add the following to /etc/environment
HF_HOME=/var/lib/huggingface
TRANSFORMERS_CACHE=/var/lib/huggingface/transformers
HF_HUB_CACHE=/var/lib/huggingface/hub
TOKENIZERS_PARALLELISM=false
Create venv
# 1. Install the venv tool
apt install -y python3-venv
# 2. Create the environment
python3 -m venv /opt/ai-env
# Add activation to the end of the root bash profile
echo 'source /opt/ai-env/bin/activate' >> ~/.bashrc
# Reload the profile for the current session
source ~/.bashrc
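Optionally upgrade the packaging tools inside the fresh venv before the bulk installs (not in the original steps):
pip install --upgrade pip wheel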
18. Core Deep Learning Framework
PyTorch (CUDA 12.6 runtime via wheels)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126
- PyTorch wheels ship the CUDA runtime internally.
- Avoids a runtime dependency on the system CUDA toolkit (the Section 16 toolkit is only needed to compile packages such as flash-attn).
- Decouples framework upgrades from the OS image.
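A quick sanity check that the wheel's bundled CUDA runtime sees the GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"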
19. Scientific Python Stack
pip install numpy scipy pandas
20. Hugging Face Ecosystem
pip install transformers tokenizers safetensors peft vllm datasets
21. GPU-Aware Utilities
pip install packaging ninja wheel
pip install nvidia-cufile-cu12 accelerate bitsandbytes torch-c-dlpack-ext deepspeed
# Add to the end of /etc/bash.bashrc (for interactive shells)
# 1. Define the library path for the AI Virtual Environment
export CUFILE_LIB="/opt/ai-env/lib/python3.12/site-packages/nvidia/cufile/lib"
# 2. Add it to LD_LIBRARY_PATH so vLLM finds the GDS drivers
export LD_LIBRARY_PATH="${CUFILE_LIB}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
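A quick check after re-login that the library directory exists (path assumed from the Python 3.12 venv layout above):
ls "$CUFILE_LIB"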
Compile Flash Attention: this build takes a long time. To speed it up, I allocated 24 full cores and 128 GB RAM so I could set a build concurrency of 10:
export MAX_JOBS=10
pip install flash-attn
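Verify the build imports cleanly:
python -c "import flash_attn; print(flash_attn.__version__)"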
Configure Accelerate. Run the configuration wizard:
accelerate config
Below are the key responses for a single-node setup (single or multi-GPU); DeepSpeed and distributed training were declined in this run.
(ai-env) root@gpunode:~# accelerate config
/opt/ai-env/lib/python3.12/site-packages/torch/cuda/__init__.py:1064: UserWarning: Can't initialize NVML
raw_cnt = _raw_device_count_nvml()
In which compute environment are you running?
This machine
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]:NO
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:all
Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). [yes/NO]: NO
Do you wish to use mixed precision?
bf16
accelerate configuration saved at /var/lib/huggingface/accelerate/default_config.yaml
(ai-env) root@gpunode:~#
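With the config saved, training scripts are launched through the wrapper; train.py here is a hypothetical script name:
accelerate launch train.py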
22. Inference Server Clients
Install the standard client library to communicate with local inference engines.
pip install openai
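A minimal sketch of pointing the client at a local OpenAI-compatible endpoint (the URL, API key, and model name are placeholders, assuming a vLLM server listening on port 8000):
python - <<'EOF'
from openai import OpenAI

# Local inference engines such as vLLM expose an OpenAI-compatible API;
# base_url, api_key, and model below are illustrative placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
EOF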
23. I/O Utilities / Data Processing
pip install pdf2image pillow
24. Web & API Infrastructure
pip install fastapi "uvicorn[standard]" python-multipart pydantic
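These are enough to serve a small API; a hypothetical app.py defining app = FastAPI() would be started with:
uvicorn app:app --host 0.0.0.0 --port 8000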
25. Disk Resize on First Boot
Create /usr/local/bin/resizedisk:
#!/bin/bash
# Grow partition 2 to fill the enlarged virtual disk
growpart /dev/vda 2
# Tell the kernel about the new partition size
partx --update /dev/vda2
# Expand the ext4 filesystem to fill the partition
resize2fs /dev/vda2
# One-shot: disable the first-boot hook and clean up logs
systemctl stop guestfs-firstboot.service
systemctl disable guestfs-firstboot.service
rm -f /root/*.log
Set permissions:
chmod +x /usr/local/bin/resizedisk
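The guestfs-firstboot.service stopped inside the script suggests it is registered as a libguestfs first-boot hook when clones are prepared; a sketch (the clone path is a placeholder):
virt-customize -a /path/to/clone.qcow2 --firstboot /usr/local/bin/resizedisk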
26. Filesystem Optimization and Compaction
e4defrag /
fstrim -av
# Zero-fill the free space so virt-sparsify can reclaim it (dd exits when the disk is full)
dd if=/dev/zero of=/zero.fill bs=1M status=progress
sync
rm -f /zero.fill
fstrim -av
27. Final Cleanup and Shutdown
history -c
shutdown -h now
28. Export Base QCOW2 Image
Compress and finalize the base image:
virt-sparsify --compress \
  /var/lib/libvirt/images/ubuntu24.qcow2 \
  /root/kvm-local/ubuntu24/base.qcow2
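Verify the finished artifact:
qemu-img info /root/kvm-local/ubuntu24/base.qcow2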