Custom Cloud Image for AI Workloads – Home Lab

This document describes the process for creating a reusable, deterministic Ubuntu 24.04 QCOW2 base image optimized for GPU-accelerated AI workloads. The resulting image is intended to be cloned and customized by automation/orchestration tooling.

1. Base OS Installation

Operating System: Ubuntu Server 24.04 LTS
Kernel: GA kernel (6.8.0) — avoid HWE
Disk Size: 50 GB (future-proof; base image only)

Why avoid the HWE kernel in the base image

GA kernel (6.8.0) provides maximum stability and predictable behavior.
Reduces DKMS rebuild failures during NVIDIA driver installation.
Avoids kernel churn across clones.fstrin

2. System Update

apt update -y
apt upgrade -y

3. SSH Configuration (Remote Root Access)

Update SSH daemon configuration

Edit /etc/ssh/sshd_config and ensure the following are enabled:

PermitRootLogin yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2

Update SSH client configuration

Edit /etc/ssh/ssh_config:

StrictHostKeyChecking no

Set root password

passwd

Generate SSH key (optional, for automation)

ssh-keygen

4. Clean Login Noise (MOTD)

Disable MOTD messages

Edit /etc/pam.d/ssh and comment out:

#session optional pam_motd.so motd=/run/motd.dynamic
#session optional pam_motd.so noupdate
#session optional pam_mail.so standard noenv

Disable MOTD news

Edit /etc/default/motd-news:

ENABLED=0

5. Remove Snap and snapd

snap list
snap remove lxd
snap remove core20
snap remove snapd
apt purge --remove snapd
rm -rf /root/snap/

6. Disable Swap (AI-friendly, deterministic memory behavior)

systemctl list-units | grep swap
systemctl stop swap.img.swap swap.target
systemctl disable swap.img.swap swap.target
systemctl mask swap.img.swap swap.target
swapoff -a
rm -f /swap.img

Remove swap entries from /etc/fstab.

7. Deterministic Network Interface Naming/CPU Governor Configuration

Edit /etc/default/grub:

GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 cpufreq.default_governor=performance"

Apply and reboot:

update-grub
reboot

8. Cloud-Init Networking Cleanup

rm -f /etc/cloud/cloud.cfg.d/90-installer-network.cfg

Disable cloud-init networking entirely:

echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg

9. Limit Systemd Journal Size

sed -i 's/#SystemMaxFileSize=.*/SystemMaxFileSize=512M/' /etc/systemd/journald.conf

10. systemd-resolved Configuration

Edit /etc/systemd/resolved.conf:

DNS=<Your DNS Server IP>
FallbackDNS=8.8.8.8
Domains=<Your domain>
DNSStubListener=no

Apply:

ln -fs /run/systemd/resolve/resolv.conf /etc/resolv.conf
systemctl restart systemd-resolved

11. Install Base Utilities

apt install -y \
  net-tools rsyslog bc fio iperf3 gnupg2 \
  software-properties-common lvm2 nfs-common jq

12. Disable Unwanted Timers

systemctl list-units | grep timer

Disable:

systemctl stop apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
               motd-news.timer update-notifier-download.timer update-notifier-motd.timer

systemctl disable apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
                  motd-news.timer update-notifier-download.timer update-notifier-motd.timer

systemctl mask apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer \
             motd-news.timer update-notifier-download.timer update-notifier-motd.timer

13. Disable Unattended Services

systemctl stop unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
systemctl disable unattended-upgrades.service apparmor ufw ubuntu-advantage-tools
systemctl mask unattended-upgrades.service apparmor ufw ubuntu-advantage-tools

14. Kernel Headers and Toolchain

apt install -y linux-headers-$(uname -r) build-essential dkms gcc make

15. GPU Drivers

Driver selection

Use the Ubuntu-recommended NVIDIA driver (via nvidia-detector)
Driver version 580 chosen for:
- NVIDIA L4 compatibility
- Stability on kernel 6.8
- Forward CUDA compatibility

Install driver and utilities

apt install -y nvidia-driver-580-server nvidia-utils-580-server

CUDA runtime is intentionally not installed system-wide.

16. Python Runtime, IO utility

Based on trial and errors on dependencies (specifically flash-attn / vllm) CUDA tool kit 12.6 was identified as ideal candidate.

# 1. Download the Keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb

# 2. Install it
dpkg -i cuda-keyring_1.1-1_all.deb

# 3. Update apt cache
apt update -y
apt upgrade -y

apt install -y cuda-toolkit-12-6

Add the following to /etc/bash.bashrc (after adding logout and login once)

export CUDA_HOME=/usr/local/cuda-12.6

# Update PATH: Put CUDA first to override system defaults
export PATH=${CUDA_HOME}/bin:${PATH}

apt install -y python3 python3-pip poppler-utils libaio-dev

17. Add environmental variables, Create Virtual Env

Create directories used by Hugging Face

mkdir -p /var/lib/huggingface/{hub,transformers}

chmod -R 755 /var/lib/huggingface

Add the following to /etc/environment

HF_HOME=/var/lib/huggingface
TRANSFORMERS_CACHE=/var/lib/huggingface/transformers
HF_HUB_CACHE=/var/lib/huggingface/hubHF_HOME=/var/lib/huggingface
TOKENIZERS_PARALLELISM=false

Create venv

# 1. Install the venv tool
apt install -y python3-venv 

# 2. Create the environment
python3 -m venv /opt/ai-env

# Add activation to the end of the root bash profile
echo 'source /opt/ai-env/bin/activate' >> ~/.bashrc

# Reload the profile for the current session
source ~/.bashrc

18. Core Deep Learning Framework

PyTorch (CUDA 12.4 runtime via wheels)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126

PyTorch wheels ship the CUDA runtime internally.
Avoids system-level CUDA toolkit dependency.
Decouples framework upgrades from the OS image.

19. Scientific Python Stack

pip install numpy scipy pandas

20. Hugging Face Ecosystem

pip install transformers tokenizers safetensors peft vllm datasets

21. GPU-Aware Utilities

pip install packaging ninja wheel
pip install nvidia-cufile-cu12 accelerate bitsandbytes torch-c-dlpack-ext deepspeed 

# Add to the end of /etc/bash.bashrc (for interactive shells)

# 1. Define the library path for the AI Virtual Environment
export CUFILE_LIB="/opt/ai-env/lib/python3.12/site-packages/nvidia/cufile/lib"

# 2. Add it to LD_LIBRARY_PATH so vLLM finds the GDS drivers
export LD_LIBRARY_PATH="${CUFILE_LIB}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

Compile Flash Attention : Theis operation takes a lot of time.  To reduce time, I had allocated 24 full cores and 128G RAM so that I can set high concurrency of 10

export MAX_JOBS=1
pip install flash-attn

Configure Accelerate Run the configuration wizard.

accelerate config

Below are the key responses for a single-node, multi-GPU setup (or single GPU) using DeepSpeed.

(ai-env) root@gpunode:~# accelerate config
/opt/ai-env/lib/python3.12/site-packages/torch/cuda/__init__.py:1064: UserWarning: Can't initialize NVML
  raw_cnt = _raw_device_count_nvml()
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------In which compute environment are you running?
This machine
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]:NO
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:all
Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). [yes/NO]: NO
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Do you wish to use mixed precision?
bf16
accelerate configuration saved at /var/lib/huggingface/accelerate/default_config.yaml
(ai-env) root@gpunode:~#

22. Inference Server Clients

Install the standard client library to communicate with local inference engines.

pip install openai

23. I/O Utility / Dataprocessing

pip install pdf2image pillow

24. Web & API Infrastructure

pip install fastapi uvicorn[standard] python-multipart pydantic

25. Disk Resize on First Boot

Create /usr/local/bin/resizedisk:

#!/bin/bash
growpart /dev/vda 2
partx --update /dev/vda2
resize2fs /dev/vda2
systemctl stop guestfs-firstboot.service
systemctl disable guestfs-firstboot.service
rm -f /root/*.log

Set permissions:

chmod +x /usr/local/bin/resizedisk

26. Filesystem Optimization and Compaction

e4defrag /
fstrim -av
dd if=/dev/zero of=/zero.fill bs=1M status=progress
rm -f /zero.fill
fstrim -av

27. Final Cleanup and Shutdown

history -c
shutdown -h now

28. Export Base QCOW2 Image

Compress and finalize the base image:

virt-sparsify --compress \
  /var/lib/libvirt/images/ubuntu22.qcow2 \
  /root/kvm-local/ubuntu22/base.qcow2