1. Why PCI Pass-through?
The primary goal is to use the Tesla P4 at 100% capacity within a Virtual Machine for high-speed document parsing/inference without the complexity of vGPU.
- Zero Licensing: PCI Pass-through is native and free, bypassing the need for NVIDIA License System (NLS) servers.
- Lean Host: The Host remains a “carrier,” avoiding proprietary drivers that could break during kernel updates.
2. Host Configuration
We isolate the GPU hardware so that the Guest VM can claim it exclusively.
Identify the Hardware
# Locate the card's address and Vendor ID
lspci -nnk | grep -i nvidia
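For reference, the Tesla P4 shows up roughly like this (illustrative output; the slot address depends on your motherboard):
82:00.0 3D controller [0302]: NVIDIA Corporation GP104GL [Tesla P4] [10de:1bb3]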
- Address: 82:00.0 (the physical PCI slot on the motherboard).
- Vendor ID: 10de:1bb3 (the “DNA” of the Tesla P4).
Kernel & GRUB Setup
We use the 6.8 HWE kernel for modern VFIO stability.
sudo apt install --install-recommends linux-generic-hwe-22.04 -y
Edit /etc/default/grub and add the isolation parameters:
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt vfio-pci.ids=10de:1bb3"
- intel_iommu=on: Enables hardware memory isolation.
- iommu=pt: Direct “pass-through” mode for better performance.
- vfio-pci.ids: Locks the card to the VFIO driver at boot.
Apply changes:
sudo update-grub
sudo reboot
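A quick sanity check after the reboot (using the 82:00.0 address found earlier): the kernel should report the IOMMU as enabled, and the card should now be claimed by vfio-pci rather than nouveau or nvidia.
# Confirm the IOMMU is active
sudo dmesg | grep -i -e DMAR -e IOMMU
# Confirm the card is bound to VFIO ("Kernel driver in use: vfio-pci")
lspci -nnk -s 82:00.0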
3. The “Intelligent” Guest Image (.qcow2)
This image automatically activates the GPU only if it is attached to the VM.
Install Guest Drivers
sudo apt install -y nvidia-driver-535-server nvidia-utils-535-server nvidia-cuda-toolkit
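If the GPU is already attached while you prepare the image, you can confirm the driver stack sees it (on a GPU-less build VM this command will simply fail, which is expected):
# Should list the Tesla P4 and the 535 driver branch
nvidia-smi --query-gpu=name,driver_version --format=csv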
The Auto-Activation Tripwire
We use two Systemd files to manage the driver.
File 1: /etc/systemd/system/gpu-activate.service. This unit unmasks and starts the NVIDIA persistence daemon, then switches the card into persistence mode.
[Unit]
Description=Initialize Tesla P4 for Inference
[Service]
Type=oneshot
ExecStartPre=/usr/bin/systemctl unmask nvidia-persistenced.service
ExecStart=/usr/bin/systemctl start nvidia-persistenced.service
ExecStartPost=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes
File 2: /etc/systemd/system/gpu-activate.path. This “watcher” triggers the service only when the GPU device file appears.
[Unit]
Description=Watch for NVIDIA GPU Device Creation
[Path]
PathExists=/dev/nvidia0
[Install]
WantedBy=multi-user.target
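By default a .path unit activates the service that shares its name, which is why this pair needs no extra wiring. If you prefer to make the link explicit, you can name the target service in the [Path] section:
[Path]
PathExists=/dev/nvidia0
Unit=gpu-activate.service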
Final Setup in Guest:
# Ensure drivers don't run on non-GPU VMs
sudo systemctl mask nvidia-persistenced
# Enable only the watcher
sudo systemctl enable gpu-activate.path
# Keep the service itself disabled (the path unit will trigger it)
sudo systemctl disable gpu-activate.service
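Before moving on, a quick check that the tripwire is wired the way we expect:
# The watcher is enabled; the service and daemon stay dormant until a GPU appears
systemctl is-enabled gpu-activate.path      # expected: enabled
systemctl is-enabled gpu-activate.service   # expected: disabled
systemctl is-enabled nvidia-persistenced    # expected: masked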
4. Cleaning the Base Image
Before converting to a template, we “seal” the VM to prevent identity leaks.
# Clear machine-ids to avoid DHCP/Network conflicts
sudo truncate -s 0 /etc/machine-id
sudo rm /var/lib/dbus/machine-id
# Clear logs and shell history
history -c
sudo rm -rf /var/log/*.log
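Optionally, confirm the seal before shutting the VM down: /etc/machine-id should still exist but be empty, and the D-Bus copy should be gone.
cat /etc/machine-id          # prints nothing
ls /var/lib/dbus/machine-id  # "No such file or directory"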
5. Notes on Compression
If virt-sparsify fails with a libguestfs error, it is likely because the kernel images in /boot are not readable by non-root users on Ubuntu, which prevents libguestfs from building its appliance.
# Grant read permissions to the host kernel
sudo chmod 0644 /boot/vmlinuz*
sudo chmod 0644 /boot/initrd.img-6.8.0-90-generic
# Compress the final image using the direct backend
export LIBGUESTFS_BACKEND=direct
virt-sparsify --compress /source/ubuntu22g.qcow2 /destination/baseg.qcow2
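To see what the compression bought you, compare the source and destination images afterwards (same example paths as above):
# Virtual size vs. actual disk usage of the template
qemu-img info /destination/baseg.qcow2
du -h /source/ubuntu22g.qcow2 /destination/baseg.qcow2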
6. Libvirt XML Configuration
Attach the device to your VM via virsh edit:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
  </source>
</hostdev>
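After saving the XML and starting the VM, a minimal end-to-end check (vm-name below is only a placeholder for your domain name): the hostdev should appear in the live definition, and inside the guest the tripwire should have fired.
# On the host: confirm the PCI device is part of the running domain
virsh dumpxml vm-name | grep -A4 "<hostdev"
# In the guest: the watcher should have started the persistence daemon
systemctl status gpu-activate.service nvidia-persistenced
nvidia-smi -L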