In a home lab that also serves as an R&D environment, frequent rebuilds, upgrades, and full reinstall cycles are inevitable. To avoid losing critical documentation, Docker repositories, source code, and other persistent assets, I decided to dedicate one physical server exclusively for management. This system now hosts all essential services and a lightweight K3s cluster used to test container images before pushing them to the production RKE2 environment.
The management server itself is a 10-bay Dell R630 configured for high reliability. It uses three 2.5″ HDDs in a RAID 5 array for the operating system and seven 1 TB SSDs in a RAID 10 virtual disk dedicated to running VMs. This layout provides both performance and redundancy, ensuring that critical data and essential services remain protected even in the event of disk failures.
From past experience with KVM hosts, I learned that VM virtual disks backed by hardware RAID do not always retain consistent device naming after a server reboot. This inconsistency can break automation and orchestration workflows. To address this, I created udev rules that match each disk’s ID_PART_ENTRY_UUID and map it to a stable, human-friendly device name that corresponds to the VM name. This ensures reliable device paths that can be safely used by the VM orchestration scripts.
This post captures all the steps—starting from the base OS installation—required to configure and prepare this server as the dedicated management backbone of the lab.
Remove snapd Completely
Snap introduces background services and timers not required for production VMs.
List installed snaps:
snap list
Remove them:
snap remove lxd
snap remove core20
snap remove snapd
Uninstall snapd:
apt purge snapd
rm -rf /root/snap/
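To confirm snapd is completely gone, these optional checks should return no installed packages or running units:
dpkg -l | grep -i snapd
systemctl list-units --all | grep -i snapd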
Disable Swap
Swap is not required for our workload profile.
Check swap units:
systemctl list-units | grep swap
Stop, disable, and mask all swap units:
systemctl stop swap.target
systemctl disable swap.target
systemctl mask swap.target
swapoff -a
rm -f /swap.img
Edit /etc/fstab and comment out any swap entries.
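If you prefer to do this non-interactively, a sed one-liner like the one below comments out any line with a whitespace-separated swap field; review /etc/fstab afterwards and confirm swap is gone:
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
swapon --show   # should print nothing
free -h         # the Swap line should show 0B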
Configure NIC and Bridge Interfaces
To support high-performance networking and KVM virtualization, all NICs are configured with an MTU of 9000. The system includes four 10 GbE interfaces (Intel X710: eno1–eno4) and two 40 GbE interfaces (Mellanox ConnectX-3 Pro: enp3s0 and enp3s0d1). Since this server hosts multiple VMs, each physical interface is bridged to provide direct, high-throughput connectivity to the guests.
The first interface (eno1) serves as the management network and uses a /16 address. This is intentional because the default route (via 10.0.0.1, the UDM Pro security gateway) resides on the same network. All remaining interfaces are assigned /24 subnets so that east-west traffic stays localized and is switched internally by the Arista 7050QX.
Below is the complete netplan configuration:
# /etc/netplan/50-cloud-init.yaml
# This file is generated from cloud-init data. To disable cloud-init
# network configuration, create:
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
# with: network: {config: disabled}
network:
  version: 2
  ethernets:
    eno1: { mtu: 9000 }
    eno2: { mtu: 9000 }
    eno3: { mtu: 9000 }
    eno4: { mtu: 9000 }
    enp3s0: { mtu: 9000 }
    enp3s0d1: { mtu: 9000 }
  bridges:
    br1:
      interfaces: [eno1]
      addresses: [10.0.1.5/16]
      routes:
        - to: default
          via: 10.0.0.1
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000
    br2:
      interfaces: [eno2]
      addresses: [10.0.2.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000
    br3:
      interfaces: [eno3]
      addresses: [10.0.3.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000
    br4:
      interfaces: [eno4]
      addresses: [10.0.4.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000
    br5:
      interfaces: [enp3s0]
      addresses: [10.0.5.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000
    br6:
      interfaces: [enp3s0d1]
      addresses: [10.0.6.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000
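After saving the file, validate and apply the configuration; netplan try rolls back automatically if connectivity is lost, which is useful when reconfiguring the management interface remotely:
sudo netplan try
sudo netplan apply
ip link show br1 | grep mtu   # repeat for br2–br6 to confirm MTU 9000
ip route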
The resulting routing table reflects the designated default route and the isolated /24 subnets:
default via 10.0.0.1 dev br1 proto static
10.0.0.0/16 dev br1 proto kernel scope link src 10.0.1.5
10.0.2.0/24 dev br2 proto kernel scope link src 10.0.2.5
10.0.3.0/24 dev br3 proto kernel scope link src 10.0.3.5
10.0.4.0/24 dev br4 proto kernel scope link src 10.0.4.5
10.0.5.0/24 dev br5 proto kernel scope link src 10.0.5.5
10.0.6.0/24 dev br6 proto kernel scope link src 10.0.6.5
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
Disable Cloud-Init Networking
Since the network configuration is fully managed through custom Netplan files, cloud-init’s networking component must be disabled to prevent it from overwriting settings on reboot. The following steps disable cloud-init networking and clean any previous state:
# Disable cloud-init from managing network configuration
sudo tee /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg > /dev/null <<EOF
network: {config: disabled}
EOF
# Clean cloud-init state and logs
sudo cloud-init clean --logs
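To confirm cloud-init will no longer manage networking, check that the override file is in place and review cloud-init's status after the next boot:
cat /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
cloud-init status --long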
Configure NTP
To ensure accurate time synchronization—critical for logs, certificates, KVM operations, and distributed systems—the server is configured to use Google’s public NTP service with Ubuntu’s NTP servers as fallback. The timezone is also set to match the local region.
# Set system timezone
sudo timedatectl set-timezone "Asia/Kolkata"
# Configure primary and fallback NTP servers
sudo sed -i "s/#NTP=/NTP=time.google.com/g" /etc/systemd/timesyncd.conf
sudo sed -i "s/#FallbackNTP=ntp.ubuntu.com/FallbackNTP=ntp.ubuntu.com/g" /etc/systemd/timesyncd.conf
# Reload and restart the time synchronization service
sudo systemctl daemon-reload
sudo systemctl stop systemd-timesyncd.service
sudo systemctl start systemd-timesyncd.service
You can verify synchronization using:
timedatectl status
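On recent systemd releases, the currently selected server and polling details can also be inspected with:
timedatectl timesync-status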
Configure System Limits and Core Settings
Set Maximum Journal File Size
To prevent uncontrolled growth of systemd journal logs, configure a maximum file size of 512 MB:
sudo sed -i "s/#SystemMaxFileSize.*/SystemMaxFileSize=512M/g" /etc/systemd/journald.conf
Increase File Descriptor and Process Limits
For a server that runs KVM, containers, orchestration scripts, and various management services, increasing the maximum number of open files and processes is essential:
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nproc 65536" | sudo tee -a /etc/security/limits.conf
echo "* soft nproc 65536" | sudo tee -a /etc/security/limits.conf
Disable Nouveau and Prepare for NVIDIA Drivers
This server uses NVIDIA GPUs and is part of a vGPU-enabled environment, so blacklist the nouveau driver to avoid conflicts:
echo -e "blacklist nouveau\noptions nouveau modeset=0" | sudo tee /etc/modprobe.d/disable-nouveau.conf
sudo update-initramfs -u
sudo reboot
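After the reboot, confirm that nouveau is no longer loaded (the command should print nothing):
lsmod | grep nouveau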
Disable Unwanted and Automated Services
To prevent background package updates, firmware refresh jobs, and automated MOTD tasks from interfering with lab automation or controlled upgrade cycles, disable and mask them:
sudo systemctl stop unattended-upgrades.service \
apt-daily-upgrade.timer apt-daily.timer \
fwupd-refresh.timer motd-news.timer \
update-notifier-download.timer update-notifier-motd.timer
sudo systemctl disable unattended-upgrades.service \
apt-daily-upgrade.timer apt-daily.timer \
fwupd-refresh.timer motd-news.timer \
update-notifier-download.timer update-notifier-motd.timer
sudo systemctl mask unattended-upgrades.service \
apt-daily-upgrade.timer apt-daily.timer \
fwupd-refresh.timer motd-news.timer \
update-notifier-download.timer update-notifier-motd.timer
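A quick way to confirm nothing was missed is to list the remaining timers and check that the masked units report as such:
systemctl list-timers --all
systemctl is-enabled unattended-upgrades.service apt-daily.timer   # should report "masked"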
Configuring udev Rules
### Step 1: Extract Partition UUIDs
For each partition in the VM storage RAID, extract the ID_PART_ENTRY_UUID value using udevadm:
udevadm info --query=all --name=/dev/sdb1 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb2 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb3 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb4 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb5 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb6 | grep "ID_PART_ENTRY_UUID"
Sample output (one per partition):
E: ID_PART_ENTRY_UUID=7e50d064-c0d1-2f48-bc2b-c22bf8a98933
E: ID_PART_ENTRY_UUID=d7276c4a-c944-8249-95fc-5f5845ad8d9e
E: ID_PART_ENTRY_UUID=f8f0bac9-34d0-5a46-b6c0-80a7bed863a1
E: ID_PART_ENTRY_UUID=9aa79a3f-e05a-1c4b-bb81-acd8712e2f8b
E: ID_PART_ENTRY_UUID=b1016c48-2878-3d4d-9933-c3f103450c06
E: ID_PART_ENTRY_UUID=a483e696-afd5-3d43-a8aa-690f125ac70e
Each UUID uniquely identifies a partition and remains stable across reboots.
### Step 2: Create Custom udev Rules
Create the rules file:
/etc/udev/rules.d/99-kvm-storage.rules
Add the following mappings to create stable, human-readable device names:
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="7e50d064-c0d1-2f48-bc2b-c22bf8a98933", SYMLINK+="mirror"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="d7276c4a-c944-8249-95fc-5f5845ad8d9e", SYMLINK+="mdb"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="f8f0bac9-34d0-5a46-b6c0-80a7bed863a1", SYMLINK+="git"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="9aa79a3f-e05a-1c4b-bb81-acd8712e2f8b", SYMLINK+="dcm"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="b1016c48-2878-3d4d-9933-c3f103450c06", SYMLINK+="web"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="a483e696-afd5-3d43-a8aa-690f125ac70e", SYMLINK+="registry"
This creates persistent symlinks that can be safely used by VM orchestration scripts:
/dev/mirror
/dev/mdb
/dev/git
/dev/dcm
/dev/web
/dev/registry
### Step 3: Apply the udev Rules
Apply the new rules and trigger them:
sudo udevadm control --reload-rules
sudo udevadm trigger
You can verify the symlinks with:
ls -l /dev | grep -E "mirror|mdb|git|dcm|web|registry"
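As an illustration of how the orchestration scripts can consume these names, a VM can be attached directly to one of the block devices. The command below is only a sketch; the VM name, sizing, OS variant, and bridge are placeholder values rather than the actual lab configuration:
virt-install --name git-vm --memory 8192 --vcpus 4 \
  --disk path=/dev/git,bus=virtio,format=raw \
  --import --os-variant ubuntu22.04 \
  --network bridge=br2,model=virtio --noautoconsole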
Remove Unwanted Packages
Since this server is dedicated to lab management and does not require Canonical’s Ubuntu Pro or Ubuntu Advantage subscription tooling, these packages can be safely removed to reduce noise and background processes:
sudo apt purge ubuntu-advantage-tools ubuntu-pro-client*
sudo apt autoremove --purge -y
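Optionally verify that nothing related is left behind:
dpkg -l | grep -Ei 'ubuntu-advantage|ubuntu-pro'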
This keeps the system lean and prevents unwanted prompts or background checks related to subscription features.
Disable Transparent Huge Pages (THP)
Transparent Huge Pages (THP) can introduce unpredictable latency in database workloads, especially on systems running PostgreSQL, MongoDB, Redis, or other latency-sensitive services. THP automatically merges standard 4 KB pages into 2 MB huge pages, but the background merging and compaction this requires can cause stalls that hurt consistent performance. PostgreSQL in particular does not benefit from THP; it performs better when huge pages are either disabled or configured explicitly (for example via madvise).
For a standalone PostgreSQL server doing Git + AI/ML metadata workloads, disabling THP helps maintain consistent response times, reduce jitter, and avoid CPU stalls caused by THP compaction, especially under heavy write or mixed workloads.
Create the service:
sudo nano /etc/systemd/system/disable-thp.service
Paste:
[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target local-fs.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/defrag"
[Install]
WantedBy=multi-user.target
Enable the service:
sudo systemctl daemon-reload
sudo systemctl enable --now disable-thp.service
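After the service runs, both sysfs entries should report never as the active setting:
cat /sys/kernel/mm/transparent_hugepage/enabled   # expect: always madvise [never]
cat /sys/kernel/mm/transparent_hugepage/defrag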
Installing the NVIDIA vGPU Host Driver
Note:
This development server includes a basic NVIDIA vGPU-capable card (Tesla P4) installed specifically for proof-of-concept testing. The steps below document the vGPU host driver installation process required to enable mediated device creation and vGPU profiles on this GPU.
### Step 1: Install Required Packages
Install the dependencies needed for building kernel modules and enabling mediated devices:
sudo apt install -y lvm2 linux-headers-$(uname -r) build-essential dkms mdevctl unzip
### Step 2: Download the vGPU Driver Package
Download the NVIDIA vGPU bundle from:
https://ui.licensing.nvidia.com/software
Use the filters:
- Product Type: Ubuntu KVM
- Product Version: 16.11 (vGPU release branch 535)
Example downloaded file:
NVIDIA-GRID-Ubuntu-KVM-535.261.04-535.261.03-539.41.zip
### Step 3: Extract the vGPU Package
unzip NVIDIA-GRID-Ubuntu-KVM-535.261.04-535.261.03-539.41.zip
You will see:
Guest_Drivers/
Host_Drivers/
Signing_Keys/
*.pdf documents
### Step 4: Install the vGPU Host Driver
Navigate to the host driver directory:
cd Host_Drivers/
ls -ltr
You should see:
nvidia-vgpu-ubuntu-535_535.261.04_amd64.deb
Install it:
sudo dpkg -i nvidia-vgpu-ubuntu-535_535.261.04_amd64.deb
Enable vGPU support:
echo "options nvidia NVreg_EnableVGPU=1" | sudo tee /etc/modprobe.d/nvidia-vgpu.conf
### Step 5: Load Required Kernel Modules
Load mdev and vfio kernel modules:
sudo modprobe mdev
sudo modprobe vfio
sudo modprobe vfio_pci
Ensure they load on boot:
echo "mdev" | sudo tee -a /etc/modules-load.d/modules.conf
echo "vfio" | sudo tee -a /etc/modules-load.d/modules.conf
echo "vfio_pci" | sudo tee -a /etc/modules-load.d/modules.conf
### Step 6: Reboot and Verify Driver Installation
Reboot:
sudo reboot
Verify:
nvidia-smi
You should see the Tesla P4 recognized and the vGPU driver loaded successfully.
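With mdevctl installed in Step 1, the vGPU profiles exposed by the host driver can also be listed; the exact profile names and instance counts depend on the GPU and driver branch:
mdevctl types
ls /sys/class/mdev_bus/*/mdev_supported_types/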
Creating vGPU Instances
To create an instance using the nvidia-66 profile (8 GB vGPU profile for Tesla P4):
echo "850989c4-26d1-4029-bdbb-245070cd137c" \
| sudo tee /sys/class/mdev_bus/0000:82:00.0/mdev_supported_types/nvidia-66/create
The UUID written to the create node becomes the identifier of the newly created vGPU (mediated) device.
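To confirm the mediated device exists and to see how many more instances the profile allows (the PCI address matches the one used above):
mdevctl list
cat /sys/class/mdev_bus/0000:82:00.0/mdev_supported_types/nvidia-66/available_instances
cat /sys/class/mdev_bus/0000:82:00.0/mdev_supported_types/nvidia-66/name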
Configure the Mellanox mlx4_core Module
Since this server uses Mellanox ConnectX-3 Pro adapters, the driver is configured to operate both ports in Ethernet mode. Create the following module configuration file:
sudo tee /etc/modprobe.d/mlx4_core.conf > /dev/null <<EOF
options mlx4_core port_type_array=2,2
EOF
The meaning of the option:
port_type_array=1,1 → InfiniBand
port_type_array=2,2 → Ethernet (what is required in this setup)
Update initramfs so the setting is applied on the next reboot:
sudo update-initramfs -u
This ensures both Mellanox ports consistently initialize in Ethernet mode, preventing mode-mismatch issues during early boot and before systemd loads network configuration.
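After the next reboot, confirm both ports came up in Ethernet mode; the Ethernet side of the mlx4 stack is handled by the mlx4_en driver:
ethtool -i enp3s0      # driver should report mlx4_en
ethtool -i enp3s0d1
ip -br link show enp3s0 enp3s0d1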