Management Server for Home Lab

In a home lab that also serves as an R&D environment, frequent rebuilds, upgrades, and full reinstall cycles are inevitable. To avoid losing critical documentation, Docker repositories, source code, and other persistent assets, I decided to dedicate one physical server exclusively for management. This system now hosts all essential services and a lightweight K3s cluster used to test container images before pushing them to the production RKE2 environment.

The management server itself is a 10-bay Dell R630 configured for high reliability. It uses three 2.5″ HDDs in a RAID 5 array for the operating system and seven 1 TB SSDs in a RAID 10 virtual disk dedicated to running VMs. This layout provides both performance and redundancy, ensuring that critical data and essential services remain protected even in the event of disk failures.

From past experience with KVM hosts, I learned that VM virtual disks backed by hardware RAID do not always retain consistent device naming after a server reboot. This inconsistency can break automation and orchestration workflows. To address this, I created udev rules that match each disk’s ID_PART_ENTRY_UUID and map it to a stable, human-friendly device name that corresponds to the VM name. This ensures reliable device paths that can be safely used by the VM orchestration scripts.

This post captures all the steps involved—starting from the base OS installation—required to configure and prepare this server as the dedicated management backbone of the lab.

Remove snapd Completely

Snap introduces background services and timers not required for production VMs.

List installed snaps:

snap list

Remove them:

snap remove lxd
snap remove core20
snap remove snapd

Uninstall snapd:

apt purge snapd
rm -rf /root/snap/

Disable Swap

Swap is not required for our workload profile.

Check swap units:

systemctl list-units | grep swap

Stop, disable, and mask all swap units:

systemctl stop swap.target
systemctl disable swap.target
systemctl mask swap.target
swapoff -a
rm -f /swap.img

Edit /etc/fstab and comment out any swap entries.
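
If you prefer to script this step, a minimal sed one-liner along the following lines comments out any fstab line that mounts swap (it assumes the entry uses the literal filesystem type "swap"):

sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab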

Configure NIC and Bridge Interfaces

To support high-performance networking and KVM virtualization, all NICs are configured with an MTU of 9000. The system includes four 10 GbE interfaces (Intel X710: eno1–eno4) and two 40 GbE interfaces (Mellanox ConnectX-3 Pro: enp3s0 and enp3s0d1). Since this server hosts multiple VMs, each physical interface is bridged to provide direct, high-throughput connectivity to the guests.

The first interface (eno1) serves as the management network and uses a /16 address. This is intentional because the default route (via 10.0.0.1, the UDM Pro security gateway) resides on the same network. All remaining interfaces are assigned /24 subnets so that east-west traffic stays localized and is switched internally by the Arista 7050QX.

Below is the complete netplan configuration:

# /etc/netplan/50-cloud-init.yaml
# This file is generated from cloud-init data. To disable cloud-init
# network configuration, create:
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
# with: network: {config: disabled}

network:
  version: 2
  ethernets:
    eno1: { mtu: 9000 }
    eno2: { mtu: 9000 }
    eno3: { mtu: 9000 }
    eno4: { mtu: 9000 }
    enp3s0: { mtu: 9000 }
    enp3s0d1: { mtu: 9000 }

  bridges:
    br1:
      interfaces: [eno1]
      addresses: [10.0.1.5/16]
      routes:
        - to: default
          via: 10.0.0.1
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000

    br2:
      interfaces: [eno2]
      addresses: [10.0.2.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000

    br3:
      interfaces: [eno3]
      addresses: [10.0.3.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000

    br4:
      interfaces: [eno4]
      addresses: [10.0.4.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000

    br5:
      interfaces: [enp3s0]
      addresses: [10.0.5.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000

    br6:
      interfaces: [enp3s0d1]
      addresses: [10.0.6.5/24]
      parameters:
        stp: false
        forward-delay: 0
      mtu: 9000
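
Validate and apply the configuration. netplan try applies the settings with an automatic rollback if the change is not confirmed, which is useful when reconfiguring the management interface remotely:

sudo netplan try      # apply with automatic rollback if not confirmed
sudo netplan apply    # apply permanently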

The resulting routing table reflects the designated default route and the isolated /24 subnets:

default via 10.0.0.1 dev br1 proto static
10.0.0.0/16 dev br1 proto kernel scope link src 10.0.1.5
10.0.2.0/24 dev br2 proto kernel scope link src 10.0.2.5
10.0.3.0/24 dev br3 proto kernel scope link src 10.0.3.5
10.0.4.0/24 dev br4 proto kernel scope link src 10.0.4.5
10.0.5.0/24 dev br5 proto kernel scope link src 10.0.5.5
10.0.6.0/24 dev br6 proto kernel scope link src 10.0.6.5
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown

Disable Cloud-Init Networking

Since the network configuration is fully managed through custom Netplan files, cloud-init’s networking component must be disabled to prevent it from overwriting settings on reboot. The following steps disable cloud-init networking and clean any previous state:

# Disable cloud-init from managing network configuration
sudo tee /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg > /dev/null <<EOF
network: {config: disabled}
EOF

# Clean cloud-init state and logs
sudo cloud-init clean --logs

Configure NTP

To ensure accurate time synchronization—critical for logs, certificates, KVM operations, and distributed systems—the server is configured to use Google’s public NTP service with Ubuntu’s NTP servers as fallback. The timezone is also set to match the local region.

# Set system timezone
sudo timedatectl set-timezone "Asia/Kolkata"

# Configure primary and fallback NTP servers
sudo sed -i "s/#NTP=/NTP=time.google.com/g" /etc/systemd/timesyncd.conf
sudo sed -i "s/#FallbackNTP=ntp.ubuntu.com/FallbackNTP=ntp.ubuntu.com/g" /etc/systemd/timesyncd.conf

# Reload and restart the time synchronization service
sudo systemctl daemon-reload
sudo systemctl stop systemd-timesyncd.service
sudo systemctl start systemd-timesyncd.service

You can verify synchronization using:

timedatectl status
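
On recent systemd releases, timedatectl timesync-status additionally shows the active NTP server and polling interval:

timedatectl timesync-status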

Configure System Limits and Core Settings

Set Maximum Journal File Size

To prevent uncontrolled growth of systemd journal logs, configure a maximum file size of 512 MB:

sudo sed -i "s/#SystemMaxFileSize.*/SystemMaxFileSize=512M/g" /etc/systemd/journald.conf

Increase File Descriptor and Process Limits

For a server that runs KVM, containers, orchestration scripts, and various management services, increasing the maximum number of open files and processes is essential:

echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nproc 65536"  | sudo tee -a /etc/security/limits.conf
echo "* soft nproc 65536"  | sudo tee -a /etc/security/limits.conf

Disable Nouveau and Prepare for NVIDIA Drivers

This server uses NVIDIA GPUs and is part of a vGPU-enabled environment, so blacklist the nouveau driver to avoid conflicts:

echo -e "blacklist nouveau\noptions nouveau modeset=0" | sudo tee /etc/modprobe.d/disable-nouveau.conf
sudo update-initramfs -u
sudo reboot
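
After the reboot, confirm that nouveau is no longer loaded; the command should produce no output:

lsmod | grep nouveau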

Disable Unwanted and Automated Services

To prevent background package updates, firmware refresh jobs, and automated MOTD tasks from interfering with lab automation or controlled upgrade cycles, disable and mask them:

sudo systemctl stop unattended-upgrades.service \
  apt-daily-upgrade.timer apt-daily.timer \
  fwupd-refresh.timer motd-news.timer \
  update-notifier-download.timer update-notifier-motd.timer

sudo systemctl disable unattended-upgrades.service \
  apt-daily-upgrade.timer apt-daily.timer \
  fwupd-refresh.timer motd-news.timer \
  update-notifier-download.timer update-notifier-motd.timer

sudo systemctl mask unattended-upgrades.service \
  apt-daily-upgrade.timer apt-daily.timer \
  fwupd-refresh.timer motd-news.timer \
  update-notifier-download.timer update-notifier-motd.timer
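
To verify, query the unit state; each unit should now report masked:

systemctl is-enabled unattended-upgrades.service \
  apt-daily-upgrade.timer apt-daily.timer \
  fwupd-refresh.timer motd-news.timer \
  update-notifier-download.timer update-notifier-motd.timer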

Configuring udev Rules

### Step 1: Extract Partition UUIDs

For each partition in the VM storage RAID, extract the ID_PART_ENTRY_UUID value using udevadm:

udevadm info --query=all --name=/dev/sdb1 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb2 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb3 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb4 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb5 | grep "ID_PART_ENTRY_UUID"
udevadm info --query=all --name=/dev/sdb6 | grep "ID_PART_ENTRY_UUID"

Sample output (one per partition):

E: ID_PART_ENTRY_UUID=7e50d064-c0d1-2f48-bc2b-c22bf8a98933
E: ID_PART_ENTRY_UUID=d7276c4a-c944-8249-95fc-5f5845ad8d9e
E: ID_PART_ENTRY_UUID=f8f0bac9-34d0-5a46-b6c0-80a7bed863a1
E: ID_PART_ENTRY_UUID=9aa79a3f-e05a-1c4b-bb81-acd8712e2f8b
E: ID_PART_ENTRY_UUID=b1016c48-2878-3d4d-9933-c3f103450c06
E: ID_PART_ENTRY_UUID=a483e696-afd5-3d43-a8aa-690f125ac70e

Each UUID uniquely identifies a partition and remains stable across reboots.


### Step 2: Create Custom udev Rules

Create the rules file:

/etc/udev/rules.d/99-kvm-storage.rules

Add the following mappings to create stable, human-readable device names:

SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="7e50d064-c0d1-2f48-bc2b-c22bf8a98933", SYMLINK+="mirror"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="d7276c4a-c944-8249-95fc-5f5845ad8d9e", SYMLINK+="mdb"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="f8f0bac9-34d0-5a46-b6c0-80a7bed863a1", SYMLINK+="git"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="9aa79a3f-e05a-1c4b-bb81-acd8712e2f8b", SYMLINK+="dcm"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="b1016c48-2878-3d4d-9933-c3f103450c06", SYMLINK+="web"
SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="a483e696-afd5-3d43-a8aa-690f125ac70e", SYMLINK+="registry"

This creates persistent symlinks that can be safely used by VM orchestration scripts:

/dev/mirror  
/dev/mdb  
/dev/git  
/dev/dcm  
/dev/web  
/dev/registry

### Step 3: Apply the udev Rules

Apply the new rules and trigger them:

sudo udevadm control --reload-rules
sudo udevadm trigger

You can verify the symlinks with:

ls -l /dev | grep -E "mirror|mdb|git|dcm|web|registry"
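
As an example of how these device names plug into VM provisioning, the sketch below creates a guest whose root disk is the stable /dev/git path. The VM name, sizing, OS variant, and bridge are illustrative values, not part of the original orchestration scripts:

sudo virt-install \
  --name git \
  --memory 8192 \
  --vcpus 4 \
  --import \
  --disk path=/dev/git,bus=virtio \
  --os-variant ubuntu22.04 \
  --network bridge=br2,model=virtio \
  --graphics none \
  --noautoconsole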

Remove Unwanted Packages

Since this server is dedicated to lab management and does not require Canonical’s Ubuntu Pro or Ubuntu Advantage subscription tooling, these packages can be safely removed to reduce noise and background processes:

sudo apt purge ubuntu-advantage-tools ubuntu-pro-client*
sudo apt autoremove --purge -y

This keeps the system lean and prevents unwanted prompts or background checks related to subscription features.

Disable Transparent Huge Pages (THP)

Transparent Huge Pages (THP) can introduce unpredictable latency in database workloads, especially on systems running PostgreSQL, MongoDB, Redis, or other latency-sensitive services. THP tries to automatically merge standard 4 KB pages into 2 MB huge pages, but this background merging/compaction can cause stalls that negatively affect consistent performance. PostgreSQL does not benefit from THP; it only takes advantage of huge pages when they are explicitly configured (or requested via madvise).

For a standalone PostgreSQL server doing Git + AI/ML metadata workloads, disabling THP helps maintain consistent response times, reduce jitter, and avoid CPU stalls caused by THP compaction, especially under heavy write or mixed workloads.

Create the service

sudo nano /etc/systemd/system/disable-thp.service

Paste:

[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/defrag"

[Install]
WantedBy=multi-user.target

Enable the service

sudo systemctl daemon-reload
sudo systemctl enable --now disable-thp.service
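
Verify that THP is disabled; the active value is shown in brackets:

cat /sys/kernel/mm/transparent_hugepage/enabled
# expected: always madvise [never]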

Installing the NVIDIA vGPU Host Driver

Note:
This development server includes a basic NVIDIA vGPU-capable card (Tesla P4) installed specifically for proof-of-concept testing. The steps below document the vGPU host driver installation process required to enable mediated device creation and vGPU profiles on this GPU.


### Step 1: Install Required Packages

Install the dependencies needed for building kernel modules and enabling mediated devices:

sudo apt install -y lvm2 linux-headers-$(uname -r) build-essential dkms mdevctl unzip

### Step 2: Download the vGPU Driver Package

Download the NVIDIA vGPU bundle from:

https://ui.licensing.nvidia.com/software

Use the filters:

  • Product Type: Ubuntu KVM
  • Product Version: 16.11 (vGPU release branch 535)

Example downloaded file:

NVIDIA-GRID-Ubuntu-KVM-535.261.04-535.261.03-539.41.zip

### Step 3: Extract the vGPU Package

unzip NVIDIA-GRID-Ubuntu-KVM-535.261.04-535.261.03-539.41.zip

You will see:

Guest_Drivers/
Host_Drivers/
Signing_Keys/
*.pdf documents

### Step 4: Install the vGPU Host Driver

Navigate to the host driver directory:

cd Host_Drivers/
ls -ltr

You should see:

nvidia-vgpu-ubuntu-535_535.261.04_amd64.deb

Install it:

sudo dpkg -i nvidia-vgpu-ubuntu-535_535.261.04_amd64.deb

Enable vGPU support:

echo "options nvidia NVreg_EnableVGPU=1" | sudo tee /etc/modprobe.d/nvidia-vgpu.conf

### Step 5: Load Required Kernel Modules

Load mdev and vfio kernel modules:

sudo modprobe mdev
sudo modprobe vfio
sudo modprobe vfio_pci

Ensure they load on boot:

echo "mdev"      | sudo tee -a /etc/modules-load.d/modules.conf
echo "vfio"      | sudo tee -a /etc/modules-load.d/modules.conf
echo "vfio_pci"  | sudo tee -a /etc/modules-load.d/modules.conf

### Step 6: Reboot and Verify Driver Installation

Reboot:

sudo reboot

Verify:

nvidia-smi

You should see the Tesla P4 recognized and the vGPU driver loaded successfully.


Creating vGPU Instances

To create an instance using the nvidia-66 profile (8 GB vGPU profile for Tesla P4):

echo "850989c4-26d1-4029-bdbb-245070cd137c" \
  | sudo tee /sys/class/mdev_bus/0000:82:00.0/mdev_supported_types/nvidia-66/create

The UUID written to the create node becomes the identifier of the new mediated (vGPU) device.
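
To confirm the device exists, list mediated devices with mdevctl (installed in Step 1) or check how many instances of the profile remain available:

sudo mdevctl list
cat /sys/class/mdev_bus/0000:82:00.0/mdev_supported_types/nvidia-66/available_instances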

Configure Mellanox mlx4_core Module (if applicable)

This server uses Mellanox ConnectX-3 Pro adapters, so configure the driver to operate both ports in Ethernet mode. Create the following module configuration file:

sudo tee /etc/modprobe.d/mlx4_core.conf > /dev/null <<EOF
options mlx4_core port_type_array=2,2
EOF

The port_type_array setting selects the port personality:

  • port_type_array=1,1 → InfiniBand
  • port_type_array=2,2 → Ethernet (what is required in this setup)

Update initramfs so the setting is applied on the next reboot:

sudo update-initramfs -u

This ensures both Mellanox ports consistently initialize in Ethernet mode, preventing mode-mismatch issues during early boot and before systemd loads network configuration.
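
After the next reboot you can confirm that both interfaces came up as Ethernet devices; on ConnectX-3 adapters the Ethernet personality is served by the mlx4_en driver, so ethtool should report it:

ethtool -i enp3s0 | grep driver
ethtool -i enp3s0d1 | grep driver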