This blog provides a detailed overview of the physical infrastructure supporting the lab and production-grade deployment environment. The goal of this platform is to create a high-performance, high-availability foundation for virtualized workloads, container-based services, storage clusters, and management applications. The hardware has been selected and organized to deliver predictable performance, strong isolation, and reliable scaling.
1. Dedicated Management Server
A single Dell PowerEdge R630 is allocated exclusively for core data-center management functions. This server hosts all essential internal management services, ensuring they remain isolated from compute and storage workloads.
Key responsibilities of the management server:
- Internal Ubuntu package mirror
- Management database (for orchestration, tracking, and operational metadata)
- Web and FTP services
- Private Docker/OCI registry
- Backup and archival workflows
- Various supporting services required for cluster bootstrap and ongoing operations
This server is equipped with a hardware RAID-10 array built from 8 × 1 TB SSDs (roughly 4 TB usable after mirroring), providing both high throughput and redundancy for critical management data.
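As a rough illustration of how cluster nodes consume these management services, the sketch below renders an APT sources file pointed at the internal mirror and a Docker daemon config pointed at the private registry. The hostname `mgmt.lab.internal`, the registry port, and the Ubuntu release are placeholder assumptions, not the environment's real values.

```python
#!/usr/bin/env python3
"""Minimal sketch: point a cluster node at the management server's
internal APT mirror and private Docker/OCI registry.

The hostname, registry port, and Ubuntu release below are illustrative
assumptions, not values taken from the actual environment."""

import json
from pathlib import Path

MGMT_HOST = "mgmt.lab.internal"   # hypothetical management-server hostname
UBUNTU_RELEASE = "jammy"          # assumed Ubuntu release

def apt_sources() -> str:
    # Point the standard pockets at the internal mirror instead of archive.ubuntu.com.
    pockets = ["", "-updates", "-security"]
    lines = [
        f"deb http://{MGMT_HOST}/ubuntu {UBUNTU_RELEASE}{p} main restricted universe multiverse"
        for p in pockets
    ]
    return "\n".join(lines) + "\n"

def docker_daemon_config() -> str:
    # Only needed if the private registry is not fronted by TLS.
    return json.dumps({"insecure-registries": [f"{MGMT_HOST}:5000"]}, indent=2) + "\n"

if __name__ == "__main__":
    Path("sources.list.internal").write_text(apt_sources())
    Path("daemon.json.internal").write_text(docker_daemon_config())
    print(apt_sources())
    print(docker_daemon_config())
```

In practice, files like these would be pushed out by whatever configuration-management tooling bootstraps the cluster nodes.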
2. Compute, Virtualization & Storage Cluster
The core of the environment consists of four high-performance servers, designed to host:
- A Ceph storage cluster
- KVM hypervisors
- An RKE2-based Kubernetes cluster
Each server plays a dual role—delivering compute capacity for virtual machines and Kubernetes workloads while simultaneously participating in a distributed Ceph storage pool.
Two dedicated storage tiers are defined:
- NVMe pool for high-IOPS workloads
- SSD pool for capacity and balanced throughput
All VM disk requirements are fulfilled using Ceph RBD (RADOS Block Device), enabling high availability, live migration flexibility, and unified storage management across the cluster.
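As a minimal sketch of that workflow, the snippet below provisions a thin RBD image for a new VM disk using the Ceph Python bindings (python3-rados / python3-rbd). The pool names `rbd-nvme` and `rbd-ssd` and the image-naming convention are assumptions for illustration, not the cluster's actual names.

```python
#!/usr/bin/env python3
"""Minimal sketch: provision an RBD image for a new VM disk using the
Ceph Python bindings (python3-rados / python3-rbd).

The pool names and image-naming convention are illustrative assumptions."""

import rados
import rbd

POOLS = {"nvme": "rbd-nvme", "ssd": "rbd-ssd"}   # assumed tier -> pool mapping

def create_vm_disk(vm_name: str, size_gib: int, tier: str = "ssd") -> str:
    """Create a thin-provisioned RBD image and return '<pool>/<image>'."""
    image_name = f"{vm_name}-disk0"
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOLS[tier])
        try:
            # RBD images are thin-provisioned; size is given in bytes.
            rbd.RBD().create(ioctx, image_name, size_gib * 1024**3)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
    return f"{POOLS[tier]}/{image_name}"

if __name__ == "__main__":
    print(create_vm_disk("web01", size_gib=40, tier="nvme"))
```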
3. Network Architecture
Every server in the cluster is equipped with the following network interfaces:
- 4 × 10 GbE NICs
- 2 × 40 GbE NICs (Mellanox)
The network has been segmented with clear traffic isolation to avoid contention and ensure low-latency performance across storage, compute, and management layers.
Network Interface Allocation
| Interface | Purpose |
|---|---|
| 40G NIC #1 | Dedicated Ceph cluster backend network |
| 40G NIC #2 | Application database cluster synchronization traffic |
| 10G NIC #1 | Ceph north-bound access (client access to storage) |
| 10G NIC #2 | Application DB north-bound access |
| 10G NIC #3 | VM management and hypervisor control plane |
| 10G NIC #4 | Worker-node and Kubernetes overlay traffic |
This separation ensures deterministic performance even under load, with each subsystem receiving its own physical bandwidth and path.
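To make the split concrete, the sketch below renders that allocation into a netplan-style fragment with one ethernet stanza per role. The interface names, subnets, and static addressing scheme are purely illustrative assumptions; only the one-role-per-NIC layout mirrors the table above, and the jumbo-frame MTU is covered in the next section.

```python
#!/usr/bin/env python3
"""Minimal sketch: render the NIC allocation table into a netplan-style
fragment, one ethernet stanza per role.

Interface names, subnets, and addressing are illustrative assumptions;
only the role-per-NIC split mirrors the allocation table."""

# role -> (assumed interface name, assumed subnet, MTU)
ALLOCATION = {
    "ceph-cluster": ("enp94s0f0", "10.10.40.0/24", 9000),  # 40G NIC #1
    "db-sync":      ("enp94s0f1", "10.10.41.0/24", 9000),  # 40G NIC #2
    "ceph-public":  ("eno1",      "10.10.10.0/24", 9000),  # 10G NIC #1
    "db-access":    ("eno2",      "10.10.11.0/24", 9000),  # 10G NIC #2
    "vm-mgmt":      ("eno3",      "10.10.12.0/24", 9000),  # 10G NIC #3
    "k8s-overlay":  ("eno4",      "10.10.13.0/24", 9000),  # 10G NIC #4
}

def render_netplan(host_octet: int) -> str:
    """Build a netplan 'ethernets' document with a static address per role."""
    lines = ["network:", "  version: 2", "  ethernets:"]
    for role, (iface, subnet, mtu) in ALLOCATION.items():
        base = subnet.rsplit(".", 1)[0]        # e.g. "10.10.40"
        prefix = subnet.split("/")[1]          # e.g. "24"
        lines += [
            f"    {iface}:   # {role}",
            f"      mtu: {mtu}",
            f"      addresses: [{base}.{host_octet}/{prefix}]",
        ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(render_netplan(host_octet=11))
```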
4. Switching & Connectivity
A single Arista 7050QX-30S switch forms the backbone of the environment, providing both 40G and 10G ports to carry high-speed storage traffic as well as standard data-plane traffic.
Key connectivity features:
- 40G to 4×10G breakout AOC cables for attaching 10G server ports directly to the Arista switch
- 40G QSFP Active Optical Cables for direct connections between Mellanox 40G NICs and the switch
- All interfaces configured with MTU 9000 (jumbo frames) to optimize Ceph, VM migration, and Kubernetes overlay performance; a quick end-to-end validation is sketched below
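A simple way to confirm that the jumbo-frame configuration holds end-to-end is a do-not-fragment ping sized to the largest payload a 9000-byte MTU can carry (9000 bytes minus a 20-byte IP header and an 8-byte ICMP header leaves 8972 bytes). The sketch below wraps the standard Linux iputils `ping`; the peer addresses are hypothetical.

```python
#!/usr/bin/env python3
"""Minimal sketch: verify that jumbo frames (MTU 9000) pass end-to-end
by sending a do-not-fragment ICMP payload of 8972 bytes
(9000 - 20-byte IP header - 8-byte ICMP header).

Uses the Linux iputils ping flags -M do (forbid fragmentation) and
-s (payload size). The peer addresses are illustrative assumptions."""

import subprocess

PEERS = {
    "ceph-backend": "10.10.40.12",   # hypothetical peer on the 40G Ceph network
    "ceph-public":  "10.10.10.12",   # hypothetical peer on the 10G client network
}

def jumbo_ok(address: str, payload: int = 8972) -> bool:
    """Return True if a non-fragmented ping of `payload` bytes succeeds."""
    result = subprocess.run(
        ["ping", "-c", "3", "-M", "do", "-s", str(payload), address],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    for name, addr in PEERS.items():
        status = "OK" if jumbo_ok(addr) else "FAILED (check switch/NIC MTU)"
        print(f"{name:13s} {addr:15s} jumbo frames: {status}")
```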
5. Firewall & Gateway
A Ubiquiti UDM Pro is deployed as the edge firewall and primary gateway for the environment. It provides:
- WAN routing
- Firewall segmentation
- VPN access
- Traffic monitoring
- Internal VLAN gateway services
The UDM Pro uplinks to the Arista switch via a 10G SFP+ link, providing high-throughput north-south connectivity as well as inter-VLAN (east-west) routing for the internal networks.
6. Virtualization Platform
KVM is the hypervisor of choice for all virtual machine workloads.
A custom-built Hypervisor Management Solution orchestrates:
- VM creation and lifecycle
- vCPU and memory allocation
- Image handling
- vGPU support (where applicable)
- RBD-backed storage assignments
- Automated configuration for networks and bridge mappings
This approach provides full control of the virtualization environment without relying on heavy external platforms.
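As a sketch of what one such orchestration step looks like at the libvirt level, the snippet below defines and boots a KVM guest whose disk is an RBD image, using the libvirt Python bindings (python3-libvirt). The monitor hostname, Ceph auth secret UUID, bridge name, pool/image name, and VM sizing are placeholders rather than values from the real management solution.

```python
#!/usr/bin/env python3
"""Minimal sketch: define and start a KVM guest backed by an RBD image,
via the libvirt Python bindings (python3-libvirt).

The monitor hostname, Ceph secret UUID, bridge name, pool/image name,
and VM sizing are placeholders, not values from the real environment."""

import libvirt

DOMAIN_XML = """
<domain type='kvm'>
  <name>web01</name>
  <memory unit='GiB'>4</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64' machine='q35'>hvm</type></os>
  <devices>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source protocol='rbd' name='rbd-nvme/web01-disk0'>
        <host name='ceph-mon1.lab.internal' port='6789'/>
      </source>
      <auth username='libvirt'>
        <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
      </auth>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br-vm-mgmt'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""

def define_and_start(xml: str) -> None:
    conn = libvirt.open("qemu:///system")   # connect to the local KVM hypervisor
    try:
        dom = conn.defineXML(xml)           # persist the domain definition
        dom.create()                        # boot the guest
        print(f"started {dom.name()} (id {dom.ID()})")
    finally:
        conn.close()

if __name__ == "__main__":
    define_and_start(DOMAIN_XML)
```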
7. Storage Layer: Ceph Cluster
The Ceph cluster spans the four compute nodes, forming a distributed storage backend with:
- NVMe OSDs for high-performance, low-latency operations
- SSD OSDs for general storage pools
All storage for virtual machines, container workloads, and application services is exposed as Ceph RBD block devices, enabling:
- Redundant and self-healing storage
- Transparent failover
- Uniform storage consumption across compute nodes
- Simplified scaling by adding OSDs or nodes
This architecture ensures resilience and performance for both VM and Kubernetes workloads.
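As a sketch of how the two tiers could be enforced at the pool level, the snippet below creates one CRUSH rule per device class (nvme, ssd) and one replicated pool bound to each rule, by wrapping the standard `ceph` CLI. Pool names, PG counts, and replica size are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Minimal sketch: create one replicated pool per device class so RBD
images land on the intended tier. Wraps the standard `ceph` CLI; pool
names, PG counts, and replica size are illustrative assumptions."""

import subprocess

POOLS = {
    # pool name -> (CRUSH device class, pg_num)
    "rbd-nvme": ("nvme", 128),
    "rbd-ssd":  ("ssd", 256),
}

def ceph(*args: str) -> None:
    """Run a ceph CLI command and raise if it exits non-zero."""
    subprocess.run(["ceph", *args], check=True)

def create_tiered_pools(replica_size: int = 3) -> None:
    for pool, (device_class, pg_num) in POOLS.items():
        rule = f"replicated-{device_class}"
        # CRUSH rule limited to OSDs of one device class, host failure domain.
        ceph("osd", "crush", "rule", "create-replicated",
             rule, "default", "host", device_class)
        # Replicated pool bound to that rule.
        ceph("osd", "pool", "create", pool, str(pg_num), str(pg_num),
             "replicated", rule)
        ceph("osd", "pool", "set", pool, "size", str(replica_size))
        # Tag the pool so Ceph knows it is used for RBD.
        ceph("osd", "pool", "application", "enable", pool, "rbd")

if __name__ == "__main__":
    create_tiered_pools()
```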
Conclusion
This hardware platform provides a robust foundation for infrastructure automation, virtual machine orchestration, container platform deployment, and distributed storage operations.
By combining high-speed networking, redundant storage, compute density, and strong isolation between management and workload traffic, the environment is designed to scale predictably and operate efficiently under demanding usage.
A clean separation of roles—management, compute, storage, networking, and control—ensures the long-term maintainability and reliability of the entire system.