I have been working on developing microservices as part of an application deployed on K8S. More time and effort goes into building resiliency and handling failure than building application features. One of the key areas where challenges were observed was related to storage, which is the backbone of any application.
Searching the net, I concluded that a robust distributed storage system, one that is scalable, highly available, and easy to integrate with K8S-based applications, would abstract away the storage-related challenges we had encountered. A further, more focused search identified Ceph as a prime candidate. Deployed on decent hardware, it can provide a production-grade storage platform for on-premise deployments at SMEs.
Key objectives:
- The storage cluster should satisfy the applications’ IOPS requirements.
- Time-tested CSI driver support for the storage is a must.
- The CSI driver’s over-the-wire encryption support will be an added advantage.
- The hardware requirements should not be high or prohibitive for SMEs planning to opt for on-premise deployments.
- It should be open-source with a decent number of production deployments.
- Active community support and decent documentation are required.
- Initial deployment complexities are not an issue as they would be one-time and can be documented.
- Version upgrades and security patch application should be possible.
- It would be an added advantage if these procedures were simple and time-tested.
- Should support dynamic volume expansion.
- Though pod/StatefulSet migration is a K8S feature, it should be automatic and consistent when a server or worker node fails.
The decision is to use Ceph RBD rather than CephFS. I am new to Ceph and am starting with this decision. All worker node VMs will be backed by block storage in Ceph RBD, and the Ceph CSI driver (RBD) will provision RBD block images for PV requirements.
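As a rough illustration of how this plugs into K8S, a StorageClass along the following lines would let the Ceph CSI RBD driver provision block-backed PVs. This is only a sketch: the pool name, secret names, namespace, and clusterID are placeholders, and the exact parameter set depends on the ceph-csi release in use.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: replace-with-ceph-fsid        # placeholder: the Ceph cluster fsid
  pool: k8s-rbd                            # placeholder: RBD pool created for K8S
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
reclaimPolicy: Delete
allowVolumeExpansion: true                 # supports the dynamic volume expansion objective
EOF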
Available hardware: 4 x Dell R630, each with the following configuration:
- Intel X710 daughter board – 4 x 10G SFP+
- Intel X520-DA2 adapter – 2 x 10G SFP+ (Slot 2)
- Dual M.2 NVMe PCIe 3 adapter (x4 bifurcation) – Slots 1 & 3
- 4 x Samsung 970 EVO Plus NVMe
- 4 x 10K RPM 1.2 TB SAS drives in RAID 10 (PERC H730P Mini) for Boot / OS
MikroTik CRS326-24S+2Q+RM
- Bridge mode
- 9000 MTU
  - Ensures maximum utilisation of the available network throughput
  - Consistent 9.86 Gbps in iperf3 test results
- 20 SFP+ ports connected to the servers (5 ports each)
  - Direct attach cables
- 1 SFP+ port connected to the TP-Link router (uplink)
  - Direct attach cable
TP-Link ER8411
- 10G SFP+ LAN Port connected to Cloud Router Switch
- 10G SFP+ WAN Port connected to Gateway UTM device
- The UTM device's gateway port is 1G RJ45, so a MikroTik S+RJ10 copper module was used
Deployment
2 x 10G NICs dedicated to Ceph storage, one for the public network and one for the cluster network
- 10.0.4.0/24 – public network
- 10.0.5.0/24 – cluster network
2 x 10G NICs dedicated to the Percona XtraDB Cluster, one for application access and one for cluster sync
- 10.0.2.0/24 – Service, application access
- 10.0.3.0/24 – Cluster sync
1 x 10G NIC for server management and accessing VMs
- 10.0.1.0/24 – Servers and VM Management / Access
Ubuntu 24.04 was chosen as the OS: Ceph libraries are available in its repositories, and it receives periodic security updates.
Install Ubuntu 24.04 Server. Some of the post-installation steps performed:
Log in with the user account created during installation and set a password for the root account
sudo passwd
Allow remote root login and disable strict host-key checking (reverted after installation)
sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/g" /etc/ssh/sshd_config
sed -i "s/#PubkeyAuthentication/PubkeyAuthentication/g" /etc/ssh/sshd_config
sed -i "s/#AuthorizedKeysFile/AuthorizedKeysFile/g" /etc/ssh/sshd_config
sed -i "s/# StrictHostKeyChecking ask/ StrictHostKeyChecking no/g" /etc/ssh/ssh_config
sed -i "s/session optional pam_motd.so/#session optional pam_motd.so/g" /etc/pam.d/sshd
sed -i "s/session optional pam_motd.so/#session optional pam_motd.so/g" /etc/pam.d/sshd
service ssh restart
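For reference, reverting these relaxed settings after the installation is essentially the reverse of the above (a sketch, assuming the files were not otherwise changed):
sed -i "s/^PermitRootLogin yes/#PermitRootLogin prohibit-password/" /etc/ssh/sshd_config
sed -i -E "s/^[[:space:]]*StrictHostKeyChecking no/#   StrictHostKeyChecking ask/" /etc/ssh/ssh_config
service ssh restart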
Log out and log back in as the root user.
Set the required timezone and configure NTP.
timedatectl set-timezone "Asia/Kolkata"
sed -i "s/#NTP=/NTP=time\.google\.com/g" /etc/systemd/timesyncd.conf
sed -i "s/#FallbackNTP=ntp.ubuntu.com/FallbackNTP=ntp\.ubuntu\.com/g" /etc/systemd/timesyncd.conf
systemctl enable --now systemd-time-wait-sync.service
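Restart systemd-timesyncd so the new servers are picked up, then confirm synchronisation:
systemctl restart systemd-timesyncd
timedatectl timesync-status
timedatectl status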
Configure systemd-resolved to build /etc/resolv.conf during startup
ln -fs /run/systemd/resolve/resolv.conf /etc/resolv.conf
sed -i "s/^\#DNS.*/DNS=10.0.0.1/g" /etc/systemd/resolved.conf
sed -i "s/^\#Fallback.*/FallbackDNS=8.8.8.8/g" /etc/systemd/resolved.conf
systemctl restart systemd-resolved
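A quick check that resolution now goes through the configured server (10.0.0.1 should show up as the current DNS server):
resolvectl status
resolvectl query ubuntu.com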
Edit /etc/netplan/50-cloud-init.yaml to assign the static addresses and set the MTU to 9000 on all interfaces
network:
  ethernets:
    eno1np0:                  # server management / VM access
      addresses:
        - 10.0.1.1/16
      mtu: 9000
      gateway4: 10.0.0.1
    eno2np1:                  # Percona service / application access
      addresses:
        - 10.0.2.1/24
      mtu: 9000
    eno3np2:                  # Percona cluster sync
      addresses:
        - 10.0.3.1/24
      mtu: 9000
    eno4np3:                  # Ceph public network
      addresses:
        - 10.0.4.1/24
      mtu: 9000
    enp129s0f0:               # Ceph cluster network
      addresses:
        - 10.0.5.1/24
      mtu: 9000
  version: 2
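Apply the configuration and verify that jumbo frames make it across (8972 bytes of ICMP payload plus 28 bytes of headers equals the 9000-byte MTU); 10.0.5.3 here is server3 on the cluster network:
netplan apply
ip link | grep "mtu 9000"
ping -M do -c 3 -s 8972 10.0.5.3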
Disable Swap
systemctl stop swap.img.swap swap.target
rm -f /swap.img
sed -i '/swap\.img/d' /etc/fstab
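Confirm that swap is gone:
swapon --show
free -h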
Limit journal file size
sed -i "s/#SystemMaxFileSize.*/SystemMaxFileSize=512M/g" /etc/systemd/journald.conf
Set max open files and process counts (this may not be required; needs review).
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nproc 65536" >> /etc/security/limits.conf
echo "* soft nproc 65536" >> /etc/security/limits.conf
Stop and disable cloud-init, AppArmor, and all timers related to package updates
touch /etc/cloud/cloud-init.disabled
systemctl stop apparmor cloud-init cloud-init-local
systemctl stop apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer
systemctl stop motd-news.timer update-notifier-download.timer update-notifier-motd.timer
systemctl disable apparmor cloud-init cloud-init-local
systemctl disable apt-daily-upgrade.timer apt-daily.timer fwupd-refresh.timer
systemctl disable motd-news.timer update-notifier-download.timer update-notifier-motd.timer
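A quick sanity check that nothing update-related is still scheduled:
systemctl list-timers --all | grep -E "apt|update|motd|fwupd"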
Update the Ubuntu repository and upgrade packages
apt update -y
apt upgrade -y
Install packages that come in handy
apt install -y net-tools iperf3
Disable IPv6 and enable huge pages: Edit /etc/default/grub and change the contents of one line
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 hugepagesz=1G hugepages=X transparent_hugepage=never"
update-grub
Note: verified that the CPUs support 1G huge pages (the pdpe1gb flag)
root@server1:~# lscpu | grep -E 'pdpe1gb'
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
root@server1:~#
Reboot the server
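After the reboot, confirm that the new kernel parameters are active:
cat /proc/cmdline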
Basic iperf3 check between two servers (1 and 3)
root@server1:~# iperf3 -s -B 10.0.5.1
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 10.0.5.3, port 51892
[ 5] local 10.0.5.1 port 5201 connected to 10.0.5.3 port 51898
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.14 GBytes 9.79 Gbits/sec
[ 5] 1.00-2.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 2.00-3.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 3.00-4.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 4.00-5.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 5.00-6.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 6.00-7.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 7.00-8.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 8.00-9.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 9.00-10.00 sec 1.15 GBytes 9.87 Gbits/sec
[ 5] 10.00-10.00 sec 640 KBytes 9.66 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 11.5 GBytes 9.86 Gbits/sec receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
root@server3:~# iperf3 -c 10.0.5.1
Connecting to host 10.0.5.1, port 5201
[ 5] local 10.0.5.3 port 51898 connected to 10.0.5.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.14 GBytes 9.82 Gbits/sec 11 1.56 MBytes
[ 5] 1.00-2.00 sec 1.15 GBytes 9.87 Gbits/sec 0 1.60 MBytes
[ 5] 2.00-3.00 sec 1.15 GBytes 9.86 Gbits/sec 0 1.63 MBytes
[ 5] 3.00-4.00 sec 1.15 GBytes 9.86 Gbits/sec 0 1.65 MBytes
[ 5] 4.00-5.00 sec 1.15 GBytes 9.87 Gbits/sec 2 1.72 MBytes
[ 5] 5.00-6.00 sec 1.15 GBytes 9.87 Gbits/sec 1 1.72 MBytes
[ 5] 6.00-7.00 sec 1.15 GBytes 9.87 Gbits/sec 0 1.74 MBytes
[ 5] 7.00-8.00 sec 1.15 GBytes 9.86 Gbits/sec 0 1.76 MBytes
[ 5] 8.00-9.00 sec 1.15 GBytes 9.87 Gbits/sec 2 1.79 MBytes
[ 5] 9.00-10.00 sec 1.15 GBytes 9.86 Gbits/sec 0 1.81 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 11.5 GBytes 9.87 Gbits/sec 16 sender
[ 5] 0.00-10.00 sec 11.5 GBytes 9.86 Gbits/sec receiver
iperf Done.
root@server3:~#
root@server1:~# cat /proc/meminfo | grep "HugePages"
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 3072
HugePages_Free: 3072
HugePages_Rsvd: 0
HugePages_Surp: 0
root@server1:~#
Some notes on selecting NVMe drives
- The top option would be enterprise-grade drives; they are a bit expensive, so I decided against them for my home lab.
- Check for PLP (Power loss protection)
- Prefer TLC over QLC
- TBW: the higher, the better.
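To keep an eye on wear on consumer drives like these, smartmontools and nvme-cli report the data written and percentage used (a sketch; device names will differ):
apt install -y smartmontools nvme-cli
smartctl -a /dev/nvme0n1 | grep -i -E "percentage used|data units written"
nvme smart-log /dev/nvme0n1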