I have been working on developing microservices as part of an application deployed on K8S. More time and effort goes into building resiliency and handling failure than into building application features. One of the key problem areas has been storage, which is the backbone of any application.
After some research, I concluded that a robust distributed storage system that is scalable, highly available, and easy to integrate with K8S-based applications would abstract away the storage challenges we had encountered. Further (admittedly limited) research pointed to Ceph Storage as a prime candidate: deployed on decent hardware, it can provide a production-grade platform for on-premise deployments at SMEs.
Key objectives:
- The storage cluster should satisfy the applications’ IOPS requirements.
- Time-tested CSI driver support for the storage is a must.
- The CSI driver’s over-the-wire encryption support will be an added advantage.
- The hardware requirements should not be high or prohibitive for SMEs planning to opt for on-premise deployments.
- It should be open-source with a decent number of production deployments.
- Active community support and decent documentation are required.
- Initial deployment complexities are not an issue as they would be one-time and can be documented.
- Version upgrades and security patches should be possible; it would be an added advantage if these procedures were simple and time-tested.
- It should support dynamic volume expansion.
- Though this is a K8S feature, pod/StatefulSet migration on server or worker node failure should be automatic and consistent.
The decision is to use Ceph RBD rather than CephFS; I am new to Ceph and am starting with this choice. All worker node VMs will be backed by block storage on Ceph RBD, and the Ceph CSI driver (RBD) will provision block volumes for PV requirements.
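For orientation only, here is a minimal sketch of the kind of StorageClass the RBD CSI driver consumes; the name, clusterID, pool, and secret references are placeholders rather than values from this setup, and the actual CSI configuration is covered later in the series.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-sc                 # placeholder name
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-fsid>            # placeholder: the cluster fsid
  pool: <rbd-pool>                  # placeholder: the RBD pool backing PVs
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
reclaimPolicy: Delete
allowVolumeExpansion: true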
Available hardware: 4 x Dell R630 with the following configuration
- Intel X710 daughter board – 4 x 10G SFP+
- Intel X520-DA2 Adapter – 2 x 10G SFP+ (Slot 2)
- Dual M.2 NVME PCIe 3 Adapter (x4 Bifurcation) – Slots 1 & 3
- Samsung 970 EVO Plus NVMe – 4 drives
- 4 x 10K RPM 1.2 TB SAS Drive in RAID 10 (PERC Mini 730p) for Boot / OS
MikroTik CRS326-24S+2Q+RM
- Bridge Mode
- 9000 MTU
- Ensures maximum utilisation of available network throughput
- Consistent 9.86 Gbps in iperf3 test results
- 20 SFP+ ports connected to servers (5 ports each)
- Direct Attach Cable
- 1 SFP+ port connected to TP-Link Router (uplink)
- Direct Attach Cable
TP-Link ER8411
- 10G SFP+ LAN Port connected to Cloud Router Switch
- 10G SFP+ WAN Port connected to Gateway UTM device
- The UTM device's gateway port is 1G RJ45, so a MikroTik S+RJ10 copper module is used
Deployment
2 x 10G NICs dedicated to Ceph storage, one for the public network and one for the cluster network (these map to ceph.conf as sketched after this list)
- 10.0.4.0/24 – Public network
- 10.0.5.0/24 – Cluster network
2 x 10G NICs dedicated to the Percona XtraDB cluster, one for application access and one for cluster sync
- 10.0.2.0/24 – Service, application access
- 10.0.3.0/24 – Cluster sync
1 x 10G NIC for server management and accessing VMs
- 10.0.1.0/24 – Servers and VM Management / Access
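For reference, the two Ceph subnets above are what will later be declared as the cluster's network roles; this is only a sketch, and the actual Ceph deployment is covered in the next part.
# /etc/ceph/ceph.conf fragment (sketch)
[global]
public_network = 10.0.4.0/24
cluster_network = 10.0.5.0/24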
Note: I initially attempted to use Ubuntu 24.04, since its default repositories include Ceph Squid packages. However, after understanding that Ceph Reef was the official stable release, and given my affinity for Debian, I decided to go with Debian 12 and Ceph Reef.
Install Debian 12.7 on the server(s)
Log in / SSH into the server with the user account configured during installation
Switch to the root user account
su -
Enable remote root login over SSH and adjust the related SSH and PAM settings
sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/g" /etc/ssh/sshd_config
sed -i "s/#PubkeyAuthentication/PubkeyAuthentication/g" /etc/ssh/sshd_config
sed -i "s/#AuthorizedKeysFile/AuthorizedKeysFile/g" /etc/ssh/sshd_config
sed -i "s/# StrictHostKeyChecking ask/ StrictHostKeyChecking no/g" /etc/ssh/ssh_config
sed -i "s/session optional pam_motd.so/#session optional pam_motd.so/g" /etc/pam.d/sshd
sed -i "s/session optional pam_motd.so/#session optional pam_motd.so/g" /etc/pam.d/sshd
service ssh restart
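Optionally verify that sshd picked up the changes; sshd -T prints the effective configuration.
sshd -T | grep -E "permitrootlogin|pubkeyauthentication"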
Remove CDROM from the apt sources list.
sed -i '/deb cdrom/d' /etc/apt/sources.list
Log out and log back in as the root user
Install required packages
apt -y install net-tools systemd-resolved fio iperf3 gnupg2 software-properties-common lvm2 nfs-common
Configure the DNS server IP – let systemd-resolved manage the DNS server configuration.
ln -fs /run/systemd/resolve/resolv.conf /etc/resolv.conf
sed -i "s/^\#DNS.*/DNS=8.8.8.8/g" /etc/systemd/resolved.conf
systemctl restart systemd-resolved
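A quick check that systemd-resolved picked up the new upstream server:
resolvectl status | grep "DNS Server"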
Disable the daily update timers and AppArmor.
systemctl stop apt-daily-upgrade.timer apt-daily.timer apparmor
systemctl disable apt-daily-upgrade.timer apt-daily.timer apparmor
Set the timezone, configure the NTP server, and restart time synchronisation
timedatectl set-timezone "Asia/Kolkata"
sed -i "s/#NTP=/NTP=time\.google\.com/g" /etc/systemd/timesyncd.conf
sed -i "s/#FallbackNTP=ntp.ubuntu.com/FallbackNTP=ntp\.ubuntu\.com/g" /etc/systemd/timesyncd.conf
systemctl stop systemd-timesyncd.service
systemctl start systemd-timesyncd.service
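Confirm the timezone and that the clock is synchronising against the configured server:
timedatectl status
timedatectl timesync-status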
Configure max file size of the journal
sed -i "s/#SystemMaxFileSize.*/SystemMaxFileSize=512M/g" /etc/systemd/journald.conf
Configure the maximum number of open files and processes
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nproc 65536" >> /etc/security/limits.conf
echo "* soft nproc 65536" >> /etc/security/limits.conf
Disable IPv6 and enable huge pages
sed -i 's/GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="ipv6.disable=1 hugepagesz=2MB hugepages=32000 transparent_hugepage=never"/g' /etc/default/grub
update-grub
Configure the network interfaces – update /etc/network/interfaces (the sample below is from server1; change it as required)
source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback

allow-hotplug eno1
iface eno1 inet static
    address 10.0.1.1/16
    gateway 10.0.0.1
    mtu 9000

allow-hotplug eno2
iface eno2 inet static
    address 10.0.2.1/24
    mtu 9000

allow-hotplug eno3
iface eno3 inet static
    address 10.0.3.1/24
    mtu 9000

allow-hotplug eno4
iface eno4 inet static
    address 10.0.4.1/24
    mtu 9000

allow-hotplug enp129s0
iface enp129s0 inet static
    address 10.0.5.1/24
    mtu 9000
Reboot the server
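After the reboot, it is worth confirming that jumbo frames actually pass between servers: an ICMP payload of 8972 bytes plus 28 bytes of IP/ICMP headers exactly fills the 9000-byte MTU. The interface name follows the server1 sample above, and 10.0.5.2 assumes server2 uses the same addressing pattern on its cluster interface.
ip link show enp129s0 | grep mtu
ping -M do -s 8972 -c 3 10.0.5.2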
Enable key-based, passwordless SSH between servers
echo "Host ceph1" > ~/.ssh/config
echo " Hostname ceph1" >> ~/.ssh/config
echo " User root" >> ~/.ssh/config
echo "Host ceph2" >> ~/.ssh/config
echo " Hostname ceph2" >> ~/.ssh/config
echo " User root" >> ~/.ssh/config
echo "Host ceph3" >> ~/.ssh/config
echo " Hostname ceph3" >> ~/.ssh/config
echo " User root" >> ~/.ssh/config
echo "Host ceph4" >> ~/.ssh/config
echo " Hostname ceph4" >> ~/.ssh/config
echo " User root" >> ~/.ssh/config
ssh-keygen -q -N ""
ssh-keygen -f '/root/.ssh/known_hosts' -R 'ceph1'
ssh-keygen -f '/root/.ssh/known_hosts' -R 'ceph2'
ssh-keygen -f '/root/.ssh/known_hosts' -R 'ceph3'
ssh-keygen -f '/root/.ssh/known_hosts' -R 'ceph4'
ssh-copy-id ceph1
ssh-copy-id ceph2
ssh-copy-id ceph3
ssh-copy-id ceph4
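A quick loop to confirm passwordless root SSH works to every node:
for h in ceph1 ceph2 ceph3 ceph4; do ssh $h hostname; done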
Basic iperf3 check between two servers (1 and 4)
root@server1:~# iperf3 -s -B 10.0.5.1
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 10.0.5.4, port 41562
[ 5] local 10.0.5.1 port 5201 connected to 10.0.5.4 port 41578
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.15 GBytes 9.88 Gbits/sec
[ 5] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec
[ 5] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec
[ 5] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec
[ 5] 4.00-5.00 sec 1.15 GBytes 9.90 Gbits/sec
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec
[ 5] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec
[ 5] 7.00-8.00 sec 1.15 GBytes 9.90 Gbits/sec
[ 5] 8.00-9.00 sec 1.15 GBytes 9.90 Gbits/sec
[ 5] 9.00-10.00 sec 1.15 GBytes 9.90 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
root@server4:~# iperf3 -c 10.0.5.1
Connecting to host 10.0.5.1, port 5201
[ 5] local 10.0.5.4 port 41578 connected to 10.0.5.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.44 MBytes
[ 5] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.51 MBytes
[ 5] 2.00-3.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.51 MBytes
[ 5] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
[ 5] 4.00-5.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
[ 5] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
[ 5] 7.00-8.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
[ 5] 8.00-9.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
[ 5] 9.00-10.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec receiver
iperf Done.
root@server4:~#
Verify huge pages after the reboot:
root@server1:~# cat /proc/meminfo | grep "HugePages"
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 32000
HugePages_Free: 32000
HugePages_Rsvd: 0
HugePages_Surp: 0
root@server1:~#
Some notes on selecting NVMe drives (a quick fio check follows the list)
- Check for PLP (Power loss protection)
- Prefer TLC over QLC
- TBW – the higher, the better
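One practical way to see why PLP matters before committing a drive to Ceph is a single-job 4K synchronous write test with fio (installed earlier); drives with PLP sustain far higher sync-write IOPS than consumer drives without it. Warning: this writes to the raw device and destroys any data on it, and /dev/nvme0n1 is a placeholder.
fio --name=sync-write-test --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based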