Ceph Reef (v18) Installation and Initial Configuration

Platform: Ubuntu 22.04 (Jammy)
Deployment model: cephadm, multi-host, mixed SSD classes


1. Install prerequisites on all hosts

Install base dependencies required by cephadm and Ceph daemons:

sudo apt install -y \
  ca-certificates curl gnupg lsb-release \
  python3 python3-pip \
  openssh-client openssh-server sudo jq

Container runtime

Cephadm requires a supported OCI container runtime.
Podman is the recommended and well-tested runtime for Ceph Reef on Ubuntu 22.04.

sudo apt install -y podman
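
A quick check that the runtime is installed and on the PATH:

podman --version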

2. Kernel and networking prerequisites

Enable IPv4 forwarding (required for Ceph networking):

echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-ceph.conf
sudo sysctl --system
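
Confirm the setting took effect; it should report net.ipv4.ip_forward = 1:

sysctl net.ipv4.ip_forward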

3. Add Ceph Reef repository and upgrade packages (all nodes)

Add official Ceph GPG key

curl -fsSL https://download.ceph.com/keys/release.asc \
  | sudo gpg --dearmor -o /usr/share/keyrings/ceph.gpg

Verify the key:

gpg --show-keys /usr/share/keyrings/ceph.gpg

Add Ceph Reef repository (Ubuntu 22.04 – Jammy)

echo "deb [signed-by=/usr/share/keyrings/ceph.gpg] \
https://download.ceph.com/debian-reef/ jammy main" \
| sudo tee /etc/apt/sources.list.d/ceph.list

Update and upgrade:

sudo apt update
sudo apt upgrade -y

4. Bootstrap node preparation (server1)

Install Ceph tools only on the bootstrap node:

sudo apt install -y cephadm ceph-common ceph-volume

Verify versions:

cephadm version
ceph -v

Verify host readiness:

cephadm check-host
cephadm gather-facts

5. Bootstrap parameters

Parameter           Value
Bootstrap node IP   10.0.4.1
Public network      10.0.4.0/24
Cluster network     10.0.5.0/24
SSH user            root

DNS prerequisites

Ensure forward DNS resolution exists for all nodes:

server1.yourdomain.net → 10.0.4.1
server2.yourdomain.net → 10.0.4.2
server3.yourdomain.net → 10.0.4.3
server4.yourdomain.net → 10.0.4.4
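
A quick sanity check, assuming the example hostnames above (substitute your own domain), run from any node:

for h in server1 server2 server3 server4; do
  getent hosts "$h.yourdomain.net"
done

Each line should print the expected 10.0.4.x address.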

6. Bootstrap the cluster

Run only on server1:

cephadm bootstrap \
  --mon-ip 10.0.4.1 \
  --cluster-network 10.0.5.0/24 \
  --ssh-user root \
  --allow-fqdn-hostname

Verification

ceph orch status
ceph config get mon public_network
ceph config get osd cluster_network
ceph -s

At this stage, HEALTH_WARN due to no OSDs is expected.


7. SSH key distribution (recommended method)

From server1, install cephadm public key on all other nodes:

ssh-copy-id -f -i /etc/ceph/ceph.pub root@10.0.4.2
ssh-copy-id -f -i /etc/ceph/ceph.pub root@10.0.4.3
ssh-copy-id -f -i /etc/ceph/ceph.pub root@10.0.4.4
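
The key being copied is the orchestrator's SSH identity. If /etc/ceph/ceph.pub is missing on server1, it can be printed from the cluster and compared:

ceph cephadm get-pub-key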

8. Add hosts to the orchestrator

Run only on server1:

ceph orch host add server2.yourdomain.net 10.0.4.2
ceph orch host add server3.yourdomain.net 10.0.4.3
ceph orch host add server4.yourdomain.net 10.0.4.4

Verify:

ceph orch host ls

9. Device → OSD → Pool strategy

Enterprise SSDs (/dev/db*)

  • Devices: /dev/db1, /dev/db2, /dev/db3
  • Purpose:
    • High endurance
    • Latency-sensitive workloads
  • Pool target: db-pool
  • CRUSH class: db

Consumer SSD / NVMe (/dev/app*)

  • Devices: /dev/app1, /dev/app2, /dev/app3, /dev/app4
  • Purpose:
    • General block/object storage
    • Higher capacity, lower endurance
  • Pool target: app-pool
  • CRUSH class: app

Important clarifications

  • One OSD per physical device
  • No WAL/DB device separation
  • Pool separation is done via CRUSH rules, not BlueStore DB devices
  • /dev/db* and /dev/app* are full OSDs, not metadata devices
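
The /dev/db* and /dev/app* names are assumed here to be stable device paths (for example udev-provided names or symlinks) present on every host. Before provisioning, it is worth confirming on each server that they point at the intended disks:

ls -l /dev/db* /dev/app*
lsblk -d -o NAME,SIZE,MODEL,ROTA,SERIAL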

10. Disk cleanup (all servers, destructive)

The LVM commands below deactivate and remove every logical volume, volume group, and physical volume on the host, not just Ceph's, and the zap loop destroys all data on the listed devices. Use this only on hosts whose operating system does not live on LVM, and only on fresh or reclaimed disks.

lvchange -an $(lvs --noheadings -o lv_path | awk '{print $1}')
vgchange -an $(vgs --noheadings -o vg_name | awk '{print $1}')
vgremove -y $(vgs --noheadings -o vg_name | awk '{print $1}')
pvremove -y $(pvs --noheadings -o pv_name | awk '{print $1}')

Zap devices:

for d in /dev/app1 /dev/app2 /dev/app3 /dev/app4 \
         /dev/db1 /dev/db2 /dev/db3; do
  wipefs -a "$d"
  sgdisk --zap-all "$d"
  blkdiscard "$d" 2>/dev/null || true
  ceph-volume lvm zap "$d"
done
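
Note that ceph-volume was installed only on the bootstrap node (section 4), so the last command in the loop is unavailable elsewhere. Since the hosts are already under the orchestrator, devices on the other servers can instead be zapped from server1 (hostname and path as used in this guide, one command per device):

ceph orch device zap server2.yourdomain.net /dev/db1 --force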

Fallback (if needed):

dd if=/dev/zero of=/dev/<device> bs=1M count=100

Reboot hosts.

11. Verify devices are available

ceph orch device ls

All intended devices should show AVAILABLE: Yes.
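
If a device is not listed as available, additional detail (including the reject reason) is usually visible with:

ceph orch device ls --wide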


12. Create OSD specs

Enterprise SSD OSD spec (osd-db.yaml)

service_type: osd
service_id: db-osds
placement:
  host_pattern: "server*"
spec:
  data_devices:
    paths:
      - /dev/db1
      - /dev/db2
      - /dev/db3
  objectstore: bluestore
  crush_device_class: db

Consumer SSD OSD spec (osd-app.yaml)

service_type: osd
service_id: app-osds
placement:
  host_pattern: "server*"
spec:
  data_devices:
    paths:
      - /dev/app1
      - /dev/app2
      - /dev/app3
      - /dev/app4
  objectstore: bluestore
  crush_device_class: app

Apply specs:

ceph orch apply -i osd-db.yaml
ceph orch apply -i osd-app.yaml

Wait until all expected OSDs appear:

ceph osd tree
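
To confirm that the CRUSH device classes from the specs were applied (the db/app split described in section 9), inspect the shadow CRUSH hierarchy:

ceph osd crush tree --show-shadow
ceph osd df tree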

13. Expected orchestrator behavior (important)

After OSD creation, Ceph reports:

Failed to apply service(s): osd.db-osds, osd.app-osds

Why this happens

  • OSD specs are provisioning mechanisms
  • Once the disks have been consumed, re-applying the specs fails because the devices are no longer available (they already carry LVM/BlueStore metadata)
  • This is expected behavior
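
The exact warning and the affected specs can be inspected before removing anything:

ceph health detail
ceph orch ls osd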

Correct resolution

Remove the specs without touching OSDs:

ceph orch rm osd.db-osds --force
ceph orch rm osd.app-osds --force

Result:

  • OSDs remain intact
  • Cluster becomes HEALTH_OK
  • OSDs appear as <unmanaged> in ceph orch ls

14. Why unmanaged OSDs are safe and expected

Unmanaged does NOT mean:

  • OSDs are unhealthy
  • OSDs are not restarted
  • OSDs are not upgraded

Unmanaged means:

  • Cephadm will not re-run disk provisioning logic
  • Disks will not be touched again automatically

This is the recommended steady-state after provisioning when:

  • Explicit device paths are used
  • Hardware is heterogeneous
  • Disk automation is not desired

Ceph daemons (OSDs) are still fully managed.
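
Both sides of this are easy to see: the OSD services are listed as unmanaged, while the OSD daemons themselves are still tracked by the orchestrator:

ceph orch ls osd
ceph orch ps --daemon-type osd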


15. CRUSH rules and pools

Create CRUSH rules:

ceph osd crush rule create-replicated app-rule default host app
ceph osd crush rule create-replicated db-rule  default host db

Create pools:

ceph osd pool create app-pool 128
ceph osd pool set app-pool size 3
ceph osd pool set app-pool min_size 2
ceph osd pool set app-pool crush_rule app-rule

ceph osd pool create db-pool 128
ceph osd pool set db-pool size 3
ceph osd pool set db-pool min_size 2
ceph osd pool set db-pool crush_rule db-rule

Enable application tags:

ceph osd pool application enable app-pool rbd
ceph osd pool application enable db-pool rbd
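
Verify that each pool ended up on the intended rule:

ceph osd pool get app-pool crush_rule
ceph osd pool get db-pool crush_rule
ceph osd pool ls detail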

16. Recovery and backfill tuning (recommended)

ceph config set osd osd_max_backfills 2
ceph config set osd osd_recovery_max_active 3
ceph config set osd osd_recovery_sleep 0.1
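
Note that Reef uses the mClock scheduler by default, which normally controls these three options itself. If the values above do not appear to take effect, the override switch may need to be enabled (optional, adjust to your workload):

ceph config set osd osd_mclock_override_recovery_settings true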

17. Persist cluster state (bootstrap node)

ceph config dump > /root/ceph-config.dump
ceph osd dump > /root/ceph-osd.dump
ceph mon dump > /root/ceph-mon.dump

18. Secure admin key (bootstrap node only)

chmod 600 /etc/ceph/ceph.client.admin.keyring
chown root:root /etc/ceph/ceph.client.admin.keyring