I ran into some challenges in my tests when using RMQ version 4, which required downloading packages from specific repos.
I decided to go with RabbitMQ 3.10 and launched 3 Ubuntu 22.04 VMs, since the packages are maintained as part of the distro.
All the VMs have 3 NICs bridged to physical host NICs, each dedicated to a specific purpose. NIC 1 is for VM management (10.0.1.x), NIC 2 is for RabbitMQ access (10.0.2.x), and NIC 3 is for RabbitMQ cluster traffic (10.0.3.x). The DNS server is updated with host resolution for all IPs: for VM management it is dbx.domain.net, for RabbitMQ access it is rmqx.domain.net, and for cluster traffic it is rsynchx.domain.net.
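For reference, the name-to-IP mapping used in the rest of this post looks roughly like the sketch below. The rmqx addresses (.19-.21) come from the cluster output later on; the dbx and rsynchx last octets are only illustrative.
# DNS (or /etc/hosts) entries for node 1; nodes 2 and 3 follow the same pattern
10.0.1.19  db1.domain.net       db1       # VM management
10.0.2.19  rmq1.domain.net      rmq1      # RabbitMQ client access
10.0.3.19  rsynch1.domain.net   rsynch1   # RabbitMQ cluster traffic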
I decided to use only quorum queues.
Install rabbitmq-server on all nodes with a simple “apt install rabbitmq-server” command.
Create /etc/rabbitmq/rabbitmq.conf on all VMs (replace x with the relevant IP octet)
# Client traffic (AMQP)
listeners.tcp.default = 10.0.2.x:5672
# Management UI (optional)
management.listener.ip = 10.0.2.x
management.listener.port = 15672
# Clustering communication
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
# The node names must match the RabbitMQ node names (see NODENAME in rabbitmq-env.conf below),
# which use the DNS names configured for cluster traffic
cluster_formation.classic_config.nodes.1 = rabbit@rsynch1
cluster_formation.classic_config.nodes.2 = rabbit@rsynch2
cluster_formation.classic_config.nodes.3 = rabbit@rsynch3
default_queue_type = quorum
cluster_partition_handling = pause_minority
Update /etc/rabbitmq/rabbitmq-env.conf on all VMs (replace x with the relevant IP octet)
# Defaults to rabbit. This can be useful if you want to run more than one node
# per machine - RABBITMQ_NODENAME should be unique per erlang-node-and-machine
# combination. See the clustering on a single machine guide for details:
# http://www.rabbitmq.com/clustering.html#single-machine
# Replace node name with DNS name configured for cluster traffic rabbit@rsynch1 / rabbit@rsynch2 / rabbit@rsynch3
NODENAME=rabbit@rsynch1
# By default RabbitMQ will bind to all interfaces, on IPv4 and IPv6 if
# available. Set this if you only want to bind to one network interface or#
# address family.
NODE_IP_ADDRESS=10.0.3.x
# Inter-node communication (distribution) port. Defaults to 25672 (AMQP port + 20000).
RABBITMQ_DIST_PORT=25672
Configure the open file limit for the service. Either run “systemctl edit rabbitmq-server.service” or create /etc/systemd/system/rabbitmq-server.service.d/limits.conf by hand (followed by “systemctl daemon-reload”), and add the following override:
[Service]
LimitNOFILE=64000
Enable the rabbitmq_management plugin, configure the shared Erlang cookie, and restart rabbitmq-server (on all nodes). The cookie value must be identical on all three nodes.
rabbitmq-plugins enable rabbitmq_management
echo "SOMEALPHANUMERICCOOKIE" | sudo tee /var/lib/rabbitmq/.erlang.cookie
systemctl restart rabbitmq-server
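Since both the override and the cookie have to be picked up by the service, a quick sanity check after the restart doesn't hurt. A small sketch; the grep pattern is an assumption, as the exact status output varies between versions.
# the cookie must be identical on all nodes and readable only by the rabbitmq user
ls -l /var/lib/rabbitmq/.erlang.cookie
# confirm systemd applied the raised file-descriptor limit
systemctl show rabbitmq-server -p LimitNOFILE
# the broker's own view of the limit
rabbitmq-diagnostics status | grep -A 3 'File Descriptors'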
On node 1, create an admin user (dcuser) and delete the default ‘guest’ user
rabbitmqctl add_user dcuser somepassword
rabbitmqctl set_permissions -p / dcuser ".*" ".*" ".*"
rabbitmqctl set_user_tags dcuser administrator
rabbitmqctl delete_user guest
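A quick way to confirm the new credentials work is to hit the management API on node 1 directly (this assumes the management listener from rabbitmq.conf above and node 1's address of 10.0.2.19; use whatever password you actually set):
curl -s -u dcuser:somepassword http://10.0.2.19:15672/api/overview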
On the other two nodes, stop the application, join the cluster, and start the application. Note that the node name to join is the rsynch name, since that is what NODENAME is set to.
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@rsynch1
rabbitmqctl start_app
Verify cluster status
root@rmq1:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rsynch1 ...
Basics
Cluster name: rabbit@rmq1.domain.net
Total CPU cores available cluster-wide: 6
Disk Nodes
rabbit@rsynch1
rabbit@rsynch2
rabbit@rsynch3
Running Nodes
rabbit@rsynch1
rabbit@rsynch2
rabbit@rsynch3
Versions
rabbit@rsynch1: RabbitMQ 3.12.1 on Erlang 25.3.2.8
rabbit@rsynch2: RabbitMQ 3.12.1 on Erlang 25.3.2.8
rabbit@rsynch3: RabbitMQ 3.12.1 on Erlang 25.3.2.8
CPU Cores
Node: rabbit@rsynch1, available CPU cores: 2
Node: rabbit@rsynch2, available CPU cores: 2
Node: rabbit@rsynch3, available CPU cores: 2
Maintenance status
Node: rabbit@rsynch1, status: not under maintenance
Node: rabbit@rsynch2, status: not under maintenance
Node: rabbit@rsynch3, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@rsynch1, interface: 10.0.2.19, port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@rsynch1, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rsynch1, interface: 10.0.2.19, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@rsynch2, interface: 10.0.2.20, port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@rsynch2, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rsynch2, interface: 10.0.2.20, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@rsynch3, interface: 10.0.2.21, port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@rsynch3, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rsynch3, interface: 10.0.2.21, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Feature flags
Flag: classic_mirrored_queue_version, state: enabled
Flag: classic_queue_type_delivery_support, state: enabled
Flag: direct_exchange_routing_v2, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: feature_flags_v2, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: listener_records_in_ets, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: restart_streams, state: enabled
Flag: stream_queue, state: enabled
Flag: stream_sac_coordinator_unblock_group, state: enabled
Flag: stream_single_active_consumer, state: enabled
Flag: tracking_records_in_ets, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
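With default_queue_type = quorum in place, newly declared queues should come up as quorum queues, and it is worth verifying that with a throwaway queue before pointing applications at the cluster. One way to do it from the shell is via rabbitmqadmin, which the management plugin serves from every node (python3 must be installed; the queue name qq-test is just an example):
# fetch the rabbitmqadmin CLI from the management plugin
curl -s -o rabbitmqadmin http://10.0.2.19:15672/cli/rabbitmqadmin && chmod +x rabbitmqadmin
# declare a durable test queue, explicitly requesting the quorum type
./rabbitmqadmin -H 10.0.2.19 -u dcuser -p somepassword declare queue name=qq-test durable=true arguments='{"x-queue-type":"quorum"}'
# confirm the type (and that it is visible cluster-wide), then clean up
rabbitmqctl list_queues name type
./rabbitmqadmin -H 10.0.2.19 -u dcuser -p somepassword delete queue name=qq-test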
Installing and configuring HAProxy
Note: The IP allocated as the VIP for RMQ is 10.0.2.22, and the node IPs are static: 10.0.2.19, 10.0.2.20 and 10.0.2.21. We could use DNS names for the backend configuration, but we are not doing so here.
Install haproxy on all nodes
apt install haproxy
Add the following at the end of /etc/haproxy/haproxy.cfg
# AMQP load balancer
frontend rabbitmq_amqp
    bind 10.0.2.22:5672
    default_backend rabbitmq_amqp_nodes

backend rabbitmq_amqp_nodes
    balance roundrobin
    option tcp-check
    server rmq1 10.0.2.19:5672 check
    server rmq2 10.0.2.20:5672 check
    server rmq3 10.0.2.21:5672 check

# RabbitMQ Management UI load balancer
frontend rabbitmq_ui
    bind 10.0.2.22:15672
    default_backend rabbitmq_ui_nodes

backend rabbitmq_ui_nodes
    balance roundrobin
    option httpchk GET /api/overview
    server rmq1 10.0.2.19:15672 check
    server rmq2 10.0.2.20:15672 check
    server rmq3 10.0.2.21:15672 check
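Before restarting, I would validate the file, and it is worth checking two assumptions against the defaults section of your haproxy.cfg (the stock Ubuntu one sets mode http): AMQP is a binary protocol, so the AMQP frontend and backend should run in mode tcp, and /api/overview requires authentication, so an unauthenticated httpchk will see a 401 and may mark every UI backend down; checking the UI root page avoids that. A hedged sketch:
# syntax-check the configuration before touching the running service
haproxy -c -f /etc/haproxy/haproxy.cfg
# possible additions, depending on what your defaults section already sets:
#   frontend rabbitmq_amqp / backend rabbitmq_amqp_nodes:
#       mode tcp                    # pass AMQP through as raw TCP
#   backend rabbitmq_ui_nodes:
#       option httpchk GET /        # the UI root returns 200 without credentials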
Restart the haproxy service. Note that the 10.0.2.22 VIP will not be bound to any node until keepalived is configured in the next step (below).
systemctl restart haproxy
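Depending on the kernel defaults, HAProxy may refuse to start on the nodes that do not currently hold the VIP, because it cannot bind 10.0.2.22. If you hit that, allowing non-local binds is the usual workaround (the sysctl file name below is just a suggestion):
echo 'net.ipv4.ip_nonlocal_bind = 1' > /etc/sysctl.d/99-nonlocal-bind.conf
sysctl -p /etc/sysctl.d/99-nonlocal-bind.conf
systemctl restart haproxy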
Installing and configuring keepalived
Install keepalived on all nodes.
apt install keepalived
The IP allocated for use as a VIP for RMQ is 10.0.2.22. The key configuration choices are:
unicast_peer: reliable, avoids multicast pitfalls
nopreempt on backups: avoids a race during failback
preempt_delay: delays takeover to ensure a clean failover
garp_master_delay: quick VIP takeover by the network
chk_haproxy: HAProxy process PID check
chk_peer_vip: if a node sees HAProxy running but also sees the VIP already active elsewhere, it will refuse to claim the VIP
Create /usr/local/bin/check_peer_vip.sh on all nodes with the following contents.
#!/bin/bash
# Ping the VIP via the correct interface
ping -c1 -W1 -I eth1 10.0.2.22 | grep '1 received' >/dev/null
Add execute permissions
chmod +x /usr/local/bin/check_peer_vip.sh
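The helper can be sanity-checked by hand; at this point nothing holds the VIP yet, so it should exit non-zero (keepalived only cares about the exit code):
/usr/local/bin/check_peer_vip.sh; echo "exit code: $?"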
Create the /etc/keepalived/keepalived.conf file on each of the VMs. A sample from node 1 is provided here; the unicast_peer IPs and the priority will differ on each VM.
vrrp_instance VI_RMQ {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 101
    advert_int 1
    preempt_delay 5
    garp_master_delay 1
    authentication {
        auth_type PASS
        auth_pass rabbitHA
    }
    virtual_ipaddress {
        10.0.2.22
    }
    unicast_peer {
        10.0.2.20
        10.0.2.21
    }
    track_script {
        chk_haproxy
        chk_peer_vip
    }
}

vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
    weight -20
}

vrrp_script chk_peer_vip {
    script "/usr/local/bin/check_peer_vip.sh"
    interval 2
    weight -10
}
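For completeness, the pieces that differ on the backup nodes look roughly like this; the priority values are my own convention, and node 3 simply gets the next lower one plus the remaining peer IPs:
# node 2 (rsynch2, 10.0.2.20) - only the lines that differ from node 1
vrrp_instance VI_RMQ {
    state BACKUP
    priority 100              # node 3 would use e.g. 99
    nopreempt                 # backups do not steal the VIP back during failback
    unicast_peer {
        10.0.2.19
        10.0.2.21
    }
    # interface, virtual_router_id, advert_int, authentication,
    # virtual_ipaddress and track_script are identical to node 1
}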
Restart keepalived and haproxy on all nodes.
systemctl restart keepalived
systemctl restart haproxy
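At this point the VIP should sit on exactly one node (the one with the highest priority). A quick way to confirm that, and to exercise a failover, is sketched below; eth1 is the RabbitMQ access NIC assumed throughout:
# the master should list 10.0.2.22 on eth1, the backups should not
ip addr show eth1 | grep 10.0.2.22
# simulate a failure on the current master and watch the VIP move to a backup
systemctl stop haproxy          # chk_haproxy now fails, so the priority drops by 20
ip addr show eth1 | grep 10.0.2.22
systemctl start haproxy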
Configuring logrotate
Create /etc/logrotate.d/rabbitmq with the following contents
/var/log/rabbitmq/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
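Note that the Ubuntu package already ships its own logrotate rule for /var/log/rabbitmq/*.log (typically /etc/logrotate.d/rabbitmq-server); if both files match the same logs, logrotate will complain about duplicate entries, so you may prefer to fold these settings into the packaged file instead. Either way, a debug run confirms the configuration parses and shows what would be rotated, without touching anything:
logrotate -d /etc/logrotate.d/rabbitmq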
Edit /etc/logrotate.d/haproxy and update the rotation count from 7 to 14
/var/log/haproxy.log {
    daily
    rotate 14
    missingok
    notifempty
    compress
    delaycompress
    postrotate
        [ ! -x /usr/lib/rsyslog/rsyslog-rotate ] || /usr/lib/rsyslog/rsyslog-rotate
    endscript
}