Monday, December 7, 2015

Installing a High-Availability SSL Web Load Balancer with HAProxy and Keepalived

Have you ever wanted to set up a Load Balancing infrastructure for your Web Servers that provides both High-Availability (HA) and Redundancy? HAProxy and Keepalived make this straightforward!

First of all, HAProxy is a solution which offers reliable functions such as Load Balancing, High-Availability and Reverse-Proxy abilities along with SSL support.

Although HAProxy is used to offer High Availability, if HAProxy itself goes down, our whole setup goes down the drain!
This is where the Keepalived routing solution comes in handy, with its support for the VRRP protocol and a Floating IP (FIP). By deploying two (or more) HAProxy servers, we can share a single Floating IP among them thanks to Keepalived. If one of our servers goes down, one of the other members claims the IP according to the configured priorities: the node with the highest priority becomes the Master and holds the IP.


In this setup, we will be installing a flexible High-Availability infrastructure in a semi-physical DMZ with:
  • 2 HAProxy Servers
  • 2 Microsoft Web Servers (IIS)
  • 1 DMZ Firewall (iptables)
  • 1 Microsoft Web Server (IIS, Resources)
  • 1 Mercurial Repository


Here are the different elements present in the above schema and their purpose:

  • Internet Firewall: Client requests from the Internet come through this device. The Public IP of my website is translated to a Private IP (the Floating IP) via a NAT rule.
  • Reverse Proxy 01 and 02: These are the HAProxy servers, which provide Load Balancing and High-Availability. The server labeled Reverse Proxy 01 will be the default Master while Reverse Proxy 02 will behave as a Slave. Note that the Floating IP is shared between both servers, but is only active on the Master.
  • Web 01 and 02: Client requests are sent to these servers according to the Load Balancing rule defined on the Reverse Proxy servers.
  • DMZ Firewall: This is the firewall server, which controls incoming and outgoing traffic between the frontend DMZ and the backend DMZ.
  • Web Resources: Hidden in the backend DMZ, this server receives and answers some Web Service requests from our frontend DMZ Web servers when needed.
  • Mercurial Repository: In this setup, the Mercurial Repository server is used to store the configuration files for our Reverse Proxy servers.

Leaving the Windows Web Servers aside, we will be using Debian GNU/Linux 8 (jessie) for our Reverse Proxy, DMZ Firewall and Mercurial Repository.

I will be using the following stable packages:

  • haproxy: 1.5.8-3+deb8u2
  • keepalived: 1:1.2.13-1
  • mercurial: 3.1.2-2+deb8u1

Let's begin by setting up some redundancy with Keepalived on our Reverse Proxy servers.



Setting up Keepalived on the Reverse Proxy servers:


Reverse Proxy 01 (Master Node)


First things first, we will define our Reverse Proxy 01 server as our default Master Server. Obviously we need to install keepalived first!

apt-get install keepalived

Now that the package is installed, we have to allow the binding of a non-local IP (the IP doesn't belong to a device on the system) on our system by adding the net.ipv4.ip_nonlocal_bind value to our /etc/sysctl.conf file:

# Allows the binding of a non-local IP on this system
net.ipv4.ip_nonlocal_bind = 1

After editing the sysctl.conf file, we need to refresh the system with our brand new configuration file:

sysctl -p /etc/sysctl.conf

If everything's fine, the command should display "net.ipv4.ip_nonlocal_bind = 1" in the output.
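
The running kernel value can also be read back directly from procfs, which is handy in a provisioning script. Here is a minimal sketch, assuming the standard /proc path for this sysctl key. The function name nonlocal_bind_enabled and its optional path argument are my own and only exist for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when non-local binding is enabled.
# The optional argument (a file path) only exists to make the check testable,
# it defaults to the procfs entry behind net.ipv4.ip_nonlocal_bind.
nonlocal_bind_enabled() {
    proc_file="${1:-/proc/sys/net/ipv4/ip_nonlocal_bind}"
    [ -r "$proc_file" ] && [ "$(cat "$proc_file")" = "1" ]
}

# Usage on a Reverse Proxy node:
#   nonlocal_bind_enabled && echo "non-local bind is enabled"
```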

Next, let's edit /etc/keepalived/keepalived.conf in order to define our configuration policy.

# Global parameters
global_defs {
    # Notification Emails Recipients
    notification_email {
        monitoring@katalykt.lan
        brouleau@katalykt.lan
    }
    
    # Notification Emails Sender
    notification_email_from ka_haproxy_01@noreply.com
    smtp_server 10.10.10.100       # The remote SMTP Server
    smtp_connect_timeout 30        # The remote SMTP Server connection timeout
    lvs_id ka_haproxy_01           # Keepalived process identifier, unique
}

# VRRP "Check script" check_haproxy: Used to monitor HAproxy state
vrrp_script check_haproxy {
    script "killall -0 haproxy"    # Returns 0 if at least one haproxy process is running
    interval 2                     # Check every 2 seconds
    weight 2                       # Add a weight of 2 to the priority if it runs!
}

# VRRP Instance VI_01: Both our keepalived hosts need to have the same name!
vrrp_instance VI_01 {
    state MASTER                   # Initialize with the desired state
    interface eth0                 # Applies to this interface
    virtual_router_id 19           # Must be identical on all members of this VRRP instance and unique on the network segment (1..255)
    priority 101                   # The priority determines the failover order, the highest is the master (1..254, VRRP reserves 0 and 255)
    advert_int 1                   # VRRP Advertisement frequency in seconds
    smtp_alert                     # An SMTP alert is sent upon a failover
    # We use some basic Authentication, same VRRP Instances need to have the same value
    authentication {
        auth_type PASS
        auth_pass SecuredPasswd*
    }
    # unicast_src_ip 192.168.19.99 # Unicast source ip
    # Unicast remote peer(s), this is our second host in this case
    # unicast_peer {
    #    192.168.19.98
    # }
    # The Floating IP(s) which are shared between the hosts, we will label it eth0:100
    virtual_ipaddress {
        192.168.19.100 dev eth0 label eth0:100
    }
    # This VRRP Instance will use the defined check script "check_haproxy"
    track_script {
        check_haproxy
    }
}

Let's break down our configuration file:

  • Failover notifications will be sent through our SMTP relay server (which is the Web Resources server in the backend DMZ)
  • The keepalived process on the Master node is identified as ka_haproxy_01, this identifier needs to be unique
  • The check script executes the "killall -0 haproxy" command every 2 seconds on this system. The command returns 0 if at least one haproxy process is running, in which case a weight of 2 is added to the priority of the VRRP Instance
  • The VRRP Instance has the same name on the Master and on the Slave(s). The Reverse Proxy 01 node is defined as the Master in this case
  • All members of a VRRP Instance need to share the same Virtual Router ID, which identifies the Virtual Router itself
  • The priority determines which member of the VRRP Instance will be the Master. The highest priority wins, so we give this node a priority of 101 while the Slave node will have a value of 100
  • To secure our setup a bit, we use a simple password authentication which, once more, needs to be identical on each member of the VRRP Instance. Note that keepalived truncates auth_pass to 8 characters and logs this at startup
  • Next, the Virtual IP (Floating IP) is defined, and we label it eth0:100. Note that more than one Virtual IP can be listed there
  • Finally, we tell our VRRP Instance to use our check_haproxy track script in order to keep an eye on the haproxy daemon

Note that Firewall rules for the SMTP relay can be found at the bottom of this article.

We can now restart (or start) keepalived on this system

service keepalived restart

Looking in /var/log/messages should confirm that keepalived is currently in a Master state:

Nov 24 14:58:14 haproxy01 Keepalived_vrrp[21780]: Registering Kernel netlink reflector
Nov 24 14:58:14 haproxy01 Keepalived_vrrp[21780]: Registering Kernel netlink command channel
Nov 24 14:58:14 haproxy01 Keepalived_vrrp[21780]: Registering gratuitous ARP shared channel
Nov 24 14:58:14 haproxy01 Keepalived_vrrp[21780]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 24 14:58:14 haproxy01 Keepalived_vrrp[21780]: Truncating auth_pass to 8 characters
Nov 24 14:58:14 haproxy01 Keepalived_vrrp[21780]: Configuration is using : 39051 Bytes
Nov 24 14:58:14 haproxy01 Keepalived_vrrp[21780]: Using LinkWatch kernel netlink reflector...
Nov 24 14:58:14 haproxy01 Keepalived_healthcheckers[21779]: Registering Kernel netlink reflector
Nov 24 14:58:14 haproxy01 Keepalived_healthcheckers[21779]: Registering Kernel netlink command channel
Nov 24 14:58:14 haproxy01 Keepalived_healthcheckers[21779]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 24 14:58:14 haproxy01 Keepalived_healthcheckers[21779]: Configuration is using : 7003 Bytes
Nov 24 14:58:14 haproxy01 Keepalived_healthcheckers[21779]: Using LinkWatch kernel netlink reflector...
Nov 24 14:58:14 haproxy01 Keepalived_vrrp[21780]: VRRP_Script(check_haproxy) succeeded
Nov 24 14:58:15 haproxy01 Keepalived_vrrp[21780]: VRRP_Instance(VI_01) Transition to MASTER STATE
Nov 24 14:58:16 haproxy01 Keepalived_vrrp[21780]: VRRP_Instance(VI_01) Entering MASTER STATE
Nov 24 14:58:16 haproxy01 Keepalived_vrrp[21780]: Remote SMTP server [10.10.10.100]:25 connected.
Nov 24 14:58:16 haproxy01 Keepalived_vrrp[21780]: SMTP alert successfully sent.

Executing an ip addr command should also confirm that the Floating IP is present on our system

ip addr
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:3f:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.19.99/24 brd 192.168.19.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.19.100/32 scope global eth0:100
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8e:1a6c/64 scope link
       valid_lft forever preferred_lft forever

Alright! Our Floating IP 192.168.19.100 is present on our Master node and labeled as eth0:100. Running an ifconfig command should display our Floating IP labeled as eth0:100 too:

ifconfig
eth0:100  Link encap:Ethernet  HWaddr 00:50:56:3f:00:01
          inet addr:192.168.19.100  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
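
If you would rather script this check than read the output by eye, the one-line format of "ip -o" is easy to parse. The following is only a sketch: the helper name has_vip is hypothetical, and it assumes the usual "ip -o -4 addr show" layout where the third field is "inet" and the fourth is the address with its prefix length.

```shell
#!/bin/sh
# Hypothetical helper: reads "ip -o -4 addr show" output on stdin and
# succeeds when the given IPv4 address is assigned to an interface.
has_vip() {
    awk -v vip="$1" '
        $3 == "inet" {
            split($4, addr, "/")          # strip the /prefix part
            if (addr[1] == vip) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a Reverse Proxy node:
#   ip -o -4 addr show dev eth0 | has_vip 192.168.19.100 && echo "this node holds the Floating IP"
```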

This is it for this node! Next is the Slave node.


Reverse Proxy 02 (Slave Node)


Just like the Master node, we install keepalived here:

apt-get install keepalived

We also need to edit the /etc/sysctl.conf file in order to allow the binding of the Floating IP on this system:

# Allows the binding of a non-local IP on this system
net.ipv4.ip_nonlocal_bind = 1

After editing the sysctl.conf file, we need to refresh the system with our modified configuration file:

sysctl -p /etc/sysctl.conf

If everything's fine, the command should display "net.ipv4.ip_nonlocal_bind = 1" in the output.

Once more, let's edit /etc/keepalived/keepalived.conf in order to define our configuration policy.

# Global parameters
global_defs {
    # Notification Emails Recipients
    notification_email {
        monitoring@katalykt.lan
        brouleau@katalykt.lan
    }
    
    # Notification Emails Sender
    notification_email_from ka_haproxy_02@noreply.com
    smtp_server 10.10.10.100       # The remote SMTP Server
    smtp_connect_timeout 30        # The remote SMTP Server connection timeout
    lvs_id ka_haproxy_02           # Keepalived process identifier, unique
}

# VRRP "Check script" check_haproxy: Used to monitor HAproxy state
vrrp_script check_haproxy {
    script "killall -0 haproxy"    # Returns 0 if at least one haproxy process is running
    interval 2                     # Check every 2 seconds
    weight 2                       # Add a weight of 2 to the priority if it runs!
}

# VRRP Instance VI_01: Both our keepalived hosts need to have the same name!
vrrp_instance VI_01 {
    state BACKUP                   # Initialize with the desired state (keepalived accepts MASTER or BACKUP)
    interface eth0                 # Applies to this interface
    virtual_router_id 19           # Same VRRP instances need to have the same ID and need to be unique (0..255)
    priority 100                   # The priority determines the failover priority, the highest is the master (0..255)
    advert_int 1                   # VRRP Advertisement frequency in seconds
    smtp_alert                     # An SMTP alert is sent upon a failover
    # We use some basic Authentication, same VRRP Instances need to have the same value
    authentication {
        auth_type PASS
        auth_pass SecuredPasswd*
    }
    # unicast_src_ip 192.168.19.98 # Unicast source ip
    # Unicast remote peer(s), this is our second host in this case
    # unicast_peer {
    #    192.168.19.99
    # }
    # The Floating IP(s) which are shared between the hosts, we will label it eth0:100
    virtual_ipaddress {
        192.168.19.100 dev eth0 label eth0:100
    }
    # This VRRP Instance will use the defined check script "check_haproxy"
    track_script {
        check_haproxy
    }
}

As you may have noticed, the above configuration file differs a bit from the previous one:

  • The keepalived process identifier is ka_haproxy_02 on the Slave
  • This system is initialized in a Slave state (keepalived itself calls this state BACKUP, as the logs show)
  • As opposed to the Master, the Slave priority is 1 point lower and starts at 100, which keeps it in a Slave state for as long as the Master is healthy

We can now restart (or start) keepalived on this system

service keepalived restart

Let's have a look in our /var/log/messages to verify that our system is currently in a Slave state:

Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: Registering Kernel netlink reflector
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: Registering Kernel netlink command channel
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: Registering gratuitous ARP shared channel
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: Truncating auth_pass to 8 characters
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: Configuration is using : 38945 Bytes
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: Using LinkWatch kernel netlink reflector...
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: VRRP_Instance(VI_01) Entering BACKUP STATE
Nov 24 14:59:13 haproxy02 Keepalived_healthcheckers[12440]: Registering Kernel netlink reflector
Nov 24 14:59:13 haproxy02 Keepalived_healthcheckers[12440]: Registering Kernel netlink command channel
Nov 24 14:59:13 haproxy02 Keepalived_healthcheckers[12440]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 24 14:59:13 haproxy02 Keepalived_healthcheckers[12440]: Configuration is using : 6897 Bytes
Nov 24 14:59:13 haproxy02 Keepalived_healthcheckers[12440]: Using LinkWatch kernel netlink reflector...
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: Remote SMTP server [10.10.10.100]:25 connected.
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: VRRP_Script(check_haproxy) succeeded
Nov 24 14:59:13 haproxy02 Keepalived_vrrp[12441]: SMTP alert successfully sent.

If our system is behaving as a Slave, it should not have the Floating IP bound anywhere:

ip addr
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:3f:00:02 brd ff:ff:ff:ff:ff:ff
    inet 192.168.19.98/24 brd 192.168.19.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8e:4bae/64 scope link
       valid_lft forever preferred_lft forever

This output confirms that our node is behaving as a Slave: the Floating IP is nowhere to be found here!

If the Slave node starts a transition toward the Master state at startup, make sure that the keepalived nodes are able to communicate with each other over IP protocol 112 (VRRP) and that Multicast is allowed on the network segment.

Note that Unicast may also be used instead of Multicast in some cases. To do so, simply uncomment the Unicast properties in the configuration file. Only two parameters need to be defined to enable it:

  • unicast_src_ip: The IP Address of the local node (e.g. 192.168.19.99 on the Master)
  • unicast_peer: The IP Address(es) of the remote node(s) (e.g. the Master has 192.168.19.98 in its peer list)


You can also listen to the VRRP protocol heartbeat with tshark or tcpdump, VRRP advertisement packets should be visible during a capture. Here's a Network Capture on my Slave node using tcpdump with the "ip proto 112" filter:

srv-operator@haproxy02:~$ sudo tcpdump -f "ip proto 112"
[sudo] password for srv-operator:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:43:54 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
15:43:55 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
15:43:56 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
15:43:57 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
15:43:58 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
15:43:59 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20

In this quick capture, several visible elements are linked to the keepalived configuration file:

  • The vrid parameter refers to the Virtual Router ID, in our case it's 19.
  • The priority is currently 101 which is the Master's defined priority (note that HAProxy is not installed yet!)
  • A simple authentication is used just as defined in our file
  • The VRRP advertisement interval is set to one second

You can also notice that the Master is the only one to send VRRP Advertisement packets via Multicast.
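
To turn such a capture into a one-line answer to "who is the Master right now?", the advertisement lines can be filtered with awk. This is a rough sketch which assumes the tcpdump line layout shown above (source address in the third field, "vrid" and "prio" each followed by their value). The helper name vrrp_last_advert is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads tcpdump VRRP lines on stdin and prints the
# source, vrid and priority of the last advertisement seen.
vrrp_last_advert() {
    awk '
        /VRRPv[23], Advertisement/ {
            src = $3
            for (i = 1; i < NF; i++) {
                if ($i == "vrid") { vrid = $(i + 1); sub(/,$/, "", vrid) }
                if ($i == "prio") { prio = $(i + 1); sub(/,$/, "", prio) }
            }
        }
        END { if (src != "") printf "vrid %s src %s prio %s\n", vrid, src, prio }
    '
}

# Usage (captures five advertisements, then summarizes the last one):
#   sudo tcpdump -l -n -c 5 "ip proto 112" | vrrp_last_advert
```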



Testing Keepalived basic failover


Well, we may want to test our setup before going any further, so let's see what happens when we stop the keepalived daemon on the Master:

service keepalived stop

First of all, let's have a look at the /var/log/messages on our Master node:

Nov 24 16:24:19 haproxy01 Keepalived_vrrp[29255]: VRRP_Instance(VI_01) sending 0 priority

Stopping keepalived should have removed the Floating IP from the Master Node:

ip addr
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:3f:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.19.99/24 brd 192.168.19.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8e:1a6c/64 scope link
       valid_lft forever preferred_lft forever

The Floating IP is now assigned to our Slave node!

ip addr
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:3f:00:02 brd ff:ff:ff:ff:ff:ff
    inet 192.168.19.98/24 brd 192.168.19.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.19.100/32 scope global eth0:100
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8e:4bae/64 scope link
       valid_lft forever preferred_lft forever

Looking at the /var/log/messages on the Slave node should reveal a transition toward the Master state

Nov 24 16:24:23 haproxy02 Keepalived_vrrp[12441]: VRRP_Instance(VI_01) Transition to MASTER STATE
Nov 24 16:24:24 haproxy02 Keepalived_vrrp[12441]: VRRP_Instance(VI_01) Entering MASTER STATE
Nov 24 16:24:24 haproxy02 Keepalived_vrrp[12441]: Remote SMTP server [10.10.10.100]:25 connected.
Nov 24 16:24:24 haproxy02 Keepalived_vrrp[12441]: SMTP alert successfully sent.

A live capture during the failover reveals that the Slave has taken over the Master role, as seen below:

16:24:20 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
16:24:21 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
16:24:22 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
16:24:22 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 0, authtype simple, intvl 1s, length 20
16:24:23 IP 192.168.19.98 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 100, authtype simple, intvl 1s, length 20
16:24:24 IP 192.168.19.98 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 100, authtype simple, intvl 1s, length 20
16:24:25 IP 192.168.19.98 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 100, authtype simple, intvl 1s, length 20

When we stopped the keepalived service on the Master node, it sent a last advertisement with a priority of 0, which means that our Slave node now holds the highest priority (100).

Now that we know that our basic failover works, let's proceed to the installation of HAProxy.



Setting up HAProxy on the Reverse Proxy servers:


Since our HAProxy configuration will be identical on both of our nodes, the following operations must be performed on both Reverse Proxy nodes.

Alright! Let's begin by installing haproxy:

apt-get install haproxy

Next, we're going to create a configuration file which will match the following needs:

  • Redirect HTTP requests to HTTPS as our site needs to be HTTPS only
  • The Load Balancer should operate with a Round Robin setup
  • The Load Balancer should keep a client on the same server since our Website relies on client sessions

The SSL Certificate


I'll be using an SSL Certificate located in /etc/ssl/private/server.pem. This certificate file needs to be built as follows:

-----BEGIN MY CERTIFICATE-----
-----END MY CERTIFICATE-----
-----BEGIN INTERMEDIATE CERTIFICATE-----
-----END INTERMEDIATE CERTIFICATE-----
-----BEGIN INTERMEDIATE CERTIFICATE-----
-----END INTERMEDIATE CERTIFICATE-----
-----BEGIN ROOT CERTIFICATE-----
-----END ROOT CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
-----END RSA PRIVATE KEY-----

You may create the bundle with the cat command:

cat server.crt intermlow.crt intermhigh.crt root.crt server.key > /etc/ssl/private/server.pem
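
Before pointing HAProxy at the bundle, it does not hurt to check that it contains what we expect. The sketch below simply counts PEM blocks. Keep in mind that real PEM files use the generic "-----BEGIN CERTIFICATE-----" marker, the labels in the layout above are only descriptive. The function name pem_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: counts the certificates and private keys found
# in a PEM bundle, based on their BEGIN markers only (no validation).
pem_summary() {
    certs=$(grep -c -- '-----BEGIN CERTIFICATE-----' "$1")
    keys=$(grep -c -- '-----BEGIN .*PRIVATE KEY-----' "$1")
    echo "certificates=$certs private_keys=$keys"
}

# Usage:
#   pem_summary /etc/ssl/private/server.pem
```

With the bundle built above (server, two intermediates, root and key), I would expect four certificates and one private key to be reported.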

The Configuration File


Now that our SSL Certificate is present on the node we can proceed with the main configuration file, so let's edit /etc/haproxy/haproxy.cfg:

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # Default ciphers to use on SSL-enabled listening sockets.
        # For more information, see ciphers(1SSL). This list is from:
        #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3

        # Custom
        maxconn 2048
        tune.ssl.default-dh-param 2048

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

        # Custom
        option forwardfor
        option http-server-close

        option redispatch

# HAProxy Stats
listen stats :1980
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /stats
        stats auth user:password

# HTTP Listener
frontend www-http
        bind 192.168.19.100:80
        http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
        default_backend www-backend

# HTTPS Listener
frontend www-https
        bind 192.168.19.100:443 ssl crt /etc/ssl/private/server.pem
        http-request set-header X-Forwarded-Proto https if { ssl_fc }
        default_backend www-backend

# Web Backend (HTTPS Redirection)
backend www-backend
        balance roundrobin
        cookie SERVERID insert indirect nocache
        redirect scheme https if !{ ssl_fc }
        server www-1 192.168.19.101:80 check cookie s1
        server www-2 192.168.19.102:80 check cookie s2

Our configuration file contains a few settings worth detailing:

  • The "option forwardfor" parameter adds the X-Forwarded-For header to each request. This can be used to log the real client IP on your backend Web Servers
  • We also use "option http-server-close", which closes the "server-facing" connection after a successful response while keeping the "client-facing" connection alive
  • Note that "option redispatch" is used to prevent clients from sticking to a failed backend server. They will be redirected toward a working backend node.
  • The HAProxy stats will be available via HTTP on port 1980 (e.g. the Master node stats may be reached at http://192.168.19.99:1980/stats). Don't forget to change the user:password value!
  • HTTP requests are handled by the www-http frontend and are sent to the www-backend where they are redirected to HTTPS
  • HTTPS requests are handled by the www-https frontend, which has a certificate attached to its port. The requests are sent to the www-backend
  • Our www-backend section defines our web farm. We use Round Robin load balancing for the requests and we insert a cookie named "SERVERID" which associates a client with a Web Server. This allows us to keep an active Client Session linked to the same Web Server (i.e. s1 or s2).
  • The "redirect scheme https if !{ ssl_fc }" line ensures that we're dealing with HTTPS only with our clients
  • Note that the backend connection between our HAProxy servers and our Web Farm is done over HTTP while the response to the client is sent over HTTPS.
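
Once HAProxy is running, the redirection and the SERVERID cookie described above can be observed from any client by looking at the response headers. Here is a small sketch that extracts both from headers read on stdin. The function name summarize_headers is my own, and it assumes the cookie is sent as "Set-Cookie: SERVERID=<value>; ...".

```shell
#!/bin/sh
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# redirect target (Location) and the SERVERID cookie value when present.
summarize_headers() {
    tr -d '\r' | awk '
        tolower($1) == "location:" { print "redirect=" $2 }
        tolower($1) == "set-cookie:" && $2 ~ /^SERVERID=/ {
            value = $2
            sub(/^SERVERID=/, "", value)   # drop the cookie name
            sub(/;.*$/, "", value)         # drop the trailing attributes
            print "serverid=" value
        }
    '
}

# Usage from a client (-k skips certificate verification, for testing only):
#   curl -sI  http://192.168.19.100/  | summarize_headers
#   curl -skI https://192.168.19.100/ | summarize_headers
```

The first call should report a redirect to the https:// URL, and the second one should report s1 or s2 on a first visit. It is also worth validating the file with "haproxy -c -f /etc/haproxy/haproxy.cfg" before restarting the service.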

Now that our configuration is done, let's add some logging options by editing /etc/rsyslog.conf:

$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514

We restart the rsyslog service along with the haproxy one:

service rsyslog restart
service haproxy restart

And if everything's fine, we should have /var/log/haproxy.log created upon HAProxy startup containing logs such as:

Nov 25 16:47:25 haproxy01 haproxy[726]: 181.50.132.131:15839 [25/Nov/2015:16:47:25.798] www-https~ www-backend/www-1 56/0/2/90/148 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 25 16:48:02 haproxy01 haproxy[726]: 181.50.132.131:18144 [25/Nov/2015:16:48:02.446] www-https~ www-backend/www-1 64/0/1/3/68 200 339 - - --NI 2/2/0/1/0 0/0 "GET / HTTP/1.1"
Nov 25 16:48:40 haproxy01 haproxy[726]: 71.149.49.45:32903 [25/Nov/2015:16:48:40.699] www-https~ www-backend/www-2 126/0/1/3/130 200 171 - - --NI 2/2/0/1/0 0/0 "GET / HTTP/1.1"
Nov 25 17:01:37 haproxy01 haproxy[726]: 181.50.132.131:10517 [25/Nov/2015:17:01:37.660] www-http www-backend/<NOSRV> 0/-1/-1/-1/0 302 118 - - LRNN 0/0/0/0/3 0/0 "GET / HTTP/1.1"
Nov 25 17:01:38 haproxy01 haproxy[726]: 181.50.132.131:10556 [25/Nov/2015:17:01:38.576] www-https~ www-backend/www-1 63/0/0/3/66 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"

After browsing from two Internet sources, we can clearly see that one client is using our www-1 backend Web Server while the other is using the www-2 one. The last two lines also show that an HTTP request is redirected to HTTPS.
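
On a busier site, reading the log line by line quickly becomes tedious. The sketch below counts the requests per client and backend server. It assumes the syslog-prefixed HTTP log format shown above (client address in the sixth field, backend/server in the ninth), and the function name requests_per_backend is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints "<count> <client ip> <backend>/<server>" for
# each client/server pair found in an HAProxy HTTP log (file or stdin).
requests_per_backend() {
    awk '
        $5 ~ /^haproxy\[/ {
            split($6, client, ":")        # drop the client source port
            hits[client[1] " " $9]++
        }
        END { for (key in hits) print hits[key], key }
    ' "$@" | sort
}

# Usage:
#   requests_per_backend /var/log/haproxy.log
```

A client that shows up against more than one www-N server would indicate that the cookie persistence is not working as intended.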

Our HAProxy nodes are now alive and kicking, but can they survive a failure? Let's test it out!



Failover Scenarios


We've previously tested a basic failover but now that we're all set, let's try to run some advanced failure scenarios.


Reverse Proxy Failure


Our first test will simulate a failure in the haproxy daemon on our Master node which should force Keepalived to engage a failover toward our Slave node. As you can see below, we should end up in a situation where our active Master Reverse Proxy suffers from a failure.



After this scenario, the Slave node should become the Master and still deliver the Client Requests to both of our Web Servers. An active client which had started a session on one of our backend Web Servers should still be able to reach the same server thanks to our cookies (yummy!)

First, we stop HAProxy on the Master:

service haproxy stop

So far so good, we receive some notifications by mail about our Slave node entering the Master state and vice versa:

[haproxy01] VRRP Instance VI_01 - Entering BACKUP state
=> VRRP Instance is nolonger owning VRRP VIPs <=
[haproxy02] VRRP Instance VI_01 - Entering MASTER state
=> VRRP Instance is now owning VRRP VIPs <=

Looking at /var/log/messages on our Master node reveals that a transition from the Master to the Backup state occurred:

Nov 26 14:46:31 haproxy01 Keepalived_vrrp[29326]: VRRP_Script(check_haproxy) failed
Nov 26 14:46:32 haproxy01 Keepalived_vrrp[29326]: VRRP_Instance(VI_01) Received higher prio advert
Nov 26 14:46:32 haproxy01 Keepalived_vrrp[29326]: VRRP_Instance(VI_01) Entering BACKUP STATE
Nov 26 14:46:32 haproxy01 Keepalived_vrrp[29326]: Remote SMTP server [10.10.10.100]:25 connected.
Nov 26 14:46:32 haproxy01 Keepalived_vrrp[29326]: SMTP alert successfully sent.

Looking at the same file on our Slave node shows that it took over the Master state (and so the Floating IP is now assigned to the Slave node):

Nov 26 14:46:39 haproxy02 Keepalived_vrrp[12441]: VRRP_Instance(VI_01) forcing a new MASTER election
Nov 26 14:46:39 haproxy02 Keepalived_vrrp[12441]: VRRP_Instance(VI_01) forcing a new MASTER election
Nov 26 14:46:40 haproxy02 Keepalived_vrrp[12441]: VRRP_Instance(VI_01) Transition to MASTER STATE
Nov 26 14:46:41 haproxy02 Keepalived_vrrp[12441]: VRRP_Instance(VI_01) Entering MASTER STATE
Nov 26 14:46:41 haproxy02 Keepalived_vrrp[12441]: Remote SMTP server [10.10.10.100]:25 connected.
Nov 26 14:46:41 haproxy02 Keepalived_vrrp[12441]: SMTP alert successfully sent.

To spice things up, I initiated two client sessions before stopping the service. Let's have a look in /var/log/haproxy.log now:

Nov 26 14:45:20 haproxy01 haproxy[726]: 181.50.132.131:16546 [26/Nov/2015:14:45:20.821] www-https~ www-backend/www-1 66/0/2/97/165 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 26 14:45:33 haproxy01 haproxy[726]: 181.50.132.131:16546 [26/Nov/2015:14:45:20.987] www-https~ www-backend/www-1 12394/0/1/2/12397 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 26 14:45:42 haproxy01 haproxy[726]: 71.149.49.45:19748 [26/Nov/2015:14:45:42.802] www-https~ www-backend/www-2 115/0/1/77/193 200 171 - - --VN 2/2/0/1/0 0/0 "GET / HTTP/1.1"
Nov 26 14:45:47 haproxy01 haproxy[726]: 71.149.49.45:11003 [26/Nov/2015:14:45:46.968] www-https~ www-backend/www-2 118/0/1/3/122 200 171 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 26 14:46:29 haproxy01 haproxy-systemd-wrapper[720]: haproxy-systemd-wrapper: SIGINT -> 726
Nov 26 14:46:29 haproxy01 haproxy-systemd-wrapper[720]: haproxy-systemd-wrapper: exit, haproxy RC=0

We can notice that:

  • Client 181.50.132.131 has been sent on the www-1 backend Web Server
  • Client 71.149.49.45 has been sent on the www-2 backend Web Server

Now that the Slave Node became the Master, I've refreshed my existing sessions by browsing the Website. If we look in /var/log/haproxy.log on our new Master (Reverse Proxy 02) we should notice that our previous clients are still guided toward their previous backend servers:

Nov 26 14:47:28 haproxy02 haproxy[7154]: 71.149.49.45:37188 [26/Nov/2015:14:47:27.888] www-https~ www-backend/www-2 112/0/2/3/117 200 171 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 26 14:47:30 haproxy02 haproxy[7154]: 71.149.49.45:33003 [26/Nov/2015:14:47:30.186] www-https~ www-backend/www-2 115/0/1/2/118 200 171 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 26 14:47:36 haproxy02 haproxy[7154]: 181.50.132.131:19716 [26/Nov/2015:14:47:36.600] www-https~ www-backend/www-1 56/0/1/4/61 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 26 14:47:41 haproxy02 haproxy[7154]: 181.50.132.131:19716 [26/Nov/2015:14:47:36.661] www-https~ www-backend/www-1 4715/0/1/3/4719 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
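
To compare the two nodes without reading every line, the client-to-backend mapping can be extracted from the log. The following is a minimal sketch, assuming the syslog-prefixed HTTP log format shown above (field 6 is the client address and port, field 9 is the backend and server); the function name client_backends is hypothetical.

```shell
#!/bin/sh
# Print each unique "client-IP backend/server" pair found in HAProxy HTTP log lines.
# Assumption: field 6 is client_ip:port and field 9 is backend/server,
# as in the log excerpts of this article.
client_backends() {
    awk '$5 ~ /^haproxy\[[0-9]+\]:$/ && $9 ~ /\// {
        split($6, client, ":")      # drop the source port
        print client[1], $9
    }' "$@" | LC_ALL=C sort -u
}
```

Running client_backends /var/log/haproxy.log on both Reverse Proxy nodes should list the same pairs if the clients stayed on their backend servers across the failover.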

Moreover, if we look at a live VRRP capture during the failover, we can see that the priority of our initial Master node dropped from 103 to 101 once HAProxy was stopped (it lost the 2 points granted while the check script succeeds). As the Slave node priority is currently at 102 (100 by default + 2 with HAProxy running), this node takes over the Master role.

14:46:35 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 103, authtype simple, intvl 1s, length 20
14:46:36 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 103, authtype simple, intvl 1s, length 20
14:46:37 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 103, authtype simple, intvl 1s, length 20
14:46:38 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 103, authtype simple, intvl 1s, length 20
14:46:39 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
14:46:39 IP 192.168.19.98 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 102, authtype simple, intvl 1s, length 20
14:46:39 IP 192.168.19.99 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 101, authtype simple, intvl 1s, length 20
14:46:39 IP 192.168.19.98 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 102, authtype simple, intvl 1s, length 20
14:46:40 IP 192.168.19.98 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 102, authtype simple, intvl 1s, length 20
14:46:41 IP 192.168.19.98 > 224.0.0.18: VRRPv2, Advertisement, vrid 19, prio 102, authtype simple, intvl 1s, length 20
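
The priorities seen in this capture can be reproduced with a little arithmetic. This is only a sketch: it assumes a base priority of 101 on Reverse Proxy 01 and 100 on Reverse Proxy 02, and a check script weight of 2 that is added only while HAProxy is running. The function name effective_priority is hypothetical.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT SCRIPT_OK
# Prints the priority a node would advertise: BASE plus WEIGHT while the
# check script succeeds (SCRIPT_OK=1), BASE alone otherwise.
# Assumption: a positive weight, as suggested by the values in the capture.
effective_priority() {
    base=$1
    weight=$2
    script_ok=$3
    if [ "$script_ok" -eq 1 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

effective_priority 101 2 1   # Reverse Proxy 01, HAProxy running -> 103
effective_priority 101 2 0   # Reverse Proxy 01, HAProxy stopped -> 101
effective_priority 100 2 1   # Reverse Proxy 02, HAProxy running -> 102
```

Since 102 is higher than 101, Reverse Proxy 02 wins the election as soon as HAProxy stops on Reverse Proxy 01.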


Alright! Let's restart HAProxy on our original Master node now:

service haproxy start

We receive two notifications about a Master-Slave transition:

[haproxy02] VRRP Instance VI_01 - Entering BACKUP state
=> VRRP Instance is nolonger owning VRRP VIPs <=
[haproxy01] VRRP Instance VI_01 - Entering MASTER state
=> VRRP Instance is now owning VRRP VIPs <=

If we look at /var/log/messages on our original Master node, we can see that the HAProxy check script is considered valid again and that our node has entered the Master state again:

Nov 27 09:49:57 haproxy01 Keepalived_vrrp[24573]: VRRP_Script(check_haproxy) succeeded
Nov 27 09:49:57 haproxy01 Keepalived_vrrp[24573]: VRRP_Instance(VI_01) forcing a new MASTER election
Nov 27 09:49:57 haproxy01 Keepalived_vrrp[24573]: VRRP_Instance(VI_01) forcing a new MASTER election
Nov 27 09:49:58 haproxy01 Keepalived_vrrp[24573]: VRRP_Instance(VI_01) Transition to MASTER STATE
Nov 27 09:49:59 haproxy01 Keepalived_vrrp[24573]: VRRP_Instance(VI_01) Entering MASTER STATE
Nov 27 09:49:59 haproxy01 Keepalived_vrrp[24573]: Remote SMTP server [10.10.10.100]:25 connected.
Nov 27 09:49:59 haproxy01 Keepalived_vrrp[24573]: SMTP alert successfully sent.

A quick peek in the same file on our original Slave node will reveal that it gave up the Master state since a higher priority was received:

Nov 27 09:50:04 haproxy02 Keepalived_vrrp[16756]: VRRP_Instance(VI_01) Received higher prio advert
Nov 27 09:50:04 haproxy02 Keepalived_vrrp[16756]: VRRP_Instance(VI_01) Entering BACKUP STATE
Nov 27 09:50:04 haproxy02 Keepalived_vrrp[16756]: Remote SMTP server [10.10.10.100]:25 connected.
Nov 27 09:50:05 haproxy02 Keepalived_vrrp[16756]: SMTP alert successfully sent.

As you can see, we're back in our original state.


Backend node Failure


In our next scenario, let's imagine that someone spilled some coffee on our second Web Server node host (doh!).

With our setup, the clients with an active session on our second Web Server node should be redirected to our first Web Server node, while our active HAProxy node should mark the failed node as "down".


Here goes nothing! I performed a hard shutdown of my second Web Server node while two clients had an active session.

Here are the HAProxy statistics before the outage:


And here are the statistics after the outage:


No surprise here since our second Web Server node is down, so let's have a look at /var/log/haproxy.log on our active HAProxy node:

Nov 27 10:59:27 haproxy01 haproxy[573]: 181.50.132.131:1726 [27/Nov/2015:10:59:27.418] www-https~ www-backend/www-1 64/0/1/22/87 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 27 10:59:29 haproxy01 haproxy[573]: 181.50.132.131:1726 [27/Nov/2015:10:59:27.506] www-https~ www-backend/www-1 1524/0/1/2/1527 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 27 10:59:31 haproxy01 haproxy[573]: 204.27.59.194:48332 [27/Nov/2015:10:59:30.916] www-https~ www-backend/www-2 351/0/1/25/377 200 172 - - --VN 2/2/0/1/0 0/0 "GET / HTTP/1.1"
Nov 27 11:00:09 haproxy01 haproxy[573]: Server www-backend/www-2 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 27 11:00:09 haproxy01 haproxy[573]: Server www-backend/www-2 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 27 11:00:11 haproxy01 haproxy[573]: 204.27.59.194:48849 [27/Nov/2015:11:00:10.930] www-https~ www-backend/www-1 712/0/1/2/715 200 171 - - --DI 2/2/0/1/0 0/0 "GET / HTTP/1.1"
Nov 27 11:00:14 haproxy01 haproxy[573]: 181.50.132.131:1726 [27/Nov/2015:10:59:29.032] www-https~ www-backend/www-1 45229/0/1/3/45233 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"
Nov 27 11:00:16 haproxy01 haproxy[573]: 181.50.132.131:1726 [27/Nov/2015:11:00:14.265] www-https~ www-backend/www-1 1965/0/1/2/1968 200 339 - - --VN 1/1/0/1/0 0/0 "GET / HTTP/1.1"

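The log shows that our setup is working: our client 204.27.59.194, first linked to the www-2 node, is linked to the www-1 node after the outage. The --DI flags on that request indicate that the client's cookie designated a server which was down and that HAProxy inserted a new cookie.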
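
The four-character field just before the connection counters (--VN, --DI) is the termination state, whose last two characters describe cookie handling. The helper below is a minimal sketch that decodes only the values relevant to this test; the function name describe_cookie_flags is hypothetical, and any other value is reported as "other".

```shell
#!/bin/sh
# describe_cookie_flags FLAGS
# Decodes the 3rd and 4th characters of an HAProxy termination state such as
# "--VN" or "--DI". Only a subset of the documented values is handled here.
describe_cookie_flags() {
    flags=$1
    req=$(printf '%s' "$flags" | cut -c3)   # status of the client's cookie
    rsp=$(printf '%s' "$flags" | cut -c4)   # operation on the response cookie
    case $req in
        V) req_msg="valid cookie" ;;
        D) req_msg="cookie designated a server that was down" ;;
        N) req_msg="no cookie provided" ;;
        I) req_msg="invalid cookie" ;;
        *) req_msg="other ($req)" ;;
    esac
    case $rsp in
        N) rsp_msg="no cookie inserted" ;;
        I) rsp_msg="new cookie inserted by the proxy" ;;
        *) rsp_msg="other ($rsp)" ;;
    esac
    printf '%s: client %s; response %s\n' "$flags" "$req_msg" "$rsp_msg"
}
```

Calling describe_cookie_flags --DI describes the request made by 204.27.59.194 right after www-2 was marked down, while --VN corresponds to the requests that kept their existing cookie.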

This is it! Our setup is now tested and ready. Next, we'll talk about keeping our HAProxy configuration files up to date on the Reverse Proxy nodes.



Keeping HAProxy configuration files up to date with Mercurial


Well, now that our setup is working, we want to be able to maintain it easily. Mercurial is a distributed revision control system which lets us keep our HAProxy configuration files under version control and update them in a controlled way.

Our goal is to create a Central Repository in our backend DMZ from which changes are pushed via SSH to the Reverse Proxy nodes, while keeping track of every revision of our running configuration file.

Once more, the SSH-related Firewall rules can be found at the bottom.


Configuring the Mercurial Central Repository


In this part we will use another Debian server called "Mercurial Repo" as our Central Mercurial Repository. This server is located in our backend DMZ.


Let's prepare our two Reverse Proxy nodes first. We will be using an account called srv-operator on both nodes, which will have write permissions on /etc/haproxy.

First, we install Mercurial on both nodes:

sudo apt-get install mercurial

Next, we create a group called haops for our srv-operator user. This group will then have write permissions on /etc/haproxy:

sudo groupadd haops
sudo usermod -a -G haops srv-operator
sudo chgrp -R haops /etc/haproxy
sudo chmod -R g+w /etc/haproxy

That's it for our Reverse Proxy nodes for now.

Alright, after creating the srv-operator user on the Central Repository node, we log in as that user and install Mercurial before creating our HAProxy repository.

srv-operator@mercurial01:~$ sudo apt-get install mercurial

Since we will be using SSH to push our changes, we need to create an RSA key on the Central Repository for srv-operator, which will be used for password-less authentication on our Reverse Proxy nodes.

We simply create an RSA key without a passphrase and copy its public part to our Reverse Proxy nodes:

srv-operator@mercurial01:~$ ssh-keygen -t rsa
srv-operator@mercurial01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub srv-operator@192.168.19.99
srv-operator@mercurial01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub srv-operator@192.168.19.98
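
Before going further, it may be worth confirming that key authentication really works without a prompt. The sketch below relies on the OpenSSH BatchMode option, which makes ssh fail instead of asking for a password. The function name check_ssh and the SSH override variable are hypothetical and only there to make the sketch easy to test.

```shell
#!/bin/sh
# check_ssh HOST...
# Reports, for each host, whether srv-operator can log in without any prompt.
# BatchMode=yes disables password prompts; ConnectTimeout avoids long hangs.
# The SSH variable (hypothetical) lets you substitute another command.
check_ssh() {
    for host in "$@"; do
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "srv-operator@$host" true; then
            echo "$host: key authentication OK"
        else
            echo "$host: key authentication FAILED"
        fi
    done
}
```

For this setup the call would be check_ssh 192.168.19.98 192.168.19.99, run as srv-operator on the Central Repository.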

Next, we create the repository folder for HAProxy. It can be placed almost anywhere, so I will be placing it in /repo/haproxy:

srv-operator@mercurial01:~$ sudo mkdir -p /repo/haproxy
srv-operator@mercurial01:~$ sudo chown srv-operator /repo/haproxy

Now that our folder is created and owned by srv-operator, we can initialize it, place a copy of our haproxy.cfg in it and commit it (don't forget to first edit your Mercurial config file with "hg config --edit" to fill in the username!):

srv-operator@mercurial01:~$ cd /repo/haproxy
srv-operator@mercurial01:/repo/haproxy$ hg init
srv-operator@mercurial01:/repo/haproxy$ hg add haproxy.cfg
srv-operator@mercurial01:/repo/haproxy$ hg commit -m "Added initial haproxy.cfg"

Alright, let's clone our Central Repository to our Reverse Proxy nodes (since we copied our RSA key over there, we should not get a password prompt during the clone phase).

Note that we're turning the /etc/haproxy folder of our Reverse Proxy nodes into a Mercurial Repository:

srv-operator@mercurial01:/repo/haproxy$ hg clone /repo/haproxy/ ssh://srv-operator@192.168.19.98//etc/haproxy
srv-operator@mercurial01:/repo/haproxy$ hg clone /repo/haproxy/ ssh://srv-operator@192.168.19.99//etc/haproxy

If everything goes smoothly, you should see an output similar to the following:

searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

Our Central Repository is now ready and cloned on our Reverse Proxy nodes! Next, we will create a hook script to make our setup smoother.


Creating the Hook Script


In a Mercurial Repository, a hook can be used to trigger a script or a command before or after an action occurs in the repository (commit, precommit, ...). A guide on hooks can be found in the Mercurial documentation.

In our case, we will be using the changegroup hook on our Reverse Proxy nodes, which is triggered after a group of changesets has been added to the repository (i.e. a new configuration has been pushed to the node).

Hooks are defined in a file called hgrc, located in the hidden .hg folder of the repository. Since we haven't done anything with it yet, we have to create it as /etc/haproxy/.hg/hgrc on both nodes with the following content:

[hooks]
changegroup = bash /home/srv-operator/haproxy_update.sh

Our changegroup hook simply calls a script located at /home/srv-operator/haproxy_update.sh, which updates our HAProxy configuration when called:

#!/bin/sh
# Called by the Mercurial changegroup hook after new changesets are pushed

# Abort if the configuration directory is missing
cd /etc/haproxy/ || exit 1

echo "HAPROXY: Attempting to update haproxy configuration file"

# Make sure that the repository exists (POSIX redirection, works with sh and bash)
if hg identify >/dev/null 2>&1
then
        echo "HAPROXY: Located the Mercurial repository"

        # Update haproxy.cfg to its latest revision, discarding local changes
        hg update -C
        echo "HAPROXY: Updated the haproxy.cfg to the latest revision"

        # Reload the haproxy service with the new haproxy.cfg
        sudo /etc/init.d/haproxy reload
else
        echo "HAPROXY: Unable to locate the Mercurial repository in /etc/haproxy/"
fi

Since the script runs non-interactively, the reload command must be executable via sudo without a password prompt. Make sure that the following line comes after the sudo group's line in your sudoers file (when several entries match, the last one wins, so the group line would otherwise override the user line):

srv-operator ALL=NOPASSWD: /etc/init.d/haproxy reload
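
The hook script reloads HAProxy without validating the pushed file first. One possible refinement, shown here only as a sketch, is to run HAProxy's configuration check (haproxy -c -f) and to reload only when it succeeds. The function name safe_reload and the HAPROXY_BIN and RELOAD_CMD override variables are hypothetical. It also assumes that srv-operator can read everything the configuration references (certificates, for instance), which may not hold on your nodes.

```shell
#!/bin/sh
# safe_reload [CONFIG_FILE]
# Validates the configuration with "haproxy -c -f" and reloads only on success.
# HAPROXY_BIN and RELOAD_CMD are hypothetical overrides used for testing.
safe_reload() {
    cfg=${1:-/etc/haproxy/haproxy.cfg}
    if "${HAPROXY_BIN:-haproxy}" -c -f "$cfg" >/dev/null 2>&1; then
        echo "HAPROXY: configuration is valid, reloading"
        ${RELOAD_CMD:-sudo /etc/init.d/haproxy reload}
    else
        # The running process keeps the configuration it was started with
        echo "HAPROXY: configuration check failed, reload skipped" >&2
        return 1
    fi
}
```

In the hook script, the call to safe_reload would take the place of the plain reload command. Note that the file on disk has already been updated at that point, so a failed check still needs a corrective push.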


Now that our changegroup hook script is set and ready, let's make a change to the haproxy.cfg present on our Central Repository before pushing it to our Reverse Proxy nodes.

Open up /repo/haproxy/haproxy.cfg with your favorite text editor and add a comment at the top.

Alright, let's commit our change and push it to one of our nodes:

srv-operator@mercurial01:/repo/haproxy$ hg commit -m "Added a comment"
srv-operator@mercurial01:/repo/haproxy$ hg push ssh://srv-operator@192.168.19.99//etc/haproxy
pushing to ssh://srv-operator@192.168.19.99//etc/haproxy
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
remote: HAPROXY: Attempting to update haproxy configuration file
remote: HAPROXY: Located the Mercurial repository
remote: 1 files updated, 0 files merged, 0 files removed, 0 files unresolved
remote: HAPROXY: Updated the haproxy.cfg to the latest revision
remote: Reloading haproxy configuration (via systemctl): haproxy.service.

As you can see from the above output, the hook script was triggered once the changeset had been added. Our configuration file was updated to the latest committed version on this node.
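
The push above targets a single node, so it has to be repeated for the other one. A small loop can take care of both, as in the sketch below. The function name push_all and the HG override variable are hypothetical. Keep in mind that hg push exits with status 1 when there is nothing to push, so the message does not always indicate an error.

```shell
#!/bin/sh
# push_all NODE...
# Pushes the current repository to /etc/haproxy on every given node over SSH.
# The HG variable (hypothetical) lets you substitute another command.
push_all() {
    for node in "$@"; do
        ${HG:-hg} push "ssh://srv-operator@$node//etc/haproxy" \
            || echo "push to $node failed or had nothing to push" >&2
    done
}
```

From /repo/haproxy, the call for this setup would be push_all 192.168.19.98 192.168.19.99.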

This is it! Our setup is now complete!



DMZ Firewall Rules


Here are the rules that were needed for this setup on the DMZ Firewall:

# Allow SMTP (In): Reverse Proxy
iptables -A FORWARD -p tcp -s 192.168.19.98 -d 10.10.10.100 --dport 25 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A FORWARD -p tcp -s 192.168.19.99 -d 10.10.10.100 --dport 25 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A FORWARD -p tcp -s 10.10.10.100 -d 192.168.19.98 --sport 25 -m state --state ESTABLISHED -j ACCEPT
iptables -A FORWARD -p tcp -s 10.10.10.100 -d 192.168.19.99 --sport 25 -m state --state ESTABLISHED -j ACCEPT

# Allow SSH (Out): Reverse Proxy
iptables -A FORWARD -p tcp -s 10.10.10.101 -d 192.168.19.98 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A FORWARD -p tcp -s 10.10.10.101 -d 192.168.19.99 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A FORWARD -p tcp -s 192.168.19.98 -d 10.10.10.101 --sport 22 -m state --state ESTABLISHED -j ACCEPT
iptables -A FORWARD -p tcp -s 192.168.19.99 -d 10.10.10.101 --sport 22 -m state --state ESTABLISHED -j ACCEPT
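
Each service above needs the same pair of rules (request and reply) for each Reverse Proxy node, so the list can also be generated. This is a sketch that only prints the commands; the function names allow_tcp and dmz_rules are hypothetical, and the addresses are the ones used in this article.

```shell
#!/bin/sh
# allow_tcp CLIENT SERVER PORT
# Prints the two FORWARD rules allowing CLIENT to open TCP connections
# to SERVER on PORT, and the replies to come back.
allow_tcp() {
    echo "iptables -A FORWARD -p tcp -s $1 -d $2 --dport $3 -m state --state NEW,ESTABLISHED -j ACCEPT"
    echo "iptables -A FORWARD -p tcp -s $2 -d $1 --sport $3 -m state --state ESTABLISHED -j ACCEPT"
}

# Prints the eight rules of this setup (SMTP towards 10.10.10.100,
# SSH from the Central Repository 10.10.10.101).
dmz_rules() {
    for proxy in 192.168.19.98 192.168.19.99; do
        allow_tcp "$proxy" 10.10.10.100 25
        allow_tcp 10.10.10.101 "$proxy" 22
    done
}
```

Calling dmz_rules prints the same eight rules as above, in a different order. Review the output before applying it on the DMZ Firewall.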

Don't forget to declare the proper routes too:
  • Our Reverse Proxy nodes need to be able to reach the SMTP relay/server and the Central Mercurial Repository
  • The SMTP relay and the Central Mercurial Repository need to be able to reach the Reverse Proxy Nodes
  • The default route for our Reverse Proxy nodes is the Internet Firewall
