Service reliability

Health checks

Health checks ensure that only healthy servers are kept in the load balancing rotation. They check the status of each server by using one of the health checking modes described in this section.

Active health checks

An active health check attempts to connect to a server or send it an HTTP request at a regular interval. If the connection cannot be established or the HTTP request fails, the health check fails.

If the number of consecutive failed checks meets the failure threshold, the server is taken out of rotation. Health checks continue while the server is down, however. If the server resumes service and responds successfully to the health checks, and if the number of consecutive successful responses meets the success threshold, the server is restored to rotation.
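
As a minimal sketch, the interval and both thresholds can be written out on the server line. The values below are the defaults stated elsewhere on this page (inter 2s, fall 3, rise 2); the backend name and server address are illustrative:

```haproxy
backend be_myapp
    # Probe every 2 seconds; 3 consecutive failures take the server out of rotation,
    # and 2 consecutive successes restore it
    server srv1 10.0.0.1:80 check inter 2s fall 3 rise 2
```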

TCP health checks

A basic TCP-layer health check tries to connect to the server’s TCP port. The check is valid when the server answers with a SYN/ACK packet. Enable it by adding a check argument to each server line that you would like to monitor.

In the following example, the load balancer tries to connect to port 80 on each server:

haproxy
backend be_myapp
server srv1 10.0.0.1:80 check
server srv2 10.0.0.2:80 check

To send health check probes to a port other than the one to which normal traffic is sent, add the port argument. In the following example, the health check is sent to port 8080.

haproxy
backend be_myapp
server srv1 10.0.0.1:80 check port 8080
server srv2 10.0.0.2:80 check port 8080

Define a send/expect sequence

Use option tcp-check to define a sequence of messages to send and responses to expect back. Below, we send the string PING and expect to receive the string PONG:

haproxy
backend be_myapp
option tcp-check
tcp-check send PING\r\n
tcp-check expect string PONG
server srv1 10.0.0.1:80 check

HTTP health checks

An HTTP-layer health check sends an HTTP OPTIONS request to the server and expects to get a successful response. To enable it, add option httpchk to the backend section:

haproxy
backend be_myapp
option httpchk
server srv1 192.168.1.5:80 check

Checks send an OPTIONS request to the URL / by default.

You can change the HTTP method and URL by specifying them on the option httpchk line. In the following example, we send checks using GET instead of OPTIONS to the URL /healthz:

haproxy
backend be_myapp
option httpchk GET /healthz
server srv1 10.0.0.1:80 check
server srv2 10.0.0.2:80 check

If the response status code is in the 2xx or 3xx range, the server is healthy.

Expect a response status

Use http-check expect to specify which HTTP status code indicates a healthy server. In the following example, the server must return a 200 OK response:

haproxy
backend bk_myapp
option httpchk
http-check expect status 200
server srv1 10.0.0.1:80 check
server srv2 10.0.0.2:80 check

As an alternative to http-check expect status, where you specify one explicit status value, you can use rstatus to specify a regular expression to match multiple status codes. In the next example, the health check uses rstatus in conjunction with the negation operator (!) to consider all statuses as valid except for 5xx responses:

haproxy
backend bk_myapp
option httpchk
http-check expect ! rstatus ^5
default-server inter 3s fall 3 rise 2
server srv1 10.0.0.1:80 check
server srv2 10.0.0.2:80 check

Expect a string in the response

To search for a string in an HTTP response body or in a TCP response, add the expect string directive to http-check or tcp-check. In the next example, the response must contain the string OK:

haproxy
backend be_myapp
option httpchk
http-check expect string OK
server srv1 10.0.0.1:80 check
server srv2 10.0.0.2:8080 check

Use the expect rstring argument to specify a regular expression instead of an explicit string.
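
A minimal sketch of the rstring form follows. The pattern (OK|READY) is an illustrative assumption, not a value from your application; substitute whatever text your health endpoint actually returns:

```haproxy
backend be_myapp
    option httpchk
    # Pass the check when the response body matches the regular expression
    # (the pattern here is only an example)
    http-check expect rstring (OK|READY)
    server srv1 10.0.0.1:80 check
```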

Customize with the send directive

Available since

  • HAProxy 2.2
  • HAProxy Enterprise 2.2r1
  • HAProxy ALOHA 12.5

Another way to change the HTTP method and URL is to add an http-check send line and specify the new values there. In the following example, checks send HEAD requests to the URL /healthz with the Host header set to test.local:

haproxy
backend be_myapp
option httpchk
http-check send meth HEAD uri /healthz ver HTTP/1.1 hdr Host test.local
server srv1 192.168.1.5:80 check

You can send POST requests too:

haproxy
backend be_myapp
option httpchk
http-check send meth POST uri /health hdr Content-Type "application/json;charset=UTF-8" hdr Host www.mwebsite.com body "{\"id\": 1, \"field\": \"value\"}"
server srv1 192.168.1.5:80 check

Customize connect arguments

Available since

  • HAProxy 2.2
  • HAProxy Enterprise 2.2r1
  • HAProxy ALOHA 12.5

Use the connect directive to enable SNI, connect over SSL/TLS, perform health checks over SOCKS4, and choose the protocol, such as HTTP/2 or FastCGI. Here’s an example where health checks are performed using HTTP/2 and SSL:

haproxy
backend be_myapp
option httpchk
http-check connect ssl alpn h2
http-check send meth HEAD uri /health ver HTTP/2 hdr Host www.test.local
server srv1 192.168.1.5:443 check

To close a connection cleanly instead of sending a RST, use the linger option.
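
As a sketch, linger is given as an extra argument on the connect line. The example below reuses the HTTP/2 configuration shown earlier in this section, so the host name and address are the same placeholders:

```haproxy
backend be_myapp
    option httpchk
    # linger closes the check connection cleanly instead of sending a RST
    http-check connect ssl alpn h2 linger
    http-check send meth HEAD uri /health ver HTTP/2 hdr Host www.test.local
    server srv1 192.168.1.5:443 check
```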

Check multiple HTTP endpoints

Available since

  • HAProxy 2.2
  • HAProxy Enterprise 2.2r1
  • HAProxy ALOHA 12.5

A single health check can query several endpoints. In the following example, we make requests to two distinct services: one listening on port 8080 and the other on port 8081, each with its own URI. If either endpoint fails to respond, the entire health check fails.

haproxy
backend servers
option httpchk
http-check connect port 8080
http-check send meth HEAD uri /health
http-check connect port 8081
http-check send meth HEAD uri /up
server server1 127.0.0.1:80 check

Change the interval

By default, the load balancer sends a health check every two seconds. Change this by adding the inter argument to the server line. In the next example, we send a health check every four seconds:

haproxy
backend be_myapp
server srv1 10.0.0.1:80 check inter 4s
server srv2 10.0.0.2:80 check inter 4s

Use any of the following time suffixes:

  • us : microseconds
  • ms : milliseconds
  • s : seconds
  • m : minutes
  • h : hours
  • d : days

Other arguments that affect the check interval are defined below:

Argument | Description
inter | Sets the interval between two consecutive health checks. If not specified, the default value is 2s.
fastinter | Sets the interval between two consecutive health checks when the server is in any of the transition states: "UP - transitionally DOWN" or "DOWN - transitionally UP". If not set, then inter is used.
downinter | Sets the interval between two consecutive health checks when the server is in the DOWN state. If not set, then inter is used.
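
The three interval arguments can be combined on one server line. The following is a sketch with illustrative values (4s, 1s, and 10s are assumptions chosen for the example, not recommendations):

```haproxy
backend be_myapp
    # Check every 4s in steady state, every 1s while the server is in a
    # transition state, and every 10s while it is DOWN
    server srv1 10.0.0.1:80 check inter 4s fastinter 1s downinter 10s
```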

Change the failure threshold

Use the fall argument to change the number of failed health checks that will trigger removing the server from the load balancing rotation. By default, this is set to 3. In the following example, 5 failed checks will put the server into the DOWN state:

haproxy
backend be_myapp
server srv1 10.0.0.1:80 check fall 5
server srv2 10.0.0.2:80 check fall 5

Use the rise argument to set how many successful checks are needed to bring a down server back up. The default is 2. In the following example, 10 successful health checks are needed before the server will return to the load balancing rotation:

haproxy
backend be_myapp
server srv1 10.0.0.1:80 check fall 5 rise 10
server srv2 10.0.0.2:80 check fall 5 rise 10

Set check-scoped variables

Available since

  • HAProxy 2.2
  • HAProxy Enterprise 2.2r1
  • HAProxy ALOHA 12.5

Use either tcp-check set-var or http-check set-var to set a variable scoped to a health check session. For example:

haproxy
backend be_myapp
option tcp-check
tcp-check set-var(check.port) int(1234)
tcp-check connect port var(check.port)
server srv1 10.0.0.1:80 check

Passive health checks

A passive health check monitors live traffic for errors. You can watch for either failed TCP connections or bad HTTP responses. Passive checks will detect errors returning from any part of your proxied service, but they require active traffic to monitor.

Monitor for TCP connection errors

To monitor live traffic for TCP connection errors, follow these steps:

  1. Add the check argument to the server lines you want to monitor.
  2. Add the observe layer4 argument to each server line to activate passive health checking.
  3. Add the error-limit and on-error arguments to set the threshold for failed passive health checks and the action to take when errors exceed that threshold.

In the following example, we monitor for TCP connection errors. After 10 consecutive errors, we mark the server as down by using the mark-down value for the on-error argument:

haproxy
backend servers
server server1 192.168.0.10:80 check inter 2m observe layer4 error-limit 10 on-error mark-down

The check argument enables an active health check that probes the server’s TCP port at a regular interval (2 seconds by default, which you can change with the inter argument). After passive health checks remove the server from the load balancing rotation, these active probes bring it back: the server returns to service once the required number of consecutive probes succeed. In the example above, the interval is increased to 2 minutes so that the server must remain healthy for a longer period before returning to service.

Monitor for HTTP response errors

To monitor live traffic for HTTP response errors, follow these steps:

  1. Add the check argument to the server lines you want to monitor.
  2. Add the observe layer7 argument to each server line to activate passive health checking.
  3. Add the error-limit and on-error arguments to set the threshold for failed passive health checks and the action to take when errors exceed that threshold.

In the following example, we monitor for HTTP response errors. After 10 consecutive errors, we mark the server as down by using the mark-down value for the on-error argument:

haproxy
backend servers
server server1 192.168.0.10:80 check observe layer7 error-limit 10 on-error mark-down

The check argument enables an active health check that probes the server’s TCP port at a regular interval. After passive health checks remove the server from the load balancing rotation, the server returns to service once the required number of consecutive active probes succeed.
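
A minimal sketch of tuning that return to service, assuming the rise argument described under "Change the failure threshold". The value 3 is illustrative:

```haproxy
backend servers
    # Passive checks mark the server down after 10 consecutive HTTP errors;
    # 3 consecutive successful active probes are then needed to bring it back
    server server1 192.168.0.10:80 check rise 3 observe layer7 error-limit 10 on-error mark-down
```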

Set the on-error action

The on-error argument on the server line determines what action to take when the number of errors reaches the threshold you set with the error-limit argument. It accepts any of the following values:

Action | Description
fastinter | Forces fastinter mode, which causes the active health check probes to be sent more rapidly.
fail-check | Counts as one failed active health check and forces fastinter mode. This is the default action.
sudden-death | Simulates a pre-fatal failed check, so that one more failed check marks the server as down. It also forces fastinter mode.
mark-down | Marks the server as down and forces fastinter mode.
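
As a sketch of a gentler action than mark-down, the following server line uses sudden-death together with an explicit fastinter interval. The 1s value is an illustrative assumption:

```haproxy
backend servers
    # After 10 consecutive errors in live HTTP traffic, simulate a pre-fatal
    # failed check; active probes then run every 1s, and one more failed
    # probe marks the server as down
    server server1 192.168.0.10:80 check fastinter 1s observe layer7 error-limit 10 on-error sudden-death
```

With this action, a server that fails in live traffic but still passes its next active probe stays in rotation.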

Agent checks

An agent check is one where the load balancer connects to an agent program running on a backend server. In response to the agent check probe, the agent program sends back a string of ASCII text that triggers a change in the load balancer.

The program running on the server can send back to the load balancer a string containing any of the following commands.

Text | Effect on the load balancer
<number>% | Changes the server’s weight to a percentage of its current value. Specifying 0% is equivalent to drain. Example: 50% changes the server weight to one-half of its current value.
down | Marks the server as down due to a critical condition such as a missing process or a port that is not responding. Optionally, you can append a number sign (#) followed by a description string.
drain | Puts the server into drain mode, where it will not accept new connections other than those accepted via persistence.
fail | Marks the server as down and can indicate that a validity test has failed. Optionally, you can append a number sign (#) followed by a description string.
maint | Puts the server into maintenance mode, where it will not accept any new connections, and health checks will be stopped.
maxconn:<number> | Changes the server’s maxconn value to the given number. Do not put a space between maxconn: and the number. Example: maxconn:30 sets the maximum connections value to 30.
ready | Takes the server out of maintenance mode.
stopped | Marks the server as down due to an intentional halt. Optionally, you can append a number sign (#) followed by a description string.
up | Marks the server as up.

The string is formatted as one or more of these commands separated by spaces, tabs, or commas. The string must end with a carriage return (\r) or new line (\n) character. Example: ready 50% maxconn:30

Create an agent program

The agent program can be written in any programming language, as long as it allows you to listen on a TCP port. When the program detects that the load balancer has connected, it should return a string of ASCII text that makes a change to the load balancer or keeps it at its current state.

Below is a program written in the Go programming language. It returns the string 50%\n if the server’s CPU idle time is less than 10 percent, which indicates that the CPU is close to maxing out. Otherwise, it returns the string 100%\n. Note that each string ends with a line feed character (\n):

agent-program.go
go
package main

import (
	"fmt"
	"time"

	"github.com/firstrow/tcp_server"
	"github.com/mackerelio/go-osstat/cpu"
)

func main() {
	// Listen for agent check connections from the load balancer on port 9999.
	server := tcp_server.New(":9999")

	server.OnNewClient(func(c *tcp_server.Client) {
		fmt.Println("Client connected")

		cpuIdle, err := getIdleTime()
		if err != nil {
			fmt.Println(err)
			c.Close()
			return
		}

		if cpuIdle < 10 {
			// Set server weight to half
			c.Send("50%\n")
		} else {
			c.Send("100%\n")
		}

		c.Close()
	})

	server.Listen()
}

// getIdleTime samples CPU statistics one second apart and returns the
// percentage of that interval during which the CPU was idle.
func getIdleTime() (float64, error) {
	before, err := cpu.Get()
	if err != nil {
		return 0, err
	}

	time.Sleep(time.Duration(1) * time.Second)

	after, err := cpu.Get()
	if err != nil {
		return 0, err
	}

	total := float64(after.Total - before.Total)
	cpuIdle := float64(after.Idle-before.Idle) / total * 100
	return cpuIdle, nil
}

Configure the backend

Configure the servers in the backend to send agent checks.

Here, the load balancer sends an agent-based check probe every five seconds to a program listening at 192.168.0.10 on port 8080:

haproxy
backend servers
server server1 192.168.0.10:80 check weight 100 agent-check agent-addr 192.168.0.10 agent-port 8080 agent-inter 5s agent-send ping\n

Use the following arguments on each server line to enable agent-based checks:

Argument | Description
check | Enables health checking.
agent-check | Enables agent checks for the server.
agent-addr | Identifies the IP address where the agent is listening.
agent-port | Identifies the port where the agent is listening.
agent-inter | Defines the interval between checks.
agent-send | A string that the load balancer sends to the agent upon connection. Be sure to end it with a newline character.

LDAP health checks

You can health check LDAPv3 servers. The load balancer uses the Anonymous Authentication Mechanism of Simple Bind to connect. The check is valid if the server responds with a successful result message.

  1. Configure the LDAP servers to allow anonymous binding. You can do this with an IP alias on the server side that allows only the load balancer’s IP addresses to bind to it.

  2. Add option ldap-check to your backend section. In this example, we send the health check probes to alternative IP addresses specified with the addr argument on the server lines:

    haproxy
    backend be_myapp
    option ldap-check
    server srv1 10.0.0.1:389 check addr 10.0.0.11
    server srv2 10.0.0.2:389 check addr 10.0.0.12

MySQL health checks

You can health check MySQL database servers. The check is valid if the server responds with a successful result message. Two modes exist:

  • check the MySQL handshake packet
  • test Client Authentication

In the following example, we check a MySQL handshake by adding the option mysql-check directive:

haproxy
backend be_myapp
option mysql-check
server srv1 10.0.0.1:3306 check
server srv2 10.0.0.2:3306 check

Add a user argument to option mysql-check for the health check probe to send a Client Authentication packet:

haproxy
backend be_myapp
option mysql-check user hapee-lb
server srv1 10.0.0.1:3306 check
server srv2 10.0.0.2:3306 check

PostgreSQL health checks

You can perform a simple PostgreSQL check by sending a StartupMessage. The check is valid if the server responds with a successful Authentication request message rather than an error response. Add the option pgsql-check directive to your backend section and include a check argument on each server line.

haproxy
backend be_pgsql
option pgsql-check
server srv1 10.0.0.1:5432 check
server srv2 10.0.0.2:5432 check

Optionally, add the user argument followed by the username that will be used to connect to the PostgreSQL server (here, hapee-lb):

haproxy
backend be_pgsql
option pgsql-check user hapee-lb
server srv1 10.0.0.1:5432 check
server srv2 10.0.0.2:5432 check

Redis health checks

You can monitor a Redis service by sending the PING command. The check is valid if the server responds with the string +PONG. Add the option redis-check directive to your backend section and include a check argument on each server line.

haproxy
backend be_redis
option redis-check
server srv1 10.0.0.1:6379 check
server srv2 10.0.0.2:6379 check

SMTP health checks

You can monitor a Simple Mail Transfer Protocol (SMTP) service. Add the option smtpchk directive to your backend section and include a check argument on each server line. The check is valid if the server response code starts with the number 2.

haproxy
backend be_smtp
option smtpchk
server srv1 10.0.0.1:25 check
server srv2 10.0.0.2:25 check

You can also monitor an Extended Simple Mail Transfer Protocol (ESMTP) service. Add the hello command to use, which is HELO for SMTP and EHLO for ESMTP. Follow this with the domain name to present to the server:

haproxy
backend be_smtp
option smtpchk EHLO mydomain.com
server srv1 10.0.0.1:25 check
server srv2 10.0.0.2:25 check
