Health checks
Health checks ensure that only healthy servers are kept in the load balancing rotation. They check the status of each server by using one of the health checking modes described in this section.
Active health checks
An active health check attempts to connect to a server or send it an HTTP request at a regular interval. If the connection cannot be established or the HTTP request fails, the health check fails.
If the number of consecutive failed checks meets the failure threshold, the server is taken out of rotation. Health checks continue while the server is down, however. If the server resumes service and responds successfully to the health checks, and if the number of consecutive successful responses meets the success threshold, the server is restored to rotation.
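As a rough illustration of how the two thresholds interact, the sketch below spells out on a `server` line the values that this page gives as defaults (a 2-second interval, a failure threshold of 3, and a success threshold of 2). The backend name and address are placeholders.

```haproxy
backend be_myapp
  # inter 2s: send a probe every 2 seconds
  # fall 3:   3 consecutive failed probes take the server out of rotation
  # rise 2:   2 consecutive successful probes put it back
  server srv1 10.0.0.1:80 check inter 2s fall 3 rise 2
```

Under these assumptions, a server that stops responding is removed after roughly 3 × 2 s = 6 s, and a recovered server returns after roughly 2 × 2 s = 4 s. Actual times also depend on how long each failed probe takes to time out.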
TCP health checks
A basic TCP-layer health check tries to connect to the server’s TCP port. The check is valid when the server answers with a SYN/ACK packet. Enable it by adding a `check` argument to each `server` line that you would like to monitor.
In the following example, the load balancer tries to connect to port 80 on each server:
haproxy
backend be_myapp
  server srv1 10.0.0.1:80 check
  server srv2 10.0.0.2:80 check
To send health check probes to a port other than the one to which normal traffic is sent, add the `port` argument. In the following example, the health check is sent to port 8080.
haproxy
backend be_myapp
  server srv1 10.0.0.1:80 check port 8080
  server srv2 10.0.0.2:80 check port 8080
Define a send/expect sequence
Use `option tcp-check` to define a sequence of messages to send and responses to expect back. Below, we send the string `PING` and expect to receive the string `PONG`:
haproxy
backend be_myapp
  option tcp-check
  tcp-check send PING\r\n
  tcp-check expect string PONG
  server srv1 10.0.0.1:80 check
HTTP health checks
An HTTP-layer health check sends an HTTP `OPTIONS` request to the server and expects to get a successful response. To enable it, add `option httpchk` to the `backend` section:
haproxy
backend be_myapp
  option httpchk
  server srv1 192.168.1.5:80 check
Checks send an `OPTIONS` request to the URL `/` by default.
You can change the HTTP method and URL by specifying them on the `option httpchk` line. In the following example, we send checks using `GET` instead of `OPTIONS` to the URL `/healthz`:
haproxy
backend be_myapp
  option httpchk GET /healthz
  server srv1 10.0.0.1:80 check
  server srv2 10.0.0.2:80 check
If the response status code is in the 2xx or 3xx range, the server is healthy.
Expect a response status
Use `http-check expect` to specify which HTTP status code indicates a healthy server. In the following example, the server must return a `200 OK` response:
haproxy
backend bk_myapp
  option httpchk
  http-check expect status 200
  server srv1 10.0.0.1:80 check
  server srv2 10.0.0.2:80 check
As an alternative to `http-check expect status`, where you specify one explicit status value, you can use `rstatus` to specify a regular expression to match multiple status codes. In the next example, the health check uses `rstatus` in conjunction with the negation operator (`!`) to consider all statuses as valid except for 5xx responses:
haproxy
backend bk_myapp
  option httpchk
  http-check expect ! rstatus ^5
  default-server inter 3s fall 3 rise 2
  server srv1 10.0.0.1:80 check
  server srv2 10.0.0.2:80 check
Expect a string in the response
To specify a string to search for in the body of an HTTP or TCP response, add the `expect string` directive to `http-check` or `tcp-check` and set the string that you expect to see in the body. In the next example, the response must contain the string `OK`:
haproxy
backend be_myapp
  option httpchk
  http-check expect string OK
  server srv1 10.0.0.1:80 check
  server srv2 10.0.0.2:8080 check
Use the `expect rstring` argument to specify a regular expression instead of an explicit string.
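For instance, a regular expression can accept more than one body value. The following is a minimal sketch, assuming a hypothetical application whose health endpoint returns either `OK` or `READY` in the response body:

```haproxy
backend be_myapp
  option httpchk
  # Pass the check when the response body matches the regular expression
  http-check expect rstring (OK|READY)
  server srv1 10.0.0.1:80 check
  server srv2 10.0.0.2:80 check
```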
Customize with the send directive
Available since
- HAProxy 2.2
- HAProxy Enterprise 2.2r1
- HAProxy ALOHA 12.5
Another way to change the HTTP method and URL is by adding the `http-check send` line and specifying the new values there. In the following example, checks send `HEAD` requests to the URL `/healthz`:
haproxy
backend be_myapp
  option httpchk
  http-check send meth HEAD uri /healthz ver HTTP/1.1 hdr Host test.local
  server srv1 192.168.1.5:80 check
You can send `POST` requests too:
haproxy
backend be_myapp
  option httpchk
  http-check send meth POST uri /health hdr Content-Type "application/json;charset=UTF-8" hdr Host www.mwebsite.com body "{\"id\": 1, \"field\": \"value\"}"
  server srv1 192.168.1.5:80 check
Customize connect arguments
Available since
- HAProxy 2.2
- HAProxy Enterprise 2.2r1
- HAProxy ALOHA 12.5
Use the `connect` directive to enable SNI, connect over SSL/TLS, perform health checks over SOCKS4, and choose the protocol, such as HTTP/2 or FastCGI. Here’s an example where health checks are performed using HTTP/2 and SSL:
haproxy
backend be_myapp
  option httpchk
  http-check connect ssl alpn h2
  http-check send meth HEAD uri /health ver HTTP/2 hdr Host www.test.local
  server srv1 192.168.1.5:443 check
To close a connection cleanly instead of sending a RST, use the `linger` option.
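As a sketch of where that option goes, `linger` is added to the same `http-check connect` line as the other connection arguments. The address and host name below are placeholders:

```haproxy
backend be_myapp
  option httpchk
  # linger: close the check connection cleanly rather than with a RST
  http-check connect ssl alpn h2 linger
  http-check send meth HEAD uri /health ver HTTP/2 hdr Host www.test.local
  server srv1 192.168.1.5:443 check
```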
Check multiple HTTP endpoints
Available since
- HAProxy 2.2
- HAProxy Enterprise 2.2r1
- HAProxy ALOHA 12.5
Additional power comes from the ability to query several endpoints during a single health check. In the following example, we make requests to two distinct services: one listening at port 8080 and the other at port 8081. We also use different URIs. If either endpoint fails to respond, the entire health check fails.
haproxy
backend servers
  option httpchk
  http-check connect port 8080
  http-check send meth HEAD uri /health
  http-check connect port 8081
  http-check send meth HEAD uri /up
  server server1 127.0.0.1:80 check
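Each request in the sequence can also be followed by its own expectation. The following sketch assumes that both hypothetical endpoints return status `200` when healthy, and pairs each `http-check send` with an `http-check expect` rule:

```haproxy
backend servers
  option httpchk
  # First endpoint: port 8080, URI /health
  http-check connect port 8080
  http-check send meth HEAD uri /health
  http-check expect status 200
  # Second endpoint: port 8081, URI /up
  http-check connect port 8081
  http-check send meth HEAD uri /up
  http-check expect status 200
  server server1 127.0.0.1:80 check
```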
Change the interval
By default, the load balancer sends a health check every two seconds. Change this by adding the `inter` argument to the `server` line. In the next example, we send a health check every four seconds:
haproxy
backend be_myapp
  server srv1 10.0.0.1:80 check inter 4s
  server srv2 10.0.0.2:80 check inter 4s
Use any of the following time suffixes:
- `us`: microseconds
- `ms`: milliseconds
- `s`: seconds
- `m`: minutes
- `h`: hours
- `d`: days
Other arguments that affect the check interval are defined below:
Argument | Description |
---|---|
inter | Sets the interval between two consecutive health checks. If not specified, the default value is 2s. |
fastinter | Sets the interval between two consecutive health checks when the server is in any of the transition states: UP - transitionally DOWN or DOWN - transitionally UP. If not set, then inter is used. |
downinter | Sets the interval between two consecutive health checks when the server is in the DOWN state. If not set, then inter is used. |
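To make the table concrete, here is a minimal sketch that combines the three arguments on each `server` line. The specific durations are illustrative assumptions, not recommendations:

```haproxy
backend be_myapp
  # inter 4s:      interval used while the server is fully up
  # fastinter 1s:  probe faster while the server is in a transition state
  # downinter 10s: probe less often once the server is DOWN
  server srv1 10.0.0.1:80 check inter 4s fastinter 1s downinter 10s
  server srv2 10.0.0.2:80 check inter 4s fastinter 1s downinter 10s
```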
Change the failure threshold
Use the `fall` argument to change the number of failed health checks that will trigger removing the server from the load balancing rotation. By default, this is set to 3. In the following example, 5 failed checks will put the server into the DOWN state:
haproxy
backend be_myapp
  server srv1 10.0.0.1:80 check fall 5
  server srv2 10.0.0.2:80 check fall 5
Use the `rise` argument to set how many successful checks are needed to bring a down server back up. The default is 2. In the following example, 10 successful health checks are needed before the server will return to the load balancing rotation:
haproxy
backend be_myapp
  server srv1 10.0.0.1:80 check fall 5 rise 10
  server srv2 10.0.0.2:80 check fall 5 rise 10
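When every server shares the same thresholds, they can be set once with `default-server`, as an earlier example on this page does. The sketch below reuses the values from this section together with a 4-second interval; the combination is an illustrative assumption:

```haproxy
backend be_myapp
  # Applies to every server line that follows in this backend
  default-server inter 4s fall 5 rise 10
  server srv1 10.0.0.1:80 check
  server srv2 10.0.0.2:80 check
```

With these values, and ignoring probe timeouts, a failing server is marked down after about 5 × 4 s = 20 s, and a recovered one returns to rotation after about 10 × 4 s = 40 s.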
Set check-scoped variables
Available since
- HAProxy 2.2
- HAProxy Enterprise 2.2r1
- HAProxy ALOHA 12.5
Use either `tcp-check set-var` or `http-check set-var` to set a variable scoped to a health check session. For example:
haproxy
backend be_myapp
  option tcp-check
  tcp-check set-var(check.port) int(1234)
  tcp-check connect port var(check.port)
  server srv1 10.0.0.1:80 check
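The HTTP form works along the same lines. The following sketch assumes a hypothetical application that exposes its health endpoint at `/healthz` on port 8080; the variable name `check.port` is taken from the example above:

```haproxy
backend be_myapp
  option httpchk
  # Store the port in a variable scoped to the health check session
  http-check set-var(check.port) int(8080)
  # Connect to the port held in the variable
  http-check connect port var(check.port)
  http-check send meth GET uri /healthz
  server srv1 10.0.0.1:80 check
```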
Passive health checks
A passive health check monitors live traffic for errors. You can watch for either failed TCP connections or bad HTTP responses. Passive checks will detect errors returning from any part of your proxied service, but they require active traffic to monitor.
Monitor for TCP connection errors
To monitor live traffic for TCP connection errors, follow these steps:
- Add the `check` argument to the `server` lines you want to monitor.
- Add the `observe layer4` argument to each `server` line to activate passive health checking.
- Add the `error-limit` and `on-error` arguments to set the threshold for failed passive health checks and the action to take when errors exceed that threshold.
In the following example, we monitor for TCP connection errors. When there are at least 10 of these errors, we mark the server as down by using the `mark-down` value for the `on-error` argument:
haproxy
backend servers
  server server1 192.168.0.10:80 check inter 2m observe layer4 error-limit 10 on-error mark-down
The `check` argument enables an active health check probe that pings the server’s TCP port at an interval. The interval is 2 seconds by default, which you can change using the `inter` keyword. Once failed passive health checks have removed the server from the load balancing rotation, a set number of successful active probes brings it back online. In the example above, the interval is increased to 2 minutes so that the server must stay healthy for longer before it returns to service.
Monitor for HTTP response errors
To monitor live traffic for HTTP response errors, follow these steps:
- Add the `check` argument to the `server` lines you want to monitor.
- Add the `observe layer7` argument to each `server` line to activate passive health checking.
- Add the `error-limit` and `on-error` arguments to set the threshold for failed passive health checks and the action to take when errors exceed that threshold.
In the following example, we monitor for HTTP response errors. When there are at least 10 of these errors, we mark the server as down by using the `mark-down` value for the `on-error` argument:
haproxy
backend servers
  server server1 192.168.0.10:80 check observe layer7 error-limit 10 on-error mark-down
The `check` argument enables an active health check probe that pings the server’s TCP port at an interval. Once failed passive health checks have removed the server from the load balancing rotation, a set number of successful active probes brings it back online.
Set the on-error action
The `on-error` argument on the `server` line determines what action to take when errors exceed the threshold you set with the `error-limit` argument. It accepts any of the following values:
Action | Description |
---|---|
fastinter | Forces fastinter mode, which causes the active health check probes to be sent more rapidly. |
fail-check | Increments one failed active health check and forces fastinter mode. |
sudden-death | Simulates a pre-fatal failed check. One more check will mark the server as down. It also forces fastinter mode. |
mark-down | Marks the server as down and forces fastinter mode. |
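As one possible combination, the sketch below reacts to passive errors by speeding up the active checks instead of marking the server down immediately. The interval values are illustrative assumptions:

```haproxy
backend servers
  # After 10 observed HTTP response errors, switch the active check to the
  # 1-second fastinter interval so that it can confirm a failure sooner
  server server1 192.168.0.10:80 check inter 10s fastinter 1s observe layer7 error-limit 10 on-error fastinter
```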
Agent checks
An agent check is one where the load balancer connects to an agent program running on a backend server. In response to the agent check probe, the agent program sends back a string of ASCII text that triggers a change in the load balancer.
The program running on the server can send back to the load balancer a string containing any of the following commands.
Text | Effect on the load balancer |
---|---|
`<number>%` | Changes the server’s weight to a percentage of its current value. Specifying `0%` is equivalent to `drain`. Example: Change server weight to one-half its current value: `50%` |
`down` | Marks the server as down due to a critical condition, such as a missing process or a port not responding. Optionally, you can append a number sign (`#`) followed by a description string. |
`drain` | Puts the server into drain mode, where it will not accept new connections other than those accepted via persistence. |
`fail` | Marks the server as down and can indicate that a validity test has failed. Optionally, you can append a number sign (`#`) followed by a description string. |
`maint` | Puts the server into maintenance mode, where it will not accept any new connections, and health checks will be stopped. |
`maxconn:<number>` | Changes the server’s maxconn value to the given number. Do not put a space between `maxconn:` and the number. Example: Set maximum connections value to 30: `maxconn:30` |
`ready` | Takes the server out of maintenance mode. |
`stopped` | Marks the server as down due to an intentional halt. Optionally, you can append a number sign (`#`) followed by a description string. |
`up` | Marks the server as up. |
The string is formatted as one or more of these commands separated by spaces, tabs, or commas. The string must end with a carriage return (`\r`) or new line (`\n`) character. Example: `ready 50% maxconn:30`
Create an agent program
The agent program can be written in any programming language, as long as it allows you to listen on a TCP port. When the program detects that the load balancer has connected, it should return a string of ASCII text that makes a change to the load balancer or keeps it at its current state.
Below is a program written in the Go programming language. It returns the string `50%\n` if the server’s CPU idle time is less than 10 percent, which would indicate that the CPU is close to being maxed out. Otherwise, it returns the string `100%\n`. Notice that the string should end with a line feed character (`\n`):
agent-program.go
package main

import (
	"fmt"
	"time"

	"github.com/firstrow/tcp_server"
	"github.com/mackerelio/go-osstat/cpu"
)

func main() {
	server := tcp_server.New(":9999")

	server.OnNewClient(func(c *tcp_server.Client) {
		fmt.Println("Client connected")
		cpuIdle, err := getIdleTime()
		if err != nil {
			fmt.Println(err)
			c.Close()
			return
		}

		if cpuIdle < 10 {
			// Set server weight to half
			c.Send("50%\n")
		} else {
			c.Send("100%\n")
		}

		c.Close()
	})

	server.Listen()
}

func getIdleTime() (float64, error) {
	before, err := cpu.Get()
	if err != nil {
		return 0, err
	}
	time.Sleep(time.Duration(1) * time.Second)
	after, err := cpu.Get()
	if err != nil {
		return 0, err
	}
	total := float64(after.Total - before.Total)
	cpuIdle := float64(after.Idle-before.Idle) / total * 100
	return cpuIdle, nil
}
Configure the backend
Configure the servers in the `backend` to send agent checks.
Here, the load balancer sends an agent-based check probe every five seconds to a program listening at `192.168.0.10` at port `8080`:
haproxy
backend servers
  server server1 192.168.0.10:80 check weight 100 agent-check agent-addr 192.168.0.10 agent-port 8080 agent-inter 5s agent-send ping\n
Use the following arguments on each `server` line to enable agent-based checks:
Argument | Description |
---|---|
`check` | Enables health checking. |
`agent-check` | Enables agent checks for the server. |
`agent-addr` | Identifies the IP address where the agent is listening. |
`agent-port` | Identifies the port where the agent is listening. |
`agent-inter` | Defines the interval between checks. |
`agent-send` | A string that the load balancer sends to the agent upon connection. Be sure to end it with a newline character. |
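To tie the configuration to the Go program shown earlier, which listens on port 9999, a `server` line might look like the following sketch. It assumes that the agent runs on the same host as the application, and it leaves out `agent-send` because that program replies as soon as a client connects:

```haproxy
backend servers
  # agent-port 9999 matches the port that the example Go agent listens on
  server server1 192.168.0.10:80 check weight 100 agent-check agent-addr 192.168.0.10 agent-port 9999 agent-inter 5s
```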
LDAP health checks
You can health check LDAPv3 servers. The load balancer uses the Anonymous Authentication Mechanism of Simple Bind to connect. The check is valid if the server responds with a successful result message.
- Configure the LDAP servers to allow anonymous binding. You can do this with an IP alias on the server side that allows only the load balancer’s IP addresses to bind to it.
- Add `option ldap-check` to your `backend` section. In this example, we send the health check probes to alternative IP addresses specified with the `addr` argument on the `server` lines:
  haproxy
  backend be_myapp
    option ldap-check
    server srv1 10.0.0.1:389 check addr 10.0.0.11
    server srv2 10.0.0.2:389 check addr 10.0.0.12
MySQL health checks
You can health check MySQL database servers. The check is valid if the server responds with a successful result message. Two modes exist:
- check the MySQL handshake packet
- test Client Authentication
In the following example, we check a MySQL handshake by adding the `option mysql-check` directive:
haproxy
backend be_myapp
  option mysql-check
  server srv1 10.0.0.1:3306 check
  server srv2 10.0.0.2:3306 check
Add a `user` argument to `option mysql-check` for the health check probe to send a Client Authentication packet:
haproxy
backend be_myapp
  option mysql-check user hapee-lb
  server srv1 10.0.0.1:3306 check
  server srv2 10.0.0.2:3306 check
PostgreSQL health checks
You can perform a simple PostgreSQL check by sending a StartupMessage. The check is valid if the server responds with a successful Authentication request message rather than an error response. Add the `option pgsql-check` directive to your `backend` section and include a `check` argument on each `server` line.
haproxy
backend be_pgsql
  option pgsql-check
  server srv1 10.0.0.1:5432 check
  server srv2 10.0.0.2:5432 check
Optionally, add a `user` argument with the username that will be used to connect to the PostgreSQL server (here, `hapee-lb`):
haproxy
backend be_pgsql
  option pgsql-check user hapee-lb
  server srv1 10.0.0.1:5432 check
  server srv2 10.0.0.2:5432 check
Redis health checks
You can monitor a Redis service by sending the `PING` command. The check is valid if the server responds with the string `+PONG`. Add the `option redis-check` directive to your `backend` section and include a `check` argument on each `server` line.
haproxy
backend be_redis
  option redis-check
  server srv1 10.0.0.1:6379 check
  server srv2 10.0.0.2:6379 check
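For checks that go beyond a simple `PING`, a send/expect sequence can be written by hand with `option tcp-check`. The following sketch assumes a replicated Redis deployment in which only the primary should receive traffic, so it also verifies that the server reports the `master` role:

```haproxy
backend be_redis
  option tcp-check
  # Same PING/+PONG exchange described above
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  # Additionally require that this node is the primary
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  # End the session cleanly
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  server srv1 10.0.0.1:6379 check
  server srv2 10.0.0.2:6379 check
```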
SMTP health checks
You can monitor a Simple Mail Transfer Protocol (SMTP) service. Add the `option smtpchk` directive to your `backend` section and include a `check` argument on each `server` line. The check is valid if the server response code starts with the number `2`.
haproxy
backend be_smtp
  option smtpchk
  server srv1 10.0.0.1:25 check
  server srv2 10.0.0.2:25 check
You can also monitor an Extended Simple Mail Transfer Protocol (ESMTP) service. Add the hello command to use, which is `HELO` for SMTP and `EHLO` for ESMTP. Follow this with the domain name to present to the server:
haproxy
backend be_smtp
  option smtpchk EHLO mydomain.com
  server srv1 10.0.0.1:25 check
  server srv2 10.0.0.2:25 check