Service reliability
Circuit breakers
A circuit breaker is a mechanism that monitors a service in real time by checking its responses for errors. If failures exceed a threshold, the circuit breaker flips into the open state and shuts off access to the service. Its purpose is to detect error conditions that may last a long time. Rather than letting dependent services keep calling the faulty service, it returns an error immediately, which stops them from using the service for a period of time.
Circuit breaker using the observe argument
A simple implementation of the circuit breaker pattern uses the observe argument to monitor live traffic for errors. Consider the following example, which disables access to a server after it detects 50 consecutive HTTP errors:
haproxy
backend myservice
  default-server maxconn 30 check observe layer7 error-limit 50 on-error mark-down inter 1s rise 30 slowstart 20s
  server s1 192.168.0.10:80
  server s2 192.168.0.11:80
How it works:
- The default-server directive sets arguments that apply to all server lines in the backend section.
- The check argument enables health checking of the server.
- The observe layer7 argument enables monitoring of live HTTP traffic to and from the server.
- The error-limit 50 argument sets a threshold of 50 consecutive errors, after which it triggers the on-error action.
- The on-error mark-down argument marks the server as DOWN when the error-limit is reached.
- The inter 1s argument sets how often to send active health checks (every 1 second). After a server has failed, these checks determine when to bring it back online.
- The rise 30 argument sets how many successful active health checks there must be (30) before bringing the server back online. When you multiply the inter value by the rise value, you get the minimum amount of time that the server will be removed from the load-balancing rotation (1 second x 30 = 30 seconds).
- The slowstart 20s argument sends traffic to the server gradually over 20 seconds after it has recovered, until it reaches 100% of its maximum connections, as set by maxconn.
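As a sketch of how the inter and rise arguments combine, the variant below changes only those two values. The numbers 2s and 10 are arbitrary choices for illustration, not recommendations; with them, a server that trips the breaker stays out of rotation for at least 2 seconds x 10 = 20 seconds.

```haproxy
backend myservice
  # Illustrative values: inter 2s x rise 10 = at least 20 seconds out of rotation
  default-server maxconn 30 check observe layer7 error-limit 50 on-error mark-down inter 2s rise 10 slowstart 20s
  server s1 192.168.0.10:80
  server s2 192.168.0.11:80
```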
You may also set observe to layer4 if you prefer to monitor for unsuccessful connections to a server rather than failed HTTP responses.
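For instance, the same backend could be switched to connection-level monitoring by changing only the observe value. This is a minimal sketch that reuses the example's other settings unchanged; you would likely want to tune error-limit for your own traffic.

```haproxy
backend myservice
  # observe layer4 counts failed TCP connections instead of HTTP error responses
  default-server maxconn 30 check observe layer4 error-limit 50 on-error mark-down inter 1s rise 30 slowstart 20s
  server s1 192.168.0.10:80
  server s2 192.168.0.11:80
```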
Circuit breaker using stick tables
In this more complex example, the load balancer monitors the number of HTTP 5xx errors returned from all servers in the backend. If errors make up more than 50% of all responses, it disables access to the service by rejecting all new requests for the next 30 seconds.
haproxy
backend myservice
  stick-table type string size 1 expire 30s store http_req_rate(10s),gpc0,gpc0_rate(10s),gpc1

  # Is the circuit open (no traffic can flow)?
  acl circuit_open be_name,table_gpc1 gt 0

  # Reject request if circuit is open
  http-request deny deny_status 503 if circuit_open

  # Begin tracking requests
  http-request track-sc0 be_name

  # Count HTTP 5xx server errors
  http-response sc-inc-gpc0(0) if { status ge 500 }

  # Store the HTTP request rate and error rate in variables
  http-response set-var(res.req_rate) sc_http_req_rate(0)
  http-response set-var(res.err_rate) sc_gpc0_rate(0)

  # Check if error rate is greater than 50% using some math
  http-response sc-inc-gpc1(0) if { int(100),mul(res.err_rate),div(res.req_rate) gt 50 }

  server s1 192.168.0.10:80 check
  server s2 192.168.0.11:80 check
How it works:
- The stick-table line tracks requests entering the backend. It stores the HTTP request rate, the HTTP error count and error rate (captured with the generic counter gpc0 and its rate, gpc0_rate), and a second counter, gpc1, that acts as a flag and opens the circuit when the error percentage exceeds a threshold. The expire argument sets how long to disable access to the service once the gpc1 flag has been incremented. In this example, the service is disabled for 30 seconds when it becomes faulty.
- The circuit_open ACL checks whether the gpc1 flag is greater than 0. If it is, the circuit is open.
- The http-request deny line rejects all requests while the circuit is open, returning an HTTP 503 - Service Unavailable response in the meantime.
- The http-request track-sc0 line ensures that all requests entering the backend are monitored for errors.
- The http-response sc-inc-gpc0(0) line increments the error counter (gpc0) every time a server returns an HTTP 5xx response (i.e. any HTTP error in the 500-599 range).
- The http-response set-var lines set two variables. The first is res.req_rate, which holds the current HTTP request rate. The second is res.err_rate, which holds the current HTTP error rate.
- The http-response sc-inc-gpc1(0) line increments the gpc1 flag when errors make up more than 50% of the request rate. This opens the circuit. The circuit is left open and no requests are allowed into the backend until the record expires in the stick table after 30 seconds.
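To make the threshold arithmetic concrete, here is the last rule annotated with two worked cases. The rate values are made-up numbers for illustration, and the comments rely on the div converter performing integer division, which drops the fractional part.

```haproxy
# Case 1: res.err_rate = 40, res.req_rate = 60 (both measured over 10s)
#   int(100),mul(40),div(60)  ->  4000 / 60  ->  66
#   66 gt 50 is true, so gpc1 is incremented and the circuit opens
#
# Case 2: res.err_rate = 20, res.req_rate = 60
#   int(100),mul(20),div(60)  ->  2000 / 60  ->  33
#   33 gt 50 is false, so the circuit stays closed
http-response sc-inc-gpc1(0) if { int(100),mul(res.err_rate),div(res.req_rate) gt 50 }
```

Multiplying by 100 before dividing keeps the calculation in whole numbers, so the result can be compared directly against a percentage.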
To change the error rate threshold, replace 50 on the http-response sc-inc-gpc1(0) line with another number. To change how long the circuit stays open, change the expire argument on the stick table.
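As a sketch of both adjustments together, the variant below opens the circuit when errors exceed 25% of requests and keeps it open for 60 seconds. The values 25 and 60s are arbitrary choices for illustration; everything else is unchanged from the example above.

```haproxy
backend myservice
  # expire 60s: the circuit stays open for 60 seconds instead of 30
  stick-table type string size 1 expire 60s store http_req_rate(10s),gpc0,gpc0_rate(10s),gpc1

  acl circuit_open be_name,table_gpc1 gt 0
  http-request deny deny_status 503 if circuit_open
  http-request track-sc0 be_name

  http-response sc-inc-gpc0(0) if { status ge 500 }
  http-response set-var(res.req_rate) sc_http_req_rate(0)
  http-response set-var(res.err_rate) sc_gpc0_rate(0)

  # gt 25: open the circuit when errors exceed 25% of requests
  http-response sc-inc-gpc1(0) if { int(100),mul(res.err_rate),div(res.req_rate) gt 25 }

  server s1 192.168.0.10:80 check
  server s2 192.168.0.11:80 check
```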