HAProxy offers a powerful logging system that allows users to capture information about HTTP transactions. As part of that, logged headers provide insight into what's happening behind the scenes, such as when a web application firewall (WAF) blocks a request because of a problematic header. Header logging can also help when an application has trouble parsing a header, allowing you to review the logged headers and confirm that they conform to the expected format. While logging headers is essential, the process can easily consume storage capacity. In this blog post, we show how to collect HTTP header logs and store them remotely to avoid overwhelming your standard log system.
Warning: Ensure Compliant Handling of HTTP Headers
Before delving into the process of logging HTTP headers with HAProxy, it is crucial to understand the following. Failure to do so may expose you or your company to significant risks and consequences.
HTTP headers can contain sensitive information related to privacy, which falls under the purview of the GDPR (General Data Protection Regulation) or similar regulations such as the LGPD (Lei Geral de Proteção de Dados Pessoais) in Brazil. It's important to note that certain HTTP headers, like session cookies, are used to provide access to specific sections of a website and may sometimes convey private information. To comply with relevant data protection regulations, you must ensure the privacy and security of user information when handling and logging these headers.
The examples provided below assume your infrastructure meets all necessary privacy and security requirements. This blog post will focus solely on presenting solutions for logging headers, rather than addressing how to secure these logs. It is your responsibility to ensure that the necessary privacy and security measures are in place when implementing these examples.
What Are HTTP Headers?
HTTP headers are additional pieces of information sent between client and server during an HTTP transaction. Headers can be grouped under request headers and response headers.
Request headers are sent from the client to the server, carrying the client's preferences and any details needed to process the request (which can include sensitive information).
Response headers are sent from the server to the client, carrying additional information about the response and instructions on how the client should handle it.
The instructions in this blog post focus on gathering information from request headers. Logging these HTTP headers can be useful in understanding requests, identifying threats, and monitoring the flow of data.
Using the Logging System
When it comes to operationalizing your log data, HAProxy provides a wealth of information. If you are just getting started, read our blog post, Introduction to HAProxy Logging, to learn how to set up HAProxy logging, target a Syslog server, understand the log fields, and discover some helpful tools for parsing log files. If you want to go deeper and see HAProxy logs in action, watch our on-demand webinar, Deep Dive Into HAProxy Logging.
HAProxy offers a straightforward way to generate a log line that includes all request headers: the req.hdrs sample fetch. It retrieves the request header block and returns it as a single contiguous string formatted like an HTTP/1.1 request, one header per line, including the final empty line.
To ensure accurate debugging and logging of the original headers sent by the client, call req.hdrs as early as possible, before any header modification takes place. The req.hdrs sample fetch reflects the request headers exactly as they appear at the moment it is invoked, so calling it early gives you the unmodified headers for effective debugging.
A log line is generated after the HTTP transaction completes, at which point the request buffer is no longer accessible. However, the result of the req.hdrs sample fetch can be stored in a transaction variable and referenced in a log-format directive, which allows you to include the headers in the log output.
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    [...]

defaults
    log global
    mode http
    option dontlognull
    [...]

# This is a basic proxy configuration that appends all request headers to the
# end of the standard log line. Please be aware that if you are using an older
# version of HAProxy, the predefined variable "HAPROXY_HTTP_LOG_FMT"
# introduced in HAProxy 2.7 should be replaced with the default log format:
#
# "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r".
frontend with-syslog
    bind *:8480
    log-format "${HAPROXY_HTTP_LOG_FMT} hdrs:%{+Q}[var(txn.req_hdrs)]"
    http-request set-var(txn.req_hdrs) req.hdrs
    # put your configuration here
With this example, the following log line is generated. The headers are appended at the end of the standard log line. The delimiters between headers (the carriage return and line feed characters) are escaped as #015 and #012, which you can use later to split the headers apart. Additionally, the empty line that separates the headers from the body is also logged.
Jul 5 05:23:51 localhost haproxy[15114]: 172.29.1.14:65076 [05/Jul/2023:05:23:51.240] with-syslog with-syslog/<NOSRV> 0/-1/-1/-1/0 200 91 - - LR-- 1/1/0/0/0 0/0 "GET / HTTP/1.1" hdrs:"host: 172.16.29.13:8480#015#012user-agent: curl/7.64.1#015#012accept: */*#015#012#015#012"
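If you later need to turn such log lines back into individual headers, the escaped CRLF sequences make that straightforward. The following Python snippet is a minimal sketch (not part of HAProxy itself; it simply assumes the log format shown above) that extracts the quoted header block and splits it on the escaped delimiters.

#!/usr/bin/env python3
# Minimal sketch: split the escaped header block of an HAProxy log line
# back into individual headers. Assumes the log format shown above, with
# the header block quoted after the "hdrs:" marker and CRLF escaped as
# #015#012.
import re

def split_logged_headers(log_line):
    match = re.search(r'hdrs:"(.*)"', log_line)
    if not match:
        return []
    # Drop the empty entries produced by the trailing blank line.
    return [h for h in match.group(1).split('#015#012') if h]

line = ('Jul 5 05:23:51 localhost haproxy[15114]: ... '
        'hdrs:"host: 172.16.29.13:8480#015#012user-agent: curl/7.64.1'
        '#015#012accept: */*#015#012#015#012"')
print(split_logged_headers(line))
# ['host: 172.16.29.13:8480', 'user-agent: curl/7.64.1', 'accept: */*']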
Log Length and Storage Capacity
By default, HAProxy truncates log lines after 1024 characters. However, you can adjust this by adding a len option (more details below) to the log directive in the global section of the configuration, which allows HAProxy to send a larger amount of data to syslog.
The handling of syslog messages is defined by RFC 5424. According to section 6.1 of the RFC, the recommended minimum size any log server should be able to process is 480 bytes, with a recommended size of 2048 bytes; no maximum is defined. Its predecessor, RFC 3164, which was the reference document for a decade, strictly forbids clients from sending packets larger than 1024 bytes (section 4.1), and many implementations hard-coded that exact limit. It is therefore crucial that every piece of software in your logging chain (from HAProxy to the syslog daemon and all the processing tools down to storage) supports larger log line sizes.
Additionally, each log line must fit into a UDP packet, which means the maximum length is slightly less than 64k due to protocol overheads.
global
    # To avoid truncation of the log line by HAProxy, it is necessary to
    # increase the access log length. However, it's important to note that
    # even with an increased access log length, there is no guarantee that
    # the entire log line will not be truncated by the logging system, such
    # as syslog or any other software used for logging.
    log 127.0.0.1 len 4096 local0
    log 127.0.0.1 local1 notice
    [...]
This setup has limitations and can consume a significant amount of storage capacity. The size of HTTP request headers can grow quickly, depending on the type of information exchanged between the client and server, and it is not uncommon to encounter requests with 2 or 3 kilobytes of headers. At that rate, 100 million requests per day already produce roughly 200-300 GB of header data, so for larger websites the accumulated header logs can reach several hundred gigabytes or more each day. This should be carefully considered when implementing HTTP header logging to ensure appropriate storage resources are available.
To address this, we can send header logs to remote storage. This approach lets us manage header logs efficiently without bogging down local systems with the headers generated during every HTTP transaction. Offloading header logs to dedicated remote storage not only relieves local system resources but also streamlines header analysis. We outline how to send header logs to remote storage below.
Using Lua to Send Header Logs to Remote Storage
Lua is a versatile language that allows you to extend the functionality of your load balancer (you can refer to our blog post on Lua and its new event framework in HAProxy for more information). In Lua, you can access almost all sample fetches using the f:FETCH_NAME() function. Sample fetches with a dot in their name are accessed by replacing the dot with an underscore (i.e., req.hdrs becomes req_hdrs).
The provided Lua script serves as an example and performs a few actions. First, it creates a dump_headers function that gathers the headers as a string using txn.f:req_hdrs(), which is equivalent to the req.hdrs sample fetch. Then, it uses the HTTP client to send the headers to an application on a remote server for offloaded processing. Finally, the dump_headers function is registered as an action for http-request rules, enabling its usage in a proxy definition at a later stage.
-- The dump_headers function retrieves the headers of the current
-- transaction and sends them to an archiving application over HTTP
-- using the httpclient.
--
-- To utilize this function in a proxy, the following line needs to be
-- added:
--
--    http-request lua.dump_headers
--
-- This line instructs HAProxy to invoke the dump_headers function for
-- each HTTP request and send the headers to the specified archiving
-- application.
local function dump_headers(txn)
    local hdrs = txn.f:req_hdrs()
    local headers = {
        -- We are aware of the body size, and we prefer not to send
        -- it in chunks.
        ["content-length"] = {string.len(hdrs)},
    }
    -- Use the POST method to send the headers to the remote storage
    -- application.
    local httpclient = core.httpclient()
    local response = httpclient:post{
        url="http://127.0.0.1:8001/",
        body=hdrs,
        headers=headers,
        timeout=10
    }
end

-- Associate the dump_headers action with the dump_headers function for
-- http-request rules.
core.register_action('dump_headers', {'http-req'}, dump_headers)
Now, configure a proxy that invokes the dump_headers action to send the headers to the remote storage application (the script itself is loaded with a lua-load directive in the global section). To log the client's original headers accurately, we recommend invoking dump_headers as early as possible in the configuration, because it relies internally on req.hdrs to access the request headers.
# This proxy utilizes the lua.dump_headers function to send the
# request header to a remote web application responsible for storing
# the header in a secure location.
frontend lua-log
    bind *:8481
    http-request lua.dump_headers
    # put your configuration here
On the remote machine that processes the logs, we need a basic web service. While it is possible to use an existing one, for the purposes of this blog post we will create our own minimalistic server in Python, for simplicity. The Lua script sends the headers in a POST request, so the server must accept one. The provided Python script serves as an example, but it is crucial to note that you should not use it in production environments. If you require such a service, we recommend creating your own implementation with proper resilience and security measures in place, or using an off-the-shelf solution like Logstash, Fluentd, or Loki, although these tools often expect to receive a JSON message.
#!/usr/bin/env python3
# This is an example of a web service that demonstrates how to receive
# request headers in a POST request and asynchronously write them to
# a file. It is provided as an illustration of how to log HTTP headers
# using HAProxy.
#
# Please note that this server does not implement any security or
# performance measures.
#
# IMPORTANT: DO NOT USE IN A PRODUCTION ENVIRONMENT!
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import queue

q = queue.Queue()

# The log file is currently located in /tmp in this example because
# it doesn't require specific permissions to write to. You can change
# it to /var/log with the appropriate permissions if needed.
log_file = '/tmp/headers.log'

def worker():
    t = threading.currentThread().getName()
    with open(log_file, 'a') as fh:
        while True:
            headers = q.get()
            # Concatenate all headers into a single line using the
            # delimiter "@@".
            headers = bytes.decode(headers, 'utf-8')
            headers = headers.rstrip().replace('\r\n', '@@')
            # Write headers and flush the file.
            fh.write(headers + '\n')
            fh.flush()
            q.task_done()

class Server(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the Content-Length header to determine the number of
        # bytes that need to be read from the body. Post the body
        # content to the queue for asynchronous processing as swiftly
        # as possible.
        content_length = int(self.headers['Content-Length'])
        try:
            q.put(self.rfile.read(content_length))
        finally:
            self.send_response_only(200, "OK")
            self.end_headers()

def run():
    # Start the HTTP server.
    httpd = HTTPServer(('', 8001), Server)
    # Turn on the worker thread.
    threading.Thread(target=worker, daemon=True).start()
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    # Wait for all workers before exiting.
    q.join()
    httpd.server_close()

if __name__ == '__main__':
    run()
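Before pointing HAProxy at this service, you may want to check it by hand. The following snippet is a minimal sketch using only the Python standard library; it assumes the service above is running locally on port 8001 and posts a fabricated header block, which should then show up in /tmp/headers.log.

#!/usr/bin/env python3
# Minimal sketch: post a fabricated header block to the example
# archiving service and print the HTTP status it returns.
from urllib import request

# A fake header block shaped like the output of req.hdrs: one header
# per line, CRLF-terminated, ending with an empty line.
hdrs = ("host: 172.16.29.13:8480\r\n"
        "user-agent: curl/7.64.1\r\n"
        "accept: */*\r\n"
        "\r\n").encode()

req = request.Request("http://127.0.0.1:8001/", data=hdrs, method="POST")
with request.urlopen(req, timeout=10) as resp:
    print(resp.status)  # expect 200; the headers land in /tmp/headers.log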
You can modify this example to log the server's responses as well, but we won't cover that in these instructions. Keep in mind that Lua has access to only tune.bufsize bytes (16KB by default) of each request and response. Attempting to log the body would not be wise: no more than tune.bufsize bytes of it are accessible before the request is processed, and it could quickly overflow the log server's maximum message size.
Relate HAProxy Logs to Lua with Unique IDs
With the current setup, we have two separate log events in two different files for each request: one from HAProxy and another from the Python application responsible for storing headers. However, there is no direct way to correlate lines from these two files. Thankfully, HAProxy provides a solution: the unique-id-format directive creates a unique identifier for each request. By including this unique ID in both the HAProxy logs and the header dump, we can effectively correlate the information between the two sources.
# This proxy operates similarly to lua-log, but with the addition of
# generating a unique ID for each request. It appends this unique ID
# to the x-unique-id header before invoking lua.dump_headers and
# includes it in the request logs for reference.
frontend lua-log-id
    bind *:8482
    log-format "${HAPROXY_HTTP_LOG_FMT} id:%{+Q}[var(txn.unique_id)]"
    unique-id-format "%{+X}o %ci:%cp_%fi:%fp_%Ts_%rt:%pid"
    # If an x-unique-id header already exists (if several proxies are
    # chained), it will not be overwritten. However, it's important to
    # be cautious with this method as the x-unique-id header can be
    # potentially forged. It is recommended to implement additional
    # validation mechanisms to ensure that only trusted sources are
    # allowed to provide the x-unique-id header.
    http-request set-header x-unique-id %[unique-id] if ! { req.hdr(x-unique-id) -m found }
    http-request set-var(txn.unique_id) req.hdr(x-unique-id)
    http-request lua.dump_headers
    # put your configuration here
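To illustrate how the two sources can be joined afterwards, here is a minimal sketch in Python. The file paths, the "@@" separator, and the id:"..." suffix are assumptions taken from the examples in this post; adapt them to your own setup.

#!/usr/bin/env python3
# Minimal sketch: match header dumps with HAProxy access log lines via
# the unique ID. Assumes headers were stored by the example Python
# service ("@@"-separated lines in /tmp/headers.log) and that HAProxy
# log lines end with id:"<unique-id>" as configured in lua-log-id.
# The HAProxy log path below is an assumption; adjust it to your setup.
import re

def index_header_dumps(path='/tmp/headers.log'):
    dumps = {}
    with open(path) as fh:
        for line in fh:
            headers = [h for h in line.rstrip('\n').split('@@') if h]
            for header in headers:
                name, _, value = header.partition(': ')
                if name.lower() == 'x-unique-id':
                    dumps[value] = headers
    return dumps

def correlate(haproxy_log='/var/log/haproxy.log'):
    dumps = index_header_dumps()
    with open(haproxy_log) as fh:
        for line in fh:
            match = re.search(r'id:"([^"]+)"', line)
            if match and match.group(1) in dumps:
                print(line.rstrip())
                print('  headers:', dumps[match.group(1)])

if __name__ == '__main__':
    correlate()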
Use Asynchronous Operation to Queue Request Headers Efficiently
In the previous example, if the remote HTTP server is unavailable or becomes overwhelmed by the large volume of data to process, the request is blocked until a timeout is triggered. This can result in the client experiencing latency, making the service less performant.
However, the new Lua queuing framework introduced in HAProxy 2.8 enables you to place the data in a queue to be processed asynchronously by a worker. In this scenario, the Lua action is highly efficient as its primary purpose is to swiftly enqueue the request headers, allowing for near-instantaneous processing of the request.
Each queued item requires approximately 1 kilobyte of RAM per kilobyte of header block. In this example, the queue size is limited to 10,000 items, which corresponds to roughly 30-50 MB of memory for an average header block size of 3 to 5 kilobytes. This restriction prevents HAProxy's Lua engine from exhausting the memory allowed by tune.lua.maxmem.
If the remote server experiences a prolonged failure and the queue reaches its maximum capacity, incoming headers are no longer added to the queue; their logging is silently dropped until the worker removes items from the queue and space becomes available.
-- Create a queue to accumulate all headers for subsequent processing.
local headerqueue = core.queue()

-- dump_headers_async retrieves all request headers and pushes them to
-- the header queue. This approach minimizes the impact on the request
-- processing time, as the data is processed asynchronously. The queue
-- is only allowed to hold 10000 items in order to prevent HAProxy from
-- using all of its allocated RAM in the event of a prolonged failure
-- of the remote server. Once this limit is reached, header logging
-- is silently discarded.
--
-- To utilize this function in a proxy, the following line needs to be
-- added:
--
--    http-request lua.dump_headers_async
--
local function dump_headers_async(txn)
    if headerqueue:size() > 10000 then
        return
    end
    local hdrs = txn.f:req_hdrs()
    headerqueue:push(hdrs)
end

-- send_headers is a task worker that takes all headers placed in the
-- header queue and posts them to the external storage application.
function send_headers()
    local httpclient = core.httpclient()
    while true do
        local hdrs = headerqueue:pop_wait()
        -- We are aware of the body size, and we prefer not to send
        -- it in chunks.
        local headers = {
            ["content-length"] = {string.len(hdrs)},
        }
        -- Use the POST method to send the headers to the remote
        -- storage application.
        local response = httpclient:post{
            url="http://127.0.0.1:8001/",
            body=hdrs,
            headers=headers,
            timeout=10
        }
    end
end

-- Start the send_headers worker.
core.register_task(send_headers)

-- Associate the dump_headers_async action with the dump_headers_async
-- function for http-request rules.
core.register_action('dump_headers_async', {'http-req'}, dump_headers_async)
To use dump_headers_async, we could modify the existing lua-log-id proxy to invoke this function instead of dump_headers. Instead, we will simply create a new proxy, which lets us compare both approaches side by side.
# The lua-log-async proxy is essentially the same as lua-log-id, but
# instead of calling lua.dump_headers, it invokes lua.dump_headers_async
# to push the headers into the header queue for asynchronous processing.
frontend lua-log-async
    bind *:8483
    log-format "${HAPROXY_HTTP_LOG_FMT} id:%{+Q}[var(txn.unique_id)]"
    unique-id-format "%{+X}o %ci:%cp_%fi:%fp_%Ts_%rt:%pid"
    http-request set-header x-unique-id %[unique-id] if ! { req.hdr(x-unique-id) -m found }
    http-request set-var(txn.unique_id) req.hdr(x-unique-id)
    http-request lua.dump_headers_async
    # put your configuration here
Now, even if the external application responsible for storing headers is experiencing issues or is slowed down, the main proxy can still process requests at maximum performance levels.
Even with the ability to log request headers asynchronously, toggling this feature still requires commenting or uncommenting the http-request lua.dump_headers_async line and reloading HAProxy after the change.
Conditional Logging by Client Source IP
So far, the example configuration logs headers for all clients, which may not be very convenient or efficient. To address this, you can change the lua.dump_headers_async call to fire only when an incoming request's source IP address matches an entry declared in a map file. If the client's source IP is found in the map, its headers will be logged; otherwise, no action will be taken for that client. This conditional approach allows for more control over which clients' headers are logged, making the logging process more efficient and customizable.
To implement this feature, the http-request lua.dump_headers_async line needs to be modified to http-request lua.dump_headers_async if { src -m ip -M -f headers.map }. This modification allows the lua-log-async-conditional proxy to conditionally log headers based on the presence of the client's source IP in the headers.map file.
# The lua-log-async-conditional proxy functions similarly to
# lua-log-async, but with an additional condition. It only sends logs
# to the remote logging server if the client's source IP is listed in
# the headers.map file. This file must exist in the filesystem and is
# empty by default. If you want to activate headers logging for a
# specific client, you can manage it using the debug-status-manager
# proxy.
listen lua-log-async-conditional
    bind *:8484
    log-format "${HAPROXY_HTTP_LOG_FMT} id:%{+Q}[var(txn.unique_id)]"
    unique-id-format "%{+X}o %ci:%cp_%fi:%fp_%Ts_%rt:%pid"
    http-request set-header x-unique-id %[unique-id] if ! { req.hdr(x-unique-id) -m found }
    http-request set-var(txn.unique_id) req.hdr(x-unique-id)
    # The logging of headers will be performed only if the client's IP
    # address is found in the headers.map file. The "-M" flag ensures
    # that the headers.map file is loaded as a map instead of an ACL.
    http-request lua.dump_headers_async if { src -m ip -M -f headers.map }
    # put your configuration here
To activate header logging for a specific client IP, add it to the map using the Runtime API:
socat unix:/var/run/haproxy.sock - <<< "add map headers.map 172.29.1.14 1"
To deactivate logging for that IP:
socat unix:/var/run/haproxy.sock - <<< "del map headers.map 172.29.1.14"
Instead of specifying a single address, you can define a full range of IP addresses by using CIDR notation. For example, use 172.29.1.0/24 instead of 172.29.1.14.
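If you prefer to script these Runtime API calls rather than run socat by hand, the following Python sketch does the same thing over the stats socket. The socket path and map name are the ones assumed in the examples above; adjust them to your environment.

#!/usr/bin/env python3
# Minimal sketch: drive the HAProxy Runtime API from Python instead of
# socat.
import socket

def runtime_api(command, sock_path='/var/run/haproxy.sock'):
    # The Runtime API reads one newline-terminated command and returns
    # its response before closing the connection.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall((command + '\n').encode())
        return sock.makefile().read()

# Enable header logging for one client, then disable it again.
print(runtime_api('add map headers.map 172.29.1.14 1'))
print(runtime_api('del map headers.map 172.29.1.14'))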
Activate Logging on Demand With a Dedicated Proxy
You have the option to use a dedicated proxy to manage the logging feature. In the next example, the proxy named debug-status-manager is specifically designed for this purpose.
First, it checks the client's source address to ensure that it originates from a trusted source, which in this example is the range 172.16.29.0/24. Since the source IP can be checked as soon as the connection is established, it is more efficient to reject unauthorized clients with a tcp-request connection rule, especially if the manager operates over HTTPS (although this is not the case in this example), because this TCP rule is executed before any TLS handshake takes place.
Second, if the request path is not /debug-on or /debug-off, or if the clientip parameter is not found, the request will be rejected.
Finally, the content of the headers.map file is modified based on the path used to enable or disable the debug feature. These modifications are made in memory only, which means the actual file on the filesystem remains unchanged; if the proxy is reloaded, the headers.map file is reset to its original state.
# The debug-status-manager proxy allows the control of the content
# stored in the headers.map file in memory, without altering the file
# on the filesystem.
#
# Before making any changes, a series of sanity checks are performed:
#  * The client must connect from the local network.
#  * The request path must match a known path, and the "clientip"
#    parameter with the client IP must be set.
#
# The request will return the debugging status for the provided client
# IP.
#
# To activate logs for a specific client:
#   curl '172.16.29.13:8084/debug-on?clientip=172.29.1.14'
# To activate logs for a range of clients:
#   curl '172.16.29.13:8084/debug-on?clientip=172.29.1.0/24'
# To deactivate a request header logging session:
#   curl '172.16.29.13:8084/debug-off?clientip=172.29.1.14'
frontend debug-status-manager
    bind 172.16.29.13:8084
    # If the connection is not initiated from the local network, reject
    # it as quickly as possible.
    tcp-request connection reject if !{ src -m ip 172.16.29.0/24 }
    # The request path must be either "/debug-on" or "/debug-off" and it
    # must include a parameter named "clientip".
    http-request deny if !{ path -m str /debug-on /debug-off } || !{ url_param(clientip) -m found }
    # Set the logging status for the provided client and return its
    # current status.
    http-request set-map(headers.map) %[url_param(clientip)] 1 if { path -m str /debug-on }
    http-request del-map(headers.map) %[url_param(clientip)] if { path -m str /debug-off }
    http-request return status 200 content-type text/plain lf-string "%[url_param(clientip)] %[url_param(clientip),map_ip(headers.map,0)]\n"
Now, you can use the curl command to control the behavior of the proxy:
curl '172.16.29.13:8084/debug-on?clientip=172.29.1.14'
curl '172.16.29.13:8084/debug-off?clientip=172.29.1.14'
curl '172.16.29.13:8084/debug-on?clientip=172.29.1.0/24'
Conclusion
HAProxy logs are very powerful and easy to configure when you want to add debugging information. However, because early syslog implementations were limited to 1024 bytes per message, the standard logging pipeline was clearly not designed to handle large HTTP request headers.
With Lua and the new HAProxy Lua queuing framework, new possibilities arise for adding custom debugging functionalities. We can harness the power and flexibility of HAProxy to create a feature that enables the logging of debugging information on demand.