Consul connection drops: "nf_conntrack: table full, dropping packet"

This error message means that the connection tracking table has reached its limit, causing various issues like network timeouts.

Roman Zipp, September 22nd, 2022

While stress testing our new infrastructure based on Nomad & Consul, we quickly ran into a problem: after a couple of seconds, all packets were dropped. Nothing pointed to CPU load or network bandwidth as the cause. In addition, we weren't able to monitor the uptime of any jobs because HAProxy only displayed L7 CONN issues.

The evidence

After looking through all possible log files, kern.log showed a large number of errors that pointed to a network issue.

nf_conntrack: table full, dropping packet
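To check whether a host is affected, you can count the matching lines in the kernel log. The following is a minimal sketch, not part of our original setup: the function name count_conntrack_drops is made up for illustration, and the default path /var/log/kern.log assumes a Debian/Ubuntu-style log layout.

```shell
#!/bin/sh
# Hypothetical helper: count "table full" messages in a kernel log file.
# The default path is an assumption (Debian/Ubuntu-style syslog layout).
count_conntrack_drops() {
    logfile="${1:-/var/log/kern.log}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the helper usable under "set -e".
    grep -c "nf_conntrack: table full" "$logfile" || true
}
```

Calling count_conntrack_drops without an argument reads the default log file; any other path can be passed as the first argument.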

Conntrack

A conntrack firewall is a stateful firewall that uses the Linux kernel's connection tracking system (known as "conntrack") to improve network security. It examines incoming and outgoing network traffic to determine the state of each connection, and rules are applied based on that state.

The conntrack firewall is based on the idea that all network traffic should be classified into one of several states, such as "new," "established," or "related." These states are based on the protocol and port information contained in each packet, as well as other information such as source and destination addresses. The firewall then applies rules to each state, allowing or blocking traffic based on criteria set by the system administrator.

table full, dropping packet

This error message means that the connection tracking table has reached its limit, causing various issues like network timeouts that can occur randomly or consistently.

The conntrack table stores information about active connections that are being processed by the kernel. In Nomad clusters, the table frequently fills up when jobs communicate with external endpoints or other services within the same cluster via Consul. NAT and stateful firewall rules are used in such situations, and every connection they track occupies an entry in the conntrack table.
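To see how close a host is to the limit, you can compare the current number of entries with the maximum table size. The sketch below assumes both values are passed in as arguments; the helper name conntrack_usage_pct is hypothetical and only serves as an illustration.

```shell
#!/bin/sh
# Hypothetical helper: print conntrack table usage as a whole-number percentage.
# Arguments: current entry count, maximum table size.
conntrack_usage_pct() {
    count="$1"
    max="$2"
    if [ "$max" -le 0 ]; then
        echo "max must be greater than zero" >&2
        return 1
    fi
    # Integer arithmetic, so the result is rounded down.
    echo $(( count * 100 / max ))
}
```

On a live system the two values can be read from /proc/sys/net/netfilter/nf_conntrack_count and /proc/sys/net/netfilter/nf_conntrack_max and passed to the helper. A value that stays close to 100 under load suggests the limit is too low for the workload.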

The fix

You can determine the current maximum table size via the following command:

cat /proc/sys/net/netfilter/nf_conntrack_max

You can use the sysctl utility to increase the maximum table limit. I've used a value of 262144 for an 8 GB Ubuntu system. Please keep in mind that the table will be regularly flushed.

sysctl -w net.netfilter.nf_conntrack_max=<value>

To persist this change, add the following config file to /etc/sysctl.d so that the setting is applied on each boot.

echo "net.netfilter.nf_conntrack_max=<value>" > /etc/sysctl.d/10-conntrack-max.conf
sysctl -p /etc/sysctl.d/10-conntrack-max.conf
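If you script this step, it can help to validate the value before writing the file. The following is a rough sketch under that assumption; the function name write_conntrack_conf is hypothetical, and the target file defaults to the path used above.

```shell
#!/bin/sh
# Hypothetical helper: write a sysctl drop-in that sets nf_conntrack_max.
# Arguments: the new table size and, optionally, the target file.
write_conntrack_conf() {
    value="$1"
    conf="${2:-/etc/sysctl.d/10-conntrack-max.conf}"
    # Reject anything that is not a positive integer without leading zeros.
    case "$value" in
        ''|*[!0-9]*|0*)
            echo "invalid value: $value" >&2
            return 1
            ;;
    esac
    printf 'net.netfilter.nf_conntrack_max=%s\n' "$value" > "$conf"
}
```

After running write_conntrack_conf 262144 as root, apply the file with sysctl -p as shown above and confirm the result with sysctl net.netfilter.nf_conntrack_max.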