Linux kernel performance has come a long way in the last few years, especially in the 2.6/3.x kernel line. However, interrupt handling can become a problem at high IO rates, especially for the network.
We’ve seen this on high-performance systems saturating one or more 1 Gbps NICs, and in VMs handling lots of small packets; one recent overload hit at about 10,000 packets per second.
Identifying the Cause
The cause is straightforward: in the simplest modes, the kernel processes each packet via a hardware interrupt from the NIC. As the packet rate rises, those interrupts overload the single CPU core that handles them. This single-CPU bottleneck is important and poorly understood by many sysadmins.
On a common 4-16 core system, an overloaded core is easy to miss: one saturated core out of 4-16 shows up as only 6-25% overall CPU utilization, so the server looks normal. Yet the system runs poorly, dropping packets with no warning, no dmesg log entries, and nothing obviously wrong.
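The hidden-core arithmetic is worth making concrete. A minimal sketch, using 8 cores as an illustrative count (the same math gives the 6-25% range for 4-16 cores):

```shell
# One fully busy core out of CORES cores, expressed as overall CPU utilization.
# 8 cores is an example; substitute your own core count.
CORES=8
awk -v cores="$CORES" 'BEGIN { printf "1 busy core of %d = %.1f%% overall CPU\n", cores, 100/cores }'
```

At 12.5% overall, most monitoring dashboards will show the box as nearly idle.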
If you run top in multi-CPU mode (run top and press 1) and watch the %si column (softirq, i.e. software interrupt time), or watch the interrupt columns in mpstat (mpstat -P ALL 1), you can see this on a busy system. It becomes clear that interrupt load is high, and with more advanced mpstat usage you can see which CPU, and which driver, is the problem.
Checking the IRQ Load
You need a newer version of mpstat that supports the -I option. To see the total IRQ load, run:
mpstat -I SUM -P ALL 1
Anything over 5,000 interrupts per second is a lot, while 10,000-20,000 per second is extreme.
To find out what driver/item is creating the load, run:
mpstat -I CPU -P ALL 1
This output is hard to read: you need to trace the right column to see which IRQ is causing the load, such as 15, 19, or 995. You can also simplify the display to a single CPU by running: mpstat -I CPU -P 3 1
That command shows CPU #3. Note that top, htop, and mpstat may number CPUs differently: they can start at 0 or 1. Both top and mpstat count 0, 1, 2, while htop counts 1, 2, 3.
Once you know the IRQ number, look at the interrupt table via “cat /proc/interrupts” and find the number mpstat flagged. The right-hand column shows the driver using that IRQ, and the per-CPU counts show how many interrupts each CPU has handled, so you can see which device is loading the system.
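A minimal sketch of that lookup, using IRQ 19 as a hypothetical example and a fabricated two-CPU sample of /proc/interrupts so the parsing is visible end to end (on a real system, point the awk at /proc/interrupts itself):

```shell
# Sample snippet in /proc/interrupts format; the counts and devices are made up.
cat <<'EOF' > /tmp/interrupts.sample
           CPU0       CPU1
 19:    1500000          0   IO-APIC-fasteoi   eth0
 23:       1200        900   IO-APIC-fasteoi   ehci_hcd:usb1
EOF
# Match the line for our IRQ, print the owning driver and the total count.
awk -v irq=19 '$1==irq":" {print "IRQ " irq " belongs to " $NF " with " $2+$3 " interrupts"}' /tmp/interrupts.sample
```

Here the skewed counts (all 1.5 million interrupts on CPU0, none on CPU1) are exactly the single-CPU overload pattern described above.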
First, make sure you are running irqbalance, a daemon that spreads your IRQs across CPUs. This is important on a busy system, especially with two or more NICs: by default CPU 0 handles all interrupts and can easily become overloaded. irqbalance spreads them around to lower the load. For maximum performance, you can balance IRQs manually, spreading them across sockets and away from hyperthread-shared cores, but this is usually not worth the trouble.
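If you do want to pin an IRQ by hand, the kernel exposes a per-IRQ affinity file that takes a hex bitmask of allowed CPUs. A sketch of computing the mask; IRQ 19 and CPU 3 are hypothetical, so substitute the numbers you found with mpstat:

```shell
# Pin a hypothetical IRQ 19 to CPU 3.
IRQ=19
CPU=3
# The affinity mask is a hex CPU bitmask: CPU 3 -> bit 3 -> 0x8.
MASK=$(printf '%x' $((1 << CPU)))
echo "mask $MASK for /proc/irq/$IRQ/smp_affinity"
# On a real system, apply it as root (irqbalance may later override it):
#   echo $MASK > /proc/irq/$IRQ/smp_affinity
```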
But even after IRQ balancing, a single NIC can still overload a single CPU core. The solution to this depends on your NIC and driver, but there are two helpful choices.
Multiple NIC Queues
The first is multiple NIC queues, which some NICs (many Intel models, for example) provide. With four queues, the NIC raises four separate interrupts that can be serviced by four CPU cores, spreading the load. The driver usually sets this up automatically, and you can verify it with mpstat.
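A multi-queue NIC also shows up in /proc/interrupts as one IRQ line per queue. A sketch using a fabricated four-queue sample (the device name eth0 and the counts are assumptions; on a real system grep the live file, and “ethtool -l eth0” shows the queue/channel counts):

```shell
# Sample /proc/interrupts lines for a hypothetical 4-queue NIC.
cat <<'EOF' > /tmp/interrupts.mq
 40:   900000        0        0        0   PCI-MSI-edge   eth0-TxRx-0
 41:        0   880000        0        0   PCI-MSI-edge   eth0-TxRx-1
 42:        0        0   910000        0   PCI-MSI-edge   eth0-TxRx-2
 43:        0        0        0   870000   PCI-MSI-edge   eth0-TxRx-3
EOF
# Count the per-queue IRQ lines: one per queue, each bound to a different CPU.
grep -c 'eth0-TxRx' /tmp/interrupts.mq
```

Note how each queue’s interrupt count lands on a different CPU column; that is the load-spreading in action.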
The other and often more important driver option is IRQ coalescing. This powerful function allows the NIC to buffer several packets before calling the IRQ, saving a massive amount of time and load on the system.
For example, if the NIC buffers 10 packets per interrupt, the interrupt rate, and much of the per-packet CPU overhead, drops by roughly 90%. Coalescing is usually controlled via the ethtool utility’s -c/-C options. Some drivers need it set at driver load time instead; see your driver documentation. Some cards, such as various Intel NICs, have adaptive modes that tune coalescing automatically based on load.
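The arithmetic, plus the relevant ethtool invocations, sketched below. The packet rate is an example figure and eth0 is an assumed device name:

```shell
# At 100,000 packets/sec, one interrupt per packet means 100,000 IRQs/sec;
# coalescing 10 packets per interrupt cuts that to 10,000, a 90% reduction.
PPS=100000
FRAMES=10
echo "$((PPS / FRAMES)) interrupts/sec instead of $PPS"
# On real hardware:
#   ethtool -c eth0                # show current coalescing settings
#   ethtool -C eth0 rx-frames 10   # interrupt after about 10 received frames
#   ethtool -C eth0 rx-usecs 100   # ...or after 100 microseconds, whichever first
```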
Finally, some drivers, such as the ones we saw in our VMs, support neither multiple queues nor coalescing. In that case a single CPU core caps network performance until you can change NICs or drivers.
This is a complex and little-known area, but a few good techniques can improve performance on busy systems, and a little extra monitoring can help find and diagnose these hard-to-see problems.