OpenVz, /proc/user_beancounters and tcpsndbuf

If you're using OpenVz, you owe it to yourself to take a look at /proc/user_beancounters every now and again. Just today at work we were having bizarre problems with one of the containers: Everything would seemingly work fine. Then the load would spike from 0.something to 150+ in less than a minute. Even weirder, when I tried strace'ing some of the processes that seemed to hog the CPU, things looked normal. Imagine my surprise when I ran top again, and the offending processes suddenly didn't use much CPU anymore - each time I strace'd a process, its CPU usage would drop. user_beancounters to the rescue:

# more /proc/user_beancounters

Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
[snipped other values]
            tcpsndbuf        663088    3190632    4000000    4000000      90686

Uh-oh. The failcnt (the last column) was huge. (This output is actually from AFTER the problem was fixed - that's why failcnt is large even though "maxheld", which shows the maximum value reached, is lower than barrier and limit.) failcnt counts how many times a request for the resource was refused because the limit had been reached, so in a properly set up and running system it should remain fairly stable, though not necessarily at 0 (the odd massive spike in "something" could cause an error here or there - things can't always be perfect). In this case the failcnt shot up each time the problem occurred - we observed the "held" value hitting the limit once to make sure it was the cause before increasing it:

vzctl set 204 --save --tcpsndbuf 4000000:4000000
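
After a change like this, one way to confirm that it actually helped is to check that failcnt has stopped climbing, since the counter stays high even once the problem is gone. The following is a rough sketch of such a check, not something we actually ran: the function name failcnt_increases and the snapshot paths in the comments are made up for illustration, and it assumes the usual file layout where the first row for each container starts with "<uid>:" and failcnt is the last column.

```shell
# Print every beancounter whose failcnt grew between two saved snapshots
# of /proc/user_beancounters.
# Usage: failcnt_increases BEFORE_FILE AFTER_FILE
failcnt_increases() {
    awk '
        # Forget the previous container id at the start of each file.
        FNR == 1 { uid = "" }
        # Skip the "Version:" line and the column header.
        $1 == "Version:" || $1 == "uid" { next }
        {
            # The first row of a container carries an extra "<uid>:" field.
            off = 0
            if ($1 ~ /^[0-9]+:$/) { uid = $1; sub(/:$/, "", uid); off = 1 }
            # Expect: resource held maxheld barrier limit failcnt
            if (NF - off != 6) next
            key = uid " " $(1 + off)
            fails = $(6 + off) + 0
            if (NR == FNR)
                before[key] = fails          # first file: remember the count
            else if (fails > before[key])    # second file: report growth
                printf "%s failcnt +%d\n", key, fails - before[key]
        }
    ' "$1" "$2"
}

# Example (run on the host, or inside the container for its own counters):
#   cat /proc/user_beancounters > /tmp/ubc.before
#   sleep 60
#   cat /proc/user_beancounters > /tmp/ubc.after
#   failcnt_increases /tmp/ubc.before /tmp/ubc.after
```

No output means nothing was refused during the interval. A resource that only appears in the second snapshot is reported with its full failcnt.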

The vzctl command above was our final change - we increased the limit gradually, but each time usage kept climbing until it got dangerously close to the new limit. "maxheld" shows how high it eventually got.

This problem is a nasty one - the processes did not handle hitting the tcpsndbuf limit very well, so once one process hit it, the container started spinning out of control. Everything slowly ground to a halt: Apache handled connections more slowly, which meant more server processes were returning data simultaneously, which required more buffers, which caused more of them to lock up as they were unable to complete, which sent memory usage spinning out of control too, to the point where I couldn't even get into the container.

Luckily for us, recovery is one of the areas where OpenVz really shines. Because the processes of the containers are visible in the host as normal processes, I could kill enough of the runaway processes from the host to get back in, restart Apache and clear things up, and then proceed to analyze and fix the problem without even having to consider restarting the whole container. Setting limits judiciously to make sure the host will always have enough resources is of course key to this.

Another benefit is that you can do certain health monitoring from the "outside" that would be impossible to do reliably with a monitoring process running inside the container itself - from the host I could still look at what limits were being hit etc. while the container was so loaded and messed up that Nagios etc. on the container itself couldn't run (nor send data out over TCP - it would hang because of the problem we were facing in the first place).

In this specific case the container was a "legacy" container that holds a lot of different stuff for different clients that used to run directly on a single host. 
Virtualizing it let us migrate the sites to a new host with no downtime, as well as keep in-sync copies on other hosts for near-immediate takeover on failure while we work to migrate those clients to an even more failure-tolerant cluster. Through some brutal routing tricks (excessive abuse of arp pings and temporary address rewrites with iptables on our firewall to ensure clients got to the new host immediately) we actually migrated these services off the physical host and into a container with no site downtime. The fact that there's so much stuff on it made not having to do a reboot quite essential. Even then, "rebooting" OpenVz containers is blindingly fast, since the container doesn't need to go through the typical boot motions of checking hardware, checking filesystems etc. that a physical box or even Xen usually would.

But that's a digression. The moral of this story is that if you see "weird stuff" happening to your OpenVz containers, /proc/user_beancounters should be one of the first places you look, if not THE first.
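
For that first look, a small filter can save reading the whole file by eye. This is a minimal sketch rather than a finished tool: the function name failing_beancounters is made up for illustration, and it assumes the usual layout in which the first row for each container starts with "<uid>:" and every row ends with the failcnt column.

```shell
# List every beancounter with a non-zero failcnt.
# Usage: failing_beancounters [FILE]   (FILE defaults to /proc/user_beancounters)
failing_beancounters() {
    awk '
        # Skip the "Version:" line and the column header.
        $1 == "Version:" || $1 == "uid" { next }
        {
            # The first row of a container carries an extra "<uid>:" field.
            off = 0
            if ($1 ~ /^[0-9]+:$/) { uid = $1; sub(/:$/, "", uid); off = 1 }
            # Expect: resource held maxheld barrier limit failcnt
            if (NF - off == 6 && $(6 + off) + 0 > 0)
                printf "%s %s failcnt=%s held=%s limit=%s\n",
                       uid, $(1 + off), $(6 + off), $(2 + off), $(5 + off)
        }
    ' "${1:-/proc/user_beancounters}"
}
```

Run on the host it covers all containers at once; run inside a container it only sees that container's own counters. Keep in mind that a non-zero failcnt may be old news, so it is the counters that keep growing that deserve attention.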

OpenVz, /proc/user_beancounters and tcpsndbuf 2008-04-16