Recently we have been considering moving some MySQL database services from our cloud servers to Amazon RDS to simplify our management tasks, and wanted to run atop on some Ubuntu 14.04 LTS machines to get an initial sizing for RDS instances.
While trying to apt-get install the atop package, I was getting a “disk full” error, but df -h said there were several GB left. It took a while to figure out the problem wasn’t disk space… I was out of inodes (the filesystem’s per-file records: every file or directory uses one, no matter how small it is). Running df -i showed 100% of the inodes were in use, which produces the same “disk full” error as running out of space. We monitor a lot of things, but inode usage was not one of them!
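Since inode usage was the thing we weren’t watching, here is a minimal sketch of the kind of check that could be dropped into a monitoring script. The function name inode_alert and the 90% default threshold are my own choices for illustration, and it assumes mount points without spaces in their names.

```shell
# Print mount points whose inode usage is at or above a threshold (percent).
# Reads `df -Pi` style output on stdin, so it can be fed canned text too.
# Hypothetical helper: the name and the default threshold are illustrative.
inode_alert() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 && $5 ~ /^[0-9]+%$/ {
        pct = $5
        sub(/%/, "", pct)                 # "100%" -> "100"
        if (pct + 0 >= t + 0) print $6 " " pct "%"
    }'
}

# Typical use:
#   df -Pi | inode_alert 90
```

Filesystems that do not report inode counts show “-” in that column and are simply skipped.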
While figuring this out, I did manage to brick one of our servers, making this the day that all the work building our infrastructure with Chef paid off! Instead of panicking and having to work all night, I was able to build and configure a new machine in just a few minutes. But I digress.
Here’s what was going on: most of these servers have been running for a while, and we regularly apply all the Linux/Ubuntu security updates, which often install a new Linux kernel. What we didn’t realize was that none of the old kernel versions were ever automatically deleted. We ended up with a /usr/src directory that looked like this:
drwxr-xr-x  6 root root 4096 Aug 11 06:39 .
drwxr-xr-x 10 root root 4096 Mar 25  2015 ..
drwxr-xr-x 24 root root 4096 Mar 25  2015 linux-headers-3.13.0-48
drwxr-xr-x  7 root root 4096 Mar 25  2015 linux-headers-3.13.0-48-generic
(many, many more unused versions here!)
drwxr-xr-x 24 root root 4096 Aug 11 06:39 linux-headers-3.13.0-93
drwxr-xr-x  7 root root 4096 Aug 11 06:39 linux-headers-3.13.0-93-generic
And each version included a ton of individual inode-sucking files. Eventually some of the machines got to the point where kernel updates were failing silently due to the inode shortage.
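If you want to see where the inodes are going, counting the entries under each subdirectory is a quick way to find the offenders. This is a rough sketch rather than what we actually ran at the time; the function name entries_per_dir is made up, and it assumes file names without embedded newlines.

```shell
# For each immediate subdirectory of a parent (default /usr/src), print the
# number of filesystem entries beneath it (the directory itself included),
# largest first. Each entry costs roughly one inode.
# Hypothetical helper: the name is illustrative.
entries_per_dir() {
    parent="${1:-/usr/src}"
    for d in "$parent"/*/; do
        [ -d "$d" ] || continue
        # $(( ... )) strips any padding that wc adds on some systems
        count=$(( $(find "$d" -xdev | wc -l) ))
        printf '%s %s\n' "$count" "${d%/}"
    done | sort -rn
}

# Typical use:
#   entries_per_dir /usr/src | head
```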
It turns out apt does not play well once something’s messed up, and as I said, subsequent fiddling bricked a machine before I figured out this solution.
We had to start by making some headroom using dpkg directly. While you can shoot your foot off by deleting the kernel version you are using, and possibly recent ones, we had good success by starting with the oldest ones and only deleting a couple, like this:
sudo dpkg -r linux-headers-3.13.0-48-generic
sudo dpkg -r linux-headers-3.13.0-48
etc… This ran OK and freed up enough inodes that we could run apt-get without “disk full” errors. From there, the autoremove option did all the work:
sudo apt-get -f autoremove
That removed the rest of the unused versions all at once, though we still had to manually “ok” a configuration option while uninstalling some of the packages.
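For the manual dpkg step, the risky part is picking header packages that don’t belong to the running kernel. Here is a small sketch that only lists candidates and removes nothing. The function name old_header_packages is hypothetical, and it assumes package names of the form linux-headers-VERSION and linux-headers-VERSION-FLAVOUR, as in the listing above.

```shell
# Read package names on stdin and print the versioned linux-headers packages
# that do NOT belong to the running kernel. It only prints; nothing is removed.
# Hypothetical helper: the name is illustrative.
old_header_packages() {
    running="${1:-$(uname -r)}"
    # Strip the flavour suffix: 3.13.0-93-generic -> 3.13.0-93
    base=$(printf '%s\n' "$running" | sed 's/-[a-z][a-z0-9]*$//')
    awk -v base="$base" '
        /^linux-headers-[0-9]/ {
            keep = "linux-headers-" base
            # Skip the running version and its flavoured variants
            if ($0 == keep || index($0, keep "-") == 1) next
            print
        }'
}

# Typical use (installed packages only, oldest first with GNU sort):
#   dpkg -l 'linux-headers-*' | awk '/^ii/ {print $2}' | old_header_packages | sort -V
```

Meta-packages such as linux-headers-generic are ignored because they have no version in the name. I would still look over the list by hand and leave the most recent versions alone, removing only a couple of the oldest.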
From what I read, a default install of Ubuntu 16.04 LTS removes old kernel versions once they are no longer needed. Until then, I’m writing this down in case it happens again!