Ubuntu 14.04… apt-get install “disk full” error (aaack!)

Recently we have been considering moving some MySQL database services from our cloud servers to Amazon RDS to simplify our management tasks, and wanted to run atop on some Ubuntu 14.04 LTS machines to get an initial sizing for RDS instances.

While trying to apt-get install the atop package, I kept getting a “disk full” error, even though df -h showed several GB free. It took a while to figure out that the problem wasn’t disk space… I was out of inodes (roughly speaking, the per-file bookkeeping entries the filesystem keeps, so effectively a limit on the number of files). Running df -i showed 100% of the inodes in use, which produces the same “disk full” symptom.  We monitor a lot of things, but inode usage was not one of them!
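
If you haven’t hit this before, the contrast between the two looks something like this (the numbers and device name are illustrative):

    $ df -h /
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvda1       30G   24G  4.7G  84% /

    $ df -i /
    Filesystem      Inodes   IUsed IFree IUse% Mounted on
    /dev/xvda1     1966080 1966080     0  100% /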

While figuring this out, I did manage to brick one of our servers, making this the day that all the work building our infrastructure with Chef paid off!  Instead of panicking and having to work all night, I was able to build and configure a new machine in just a few minutes.  But I digress.

The problem

Here’s what was going on: most of these servers have been running for a while, and we apply all the Linux/Ubuntu security updates regularly, which often updates the Linux kernel.  What we didn’t realize was that the old kernel versions are never automatically deleted.  We ended up with a /usr/src directory that looked like this:
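
A representative listing (the exact kernel versions will differ, but the pattern is a long row of old header trees):

    $ ls /usr/src
    linux-headers-3.13.0-24    linux-headers-3.13.0-24-generic
    linux-headers-3.13.0-27    linux-headers-3.13.0-27-generic
    linux-headers-3.13.0-29    linux-headers-3.13.0-29-generic
    linux-headers-3.13.0-30    linux-headers-3.13.0-30-generic
    ...
    linux-headers-3.13.0-83    linux-headers-3.13.0-83-generic
    linux-headers-3.13.0-85    linux-headers-3.13.0-85-generic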

And each version included a ton of individual inode-sucking files (the header packages alone contain thousands of them).  Eventually some of the machines got to the point where kernel updates were failing silently due to the inode shortage.

The solution

It turns out apt does not play well once things are in this state, and as I said, subsequent fiddling bricked a machine before I figured out this solution.

We had to start by making some headroom using dpkg directly.  While you can shoot your foot off by deleting the kernel version you are currently running (or recent ones you may still need), we had good success by starting with the oldest versions and deleting only a couple, like this:
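
The kernel versions below are illustrative; check uname -r first so you know which version not to touch:

    # the running kernel -- do NOT remove this version
    $ uname -r
    3.13.0-85-generic

    # purge a couple of the oldest header packages directly with dpkg
    $ sudo dpkg --purge linux-headers-3.13.0-24-generic linux-headers-3.13.0-24
    $ sudo dpkg --purge linux-headers-3.13.0-27-generic linux-headers-3.13.0-27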

etc… This ran OK and freed up enough inodes that we could run apt-get again without “disk full” errors.  From there, apt-get’s autoremove option did all the work:
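
    # removes automatically installed packages that are no longer needed (old kernels included, in this case)
    $ sudo apt-get autoremove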

That removed the rest of the unused versions all at once, though we still had to manually “ok” a configuration option while uninstalling some of the packages.

From what I read, Ubuntu 16.04 LTS will remove old kernel versions automatically once they are no longer needed.  Until then, I’m writing this down in case it happens again!

basic_vsftpd cookbook for Chef

Recently I was rebuilding an old “virtual user” vsFTPd server, this time using Chef.  Of course I started by looking for a decent vsftpd cookbook.   There are several popular ones, for example:

  • The vsftpd “supermarket” cookbook might be fine for some users, but it’s not been updated since 2010, and it seems to lack a good way to override most of the default vsftpd.conf settings.
  • TheSerapher’s chef-vsftpd cookbook on github is popular but it’s opinionated with respect to defaults and seems to be aimed at setting up FTP for local users.

I was disappointed that these didn’t suit our needs, and a little bummed I wasted so much time reading the code to figure that out.  But by that time, I was so brushed up on vsftpd config that, against all advice, I started from scratch and created the basic_vsftpd cookbook.

My goal was to create a general-purpose cookbook designed around three principles:

  • To be as simple as possible, only about installing vsftpd and nothing else
  • To let you create any possible vsftpd configuration
  • To set no defaults and make no assumptions about the intended use
  • OK, four.  Using code that’s easy to read and understand

In other words, the goal was to create a solid base recipe that is easy to use or extend via a wrapper cookbook.  The pleasing result was that after building/testing this cookbook, using it to deploy a real FTP server was a dream.  I’ve made this a public repo in hopes others will find it useful as well.

basic_vsftpd

“A basic and fully configurable cookbook for the vsftpd package.”  It’s available on github and at the Chef supermarket.

Recipes

  • default – Installs and configures the vsftpd package
  • chroot_list – Creates a chroot_list file for vsftpd
  • userlist – Creates a userlist file for vsftpd

Resources

  • user_conf – Creates a user configuration file in the vsftpd user_config_dir directory
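
For what it’s worth, here is roughly how I would expect the user_conf resource to be called from a wrapper recipe.  The property names and values below are assumptions for illustration only; the cookbook’s README is the authority.

    # hypothetical usage sketch -- property names are assumptions, not the
    # cookbook's documented API; consult the basic_vsftpd README
    basic_vsftpd_user_conf 'alice' do
      options 'local_root'   => '/srv/ftp/alice',
              'write_enable' => 'YES'
    end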

Authenticating vsFTPd virtual users with pam_pwdfile.so

For years, the standard way to set up password authentication for the vsFTPd FTP server was to use PAM with the pam_userdb.so module.  It looks great on paper, but if you have tried it, you know that generating a Berkeley DB password file is a PITA, debugging is blind and brutal, and password file generation does not play well with automated deployments.  On top of that, it turns out that pam_userdb.so is (apparently) being phased out of the PAM package.

I stumbled across the pam_pwdfile.so module and it worked for us without all the confusing dead-ends we got with userdb. This module seems to be supported long-term, and uses an htpasswd-like password file. Here’s how to set it up, in four steps:

Installing pam_pwdfile.so

We’re using Ubuntu 14.04 at the moment, and you must install this module as a package:
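
    # the PAM module ships in the libpam-pwdfile package
    $ sudo apt-get install libpam-pwdfile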

or in a chef recipe, simply:
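
    package 'libpam-pwdfile'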

Creating a PAM service

Create this file at /etc/pam.d/vsftpd
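
A minimal version looks something like this (the pwdfile path is just the location used throughout these examples; any path works as long as it matches where you create the password file below):

    # /etc/pam.d/vsftpd
    auth    required pam_pwdfile.so pwdfile=/etc/vsftpd/passwd debug
    account required pam_permit.so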

This creates a “PAM service” named vsftpd.   The debug option dumps some extra info to /var/log/auth.log and is very helpful in getting things set up the first time.  The pwdfile= option denotes the filename of the user/pw database we’ll create next.

Configuring vsFTPd

To use this new service, just add the following option to /etc/vsftpd.conf:
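
    # use the PAM service defined in /etc/pam.d/vsftpd
    pam_service_name=vsftpd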

Creating the user/password file

This is the payoff.  There are a couple of ways to generate the password file.  From the command line you can use the Apache htpasswd utility, and there seem to be a number of other tools that generate these files as well.

But we’re deploying with Chef, and with this file format we can automate the password file generation too.  The key is knowing that you can create a properly-hashed password using  openssl passwd -1 mypa$$w0rd .  Here’s an example of how to create the whole pwdfile in a Chef recipe:
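
A minimal sketch follows; the user names, passwords, and file path are illustrative, and real credentials belong in an encrypted data bag rather than hard-coded in the recipe.

    # Illustrative only: build an htpasswd-style file for pam_pwdfile.
    # In a real recipe, load users and passwords from an encrypted data bag.
    ftp_users = {
      'alice' => 'mypa$$w0rd',
      'bob'   => 'an0ther-secret'
    }

    file '/etc/vsftpd/passwd' do
      content(
        ftp_users.map do |user, password|
          # openssl passwd -1 produces an MD5-crypt hash that pam_pwdfile understands
          hash = `openssl passwd -1 '#{password}'`.strip
          "#{user}:#{hash}"
        end.join("\n") + "\n"
      )
      owner 'root'
      group 'root'
      mode  '0600'
    end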

That’s it.  Virtual users should now be able to log in using the passwords hashed in the passwd file.  (I’m assuming the rest of the vsFTPd configuration supports virtual users.  Getting that set up the way you want can be a can of worms, but it’s beyond the scope of this post.)

Troubleshooting

First off, don’t forget to restart the vsftpd service after all the changes… and make sure it actually starts!  A common issue is that certain config errors send vsftpd into a restart loop until the system kills it, so the startup messages look good but then it dies.

In my experience, the most likely problem here will be with the vsftpd setup and not the authentication.  To effectively “stub out” the authentication, temporarily replace the /etc/pam.d/vsftpd file with this:
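
    # TEMPORARY -- accepts any username/password; for debugging only
    auth    required pam_permit.so
    account required pam_permit.so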

This allows any user/pw to log in.  If you cannot log in now, your problem is with vsFTPd, your firewall, etc.  (Don’t forget that this leaves the FTP server wide open!)

For PAM problems, the debug option in the pam service file is helpful, as is just watching the FTP connection/login conversation.

Good luck.  I hope this saves you some of the 8+ hours we spent screwing with promising “solutions” that did not work!

Fear and loathing with Chef and nginx

I always hate having to explain why I spent 4 hours on one line of code:
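
It was an attribute override along these lines (shown here in wrapper-cookbook attribute form; the exact attribute path assumes the opscode nginx cookbook’s pid setting):

    # make the cookbook's pid setting match where the package actually puts it
    default['nginx']['pid'] = '/run/nginx.pid'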

It’s not really even code, it’s just a configuration setting.  I ran into the problem using Chef, the opscode nginx cookbook, and the very-new Ubuntu 14.04 LTS.  Not sure if this applies to other configurations but debugging this sort of thing takes a long time with all the vagrant up-ing and all, so in case this saves someone else a few hours…

Symptoms

Whenever I’d “vagrant up” a new instance and then run a Capistrano deploy, everything looked OK but the machine would not serve web requests.

  • Web response was 404.  Logs showed the missing content was something like “/var/www/nginxdefault/index.html”, the default that’s created with a new install of nginx (though the chef run had already deleted it).
  • Cap tasks to manage nginx were ineffective, and SSH’ing to the server and running the raw commands manually against nginx (“nginx -s reload” etc) didn’t solve anything either.
  • Web requests failed until I did a reboot; then, magically, everything worked fine.

After poring over all the code in the whole stack from the chef run list through the Rails application code, it turns out the problem was simply…

The pid file location!

It turns out that the pid file setting in the config file (/etc/nginx/nginx.conf) and the startup file (/etc/init.d/nginx) was
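
    pid /var/run/nginx.pid;    # the cookbook's default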

but the pid file for the running process created from the initial install was actually at
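
    /run/nginx.pid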

Issuing nginx commands from the command line didn’t work because those commands read the pid location from the config file.  The only simple way to get rid of the stale nginx processes was kill -9, or a reboot.

Simple solution

The right thing to do is to fix the chef recipe so that the nginx processes it starts match the recipe’s own configuration, but with all the new Ubuntu versions, the opscode folks must be rather busy.  I guess I should figure this out and send a pull request.  But the quick fix is the configuration setting shown at the top of this post.

Since it’s not obvious how the initial startup pid file is set, it’s easier just to set the pid file location defined in the nginx config and startup files to match what it turns out to be.  Then you can control nginx properly all the way through the Capistrano deploy, and over reboots.

Yep, it’s brittle.  If the nginx package or the chef recipe changes that, the same old problem is back.

Update…

I ran a clean build with Ubuntu 14.04 LTS, and installed the default nginx package manually.  The default config files do not specify the pid file location explicitly.  The install leaves a running instance of nginx with the pid file at /run/nginx.pid.

It turns out this is actually not a new issue at all.  It’s described in a bug report from 09/2012, “nginx.pid location changed for Ubuntu releases 11.04 and up”, which in turn references an 08/2011 AskUbuntu question, “Why has /var/run been migrated to /run?”.  Apparently starting with Ubuntu 13, /var/run is no longer symlinked to /run, and that is when this starts breaking.

Bottom line: unless you happen to pick the pid file location the package actually uses (which is not what the recipe defaults set), the configured pid location ends up different from that of the already-running service.  Then you have to manually kill the original nginx processes, or just reboot.  Not what anyone wants.

Extending logwatch with Chef

If you saw my previous post on Extending Logwatch, it may have occurred to you that even something as simple as manually creating three small files and saving them to the (different) correct locations with the right owners and permissions is rife with opportunities for error.  Around here we basically assume that anything done manually will get screwed up, and we are seldom disappointed.

On the other hand, if you are already using Chef to deploy or manage servers, you may have realized what a nice little recipe this would be.

We did create a Chef cookbook for this task, and published it here https://github.com/flatrocks/restart-watch as an example of a simple and useful Chef application.  Here are the guts of the default recipe:
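
Roughly, it looks like this (the file names, paths, and modes are a sketch; the repo has the real recipe):

    # Sketch of the default recipe -- file names and modes are illustrative;
    # see the restart-watch repo for the actual code.
    include_recipe 'logwatch'

    # logwatch picks up a new "service" from three files: a logfile group
    # definition, a service definition, and the script that does the parsing
    cookbook_file '/etc/logwatch/conf/logfiles/restart.conf' do
      owner 'root'
      group 'root'
      mode  '0644'
    end

    cookbook_file '/etc/logwatch/conf/services/restart.conf' do
      owner 'root'
      group 'root'
      mode  '0644'
    end

    cookbook_file '/etc/logwatch/scripts/services/restart' do
      owner 'root'
      group 'root'
      mode  '0755'
    end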

It simply:

  • Ensures that the basic logwatch recipe is included (to install logwatch), and
  • Copies the three files required to create the service.  These files live in the cookbook, so they are under version control along with the rest of the recipe.

We have not automated everything yet, but we’re working on it, and small steps like this are paying off quickly.