Fear and loathing with Chef and nginx


I always hate having to explain why I spent 4 hours on one line of code:

override['nginx']['pid'] = '/run/nginx.pid'

It’s not really even code; it’s just a configuration setting.  I ran into the problem using Chef, the Opscode nginx cookbook, and the very new Ubuntu 14.04 LTS.  I’m not sure whether this applies to other configurations, but debugging this sort of thing takes a long time with all the “vagrant up”-ing, so in case this saves someone else a few hours…

Symptoms

Whenever I’d “vagrant up” a new instance and then run a Capistrano deploy, everything looked OK, but the server would not serve web requests.

  • Web response was 404.  Logs showed the missing content was something like “/var/www/nginxdefault/index.html”, the default that’s created with a new install of nginx (though the chef run had already deleted it.)
  • Cap tasks to manage nginx were ineffective, and SSH’ing to the server and running the raw commands manually against nginx (“nginx -s reload” etc) didn’t solve anything either.
  • Web requests failed until I rebooted; then, magically, everything worked fine.

After poring over all the code in the whole stack from the chef run list through the Rails application code, it turns out the problem was simply…

The pid file location!

It turns out that the pid file setting in the config file (/etc/nginx/nginx.conf) and the startup file (/etc/init.d/nginx) was

/var/run/nginx.pid

but the pid file for the running process created from the initial install was actually at

/run/nginx.pid

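Issuing nginx commands from the command line didn’t work because they look up the pid file location in the config file.  With no pid file at the configured path, signals like “nginx -s reload” never reached the processes that were actually running.  The only simple way to get rid of the bad nginx processes was kill -9, or a reboot.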
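A quick way to spot this kind of mismatch is to compare the pid directive in the config file with the pid file that actually exists on disk.  The following is only a rough sketch: the helper names configured_pid_path and pid_mismatch are made up for illustration, the parsing is a plain line scan rather than a real nginx config parser, and it ignores include files.

```ruby
# Hypothetical helpers, not part of Chef or nginx.

# Return the path from the first `pid` directive found in nginx config
# text, or nil if the config does not set one explicitly.
def configured_pid_path(conf_text)
  conf_text.each_line do |line|
    stripped = line.sub(/#.*/, '').strip # ignore comments
    match = stripped.match(/\Apid\s+(\S+?);/)
    return match[1] if match
  end
  nil
end

# Compare the configured pid path with the first candidate pid file that
# exists.  `exists` is injectable so the check can be tried without
# touching the real filesystem.
def pid_mismatch(conf_text, candidates, exists: File.method(:exist?))
  configured = configured_pid_path(conf_text)
  actual = candidates.find { |path| exists.call(path) }
  {
    configured: configured,
    actual: actual,
    mismatch: !actual.nil? && actual != configured
  }
end

if File.exist?('/etc/nginx/nginx.conf')
  conf = File.read('/etc/nginx/nginx.conf')
  p pid_mismatch(conf, ['/var/run/nginx.pid', '/run/nginx.pid'])
end
```

If the result shows a mismatch, the running master process was started with a different pid file than the one the config now names, which is the situation described above.  Note that a config with no explicit pid directive also reports a mismatch here, since the sketch does not know nginx's compiled-in default.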

Simple solution

The right thing to do is to fix the chef recipe so the new nginx processes match what’s in the recipe configuration, but with new versions of Ubuntu coming out, the Opscode guys must be rather busy.  I guess I should figure this out and send a pull request.  But the quick solution is the one-line override shown at the top of this post.

Since it’s not obvious how the initial startup pid file is set, it’s easier just to set the pid file location defined in the nginx config and startup files to match what it turns out to be.  Then you can control nginx properly all the way through the Capistrano deploy, and over reboots.

Yep, it’s brittle.  If the nginx package or the chef recipe changes that, the same old problem is back.
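One way to limit the damage is to scope the override to the platform where I actually saw the problem.  This is an untested sketch for the attributes file of a hypothetical wrapper cookbook; it assumes the usual ohai values for node['platform'] and node['platform_version'], and it only narrows where the override applies, it doesn’t remove the brittleness.

```ruby
# attributes/default.rb in a hypothetical wrapper cookbook.
# Only override the pid location on the release where the mismatch
# was observed; other platforms keep the cookbook default.
if node['platform'] == 'ubuntu' && node['platform_version'] == '14.04'
  override['nginx']['pid'] = '/run/nginx.pid'
end
```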

Update…

I ran a clean build with Ubuntu 14.04 LTS, and installed the default nginx package manually.  The default config files do not specify the pid file location explicitly.  The install leaves a running instance of nginx with the pid file at /run/nginx.pid.

It turns out this is actually not a new issue at all.  It’s described in a bug report from 09/2012, “nginx.pid location changed for Ubuntu releases 11.04 and up”, and the report references an 08/2011 AskUbuntu question, “Why has /var/run been migrated to /run?”.  Apparently starting with Ubuntu 13, /var/run is no longer symlinked to /run and this will start breaking.
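Whether the two paths are really the same file comes down to whether /var/run resolves to /run on a given box.  Here is a small sketch of that check; same_location? is a made-up helper name, and it relies only on Ruby’s File.realpath, which follows symlinks.

```ruby
# Hypothetical helper: true when both paths resolve to the same file
# once symlinks are followed, false if they differ or either is missing.
def same_location?(path_a, path_b)
  File.realpath(path_a) == File.realpath(path_b)
rescue Errno::ENOENT
  false
end

puts same_location?('/var/run/nginx.pid', '/run/nginx.pid')
```

When this prints true, the config’s /var/run/nginx.pid and the running process’s /run/nginx.pid are the same file and the mismatch is harmless.  When it prints false, nginx commands will be looking in the wrong place.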

Bottom line: if you do not happen to pick the actual default pid file location (which is not what’s set by the recipe defaults), the configured pid file location changes after the service is already running.  Then you’ll have to manually kill the original nginx processes or just reboot.  Not what anyone wants.
