A Puppet definition for an Ensembl API server

At SANBI we use Puppet to manage system configuration for our servers. This significantly reduces the management headache, allowing us to make changes in a central location (e.g. what the DNS server IP addresses are) and to create “classes” of servers for different roles. Recently we hosted a course on the Ensembl Genome Browser taught by Bert Overduin of the EBI. In addition to teaching people how to use the Ensembl website, Bert taught a number of students how to use the Ensembl Perl API. I set up a VM, using the web interface to SANBI’s private VM cloud, and created a Puppet definition that would install the Ensembl API on the server. So here’s a commented version of the definition I created.

First, a note about Puppet: Puppet configuration is declarative; in other words, it defines what should be, not (necessarily) how to get there. Each configuration item creates a “resource”. Puppet provides a set of resource types out of the box and allows you to define your own. For this server I defined two types, download and unpack, for resources that require downloading and unpacking respectively. These definitions went in my .pp file ahead of my server definition, along with a download_and_unpack type that combines the two. The download_and_unpack type uses resource ordering, in its arrow (->) form: since the Puppet configuration language is declarative, not imperative, you cannot assume that resources are created in the order you specify them, so if order matters you need to state it. Anyway, here are those types:


define download( $url, $dist='defaultvalue', $download_dir='/var/tmp' ) {

    if $dist == 'defaultvalue' {
        $path_els = split($url, '/')
        $dist_file = $path_els[-1]
    } else {
        $dist_file = $dist
    }
    $downloaded_dist = "$download_dir/$dist_file"
    exec { "download_$title":
        creates => $downloaded_dist,
        path => '/usr/bin',
        command => "wget -O $downloaded_dist $url",
    }
}

define unpack ( $dist, $creates, $dest='/opt', $download_dir='/var/tmp' ) {
    $suffix = regsubst($dist, '^.*(gz|bz2)$', '\1', 'I')
    if $suffix == 'gz' {
         $comp_flag = 'z'
    } elsif $suffix == 'bz2' {
         $comp_flag = 'j'
    } else { 
         $comp_flag = ''
    }

    exec { "unpack_$title":
         creates => "$dest/$creates",
         command => "tar -C $dest -${comp_flag}xf $download_dir/$dist",
         path => '/bin',
    }
}
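As an aside, the suffix-sniffing in unpack is easy to sanity-check outside Puppet. Here is a rough Python transcription of the same logic (my own sketch, not part of the manifest):

```python
import re

def comp_flag(dist):
    """Map an archive filename to a tar compression flag, mirroring
    the regsubst/if chain in the unpack type (case-insensitive)."""
    m = re.search(r'(gz|bz2)$', dist, re.IGNORECASE)
    if m is None:
        return ''                     # plain tar: no compression flag
    return 'z' if m.group(1).lower() == 'gz' else 'j'
```

So bioperl-1.2.3.tar.gz yields the z flag, a .tar.bz2 archive yields j, and anything else falls through to an uncompressed tar.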

define download_and_unpack ( $url, $dist='defaultvalue', 
                             $creates, $dest='/opt',
                             $download_dir='/var/tmp' ) {
    if $dist == 'defaultvalue' {
        $path_els = split($url, '/')
        $dist_file = $path_els[-1]
    } else {
        $dist_file = $dist
    }
    download { "get_$title":
        url => $url,
        dist => $dist_file, 
        download_dir => $download_dir 
    }
    ->
    unpack { "install_$title":
        dist => $dist_file, 
        creates => $creates, 
        dest => $dest, 
        download_dir => $download_dir 
    }
}

Just one last note on these types: they use exec, which executes a command. In Puppet an exec resource is run each time the configuration is applied, unless you guard it with a creates, onlyif or unless parameter. I thus use knowledge of what the commands do to specify that they should NOT be run if certain files already exist.
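For illustration, a minimal sketch of the three guards (the command and paths here are made up):

```puppet
exec { 'make_flag_file':
    command => '/bin/touch /var/tmp/flag',
    creates => '/var/tmp/flag',   # skipped once this file exists
    # alternatively:
    # unless => '/usr/bin/test -e /var/tmp/flag',   # skip when this succeeds
    # onlyif => '/usr/bin/test ! -e /var/tmp/flag', # run only when this succeeds
}
```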

Then there is one more type I need: an Ensembl course user with a particular defined password (the password matches the username – yes, very insecure, but this is on a throwaway VM for a single course). This is defined in terms of a user and an exec resource. The exec resource checks for the presence of the username *without* a password in /etc/shadow, and if it exists uses usermod to set the password (first generating it using openssl). Note that the generate() function runs on the Puppet server, not the client, so anything you use there needs to be installed on the server (in this case openssl, which was already installed on the server).


define enscourse_createuser {
    $tmp = generate("/usr/bin/openssl","passwd","-1",$name)
    $password_hash = inline_template('<%= @tmp.chomp %>')
    user { "$name":
      require => Group['enscourse'],
      ensure => present,
      gid => 'enscourse',
      comment => "Ensembl Course User $name",
      home => "/home/$name",
      managehome => true,
      shell => '/bin/bash',
    }
    exec { "/usr/sbin/usermod -p '${password_hash}' ${name}":
      onlyif => "/bin/egrep -q '^${name}:[*!]' /etc/shadow",
      require => User[$name],
    }
}
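The hashing that generate() performs on the Puppet master can be reproduced at any shell prompt. The salt is random, so the hash differs on each run, but it always carries the $1$ MD5-crypt prefix that usermod -p accepts:

```shell
# Same command the manifest hands to generate(), run by hand
hash=$(openssl passwd -1 user1)
echo "$hash"
```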

With the custom types out of the way we can start looking at the Puppet node that defines the “enscourse.sanbi.ac.za” server configuration:


node 'enscourse.sanbi.ac.za' inherits 'sanbi-server-ubuntu1204' {
    network::interface { "eth0":
         ipaddr  => "192.168.2.144",
         netmask => "255.255.255.0",
    }

We have an established “base machine definition” that we inherit from. This is *not* the recommended way to create Puppet configs, but we didn’t know that when we started using Puppet at SANBI. Puppet’s type system encourages a kind of mixin-style programming, so there should instead be a set of Puppet classes, e.g. sanbi-server or ubuntu-1204-server, and we should include them in the node definition. Just a quick note: Puppet classes are effectively singleton objects: they define a collection of resources that is declared once (as soon as the class is used in an include statement) in the entire Puppet catalog (a Puppet catalog being the collection of resources that will be applied to a particular system). Read Craig Dunn’s blog for a bit on the difference between Puppet defined types and classes.
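For the record, the include-based style would look something like this (hypothetical class names of my own; note that Puppet class names use underscores rather than hyphens):

```puppet
# Sketch only: these classes would hold the resources that our
# inherited base node definition currently carries
node 'enscourse.sanbi.ac.za' {
    include sanbi_server
    include ubuntu1204_server
}
```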

We then define the network interface parameters (an entry on SANBI’s private Class C network). And then onwards to an Augeas definition that ensures that pam_mkhomedir is enabled. Augeas is a configuration management tool that parses text files and turns them into a tree that can be addressed and manipulated using a path specification language.


    augeas { 'mod_mkhomedir in pam':
        context => '/files/etc/pam.d/common-session',
        changes => [ 'ins 1000 after *[last()]',
                     'set 1000/type session',
                     'set 1000/control required',
                     'set 1000/module pam_mkhomedir.so',
                     'set 1000/argument umask=0022',
                   ],
        onlyif => "match *[module='pam_mkhomedir.so'] size == 0",
    }
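The net effect of those changes, once applied, is a single extra line at the end of /etc/pam.d/common-session (whitespace may vary):

```
session required pam_mkhomedir.so umask=0022
```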

And now on to some package definitions. Ensembl requires a specific version of Bioperl (version 1.2.3), so we need to ensure that the Bioperl packages from the Ubuntu repositories are not installed. And then we provide a few text editors, the CVS version control system, and the MySQL server.


    # pvh - 03/09/2013 - can't use bioperl from ubuntu repo. must be v 1.2.3
    package {['bioperl','bioperl-run']:
        ensure => "absent",
    }

    package {['emacs23-nox', 'joe', 'jupp']:
        ensure => "present",
    }

    package {'cvs':
        ensure => "present",
    }

    package { 'mysql-server':
        ensure => "present",
    }

Now we get to use our download_and_unpack resource type to download and unpack the modules, as specified by the Ensembl API installation instructions. Then we define a /etc/profile.d/ensembl.sh file so that the Ensembl code gets added to users’ PERL5LIB environment variables:


    download_and_unpack { 'bioperl':
        url => 'http://bioperl.org/DIST/old_releases/bioperl-1.2.3.tar.gz',
        creates => 'bioperl-1.2.3/t/trim.t',
    }

    download_and_unpack { 'ensembl':
        url => 'http://www.ensembl.org/cvsdownloads/ensembl-72.tar.gz',
        creates => 'ensembl/sql/table.sql',
    }

    download_and_unpack { 'ensembl-compara':
        url => 'http://www.ensembl.org/cvsdownloads/ensembl-compara-72.tar.gz',
        creates => 'ensembl-compara/sql/tree-stats.sql',
    }

    download_and_unpack { 'ensembl-variation':
        url => 'http://www.ensembl.org/cvsdownloads/ensembl-variation-72.tar.gz',
        creates => 'ensembl-variation/sql/var_web_config.sql',
    }

    download_and_unpack { 'ensembl-functgenomics':
        url => 'http://www.ensembl.org/cvsdownloads/ensembl-functgenomics-72.tar.gz',
        creates => 'ensembl-functgenomics/sql/trimmed_funcgen_schema.xls',
    }

    file { '/etc/profile.d/ensembl.sh':
        content => '#!/bin/sh
PERL5LIB=/opt/bioperl-1.2.3
PERL5LIB=${PERL5LIB}:/opt/ensembl/modules
PERL5LIB=${PERL5LIB}:/opt/ensembl-compara/modules
PERL5LIB=${PERL5LIB}:/opt/ensembl-variation/modules
PERL5LIB=${PERL5LIB}:/opt/ensembl-functgenomics/modules
export PERL5LIB
',
        owner => root,
        mode => 0644,
    }        

While much of the Ensembl API is pure Perl, Bert wanted the calc_genotypes tool compiled for use during the course, so we need a few more packages and an exec resource to do the compilation (with the associated creates statement to stop it being re-run on each Puppet run):


    # for compiling calc_genotypes
    package { ['libipc-run-perl', 'build-essential']:
       ensure => present,
    }

    exec { 'build_calc_genotypes':
       creates => '/opt/ensembl-variation/C_code/calc_genotypes',
       require => [Download_and_unpack['ensembl-variation'],
                   Package['build-essential'],
                   Package['libipc-run-perl']],
       command => 'make calc_genotypes',
       cwd => '/opt/ensembl-variation/C_code',
       user => 'root',
       path => '/bin:/usr/bin',
    }

 

And finally some ugly hackery. I need a list of users to create, but Puppet doesn’t have an easy way to generate one. So I wrote a little Python script that generates a list of usernames, separated by @. When I use this with generate() I need to get rid of the spurious newline, which I do using an inline template, and finally I turn it into a list using split(). Yes I know, really ugly. It’s this kind of stuff that is making us here at SANBI consider switching to SaltStack (also because we love Python here).

Anyway, once we’ve got a list we can just pass it on to declare a collection of enscourse_createuser resources. The resource naming is a bit off, since “createuser” implies something imperative; I should have just called it enscourse_user or something. And finally, closing off the curly brace, our node definition is complete!


     $tmp = generate('/usr/local/bin/gen_user_list.py', 'user', 25)
     $user_string = inline_template('<%= @tmp.chomp %>')
     notice("user string :${user_string}:")
     $user_list = split($user_string, '@')

     group { 'enscourse':
       ensure => present
     } 

     enscourse_createuser { $user_list: }
}

Here is that little Python script by the way:


#!/usr/bin/python

import sys

base = sys.argv[1]
limit = int(sys.argv[2])
num_list = [base + str(x) for x in range(1,limit+1)]
print "@".join(num_list),
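That script is Python 2. For reference, a Python 3 equivalent would look something like this (gen_user_list is my own name for the helper, not anything the manifest uses):

```python
import sys

def gen_user_list(base, limit):
    # user1@user2@...@userN: the @ separator matches the split() above
    return "@".join(base + str(x) for x in range(1, limit + 1))

if __name__ == "__main__":
    # end="" suppresses the newline, like the trailing comma in Python 2
    print(gen_user_list(sys.argv[1], int(sys.argv[2])), end="")
```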

Remember that generate() is run on the Puppet server, so this script is installed there. Well, that’s it!


Gotchas with HTTP and HTTPS

I recently installed a WordPress Network to provide blogs (and simple websites) for SANBI (the bioinformatics SANBI, not the biodiversity SANBI), with authentication provided by nginx’s auth_pam module (and thus linked to our site-wide authentication, so we don’t need to maintain separate WordPress users and passwords). The login page for WordPress is protected with SSL, but I was serving the rest of the site using plain HTTP. This led to a strange bug: when I wanted to edit a post, the editor was a very narrow column.

Like this, with all
the text cramped
together.

What the heck? I tried changing WordPress settings, but as someone said out there on the net “Before you waste hours switching off / on plugins, check the javascript debugger in your browser!”. Turns out that Chrome was blocking content because the combination of HTTP and HTTPS meant I had created a “mixed content” site, and modern browsers (including Chrome) frown on such behaviour. Melissa Koenig has a little blog post on mixed content for the curious. After much googling, I found a blog post by Ken Chen detailing how to set this up right: in the SSL config (but not in the main HTTP config) you serve the main WordPress content using HTTPS, ensuring that “mixed content” is avoided.

So here’s the nginx config file:


upstream php {
    server unix:/var/run/php5-fpm.sock;
}

server {
    listen 443;
    ssl on;
    server_name .wp.sanbi.ac.za *.wp.sanbi.ac.za blog.sanbi.ac.za *.blog.sanbi.ac.za;

    ssl_certificate /etc/ssl/certs/sanbi.pem;
    ssl_certificate_key /etc/ssl/private/sanbi.key;

    root /usr/lib/wordpress;
    index index.php index.html index.htm;

    # Process only the requests to wp-login and wp-admin
    location ~ /wp-(admin|login|includes|content) {
        auth_pam "SANBI authentication";
        auth_pam_service_name "nginx";
        try_files $uri $uri/ \1/index.php?args;

        location ~ \.php$ {
            try_files $uri =404;
            include fastcgi_params;
            fastcgi_param REMOTE_USER $remote_user;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass php;
            fastcgi_intercept_errors on;
        }
    }

    # Redirect everything else to port 80
    location / {
        return 301 http://$host$request_uri;
    }
}

server {
    #listen 80; ## listen for ipv4; this line is default and implied
    #listen [::]:80 default ipv6only=on; ## listen for ipv6

    server_name wp.sanbi.ac.za *.wp.sanbi.ac.za blog.sanbi.ac.za *.blog.sanbi.ac.za;
    root /usr/lib/wordpress;

    index index.php;

    location ~ /wp-(?:admin|login) {
        return 301 https://$host$request_uri;
    }

    location = /favicon.ico {
        log_not_found off;
        access_log off;
    }

    location = /robots.txt {
        allow all;
        log_not_found off;
        access_log off;
    }

    location / {
        # This is cool because no php is touched for static content.
        # include the "?$args" part so non-default permalinks doesn't break when using query string
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        #NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
        location ~ /wp-(admin|login) {
            return 301 https://$host$request_uri;
        }
        try_files $uri =404;
        include fastcgi_params;
        fastcgi_intercept_errors on;
        fastcgi_pass php;
    }

    location ~* ^.+\.(ogg|ogv|svg|svgz|eot|otf|woff|mp4|ttf|rss|atom|jpg|jpeg|gif|png|ico|zip|tgz|gz|rar|bz2|doc|xls|exe|ppt|tar|mid|midi|wav|bmp|rtf)$ {
        access_log off; log_not_found off; expires max;
    }

    location ~ /\. { deny all; access_log off; log_not_found off; }
}

 

Note that this uses a Unix socket for php5-fpm. I’ve found that I don’t need to use WordPress FORCE_SSL_ADMIN (or the WordPress SSL plugin), and since I had some issues with the SSL plugin (it created an invalid default SSL link for newly created network sites), I left out that bit of the config. Oh, by the way, I use the HTTP authentication plugin to provide an authentication dialog to users. And you can read this guide on the basics of a WordPress network setup on nginx. A quick note: when you add a new site to the network, make sure to go into the dashboard of the new site and enable the “automatically create new users” setting in the HTTP authentication settings, at least until the new site’s owner has created an account for themselves (by logging in).

Finally, the nginx auth_pam module is compiled into the nginx I installed on Ubuntu (from the nginx-full package). I created a file /etc/pam.d/nginx that directs authentication to both our Kerberos setup and the Unix accounts on the local server (so that I could create ad-hoc users, e.g. for the default WordPress admin user):


# try Kerberos first; on success, skip the next two modules (straight to pam_permit)
auth [success=2 default=ignore] pam_krb5.so minimum_uid=1000 ignore_k5login
# otherwise fall back to local Unix accounts, reusing the password already typed
auth [success=1 default=ignore] pam_unix.so use_first_pass
# neither succeeded: deny
auth requisite pam_deny.so
auth required pam_permit.so