As part of my (rather prolonged) work towards an M.Sc. in bioinformatics, I maintain a Galaxy server at SANBI. I've recently upgraded it to Galaxy 21.05, which at the time of writing is the latest Galaxy release. You can read more about that release here.
My Galaxy server is deployed using Ansible, with a combination of the standard Galaxy roles and roles developed at SANBI to match our infrastructure. Specifically, we have roles for integrating with our infrastructure's authentication, monitoring and CephFS filesystem. I also wrote a workaround for deploying Let's Encrypt-based SSL. You can find this configuration in this repository.
The Galaxy server integrates with our cluster, whose worker nodes run Ubuntu 18.04 (the Galaxy server itself is on Ubuntu 20.04). For a number of tasks, such as "finishing" jobs (i.e. feeding results back into Galaxy), Galaxy requires jobs to have access to Python libraries that are not part of the Python standard library. In the past I have found that pointing cluster jobs at the single virtualenv that the Galaxy roles configure on the Galaxy server causes problems, presumably because the server and the worker nodes run different Ubuntu (and therefore different system Python) releases. I therefore keep a separate venv for cluster jobs, and that venv is built on the cluster itself. So after the Galaxy server install was completed, I logged into one of the cluster worker nodes as root, deleted the old cluster_venv and ran:
    cd /projects/galaxy/pvh_masters_galaxy1
    export GALAXY_VIRTUAL_ENV=$(pwd)/cluster_venv
    cd server
    scripts/common_startup.sh --skip-client-build --skip-samples
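As a minimal, untested sketch, the commands above could be wrapped in a shell function so that the rebuild is at least repeatable. The function name rebuild_cluster_venv is hypothetical; the default path, the cluster_venv location and the common_startup.sh flags are the ones used in the manual commands, and nothing else about Galaxy is assumed.

```shell
# Minimal sketch: rebuild Galaxy's cluster_venv on a cluster worker node.
# rebuild_cluster_venv is a hypothetical helper name; the default path and
# the common_startup.sh flags match the manual commands in this post.
rebuild_cluster_venv() {
    local galaxy_root="${1:-/projects/galaxy/pvh_masters_galaxy1}"
    local venv="$galaxy_root/cluster_venv"

    # Bail out early, before deleting anything, if the Galaxy checkout
    # is not where we expect it.
    if [ ! -x "$galaxy_root/server/scripts/common_startup.sh" ]; then
        echo "common_startup.sh not found under $galaxy_root/server" >&2
        return 1
    fi

    # Delete the old venv so it is rebuilt against this node's Python.
    rm -rf -- "$venv" || return 1

    # Subshell: the cd and export do not leak into the calling shell.
    # Its exit status (that of common_startup.sh) becomes the return value.
    (
        export GALAXY_VIRTUAL_ENV="$venv"
        cd "$galaxy_root/server" || exit 1
        scripts/common_startup.sh --skip-client-build --skip-samples
    )
}
```

The important design point is where it runs: like the manual procedure, it has to be executed on a worker node rather than on the Galaxy server, so that the venv is built with the worker nodes' Python, e.g. by sourcing the file there and calling rebuild_cluster_venv /projects/galaxy/pvh_masters_galaxy1.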
Obviously it would be better to automate this step, but I haven't got around to doing so yet. I'm not sure this is the best approach, but it works, at least in our environment, so I'm writing this blog post in case it is useful to others (or to jog my own memory down the line!). The cluster_venv is exposed to the job runners in job_conf.xml; here is a snippet of my configuration:
    <job_conf>
        <plugins workers="4">
            <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
            <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
        </plugins>
        <destinations default="dynamic">
            <destination id="slurm" runner="slurm">
                <param id="tmp_dir">True</param>
                <env id="GALAXY_VIRTUAL_ENV">/projects/galaxy/pvh_masters_galaxy1/cluster_venv</env>
                <env id="GALAXY_CONFIG_FILE">/projects/galaxy/pvh_masters_galaxy1/config/galaxy.yml</env>
            </destination>
            <destination id="local" runner="local"/>
            <destination id="dynamic" runner="dynamic">
                <param id="tmp_dir">True</param>
                <param id="type">dtd</param>
            </destination>
            <destination id="cluster_default" runner="slurm">
                <param id="tmp_dir">True</param>
                <env id="SLURM_CONF">/tools/admin/slurm/etc/slurm.conf</env>
                <env id="GALAXY_VIRTUAL_ENV">/projects/galaxy/pvh_masters_galaxy1/cluster_venv</env>
                <env id="GALAXY_CONFIG_FILE">/projects/galaxy/pvh_masters_galaxy1/config/galaxy.yml</env>
                <param id="nativeSpecification">--mem=10000</param>
                <resubmit condition="memory_limit_reached" destination="cluster_20G" />
            </destination>
            <!-- further destinations, including cluster_20G, omitted from this snippet -->
        </destinations>
    </job_conf>
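The Slurm destinations in this configuration point jobs at cluster_venv purely through the GALAXY_VIRTUAL_ENV and GALAXY_CONFIG_FILE environment variables, so a quick check on a worker node can catch a missing or broken venv before real jobs fail. Below is a minimal sketch; check_galaxy_job_env is a hypothetical name, and the only things it assumes are those two variable names and the usual bin/python layout of a virtualenv.

```shell
# Hypothetical helper: check that the environment a Slurm destination in
# job_conf.xml passes to jobs points at usable files on this node.
# Returns 0 if everything looks usable, 1 otherwise (details on stderr).
check_galaxy_job_env() {
    local ok=0

    # The venv must contain a Python binary that actually runs on this node;
    # a venv built against another OS release may have a missing or broken
    # interpreter, which either the -x test or the trial run will catch.
    if [ -z "${GALAXY_VIRTUAL_ENV:-}" ] || [ ! -x "$GALAXY_VIRTUAL_ENV/bin/python" ]; then
        echo "GALAXY_VIRTUAL_ENV unset or missing bin/python: ${GALAXY_VIRTUAL_ENV:-<unset>}" >&2
        ok=1
    elif ! "$GALAXY_VIRTUAL_ENV/bin/python" -c 'import sys' >/dev/null 2>&1; then
        echo "venv Python does not run on this node: $GALAXY_VIRTUAL_ENV" >&2
        ok=1
    fi

    # Jobs are pointed at the server's galaxy.yml, so it must be readable here.
    if [ -z "${GALAXY_CONFIG_FILE:-}" ] || [ ! -r "$GALAXY_CONFIG_FILE" ]; then
        echo "GALAXY_CONFIG_FILE unset or unreadable: ${GALAXY_CONFIG_FILE:-<unset>}" >&2
        ok=1
    fi

    return "$ok"
}
```

To use it, export the two values from the cluster_default destination on a worker node and run check_galaxy_job_env; running the venv's Python, rather than only checking that the file exists, is what would catch an interpreter that cannot run on that node.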
P.S. This was the only manual task I had to perform (on the Galaxy side of things). Mostly the update consisted of updating our SANBI Ansible roles to support Ubuntu 20.04 (and Ceph Octopus), switching to the latest roles (as described in the training material for Galaxy admins), flicking the Galaxy version number to release_21.05 and running the Ansible playbook.