When dealing with clinical samples, such as those from SARS-CoV-2 (COVID-19) testing, one invariably ends up with a mix of organisms in the sample being sequenced. The same can be true of cultured organisms, where we don't know whether any contaminants have crept into the sample. It is especially likely if the sample has been sent from elsewhere and we don't know what protocols were followed in preparing it for sequencing. This little guide will show you how to use Galaxy (specifically the https://usegalaxy.eu server) to make sense of which species you have sequenced.
I will start with two samples in a Galaxy history. They are single-ended samples from Oxford Nanopore sequencing. I presume you have uploaded these and ensured that they have the fastqsanger.gz datatype.
The first step is to organise these into a collection. Since these are single-ended samples I can use a list collection. For paired-end samples you'd want a list of pairs. Tick the checkbox that enables selecting multiple datasets in a history, select all the samples that you want to group together, and then choose Build Dataset List or Build List of Dataset Pairs as appropriate.
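To make the difference between the two collection types concrete, here is a minimal Python sketch of how a list and a list of pairs are structured. The field names follow the payload shape that, as far as I know, the Galaxy API (and BioBlend) use when creating collections, but treat them as an assumption. The function names, the collection names, the paired sample name and the dataset IDs are all hypothetical placeholders.

```python
def build_list(name, datasets):
    """Describe a flat list collection.

    datasets maps an element name to a history dataset ID.
    """
    return {
        "name": name,
        "collection_type": "list",
        "element_identifiers": [
            {"name": element, "src": "hda", "id": dataset_id}
            for element, dataset_id in datasets.items()
        ],
    }


def build_list_of_pairs(name, pairs):
    """Describe a list of pairs: one nested 'paired' collection per sample.

    pairs maps a sample name to a (forward_id, reverse_id) tuple.
    """
    return {
        "name": name,
        "collection_type": "list:paired",
        "element_identifiers": [
            {
                "name": sample,
                "src": "new_collection",
                "collection_type": "paired",
                "element_identifiers": [
                    {"name": "forward", "src": "hda", "id": forward_id},
                    {"name": "reverse", "src": "hda", "id": reverse_id},
                ],
            }
            for sample, (forward_id, reverse_id) in pairs.items()
        ],
    }


# Hypothetical dataset IDs; real ones come from your own Galaxy history.
single = build_list(
    "nanopore samples",
    {"sample001.fastq.gz": "id-1", "sample002.fastq.gz": "id-2"},
)
paired = build_list_of_pairs("paired samples", {"sampleA": ("id-3", "id-4")})

print(len(single["element_identifiers"]))  # 2
```

A list has one element per sample, while a list of pairs has one element per sample that itself holds a forward and a reverse dataset. That nesting is why the two are built with different buttons in the history panel.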
Then use kraken2 to assign taxonomic labels to each read. There are two key options to set here. First, the input: select the folder icon so that you can use the collection as input. Second, the database: it should be a copy of the Standard database, as recent as possible.
Galaxy will run a copy of kraken2 for each dataset. This will take some time (several minutes) but each sample is processed in parallel (up to the capacity of the Galaxy server). While kraken2 is running you can set up the next analyses:
- Convert Kraken data to Galaxy taxonomy representation using the most recent taxonomy database available. Select column 2 for the read name and column 3 for the taxonomy ID.
- Krona pie chart from taxonomic profile - here you can choose what resolution you want to display: from Class for a high level summary to e.g. Species for a very detailed view. Note that kraken2 resolution might not always be accurate down to the lowest taxonomy levels.
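If you are curious what the conversion step is reading, kraken2's per-read output is a tab-separated file with the classification status in column 1, the read name in column 2 and the taxonomy ID in column 3. Below is a minimal Python sketch that tallies reads per taxonomy ID. It assumes the default output format (numeric taxonomy IDs, no names). The function name and the example rows are made up for illustration and are not real results.

```python
import io
from collections import Counter


def count_taxids(kraken_lines):
    """Count reads per taxonomy ID from kraken2 per-read output lines."""
    counts = Counter()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        taxid = fields[2]  # column 3: taxonomy ID (0 means unclassified)
        counts[taxid] += 1
    return counts


# Made-up example rows: status, read name, taxonomy ID, length, LCA mapping.
example = io.StringIO(
    "C\tread_001\t2697049\t1480\t2697049:1446\n"
    "C\tread_002\t2697049\t2210\t2697049:2176\n"
    "C\tread_003\t9606\t950\t9606:916\n"
    "U\tread_004\t0\t1200\t0:1166\n"
)

counts = count_taxids(example)
print(counts.most_common(1))  # [('2697049', 2)]
```

In this toy input, taxonomy ID 2697049 (SARS-CoV-2) accounts for two reads, 9606 (human) for one, and one read is unclassified. The Galaxy tools do this bookkeeping for you, so the sketch is only meant to show what columns 2 and 3 contain.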
Depending on the load on the Galaxy server, you might wait some time for your jobs to start running. Once they do, however, each of the analyses mentioned above will be run in sequence.
All analyses before the Krona pie chart produce a list of outputs, one output for each input sample. The Krona pie chart analysis produces a single output with a selector that shows the pie chart for each sample. In this screenshot the first dataset is shown. This is sample002.fastq.gz from the original sample list.
The Krona pie chart can be explored to focus on the different segments, and a snapshot can be taken, giving the option to download the visualisation as an SVG graphic.
Turning your analysis into a workflow
Finally, in the future one might want to repeat this analysis as a single workflow rather than running the steps one by one. To create a workflow, select Extract workflow from the History menu (the drop down button on the top right).
You can then give a descriptive name to the workflow and save it for later use. Select Create workflow to create a re-usable workflow.
Then, from the next page, edit the workflow to see it in detail. In the workflow editor you can describe the inputs needed and select which steps' outputs you want to see. In my example, I have unselected the blue checkboxes for the Kraken2 and Convert Kraken tools, as those are intermediate results that I am not generally interested in. Once you save the resulting workflow, you can re-use it from the Workflow menu.
Acknowledgements and references
The idea for this post was partly inspired by this thread on the Galaxy help forum. And of course I am using these awesome tools: