Loading Nanopore data into Galaxy

Basecalled nanopore data is output in a structure like this:

fastq_pass/
.....barcode01/
........XXXX.fastq
........YYYY.fastq
.....barcode02/
........AAAA.fastq
........BBBB.fastq

In other words, each barcode has its own directory containing multiple FASTQ files. Concatenating these files is straightforward on the command line, but it is less easy for users who do not work on the command line. To get the data into Galaxy for analysis, you can follow these steps:

  1. Upload the whole fastq_pass directory to the Galaxy server using FTP upload
  2. In the Rule Based Uploader, choose to create Collections, set the source to the FTP Directory, and use the rules below. These rules expect a single directory in your FTP area that contains the barcode directories. If that is not the case, change the regular expression .*/barcode\\d+/(.*) to match your upload directory; for example, if your upload directory is called fastq_pass, make it fastq_pass/barcode\\d+/(.*). The rules also assume that your data is gzipped FASTQ. If it is not gzipped, change fastq.gz$ to fastq$ and change fastqsanger.gz to fastqsanger. You can also find these rules here:
    {
    "rules": [
    {
      "type": "add_column_metadata",
      "value": "path"
    },
    {
      "type": "add_filter_regex",
      "target_column": 0,
      "expression": "fastq.gz$",
      "invert": false
    },
    {
      "type": "add_column_regex",
      "target_column": 0,
      "expression": ".*/barcode\\d+/(.*)",
      "group_count": 1
    },
    {
      "type": "add_column_regex",
      "target_column": 0,
      "expression": ".*/(barcode\\d+)/.*",
      "group_count": 1
    },
    {
      "type": "swap_columns",
      "target_column_0": 1,
      "target_column_1": 2
    }
    ],
    "mapping": [
    {
      "type": "ftp_path",
      "columns": [
        2
      ]
    },
    {
      "type": "list_identifiers",
      "columns": [
        1,
        2
      ],
      "editing": false
    }
    ],
    "extension": "fastqsanger.gz"
    }
    
  3. Give the collection a name (e.g. samples1) and click Upload. A new list of lists called samples1 will be created.
  4. Create a tabular mapping file in which the first column is the barcode name and the second column is the sample name that you want to use. You can either create this as a TSV file (e.g. using Excel) or type it into the "Paste" box in the Galaxy uploader; in that case, make sure to set the type to tabular and enable the "convert spaces to tabs" option in the settings.
  5. Run this workflow with the input collection and the tabular renaming file as inputs. The result is a list in which each element contains the concatenated reads from one barcode directory.
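
Before uploading, it can help to check what the regular expressions in step 2 would extract from your own paths. The following is a minimal Python sketch of that check, not Galaxy's implementation: the function name preview_rules and the example paths are made up for illustration, the barcode pattern is written with a leading .*/ to mirror the file name pattern, and Python's re module is assumed to behave like Galaxy's regular expression engine for patterns this simple.

```python
import re

# Patterns modeled on the rules in step 2. In JSON the backslash is
# escaped (\\d); in a Python raw string it is written as \d.
FILTER_PATTERN = r"fastq.gz$"              # keep only gzipped FASTQ files
FILENAME_PATTERN = r".*/barcode\d+/(.*)"   # capture the file name
BARCODE_PATTERN = r".*/(barcode\d+)/.*"    # capture the barcode directory


def preview_rules(paths):
    """Return (barcode, file_name, path) for every path the filter keeps.

    Hypothetical helper for illustration only. Paths that pass the filter
    but do not match both patterns are skipped here; Galaxy's handling of
    such rows may differ.
    """
    rows = []
    for path in paths:
        if not re.search(FILTER_PATTERN, path):
            continue
        name_match = re.search(FILENAME_PATTERN, path)
        barcode_match = re.search(BARCODE_PATTERN, path)
        if name_match is None or barcode_match is None:
            continue
        rows.append((barcode_match.group(1), name_match.group(1), path))
    return rows


if __name__ == "__main__":
    example_paths = [
        "fastq_pass/barcode01/XXXX.fastq.gz",
        "fastq_pass/barcode02/AAAA.fastq.gz",
        "fastq_pass/barcode01/notes.txt",  # dropped by the filter
    ]
    for barcode, file_name, path in preview_rules(example_paths):
        print(barcode, file_name, path, sep="\t")
```

If a path you expect to keep is missing from the output, adjust the patterns as described in step 2 and run the check again.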

You can watch a video demo of this method here.
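
For readers who do work on the command line, the per-barcode concatenation mentioned at the start can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name concatenate_barcodes, the output directory, and the default .fastq suffix are hypothetical, and the output directory is assumed to sit outside fastq_pass. Files are copied byte for byte, which also works for gzipped FASTQ because concatenated gzip files form a valid gzip stream.

```python
import shutil
from pathlib import Path


def concatenate_barcodes(fastq_pass, out_dir, suffix=".fastq"):
    """Write one merged file per barcode directory and return the new paths.

    Hypothetical helper for illustration. Assumes out_dir is not inside
    fastq_pass and that every input file ends with a newline.
    """
    fastq_pass = Path(fastq_pass)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    outputs = []
    for barcode_dir in sorted(p for p in fastq_pass.iterdir() if p.is_dir()):
        parts = sorted(barcode_dir.glob(f"*{suffix}"))
        if not parts:
            continue  # nothing to merge for this directory
        merged = out_dir / f"{barcode_dir.name}{suffix}"
        with merged.open("wb") as out_handle:
            for part in parts:
                # Append each file's bytes unchanged to the merged file.
                with part.open("rb") as in_handle:
                    shutil.copyfileobj(in_handle, out_handle)
        outputs.append(merged)
    return outputs


if __name__ == "__main__":
    for merged_path in concatenate_barcodes("fastq_pass", "merged"):
        print(merged_path)
```

For gzipped data, pass suffix=".fastq.gz". The merged files are named after the barcode directories, so they would still need to be renamed to sample names.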