Workflow managers have become essential infrastructure in bioinformatics. If you’ve outgrown shell scripts and are finding Snakemake limiting for complex multi-step pipelines — especially those that need to run across different compute environments — Nextflow DSL2 is worth a serious look.
Why DSL2?
Nextflow’s original syntax (DSL1) worked, but it made reuse difficult. DSL2 introduces a modular system where:
- Processes are atomic, container-isolated execution units
- Workflows compose processes into directed acyclic graphs (DAGs)
- Modules are shareable, versionable process definitions
The result is a pipeline architecture that separates what a tool does from how the pipeline orchestrates it — a clean separation of concerns.
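As a sketch of what this separation looks like in practice (the module path and process name below are illustrative, echoing the `custom_filter` module that appears in the project layout):

```nextflow
// main.nf — the workflow layer only orchestrates; the tool logic
// lives in the module it includes
include { CUSTOM_FILTER } from './modules/local/custom_filter/main'

workflow {
    samples = Channel.fromPath( 'data/*.vcf.gz' )  // hypothetical inputs
    CUSTOM_FILTER ( samples )                      // the process becomes a DAG node
}
```

Swapping the filtering tool later means editing the module, not the workflow.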
Pipeline Architecture
A well-structured DSL2 project looks like this:
```text
my-pipeline/
├── main.nf                  # Entry workflow
├── nextflow.config          # Executor, container, and resource config
├── modules/
│   ├── local/               # Pipeline-specific modules
│   │   └── custom_filter/
│   │       └── main.nf
│   └── nf-core/             # Community modules (via nf-core/modules)
│       ├── fastqc/main.nf
│       ├── trimgalore/main.nf
│       └── star/align/main.nf
├── subworkflows/
│   └── local/
│       └── align_and_qc/
│           └── main.nf
└── conf/
    ├── base.config          # Default resource profiles
    ├── hpc.config           # HPC-specific settings
    └── cloud.config         # AWS/GCP settings
```
Data flows through the pipeline as channels — lazy, asynchronous streams that Nextflow schedules automatically. Here’s a simplified ASCII diagram of a typical RNA-seq pipeline DAG:
```text
 FASTQ files
      │
      ▼
  [FASTQC] ───────────────────────┐
      │                           │
      ▼                           ▼
[TRIM_GALORE]              [MULTIQC report]
      │
      ▼
[STAR_ALIGN]
      │
      ├───────────────────────────┐
      ▼                           ▼
[FEATURECOUNTS]            [SAMTOOLS_SORT]
      │                           │
      ▼                           ▼
  [DESEQ2]                 [BAMCOVERAGE (bigWig)]
```
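The channel feeding this DAG might be built like so (a sketch — the glob pattern and meta fields are assumptions):

```nextflow
// Pair up FASTQ files and attach an nf-core-style meta map
ch_reads = Channel
    .fromFilePairs( 'data/*_{1,2}.fastq.gz' )
    .map { sample_id, fastqs ->
        [ [ id: sample_id, single_end: false ], fastqs ]
    }
```

Because channels are lazy and asynchronous, Nextflow launches each downstream task as soon as its inputs arrive, with no explicit scheduling code.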
Defining a Process
Processes are the atomic units of work. Each process runs in its own container and has explicit input/output declarations:
```nextflow
process STAR_ALIGN {
    tag "$meta.id"
    label 'process_high'
    container 'quay.io/biocontainers/star:2.7.10b--h9ee0642_0'

    input:
    tuple val(meta), path(reads)
    path index
    path gtf

    output:
    tuple val(meta), path("*Aligned.sortedByCoord.out.bam"), emit: bam
    tuple val(meta), path("*Log.final.out"),                 emit: log_final
    tuple val(meta), path("*SJ.out.tab"),                    emit: sj

    script:
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    STAR \\
        --runThreadN $task.cpus \\
        --genomeDir $index \\
        --sjdbGTFfile $gtf \\
        --readFilesIn $reads \\
        --readFilesCommand zcat \\
        --outSAMtype BAM SortedByCoordinate \\
        --outFileNamePrefix ${prefix}. \\
        --outSAMattributes NH HI AS NM MD
    """
}
```
A few things worth noticing:
- `tag` adds a per-task label to logs — invaluable when debugging
- `label` ties to resource profiles defined in `nextflow.config`
- `meta` is a map carrying sample metadata (id, strandedness, etc.) — a DSL2 convention popularised by nf-core
- `emit` gives named handles to outputs, making downstream composition explicit
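Because the script consults `task.ext.prefix`, output naming can be overridden from configuration without touching the module. A sketch of the nf-core-style override (the `.star` suffix is a hypothetical choice):

```nextflow
// conf/modules.config — override the default prefix for one process;
// the module falls back to meta.id when this is absent
process {
    withName: 'STAR_ALIGN' {
        ext.prefix = { "${meta.id}.star" }
    }
}
```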
Composing a Subworkflow
Subworkflows group related processes. Here’s an alignment subworkflow:
```nextflow
include { STAR_ALIGN     } from '../../modules/nf-core/star/align/main'
include { SAMTOOLS_SORT  } from '../../modules/nf-core/samtools/sort/main'
include { SAMTOOLS_INDEX } from '../../modules/nf-core/samtools/index/main'

workflow ALIGN_AND_QC {
    take:
    reads   // channel: [ val(meta), [ path(fastq) ] ]
    index   // path: STAR genome index
    gtf     // path: annotation GTF

    main:
    STAR_ALIGN ( reads, index, gtf )
    SAMTOOLS_SORT ( STAR_ALIGN.out.bam )
    SAMTOOLS_INDEX ( SAMTOOLS_SORT.out.bam )

    emit:
    bam       = SAMTOOLS_SORT.out.bam
    bai       = SAMTOOLS_INDEX.out.bai
    log_final = STAR_ALIGN.out.log_final
}
```
The take / main / emit blocks make the data contract of each subworkflow explicit — critical when you want to swap implementations later.
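Invoking the subworkflow from the entry workflow then looks like this (a sketch — the `params.*` names are assumptions, not fixed conventions):

```nextflow
// main.nf
include { ALIGN_AND_QC } from './subworkflows/local/align_and_qc/main'

workflow {
    ch_reads = Channel
        .fromFilePairs( params.reads )
        .map { id, fqs -> [ [ id: id, single_end: false ], fqs ] }

    ALIGN_AND_QC ( ch_reads, file(params.star_index), file(params.gtf) )
    ALIGN_AND_QC.out.bam.view()
}
```

Anything consuming `ALIGN_AND_QC.out.bam` is insulated from how the BAMs were produced — swap STAR for another aligner inside the subworkflow and the caller is unchanged.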
Configuration and Portability
One of Nextflow’s biggest strengths is executor portability. The same pipeline can run on a laptop, SLURM cluster, or AWS Batch by changing a config profile:
```nextflow
// nextflow.config
profiles {
    local {
        process.executor = 'local'
        docker.enabled   = true
    }
    slurm {
        process.executor       = 'slurm'
        singularity.enabled    = true
        singularity.autoMounts = true
        process {
            withLabel: 'process_high' {
                cpus   = 16
                memory = '64.GB'
                time   = '12.h'
                queue  = 'long'
            }
        }
    }
    awsbatch {
        process.executor = 'awsbatch'
        process.queue    = 'nextflow-queue'
        aws.region       = 'eu-west-1'
        docker.enabled   = true
    }
}
```
This configuration-driven portability is what makes Nextflow a practical choice for shared research infrastructure.
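In practice, switching environments is a one-flag change (the S3 bucket below is a placeholder):

```bash
nextflow run main.nf -profile local
nextflow run main.nf -profile slurm
nextflow run main.nf -profile awsbatch -bucket-dir s3://my-bucket/work
```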
Consuming nf-core Modules
The nf-core/modules repository contains over 1,000 community-maintained, container-backed process definitions (usable via Docker, Singularity, or Conda). You can pull them directly:
```bash
# Install nf-core tools
pip install nf-core

# Add modules to your pipeline
nf-core modules install fastqc
nf-core modules install trimgalore
nf-core modules install star/align
```
These commands download each module into `modules/nf-core/` and pin its exact version in `modules.json` — making your pipeline auditable and reproducible.
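To review what is installed and pinned in the current pipeline:

```bash
nf-core modules list local
```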
Testing with nf-test
Untested pipelines accumulate hidden bugs. nf-test brings unit and integration testing to Nextflow:
```nextflow
// tests/modules/star_align.nf.test
nextflow_process {

    name "Test STAR_ALIGN"
    script "../../../modules/nf-core/star/align/main.nf"
    process "STAR_ALIGN"

    test("human - paired-end reads") {

        when {
            process {
                """
                input[0] = [
                    [ id:'test', single_end:false ],
                    [ file(params.test_data['homo_sapiens']['illumina']['test_paired_end_1_fastq_gz']),
                      file(params.test_data['homo_sapiens']['illumina']['test_paired_end_2_fastq_gz']) ]
                ]
                input[1] = file(params.test_data['homo_sapiens']['genome']['star_index'])
                input[2] = file(params.test_data['homo_sapiens']['genome']['gtf'])
                """
            }
        }

        then {
            assert process.success
            assert process.out.bam.size() == 1
            assert snapshot(process.out.log_final).match()
        }
    }
}
```
Run tests with:
```bash
nf-test test tests/modules/star_align.nf.test
```
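The first run records the snapshot; after an intentional change to the module's output, refresh it with the `--update-snapshot` flag:

```bash
nf-test test tests/modules/star_align.nf.test --update-snapshot
```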
Key Takeaways
Nextflow DSL2 gives you a principled way to build bioinformatics pipelines that are:
- Modular — swap tools without rewriting the orchestration layer
- Portable — one codebase, multiple compute environments
- Reproducible — containers + version-pinned modules
- Testable — nf-test enables process-level unit tests
For new projects I strongly recommend starting from the nf-core pipeline template — it gives you CI/CD, linting, docs, and test infrastructure out of the box.
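Scaffolding from the template is a single command (in recent nf-core/tools releases the subcommand is `nf-core pipelines create`; older releases use `nf-core create`):

```bash
nf-core pipelines create
```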