
Nextflow nf-datatrail Plugin – Understand Your Pipeline's Dataflow

This plugin helps you analyze and visualize the data flow within your Nextflow pipeline.

Features

  • Visual Representation: Generates a DAG to illustrate the data flow between processes.
  • Detailed Analysis: Captures inputs and outputs of each process for in-depth examination.
  • Pipeline Overview: Summarizes the data volume processed by each process, along with process dependencies.

Installation

No installation required. Simply add the plugin to your Nextflow config:

plugins {
    id 'nf-datatrail'
}
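
To pin a specific plugin version, append it to the plugin id using Nextflow's id 'name@version' syntax; for example, the 0.0.1 release listed below:

plugins {
    id 'nf-datatrail@0.0.1'
}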

Usage

1. Visualizing the Dataflow

Specify an output file in your Nextflow config:

datatrail.plot.file = "physicalDag.dot"

This generates a physicalDag.dot file in your working directory.
Supported formats: dot and all output formats supported by Graphviz. Graphviz must be installed to produce formats other than dot.
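
For example, to render the DAG directly as an SVG, a minimal sketch, assuming the output format is inferred from the file extension and Graphviz is installed:

datatrail.plot.file = "physicalDag.svg"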

Example visualization created with Graphviz, details activated:

[Figure: dataflow DAG rendered with details activated]

Example visualization created with Graphviz, details deactivated (recommended for large DAGs), no legend, and MULTIQC filtered out:

[Figure: dataflow DAG rendered without details]

Additional details are shown on mouse hover when the image is viewed at full size.

The following options control the DAG's rendering. They are set in the datatrail.plot block of your Nextflow config.

| Option | Default Value | Description |
|---|---|---|
| file | (none) | Path where the DAG will be stored. |
| rankdir | TB | Direction of the graph layout: TB (top to bottom) or LR (left to right). |
| detailed | false | If true, shows detailed information about each process in the DAG; otherwise, details are shown only on mouse hover. |
| external | true | If true, shows external inputs in the DAG. |
| legend | true | If true, shows a legend in the DAG. |
| cluster | false | If true, clusters processes by their tag. |
| tagNames | true | If true, shows the tag name in each cluster. Only takes effect if cluster is set to true. |
| filter | [] | List of regular expressions. If a regex matches a process name, task instances of that process are not displayed. |
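
For instance, a configuration matching the second example visualization above (details deactivated, no legend, MULTIQC filtered out) could look like the sketch below; the nested-block form is equivalent to the dotted datatrail.plot.* notation, and the regex is illustrative:

datatrail {
    plot {
        file     = "physicalDag.svg"
        detailed = false
        legend   = false
        filter   = [".*MULTIQC.*"]
    }
}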

Additionally, you can write the physical DAG to a JSON file with the following option:

datatrail.persist = "dag.json"

This file can be used for further analyses.

2. Analyzing Process Inputs & Outputs

Create CSV files to track input and output files:

datatrail {
    input = "input.csv"
    output = "output.csv"
}

Generated files:

  • input.csv
  • output.csv

Columns:

| Column | Description |
|---|---|
| name | Process instance name |
| hash | Process hash (matches the trace file) |
| path | Input/output file path |
| type | f (file) or d (directory) |
| size | Size of the file/directory |

3. Analyzing Task Dependencies

Generate a summary file with process dependencies and data volume:

datatrail {
    summary = "summary.csv"
}

Summary Columns:

| Column | Description |
|---|---|
| task | Process instance name |
| hash | Process hash (matches the trace file) |
| inputs | Number of inputs |
| inputSize | Total input size |
| outputs | Number of outputs |
| outputSize | Total output size |
| usedBy | Number of dependent processes |

4. Additional Options

  • overwrite: If true, overwrites existing output files (default: false).
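
Putting it all together, a single datatrail block can enable every output described above; this is a sketch using only the options documented in this section, with illustrative file names:

datatrail {
    overwrite = true
    plot.file = "physicalDag.dot"
    persist   = "dag.json"
    input     = "input.csv"
    output    = "output.csv"
    summary   = "summary.csv"
}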

Releases

| Release | Date | Downloads | Author |
|---|---|---|---|
| 0.0.1 | 2025-04-11 | 22 | Lehmann-Fabian |