nf-synapse

A centralized repository of Nextflow workflows that interact with Synapse.

Purpose

This repository provides a collection of Nextflow workflows that interact with Synapse by leveraging the Synapse Python Client. The workflows are primarily intended to run on Seqera Platform, but they can also be executed with the Nextflow CLI on your local machine.

Structure

This repository is organized as follows:

1. Individual process definitions, or modules, are stored in the modules/ directory.
2. Modules are then combined into workflows, which are stored in the workflows/ directory. These workflows are intended to capture the entire process of an interaction with Synapse.
3. Workflows are then imported into the main.nf script, and can be run by specifying the params.entry parameter.

Usage

Only one workflow can be used per nf-synapse run. The configuration for a workflow run will need to include which workflow you intend to use (indicated by specifying params.entry), along with all of the parameters required for that specific workflow.

In the example below, we provide the params.entry parameter synstage to indicate that we want to run the SYNSTAGE workflow. We also provide params.input, which is required for SYNSTAGE.

nextflow run main.nf -profile docker --entry synstage --input path/to/input.csv

Meta-Usage

nf-synapse is designed to be used alongside a general-purpose Nextflow workflow: stage input files from Synapse or Seven Bridges to an S3 bucket, run a workflow of your choosing, and then index the output files from the S3 bucket back into Synapse.

flowchart LR;
   A[nf-synapse:SYNSTAGE]-->B[WORKFLOW];
   B-->C[nf-synapse:SYNINDEX];

See demo.py in Sage-Bionetworks-Workflows/py-orca for an example of accomplishing this goal with Python code.

Authentication

For Seqera Platform runs, you can configure your secrets using the Tower CLI or the Seqera Platform UI. If you are running the workflow locally, you can configure your secrets within the Nextflow CLI.

Synapse

All included workflows require a SYNAPSE_AUTH_TOKEN secret. You can generate a Synapse personal access token using this dashboard.

Profiles

Current profiles included in this repository are:

1. docker: Indicates that you want to run the workflow using Docker for running process containers (default behavior in Seqera Platform runs).
2. conda: Indicates that you want to use a conda environment for running process containers.
3. synstage: Indicates that you want to run the SYNSTAGE workflow (sets params.entry = 'synstage').
4. synindex: Indicates that you want to run the SYNINDEX workflow (sets params.entry = 'synindex').

Included Workflows

`SYNSTAGE`: Stage Synapse Files To AWS S3

Purpose

The purpose of this workflow is to automate the process of staging Synapse and SevenBridges files to a Seqera Platform-accessible location (e.g. an S3 bucket). In turn, these staged files can be used as input for a general-purpose (e.g. nf-core) workflow that doesn't contain platform-specific steps for staging data.

Overview

SYNSTAGE performs the following steps:

1. Extract all Synapse and SevenBridges URIs (e.g. syn://syn28521174 or sbg://63b717559fd1ad5d228550a0) from a given text file.
2. Download the corresponding files from both platforms in parallel.
3. Replace the URIs in the text file with their staged locations.
4. Output the updated text file so it can serve as input for another workflow.
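
The extract-and-replace steps above can be sketched in a few lines of Python. This is only an illustration, not the code the workflow runs: the regular expression is an assumption based on the two example URIs in this README, and the staged locations (including the file.fastq.gz name) are hypothetical placeholders for what the download step would report.

```python
import re

# Assumed URI shapes, inferred only from the examples in this README.
URI_PATTERN = re.compile(r"syn://syn\d+|sbg://[0-9a-fA-F]+")


def extract_uris(text: str) -> list[str]:
    """Return the unique URIs found in the text, in order of first appearance."""
    return list(dict.fromkeys(URI_PATTERN.findall(text)))


def replace_uris(text: str, staged: dict[str, str]) -> str:
    """Swap each URI for its staged location; URIs without a mapping are left as-is."""
    return URI_PATTERN.sub(lambda m: staged.get(m.group(0), m.group(0)), text)


samplesheet = (
    "sample,fastq_1,fastq_2,strandedness\n"
    "foobar,syn://syn28521174,syn://syn28521175,unstranded\n"
)

uris = extract_uris(samplesheet)

# Hypothetical staged locations; a real run would get these from the download step.
staged = {
    uri: f"s3://example-bucket/synstage/{uri.split('://')[1]}/file.fastq.gz"
    for uri in uris
}

updated = replace_uris(samplesheet, staged)
print(updated)
```

Because the text is treated as an opaque string, the same approach works for any file layout (a single column of URIs, a CSV, a TSV), which matches the "any format" behavior described for the input parameter.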

Workflow Diagram

flowchart LR
    A[Input File] --> B[Extract URIs]
    B --> C1[Synapse URIs]
    B --> C2[SevenBridges URIs]
    C1 --> D1[SYNAPSE_GET]
    C2 --> D2[SEVENBRIDGES_GET]
    D1 --> E[STAGE_FILE]
    D2 --> E
    E --> F[UPDATE_INPUT]
    F --> G[Updated Input File]

Quickstart: SYNSTAGE

The examples below demonstrate how you would stage Synapse files in an S3 bucket called example-bucket, but they can be adapted for other storage backends.

Prepare your input file containing Synapse and/or SevenBridges URIs and stage it to the S3 bucket that you want all files to be uploaded to. This example CSV file follows the format required for running the nf-core/rnaseq workflow.

Example: Uploaded to s3://example-bucket/input.csv
```
sample,fastq_1,fastq_2,strandedness
foobar,syn://syn28521174,syn://syn28521175,unstranded
```
Launch workflow using the Nextflow CLI, the Tower CLI, or the Seqera Platform UI.

Example: Launched using the Nextflow CLI
```
nextflow run main.nf -profile docker --entry synstage --input path/to/input.csv
```
Retrieve the output file, which by default is stored in a synstage/ subfolder within the parent directory of the input file. The Synapse and/or Seven Bridges URIs have been replaced with their staged locations. This file can now be used as the input for other workflows.

Example: Downloaded from s3://example-bucket/synstage/input.csv
```
sample,fastq_1,fastq_2,strandedness
foobar,s3://example-bucket/synstage/syn28521174/foobar.R1.fastq.gz,s3://example-bucket/synstage/syn28521175/foobar.R2.fastq.gz,unstranded
```

Special Considerations for Staging Seven Bridges Files

If you are staging Seven Bridges files, there are a few differences that you will want to incorporate in your Nextflow run.

1. You will need to configure SB_AUTH_TOKEN and SB_API_ENDPOINT secrets.
   - You can generate an authentication token and retrieve your API endpoint by logging in to the Seven Bridges portal you intend to stage files from, such as Seven Bridges CGC. From there, click on the "Developer" dropdown and then click "Authentication Token". A full list of Seven Bridges API endpoints is available in the Seven Bridges documentation.
2. When adding your URIs to your input file, Seven Bridges file URIs should have the prefix sbg://.
3. There are two ways to get the ID of a file in Seven Bridges:
   - The first way involves logging in to a Seven Bridges portal, such as Seven Bridges CGC, navigating to the file, and copying the ID from the URL. For example, your URL might look like this: "https://cgc.sbgenomics.com/u/user_name/project/63b717559fd1ad5d228550a0/". From this URL, you would copy the "63b717559fd1ad5d228550a0" piece and combine it with the sbg:// prefix to form the complete URI sbg://63b717559fd1ad5d228550a0.
   - The second way involves using the SBG CLI. To get the IDs that you need, run the sb files list command and specify the project that you are downloading files from. A list of all files in the project will be returned, and you will combine the ID with the sbg:// prefix for each file that you want to stage.

Note: SYNSTAGE can handle either or both types of URIs in a single input file.
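
If you are collecting IDs from portal URLs by hand, a small helper like the one below can reduce copy-and-paste mistakes. It is a sketch that assumes the file ID is always the last path segment of the URL, as in the example above; sbg_uri_from_url is a hypothetical helper name and is not part of nf-synapse.

```python
from urllib.parse import urlparse


def sbg_uri_from_url(url: str) -> str:
    """Build an sbg:// URI from a portal URL.

    Assumption: the file ID is the last non-empty path segment of the URL.
    """
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if not segments:
        raise ValueError(f"No path segments found in URL: {url}")
    return f"sbg://{segments[-1]}"


print(sbg_uri_from_url(
    "https://cgc.sbgenomics.com/u/user_name/project/63b717559fd1ad5d228550a0/"
))
```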

Parameters

Check out the Quickstart section for example parameter values.

entry: (Required) The name of the workflow to run (synstage). This should be the name of the workflow file in the workflows/ directory.
input: (Required) A text file containing Synapse URIs (e.g. syn://syn28521174) and/or Seven Bridges URIs (e.g. sbg://63b717559fd1ad5d228550a0). The text file can have any format (e.g. a single column of URIs, a CSV/TSV sample sheet for an nf-core workflow).
outdir: (Optional) An output location where the Synapse files will be staged. Currently, this location must be an S3 prefix for Nextflow Tower runs. If not provided, this will default to the parent directory of the input file.
save_strategy: (Optional) A string indicating where to stage the files within the outdir. Options include:
- id_folders: Files will be staged in child folders named after the Synapse or Seven Bridges ID of the file. This is the default behavior.
- flat: Files will be staged in top level of the outdir.
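
To make the two save_strategy options concrete, the sketch below computes where a file would land under each one. It is an illustration of the layout described above rather than the workflow's own code: staged_location is a hypothetical helper, and the outdir value used in the example is an assumption.

```python
def staged_location(outdir: str, file_id: str, filename: str,
                    save_strategy: str = "id_folders") -> str:
    """Return the staged path for one file under the given save strategy."""
    root = outdir.rstrip("/")
    if save_strategy == "id_folders":
        # One child folder per Synapse or Seven Bridges ID.
        return f"{root}/{file_id}/{filename}"
    if save_strategy == "flat":
        # Everything in the top level of the outdir.
        return f"{root}/{filename}"
    raise ValueError(f"Unknown save_strategy: {save_strategy!r}")


outdir = "s3://example-bucket/synstage"  # assumed value for illustration
print(staged_location(outdir, "syn28521174", "foobar.R1.fastq.gz"))
print(staged_location(outdir, "syn28521174", "foobar.R1.fastq.gz", "flat"))
```

In this sketch, two files that share a name map to the same location under flat, so id_folders is the safer choice when file names are not guaranteed to be unique.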

Known Limitations

The only way for the workflow to download Synapse files is by listing Synapse URIs in a file. You cannot provide a list of Synapse IDs or URIs to a parameter.
The workflow doesn't check if newer versions exist for the files associated with the Synapse URIs. If you need to force-download a newer version, you should manually delete the staged version.

`SYNINDEX`: Index S3 Objects Into Synapse

Purpose

The purpose of this workflow is to parallelize the process of indexing files in an S3 bucket into Synapse. SYNINDEX is intended to be used after a general-purpose (e.g. nf-core) workflow that doesn't contain platform-specific steps for uploading/indexing data.

Overview

SYNINDEX performs the following steps:

1. Gets the Synapse user ID for the account that provided the SYNAPSE_AUTH_TOKEN secret.
2. Updates or creates the owner.txt file in the S3 bucket to make the current user an owner.
3. Registers the S3 bucket as an external storage location for Synapse.
4. Generates a list of all of the objects in the S3 bucket to be indexed.
5. Recreates the folder structure of the S3 bucket in the Synapse project.
6. Indexes the files in the S3 bucket into the Synapse project.
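
The folder-recreation step can be pictured as deriving the set of folders implied by the object keys and creating parents before children. The sketch below shows only that derivation, using made-up keys; folders_to_create is a hypothetical helper, and the actual workflow may organize this step differently.

```python
from posixpath import dirname


def folders_to_create(object_keys: list[str]) -> list[str]:
    """Return every folder path implied by the keys, parents before children."""
    folders = set()
    for key in object_keys:
        # Strip leading slashes so dirname() always terminates at "".
        parent = dirname(key.lstrip("/"))
        while parent:
            folders.add(parent)
            parent = dirname(parent)
    # Sort by depth first so a parent is always listed before its children.
    return sorted(folders, key=lambda path: (path.count("/"), path))


keys = [
    "test.txt",
    "child_folder/test1.txt",
    "child_folder/child_child_folder/test2.txt",
]
for folder in folders_to_create(keys):
    print(folder)
```

Ordering by depth matters because a nested folder can only be created once its parent exists in the Synapse project.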

Workflow Diagram

flowchart LR
    A[S3 Files] --> B[GET_USER_ID]
    B --> C[UPDATE_OWNER]
    C --> D[REGISTER_BUCKET]
    A --> E[LIST_OBJECTS]
    E --> F[SYNAPSE_MIRROR]
    F --> G[SYNAPSE_INDEX]
    D --> G
    G --> H[Output CSV]

Quickstart: SYNINDEX

The examples below demonstrate how you would index files from an S3 bucket called example-bucket into Synapse.

Prepare your S3 bucket by setting the output directory of your general-purpose workflow to a Nextflow Tower S3 bucket. Ideally, you want this S3 bucket to be persistent (not a -scratch bucket) so that your files will remain accessible indefinitely.

Example: s3://example-bucket file structure:
```
example-bucket
├── child_folder
│   ├── child_child_folder
│   │   └── test2.txt
│   └── test1.txt
└── test.txt
```
Launch workflow using the Nextflow CLI, the Tower CLI, or the Seqera Platform UI.

Example: Launched using the Nextflow CLI
```
nextflow run main.nf -profile docker --entry synindex --s3_prefix s3://example-bucket --parent_id syn12345678
```
Retrieve the output file, which by default is stored in s3://example-bucket/synindex/under-syn12345678/ in our example. This folder will contain a mapping of S3 object URIs to their indexed Synapse IDs.

Parameters

Check out the Quickstart section for example parameter values.

entry: (Required) The name of the workflow to run (synindex). This should be the name of the workflow file in the workflows/ directory.
s3_prefix: (Required) The S3 URI of the S3 bucket that contains the files to be indexed.
parent_id: (Required) The Synapse ID of the Synapse project or folder that the files will be indexed into.
filename_string: (Optional) A string that will be matched against the names of the files in the S3 bucket. If provided, only files that contain the string will be indexed.
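
The effect of filename_string can be sketched as a simple substring filter over the object keys. This is an illustration under two assumptions that the text above does not confirm: that the match is case-sensitive, and that it is applied to the file name (the last path segment) rather than the full key. select_objects is a hypothetical helper name.

```python
from typing import Optional


def select_objects(object_keys: list[str],
                   filename_string: Optional[str] = None) -> list[str]:
    """Keep only keys whose file name contains filename_string (if given)."""
    if not filename_string:
        # No filter provided: every object is indexed.
        return list(object_keys)
    return [
        key for key in object_keys
        if filename_string in key.rsplit("/", 1)[-1]
    ]


keys = [
    "test.txt",
    "child_folder/test1.txt",
    "child_folder/child_child_folder/test2.txt",
]
print(select_objects(keys, "test1"))
print(select_objects(keys))
```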

Known Limitations

At present, it is not possible for SYNINDEX to be run outside of Nextflow Tower. This is due to AWS permissions complications. Future work will include enabling the workflow to run on local machines/in virtual machines.

Releases

Release	Date	Downloads	Author
0.1.0	2025-04-03	0	BWMac
