nf-parquet is a Nextflow plugin for reading Parquet files.

Features

  • multiple configurable schemas

Changes

Read parquet file

Given a Parquet file with two fields, myString and myInteger, you can send all of its records to a channel.

Configure a schema in the parquet section of nextflow.config, then reference it by name in the script:

nextflow.config
plugins {
  id 'nf-parquet@0.0.1-rc2'
}

parquet {
    schemas = {
        catalog {
            field "myString" type "string" optional true
            field "myInteger" type "double" optional true
        }
        schema "mySchema"
    }
}
read.nf
include { fromParquetFile } from 'plugin/nf-parquet'

channel
        .fromParquetFile( Path.of('area1.parquet'), 'mySchema')
        | view
INFO

By defining different schemas you can speed up reading, because the implementation reads only the fields listed in the schema and skips the rest.

If you don’t want to define the fields (or don’t know the full schema), you can omit the schemas closure from the configuration:

nextflow.config
plugins {
  id 'nf-parquet@0.0.1-rc2'
}
//(1)
  1. Don’t specify schemas

readAll.nf
include { fromParquetFile } from 'plugin/nf-parquet'

channel
        .fromParquetFile( Path.of('area1.parquet') ) //(1)
        | view
  1. Don’t specify the schema name to use
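
Because fromParquetFile emits the records into an ordinary channel, the standard Nextflow operators can be applied to the result. The following is a minimal sketch, assuming the same area1.parquet file and the mySchema schema configured earlier. It relies only on the built-in count and view operators and makes no assumption about the type of the emitted records.

```groovy
include { fromParquetFile } from 'plugin/nf-parquet'

// Read the file with the named schema, then count the emitted records.
// count and view are standard Nextflow operators, not part of the plugin.
channel
        .fromParquetFile( Path.of('area1.parquet'), 'mySchema')
        .count()
        .view { n -> "records read: ${n}" }
```

Counting the records this way is a quick check that the file and the schema name are resolved before building a larger pipeline on top of the channel.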