nf-parquet is a Nextflow plugin for reading Parquet files.
Features
- Multiple configurable schemas
Read a Parquet file
Given a Parquet file with two fields, myString and myInteger, you can emit all of its records into a channel.
Configure a schema in the parquet section of nextflow.config:
nextflow.config
plugins {
    id 'nf-parquet@0.0.1-rc2'
}
parquet {
    schemas = {
        catalog {
            field "myString" type "string" optional true
            field "myInteger" type "double" optional true
        }
        schema "mySchema"
    }
}
Then read the file, passing the schema name:
read.nf
include { fromParquetFile } from 'plugin/nf-parquet'

channel
    .fromParquetFile( Path.of('area1.parquet'), 'mySchema' )
    | view
INFO: Defining different schemas makes reading more efficient, because the plugin reads only the fields listed in the selected schema and skips the others.
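The following is a minimal Python model of the projection idea. It is hypothetical and is not the plugin's code. It assumes the file is stored column by column, as Parquet does, and represents it as a dict of column name to values. A schema is just a list of field names, and `read_records` is an illustrative helper. When a schema is given, only the named columns are touched. When it is omitted, every column is read.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical columnar "file": each column's values are stored separately.
Columns = Dict[str, List[object]]


def read_records(columns: Columns, schema: Optional[List[str]] = None) -> Iterator[Dict[str, object]]:
    """Yield one dict per row, reading only the columns named in `schema`.

    With schema=None every column is read (the "no schema" case).
    """
    names = list(columns) if schema is None else schema
    # Only the projected columns are accessed; all others are never touched.
    selected = {name: columns[name] for name in names}
    # Row count taken from the first stored column (0 for an empty file).
    row_count = len(next(iter(columns.values()), []))
    for i in range(row_count):
        yield {name: values[i] for name, values in selected.items()}


# Illustrative sample data, not a real file.
area1 = {
    "myString": ["a", "b"],
    "myInteger": [1.0, 2.0],
    "other": [True, False],
}
print(list(read_records(area1, ["myString", "myInteger"])))
# [{'myString': 'a', 'myInteger': 1.0}, {'myString': 'b', 'myInteger': 2.0}]
print(list(read_records(area1)))
# [{'myString': 'a', 'myInteger': 1.0, 'other': True}, {'myString': 'b', 'myInteger': 2.0, 'other': False}]
```

In a real columnar reader, the saving comes from never decoding the skipped columns from disk. This model only mirrors which columns are accessed.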
If you don't want to define the fields (or don't know the full schema), you can omit the schemas closure entirely:
nextflow.config
plugins {
    id 'nf-parquet@0.0.1-rc2'
}
//(1)
(1) Don't specify schemas.
readAll.nf
include { fromParquetFile } from 'plugin/nf-parquet'

channel
    .fromParquetFile( Path.of('area1.parquet') ) //(1)
    | view
(1) Don't specify a schema name, so all fields are read.