BiG Talks! Enabling reproducible in-silico data analyses with Nextflow


The SciLifeLab Bioinformatics and Genomics seminar series (“BiG Talks!”) is a new initiative that aims to inspire the SciLifeLab community and create new networking opportunities. The BiG Talks will rotate between the different SciLifeLab nodes.

On 1 October 2018 we will welcome Paolo Di Tommaso (Research Software Engineer, Center for Genomic Regulation, Spain). Paolo is a computer scientist and bioinformatician with 20 years of experience as a software developer and architect. His main interests are parallel programming, HPC, cloud computing and containerisation technologies. He is an open-source advocate and the creator and project leader of the Nextflow workflow framework.

After the seminar, fika will be served for those who have registered. There will be a chance to mingle with the speaker and colleagues and to give feedback on the BiG Talks! initiative. Register for the fika here: https://goo.gl/forms/EvOZa7rd9zQvRMVg1

Venue: Air Auditorium, SciLifeLab, Solna.

Contact person: Dr. Phil Ewels, NGI SciLifeLab Stockholm, phil.ewels@scilifelab.se

Enabling reproducible in-silico data analyses with Nextflow

Reproducibility has become one of biology’s most pressing issues. The problem has been fuelled by the growing reliance on increasingly complex data analysis methods, combined with the exponential growth of biological datasets. The picture becomes even more challenging when the installation, deployment and maintenance of bioinformatics pipelines are considered, because community standards are lacking. Moreover, the effect of limited standards on reproducibility is amplified by the very diverse range of computational platforms and configurations on which these applications are expected to run (workstations, clusters, HPC, clouds, etc.).

This presentation will introduce Nextflow, a pipeline orchestration tool designed to address exactly these issues. Nextflow is a computational environment that provides a domain-specific language (DSL) meant to simplify the writing of complex distributed computational pipelines in a portable and replicable manner. It allows the seamless parallelization and deployment of any existing application with minimal development and maintenance overhead, irrespective of the original programming language.

See also: BiG Talk Poster.