SciLifeLab creates freely available portable workflow for WGS and WES analysis
In a national collaboration jointly initiated by SciLifeLab’s Bioinformatics platform (NBIS), the National Genomics Infrastructure (NGI), the Clinical Genomics platform, and the Swedish Childhood Tumour Biobank (Barntumörbanken), a group of researchers has created a new portable and reproducible workflow named Sarek.
Sarek can detect germline and somatic variants from whole-genome sequencing (WGS), whole-exome sequencing (WES), and gene panel data. Sequencing units can adopt it as a production workflow, and individual research groups can use it for their own projects. The Sarek source code has been made freely available online.
In the analysis of complex cancer genomes and somatic variants, the need to maintain workflows, combine software efficiently, and ensure reproducibility has long been underestimated.
Detecting the various types of gene mutations and abnormalities often requires a combination of tools, and research teams have been forced to develop their own solutions to the same challenges. This has sometimes resulted in poorly adaptable, bulky code that is hard for other researchers to reuse and adopt.
“The main problems have been to make a complex workflow like this truly portable and easy to run. This is very difficult to achieve, and that is exactly the reason why there are so few alternatives out there for researchers to use”, says last author Björn Nystedt (SciLifeLab/UU/NBIS).
To address this issue, researchers from the SciLifeLab National Genomics Infrastructure (NGI) and Barntumörbanken at Karolinska Institutet, among them first authors Maxime Garcia and Szilveszter Juhos, have developed a new portable workflow named Sarek. The software can easily be installed on POSIX-compatible systems such as Mac OS X and Linux, and it is designed to run in environments that handle personal data without internet access.
“This tool is designed to take the complex task of mapping and variant calling of Human WGS data from cancer or non-cancer samples and make this easy and robust to run for any core unit or research group, and we see international interest emerging already”, says Björn Nystedt.
Since the development of Sarek began, the workflow has been tested and used within Barntumörbanken, a national infrastructure that collects samples and in-depth genetic data from pediatric cancers. Research groups across Europe have also adopted it, including the Institute of Cancer Research (London, UK), University College London (London, UK), the Genome Data Science Lab at IRB (Barcelona, Spain), and groups in Germany.
Using a range of state-of-the-art software and data resources, Sarek is a portable workflow for “germline and somatic variant detection, annotation and quality control based on WGS, WES or gene panel data”.
“I think this has been a very fruitful collaboration between several SciLifeLab platforms and Swedish research groups to meet new challenges in BigData”, says Björn Nystedt.