We combine statistical modeling and algorithm design to develop methods for analyzing large biological datasets. In particular, we develop scalable algorithms for high-throughput genomic and transcriptomic sequencing data, addressing problems in genome assembly, structural variation detection, and transcriptome analysis. While our lab has a strong theoretical component, we emphasize the applicability of our methods and models to relevant biological and biomedical questions. Our algorithms have been integrated into official bioinformatics pipelines offered by leading sequencing companies, which demonstrates their practical applicability. We have both academic and industrial collaborations.
Kristoffer Sahlin, PI
- Kristoffer Sahlin† and Paul Medvedev. De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm. In Research in Computational Molecular Biology, pages 227–242, Cham, 2019. Springer International Publishing.
- Kristoffer Sahlin*, Marta Tomaszkiewicz*, Kateryna D. Makova†, and Paul Medvedev†. Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon. Nature Communications, 9(1):4601, 2018.
- Kristoffer Sahlin†, Mattias Franberg, and Lars Arvestad. Structural variation detection with read pair information: An improved null-hypothesis reduces bias. Journal of Computational Biology, 24(6):581–589, 2017.
- Kristoffer Sahlin†, Rayan Chikhi, and Lars Arvestad. Assembly scaffolding with PE-contaminated mate-pair libraries. Bioinformatics, 2016.
- Kristoffer Sahlin†, Francesco Vezzi, Bjorn Nystedt, Joakim Lundeberg, and Lars Arvestad. BESST – Efficient scaffolding of large fragmented assemblies. BMC Bioinformatics, 15(1):281, 2014.