Research Interest
The Computational Genomics Research Lab (CGRLab.github.io) is led by Dr. Sina Majidian and is located in the Data Science and AI division at Chalmers University of Technology and the University of Gothenburg. We use machine learning and mathematical modelling to address biological questions in genomics.
Personalized human pangenome
Pangenomes are now redefining our understanding of genetic variation across populations and genome evolution across species. A single reference genome cannot fully represent human genetic diversity, even when it is complete enough to be considered “telomere-to-telomere.” To fully harness the power of pangenomes in biomedicine, there is a pressing need for efficient methods to store, visualize, and extract relevant information. Our lab aims to understand human genome variation and evolution across different genomic regions by developing interpretable and efficient methods, leveraging deep learning and statistical analysis.
Variant effect prediction via genomics language model
Alignment enables the identification of similar regions across species, encompassing both genic and intergenic regions. We aim to discover homologies and population-specific differences in noncoding regions by leveraging large language models such as DNABERT and MSA Transformer, and by designing new ML architectures. This helps identify alternative model organisms that best represent regulatory elements for further study. Also, leveraging cross-species conservation and genomic context enables more accurate predictions of allele effects and a deeper understanding of genotype–phenotype relationships, specifically revealing how genetic variations in noncoding regions contributes to disease.
Comparative genomics
Genome data keeps piling up, with efforts to sequence 1.5 million eukaryotic species, which could transform our understanding of evolution. However, this requires an overhaul of conventional comparative genomics methods, which are limited to studying tens of genomes. Methods for inferring orthologous genes and phylogenetic relationships are computationally demanding. We have developed FastOMA and Read2Tree to tackle these challenges. FastOMA is a method for inferring orthology relationships, combining k-mer-based placement, species-tree-guided subsampling, and highly parallel computing to achieve near-linear performance in the number of input genomes. Read2Tree is a method for inferring phylogenetic trees directly from raw sequencing data, bypassing genome assembly, gene annotation, and all-versus-all sequence comparisons. Our lab continues to develop methods for comparative genomics at scale.

Group Members
CGR lab is hiring postdocs, PhD and master students. Please reach out.
