A group of scientists from KTH, Karolinska Institutet, and the Karolinska University Laboratory, led by SciLifeLab researcher Cecilia Williams (KTH), has demonstrated the importance of taking sex into account in colorectal cancer biomarker discovery. Using machine learning, the scientists identified top-ranked expression biomarkers.
To date, the majority of scientific research has focused on one sex, on the basic assumption that studying the other sex would yield similar results. However, this is often not the case: women and men can differ in disease susceptibility, symptoms, disease progression, and treatment response, which makes proper diagnosis and disease management difficult. Understanding sex differences in disease is therefore critical for the proper treatment of both sexes.
“The importance of studying sex differences cannot be emphasized enough”, says last author and SciLifeLab group leader Cecilia Williams (KTH/KI).
In the new article, published in the International Journal of Molecular Sciences, her research group, including SciLifeLab PhD student Linnea Hases (KTH/KI), reports that the project's main finding is the importance of sex in colorectal cancer biomarker discovery.
“There were both differences between sexes in the top-ranked diagnostic biomarkers but most interestingly, all biomarkers with a prognostic value were sex-dependent”, says Cecilia Williams.
In the paper, the researchers used supervised machine learning to identify top-ranked diagnostic biomarkers. Methods of this kind add a new layer to differential expression analysis and can substantially improve current biomarker discovery.
“A supervised machine learning-based approach can also be used to identify a set of genes whose expression can jointly classify colorectal cancer into different subtypes”, she continues.
Current differential gene expression analysis has limitations when applied to the large amounts of data generated by transcriptomic studies: it is not performed in a multivariate setting and does not consider inter-gene relationships.
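To make the point about multivariate analysis concrete, here is a minimal sketch in Python. It is not the method used in the paper: the expression values are invented toy numbers, and the gene names `gene_a` and `gene_b` and the helper `best_threshold_accuracy` are hypothetical. It only illustrates how two genes that each separate tumor from normal samples poorly on their own can separate them perfectly when considered jointly.

```python
# Toy illustration (invented data, hypothetical gene names): two genes that are
# weak markers individually but separate the classes perfectly when combined.

def best_threshold_accuracy(values, labels):
    """Return the best accuracy of any single-threshold rule on one feature.

    Both directions are tried: "above threshold means tumor" and the reverse.
    """
    n = len(values)
    distinct = sorted(set(values))
    # Candidate thresholds: below the minimum, midpoints, above the maximum.
    candidates = [distinct[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    candidates.append(distinct[-1] + 1.0)
    best = 0.0
    for t in candidates:
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best

# 0 = normal, 1 = tumor. In normal samples gene_b sits one unit above gene_a;
# in tumor samples the relationship is reversed.
gene_a = [1, 2, 3, 4, 2, 3, 4, 5]
gene_b = [2, 3, 4, 5, 1, 2, 3, 4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Univariate view: each gene is evaluated on its own.
acc_a = best_threshold_accuracy(gene_a, labels)
acc_b = best_threshold_accuracy(gene_b, labels)

# Multivariate view: a feature that uses the relationship between the genes.
difference = [a - b for a, b in zip(gene_a, gene_b)]
acc_joint = best_threshold_accuracy(difference, labels)

print(acc_a)      # 0.625
print(acc_b)      # 0.625
print(acc_joint)  # 1.0
```

In this toy case, a gene-by-gene comparison would rank both genes as weak markers, while a model that sees both at once can exploit the relationship between them.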
SciLifeLab infrastructure crucial for success
“We used the SciLifeLab National Genomics Infrastructure (NGI) for RNA-sequencing with Illumina and their best practice RNA-seq analysis, and the National Bioinformatics Infrastructure (NBIS) for computing and storage allocations at UPPMAX provided by The Swedish National Infrastructure for Computing (SNIC)”, says Cecilia Williams.
According to her, the hardest part of the project was implementing machine learning, which none of the researchers were familiar with.
“We took help from a master student in health informatics at Stockholm University who performed her master thesis project in our group”.
The researchers now hope that their results will serve as a starting point for future research, and would like to see them confirmed in independent cohorts.
“Based on our main findings, we hope and believe that this will yield an improved understanding of the importance of sexes in colorectal cancer research, and lead to improved biomarker discovery. We hope to see more studies that are based on both sexes, taking sex differences into account, both in preclinical and clinical studies”, she says.
What did you enjoy the most about this project?
“The best part with the project was to see how well the machine learning methods worked on our data. We could confirm many previously proposed colorectal cancer biomarkers.”
How do you think this research will affect the field in the future?
“Here we support a role for supervised machine learning for biomarker discovery. We think that the transcriptomic analysis will move from the normal differential expression pipelines to a machine learning approach when you work with large sample sizes. Supervised machine learning can be used to identify clusters of genes, which correlate to disease progression, subtypes, or prognosis.”