Just a few grams of lake water, sediment or soil can harbour the genetic information of thousands of organisms. The wealth of data contained in these samples can theoretically be used for a wide range of applications, including large-scale biodiversity monitoring, species detection and the tracking of individuals through space and time. The research community has traditionally analysed such samples by looking at very short stretches of DNA that are unique to a focal set of species (metabarcoding). Because such methods analyse only a minute fraction of the total DNA in a sample, their accuracy and sensitivity fall far short of what is theoretically possible.
Within the last decade, sequencing costs have fallen by orders of magnitude, while high-performance computing clusters and genome reference databases have improved on a similar scale. It is now financially feasible to sequence nearly all of the DNA within a sample. Building on these rapidly expanding databases, my research aims to exploit most of the information contained in such samples by developing algorithms that efficiently assign DNA sequences to their species of origin, and by applying these algorithms to different sample types, including ancient sediments, to study the presence and distribution of animals and plants over time.