David Stohlmann

16.05.2023

Towards high-quality gene annotation in cephalopod genomes

MSc Student
Advisor: Oleg Simakov

Department of Neuroscience and Developmental Biology
University of Vienna

Abstract

Cephalopods are conchiferan mollusks that comprise about 800 extant species, divided into nautilids and coleoids. Coleoids are the more diverse lineage and comprise two monophyletic groups, Octopodiformes (octopuses) and Decapodiformes (squids and cuttlefishes). Although the monophyly of these groups is confirmed, the relationships within them are not fully resolved. One way to gain deeper insight into these relationships is to use single-copy orthologous genes. At present, only a few coleoid genomes are available, and their quality differs considerably. To assess genome quality, the bioinformatics tool BUSCO (Benchmarking Universal Single-Copy Orthologs) has provided a quantitative measure of genomic data completeness for many animals. BUSCO classifies the expected orthologous genes as complete, duplicated, fragmented or missing. BUSCO is built on the OrthoDB database, which includes genomic data from bacteria, fungi, plants, protists and metazoans. The metazoan data are drawn mainly from vertebrates and arthropods and are therefore poorly suited to analyses of cephalopods. In addition, cephalopod genomes are large and contain many repetitive regions, which makes the current BUSCO datasets unsuitable as a measure of their completeness. Thus, the aim of my Master’s thesis is to establish a set of high-quality cephalopod orthologous genes that can be used both to assess the quality of cephalopod genome assemblies and for phylogenetic analyses. A first BUSCO test run with a dataset based on four cephalopod species has produced promising results, which we aim to improve in further steps. Ultimately, this set of high-quality orthologous genes will help to provide better estimates of cephalopod genome completeness and bring us closer to understanding the molecular evolution and phylogeny of the different cephalopod lineages.