A huge amount of sequence data is being generated worldwide by projects to sequence the genomes and transcriptomes of a large number of organisms. These data are a highly valuable resource for studying fundamental biological processes such as gene function, transcriptional regulation, and molecular evolution at the level of DNA and protein sequences. Comparative genomics is the field in which species are compared with respect to their genomes, for instance to better understand how genomes evolve. It is also useful in many computational methods, such as the prediction of gene structure, non-coding RNAs, and regulatory sites in the genome.
We use comparative genomics and DNA and protein sequence analysis to examine the structure, function, and evolution of genes. We have considered protein-coding genes as well as non-coding RNAs (ncRNAs), i.e., RNAs that do not code for proteins but have other functions. The evolution of proteins is being studied in a number of projects where we make use of tools such as gene prediction, profile-based searches, and methods of phylogenetic analysis. We have developed methods to effectively identify ncRNAs in genomic sequences and have applied these methods to a large number of ncRNA families. We have identified a large number of previously unrecognized homologues, and these results have allowed a better understanding of the secondary structure and evolution of these RNAs.
In addition, a number of collaborative projects involve the analysis of data from next-generation sequencing experiments, such as exome sequencing and RNA-Seq.