Highlights from Genome Science 2015

Genome Science is an annual conference focusing on genome research in the UK, and this year's conference was hosted by Nick Loman in Birmingham. Big thanks to Nick and the organising committee for bringing together such a diverse range of speakers, spanning everything from functional genomics to clinical genomics and from new technologies to fundamental research. CGAT was out in force with 8 of us attending, probably only outdone in pure numbers by the might of TGAC!

I thought I’d put a few highlights up on the blog, partly because I’m so bad at remembering names and faces and this might help me remember who it was who gave that really interesting talk at Genome Science 2015 in a few months’ time! This is far from a comprehensive summary of all the talks though.

In the opening talk of the conference, Daniel MacArthur presented the results from aggregating 1000s of exomes. This included one of those “why hadn’t I thought about this before” moments when he explained that we’ve now sequenced the exomes of so many individuals that one can start applying statistical tests to identify genes where the number of single nucleotide variants (SNVs) observed is significantly lower than we would expect by chance. The obvious explanation for this is that mutations in these genes lead to such a severe phenotype that they are removed from the gene pool by purifying selection. This is indeed the case for a number of these genes, which are already implicated in severe diseases. Interestingly, some of the genes they’ve identified have never been implicated in a human disease before, yet we can now say that a mutation causing an amino-acid change in them would likely be fatal. So there’s value not just in what you observe but in what you don’t observe too! What really excites me is the potential to turn this analysis to non-coding regions once we have enough full genomes sequenced – this could really help increase our understanding of the regions of the genome which are important in regulating gene expression. Ed Yong has an article up on theatlantic.com explaining this much better than I have (How Data-Wranglers Are Building the Great Library of Genetic Variation).
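To make the idea concrete, here’s a toy sketch of the constraint test (my own illustration, not the method from the talk; the gene names and numbers are entirely made up): given an expected number of protein-altering variants per gene under a neutral mutational model, ask how surprising it is to see so few.

```python
# Toy illustration of gene constraint: if variants arrive at the neutral
# rate `expected`, how unlikely is it to observe `observed` or fewer?
# Small p-values suggest purifying selection is removing variants.
from scipy.stats import poisson


def constraint_pvalue(observed: int, expected: float) -> float:
    """P(observing `observed` or fewer variants) under a Poisson model."""
    return poisson.cdf(observed, expected)


# Hypothetical genes: (observed variants, expected variants)
genes = {"GENE_A": (12, 30.5), "GENE_B": (28, 30.5)}

for gene, (obs, exp) in genes.items():
    p = constraint_pvalue(obs, exp)
    print(f"{gene}: observed={obs}, expected={exp:.1f}, p={p:.3g}")
```

GENE_A, with far fewer variants than expected, comes out as highly constrained, while GENE_B looks perfectly ordinary – which is exactly the distinction that lets you spot genes that matter even before anyone has linked them to a disease.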

For me, some of the most interesting talks at Genome Science 2015 were those which introduced novel technologies or analyses and prompted me to think about new ways to answer those burning biological questions. There were many talks about nanopore-based sequencing and two in particular gave me some food for thought.

Mark Akeson (University of California Santa Cruz) spoke about the development of nanopore sequencing technologies, from the early work of George Church and others through to the recent results coming out of Oxford Nanopore’s early access program. There’s obviously a lot of excitement about the rapid improvements in base calling; however, what really excited me was the mention of nanopore sequencing of proteins. Whisper it quietly, but clinical applications of current sequencing technologies often use transcript abundance merely as a proxy for protein abundance. If we could start reliably quantifying proteins with nanopore-based approaches, this could be a very disruptive technology. On the research side, as someone who’s interested in how much alternative splicing of transcripts is actually propagated to the protein level, being able to directly sequence both the transcriptome and the proteome would be fantastic.

Keeping with the nanopore theme, Matt Loose (University of Nottingham) presented the latest incarnation of his minoTour platform for realtime minION analysis. It looks like a really slick interface, but I must confess that up to this point I hadn’t really seen the value of the realtime aspect of nanopore analyses in a research setting. Watching a minION generate reads in a live demonstration is very cool, but apart from a constantly updated histogram of read lengths, what do you really gain from it? Of course, in a clinical setting you could hook the output directly up to the downstream analysis and obtain your results 24 hours sooner, which could be of real benefit. But for me the downstream analysis is probably best measured in months, not hours! What I hadn’t considered until Matt’s talk, though, was that realtime analysis allows you to selectively focus on particular sequences. In the example he gave, if you barcode the DNA from each sample, you can identify the sample source of each molecule in realtime. This then enables you to obtain roughly equal sequencing depth across all the samples in a multiplexed pool, by ejecting molecules that come from an over-sequenced sample so the pore is freed up for another. There’ll no doubt be other applications of realtime sequence analysis to come. Interestingly, Matt mentioned that this is currently only possible because the speed at which the DNA is ratcheted through the nanopore is kept down to improve accuracy – when the speed increases, it may become difficult to analyse the sequences quickly enough!
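The balancing logic itself is pleasingly simple. Here’s my own toy sketch of it (not minoTour code – the barcode names, counts and threshold are invented): if the barcode of the molecule currently in a pore is already over-represented relative to the other samples, eject it so the pore can take up something else.

```python
# Toy sketch of realtime depth balancing across barcoded samples:
# keep a running tally of reads per barcode and eject molecules from
# barcodes that are already over-represented.
from collections import Counter


def should_eject(barcode: str, counts: Counter, tolerance: float = 1.2) -> bool:
    """Eject if this barcode already has more reads than `tolerance` times
    the mean depth across all barcodes seen so far."""
    if not counts:
        return False
    mean_depth = sum(counts.values()) / len(counts)
    return counts[barcode] > tolerance * mean_depth


# Hypothetical running read counts for three barcoded samples
counts = Counter({"BC01": 150, "BC02": 90, "BC03": 95})

for bc in ("BC01", "BC02"):
    decision = "eject" if should_eject(bc, counts) else "keep sequencing"
    print(f"{bc}: {decision}")
```

In this toy example BC01 gets ejected and BC02 carries on, nudging the pool towards even coverage – but of course the real constraint is whether the barcode call can be made fast enough while the molecule is still in the pore.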

Other highlights included Mick Watson’s very engaging presentation of his recent publication, which identifies genes that can’t be accurately quantified in isolation but can still yield biologically relevant results when considered as a group. Definitely worth a read: Errors in RNA-Seq quantification affect genes of relevance to human disease. I’ll return to this topic in more depth soon with some of my own simulation data using Kallisto.
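As a flavour of the kind of check I have in mind (a rough sketch only – the gene names, counts and two-fold threshold below are invented for illustration, and the real analysis will come in a later post): simulate reads from known abundances, quantify them with Kallisto, then flag the genes whose estimates deviate badly from the truth.

```python
# Rough sketch: compare the ground-truth per-gene counts used to simulate
# reads against the estimated counts, and flag genes whose estimates are
# off by more than a given fold-change.
import pandas as pd


def flag_poorly_quantified(truth: pd.Series, estimate: pd.Series,
                           max_fold_error: float = 2.0) -> pd.Index:
    """Return the genes whose estimate differs from the truth by more than
    `max_fold_error`-fold (a pseudocount of 1 avoids division by zero)."""
    fold_error = (estimate + 1) / (truth + 1)
    bad = (fold_error > max_fold_error) | (fold_error < 1 / max_fold_error)
    return bad[bad].index


# Toy example with made-up counts
truth = pd.Series({"geneA": 100, "geneB": 40, "geneC": 5})
estimate = pd.Series({"geneA": 95, "geneB": 10, "geneC": 6})

print(list(flag_poorly_quantified(truth, estimate)))  # -> ['geneB']
```

The interesting question, following Mick’s paper, is whether the genes that get flagged this way recover their signal once you stop treating them individually and sum them up as a group.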