Wednesday, May 15, 2013

[Dissertation Talk] Ambiguous Fragment Assignment for High-Throughput Sequencing Experiments

Seminar: Departmental | May 15 | 3-4 p.m. | 380 Soda Hall

Adam Roberts, Electrical Engineering and Computer Sciences (EECS)

Electrical Engineering and Computer Sciences (EECS)

As the cost of short-read, high-throughput DNA sequencing continues to fall rapidly, new uses for the technology have emerged beyond its original purpose of determining a species' genome. Many of these new experiments use the sequencer as a digital counter to measure biological activity such as gene expression (RNA-Seq) or protein binding (ChIP-Seq). A common problem in analyzing these data is that some sequenced fragments are 'ambiguous': they resemble multiple loci in a reference genome or other reference sequence. Initially, such ambiguous fragments were either ignored or assigned to loci using simple heuristics. However, statistical approaches based on maximum likelihood estimation have been shown to greatly improve the accuracy of downstream analyses and are becoming widely adopted.
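Maximum likelihood estimates of this kind are commonly computed with the expectation-maximization (EM) algorithm. The sketch below illustrates the basic idea under a deliberately simplified, assumed model: each fragment is described only by the likelihood P(fragment | locus) for each locus it is compatible with, with no fragment-length, bias, or error modelling. It is far simpler than the model presented in the talk, and the function name `em_abundances` and its input format are hypothetical.

```python
from typing import Dict, List


def em_abundances(
    fragments: List[Dict[str, float]],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Batch EM for relative locus abundances (hypothetical, simplified model).

    Each fragment is a dict mapping every locus it is compatible with to the
    likelihood P(fragment | locus); a uniquely mapping fragment has one entry.
    """
    if not fragments:
        return {}
    loci = sorted({locus for frag in fragments for locus in frag})
    theta = {locus: 1.0 / len(loci) for locus in loci}  # uniform starting point
    for _ in range(max_iter):
        expected = {locus: 0.0 for locus in loci}
        # E-step: split each fragment across its compatible loci in proportion
        # to theta[locus] * P(fragment | locus).
        for frag in fragments:
            weights = {locus: theta[locus] * p for locus, p in frag.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # zero likelihood under the current estimate
            for locus, w in weights.items():
                expected[locus] += w / total
        # M-step: abundance = expected fraction of fragments from each locus.
        assigned = sum(expected.values())
        new_theta = {locus: expected[locus] / assigned for locus in loci}
        change = max(abs(new_theta[locus] - theta[locus]) for locus in loci)
        theta = new_theta
        if change < tol:
            break
    return theta


# Toy input: two fragments unique to A, one unique to B, one shared by A and B.
fragments = [{"A": 1.0}, {"A": 1.0}, {"A": 1.0, "B": 1.0}, {"B": 1.0}]
theta = em_abundances(fragments)
print({locus: round(v, 3) for locus, v in theta.items()})  # {'A': 0.667, 'B': 0.333}
```

In this toy input EM assigns the shared fragment to A with probability 2/3, so the estimates settle at roughly 2/3 and 1/3, whereas splitting the shared fragment evenly between A and B would give 0.625 and 0.375.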

Nevertheless, as the models used in these methods grow more complex and datasets grow larger, most of these methods, which are often based on the expectation-maximization (EM) algorithm, have failed to scale. In this talk, we present our model for ambiguous fragment assignment, which is the most sophisticated to date, as well as various methods we have explored for scaling our optimization procedure. These methods include an online EM algorithm and a distributed EM solution implemented on Spark, the cluster computing system under development in the AMPLab.
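An online EM algorithm avoids repeated passes over the full dataset by updating its estimates after each fragment. The sketch below shows one standard variant, stepwise online EM with a decaying step size eta_k = (k + 2)^(-alpha), under the same kind of simplified, assumed model (each fragment is a dict of P(fragment | locus) over its compatible loci). It is not necessarily the variant used in the talk; the name `online_em` and the parameter `alpha` are hypothetical.

```python
from typing import Dict, Iterable, List


def online_em(
    fragments: Iterable[Dict[str, float]],
    loci: List[str],
    alpha: float = 0.85,
) -> Dict[str, float]:
    """Stepwise online EM in a single pass over a fragment stream (hypothetical sketch).

    Each fragment maps its compatible loci to P(fragment | locus); every locus
    a fragment mentions must appear in `loci`. The step size
    eta_k = (k + 2) ** -alpha with 0.5 < alpha <= 1 gradually down-weights
    older evidence.
    """
    theta = {locus: 1.0 / len(loci) for locus in loci}  # sums to 1
    for k, frag in enumerate(fragments):
        eta = (k + 2) ** (-alpha)
        # E-step for this single fragment only.
        weights = {locus: theta[locus] * p for locus, p in frag.items()}
        total = sum(weights.values())
        if total == 0.0:
            continue
        # Decay all running estimates, then add this fragment's posterior;
        # both terms sum to 1, so theta stays normalized.
        for locus in theta:
            theta[locus] *= 1.0 - eta
        for locus, w in weights.items():
            theta[locus] += eta * w / total
    return theta


# Stream the same toy fragments repeatedly; the estimate approaches A ~ 2/3, B ~ 1/3.
frags = [{"A": 1.0}, {"A": 1.0}, {"A": 1.0, "B": 1.0}, {"B": 1.0}]
stream = (frag for _ in range(10_000) for frag in frags)
print(online_em(stream, loci=["A", "B"]))
```

Each fragment is processed once and memory grows with the number of loci rather than the number of fragments, which is what makes this style attractive for large datasets; the trade-off is that the result depends on the step-size schedule and on fragment order.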