Statistical and machine learning challenges from protein engineering to genetics
Seminar | October 31 | 12-1 p.m. | 106 Stanley Hall
Jennifer Listgarten, University of California, Berkeley
Molecular biology, genetics, and protein engineering have been slowly morphing into large-scale, data-driven sciences that can leverage machine learning and applied statistics. My talk will be a quick tour of several projects at this intersection. I will start by describing some new work toward machine-learning-based protein engineering (and more general design problems) that can be viewed as a sort of in silico directed evolution. Given a "forward" predictive model for some property of interest, such as protein fluorescence, stability, or expression, one may want to find the DNA or amino acid sequence that maximizes that property or achieves a particular value of it, subject to constraints such as secondary structure. We propose a new method for efficiently searching through the design space in order to achieve the desired properties. Time permitting, I will then briefly discuss modelling challenges and solutions in genetic association studies and CRISPR gene editing.