Experimental design in an oligonucleotide synthesis factory using numerical simulations in Python and pandas

Seminar | February 6 | 1:30-2:45 p.m. | 775A Tan Hall

 Aaron Wiegel, Data Scientist, Synthego

 Department of Chemistry

Abstract: Regardless of the application, calculating a particular statistic and associated p-value is not necessarily the biggest challenge in designing an experiment, especially given the availability of open source software packages such as scipy and statsmodels in Python. Instead, ensuring that the assumptions required for a statistical test are actually satisfied by the data is far more challenging. Thankfully, with an existing data source, the sample method for a dataframe in pandas can be used to create simple numerical simulations to test these assumptions with real data. Using such numerical simulations on data from an oligonucleotide synthesis factory, I discuss the fundamental concepts of sampling, statistical power, and experimental design in the context of my work as a data scientist at Synthego, a biotech manufacturing startup.
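
As a rough illustration of the kind of simulation the abstract describes (a minimal sketch, not the speaker's actual code), the example below repeatedly draws two groups from one existing data set with the pandas `sample` method and records how often Welch's t-test rejects the null hypothesis. With no added effect, that fraction estimates the false positive rate on this data; with an added shift, it estimates statistical power. The `history` data, the `yield_pct` column, the `rejection_rate` helper, the group size, and the effect size are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, skewed stand-in for historical factory data.
# The column name "yield_pct" is hypothetical.
rng = np.random.default_rng(0)
history = pd.DataFrame({"yield_pct": rng.gamma(shape=2.0, scale=5.0, size=5000)})


def rejection_rate(df, column, n_per_group, effect=0.0,
                   n_trials=1000, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which Welch's t-test rejects H0."""
    rejections = 0
    for i in range(n_trials):
        # Both groups are resampled (with replacement) from the same data,
        # so any true difference comes only from the added `effect`.
        control = df[column].sample(n=n_per_group, replace=True,
                                    random_state=seed + 2 * i)
        treated = df[column].sample(n=n_per_group, replace=True,
                                    random_state=seed + 2 * i + 1) + effect
        _, p_value = stats.ttest_ind(control.to_numpy(), treated.to_numpy(),
                                     equal_var=False)
        rejections += int(p_value < alpha)
    return rejections / n_trials


# effect=0: how often the test rejects when there is no real difference.
false_positive_rate = rejection_rate(history, "yield_pct", n_per_group=20)

# effect=5: how often the test detects a shift of 5 units (estimated power).
power = rejection_rate(history, "yield_pct", n_per_group=20, effect=5.0)

print(f"false positive rate: {false_positive_rate:.3f}")
print(f"estimated power:     {power:.3f}")
```

If the estimated false positive rate is far from the nominal `alpha`, the test's assumptions are probably not well satisfied at that sample size. Rerunning with different `n_per_group` values shows how many samples a planned experiment would need to reach a target power.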

Bio: Aaron Wiegel is a data scientist at Synthego, a biotech manufacturing startup. He obtained his PhD in physical chemistry from UC Berkeley, where he first learned Python to create simulations of collisions between atoms and molecules using numpy and scipy. As a data scientist, he now creates automated machine learning pipelines for mass spectrometry data and uses numerical simulations to help design experiments for an automated chemistry and biology laboratory. In addition to his professional work, Aaron also volunteers teaching community college math, statistics, and science courses to California state prison inmates. For fun, he brews his own beer at home, where he performs much tastier experiments than in the lab.

 boering@berkeley.edu