Dissertation Talk: Towards Automatic Machine Learning Pipeline Design

Presentation: Dissertation Talk: CS | May 10 | 1-2 p.m. | 380 Soda Hall

 Mitar Milutinovic

 Electrical Engineering and Computer Sciences (EECS)

With the rapid increase in the amount of data being collected, the bottleneck in making informed decisions is no longer a lack of data but a lack of data scientists to analyze it. At the same time, many of the tasks a data scientist performs during analysis could be automated. Automatic machine learning (AutoML) studies approaches to automating parts of the data analysis process, or even automating it fully.

In this talk we will present our work on addressing this problem. We will describe the components needed to approach it using ML techniques themselves, along with our design of those components. We will focus on the task of designing end-to-end ML pipelines, which includes not just model selection but also data cleaning, feature extraction, and many other pre-processing and post-processing steps. We will explain the issues that arise when comparing different AutoML approaches, and more generally when comparing ML pipelines at a large scale. We will propose solutions to these problems and compare them with the closest alternatives.

 jeannguyen@eecs.berkeley.edu, 510-642-9413