Data Carpentry: Training to Enable More Effective and Reproducible Research

Lecture | February 10 | 1:10-2:30 p.m. | 190 Doe Library

Berkeley Institute for Data Science

Although petabytes of data are now available, most scientific disciplines are failing to translate this sea of data into scientific advances. The missing link between data collection and research progress is training: scientists lack crucial skills for managing and analyzing large amounts of data effectively and reproducibly. Already faced with a deluge of data, researchers themselves are demanding this training and need to learn on the job. They require training that is immediate, accessible, appropriate for their level, and relevant to their domain. This training needs to cover not only technical skills but also ways of thinking about data, giving learners both a sense of what is possible and the confidence to continue learning on their own. Short, intensive, hands-on Software and Data Carpentry workshops give researchers the opportunity to engage in deliberate practice as they learn these skills, starting with strong foundational skills and receiving feedback as they learn. This model has been shown to be effective: the vast majority of learners (more than 90%) say that participating in the workshop was worth their time and led to improvements in their data-management and data-analysis skills. Since 2014, we have trained more than 20,000 learners on six continents with more than 700 volunteer instructors, with the goal of providing effective training that empowers researchers to turn data into knowledge and discovery.