A free, interactive course using caret

Predictive modeling, or supervised machine learning, is a powerful tool for using data to make predictions about the world around us. Once you understand the basic ideas of supervised machine learning, the next step is to practice your skills so you know how to apply these techniques appropriately. In this course, you will work through four case studies using real-world data; you will gain experience in exploratory data analysis, preparing data for predictive modeling, training supervised machine learning models with the caret package, and evaluating those models.

To take this course, you need some familiarity with tidyverse packages like dplyr and ggplot2 and exposure to machine learning basics. Now let's get started!

Chapter 1: Not mtcars AGAIN

In this first case study, you will predict fuel efficiency for real, present-day cars using a US Department of Energy data set.

Chapter 2: Stack Overflow Developer Survey

Stack Overflow is the world's largest online community for developers, and you have probably used it to find an answer to a programming question. The second chapter of this course uses data from the annual Stack Overflow Developer Survey to practice predictive modeling and find which developers are more likely to work remotely.

Chapter 3: Get out the vote

In the third case study, you will use data on attitudes and beliefs in the United States to predict voter turnout. You will apply your skills in dealing with imbalanced data and explore more resampling options.

Chapter 4: But what do the nuns think?

The last case study in this course uses an extensive survey of Catholic nuns fielded in 1967 to once more put your practical machine learning skills to use. You will predict the age of these religious women from their responses about their beliefs and attitudes.

About this course

This is a free, open-source course on supervised machine learning in R using the caret package. In this course, you'll work through four case studies and practice skills from exploratory data analysis through model evaluation. Ines Montani designed the web framework that runs this course, and Florencia D'Andrea helped build the site.

Contributions and comments on how to improve this course are welcome! Please file an issue or submit a pull request if you find something that could be fixed or improved.

Creative Commons License

About me

My name is Julia Silge, and I'm a data scientist and software engineer at RStudio, where I build modeling tools. I am both an international keynote speaker and a real-world practitioner focused on data analysis and machine learning practice. I love making beautiful charts and communicating about technical topics with diverse audiences.