Julia Silge's Blog, page 9

April 13, 2020

PCA and the #TidyTuesday best hip hop songs ever

Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m exploring a different part of the tidymodels framework; I’m showing how to implement principal component analysis via recipes with this week’s #TidyTuesday dataset on the best hip hop songs of all time as determined by a BBC poll of music critics.

Here is the code I used in the video, for those who...

Published on April 13, 2020 17:00

April 1, 2020

Bootstrap resampling with #TidyTuesday beer production data

I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m using this week’s #TidyTuesday dataset on beer production to show how to use bootstrap resampling to estimate model parameters.

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore the data

Our modeling goal here is to estimate how much sugar beer producers use...

Published on April 01, 2020 17:00

March 25, 2020

Tuning random forest hyperparameters with #TidyTuesday trees data

I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m using a #TidyTuesday dataset from earlier this year on trees around San Francisco to show how to tune the hyperparameters of a random forest model and then use the final best model.

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore the data

Our modeling goal here is...

Published on March 25, 2020 17:00

March 16, 2020

LASSO regression using tidymodels and #TidyTuesday data for The Office

I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m using this week’s #TidyTuesday dataset on The Office to show how to build a LASSO regression model and choose regularization parameters!

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore the data

Our modeling goal here is to predict the IMDB ratings for episodes of...

Published on March 16, 2020 17:00

March 9, 2020

Preprocessing and resampling using #TidyTuesday college data

I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first getting started to how to tune machine learning models. Today, I’m using this week’s #TidyTuesday dataset on college tuition and diversity at US colleges to show some data preprocessing steps and how to use resampling!

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore the data

Our modeling goal here is to predict which US colleges have...

Published on March 09, 2020 17:00

February 17, 2020

Hyperparameter tuning and #TidyTuesday food consumption

Last week I published a screencast demonstrating how to use the tidymodels framework and specifically the recipes package. Today, I’m using this week’s #TidyTuesday dataset on food consumption around the world to show hyperparameter tuning!

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore the data

Our modeling goal here is to predict which countries are Asian countries and which countries are not, based on their patterns of food...

Published on February 17, 2020 16:00

February 10, 2020

#TidyTuesday hotel bookings and recipes

Last week I published my first screencast showing how to use the tidymodels framework for machine learning and modeling in R. Today, I’m using this week’s #TidyTuesday dataset on hotel bookings to show how to use one of the tidymodels packages, recipes, with some simple models!

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore the data

Our modeling goal here is to predict which hotel stays include children (vs. do not include...

Published on February 10, 2020 16:00

February 4, 2020

#TidyTuesday and tidymodels

This week I started my new job as a software engineer at RStudio, working with Max Kuhn and other folks on tidymodels. I am really excited about tidymodels because my own experience as a practicing data scientist has shown me some of the areas for growth that still exist in open source software when it comes to modeling and machine learning. Almost nothing has had the kind of dramatic impact on my productivity that the tidyverse and other RStudio investments have had; I am enthusiastic about...

Published on February 04, 2020 16:00

December 31, 2019

About

Published on December 31, 2019 16:00

December 30, 2019

Modeling salary and gender in the tech industry

One of the biggest projects I have worked on over the past several years is the Stack Overflow Developer Survey, and one of the most unique aspects of this survey is the extensive salary data that is collected. This salary data is used to power the Stack Overflow Salary Calculator, and has been used by various folks to explore how people who use spaces make more than those who use tabs, whether that’s just a proxy for open source contributions, and more. I recently left my job as a data...

Published on December 30, 2019 16:00