Julia Silge's Blog, page 5

August 14, 2021

Predict housing prices in Austin TX with tidymodels and xgboost

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. My screencasts lately have focused on xgboost as I have participated in SLICED, a competitive data science streaming show. This past week was the semifinals, where we competed to predict prices of homes in Austin, TX. One of the more interesting available variables for this dataset was the text description of the real estate listings, ...

Published on August 14, 2021 17:00

August 12, 2021

Supervised Machine Learning for Text Analysis in R is now complete

Last summer, Emil Hvitfeldt and I announced that we had started work on a new book project, to be published in the Chapman & Hall/CRC Data Science Series, and we are now happy to say that Supervised Machine Learning for Text Analysis in R (or SMLTAR, as we call it for short) is complete, in production, and available for preorder! You should be able to preorder it anywhere you normally buy books, such as Bookshop, Amazon, or directly from our publisher. The book is available in its entirety ...

Published on August 12, 2021 17:00

August 6, 2021

Tune xgboost models with early stopping to predict shelter animal status

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. I participated in this week's episode of the SLICED playoffs, a competitive data science streaming show, where we competed to predict the status of shelter animals. I used xgboost's early stopping feature as I competed, so let's walk through how and when to try that out!

Here is the code I used in the video, for those who pr...

Published on August 06, 2021 17:00

July 28, 2021

Use racing methods to tune xgboost models and predict home runs

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. This week's episode of SLICED, a competitive data science streaming show, had contestants compete to predict home runs in recent baseball games. Honestly I don't know much about baseball, but the finetune package had a recent release and this challenge offers a good opportunity to show how to use racing methods for tuning.

Here...

Published on July 28, 2021 17:00

July 12, 2021

Predict which #TidyTuesday Scooby Doo monsters are REAL with a tuned decision tree model

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. Today's screencast walks through how to train and evaluate a random forest model, with this week's #TidyTuesday dataset on Scooby Doo episodes.

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore data

Our modeling goal is to predict which Scooby Doo monsters are real a...

Published on July 12, 2021 17:00

June 29, 2021

Create a custom metric with tidymodels and NYC Airbnb prices

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. This week's episode of SLICED, a competitive data science prediction challenge, introduced a challenge for predicting the prices of Airbnb listings in NYC. In today's screencast, I walk through how to build such a model combining tabular data with unstructured text data from the listing names, and how to create a custom metric in tidymod...

Published on June 29, 2021 17:00

June 20, 2021

Class imbalance and classification metrics with aircraft wildlife strikes

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. I recently participated in SLICED, a competitive data science prediction challenge. I did not necessarily cover myself in glory, but in today's screencast, I walk through the data set on aircraft wildlife strikes we used and how different choices around handling class imbalance affect different classification metrics....

Published on June 20, 2021 17:00

May 27, 2021

Partial dependence plots with tidymodels and DALEX for #TidyTuesday Mario Kart world records

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. Today's screencast walks through how to train and evaluate a random forest model, with this week's #TidyTuesday dataset on Mario Kart world records.

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore data

Our modeling goal is to predict whether a...

Published on May 27, 2021 17:00

May 5, 2021

Predict availability in #TidyTuesday water sources with random forest models

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. Today's screencast walks through how to train and evaluate a random forest model, with this week's #TidyTuesday dataset on water sources.

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore data

Our modeling goal is to predict whether a water sou...

Published on May 05, 2021 17:00

April 27, 2021

Estimate change in #TidyTuesday CEO departures with bootstrap resampling

This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. Today's screencast walks through how to use bootstrap resampling, with this week's #TidyTuesday dataset on CEO departures.

Here is the code I used in the video, for those who prefer reading instead of or in addition to video.

Explore data

Our modeling goal is to estimate how involuntary CEO departures a...

Published on April 27, 2021 17:00