Julia Silge's Blog, page 5
August 14, 2021
Predict housing prices in Austin TX with tidymodels and xgboost
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. My screencasts lately have focused on xgboost as I have participated in SLICED, a competitive data science streaming show. This past week were the semifinals, where we competed to predict prices of homes in Austin, TX. One of the more interesting available variables for this dataset was the text description of the real estate listings, ...
August 12, 2021
Supervised Machine Learning for Text Analysis in R is now complete
Last summer, Emil Hvitfeldt and I announced that we had started work on a new book project, to be published in the Chapman & Hall/CRC Data Science Series, and we are now happy to say that Supervised Machine Learning for Text Analysis in R (or SMLTAR, as we call it for short) is complete, in production, and available for preorder! You should be able to preorder it anywhere you normally buy books, such as Bookshop, Amazon, or directly from our publisher. The book is available in its entirety ...
August 6, 2021
Tune xgboost models with early stopping to predict shelter animal status
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. I participated in this week's episode of the SLICED playoffs, a competitive data science streaming show, where we competed to predict the status of shelter animals. I used xgboost's early stopping feature as I competed, so let's walk through how and when to try that out!
Here is the code I used in the video, for those who pr...
July 28, 2021
Use racing methods to tune xgboost models and predict home runs
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. This week's episode of SLICED, a competitive data science streaming show, had contestants compete to predict home runs in recent baseball games. Honestly I don't know much about baseball, but the finetune package had a recent release and this challenge offers a good opportunity to show how to use racing methods for tuning.
Here...
July 12, 2021
Predict which #TidyTuesday Scooby Doo monsters are REAL with a tuned decision tree model
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. Today's screencast walks through how to train and evaluate a random forest model, with this week's #TidyTuesday dataset on Scooby Doo episodes.
Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Explore data
Our modeling goal is to predict which Scooby Doo monsters are real a...
June 29, 2021
Create a custom metric with tidymodels and NYC Airbnb prices
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just getting started to tuning more complex models. This week's episode of SLICED, a competitive data science prediction challenge, introduced a challenge for predicting the prices of Airbnb listings in NYC. In today's screencast, I walk through how to build such a model combining tabular data with unstructured text data from the listing names, and how to create a custom metric in tidymod...
June 20, 2021
Class imbalance and classification metrics with aircraft wildlife strikes
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. I recently participated in SLICED, a competitive data science prediction challenge. I did not necessarily cover myself in glory but in today's screencast, I walk through the data set on aircraft wildlife strikes we used and how different choices around handling class imbalance affect different classification metrics....
May 27, 2021
Partial dependence plots with tidymodels and DALEX for #TidyTuesday Mario Kart world records
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. Today's screencast walks through how to train and evaluate a random forest model, with this week's #TidyTuesday dataset on Mario Kart world records.
Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Explore data
Our modeling goal is to predict whether a...
May 5, 2021
Predict availability in #TidyTuesday water sources with random forest models
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. Today's screencast walks through how to train and evaluate a random forest model, with this week's #TidyTuesday dataset on water sources.
Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Explore data
Our modeling goal is to predict whether a water sou...
April 27, 2021
Estimate change in #TidyTuesday CEO departures with bootstrap resampling
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. Today's screencast walks through how to use bootstrap resampling, with this week's #TidyTuesday dataset on CEO departures.
Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Explore data
Our modeling goal is to estimate how involuntary CEO departures a...