Julia Silge's Blog, page 6
April 22, 2021
Which #TidyTuesday Netflix titles are movies and which are TV shows?
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from just starting out to tuning more complex models with many hyperparameters. Today’s screencast walks through how to build features for modeling from text, with this week’s #TidyTuesday dataset on Netflix titles.
Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Explore data
Our modeling goal is to predict whether a title on Ne...
April 13, 2021
Which #TidyTuesday post offices are in Hawaii?
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from starting out with first modeling steps to tuning more complex models. Today’s screencast walks through how to use text information at the subword level in predictive modeling, with this week’s #TidyTuesday dataset on United States post offices.
Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Explore data
Our modeling goal i...
March 23, 2021
Dimensionality reduction of #TidyTuesday United Nations voting patterns
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from starting out with first modeling steps to tuning more complex models. One change I have recently made on my blog is to remove Disqus comments. I want to say a huge THANK YOU to everyone who ever commented on my blog before and express how much I appreciate folks’ interest and willingness to share their thoughts. Disqus was becoming frustrating for a couple of reasons, so I downloaded my...
March 3, 2021
Bootstrap confidence intervals for #TidyTuesday Super Bowl commercials
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from starting out with first modeling steps to tuning more complex models. Today’s screencast uses a relatively new function from rsample for quickly finding bootstrap confidence intervals, with this week’s #TidyTuesday dataset on Super Bowl commercials.
Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Explore the data
Our modeling g...
February 23, 2021
Getting started with k-means and #TidyTuesday employment status
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from starting out with first modeling steps to tuning more complex models. Today’s screencast uses the broom package to tidy output from k-means clustering, with this week’s #TidyTuesday dataset on employment and demographics.
Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Explore the data
Our modeling goal is to use k-means c...
February 11, 2021
Understand your models with #TidyTuesday inequality in student debt
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from starting out with first modeling steps to tuning more complex models. Today’s screencast is a short one! It walks through how we can use tidyverse and tidymodels functions to explore a model after we have trained it, using this week’s #TidyTuesday dataset on student debt inequality.
Here is the code I used in the video, for those who prefer reading instead of or in addition ...
February 1, 2021
Learn tidytext with my new learnr course
Today I am happy to announce that a new free, online, open source, interactive tutorial, Text Mining with Tidy Data Principles, has been published!
I previously developed an interactive course on text mining for an online learning company, but that course is no longer available. I’ve been wanting to revisit the ideas behind that course, update them, and make a new tutorial freely available for a long time, much like I did for my supervised machine learning course; I recently sat down and...
January 14, 2021
Explore art media over time in the #TidyTuesday Tate collection dataset
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from starting out with first modeling steps to tuning more complex models. Today’s screencast walks through how to train a regularized regression model with text features and then check model diagnostics like residuals, using this week’s #TidyTuesday dataset on the artwork in the Tate collection.
Here is the code I used in the video, for those who prefer reading instead of or in additi...
January 3, 2021
Predicting injuries for Chicago traffic crashes
This is the latest in my series of screencasts demonstrating how to use the tidymodels packages, from starting out with first modeling steps to tuning more complex models. Instead of Tidy Tuesday data, this screencast uses some “wild caught” data from Chicago’s open data portal and is planned to be the first in a series walking through how to approach model ops tasks using tidymodels and other R tools. This screencast focuses on training a model, for traffic crashes in Chicago. We can build a...
December 15, 2020
Upcoming changes to tidytext: threat of COLLAPSE
The tidytext package passed one million downloads from CRAN this year! It has been truly amazing to see this project grow out of an rOpenSci unconference several years ago to be a piece of software useful to people’s real world work.
Some of the package’s infrastructure is still around from its very early days, and as more people have continued to use it, some early decisions have needed to be revisited. I recently made some updates that fix what most people would consider a bug...