Julia Silge's Blog, page 10

October 11, 2019

Opioid prescribing habits in Texas

A paper I worked on was just published in a medical journal. This is quite an odd thing for me to be able to say, given my academic background and the career path I have had, but there you go! The first author of this paper is a long-time friend of mine working in anesthesiology and pain management, and he obtained data from the Texas Prescription Drug Monitoring Program (PDMP) about controlled substance prescriptions from April 2015 to 2018. The DEA also provides data about controlled...

Published on October 11, 2019 17:00

September 22, 2019

(Re)Launching my supervised machine learning course

Today I am happy to announce a new(-ish), free, online, interactive course that I have developed, Supervised Machine Learning: Case Studies in R!

Supervised machine learning in R

Predictive modeling, or supervised machine learning, is a powerful tool for using data to make predictions about the world around us. Once you understand the basic ideas of supervised machine learning, the next step is to practice your skills so you know how to apply these techniques wisely and appropriately. In...

Published on September 22, 2019 17:00

August 25, 2019

Practice using lubridate... THEATRICALLY

I am so pleased to now be an RStudio-certified tidyverse trainer! I have been teaching technical content for decades, whether in a university classroom, developing online courses, or leading workshops, but I still found this program valuable for my own professional development. I learned a lot that is going to make my teaching better, and I am happy to have been a participant. If you are looking for someone to lead trainings or workshops in your organization, you can check out this list of...

Published on August 25, 2019 17:00

July 7, 2019

Introducing tidylo

Today I am so pleased to introduce a new package for calculating weighted log odds ratios, tidylo.

Often in data analysis, we want to measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents. One statistic often used to find these kinds of differences in text data is tf-idf. Another option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability. We haven't counted every feature the same...

Published on July 07, 2019 17:00

June 30, 2019

Reordering and facetting for ggplot2

I recently wrote about the release of tidytext 0.2.1, and one of the most useful new features in this release is a couple of helper functions for making plots with ggplot2. These helper functions address a class of challenges that often arises when dealing with text data, so we've included them in the tidytext package.

Let's work through an example

To show how to use these new functions, let's walk through a more general example that does not deal with results that come from unstructured, free...

Published on June 30, 2019 17:00

June 13, 2019

Fixing your mistakes: sentiment analysis edition

Today tidytext 0.2.1 is available on CRAN! This new release of tidytext has a collection of nice new features.

- Bug squashing!
- Improvements to error messages and documentation
- Switching from broom to generics for lighter dependencies
- Addition of some helper plotting functions I look forward to blogging about soon

An additional change is significant and may be felt by you, the user, so I want to share a bit about it.

Once upon a time

When we started building tidytext back in 2016, one of...

Published on June 13, 2019 17:00

April 29, 2019

Relaunching the qualtRics package

Note: cross-posted with the rOpenSci blog.

rOpenSci is one of the first organizations in the R community I ever interacted with, when I participated in the 2016 rOpenSci unconf. I have since reviewed several rOpenSci packages and been so happy to be connected to this community, but I have never submitted or maintained a package myself. All that changed when I heard the call for a new maintainer for the qualtRics package. "IT'S GO TIME," I thought.

Qualtrics is an online survey and data...

Published on April 29, 2019 17:00

April 15, 2019

Writing a letter to DataCamp

Since 2017 I have been an instructor for DataCamp, the VC-backed online data science education platform. What this means is that I am not an employee, but I have developed content for the company as a contractor. I have two courses there, one on text mining and one on practical supervised machine learning.

About two weeks ago, DataCamp published a blog post outlining an incident of sexual misconduct at the company. The post was published one day after a group of over 100 instructors sent a...

Published on April 15, 2019 17:00

February 23, 2019

Read all about it! Navigating the R Package Universe

In the most recent issue of the R Journal, I have a new paper out with coauthors John Nash and Spencer Graves. Check out the abstract:

Today, the enormous number of contributed packages available to R users outstrips any given user's ability to understand how these packages work, their relative merits, or how they are related to each other. We organized a plenary session at useR!2017 in Brussels for the R community to think through these issues and ways forward. This session considered three...

Published on February 23, 2019 16:00