Julia Silge's Blog, page 10
October 11, 2019
Opioid prescribing habits in Texas
A paper I worked on was just published in a medical journal. This is quite an odd thing for me to be able to say, given my academic background and the career path I have had, but there you go! The first author of this paper is a long-time friend of mine working in anesthesiology and pain management, and he obtained data from the Texas Prescription Drug Monitoring Program (PDMP) about controlled substance prescriptions from April 2015 to 2018. The DEA also provides data about controlled...
September 22, 2019
(Re)Launching my supervised machine learning course
Today I am happy to announce a new(-ish), free, online, interactive course that I have developed, Supervised Machine Learning: Case Studies in R!
Predictive modeling, or supervised machine learning, is a powerful tool for using data to make predictions about the world around us. Once you understand the basic ideas of supervised machine learning, the next step is to practice your skills so you know how to apply these techniques wisely and appropriately. In...
August 25, 2019
Practice using lubridate... THEATRICALLY
I am so pleased to now be an RStudio-certified tidyverse trainer! I have been teaching technical content for decades, whether in a university classroom, developing online courses, or leading workshops, but I still found this program valuable for my own professional development. I learned a lot that is going to make my teaching better, and I am happy to have been a participant. If you are looking for someone to lead trainings or workshops in your organization, you can check out this list of...
July 7, 2019
Introducing tidylo
Today I am so pleased to introduce a new package for calculating weighted log odds ratios, tidylo.
Often in data analysis, we want to measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents. One statistic often used to find these kinds of differences in text data is tf-idf. Another option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability. We haven't counted every feature the same...
June 30, 2019
Reordering and facetting for ggplot2
I recently wrote about the release of tidytext 0.2.1, and one of the most useful new features in this release is a couple of helper functions for making plots with ggplot2. These helper functions address a class of challenges that often arises when dealing with text data, so we've included them in the tidytext package.
Let's work through an example
To show how to use these new functions, let's walk through a more general example that does not deal with results that come from unstructured, free...
June 13, 2019
Fixing your mistakes: sentiment analysis edition
Today tidytext 0.2.1 is available on CRAN! This new release of tidytext has a collection of nice new features.
Bug squashing!
Improvements to error messages and documentation
Switching from broom to generics for lighter dependencies
Addition of some helper plotting functions I look forward to blogging about soon
An additional change is significant and may be felt by you, the user, so I want to share a bit about it.
Once upon a time
When we started building tidytext back in 2016, one of...
April 29, 2019
Relaunching the qualtRics package
Note: cross-posted with the rOpenSci blog.
rOpenSci is one of the first organizations in the R community I ever interacted with, when I participated in the 2016 rOpenSci unconf. I have since reviewed several rOpenSci packages and been so happy to be connected to this community, but I have never submitted or maintained a package myself. All that changed when I heard the call for a new maintainer for the qualtRics package. IT'S GO TIME, I thought.
Qualtrics is an online survey and data...
April 15, 2019
Writing a letter to DataCamp
Since 2017 I have been an instructor for DataCamp, the VC-backed online data science education platform. What this means is that I am not an employee, but I have developed content for the company as a contractor. I have two courses there, one on text mining and one on practical supervised machine learning.
About two weeks ago, DataCamp published a blog post outlining an incident of sexual misconduct at the company. The post was published one day after a group of over 100 instructors sent a...
February 23, 2019
Read all about it! Navigating the R Package Universe
In the most recent issue of the R Journal, I have a new paper out with coauthors John Nash and Spencer Graves. Check out the abstract:
Today, the enormous number of contributed packages available to R users outstrips any given user's ability to understand how these packages work, their relative merits, or how they are related to each other. We organized a plenary session at useR!2017 in Brussels for the R community to think through these issues and ways forward. This session considered three...