Tidy Text Mining with R

Name: Tidy Text Mining with R
Rating: 4.33 (13 reviews)

Julia Silge, David Robinson

If you work in analytics or data science, like we do, you are familiar with the fact that data is being generated all the time at ever faster rates. (You may even be a little weary of people pontificating about this fact.) Analysts are often trained to handle tabular or rectangular data that is mostly numeric, but much of the data proliferating today is unstructured and typically text-heavy. Many of us who work in analytic fields are not trained in even simple interpretation of natural language.

We developed a new R package, tidytext (Silge and Robinson 2016), because we were familiar with many methods for data wrangling and visualization, but couldn’t easily apply these same methods to text. We found that using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. By treating text as data frames of words, we can manipulate, summarize, and visualize the characteristics of text easily and integrate natural language processing into effective workflows we were already using.

The tools provided by the tidytext package are relatively simple; what is important is the possible applications. Thus, this book provides compelling examples of real text mining problems.

Genres: Programming, Nonfiction, Coding, Technical, Technology, Computer Science, Science

ebook

Published January 8, 2017

76 people are currently reading

217 people want to read

About the author

Julia Silge

4 books, 30 followers

Community Reviews

5 stars: 81 (50%)
4 stars: 55 (34%)
3 stars: 18 (11%)
2 stars: 5 (3%)
1 star: 0 (0%)

Displaying 1 - 13 of 13 reviews

Joanna

130 reviews

December 19, 2023

Did I read this for work? Yes. Was it useful? Also yes. Am I going to count it towards my total for the year? You bet!!

2023 nonfiction

Justohidalgo

79 reviews, 3 followers

August 10, 2020

A really comprehensive book about text mining with R and tidy tools. While it is understood that some knowledge of R and tidy principles is required to work through the examples, at around the TF-IDF chapter I started to feel that I was spending more time searching Google to see what a specific R function was doing than grasping the theoretical concepts applied to the cases. That made me lose interest and want to find other references. But I finally found the time to finish it, and I have to say that, all in all, this is a good book for seeing how to handle basic-to-medium use cases of text mining with R. I believe it may become a reference book for me when I work on my own datasets.

technical

Sahelanth

45 reviews, 6 followers

Read

July 21, 2018

Great code examples! Easy to emulate, shows the necessary data cleaning and preprocessing and gives good tips for what to do in other contexts. You'll need to be already familiar with R and the dplyr package to get anything out of this book, though.

If you don't know R or dplyr and want to jump straight into natural language processing, I'd instead recommend starting with the vignettes for the tm or quanteda packages.

262 reviews, 35 followers

September 6, 2020

It covers the basics (sentiment analysis, tf-idf, n-grams, topic modelling, and visualization) well, and the case study chapters are pretty helpful. The use of literature (Jane Austen's novels and more) as data also makes it more engaging to a literary-minded reader.

It's just that when the authors say "slightly familiar with dplyr and ggplot2" in the preface, they mean they are not going to explain any code relating to these two packages. Compared to the code annotated line by line in other online tutorials, this book may not be that accessible to a beginner.

On topic modelling, you may want to look up how to determine the number of topics, as more systematic approaches to choosing it are not covered.

Emil O. W. Kirkegaard

183 reviews, 395 followers

May 10, 2020

Disclaimer: I am not an expert on text mining, but I do have ~8 years of data science experience.

This was a very nice introduction to doing it in R, and the examples were very interesting too. In general, I recommend books by these authors.

My only complaint is that they did not go into detail about how they scraped Twitter posts. The API is quite annoying and limited, so one might have to do some regular web scraping. Guess I should read a textbook on that next.

Book is free at https://www.tidytextmining.com/

Tony Murray

7 reviews

August 6, 2020

I enjoyed working through the book, but it is a bit dated at this point and has some parts that no longer work due to outdated packages. At times I had to go to the website and review what they had updated there. There are also times when the code is not set up for a reader to actually execute it. For example, the code related to the Twitter files was a bit confusing. I had to go to the GitHub repository to download the data, and this should have been explained, since the book really is a mixture of coding and commentary.

Yuan

31 reviews, 1 follower

July 24, 2022

Although this is no longer the most up-to-date book on text mining, I really like a lot of the plots (ggplots) in it, particularly for exploratory analysis. You can inspect the outputs at each stage, and visuals are a great way to make sense of text data and communicate your findings. I will definitely reuse the plots in this book for further work.

Thien Dong

8 reviews

March 9, 2019

A great primer for text mining

The examples are interesting and very easy to follow. If you have any problem applying the techniques to your data set, just a quick search would lead you to the solutions!

Robert Campbell

Author of 9 books, 17 followers

June 20, 2019

Excellent coverage of taking a tidy approach to text analysis, with a generous number of worked examples. The one drawback is that much of the code used requires at least an intermediate-level working knowledge of R.

Pritesh Shrivastava

80 reviews, 6 followers

September 5, 2018

I found this book fairly useful for doing sentiment analysis and topic modelling using the familiar tidyverse tools.

Luis Amigo

7 reviews

May 17, 2019

Nicely written, deep concepts, an easy read. IMHO a must-read for anyone interested in text mining, no matter which language they plan to use.

Anthony

154 reviews

May 22, 2019

Good overview of the tidytext library in R. Not the end-all of text analyses, but a good place to begin. I now need to get something to do some analyses on...

coding

Oconnor

76 reviews, 2 followers

April 30, 2021

Awesome book, with great step-by-step code to follow. The authors clearly explain analytical questions and walk through their analysis. It's so good I read it twice.
