Overcome performance difficulties in R with a range of exciting techniques and solutions.
This book is for programmers and developers who want to improve the performance of their R programs, either by making them run faster on large data sets or by solving a pesky performance problem. With the increasing use of information in all areas of business and science, R provides an easy and powerful way to analyze and process the vast amounts of data involved. It is one of the most popular tools today for fast data exploration, statistical analysis, and statistical modeling, and it can generate useful insights and discoveries from large amounts of data. Through this practical and varied guide, you will become equipped to solve a range of performance problems in R programming. You will learn how to profile and benchmark R programs, identify bottlenecks, determine whether performance is limited by the CPU, memory, or disk input/output, and optimize the computational speed of your R programs using techniques such as vectorizing computations. You will then move on to more advanced techniques, such as compiling code, tapping into the computing power of GPUs, optimizing memory consumption, and handling larger-than-memory data sets using disk-based memory and chunking.
R High Performance Programming is an excellent book to help R programmers take advantage of all the features of the language and solve the practical performance problems or bottlenecks they might encounter when processing large amounts of data.
Every chapter explains a topic with simple but meaningful examples, showing which tools to use and how to pick the right packages from CRAN. Readers can also choose to read only certain chapters if they do not want to delve into the more advanced topics.
The book starts by introducing the reader to R's features, its internals, and its memory model. The authors then explain how to measure code performance correctly and present the basic tricks that can be adopted when writing an R program to ensure that the code runs fast and does not waste computing resources.
After this introductory part, the book covers a set of advanced techniques to make your programs run even faster, such as compiling code for use from R, accelerating computation on the GPU, reducing the memory footprint, and processing large datasets with limited resources.
For scenarios where the previous techniques are not sufficient, the last chapters deal with parallelizing R programs, offloading processing to a database, and handling Big Data, with an example running on AWS.
In conclusion, I really enjoyed reading this book. It is written in a simple, understandable manner that suits less experienced programmers too, and even complex subjects are explained clearly.
Though R has become extremely popular with data analysts, performance remains a key challenge for R users. This book addresses a wide range of techniques to overcome this challenge, including parallelism, memory management, and running processes on the GPU. A very useful book for practitioners.