Learn what it takes to succeed in the most in-demand tech job. Harvard Business Review calls it the sexiest tech job of the 21st century. Data scientists are in demand, and this unique book shows you exactly what employers want and the skill set that separates the quality data scientist from other talented IT professionals. Data science involves extracting, creating, and processing data to turn it into business value. With over 15 years of big data, predictive modeling, and business analytics experience, author Vincent Granville is no stranger to data science. In this one-of-a-kind guide, he provides insight into the essential data science skills, such as statistics and visualization techniques, and covers everything from analytical recipes and data science tricks to common job interview questions, sample resumes, and source code. The applications are endless: automatically detecting spam and plagiarism, optimizing bid prices in keyword advertising, identifying new molecules to fight cancer, and assessing the risk of meteorite impact. Complete with case studies, this book is a must, whether you're looking to become a data scientist or to hire one.
Explains the finer points of data science, the required skills, and how to acquire them, including analytical recipes, standard rules, source code, and a dictionary of terms.
Shows what companies are looking for and how the growing importance of big data has increased the demand for data scientists.
Features job interview questions, sample resumes, salary surveys, and examples of job ads.
Case studies explore how data science is used on Wall Street, in botnet detection, for online advertising, and in many other business-critical situations.
Developing Analytic Talent: Becoming a Data Scientist is essential reading for those aspiring to this hot career choice and for employers seeking the best candidates.
There are some good nuggets of information here and in many cases way too much information. But then maybe that is the hazard of being a data scientist/author--you want to give everybody ALL the information so that they can then dig out the nuggets themselves. Unfortunately that does not make a good technical book. IMPO (in my professional opinion as a former editor for technical books) a good technical book is an author taking the reader by the hand and telling them what they need to know--not spewing gads of links, bulleted lists and drifting off topic in the middle of the chapter to explore a math problem. I could blame the editor or maybe this was just a cowboy author. Obviously the author is brilliant--but that doesn't make this a good or easy book to read.
The first three and last two chapters here are the most useful for introductions to developing analytic talent, and of course, reading this nearly 10 years after its publication certainly changes some of the usefulness of many of the now-dated pieces of advice.
I agree with many of the other reviewers here in their statements that the book is very disorganized and strays into incredibly technical discussions, diluting its ability to really convey much of anything.
The one thing that I did find interesting was that the author predicted that data engineering would be the hot buzzword in 2022. That does seem to have panned out.
Here are some of the weird things I mentally noted while reading this tome:
1) "Fake" Data Science: The author really, REALLY does not like people who do "fake" data science. As far as I can tell, "fake" data science might be title bloat, novices, or people who don't work with "big data" (10 million rows +), or perhaps a blend of all of these. This just felt like the author sneering at others rather than trying to elevate everyone in the field.
2) Cornstarch in Yogurt: While discussing a case study about inventory and stock analysis at grocery stores, the author goes on a weird tangent about how he could only find a single brand of whole-fat, unsweetened yogurt at his local grocery store. Did you know that the low-fat yogurt has more calories in it because they add cornstarch to it? That was something I learned from this book.
3) Missing Parts of Chapter 3: At the end of each chapter, the author closes with a summary. Chapter 3's includes the following statement: "Different types of data science career paths were also discussed, including entrepreneur, consultant, individual contributor, and leader". That's cool, right? Yeah, when I read that, I was like, "What?", and I had to go back and check because I didn't recall him ever talking about ICs or leaders as roles. Sure enough, he only talks about entrepreneurship or consulting. Weird. What happened to that content? (insert X-Files theme)
4) Disneyland is Too Busy: The author went to Disneyland and found it was way too busy. He has some ideas for how to improve things, including charging people money for taking too long to buy their tickets (???), putting a gondola/Ferris Wheel style ride at the entrance of the park, adding new eateries, showing peak days and pricing tickets accordingly. He also suggests creating rides that have continuous loading rather than stop-and-go.
He must think he's very smart and that his readers have never been to Disneyland. Well, too bad for the author, but I've been to Disneyland quite a bit and I think it's really strange that he says these things because:
a) Disney has been showing peak days and pricing tickets accordingly for quite a while, definitely back to 2013. This information already existed in supplemental guidebooks as well. I know, because I was there AND I read those guidebooks.
b) Disney already has a PATENTED continuous loading system called the Omnimover that's implemented in the Haunted Mansion. They invented it in order to reduce loading times and lines. Truly, Dr. Granville, you are a genius for suggesting such a novel idea.
c) What about a gondola or Ferris Wheel on Main Street helps with anything except for disrupting the Main Street area?
d) Who spends more than 5 minutes buying physical tickets in 2013 when most ticketing was moved online and why was the author so bothered by people buying their tickets at separate kiosks??
Anyways, after this, he gets into a lot of technicalities that I had to skim because nothing is explained for someone who wants to learn but doesn't yet have the technical capacities. I would not recommend anyone who is looking for an introductory text to start here.
This book doesn't even follow a reasonable structure. The content goes all over the place, and it's full of shameless plugs for the author and his website. There's no real substance in this book. I did find a couple of interesting chapters; that's the only reason it's not getting a one-star rating from me.
According to Amazon reviews, this book turns out to be a fake, just as I would have imagined based on my reading of the author's blogs. Since there's no way to mark it "don't want to read", I'll just spread the word here. Go back to basic statistical learning theory.
A bit too cynical in parts for me. You need a good foundation in statistics, computer science, and operations research to be an effective data scientist. For someone who earned their PhD in statistics he sure doesn't seem to appreciate what the field has offered to data science.
Don't waste your time reading this book. I did and, obviously, I regret it. I stumbled upon this book while looking for books on "practical" data science. This book is mostly fluff. A large part of it is dedicated to hyping up the author himself. In the parts where he's supposed to describe an algorithm, there's no algorithm. I read and reread sections where he describes his "patented" "hidden decision tree" algorithm. I don't know if it's meant to be ironic, but the algorithm was "hidden" alright. Lots about what his algorithm does but no mention of the algorithm itself. He talks a lot about fraud detection, stock exchange predictions, etc., you know, buzzwords, but offers no real insight into any of these fields aside from quite superficial stuff anyone with a passing interest in these fields can come up with. The rest of it is the same. All you develop after reading this book is heightened mistrust for tall claims from self-proclaimed gurus. So maybe it wasn't a total waste for me.
Over the past few years big data has become an increasingly important theme for those working in the field, known as data scientists. This book highlights that there are many misconceptions about this topic, with confusion about what can be defined as big data, as well as just who might actually be called a data scientist and who is merely a statistician.
The author suggests that there are many people who assume that big data is only about performing statistical analysis on larger data sets, and he goes on to highlight that the role actually encompasses a much larger number of skills.
Amongst these is the ability to program, an appreciation of business processes, good communication skills, an intuitive understanding of the structure of the data being analysed as well as a solid grasp of statistical mathematics. The book provides a large amount of information about the methodology of the data analysis process, along with a number of examples to illustrate how the work might be undertaken.
It’s a seriously weighty volume, with a lot of technical material in the form of complex mathematics; it appears to be written as a textbook suitable for advanced students who wish to move into this field of knowledge. Unfortunately, I found insufficient explanation to allow me to follow all of the material; and in addition, in many places, I was forced to re-read sections a number of times to try to follow the arguments proposed.
Although the book is structured in a way that might suggest it could be used as a reference work, I suspect that it might not be quite suitable for that purpose; I felt that most of the chapters would have to be read as a whole, and perhaps even several chapters together, in order to understand the points being made and then be able to make use of them.
The book clearly has a lot of important points to make; and for those that already work in this field, it would certainly provide some useful advice and guidance. Those with strong statistics or pure mathematics backgrounds might also find it of some value in guiding the way in which to proceed when starting to work in this area. However, I think it unlikely that it would appeal to a larger set of people as it contains insufficient remedial information to help guide them through this complex topic.
At the beginning, you might be offended by the way the author defines "true" and "fake" data scientists. But once you feel OK with it, you'll find this book very useful. First, you learn how to become a data scientist: the skills that you need, the communities that you should join, and resources for you to learn from. Second, there are some real examples/applications in which the author gives you the perspective of an experienced data scientist: how you analyze them, how you deal with such problems, and what should be avoided.
Overall, this book may not satisfy people who want to read detailed technical things, but it can give you a lot of concepts and ideas. It's good for people who are new to data science and want to have a general view of the industry.
I'm a senior marketer trying to find out the possibilities and limitations of big data in my field. This book requires some development and statistical experience, but if you have it, it gives a valuable view of how to approach and work with big data today.
Very useful and inspiring stuff, and very much centered on real-world business situations that I recognize from my daily work. I'm recommending this book to all my friends who are analysts and statisticians moving towards big data. It's not a complete guide, but a valuable standpoint from which to form your own path.