Data Scientists at Work is a collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession. "Data scientist is the sexiest job in the 21st century," according to the Harvard Business Review. By 2018, the United States will experience a shortage of 190,000 skilled data scientists, according to a McKinsey report. Through incisive in-depth interviews, this book mines the what, how, and why of the practice of data science from the stories, ideas, shop talk, and forecasts of its preeminent practitioners across diverse industries: social network (Yann LeCun, Facebook); professional network (Daniel Tunkelang, LinkedIn); venture capital (Roger Ehrenberg, IA Ventures); enterprise cloud computing and neuroscience (Eric Jonas, formerly Salesforce.com); newspaper and media (Chris Wiggins, The New York Times); streaming television (Caitlin Smallwood, Netflix); music forecast (Victor Hu, Next Big Sound); strategic intelligence (Amy Heineike, Quid); oceanographic big data (André Karpištšenko, Planet OS); geospatial marketing intelligence (Jonathan Lenaghan, PlaceIQ); advertising (Claudia Perlich, Dstillery); fashion e-commerce (Anna Smith, Rent the Runway); specialty retail (Erin Shellman, Nordstrom); email marketing (John Foreman, MailChimp); predictive sales intelligence (Kira Radinsky, SalesPredict); and humanitarian nonprofit (Jake Porway, DataKind). Each of these data scientists shares how he or she tailors the torrent-taming techniques of big data, data visualization, search, and statistics to specific jobs by dint of ingenuity, imagination, patience, and passion.
Data Scientists at Work parts the curtain on the interviewees' earliest data projects, how they became data scientists, their discoveries and surprises in working with data, their thoughts on the past, present, and future of the profession, their experiences of team collaboration within their organizations, and the insights they have gained as they get their hands dirty refining mountains of raw data into objects of commercial, scientific, and educational value for their organizations and clients.
Who this book is for
The primary readership for this book is general-interest readers interested in this hot new profession and in the nature of the people who work up the readers' own data trails. The secondary readerships are (a) scientists, mathematicians, and students in feeder disciplines who are interested in scouting the vocational prospects and daily working conditions of data scientists with a view to becoming data scientists themselves, and (b) business colleagues and managers seeking to understand and collaborate with data scientists to integrate their data management and interpretation capabilities into the competitive intelligence capabilities of the enterprise.
Table of Contents
Chapter 1. Chris Wiggins (The New York Times)
Chapter 2. Caitlin Smallwood (Netflix)
Chapter 3. Yann LeCun (Facebook)
Chapter 4. Erin Shellman (Nordstrom)
Chapter 5. Daniel Tunkelang (LinkedIn)
Chapter 6. John Foreman (MailChimp)
Chapter 7. Roger Ehrenberg (IA Ventures)
Chapter 8. Claudia Perlich (Dstillery)
Chapter 9. Jonathan Lenaghan (PlaceIQ)
Chapter 10. Anna Smith (Rent the Runway)
Chapter 11. André Karpištšenko (Planet OS)
Chapter 12. Amy Heineike (Quid)
Chapter 13. Victor Hu (Next Big Sound)
Chapter 14. Kira Radinsky (SalesPredict)
Chapter 15. Eric Jonas (Independent Scientist)
Chapter 16. Jake Porway (DataKind)
This is the third book in this series that I’ve read (the others being Coders at Work and Founders at Work), and this is definitely the least glamorous because it doesn’t contain any celebrity coders admitting to debugging with println or any tales of derring-do like stealing Aeron chairs from Oracle in the middle of the night.
However, if you are interested in working with "data science" (or “statistics” as they call it on the East Coast) you’ll get a lot out of this book.
The book drags a bit in places and gets a bit samey as you might expect when interviewing over a dozen people about their job when they all do basically the same job.
But I recommend sticking with the whole thing. In particular the last two interviews are quite good, though Mr. Jonas’s blood sugar may be decreasing as his interview goes on. (Or maybe that’s his sense of humor — check out his Twitter feed.) I especially like Mr. Porway’s idea of baking pro bono work into the profession of data science as it develops. Just skim over the parts that you feel are repetitive.
A lot of these people have a heavy academic background, which I found interesting. It gives them a good perspective on the work they do at startups versus their time in academia. There is probably a larger point to be made about going from modeling subatomic particles to trying to get people to click on mobile ads, but if there is one this book sure isn’t trying to make it.
One thing I was surprised at was the Hadoop-hate by some of the data scientists. Some rightly point out that maybe it's not necessary to build out a massive cluster for a few TB of data. Others seemed to think it was some kind of fad. I was also surprised that Spark didn't come up more. (Can't remember when the interviews were done, but I bought the book right as it came out.)
Actually, I would have liked to hear more about the data infrastructure, but the thought of a book like DevOps At Work or Hadoop Admins At Work is too terrible to contemplate.
A few things about the edition that I read:
1. It has this weird pink color that my kids made fun of me about. (They are kind of mean.)
2. The ink is really glossy and reflected light in a weird way that I haven’t seen before. Sometimes I had to change the angle of the book to read it.
3. The font is this weird sans serif font that sometimes my eyes had trouble tracking.
So if I had to do it over again I think I’d read this one on the Kindle.
Here are the companies/data scientists interviewed in the book. I hadn’t heard of any of them before, though now I’m following some of them on Twitter:
Chris Wiggins, The New York Times
Caitlin Smallwood, Netflix
Yann LeCun, Facebook
Erin Shellman, Nordstrom
Daniel Tunkelang, LinkedIn
John Foreman, MailChimp
Roger Ehrenberg, IA Ventures
Claudia Perlich, Dstillery
Jonathan Lenaghan, PlaceIQ
André Karpištšenko, Planet OS
Amy Heineike, Quid
Victor Hu, Next Big Sound
Kira Radinsky, SalesPredict
Eric Jonas, UC Berkeley
Jake Porway, DataKind (non-profit)
FWIW, here are the notes I took while reading the book. It’s a list of the tools, articles, books, and websites mentioned by the data scientists over the course of their interviews.
Tools
- SN, turned into Lush (lisp shell?)
- Torch 7, used at FB and Google (scientific computing framework)
- R
- Recommendo (Nordstrom)
- Python: Pandas, scikit-learn
- Segmento (follow up to Recommendo)
- plyr and dplyr (break up data)
- Gephi (displaying graph data)
- Greenplum (MPP database)
- Vowpal Wabbit (machine learning)
- Weka (machine learning)
- Tableau, Matplotlib, D3.js, ggplot2 (viz)
Articles
- Taxonomy of data science
- Data science Venn diagram
- The model and the train wreck
- Unreasonable effectiveness of data
- The Log: What Every Software Engineer Should Know, Jay Kreps
- Lines not dots
- Hidden Biases of Big Data, Kate Crawford
- The Done Manifesto
- Lecture by Steven Boyd on 19thC math and science not working
- Review articles like Nature Reviews Genetics or Nature Reviews Neuroscience
- Blake Masters notes of Peter Thiel's start up class
Books
- Predictably Irrational, Dan Ariely
- Data Smart, John Foreman
- Applied Predictive Modeling, Kuhn
- Team Geek, Brian Fitzpatrick
- Big Data: A Revolution That Will Transform How We Live, Kenneth Cukier
- Moneyball, Michael Lewis
- The Up Side Of Down, Megan McArdle
- Probability Theory: The Logic Of Science, E.T. Jaynes
- The Human Face Of Big Data, Rick Smolan and Jennifer Erwitt
- The Macroscope, Joel de Rosnay
Web sites
- Cross Validated
- DataKind, help NGOs
- Kaggle
- quid.com/insights
- Pete Warden SDK for mobile apps using deep learning
- Vicarious (company)
- In The Pipeline, Derek Lowe, blog
Great insight into how data scientists from different backgrounds and industries view data science. The sixteen interviews show that each firm, depending on its sector and maturity, has a slightly different view of what data science means. Data science broadly covers analytics/modelling, visualisation, and data engineering, and the emphasis on each element depends on the type of output the data science team produces. There is a common theme though - each team or data scientist's goal is to translate a business problem into one that can be solved with data and algorithms, which requires not only technical but also communication and interpersonal skills to dig deep into the issue at hand and to report the findings back to the stakeholders in a common language.
The book consists of long lists of questions and answers born out of the interviews. There are recurrent themes across the book (for example essential skills when looking for new hires), and although sometimes I felt the book could do with an overall summary, the repeated message from reading across several interviews helps drive home where the commonalities lie, versus which schools of thought are perhaps more esoteric, or at least emergent.
A really good read for any company thinking about starting a data science team but unsure what data science can bring; great for those interested in a data scientist career to get a feel for what the work involves and how to get themselves up-skilled; great for any practicing data scientist to understand what others are doing across different industries and how we can learn from each other.
much like _Coders at Work_, this is almost guaranteed to have *something* worth your time if you're even vaguely interested in the topic. there are a few can't-miss interviews, along with some that i skimmed in parts.
for my money:
obvious highlights: LeCun, Wiggins
unexpectedly interesting: Smallwood, Jonas
worth a read: tunkelang, foreman, porway, perlich
in particular, the parts i found most enjoyable were those where common elements popped up in unexpected places -- eg Caitlin Smallwood pointing out that Netflix has a group of people hand-annotating data, much like Google does with Maps.
An extended exploration of how some of the most prominent figures in the field see and use Data Science. Opens the mind to different uses and interpretations of this discipline, while providing continuous hints to best practices, most common tools/techniques and possible pitfalls, for a wide range of scenarios. Clearly shows the dominant importance of Data and the pressing need for humans to better use and understand it.
Mostly interesting interviewees, often with interesting perspectives. This book consists of a number of interview transcripts, exploring what data scientists 'do'. It's interesting for an outsider to the profession, but I imagine it might also be interesting to insiders for all kinds of inside-baseball kind of reasons that I was unable to pick up on. Gutierrez isn't afraid of follow-up questions, and he seems to like depth of explanation: this is a good combination for such a book. This kind of book probably dates quite easily, but I have a feeling there are things in here that will last.
This book has incredible insights for aspiring data scientists. Coming from the programming world myself, it is delightful to know that getting into the data science field will be a bit easier with the institutional knowledge and experience I have with me. Since it can be applied to almost any field, the service sector, and airlines in particular, where I work now, is a potential goldmine for data and could be a high-impact area to experiment with.
Many of us may not have the classroom training, nor the real-life working experience in a data science field. But if you want to have a feel for what real data scientists deal with, this is really a great book.
Data Science (DS) and Machine Learning (ML) are very popular nowadays. It was an interesting reading experience. It's always nice to hear what DS/ML experts think about all this. Also, the reader can discover the common tools for DS/ML tasks.
Well written, conversational style. Easy to pick up and read one or two chapters at a sitting. I enjoyed all the interviews and found about half useful/relevant to my work.
Revealing questions asked of 16 data scientists. Facebook, LinkedIn, Nordstrom ... Really a sign of the times. I would read another book of this type by Gutierrez; I think he did a terrific job.