Data Science from Scratch: First Principles with Python

3.93  ·  839 ratings  ·  62 reviews

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.

If you have an aptitude for mathematics and

Kindle Edition, 330 pages
Published April 14th 2015 by O'Reilly Media

Reader Q&A

Popular Answered Questions
Simon Accascina
The time is now ripe for using Python 3 in data science, since it is supported by major libraries such as NumPy, SciPy, pandas, scikit-learn, TensorFlow, and Matplotlib [0].

Unfortunately, this book is based on python 2.7 [1]; that notwithstanding, I *would* recommend this book since it is very well written!

The approach of the author is not to explain how to merely apply the pre-made data science tools (i.e. the aforementioned libraries), but rather he *teaches the actual algorithms* behind the principal ideas in statistical analysis and machine learning by coding them in pure python, which is great!

Now, be aware that in real applications of data science you will most probably not follow this route (that is, coding your algorithms directly in python) due to the slowness of python. The preferred approach is to use python to call highly optimized libraries such as numpy that do the actual computation for you.

For this reason, the book relies only on core features of the language, and among those features the differences between Python 2 and 3 are minimal.

What I am trying to say is that you can most likely follow the book as if it were a python3 book, and the only issue will be having to change "print x" into "print(x)" as needed.

1: As stated in chapter 2, paragraph 1 ("Getting Python").
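As a quick illustration of the change the answer describes (a minimal sketch, not code taken from the book), here is the Python 2.7 statement form shown in comments next to the Python 3 function form. The variable `buffer` is an illustrative in-memory stand-in for a file:

```python
import io

x = [1, 2, 3]

# Python 2.7 statement form:   print x
# Python 3 function form:
print(x)

# Redirecting output changes shape as well.
# Python 2.7:   print >>buffer, "x =", x
# Python 3 passes the destination via the file= keyword argument:
buffer = io.StringIO()
print("x =", x, file=buffer)
```

Because `print` is an ordinary function in Python 3, options such as `sep`, `end`, and `file` are plain keyword arguments rather than special syntax.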

Community Reviews

Deane Barker
Jan 19, 2016 · rated it: it was ok
I'm still struggling to find the book I want around data science. I've learned that there are two levels:

1. KNOWING data science
2. DOING data science

This book is about the second one. Make no mistake, this is a "statistical computation" manual. This shows you how to find statistical answers using Python. Fully half this book is code samples. If you do not plan to actually attempt to find statistical answers to known questions by writing Python code, then this isn't the book for you.

I would look
Oct 07, 2016 · rated it: it was amazing
I worked through all of the examples in this book. Rather than have you import numpy and pandas and scikit-learn, he walks you through how to build up these tools yourself. What you build will be terribly inefficient, and you should never use it in real life, but you will get a great feel for how these tools work under the hood.
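To give a feel for what "building up these tools yourself" looks like, here is a minimal pure-Python sketch of the kind of vector helpers a from-scratch approach starts with. The function names are illustrative assumptions, not copied from the book; in real work you would use NumPy (for example `numpy.dot`) instead:

```python
import math
from typing import List

# A vector is just a list of numbers; no NumPy involved.
Vector = List[float]


def add(v: Vector, w: Vector) -> Vector:
    """Componentwise sum of two equal-length vectors."""
    if len(v) != len(w):
        raise ValueError("vectors must be the same length")
    return [v_i + w_i for v_i, w_i in zip(v, w)]


def subtract(v: Vector, w: Vector) -> Vector:
    """Componentwise difference v - w."""
    if len(v) != len(w):
        raise ValueError("vectors must be the same length")
    return [v_i - w_i for v_i, w_i in zip(v, w)]


def dot(v: Vector, w: Vector) -> float:
    """Sum of componentwise products: v_1*w_1 + ... + v_n*w_n."""
    if len(v) != len(w):
        raise ValueError("vectors must be the same length")
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def magnitude(v: Vector) -> float:
    """Euclidean length: square root of v . v."""
    return math.sqrt(dot(v, v))


def distance(v: Vector, w: Vector) -> float:
    """Euclidean distance between two points."""
    return magnitude(subtract(v, w))


print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(distance([0, 0], [3, 4]))   # 5.0
```

Each helper is only a few lines and loops in the interpreter, which is why it is far slower than NumPy's vectorized routines; that is the trade-off the reviewer describes.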

(I also learned that my linear algebra is very rusty and I need a brush up ...)

I disagree with some of the reviews that say he doesn't do a good job explaining the computation
Apr 30, 2015 · rated it: really liked it
Decent book on introduction to data science using Python.

BTW, we should seriously stop writing books on elementary data science using R or Python. We have too many, and they have already started to look alike.

Jun 03, 2018 · rated it: it was ok
Not terribly impressed with this one. The way I see it, readers of this book will either already know how to do data science, or they won't. If they do (and here I'm ignoring the question of why they would be reading it, since the title of the book is "Data Science from Scratch"), then they will find the explanations of concepts too basic, and the Python code implementation examples mostly useless (after all, they are not using the libraries specifically designed to do data science, but rather implementing a n…
Nicholas Teague
Jun 02, 2017 · rated it: really liked it
To be read for purposes of demonstrating fundamentals. Most of the work here can be accomplished much more simply with advanced libraries, but this type of text helps one understand the why and the building blocks of more elaborate practice.
Minh Son Nguyen
As the title implies, this book will show you how to implement basic linear algebra, statistics, probability methods and ML models in pure Python.

+ The book covers all the basic topics you need to get started with data science and also shows where to dig deeper into those topics.
+ Python with type hinting is a big plus. Some people may hate it but I think it's a good feature. In real life, it may depend on your team.

- Not enough mathematics explanations.
- This is too "scratchy". I w
Mohammed Ashour
Oct 01, 2018 · rated it: liked it
Good at:
- Practicing entry-level projects (exercises)
- Simple language

Bad at:
- Lacks some required details in some sections
- Outdated code
- The apps (code) are not that useful in some sections

Overall, the book is a good refreshing read, but not that good for studying.
Sungjoo Ha
May 02, 2016 · rated it: really liked it
Good introductory book on data science. I would recommend this to people who wish to learn basic things in a hands-on fashion.
Kevin Moore
Mar 28, 2018 · rated it: really liked it
Really good overview, but needed a little more information about which software packages implement the functionality discussed.
M. Cetin
Dec 09, 2018 · rated it: really liked it
A brief introduction to many concepts and a step-by-step construction of working code. I expected a little more math and theory; that's why I gave four stars instead of five.
Tony Poerio
May 08, 2016 · rated it: liked it
Great book for a general overview of the concepts, and for understanding what 'data science' actually means. Lots of code to drive the points home, and it taught me quite a few Python tricks.

I can foresee using this as a reference for the main concepts, or when looking for a straightforward implementation of the algorithms discussed. The information is very solid.

If you want to power straight through, it's a tough read at times--but Joel's a very good writer, and I enjoyed the dry humor intersp
Roger Mitchell
Nov 30, 2016 · rated it: it was amazing
Fundamental concepts revealed, libraries for the win

Joel does a great job walking through the steps a data scientist would take to solve hypothetical problems, and explaining the models most popularly implemented in machine learning. An overwhelming majority of the code examples are useless in practice, which is intentional, as Joel is showing how to build things from scratch. Libraries (like pandas, scikit-learn, etc.) provide APIs to accomplish many of these tasks without writing code from scratch, but without the un
Aug 11, 2015 · rated it: really liked it
This was a fun survey of popular topics in contemporary data science. It was well written for a textbook, and easy to read. I suppose it was light on formal proofs, but it made up for that by having you build toy models of all the major ideas. Well worth the read for me, as I am very new to data science but well versed in Python and math. I would like to see a follow-up book that covers the same topics, but using the real libraries people use in industry to solve these same problems.
Jan 29, 2017 · rated it: really liked it
More entertaining than an entry-level programming language text would usually be, and not at the expense of content. Well, maybe somewhat at the expense of content, because some of the examples are a little too simple to give a real feel for what the methods are useful for. But overall, lots of fun and very good information. I did find it a little frustrating, especially early on, that no equations were included and reading Python was necessary to understand the fundamentals.
Vlad Gheorghe
Dec 16, 2020 · rated it: liked it
This is not a stand-alone book. It won't show you how to solve real problems with code, as it doesn't use any actual library. And it won't teach you the theory and mathematics of data science (the explanations of the models are very synthetic, often frustratingly so).

Rather, the book lives in a limbo between theory and practice. The old saying goes, you haven't understood something until you can explain it to your grandmother. Then again, whoever came up with that saying probably never tried ex
Jun 23, 2019 · rated it: liked it
Quick read, and a great intro that brushes over the area of Data Science, even though it does not convey much knowledge that a practicing Data Scientist could use. The "from Scratch" part of the title refers to the book's focus on implementing the popular algorithms in Python. From scratch. Which is... rather pointless. Anyone serious about Data Science would use pre-packaged, efficient libraries to train their models instead. The author does send the reader to external source…
Sep 19, 2017 · rated it: really liked it
Practical book which covers what's essential for data analysts getting into statistical analysis, machine learning and related topics. Good book for those starting out, but it didn't have much to offer on the statistical learning side in terms of principles and concepts. You're better off looking at books such as IPSUR (G. Jay Kerns) and ISLR (Hastie & Tibshirani) for such content. However, this is a practical book because it introduces many relevant ideas. Some qualms: MapReduce treatment is probably ou…
Oct 27, 2017 · rated it: it was ok  ·  review of another edition
Aside from the author's enthusiasm and breadth of knowledge, I did not get much out of this book. For me, there are not enough details on the statistical concepts and too much detail in the 'from scratch' code samples. The code samples are also never to be used again, as the author admits at the end, because there are many Python packages that do an infinitely more efficient and scalable job of analysing data. The modelling concepts are not differentiated clearly enough, so it's not understood why…
Thomas Pinder
May 13, 2018 · rated it: really liked it  ·  review of another edition
I read this prior to beginning an MSc in Data Science and found it to be a great introduction to data science, starting out with the very basics before moving into more general ML techniques and finishing up with some of the more complex topics such as MapReduce. Not an in-depth textbook by any means, but I do not think that is the purpose of this book; rather, it aims to give the reader a well-rounded idea of the field.
Ulises Jimenez
Nov 28, 2018 · rated it: really liked it
It is a wonderful book for understanding the details of how some machine learning methods are implemented. It is also good practice for basic Python. As the title suggests, every function is constructed from scratch. I really enjoyed the book; however, I would not recommend it for learning ML with the aim of going directly into developing ML applications.
I rate it 4 because some examples shown in the book do not provide data to test them.
Apr 13, 2020 · rated it: it was ok
Shelves: never-finished
This is the second time I have started this.

I really like the basic idea of doing things "from scratch" to get a better understanding, but I realize that it really requires you to run through pretty much every code example to follow along intelligibly. Add to this that it starts fairly basic, and I feel it is taking just a bit too much of my time to seem worth it. Considering dropping it. We'll see. Probably great for someone very new to Python, though.
Andrew Violette
Jun 30, 2020 · rated it: liked it
I've been reading this book off and on for over a year and have enjoyed it. As the book's title implies, you're basically building up a library of tools for data science from scratch in Python. As data science largely builds on statistics and linear algebra, the first part of the book mainly builds on those concepts. My only complaint is that you build up a library of tools, but little time is spent on how to use them.
May 04, 2017 · rated it: liked it
The book covers the vast range of topics required to get started in data science. It introduces theory, frameworks and libraries. As a result, none of the topics is hands-on with example problem solving, though the book does have working code examples for all the concepts. To get a decent grip on data science, problem solving is crucial.
Leandro Braga
Mar 10, 2018 · rated it: it was ok
This book is nice for improving the understanding of some details underlying the data science algorithms, but it falls short in the depth of its content. Some concepts feel rushed and incomplete; the explanations sometimes aren't clear.

Even though the book is shallow, I would recommend it; here and there you can get a valuable piece of information from it.
Aaron Lee
Oct 31, 2020 · rated it: really liked it
A helpful book that works best as a refresher (for all the pesky concepts you’ve forgotten) or a gentle dive (for the topics where pages of dense symbols and formulas make your brain hurt). The choice of coding everything from scratch, even though in practice you will use established libraries with one-line functions, works well in building your understanding.
Abhishek Anand
Jan 20, 2021 · rated it: liked it
It's a good summary of some of the common machine learning algorithms. I picked it up to use as a refresher for the common algorithms. However, if you have done any online ML courses or read some other books on this, then I am not sure there is much new to offer. If you are completely new, then this might not be a bad start.
Perry PhD
Jun 06, 2021 · rated it: it was amazing
Joel Grus has really done an amazing job here, helping to build a useful bridge between data science concepts and the ways that python can greatly facilitate insights. The prose is solid; easy to follow, and with practical examples at each step of the way. A great read for the data science enthusiast!
Perry Beaumont
Matt Heavner
Mar 19, 2017 · rated it: really liked it
Data Science from Scratch is a good Data Science overview. It covers the breadth of the "field," targeting (aspiring) practitioners (for example, I couldn't find a "definition" of data science beyond "it's a Venn diagram thing: data, math, hacking"). For practitioners, the "from scratch" approach is very useful. Some topics will be a quick skim; others call for a close analysis of the code (Python) to understand the specific implementation of "cartoon" examples. The from-scratch approach builds up th…
Dušan Maďar
Jul 22, 2018 · rated it: really liked it
Shelves: in-english
I was reading this one on and off for quite some time. It explains numerous data science algorithms and methods from scratch, using Python, as the title (and many reviews) claim. So basically it explains what the 3rd party libraries do under the hood.
May 20, 2019 · rated it: it was amazing
Real good introduction to DS

Detailed explanations for common ML models
If you are looking for sklearn, pytorch, pandas this is not your book
The book is in Python 2.7, but Python 3 code is available on GitHub

Readers also enjoyed

  • Python for Data Analysis
  • Python Data Science Handbook: Tools and Techniques for Developers
  • Hands-On Machine Learning with Scikit-Learn and TensorFlow
  • Cracking the Coding Interview: 150 Programming Questions and Solutions
  • Fluent Python: Clear, Concise, and Effective Programming
  • Introduction to Machine Learning with Python: A Guide for Data Scientists
  • Designing Data-Intensive Applications
  • Practical Statistics for Data Scientists: 50 Essential Concepts
  • Deep Learning with Python
  • Hands-On Machine Learning with Scikit-Learn, Keras, and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems
  • Spark: The Definitive Guide: Big Data Processing Made Simple
  • Learning Python
  • Python Machine Learning
  • Storytelling with Data: A Data Visualization Guide for Business Professionals
  • Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
  • Deep Learning
  • Data Science for Business: What you need to know about data mining and data-analytic thinking
  • R for Data Science: Import, Tidy, Transform, Visualize, and Model Data

“This means that, where appropriate, we will dive into mathematical equations, mathematical intuition, mathematical axioms, and cartoon versions of big mathematical ideas.”
“Just run: pip install ipython and then search the Internet for solutions to whatever cryptic error messages that causes.”