Data Science from Scratch: First Principles with Python

Average rating 3.93 · 707 ratings · 55 reviews

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.

If you have an aptitude for mathematics and

Kindle Edition, 330 pages
Published April 14th 2015 by O'Reilly Media

Popular Answered Questions
Simon Accascina: Time is now ripe for using python3 in data science since it is supported by major libraries such as numpy, scipy, pandas, scikit-learn, tensorflow, matplotlib [0].

Unfortunately, this book is based on python 2.7 [1]; that notwithstanding, I *would* recommend this book since it is very well written!

The approach of the author is not to explain how to merely apply the pre-made data science tools (i.e. the aforementioned libraries), but rather he *teaches the actual algorithms* behind the principal ideas in statistical analysis and machine learning by coding them in pure python, which is great!
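
As a rough illustration of the "pure python" style described above, here is a minimal sketch of a few statistics helpers written without any library. This is not code taken from the book; the names `dot`, `mean`, and `variance` are chosen here for illustration only.

```python
from typing import List

Vector = List[float]


def dot(v: Vector, w: Vector) -> float:
    """Sum of componentwise products; both vectors must have the same length."""
    if len(v) != len(w):
        raise ValueError("vectors must be the same length")
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def mean(xs: Vector) -> float:
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def variance(xs: Vector) -> float:
    """Sample variance (divides by n - 1); needs at least two values."""
    if len(xs) < 2:
        raise ValueError("variance requires at least two values")
    x_bar = mean(xs)
    deviations = [x - x_bar for x in xs]
    return dot(deviations, deviations) / (len(xs) - 1)


print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(mean([1, 2, 3, 4]))         # 2.5
print(variance([1, 2, 3, 4]))
```

Writing the helpers in terms of each other (variance reuses `dot` and `mean`) is one way to show how the pieces build up, at the cost of speed compared with an optimized library.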

Now, be aware that in real applications of data science you will most probably not follow this route (that is, coding your algorithms directly in python) due to the slowness of python. The preferred approach is to use python to call highly optimized libraries such as numpy that do the actual computation for you.

For this reason, differences between python 2 and 3 are minimal among the features of the language used in this book.

What I am trying to say is that you can most likely follow the book as if it were a python3 book, and the only issue will be having to change "print x" into "print(x)" as needed.

[1]: as stated in chapter 2, paragraph 1, "getting python"
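
The print change mentioned in the answer above can be sketched as follows. This is a small illustration, assuming a Python 3 interpreter; the helper `summarize` is hypothetical and exists only to give the example something to print.

```python
def summarize(values):
    """Return a one-line summary string for a list of numbers."""
    return "n = {}, total = {}".format(len(values), sum(values))


x = [1, 2, 3]

# Python 2.7 statement form, which is a SyntaxError on Python 3:
#     print x
# Python 3 function form:
print(x)
print(summarize(x))
```

The function form with a single argument also runs on Python 2.7, so converting each print statement this way is generally safe in both versions.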

Community Reviews

Deane Barker
Jan 19, 2016 rated it it was ok
I'm still struggling to find the book I want around data science. I've learned that there are two levels:

1. KNOWING data science
2. DOING data science

This book is about the second one. Make no mistake, this is a "statistical computation" manual. This shows you how to find statistical answers using Python. Fully half this book is code samples. If you do not plan to actually attempt to find statistical answers to known questions by writing Python code, then this isn't the book for you.

I would look
Oct 07, 2016 rated it it was amazing
I worked thru all of the examples in this book. Rather than have you import numpy and pandas and scikit-learn, he walks you through how to build up these tools yourself. What you build will be terribly inefficient and you should never use them in real life, but you will get a great feel for how they work under the hood.

(I also learned that my linear algebra is very rusty and I need a brush up ...)

I disagree with some of the reviews that he doesn't do a good job explaining the computation
Olivera Stojanovic
Jun 28, 2016 rated it did not like it
The idea of the book is nice, I still think it is a useful book, but:
1. you'll not learn the math behind this, nor will the methods be explained (it's good for programming, though)
2. regarding the programming part, I think that people would benefit more if there were some actual exercises for them to do, not just a "type in this code" attitude
3. would be nice if all of the data sets were actually generated in the book, not just "there is some data set with 2000 points, that I just pulled out of my ass"
4. more u
Apr 30, 2015 rated it really liked it
Decent book on introduction to data science using Python.

BTW, we should seriously stop writing books on elementary data science using R or Python. We have too many and they have already started to look alike.

Nicholas Teague
Jun 02, 2017 rated it really liked it
To be read for the purpose of demonstrating fundamentals. Most of the work here can be accomplished much more simply with advanced libraries, but this type of text helps one to understand the why and the building blocks of more elaborate practice.
Minh Son Nguyen
As the title implies, this book will show you how to implement basic linear algebra, statistics, probability methods and ML models in pure Python.

+ The book covers all necessary basic topics for you to get started with data science and also shows us where to dig in deeper in those topics.
+ Python with type hinting is a big plus. Some people may hate it but I think it's a good feature. In real life, it may depend on your team.

- Not enough mathematics explanations.
- This is too "scratchy". I w
Mohammed Ashour
Oct 01, 2018 rated it liked it
Good at:
- Practicing entry projects (exercises)
- simple language

Bad at:
- lack of some required details in some sections
- outdated code
- the apps -codes- are not that useful in some sections

Overall the book is a good refreshing read, but not that good for studying.
Jun 03, 2018 rated it it was ok
Not terribly impressed with this one. The way I see it, readers of this book either will already know how to do data science, or they won't. If they do (and here I'm ignoring the fact that why would they, since the title of the book is "data science from scratch"), then they will find the explanations of concepts too basic, and the Python code implementation examples mostly useless (they, after all, are not using the libraries specifically designed to do data science, but rather implementing a n ...
Sungjoo Ha
May 02, 2016 rated it really liked it
Good introductory book on data science. I would recommend this to people who wish to learn basic things in a hands-on fashion.
Kevin Moore
Mar 28, 2018 rated it really liked it
Really good overview, but needed a little more information about which software packages implement the functionality discussed.
M. Cetin
Dec 09, 2018 rated it really liked it
A brief introduction to many concepts and step-by-step construction of working code. I would expect a little more math and theory; that's why I gave four stars instead of five.
Tony Poerio
May 08, 2016 rated it liked it
Great book for a general overview of the concepts, and understanding what 'data science' actually means. Lots of code to drive the points home, and it taught me quite a few Python tricks.

I can foresee using this as a reference for the main concepts, or when looking for a straightforward implementation of the algorithms discussed. The information is very solid.

If you want to power straight through, it's a tough read at times--but Joel's a very good writer, and I enjoyed the dry humor intersp
Roger Mitchell
Nov 30, 2016 rated it it was amazing
Fundamental concepts revealed, libraries for the win

Joel does a great job walking through the tasks a data scientist would take to solve hypothetical problems, and explaining the models most popularly implemented in machine learning. An overwhelming majority of the code examples are useless, which is intentional as Joel notes how to build things from scratch. Libraries (like pandas, scikit-learn, etc) provide APIs to accomplish many of these tasks without writing from scratch, but without the un
Aug 11, 2015 rated it really liked it
This was a fun survey of popular topics in contemporary data science. It was well written for a text book, and easy to read. I suppose it was light on formal proofs, but it made up for that by having you build toy models of all the major ideas. Well worth the read for me, as I am very new to data science but well versed in Python and math. I would like to see a follow-up book that covers the same topics, but using the real libraries people use in industry to solve these same problems.
Jan 29, 2017 rated it really liked it
more entertaining than an entry level programming language text would usually be, and not at the expense of content. well, maybe somewhat at the expense of content because some of the examples are a little too simple to give a real feel for what the methods are useful for. but overall lots of fun and very good information. i did find it a little frustrating, especially early on, that no equations were included and reading python was necessary to understand the fundamentals.
Jun 23, 2019 rated it liked it
Quick read. And a great intro that brushes over the area of Data Science. Even though it does not convey much knowledge that could be used by a practicing Data Scientist. The "from Scratch" part of the title refers to the book's focus on the implementation of the popular algorithms using Python. From scratch. Which is... rather pointless. Anyone serious about Data Science would use pre-packaged, efficient libraries to train their models instead. The author does send the reader to external source ...
Sep 19, 2017 rated it really liked it
Practical book which covers what's essential for data analysts getting into statistical analysis, machine learning and related topics. Good book for those starting out, but didn't have much to offer on the statistical learning side, principles and concepts wise. You're better off looking at books such as IPSUR (Jay G Kearns) and ISLR (Hastie & Tibshirani) for such content. However, this is a practical book because it introduces many relevant ideas. Some qualms: MapReduce treatment is probably ou ...
Oct 27, 2017 rated it it was ok  ·  review of another edition
Aside from the author's enthusiasm and breadth of knowledge I did not get much out of this book. For me there are not enough details on the statistical concepts and too much detail in the 'from scratch' code samples. The code samples are also never to be used again, as the author admits at the end, because there are many python packages that do an infinitely more efficient and scalable job of analysing data. The modelling concepts are not differentiated clearly enough so it's not understood why ...
Thomas Pinder
May 13, 2018 rated it really liked it  ·  review of another edition
I read this prior to beginning an MSc in Data Science and found it to be a great introduction to data science, starting out with the very basics before moving into more general ML techniques and finishing up with some of the more complex topics such as MapReduce. Not an in-depth textbook by any means, but I do not think that is the purpose of this book; rather, it is to give the reader a well-rounded idea of the field.
Ulises Jimenez
Nov 28, 2018 rated it really liked it
It is a wonderful book for understanding the details of how some machine learning methods are implemented. It is also good practice in basic Python. As the title suggests, every function is constructed from scratch. I really enjoyed the book; however, I would not recommend it if you want to learn ML and go directly to developing ML applications.
I rate it 4 because some examples shown in the book do not provide data to test them.
Samuel Lampa
Apr 13, 2020 rated it it was ok
Shelves: never-finished
I have started this for the second time now.

I really like the basic idea of doing things "from scratch", to get a better understanding, but I realize that it really requires you to run through pretty much every code example to follow intelligibly. Add to this that it starts fairly basic, I feel it is taking just a bit too much of my time to seem worth it. Considering dropping it. We'll see. Probably great for someone very new to python though.
Andrew Violette
Jun 30, 2020 rated it liked it
I've been reading this book off and on for over a year and have enjoyed it. As the book's title implies you're basically building up a library of tools for data science from scratch in python. As data science largely builds on statistics and linear algebra, the first part of the book mainly builds on those concepts. My only complaint is that you build up a library of tools, but little time is spent on how to use them.
May 04, 2017 rated it liked it
The book covers the vast range of topics required to get started in the data science stream. It introduces theory, frameworks and libraries. As a result, none of the topics is hands-on with example problem solving, though the book has working code examples for all the concepts. To get a decent grip on data science, problem solving is crucial.
Leandro Braga
Mar 10, 2018 rated it it was ok
This book is nice to improve the understanding of some details underlying the data science algorithms, but it falls short in the depth of the content. Some concepts feel rushed and incomplete; the explanation sometimes isn't clear.

Even though the book is shallow, I would recommend it; here and there you can get a valuable piece of information from it.
Matt Heavner
Mar 19, 2017 rated it really liked it
Data Science from Scratch is a good Data Science overview. It covers the breadth of the "field" targeting (aspiring) practitioners (for example, I couldn't find a "definition" of data science beyond the "it's a Venn diagram thing - data, math, hacking"). For practitioners, the "from scratch" approach is very useful. Some topics will be a quick skim, others are a close analysis of the code (python) to understand specific implementation of "cartoon" examples. The from-scratch approach builds up th ...
Dušan Maďar
Jul 22, 2018 rated it really liked it
Shelves: in-english
I was reading this one on and off for quite some time. It explains numerous data science algorithms and methods from scratch, using Python, as the title (and many reviews) claim. So basically it explains what the 3rd party libraries do under the hood.
May 20, 2019 rated it it was amazing
Real good introduction to DS

Detailed explanations for common ML models
If you are looking for sklearn, pytorch, pandas this is not your book
The book is in python 2.7, but python 3 code is available on github
Helen Mary Labao Barrameda
Jun 01, 2019 rated it really liked it
Fresh approach to machine learning and DL. Some prior technical knowhow is required to appreciate the “under the hood” code snippets shared in this book. But it’s a very good read and organized properly.
Shirish Hirekodi
May 08, 2020 rated it it was amazing
An excellent primer covering programming and mathematical skills which are needed for data science. The author can express complex concepts in simple language. I learnt so much from this book and will be circling back to some chapters as I learn more.
Abhishek Kumar
Dec 26, 2017 rated it really liked it
Nice book to give a feel of Data Science. It explains the principles of data science by giving a very basic implementation in Python. There are good follow-up suggestions at the end of each chapter.

Readers also enjoyed

  • Hands-On Machine Learning with Scikit-Learn and TensorFlow
  • Python for Data Analysis
  • Introduction to Machine Learning with Python: A Guide for Data Scientists
  • Practical Statistics for Data Scientists: 50 Essential Concepts
  • Designing Data-Intensive Applications
  • Python Data Science Handbook: Tools and Techniques for Developers
  • Python Machine Learning
  • Data Science for Business: What you need to know about data mining and data-analytic thinking
  • An Introduction to Statistical Learning: With Applications in R
  • Flask Web Development: Developing Web Applications with Python
  • Deep Learning
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction
  • Doing Data Science
  • R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
  • Naked Statistics: Stripping the Dread from the Data
  • The Hundred-Page Machine Learning Book
  • Deep Learning with Python
  • The Pragmatic Programmer: From Journeyman to Master

“This means that, where appropriate, we will dive into mathematical equations, mathematical intuition, mathematical axioms, and cartoon versions of big mathematical ideas.” 0 likes
“Just run: pip install ipython and then search the Internet for solutions to whatever cryptic error messages that causes.” 0 likes