Data Science from Scratch: First Principles with Python

3.92  ·  513 ratings  ·  43 reviews

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.

If you have an aptitude for mathematics and some…
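As a rough illustration of the from-scratch style the description refers to, here is a small sketch of our own (not code from the book) that computes basic statistics in plain Python with no third-party libraries. The function names are hypothetical, and the variance is the sample variance (dividing by n - 1).

```python
import math
from typing import List


def mean(xs: List[float]) -> float:
    """Arithmetic mean, computed by hand instead of calling a library."""
    return sum(xs) / len(xs)


def variance(xs: List[float]) -> float:
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("variance requires at least two elements")
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def standard_deviation(xs: List[float]) -> float:
    """Square root of the sample variance."""
    return math.sqrt(variance(xs))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(data))      # 5.0
print(variance(data))  # 32 / 7, about 4.571
```

Writing these few lines by hand is the point of the exercise; in practice the standard `statistics` module or NumPy would do the same job.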

Kindle Edition, 330 pages
Published April 14th 2015 by O'Reilly Media

Reader Q&A

Popular Answered Questions
Simon Accascina
The time is now ripe for using Python 3 in data science, since it is supported by major libraries such as numpy, scipy, pandas, scikit-learn, tensorflow, and matplotlib [0].

Unfortunately, this book is based on python 2.7 [1]; that notwithstanding, I *would* recommend this book since it is very well written!

The approach of the author is not to explain how to merely apply the pre-made data science tools (i.e. the aforementioned libraries), but rather he *teaches the actual algorithms* behind the principal ideas in statistical analysis and machine learning by coding them in pure python, which is great!

Now, be aware that in real applications of data science you will most probably not follow this route (that is, coding your algorithms directly in python) due to the slowness of python. The preferred approach is to use python to call highly optimized libraries such as numpy that do the actual computation for you.
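To make this point concrete, here is a minimal sketch (our own, not from the book, and assuming NumPy is installed) that contrasts a dot product written as a pure-Python loop with the same computation handed to NumPy's compiled `np.dot`:

```python
import numpy as np


def dot_pure_python(v, w):
    """Dot product in pure Python: one interpreted loop step per element."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


v = [1.0, 2.0, 3.0]
w = [4.0, 5.0, 6.0]

# Both give the same answer; np.dot runs the loop in compiled code,
# which is typically much faster on large arrays.
print(dot_pure_python(v, w))                     # 32.0
print(float(np.dot(np.array(v), np.array(w))))   # 32.0
```

The results agree; the difference shows up only in speed on large inputs, which you can measure on your own data with the standard `timeit` module.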

For this reason, the differences between Python 2 and 3 hardly matter for the language features used in this book.

What I am trying to say is that you can most likely follow the book as if it were a python3 book, and the only issue will be having to change "print x" into "print(x)" as needed.
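A minimal sketch of the change described above (the variable `x` is only a placeholder). If a file must also keep running under Python 2.6 or 2.7, putting `from __future__ import print_function` at the top of the module makes the function-call form work there as well.

```python
x = [1, 2, 3]

# Python 2 style, which is a SyntaxError under Python 3:
#     print x
# Python 3 style: print is an ordinary function, so the argument goes in parentheses.
print(x)  # prints [1, 2, 3]
```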

1: as stated in chapter 2, paragraph 1 "getting python"

Community Reviews

Deane Barker
Jan 19, 2016 rated it it was ok
I'm still struggling to find the book I want around data science. I've learned that there are two levels:

1. KNOWING data science
2. DOING data science

This book is about the second one. Make no mistake, this is a "statistical computation" manual. This shows you how to find statistical answers using Python. Fully half this book is code samples. If you do not plan to actually attempt to find statistical answers to known questions by writing Python code, then this isn't the book for you.

I would look
Apr 30, 2015 rated it really liked it
Decent book on introduction to data science using Python.

BTW, we should seriously stop writing books on elementary data science using R or Python. We have too many, and they have already started to look alike.

Oct 07, 2016 rated it it was amazing
I worked thru all of the examples in this book. Rather than have you import numpy and pandas and scikit-learn, he walks you through how to build up these tools yourself. What you build will be terribly inefficient and you should never use it in real life, but you will get a great feel for how these tools work under the hood.

(I also learned that my linear algebra is very rusty and I need a brush up ...)

I disagree with some of the reviews that say he doesn't do a good job explaining the computation…
Nicholas Teague
Jun 02, 2017 rated it really liked it
To be read for purposes of demonstrating fundamentals. Most of the work here can be accomplished much more simply with advanced libraries, but this type of text helps one understand the why and the building blocks of more elaborate practice.
Mohammed Ashour
Oct 01, 2018 rated it liked it
Good at:
- Practicing entry-level projects (exercises)
- Simple language

Bad at:
- Lack of some required details in some sections
- Outdated code
- The apps (code) are not that useful in some sections

Overall, the book is a good, refreshing read, but not that good for studying.
Olivera Stojanovic
Jun 28, 2016 rated it did not like it
The idea of the book is nice, and I still think it is a useful book, but:
1. you won't learn the math behind this, nor will the methods be explained (it's good for programming, though)
2. regarding the programming part, I think people would benefit more if there were some actual exercises for them to do, not just a "type in this code" attitude
3. it would be nice if all of the data sets were actually generated in the book, not just "there is some data set with 2000 points, that I just pulled out of my ass"
4. more u
Sungjoo Ha
May 02, 2016 rated it really liked it
Good introductory book on data science. I would recommend this to people who wish to learn basic things in a hands-on fashion.
M. Cetin
Dec 09, 2018 rated it really liked it
A brief introduction to many concepts and step-by-step construction of working code. I would have expected a little more math and theory; that's why I gave four stars instead of five.
Kevin Moore
Mar 28, 2018 rated it really liked it
Really good overview, but needed a little more information about which software packages implement the functionality discussed.
Nov 04, 2016 rated it it was amazing
A great book for diving into the world of machine learning. I wouldn't say you'll know a lot after reading it, but you will know about a lot of things in ML. The author's approach is to give an implementation as Python code instead of describing the algorithms in detail in words. Good, clear, compact code (I have been using Python on and off for a few years and didn't think the language could be this elegant). First, this saves space: well-written code is better than a description of the alg…
Tony Poerio
May 08, 2016 rated it liked it
Great book for a general overview of the concepts, and understanding what 'data science' actually means. Lots of code to drive the points home, and it taught me quite a few Python tricks.

I can foresee using this as a reference for the main concepts, or when looking for a straightforward implementation of the algorithms discussed. The information is very solid.

If you want to power straight through, it's a tough read at times--but Joel's a very good writer, and I enjoyed the dry humor interspersed…
Roger Mitchell
Nov 30, 2016 rated it it was amazing
Fundamental concepts revealed, libraries for the win

Joel does a great job walking through the tasks a data scientist would take to solve hypothetical problems, and explaining the models most popularly implemented in machine learning. An overwhelming majority of the code examples are useless, which is intentional as Joel notes how to build things from scratch. Libraries (like pandas, scikit-learn, etc) provide APIs to accomplish many of these tasks without writing from scratch, but without the un
Felipe Scuciatto
Nov 22, 2016 rated it really liked it
There is no point in knowing data science without doing data science. Starting from this premise, this book provides the essentials to "get your hands dirty" and torture some data. The most interesting thing about this book is that it starts the algorithms from absolute zero. Because it doesn't rely on any analysis library, it demonstrates the whole technical construction behind regressions, neural networks, decision trees, Bayesian classifiers, etc.

Recommended reading for a solid understanding of the practice.
Aug 11, 2015 rated it really liked it
This was a fun survey of popular topics in contemporary data science. It was well written for a textbook, and easy to read. I suppose it was light on formal proofs, but it made up for that by having you build toy models of all the major ideas. Well worth the read for me, as I am very new to data science but well versed in Python and math. I would like to see a follow-up book that covers the same topics, but using the real libraries people use in industry to solve these same problems.
Jun 03, 2018 rated it it was ok
Not terribly impressed with this one. The way I see it, readers of this book either will already know how to do data science, or they won't. If they do (and here I'm ignoring the question of why they would, since the title of the book is "data science from scratch"), then they will find the explanations of concepts too basic, and the Python code implementation examples mostly useless (they, after all, are not using the libraries specifically designed to do data science, but rather implementing a n…
Jan 02, 2018 rated it really liked it
An excellent tool for aspiring data scientists like myself.

There's no shortage of information on the topic, but it's hard to find it all in one place. You could spend weeks combing through forums, blog posts, and video tutorials only to find half as much useful information. Data Science from Scratch covers the foundations of many basic Machine Learning algorithms in a succinct and humorous way.

As fair warning, the math is a little much to take in for a single book. The author provides introducti
Sep 19, 2017 rated it really liked it
Practical book which covers what's essential for data analysts getting into statistical analysis, machine learning and related topics. Good book for those starting out, but didn't have much to offer on the statistical learning side, principles and concepts wise. You're better off looking at books such as IPSUR (G. Jay Kerns) and ISLR (Hastie & Tibshirani) for such content. However, this is a practical book because it introduces many relevant ideas. Some qualms: MapReduce treatment is probably…
Oct 27, 2017 rated it it was ok  ·  review of another edition
Aside from the author's enthusiasm and breadth of knowledge I did not get much out of this book. For me there are not enough details on the statistical concepts and too much detail in the 'from scratch' code samples. The code samples are also never to be used again, as the author admits at the end, because there are many python packages that do an infinitely more efficient and scalable job of analysing data. The modelling concepts are not differentiated clearly enough so it's not understood why…
Thomas Pinder
May 13, 2018 rated it really liked it  ·  review of another edition
I read this prior to beginning an MSc in Data Science and found it to be a great introduction to data science, starting out with the very basics before moving into more general ML techniques and finishing up with some of the more complex topics such as MapReduce. Not an in-depth textbook by any means, but I do not think that is the purpose of this book; rather, it aims to give the reader a well-rounded idea of the field.
Ulises Jimenez
Nov 28, 2018 rated it really liked it
It is a wonderful book for understanding the details of how some machine learning methods are implemented. It is also good practice for basic Python. As the title suggests, every function is constructed from scratch. I really enjoyed the book; however, I would not recommend it for learning ML and going directly to developing ML applications.
I rate it 4 because some examples shown in the book do not provide data to test them.
Leandro Braga
Mar 10, 2018 rated it it was ok
This book is nice for improving the understanding of some details underlying data science algorithms, but it falls short in depth. Some concepts feel rushed and incomplete; the explanation sometimes isn't clear.

Even though the book is shallow, I would recommend it; here and there you can get a valuable piece of information from it.
May 04, 2017 rated it liked it
The book covers the wide range of topics required to get started in data science. It introduces theory, frameworks and libraries. As a result, none of the topics is covered hands-on with example problem solving, though the book does provide working code examples for all the concepts. To get a decent grip on data science, problem solving is crucial.
Jun 16, 2018 rated it liked it
A good book for anyone who wants to start learning about statistics, data science principles and machine learning through a practical approach using the Python language. It reminded me a lot of Collective Intelligence. For an introductory course I still prefer the latter, since it offers some more interesting activities and exercises.
Matt Heavner
Mar 19, 2017 rated it really liked it
Data Science from Scratch is a good Data Science overview. It covers the breadth of the "field" targeting (aspiring) practitioners (for example, I couldn't find a "definition" of data science beyond the "it's a Venn diagram thing - data, math, hacking"). For practitioners, the "from scratch" approach is very useful. Some topics will be a quick skim, others a close analysis of the code (python) to understand the specific implementation of "cartoon" examples. The from-scratch approach builds up th…
Dusan Madar
Jul 22, 2018 rated it really liked it
I was reading this one on and off for quite some time. It explains numerous data science algorithms and methods from scratch, using Python, as the title (and many reviews) claim. So basically it explains what the 3rd party libraries do under the hood.
Abhishek Kumar
Dec 26, 2017 rated it really liked it
Nice book to give a feel for Data Science. It explains the principles of data science by giving very basic implementations in Python. There are good follow-up suggestions at the end of each chapter.
Xingda Wang
Jul 17, 2018 rated it it was amazing
good introduction, very fundamental!
Jul 26, 2018 rated it liked it  ·  review of another edition
Good overview of Data science
Sep 22, 2018 rated it really liked it
A fine introduction to what the actual work of a data scientist looks like, from massaging the data to deciding which ML algorithm to run.
Scott Johnson
Mar 14, 2017 rated it it was ok
Maybe I misjudged the target audience, but this was far more basic than I was expecting.
  • Test-Driven Web Development with Python
  • Think Complexity: Complexity Science and Computational Modeling
  • Data Science for Business: What you need to know about data mining and data-analytic thinking
  • Flask Web Development: Developing Web Applications with Python
  • Python for Data Analysis
  • Natural Language Processing with Python
  • Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists
  • Introduction to Bioinformatics
  • REST API Design Rulebook
  • Speaking JavaScript
  • Building Data Science Teams
  • Modern PHP: New Features and Good Practices
  • JavaScript & jQuery: The Missing Manual
  • Python Machine Learning
  • Agile Data Science: Building Data Analytics Applications with Hadoop
  • High Performance Browser Networking
  • Machine Learning with R
  • Data Smart: Using Data Science to Transform Information into Insight

“This means that, where appropriate, we will dive into mathematical equations, mathematical intuition, mathematical axioms, and cartoon versions of big mathematical ideas.”
“Just run: pip install ipython and then search the Internet for solutions to whatever cryptic error messages that causes.”