Introducing Data Science teaches you how to accomplish the fundamental tasks that occupy data scientists. Using the Python language and common Python libraries, you'll experience firsthand the challenges of dealing with data at scale and gain a solid foundation in data science.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the Technology
Many companies need developers with data science skills to work on projects ranging from social media marketing to machine learning. Discovering what you need to learn to begin a career as a data scientist can seem bewildering. This book is designed to help you get started.
About the Book
Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You’ll explore data visualization, graph databases, the use of NoSQL, and the data science process. You’ll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels. After reading this book, you’ll have the solid foundation you need to start a career in data science.
What’s Inside
Handling large data
Introduction to machine learning
Using Python to work with data
Writing data science algorithms
About the Reader
This book assumes you're comfortable reading code in Python or a similar language, such as C, Ruby, or JavaScript. No prior experience with data science is required.
About the Authors
Davy Cielen, Arno D. B. Meysman, and Mohamed Ali are the founders and managing partners of Optimately and Maiton, where they focus on developing data science projects and solutions in various sectors.
Table of Contents
Data science in a big data world
The data science process
Machine learning
Handling large data on a single computer
First steps in big data
Join the NoSQL movement
The rise of graph databases
Text mining and text analytics
Data visualization to the end user
I read it to get an introduction and can't complain on that. Not sure how people well-versed will feel but I enjoyed it. Also helped that it was quite humorous.
I'm not a data scientist and I found this book interesting enough to read it through to the end. It gave me a glimpse into what data science is and how to use it. I also learned a little about what the typical role of a data scientist is and how to implement some of the tools in a small application.
This book was worth my time. Thank you to the authors.
I think it’s a decent book that gives a good overview of the technologies that data scientists need. It contains a lot of hands-on exercises. If you are not interested in doing those exercises you can just scroll through them.
The book is well structured to help beginners like me to get an overview of the ecosystem of data science...good examples are used and very helpful summaries in each chapter to refresh my memory and understand the milestones when I progress through the reading...my thumb up for the book.
INTRODUCING DATA SCIENCE is a broad introduction to the field. Each chapter includes the theory, as well as practical examples. This is a big, complex field, and this book will take some time to absorb. For example, the authors illustrate in Chapter 6 the large number of database products that are used in this field. These include both "NoSQL" and "NewSQL" designs. There isn't just one "right" method. To be honest, I didn't know there was any such thing as "graphical" databases.
I recommend trying some of the detailed examples to get a feel for the subject matter. For example, the authors show, in detail, how to use Wikipedia with a custom Python program to try some data mining. The example pretends that you are trying to research diseases, using information in Wikipedia, and shows you how to write your program.
I thought the best chapter was Chapter 6, "Join the NoSQL Movement." Here, the authors explain the intrinsic limitations of the traditional RDBMS structure. A typical RDBMS table is stored with ALL the columns together. This often works well, but what if you only want certain columns? Well, that's too bad--you will end up "touching" all the other columns as well.
Column databases don't have the above limitation. So, they are far faster at scanning through large amounts of information, when you just want certain columns. (As an Oracle DBA for several decades, I can confirm that the authors correctly state the limitation.)
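To make the row-versus-column point concrete, here is a minimal Python sketch. It is not taken from the book and does not model any real database engine. The toy table, the function names, and the "fields touched" counter are all assumptions made up for illustration. It only shows why summing a single column touches fewer values when the data is laid out by column.

```python
# Hypothetical toy data: three records of a small "customers" table.
rows = [
    {"id": 1, "name": "Ann", "city": "Ghent", "balance": 120.0},
    {"id": 2, "name": "Bob", "city": "Antwerp", "balance": 75.5},
    {"id": 3, "name": "Cleo", "city": "Brussels", "balance": 310.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def sum_balance_row_store(records):
    """Row layout: assume each whole record is loaded to read one field."""
    total = 0.0
    fields_touched = 0
    for record in records:
        fields_touched += len(record)  # every column of the record counts
        total += record["balance"]
    return total, fields_touched


def sum_balance_column_store(columns):
    """Column layout: only the 'balance' list is read."""
    values = columns["balance"]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = sum_balance_row_store(rows)
col_total, col_touched = sum_balance_column_store(columns)
print(row_total, row_touched)
print(col_total, col_touched)
```

Both functions return the same total, but under this simplified counting the row layout touches every field of every record, while the column layout touches only the values in the one column being summed. Real engines work with disk pages, indexes, and compression, so treat the counter as an illustration of the idea rather than a performance measurement.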
So all in all, I found INTRODUCING DATA SCIENCE to be a very good book--albeit a bit overwhelming. I found the examples especially helpful. The book has several appendices that explain how to install required libraries for use in the chapter examples.