Key Features
Quickly get familiar with data science using Python
Save time and effort with all the essential tools explained
Create effective data science projects and avoid common pitfalls with the help of examples and hints drawn from experience

Book Description
The book starts by introducing you to setting up your essential data science toolbox. Then it guides you through all the data munging and preprocessing phases, explaining the core data science activities of loading data, transforming and fixing it for analysis, and exploring and processing it. Finally, it completes the overview by presenting the main machine learning algorithms, the technicalities of graph analysis, and the visualization tools that make it easier to present your results.
In this walkthrough, structured as a data science project, you will always be accompanied by clear code and simplified examples to help you understand the underlying mechanics and real-world datasets.
What you will learn
Set up your data science toolbox using a Python scientific environment on Windows, Mac, and Linux
Get data ready for your data science project
Manipulate, fix, and explore data in order to solve data science problems
Set up an experimental pipeline to test your data science hypothesis
Choose the most effective and scalable learning algorithm for your data science tasks
Optimize your machine learning models to get the best performance
Explore and cluster graphs, taking advantage of interconnections and links in your data

About the Authors
Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a PhD in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges involving natural language processing (NLP), machine learning, and probabilistic graph models every day.
Luca Massaron is a data scientist and marketing research director who specializes in multivariate statistical analysis, machine learning, and customer insight, with over a decade of experience in solving real-world problems and generating value for stakeholders by applying reasoning, statistics, data mining, and algorithms.
Table of Contents
First Steps
Data Munging
The Data Science Pipeline
Machine Learning
Social Network Analysis
Visualization
It's hard to describe this book, mostly because it's hard to tell whom it is aimed at.
Is it aimed at people who already know machine learning and want to learn Python? No: the book goes to great lengths explaining some algorithms, and its code is "easy to write, but not quite right".
Is it aimed at people who know Python but want to learn machine learning? No. Even though some algorithms are explained at length, others aren't, and there are very few "use this when you have data like that" explanations. In fact, there is very little guidance on when an algorithm should be used.
Is it aimed at people who know neither Python nor machine learning and want to learn both? This is the gray area of the book. Again, the code is quite simplistic and does not follow Python coding standards, and the machine learning part is very shallow on the "when" and "why".
In the end, the book is simply an extended version of the scikit-learn manual, and I even suspect the manual is better, because it explains when an algorithm should be used.
When practitioners turn their own practical challenges into guidelines, learners gain a real advantage. Alberto Boschetti and Luca Massaron give advice within clearly defined contexts, so readers can readily judge what is acceptable in industry. This advice is built around scenarios, examples, and code from data science projects.
The authors are data scientists with expertise in statistics and in other sophisticated technical fields. The book simplifies the complexities that beginner and intermediate data scientists may have faced when using Python. Readers are advised to use Python 3.4 or above to practise all of its examples.
The book engages the reader and draws them into the subject matter. Its beauty is that its six chapters are linked to resources (data and source code). These resources are of immense value and will surely intrigue both beginners and intermediate users. At the beginning of each chapter, readers can clearly see what they will learn in it. The book offers more extensive knowledge of practical data mining principles through scientific methodology and shows how to test the performance of the reader's machine learning hypotheses effectively.
A reader who studies the book and completes the lab practice has a great chance to improve their data manipulation and machine learning skills.
In this second edition, it is evident that the authors have invested both time and effort and have listened to user feedback. This edition displays more maturity and focuses on updated and expanded content. Chapter 4, on machine learning, is an excellent addition in my view, as machine learning is one of the most widely used data science techniques in Python.
Readers can visualise the machine learning and optimisation processes that the authors discuss in chapter 3, 'The Data Science Pipeline', and chapter 4, 'Machine Learning'. Colour images of the book are also available to readers who want them, which I am sure is a bonus.
I recommend this book to all data science labs that want to draw on real industry experience to deliver their future research projects successfully.
This 354-page book is an excellent guide to learning data science with Python for those who aspire to become experienced in it. It is also one of the few books that readers will find truly practical and engaging.