Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
0%
It is liberally sprinkled with compelling real-world examples outlining familiar, accessible problems in the business world: customer churn, targeted marketing, even whiskey analytics!
About the book
3%
information is now widely available on external events such as market trends, industry news, and competitors’ movements.
Information is everywhere
3%
two brief case studies of analyzing data to extract predictive patterns.
Two examples: "Hurricane Frances" + "Predicting Customer Churn"
4%
The sort of decisions we will be interested in in this book mainly fall into two types: (1) decisions for which “discoveries” need to be made within data, and (2) decisions that repeat, especially at massive scale, and so decision-making can benefit from even small increases in decision-making accuracy based on data analysis.
Types of decisions
4%
Consumers tend to have inertia in their habits and getting them to change is very difficult.
inertia - tendency to do nothing or to remain unchanged
4%
the arrival of a new baby in a family is one point where people do change their shopping habits significantly.
Predictive model
4%
Big data essentially means datasets that are too large for traditional data processing systems, and therefore require new processing technologies.
Big data
4%
data, and the capability to extract useful knowledge from data, should be regarded as key strategic assets.
Data Science principle
5%
credit cards essentially had uniform pricing, for two reasons: (1) the companies did not have adequate information systems to deal with differential pricing at massive scale, and (2) bank management believed customers would not stand for price discrimination.
5%
Around 1990, two strategic visionaries (Richard Fairbank and Nigel Morris) realized that information technology was powerful enough that they could do more sophisticated predictive modeling
First steps of data science
5%
modeling profitability, not just default probability, was the right strategy.
5%
quantitative demonstrations of the value of a data asset are hard to find, primarily because firms are hesitant to divulge results of strategic value.
5%
Sociodemographic data provide a substantial ability to model the sort of consumers that are more likely to purchase one product or another. However, sociodemographic data only go so far; after a certain volume of data, no additional advantage is conferred.
5%
consumers find value in the rankings and recommendations that Amazon provides.
Amazon ranking system
5%
data analysis is now so critical to business strategy.
5%
They employ data science teams to bring advanced technologies to bear to increase revenue and to decrease costs.
5%
With an understanding of the fundamentals of data science you should be able to devise a few probing questions to determine whether their valuation arguments are plausible.
6%
The Cross Industry Standard Process for Data Mining, abbreviated CRISP-DM (CRISP-DM Project, 2000),
Useful conditions of data mining process
7%
classification predicts whether something will happen, whereas regression predicts how much something will happen.
Classification
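A minimal sketch of the distinction in Python with scikit-learn (the churn-style customers, feature names, and target values below are made up for illustration, not taken from the book): the same instances can feed a classifier that predicts whether churn will happen and a regressor that predicts how much will be spent.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical customer data: [age, monthly_usage_hours, months_as_customer]
X = np.array([[25, 40, 3], [47, 5, 36], [33, 22, 12], [52, 2, 48], [29, 35, 6]])

# Classification target: will the customer churn? (categorical: 1 = yes, 0 = no)
y_churn = np.array([1, 0, 1, 0, 1])
clf = DecisionTreeClassifier().fit(X, y_churn)
print(clf.predict([[40, 10, 24]]))        # predicts a class label ("whether")
print(clf.predict_proba([[40, 10, 24]]))  # class probability estimation

# Regression target: how much will the customer spend next month? (numeric)
y_spend = np.array([12.0, 55.0, 20.0, 70.0, 15.0])
reg = DecisionTreeRegressor().fit(X, y_spend)
print(reg.predict([[40, 10, 24]]))        # predicts a numeric amount ("how much")
```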
7%
The result of co-occurrence grouping is a description of items that occur together. These descriptions usually include statistics on the frequency of the co-occurrence and an estimate of how surprising it is.
Co-occurrence grouping
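One way to make this concrete in plain Python (the shopping baskets are hypothetical): count how often pairs of items occur together, and use lift as a rough estimate of how surprising each co-occurrence is relative to chance.

```python
from itertools import combinations
from collections import Counter

# Hypothetical shopping baskets
baskets = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "salsa"},
    {"milk", "chips", "bread"},
]
n = len(baskets)

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(pair for basket in baskets
                      for pair in combinations(sorted(basket), 2))

for (a, b), count in pair_counts.most_common(3):
    support = count / n                          # frequency of the co-occurrence
    lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
    # lift > 1 means the pair occurs together more often than chance would suggest
    print(f"{a} & {b}: support={support:.2f}, lift={lift:.2f}")
```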
7%
Link prediction attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link.
Link prediction
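A tiny illustration of the idea (the friendship graph and the neighborhood-overlap scoring rule are just one simple choice, not the book's method): score a candidate link by how much the two nodes' neighborhoods overlap, and use that score as an estimate of the link's strength.

```python
# Hypothetical friendship graph: person -> set of friends
friends = {
    "ann":  {"bob", "cara", "dan"},
    "bob":  {"ann", "cara"},
    "cara": {"ann", "bob", "dan"},
    "dan":  {"ann", "cara"},
    "eve":  {"dan"},
}

def link_score(a, b):
    """Jaccard overlap of neighborhoods: a simple strength estimate
    for a suggested (not yet existing) link between a and b."""
    common = friends[a] & friends[b]
    union = friends[a] | friends[b]
    return len(common) / len(union) if union else 0.0

# A high score suggests recommending the bob-dan connection
print(link_score("bob", "dan"))
```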
8%
Classification, regression, and causal modeling generally are solved with supervised methods. Similarity matching, link prediction, and data reduction could be either. Clustering, co-occurrence grouping, and profiling generally are unsupervised.
8%
This is still considered classification modeling rather than regression because the underlying target is categorical. Where necessary for clarity, this is called “class probability estimation.”
Classification modeling vs regression
8%
There is another important distinction pertaining to mining data: the difference between (1) mining the data to find patterns and build models, and (2) using the results of data mining.
Data mining confusion
8%
The use of data mining results should influence and inform the data mining process itself, but the two should be kept distinct.
Use of data mining
8%
Data mining is a craft. It involves the application of a substantial amount of science and technology, but the proper application still involves art as well.
Data mining is a craft
8%
Business Understanding stage represents a part of the craft where the analysts’ creativity plays a large role.
Business understanding importance
8%
This can mean structuring (engineering) the problem such that one or more subproblems involve building models for classification, regression, probability estimation, and so on.
Breaking problem into the subproblems
9%
critical part of the data understanding phase is estimating the costs and benefits of each data source and deciding whether further investment is merited.
Critical part of the data understanding phase
9%
Therefore a data preparation phase often proceeds along with data understanding, in which the data are manipulated and converted into forms that yield better results.
Data understanding
9%
The output of modeling is some sort of model or pattern capturing regularities in the data.
Modeling
9%
A common problem with such systems (fraud detection, spam detection, and intrusion monitoring) is that they produce too many false alarms.
False alarms
9%
To facilitate such qualitative assessment, the data scientist must think about the comprehensibility of the model to stakeholders (not just to the data scientists).
Comprehensibility of the model to stakeholders
10%
data science team is responsible for producing a working prototype, along with its evaluation.
Data Science team responsibility
10%
data mining is an exploratory undertaking closer to research and development than it is to engineering.
10%
Team members may be evaluated using software metrics such as the amount of code written or number of bug tickets closed. In analytics, it’s more important for individuals to be able to formulate problems well, to prototype solutions quickly, to make reasonable assumptions in the face of ill-structured problems, to design experiments that represent good investments, and to analyze results. In building a data science team, these qualities, rather than traditional software engineering expertise, are skills that should be sought.
Qualifying software engineering projects vs data science ones
12%
In data science, prediction more generally means to estimate an unknown value
Prediction meaning
12%
descriptive modeling, where the primary purpose of the model is not to estimate a value but instead to gain insight into the underlying phenomenon or process.
Descriptive modeling
13%
instance is also sometimes called a feature vector, because it can be represented as a fixed-length ordered collection (vector) of feature values.
Instance as a feature vector
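A small illustration (the attribute names and values are hypothetical): the same instance written as a fixed-length, ordered vector of feature values.

```python
import numpy as np

# One instance described by a fixed, ordered set of attributes (features)
feature_names = ["age", "income", "years_as_customer", "is_homeowner"]

# The instance as a feature vector: one value per attribute, in a fixed order
x = np.array([38, 52000.0, 4, 1])

print(dict(zip(feature_names, x)))
```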
13%
The target variable, whose values are to be predicted, is commonly called the dependent variable in statistics.
Target/dependent variable
13%
The creation of models from data is known as model induction.
Model induction
13%
direct, multivariate supervised segmentation is just one application of this fundamental idea of selecting informative variables.
direct, multivariate supervised segmentation
13%
If every member of a group has the same value for the target, then the group is pure. If there is at least one member of the group that has a different value for the target variable than the rest of the group, then the group is impure.
Pure vs impure
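A one-line check of this definition in Python (the labels are hypothetical):

```python
def is_pure(target_values):
    """A group is pure if every member has the same target value."""
    return len(set(target_values)) <= 1

print(is_pure(["churn", "churn", "churn"]))      # True: pure
print(is_pure(["churn", "churn", "no churn"]))   # False: impure
```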
14%
formula that evaluates how well each attribute splits a set of examples into segments, with respect to a chosen target variable. Such a formula is based on a purity measure. The most common splitting criterion is called information gain, and it is based on a purity measure called entropy.
Purity measure, information gain
14%
Entropy is a measure of disorder
Entropy
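A minimal sketch, assuming the usual definition entropy = -sum_i p_i * log2(p_i) over the class proportions within a segment (the example labels are made up):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a segment: 0 for a pure segment, maximal (1 bit for
    two classes) when the classes are evenly mixed."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["yes"] * 10))               # 0.0   (pure, no disorder)
print(entropy(["yes"] * 5 + ["no"] * 5))   # 1.0   (maximally mixed)
print(entropy(["yes"] * 9 + ["no"] * 1))   # ~0.47 (mostly pure)
```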
14%
Disorder corresponds to how mixed (impure) the segment is with respect to these properties of interest.
Disorder
14%
information gain (IG) to measure how much an attribute improves (decreases) entropy over the whole segmentation it creates.
Purpose of information gain
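A sketch of this computation under the standard formulation: information gain is the entropy of the parent segment minus the weighted average entropy of the child segments the split creates (the churn/stay labels are hypothetical).

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the children:
    how much the split decreases disorder over the whole segmentation."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["churn"] * 6 + ["stay"] * 6
split = [["churn"] * 5 + ["stay"] * 1,   # left child: mostly churners
         ["churn"] * 1 + ["stay"] * 5]   # right child: mostly stayers
print(information_gain(parent, split))   # ~0.35: the split is informative
```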
14%
information gain measures the change in entropy due to any amount of new information being added;
Information gain - what it measures
14%
Numeric variables can be “discretized” by choosing a split point (or many split points) and then treating the result as a categorical attribute.
Discretization of numerical variables
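A small NumPy sketch (the ages and split points are made up): choosing split points turns the numeric attribute into a categorical one that can then be treated like any other categorical attribute.

```python
import numpy as np

ages = np.array([22, 35, 47, 51, 63, 29, 44])   # hypothetical numeric attribute

# Discretize by choosing split points, producing a three-level categorical variable
split_points = [30, 50]
bins = np.digitize(ages, bins=split_points)     # 0: <30, 1: 30-49, 2: >=50
labels = np.array(["young", "middle", "older"])[bins]
print(labels)
```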
14%
natural measure of impurity for numeric values is variance.
Variance
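A sketch of the numeric analogue of information gain, assuming variance plays the role entropy plays for categorical targets (the spending values and candidate split are hypothetical).

```python
import numpy as np

def variance_reduction(parent, children):
    """Variance of the numeric target in the parent segment minus the
    weighted variance of the children: large values mean a useful split."""
    total = len(parent)
    weighted = sum(len(child) / total * np.var(child) for child in children)
    return np.var(parent) - weighted

spend = np.array([10, 12, 11, 80, 85, 90])                 # hypothetical numeric target
split = [np.array([10, 12, 11]), np.array([80, 85, 90])]   # candidate segmentation
print(variance_reduction(spend, split))                    # large reduction: good split
```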