Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
Kindle Notes & Highlights
3%
At a high level, data science is a set of fundamental principles that guide the extraction of knowledge from data. Data mining is the extraction of knowledge from data, via technologies that incorporate these principles.
4%
Data-driven decision-making (DDD) refers to the practice of basing decisions on the analysis of data, rather than purely on intuition.
4%
Data, and the capability to extract useful knowledge from data, should be regarded as key strategic assets.
5%
Once we view data as a business asset, we should think about whether and how much we are willing to invest.
5%
When faced with a business problem, you should be able to assess whether and how data can improve performance.
6%
Extracting useful knowledge from data to solve business problems can be treated systematically by following a process with reasonably well-defined stages.
6%
From a large mass of data, information technology can be used to find informative descriptive attributes of entities of interest.
6%
If you look too hard at a set of data, you will find something — but it might not generalize beyond the data you’re looking at.
6%
Formulating data mining solutions and evaluating the results involves thinking carefully about the context in which they will be used.
7%
Classification and class probability estimation attempt to predict, for each individual in a population, which of a (small) set of classes this individual belongs to.
7%
Regression (“value estimation”) attempts to estimate or predict, for each individual, the numerical value of some variable for that individual.
7%
Informally, classification predicts whether something will happen, whereas regression predicts how much something will happen.
7%
Similarity matching attempts to identify similar individuals based on data known about them.
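As a rough illustration of similarity matching, the sketch below ranks records by Euclidean distance to a query record. The customer names, the two features, and the helper `most_similar` are hypothetical and chosen only for illustration; the highlight does not prescribe a particular distance measure.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(target, candidates, k=2):
    """Return the names of the k candidates closest to target."""
    ranked = sorted(candidates, key=lambda name: euclidean(target, candidates[name]))
    return ranked[:k]


# Hypothetical customers: (age in decades, monthly spend in hundreds of dollars)
customers = {
    "ann": (3.0, 2.0),
    "bob": (3.2, 2.1),
    "cat": (6.0, 0.5),
    "dan": (2.9, 1.8),
}
query = (3.1, 2.0)

print(most_similar(query, customers, k=2))
```

In practice the features would usually be scaled first, since a distance like this is dominated by whichever attribute has the largest numeric range.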
7%
Clustering attempts to group individuals in a population together by their similarity, but not driven by any specific purpose.
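One common way to group by similarity without a target is k-means. The following is a minimal one-dimensional sketch, assuming made-up values and hand-picked starting centers; `kmeans_1d` is a hypothetical helper, not a method named in the highlight.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means on a list of numbers.

    Each pass assigns every value to its nearest center, then moves each
    center to the mean of the values assigned to it.
    """
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical data with two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = kmeans_1d(values, centers=[0.0, 5.0])

print(centers)
print(groups)
```

Note that no label is supplied anywhere: the grouping comes only from how close the values are to each other.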
7%
Co-occurrence grouping (also known as frequent itemset mining, association rule discovery, and market-basket analysis) attempts to find associations between entities based on transactions involving them.
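The simplest form of co-occurrence grouping is counting how often pairs of items appear in the same transaction. The sketch below does only that, on invented baskets; `pair_counts` is a hypothetical helper, and real association rule mining adds measures such as support and confidence on top of counts like these.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so ("bread", "milk") and ("milk", "bread")
        # are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
]
counts = pair_counts(baskets)

print(counts[("bread", "milk")])
print(counts[("beer", "chips")])
```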
7%
Profiling (also known as behavior description) attempts to characterize the typical behavior of an individual, group, or population.
7%
Link prediction attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link.
7%
Data reduction attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set.
7%
Causal modeling attempts to help us understand what events or actions actually influence others.
8%
Classification, regression, and causal modeling generally are solved with supervised methods. Similarity matching, link prediction, and data reduction could be either. Clustering, co-occurrence grouping, and profiling generally are unsupervised.
8%
Two main subclasses of supervised data mining, classification and regression, are distinguished by the type of target. Regression involves a numeric target while classification involves a categorical (often binary) target.
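To make the distinction concrete, the sketch below applies the same nearest-neighbor idea to two targets: a categorical one (majority vote) and a numeric one (average). The data, the single feature, and the function names are assumptions made for illustration; nearest neighbors is just one of many supervised methods that could be used here.

```python
from collections import Counter
from statistics import mean


def k_nearest(x, examples, k):
    """Return the k (feature, target) pairs whose feature is closest to x."""
    return sorted(examples, key=lambda ex: abs(ex[0] - x))[:k]


def classify(x, examples, k=3):
    """Categorical target: predict the most common label among the neighbors."""
    labels = [label for _, label in k_nearest(x, examples, k)]
    return Counter(labels).most_common(1)[0][0]


def regress(x, examples, k=3):
    """Numeric target: predict the mean target value among the neighbors."""
    return mean(value for _, value in k_nearest(x, examples, k))


# Hypothetical feature: weekly usage hours.
churn_examples = [(1, "churn"), (2, "churn"), (3, "stay"), (8, "stay"), (9, "stay")]
spend_examples = [(1, 10.0), (2, 20.0), (3, 30.0), (8, 80.0), (9, 90.0)]

print(classify(2.2, churn_examples))  # a class label
print(regress(2.2, spend_examples))   # a number
```

The only difference between the two functions is how the neighbors' targets are combined, which mirrors the point that the type of target is what separates the two tasks.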