Kindle Notes & Highlights
Business intelligence is a broad set of information technology (IT) solutions that includes tools for gathering, analyzing, and reporting information to users about the performance of the organization and its environment. These IT solutions are among those most highly prioritized for investment.
Diamond mining is the act of digging into large amounts of unrefined ore to discover precious gems or nuggets. Similarly, data mining is the act of digging into large amounts of raw data to discover unique, nontrivial, useful patterns.
Datafication is a term that is often used to mean that almost every phenomenon is now being observed and information about it is being recorded.
A database is a modeled collection of data that is accessible in many ways. A data model can be designed to integrate the operational data of the organization. The data model abstracts the key entities involved in an action and their relationships. Most databases today follow the relational data model and its variants. Each data modeling technique imposes and enforces rigorous rules and constraints to ensure the integrity and consistency of data over time.
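To make "rules and constraints that ensure integrity" concrete, here is a minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical two-entity model (customers and their orders). The primary key, `NOT NULL`, foreign-key, and `CHECK` constraints are standard relational features; the table and column names are invented for illustration.

```python
import sqlite3

# In-memory database; the customer/orders schema is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Two entities and the relationship between them (each order belongs to a customer).
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL CHECK (amount > 0)
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO orders VALUES (100, 1, 250.0)")

def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# An order for a customer that does not exist violates the foreign key.
orphan_ok = try_insert("INSERT INTO orders VALUES (101, 99, 80.0)")
# A non-positive amount violates the CHECK constraint.
negative_ok = try_insert("INSERT INTO orders VALUES (102, 1, -5.0)")
```

Both bad rows are rejected by the database itself, so the stored data stays consistent no matter which application wrote it.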
A data warehouse is an organized store of data from all over the organization, specially designed to help make management decisions. Data can be extracted from operational databases to answer a particular set of queries. This data, combined with other data, can be rolled up to a consistent granularity and uploaded to a separate data store called the data warehouse. Therefore, the data warehouse is a simpler version of the operational database, whose purpose is to address reporting and decision-making needs only.
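"Rolled up to a consistent granularity" can be illustrated with a small sketch, assuming hypothetical transaction-level sales records extracted from an operational system. Each individual sale is aggregated to one row per store per day, which is the kind of summarized grain a warehouse might keep.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction-level records extracted from an operational database.
transactions = [
    {"store": "North", "timestamp": "2024-03-01T09:15", "amount": 20.0},
    {"store": "North", "timestamp": "2024-03-01T17:40", "amount": 35.0},
    {"store": "South", "timestamp": "2024-03-01T12:05", "amount": 50.0},
    {"store": "North", "timestamp": "2024-03-02T10:30", "amount": 15.0},
]

def roll_up_daily(rows):
    """Aggregate individual sales to one total per (store, day)."""
    totals = defaultdict(float)
    for row in rows:
        day = datetime.fromisoformat(row["timestamp"]).date().isoformat()
        totals[(row["store"], day)] += row["amount"]
    return dict(totals)

warehouse_rows = roll_up_daily(transactions)
```

The two North sales on 1 March collapse into a single row with a total of 55.0; the warehouse no longer needs the individual timestamps to answer daily reporting questions.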
Data Mining is the art and science of discovering useful innovative patterns from data.
Here are brief descriptions of some of the most important data mining techniques used to generate insights from data.
Decision Trees: They help classify populations into classes. Examples of classes are high-risk versus low-risk patients, or high-value versus low-value customers. It is said that 70% of all data mining work is about classification solutions, and that 70% of all classification work uses decision trees.
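A decision tree can be grown by repeatedly choosing the split that best separates the classes. The sketch below is a minimal pure-Python version, assuming Gini impurity as the split criterion (one common choice, not necessarily the book's) and a hypothetical patient dataset with two numeric features, age and systolic blood pressure.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when every label in the group is the same class."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the (feature, threshold) whose split gives the lowest weighted Gini."""
    best, best_score = None, gini(labels)  # a split must improve on the unsplit node
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split until nodes are pure or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def classify(tree, row):
    """Walk from the root to a leaf and return that leaf's class."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical patients: (age, systolic blood pressure) -> risk class.
rows = [(35, 120), (42, 118), (50, 150), (63, 160), (29, 110), (58, 145)]
labels = ["low", "low", "high", "high", "low", "high"]
tree = build_tree(rows, labels)
```

On this toy data the tree needs a single question ("is age at most 42?") to separate the classes, so `classify(tree, (45, 155))` returns `"high"`. Real data would produce deeper trees, which is why `max_depth` is there to limit overfitting.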
Regression: The goal is to find a best-fitting curve through the many data points in a multi-dimensional space. This is a well-understood technique from the field of statistics. The output of regression is the best-fitting curve, the one that minimizes the (error) distance between the actual data points and the values predicted by the curve. Regression models can be projected into the future for prediction and forecasting purposes.
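The simplest case is a straight line with one predictor, fitted by ordinary least squares, where "error distance" means the sum of squared vertical gaps between actual and predicted values. A minimal sketch, assuming hypothetical advertising-spend and sales figures:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared errors between actual y values and the line's predictions."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: advertising spend (x) vs. sales (y).
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(xs, ys)
forecast = a + b * 6   # project the fitted line forward to x = 6
```

Here the fitted line is roughly y = 1.09 + 1.97x, and projecting it to x = 6 gives a forecast of about 12.91. Nudging either coefficient away from the fitted values increases `sse`, which is what "best-fitting" means.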
Artificial Neural Networks: ANNs are multi-layer non-linear information processing models that learn from past data and predict future values.
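Why "multi-layer" and "non-linear" matter can be shown with the classic XOR function, which no single linear unit can compute. The sketch below is a forward pass only: the weights are hand-picked, hypothetical values chosen so the network computes XOR. A real ANN would learn such weights from past data (for example by backpropagation), which is not shown here.

```python
import math

def sigmoid(z):
    """Smooth non-linear activation that squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum of inputs plus bias, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked (hypothetical) weights; training would normally find these.
HIDDEN_W = [[20, 20], [-20, -20]]   # hidden unit 1 acts like OR, unit 2 like NAND
HIDDEN_B = [-10, 30]
OUTPUT_W = [[20, 20]]               # output unit acts like AND of the hidden units
OUTPUT_B = [-30]

def predict(x1, x2):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

outputs = [round(predict(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

The rounded outputs are `[0, 1, 1, 0]`: the hidden layer re-represents the inputs so that the output layer can separate cases a single layer could not.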
Cluster analysis: This technique helps conquer large data sets by dividing them into smaller, meaningful datasets. The data set is divided into a certain number of clusters on the basis of similarities and dissimilarities within the data.
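The highlight does not name a specific clustering algorithm; k-means is one widely used choice, shown here as a minimal sketch. It assumes two clusters, hypothetical customer points (annual spend in $k, visits per month), and initial centroids picked by hand. "Similarity" is Euclidean distance.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means: assign points to the nearest centroid, then move centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its members (keep the old one if a cluster is empty).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (annual spend in $k, visits per month).
points = [(1, 2), (1.5, 1.8), (1, 1), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 2), (8, 8)])
```

The six customers split into a low-spend group and a high-spend group of three each. In practice the number of clusters and the starting centroids are choices the analyst must make, and different starts can give different results.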
Association Rule Mining: This technique looks for associations between data values, such as items that frequently occur together. It is also called Market Basket Analysis in the retail industry.
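Association rules are usually scored by support (how often the items appear together) and confidence (how often the right-hand item appears when the left-hand one does). A minimal sketch over item pairs only, assuming hypothetical baskets and thresholds; full algorithms such as Apriori extend this to larger itemsets efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return rules lhs -> rhs over item pairs, with their support and confidence."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            confidence = count / item_counts[lhs]  # share of lhs baskets that also hold rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

rules = pair_rules(baskets)
```

With these thresholds the rules found are bread → butter, butter → bread, and milk → bread; bread → milk is dropped because only half of the bread baskets contain milk.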
Business Intelligence is a business-oriented term, and so is Decision Science. BI is the field of using data-driven insights to make superior business decisions. It is a broad category that includes data analytics or data mining as the providers of the insights to be packaged.
Data Analytics is a technology-oriented term, and so is Data Mining. Both involve the use of techniques and tools to find novel useful patterns from data.
Data Science is a new discipline born in the early 2000s. Its scope includes the entire data processing chain. A data scientist would ideally be familiar with all aspects of the discipline while specializing in one part of it.
There are two main kinds of decisions: strategic decisions and operational decisions. BI can help make both better.
A data warehouse (DW) is an organized collection of integrated, subject-oriented databases designed to support decision-making functions. A DW is organized at the right level of granularity to provide clean, enterprise-wide data in a standardized format for reports, queries, and analysis. A DW is physically and functionally separate from operational and transactional databases.
Star schema is the preferred data architecture for most DWs. There is a central fact table that provides most of the information of interest. There are lookup tables that provide detailed values for codes used in the central table.
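A star schema can be sketched with Python's built-in `sqlite3` module, assuming a hypothetical sales fact table with two lookup (dimension) tables for products and stores. The fact table holds codes and numeric measures; a typical warehouse query joins it to a lookup table and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup (dimension) tables decode the codes used in the fact table.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region TEXT);

    -- Central fact table: one row per sale, holding codes plus measures.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units      INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO dim_store   VALUES (10, 'East'), (20, 'West');
    INSERT INTO fact_sales  VALUES (1, 10, 2, 2000.0), (2, 10, 5, 2500.0),
                                   (2, 20, 3, 1500.0), (1, 20, 1, 1000.0);
""")

# Join the fact table to one of its lookups and aggregate revenue by region.
rows = conn.execute("""
    SELECT s.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_store s ON f.store_id = s.store_id
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
```

Every lookup table is one join away from the fact table, which keeps queries simple: here East totals 4500.0 and West 2500.0.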
Other schemas include the snowflake architecture, which is a fractally expanded version of the star model. The difference between a star and a snowflake is that in the latter, the lookup tables can have their own further lookup tables.
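In a snowflake schema the lookup tables are themselves normalized. A minimal `sqlite3` sketch, assuming a hypothetical product lookup table that points to its own category lookup table, shows the extra join this introduces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product lookup table has its own lookup table for categories.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue    REAL
    );

    INSERT INTO dim_category VALUES (1, 'Computers'), (2, 'Mobile');
    INSERT INTO dim_product  VALUES (1, 'Laptop', 1), (2, 'Tablet', 2), (3, 'Phone', 2);
    INSERT INTO fact_sales   VALUES (1, 2000.0), (2, 600.0), (3, 900.0), (3, 450.0);
""")

# Reaching the category name now takes two joins instead of one.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product  p ON f.product_id  = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

The trade-off is less duplication in the lookup tables (each category name is stored once) at the cost of longer join paths in queries.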
A data lake is a more recent and expansive concept for representing Big Data. It addresses the limitations of the data warehouse concept, which was invented in the 1990s. DWs were expensive and proprietary, and are often not flexible enough to handle the evolving needs of companies. A data lake is conceived to be a centralized repository where all your structured and unstructured data can flow in speedily, at any scale, and typically be dammed on a cloud storage platform. A data lake uses a flat architecture and object storage to store the data “as is,” without imposing a fixed schema or structure on
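Storing data "as is" is often described as schema-on-read: nothing is validated when data lands, and structure is applied only when someone queries it. The sketch below assumes a plain Python dict standing in for object storage, with hypothetical object paths and record fields; it is an illustration of the idea, not a real storage API.

```python
import json

# A dict standing in for object storage: key = object path, value = raw bytes.
lake = {}

def ingest(path, record):
    """Store the record as-is; no schema is checked on the way in."""
    lake[path] = json.dumps(record).encode("utf-8")

ingest("raw/web/0001.json", {"user": "u1", "page": "/home", "ms": 120})
ingest("raw/web/0002.json", {"user": "u2", "page": "/cart"})       # missing a field
ingest("raw/iot/0001.json", {"device": "t-7", "temp_c": 21.5})     # different shape

def read_with_schema(prefix, fields):
    """Apply a schema only at read time: keep the wanted fields, None if absent."""
    for path in sorted(lake):
        if path.startswith(prefix):
            record = json.loads(lake[path])
            yield {f: record.get(f) for f in fields}

web_view = list(read_with_schema("raw/web/", ["user", "page", "ms"]))
```

Records of different shapes coexist in the same store, and each consumer decides which fields it needs when reading; a warehouse, by contrast, would reject or reshape them at load time.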
Data mining is the art and science of discovering knowledge, insights, and patterns in data. It is the act of extracting useful patterns from an organized collection of data. Patterns should be valid, novel, potentially useful, and understandable. The implicit assumption is that data about the past can reveal patterns of activity that can be projected into the future.
The most important class of problems solved using data mining is classification. Classification techniques are called supervised learning because there is a way to supervise whether the model is providing the right or wrong answers.
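The "supervision" comes from known labels: a model is built on labeled examples, then its answers on held-out labeled examples are checked against the truth. A minimal sketch, assuming a 1-nearest-neighbour classifier (a simple technique chosen here for brevity, not one named in the highlight) and hypothetical two-feature data.

```python
import math

# Hypothetical labeled data: (feature vector, known class).
train = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
         ((3.0, 3.2), "high"), ((3.1, 2.9), "high"), ((2.8, 3.0), "high")]
test = [((1.1, 0.9), "low"), ((2.9, 3.1), "high"), ((2.0, 1.8), "high")]

def knn_predict(x):
    """1-nearest-neighbour: copy the label of the closest training example."""
    _, label = min(train, key=lambda pair: math.dist(x, pair[0]))
    return label

# Supervision: compare each prediction with the known correct answer.
predictions = [knn_predict(x) for x, _ in test]
correct = sum(p == y for p, (_, y) in zip(predictions, test))
accuracy = correct / len(test)
```

The third test point lies closer to the "low" examples, so the model gets it wrong and accuracy is 2/3. Because the true labels are known, this error is visible, which is exactly what makes the learning supervised.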

