Data Analytics Made Accessible
Read between November 22 - December 1, 2018
Support Vector Machines (SVM) are a machine learning technique for classifying high-dimensional data into two classes. An SVM creates a hyperplane with the largest amount of separation between the two classes. The classifier is made linear by transforming the original features of the input data into new features. SVMs use kernel methods to learn from the specific instances that lie close to the decision boundary. SVMs are used for text mining tasks such as spam filtering, and for outlier detection.
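The idea of a maximum-margin separator can be sketched with a tiny linear SVM trained by subgradient descent on the hinge loss. This is a simplification: the passage mentions kernel methods, but this sketch uses only a linear boundary, and the data points and training parameters are illustrative assumptions, not anything prescribed by the book.

```python
# Toy linear SVM trained with subgradient descent on the hinge loss.
# Illustrative only: real SVMs (especially kernel SVMs) use specialized solvers.

def train_linear_svm(data, lam=0.01, eta=0.1, epochs=200):
    """data: list of ((x1, x2), label) pairs with label in {-1, +1}."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point inside the margin: push the boundary away
                w[0] += eta * (y * x1 - lam * w[0])
                w[1] += eta * (y * x2 - lam * w[1])
                b += eta * y
            else:           # correctly classified with margin: only regularize
                w[0] -= eta * lam * w[0]
                w[1] -= eta * lam * w[1]
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

# Two small, linearly separable classes (made-up example data).
data = [((2, 2), 1), ((3, 3), 1), ((3, 2), 1),
        ((0, 0), -1), ((-1, 0), -1), ((0, -1), -1)]
w, b = train_linear_svm(data)
print(all(predict(w, b, x) == y for x, y in data))  # True once trained
```

Only the points near the boundary ever trigger the `margin < 1` update, which mirrors the passage's point that SVMs learn from the instances close to the decision boundary.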
Web mining is the art and science of discovering patterns and insights from the World Wide Web …
Depending upon objectives, web mining can be divided into three different types: Web usage mining, Web content mining and Web structure mining (Figure 12.1).
Hyperlink-Induced Topic Search (HITS) is a link analysis algorithm that rates web pages as being hubs or authorities.
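The hub/authority split in HITS can be sketched with a small iterative computation: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, with normalization each round. The graph and iteration count below are illustrative assumptions.

```python
# Toy HITS: pages that link out to good authorities become hubs;
# pages linked to by good hubs become authorities.
from math import sqrt

links = {            # page -> pages it links to (made-up graph)
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C"],
}
pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(50):
    # Authority score: sum of hub scores of pages linking in.
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    # Hub score: sum of authority scores of pages linked out to.
    hub = {p: sum(auth[q] for q in links[p]) for p in pages}
    # Normalize so scores stay bounded.
    an = sqrt(sum(v * v for v in auth.values()))
    hn = sqrt(sum(v * v for v in hub.values()))
    auth = {p: v / an for p, v in auth.items()}
    hub = {p: v / hn for p, v in hub.items()}

print(max(auth, key=auth.get))  # "C" -- linked to by every other page
print(max(hub, key=hub.get))    # "A" -- links out to the good authorities
```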
Many other HITS-based algorithms have also been published. The most famous and powerful of these is the PageRank algorithm. Invented by Google co-founder Larry Page, it is used by Google to order the results of its search function. The algorithm helps determine the relative importance of any particular web page by counting the number and quality of links to that page. Websites with a greater number of links, and/or more links from higher-quality websites, are ranked higher. It works in a similar way as determining the status of a person in a society of people. Those …
PageRank is the algorithm that helps determine the order of pages listed for a Google Search query. The original PageRank formulation has been updated in many ways, and the latest algorithm is kept secret so that websites cannot take advantage of it and manipulate their pages accordingly. However, many standard elements remain unchanged. These elements …
The web has growing resources, with more content every day and more users visiting it for many purposes. A good website should be useful, easy to use, and flexible enough to evolve. Websites should be constantly optimized using the insights gleaned from web mining. Web usage mining can help discover what content users really like and consume, and help prioritize that content for improvement. Web structure mining can help improve traffic to a site by building authority for it.
Social networks are a graphical representation of relationships among people and/or entities. Social network analysis (SNA) is the art and science of discovering patterns of interaction and influence among the participants in a network. These participants could be people, organizations, machines, concepts, or any other kinds of entities. An ideal application of social network analysis will discover essential characteristics of a network, including its central nodes and its sub-network structure. Sub-networks are clusters of nodes where the within-sub-network connections are stronger than the …
Network topologies There are two primary types of network topologies: the ring-type and hub-spoke topologies. Each of the topologies has different characteristics and benefits.
There are two major levels of social network analysis: discovering sub-networks within the network, and ranking the nodes to find more important nodes or hubs.
PageRank is a particular application of the social network analysis techniques above: it computes the relative importance of websites in the overall World Wide Web. The data on websites and their links is gathered by web crawler bots that traverse the web pages at frequent intervals. Every web page is a node in a social network, and all the hyperlinks from that page become directed links to other web pages. Every outbound link from a web page is considered an outflow of that page's influence. An iterative computational technique is applied to compute the relative importance of each …
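The iterative computation described above can be sketched as power iteration with a damping factor: each round, every page passes its current rank evenly along its outbound links, and every page's new rank is a small baseline plus the damped sum of what flows in. The graph, the damping factor of 0.85, and the iteration count are illustrative assumptions.

```python
# Toy PageRank by power iteration. Each page splits its rank evenly
# among its outbound links; inbound rank accumulates as influence.

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # made-up web
pages = list(links)
n = len(pages)
d = 0.85                       # damping factor (commonly quoted default)
rank = {p: 1.0 / n for p in pages}

for _ in range(100):
    new = {}
    for p in pages:
        # Influence flowing in: each linking page contributes its rank
        # divided by its number of outbound links.
        inflow = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new[p] = (1 - d) / n + d * inflow
    rank = new

best = max(rank, key=rank.get)
print(best)  # "C": the most (and best-sourced) inbound links
```

Note how "D", which nothing links to, ends up with the minimum baseline rank, while "C" is boosted both by the count of its inbound links and by the rank of the pages behind them.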
Social network analysis is a powerful method of analyzing relationships among entities to identify strong patterns. There could be sub-networks in a network based on strong ties within the network. A computationally rigorous set of techniques can be used to rank every node in a network for its influence and importance.
Big Data is an all-inclusive term that refers to extremely large, very fast, highly diverse, and complex data that cannot be managed with traditional data management tools. Ideally, Big Data includes all kinds of data, and helps deliver the right information, to the right person, in the right quantity, at the right time, to help make the right decisions. Big …
…databases. There are two essential components of SQL: the Data Definition Language (DDL) and the Data Manipulation Language (DML). DDL provides instructions to create a new database, and to create new tables within a database. It also provides instructions to delete a database, or just a few tables within one. There are other ancillary commands to define indexes, etc., for efficient access to the database. DML is the heart of SQL. It provides instructions to add, read, modify, and delete data from the database and any of its tables. The data can be selectively accessed, and then formatted, to answer a …
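The DDL/DML split can be seen side by side in a short session. Python's built-in sqlite3 module is used here as a convenient SQL engine (an illustrative choice; any relational database would accept similar statements), and the table and rows are made up.

```python
# DDL defines structure; DML adds, reads, modifies, and deletes data.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# DDL: create a table, plus an ancillary index for efficient access.
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
cur.execute("CREATE INDEX idx_dept ON employees (dept)")

# DML: add, modify, and delete data.
cur.executemany("INSERT INTO employees (name, dept) VALUES (?, ?)",
                [("Ada", "Engineering"), ("Grace", "Engineering"), ("Edgar", "Research")])
cur.execute("UPDATE employees SET dept = 'R&D' WHERE name = 'Edgar'")
cur.execute("DELETE FROM employees WHERE name = 'Grace'")

# DML: read back, selectively accessed and ordered.
cur.execute("SELECT name, dept FROM employees ORDER BY name")
rows = cur.fetchall()
print(rows)  # [('Ada', 'Engineering'), ('Edgar', 'R&D')]

# DDL again: remove the table entirely.
cur.execute("DROP TABLE employees")
con.close()
```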
Data should be modeled to achieve the business objectives. Good data should be accurate and accessible, so that it can be used for business operations. The relational data model is the most popular way of managing data today.
Descriptive Statistics These are tools and techniques to describe a collection of data. Data is often described by its central tendency and its spread. The primary measure of central tendency is the mean, or average, of the values. Other measures of central tendency are the median and the mode. The spread within the data is called the variance, and is often described by the standard deviation (the square root of the variance).
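The measures above can be computed directly with Python's standard statistics module; the sample values are illustrative.

```python
# Central tendency (mean, median, mode) and spread (variance, std deviation).
import statistics as st

values = [2, 4, 4, 4, 5, 5, 7, 9]

print(st.mean(values))       # 5.0  -- the average
print(st.median(values))     # 4.5  -- the middle value
print(st.mode(values))       # 4    -- the most frequent value
print(st.pvariance(values))  # 4.0  -- population variance (spread)
print(st.pstdev(values))     # 2.0  -- standard deviation = sqrt(variance)
```

Note how the mean, median, and mode differ for the same data: each summarizes "the center" in a different way.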
Inferential Statistics The primary purpose of statistics is to infer the properties of the population without having to reach out to and measure the entire population. If a suitably randomized sample is selected, such that it can reasonably be assumed to be representative of the overall population, then the properties of the population can be inferred from it with a high degree of confidence. The larger the sample size, the greater the confidence.
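The "larger sample, greater confidence" relationship can be made concrete with the standard error of the mean: a 95% confidence interval under the normal approximation has half-width 1.96 × s / √n, which shrinks as the sample grows. The numbers below are illustrative assumptions.

```python
# 95% confidence interval half-width for a sample mean, using the
# normal approximation (z = 1.96). Larger n -> narrower interval.
from math import sqrt

def ci_half_width(sample_std, n, z=1.96):
    return z * sample_std / sqrt(n)

s = 10.0  # assumed sample standard deviation
print(ci_half_width(s, 25))     # 3.92  -> mean estimated to within +/- 3.92
print(ci_half_width(s, 100))    # 1.96  -> 4x the sample, half the width
print(ci_half_width(s, 10000))  # 0.196
```

Quadrupling the sample size only halves the interval, which is why very precise inference requires disproportionately large samples.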
Predictive Statistics Statistical techniques such as regression analysis can be used to predict the values of variables of interest. Correlations among variables show which variables might influence the values of others. There are many variations of regression, such as linear, non-linear, and logit regression. Time series analysis is a special case of regression analysis in which the key independent variable is time.
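Simple linear regression can be sketched in a few lines using the least-squares formulas: the slope is the covariance of x and y divided by the variance of x. The data is illustrative and exactly linear, so the fit recovers the line perfectly.

```python
# Simple linear regression (least squares): fit y = slope * x + intercept.

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]        # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)      # 2.0 1.0

predicted = slope * 6 + intercept  # predict y at a new point x = 6
print(predicted)             # 13.0
```

Real data is noisy, so the fitted line only approximates the points; the prediction step at the end is the "predictive" part of predictive statistics.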
Statistical tools are time-tested ways of understanding large populations using small randomized samples. Data is described using central tendencies such as mean, median and mode; and its distribution using range and variance. Predictive statistical methods like regression and others are prominent tools in data analytics.
Artificial Intelligence (AI) is the way of abstracting intelligence capabilities out of the human mind and embedding them into non-sentient objects. Ideally, AI will be able to express intelligent behavior surpassing human intelligence in every way. This is achieved by modeling human intelligence through the whole spectrum of perspectives, from the genetic, to the neurological, to the cognitive, to the behavioral, and to the physical. AI ultimately models and implements the universal laws of nature.
Siri is an offshoot of the DARPA-funded project CALO, which was part of DARPA's PAL initiative (Personalized Assistant that Learns). Siri involves a number of technologies, including natural language processing, question analysis, data mashups, and machine learning. At a high level, Siri's main algorithm is as follows: use automatic speech recognition to transcribe human speech into text; use natural language processing to translate the transcribed text into ‘parsed text’; use question & intent analysis to analyze the parsed text, detecting user commands and actions such as "What is the weather". …
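The "question & intent analysis" step can be illustrated with a toy keyword-based intent detector. This is purely a stand-in for exposition: the intents, keywords, and rules below are invented for this sketch, and the real system uses machine learning, not hand-written rules.

```python
# Toy intent analysis: map parsed text to an intent with keyword rules.
# Hypothetical intents and keywords -- not Siri's actual categories.

INTENT_KEYWORDS = {
    "weather": ["weather", "rain", "temperature"],
    "timer":   ["timer", "remind", "alarm"],
}

def detect_intent(parsed_text):
    words = parsed_text.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in words for k in keywords):
            return intent
    return "unknown"

print(detect_intent("What is the weather"))         # weather
print(detect_intent("Set a timer for 10 minutes"))  # timer
print(detect_intent("Tell me a joke"))              # unknown
```

Once an intent is detected, a real assistant would dispatch to the matching action (fetch a forecast, start a timer), which is where the data-mashup technologies come in.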
AI, Machine Learning, and Deep Learning Artificial Intelligence is a broad term that includes all manner of intelligent machines. Patrick Winston of MIT describes AI as representations that support models targeted at thinking, perception, and action. AI includes machine learning, natural language understanding, expert systems, computer vision, robotics, and other capabilities (Figure 19.1). Machine learning systems are those that learn patterns from data using neural networks, genetic algorithms, and more. Deep learning systems are prediction systems based on neural networks. They are …
Figure 19.1: Components of AI and Machine Learning
Artificial Intelligence is a way of understanding intelligence and embedding it in software and devices. AI is able to do many kinds of routine knowledge work, and so it poses a threat to the jobs we see today. However, the Industrial Revolution before it created many new industries even as it destroyed old jobs. AI has the potential to match and exceed human intelligence in the next 10-15 years. It thus has the potential to pose an existential threat to humanity.
A data science major can take up a wide range of jobs, such as Data Engineer, Data Analyst, Data Visualizer, Data Warehousing Specialist, Machine Learning Specialist, and so on.