Kindle Notes & Highlights
by Mariya Yao
Read between October 3 and October 22, 2019
Bad data invariably leads to bad results.
The ultimate goal of any predictive model is to make accurate predictions about unseen data.
Underfitting occurs when your model is too simple to capture the complexities of your underlying data.
Overfitting occurs when your model does not generalize well outside of your training data.
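The two failure modes above can be shown with a toy sketch (the data and both "models" here are invented for illustration, not from the book): a constant model is too simple to capture a quadratic relationship and underfits, while a lookup table that memorizes the training points fits them perfectly but fails on anything unseen.

```python
import random

random.seed(0)

def f(x):
    return x * x  # the true underlying relationship

# Noisy training samples and clean test samples at slightly shifted inputs
train = [(x / 10, f(x / 10) + random.gauss(0, 0.05)) for x in range(20)]
test = [(x / 10 + 0.05, f(x / 10 + 0.05)) for x in range(20)]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfitting: a constant prediction ignores the curvature entirely
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# Overfitting: a lookup table memorizes the training points exactly
table = dict(train)
overfit = lambda x: table.get(x, mean_y)  # no generalization to unseen x

print(mse(underfit, train), mse(underfit, test))  # high error on both
print(mse(overfit, train), mse(overfit, test))    # ~0 on train, high on test
```

The memorizing model looks perfect during training yet is no better than the constant model on the test set, which is exactly what "does not generalize well outside of your training data" means.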
Machine learning projects benefit from following a structured workflow, which starts with clearly defining business goals.
The core of supervised machine learning is a mathematical model that describes how an algorithm makes predictions after being trained with historical data. The goal of training is to develop a model capable of mapping each input to a target output.
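As a minimal illustration of that input-to-target mapping (the data points, learning rate, and iteration count are made up for this sketch), gradient descent can train a one-parameter model y = w * x on historical (input, target) pairs:

```python
# (input, target) pairs of historical data, roughly following y = 2x
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

w = 0.0        # the model's single parameter, learned during training
lr = 0.05      # learning rate (illustrative value)

for _ in range(200):
    # Gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 2))  # ≈ 2.04, the best single-parameter fit to this data
```

Training here is nothing more than adjusting the parameter until the model maps each input as close as possible to its target output.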
Define Business Goal
Examine Existing Data and Processes
Frame the Problem
Centralize Data
Clean Data
Split Data
randomly split the available data into three sets: training data, validation data, and test data.
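A common way to implement the three-way random split (the 70/15/15 fractions and fixed seed below are illustrative choices, not prescribed by the book):

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Randomly partition rows into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded shuffle for reproducibility
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before slicing ensures each set is a random sample, and fixing the seed makes the split reproducible across runs.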
Train Model
Validate and Test Model
Deploy Model
Monitor Performance
Machine learning models will decay in performance if they are not regularly retrained on fresh data. You must monitor both a model’s performance and the integrity of its data inputs.
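One way to sketch such monitoring (the DriftMonitor class, baseline, threshold, and window size are all hypothetical, not from the book): track a rolling window of recent prediction errors and flag the model for retraining when the recent average drifts well above its baseline.

```python
from collections import deque

class DriftMonitor:
    """Flag retraining when recent prediction error drifts above a baseline."""

    def __init__(self, baseline_error, tolerance=0.5, window=100):
        self.baseline = baseline_error        # error measured at deployment
        self.tolerance = tolerance            # allowed relative degradation
        self.errors = deque(maxlen=window)    # rolling window of recent errors

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def needs_retraining(self):
        if len(self.errors) < self.errors.maxlen:
            return False                      # not enough evidence yet
        recent = sum(self.errors) / len(self.errors)
        return recent > self.baseline * (1 + self.tolerance)

monitor = DriftMonitor(baseline_error=0.1, window=10)
for _ in range(10):
    monitor.record(1.0, 1.25)   # recent errors far exceed the 0.1 baseline
print(monitor.needs_retraining())  # True
```

The same pattern extends to input integrity: instead of prediction error, track summary statistics of incoming features and alert when they drift from the training distribution.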
Iterate
early AI investments can be classified as R&D spending, and they should be regarded as innovation opportunities with potentially exponential returns.
The vast majority of the engineering required to productize a model lies in developing and maintaining the vast and complex infrastructure that surrounds the code.
the performance of your existing models will deteriorate as environmental conditions change over time.
As more machine learning algorithms are put into production, you will also need to dedicate more resources to model maintenance—monitoring, validating, and updating the model.
Machine learning debt can be divided into three main types: code debt, data debt, and math debt.
MLaaS systems are designed to encompass the entire machine learning workflow: data management, algorithm training, evaluation and deployment, and model iteration.
Machine learning models are not static. Your model will need to be retrained as new data becomes available or as external conditions change. The frequency of updates will depend on your algorithm, the situation, and the availability of data.
machine learning debt.
For example, mundane tasks can take up to 75 percent of a recruiter's job. Handing off the responsibility of filtering resumés, which consists primarily of matching terms or looking for specific experiential phrases, to an AI-based solution can free up to half of the recruiter's day for other tasks.
General and administrative roles are riddled with tedious but critical tasks such as manual data entry, which requires extreme precision. The exponentially increasing volume of data combined with limits on the human capacity for sustained attention to detail is a recipe for corporate disaster.
Finance and Accounting
you can use natural language processing software such as AppZen to automate expense management for accountants and controllers.
The biggest challenge, however, is one of tradition. The legal sector is devoting a great deal of attention, as well as skepticism, to LegalTech. Legal lags behind other service-based industries when it comes to automating tasks.
Current applications include drafting and reviewing contracts, mining documents in discovery and due diligence, sifting data to predict outcomes, and answering routine questions.
Your company’s intellectual property is often your most valuable asset. AI can now assist with invention disclosures, docketing, deadlines, filing applications, valuing your IP portfolio, and budgeting.
HyperScience utilizes advanced computer vision techniques to scan and process handwritten forms to eliminate the data entry bottleneck.
Robotic Process Automation (RPA),
Unfortunately, AI-based RPAs need to learn from experience, meaning that this ability to recognize new inputs is still dependent on having access to large amounts of previously generated data.
Fortunately, the nature of HR, which emphasizes matching and planning skills, and the abundance of internal data create opportunities for AI-based optimization throughout the hiring process.
These platforms use a variety of AI techniques, such as algorithmic matching and predictive scanning, to identify promising candidates.
To create meaning, BI must first convert data into information, then analyze that information to create insights that can then be converted into recommendations for action.
data wrangling services that automate portions of the data preparation process, using algorithms and machine learning to transform raw data inputs into well-labeled data structures that are ready for use.
Whatever the reason, data silos are a bad idea.
Accountability is clear when a business unit can appoint one owner for its siloed data, but it is less clear when a dataset consists of multiple contributions from multiple departments or is of equal interest to multiple units.
Software 2.0 will not supplant traditional software development entirely. Training a machine learning model is only a single step in the development process.
Critical components such as data management, front-end product interfaces, and security will still need to be handled by regular software.
Which products and features should you prioritize, and which ones should you cut? An AI solution trained on past developments and current business priorities can assess the performance of existing applications, helping you and your engineering teams to identify efforts that will maximize impact and minimize risk.
Marketing and Sales work hand-in-hand to attract and retain customers, which requires understanding what interests customers and motivates them to want to buy products.
For example, Marketing may want to use natural language

