Benjamin Bengfort's Blog, page 3

Building a Classifier from Census Data

One of the machine learning workshops given to students in the Georgetown Data Science Certificate asks them to build a classification, regression, or clustering model using one of the UCI Machine Learning Repository datasets. The idea behind the workshop is to ingest data from a website, perform some initial analyses to get a sense of what's in the data, then structure the data to fit a Scikit-Learn model and evaluate the results. Although the repository does give advice as to what types of ma...

Published on May 02, 2016 13:27

Graph Analytics Over Relational Datasets with Python

Analyzing how entities are interconnected through relationships has proven to be of immense value in understanding the inner workings of networks in a variety of data domains, including finance, health care, business, and computer science. These analyses have emerged in the form of Graph Analytics: the analysis of the characteristics of these graph structures through various graph algorithms. Some examples of insights offered by graph analytics include findi...

Published on March 11, 2016 19:22

Graph Analytics over Relational Datasets

Analyzing how entities are interconnected through relationships has proven to be of immense value in understanding the inner workings of networks in a variety of data domains, including finance, health care, business, and computer science.
These analyses have emerged in the form of Graph Analytics: the analysis of the characteristics of these graph structures through various graph algorithms. Some examples of insights offered by graph analytics include findi...

Published on March 11, 2016 19:22

A Practical Guide to Anonymizing Datasets with Python & Faker

If you want to keep a secret, you must also hide it from yourself.

— George Orwell 1984

In order to learn (or teach) data science you need data (surprise!). The best libraries often come with a toy dataset to illustrate examples of how the code works. However, nothing can replace an actual, non-trivial dataset for a tutorial or lesson, because only that can provide for deep and meaningful exploration. Unfortunately, non-trivial datasets can be hard to find for a few reasons, one of wh...

Published on March 02, 2016 12:40

An Introduction to Machine Learning with Python

For the mind does not require filling like a bottle, but rather, like wood, it only requires kindling to create in it an impulse to think independently and an ardent desire for the truth.

— Plutarch On Listening to Lectures

The impulse to ingest more data is our first and most powerful instinct. Born with billions of neurons, as babies we begin developing complex synaptic networks by taking in massive amounts of data - sounds, smells, tastes, textures, pictures. It's not always gr...

Published on January 21, 2016 14:33

Parameter Tuning with Hyperopt

This post will cover a few things needed to quickly implement a fast, principled method for machine learning model parameter tuning. There are two common methods of parameter tuning: grid search and random search. Each has its pros and cons. Grid search is slow but effective at searching the whole search space, while random search is fast, but could miss important points in the search space. Luckily, a third option exists: Bayesian optimization. In this post, we will focus on one implement...
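The trade-off between grid search and random search can be illustrated with a minimal sketch using only the Python standard library. It is not taken from the post itself: the `objective` function, the search bounds, and the trial counts are all hypothetical, chosen so that the grid happens to contain the true minimum.

```python
import itertools
import random


def objective(x, y):
    """Hypothetical loss surface (an assumption) with its minimum at (0.3, -0.2)."""
    return (x - 0.3) ** 2 + (y + 0.2) ** 2


def grid_search(values):
    """Evaluate every (x, y) pair on the grid; cost grows as len(values) ** 2."""
    candidates = itertools.product(values, repeat=2)
    return min(candidates, key=lambda p: objective(*p))


def random_search(n_trials, low, high, seed=0):
    """Evaluate n_trials points drawn uniformly from [low, high] in each dimension."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n_trials)
    ]
    return min(candidates, key=lambda p: objective(*p))


# 21 values per dimension means 441 evaluations for the exhaustive grid.
grid_values = [i / 10 for i in range(-10, 11)]
best_grid = grid_search(grid_values)

# Random search spends only 50 evaluations, but may not land on the minimum.
best_random = random_search(50, -1.0, 1.0)

print("grid:  ", best_grid, objective(*best_grid))
print("random:", best_random, objective(*best_random))
```

In this toy setup the grid finds the exact minimum at roughly nine times the cost, while random search returns whatever its 50 samples happened to cover. Real search spaces rarely line up with a grid so neatly, which is part of the motivation for a more principled alternative.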

Published on September 21, 2015 11:50

Time Maps: Visualizing Discrete Events Across Many Timescales

Discrete events pervade our daily lives. These include phone calls, online transactions, and heartbeats. Despite the simplicity of discrete event data, it’s hard to visualize many events over a long time period without hiding details about shorter timescales.

The plot below illustrates this problem. It shows the number of website visits made by a certain IP address over the course of 7 months. It was built from discrete event data. The height of each bar is the number of events that occurred...

Published on September 03, 2015 12:52

The Age of the Data Product

We are living through an information revolution. Like any economic revolution, it has had a transformative effect on society, academia, and business. The present revolution, driven as it is by networked communication systems and the Internet, is unique in that it has created a surplus of a valuable new material - data - and transformed us all into both consumers and producers. The sheer amount of data being generated is tremendous. Data increasingly affects every aspect of our lives, from the...

Published on May 20, 2015 08:02

Markup for Fast Data Science Publication

A central lesson of science is that to understand complex issues (or even simple ones), we must try to free our minds of dogma and to guarantee the freedom to publish, to contradict, and to experiment.

— Carl Sagan in Billions & Billions: Thoughts on Life and Death at the Brink of the Millennium

As data scientists, it's easy to get bogged down in the details. We're busy implementing Python and R code to extract valuable insights from data, train effective machine learning...

Published on April 20, 2015 16:26

Modern Methods for Sentiment Analysis

Sentiment analysis is a common application of Natural Language Processing (NLP) methodologies, particularly classification, whose goal is to extract the emotional content in text. In this way, sentiment analysis can be seen as a method to quantify qualitative data with some sentiment score. While sentiment is largely subjective, sentiment quantification has enjoyed many useful implementations, such as businesses gaining understanding about consumer reactions to a product, or detecting hateful...

Published on March 31, 2015 07:15