Benjamin Bengfort's Blog

March 31, 2017

Data Exploration with Python, Part 3

This is the third post in our Data Exploration with Python series. Before reading this post, make sure to check out Part 1 and Part 2!

Preparing yourself and your data like we have done thus far in this series is essential to analyzing your data well. However, the most exciting part of Exploratory Data Analysis (EDA) is actually getting in there, exploring the data, and discovering insights. That's exactly what we are going to start doing in this post.

We will begin with the cleaned and...

Published on March 31, 2017 12:47

March 11, 2017

Basics of Entity Resolution

Entity resolution (ER) is the task of disambiguating records that correspond to real world entities across and within datasets. The applications of entity resolution are tremendous, particularly for public sector and federal datasets related to health, transportation, finance, law enforcement, and antiterrorism.

Unfortunately, the problems associated with entity resolution are equally big — as the volume and velocity of data grow, inference across networks and semantic relationships be...

Published on March 11, 2017 10:55

February 7, 2017

Data Exploration with Python, Part 2

This is the second post in our Data Exploration with Python series. Before reading this post, make sure to check out Data Exploration with Python, Part 1!

Mise en place (noun): In a professional kitchen, the disciplined organization and preparation of equipment and food before service begins.

When performing exploratory data analysis (EDA), it is important to not only prepare yourself (the analyst) but to prepare your data as well. As we discussed in the previous post, a small amount of pre...

Published on February 07, 2017 00:04

January 12, 2017

Forward Propagation: Building a Skip-Gram Net From the Ground Up

Editor's Note: This post is part of a series based on the research conducted in District Data Labs' NLP Research Lab. Make sure to check out the other posts in the series so far:

NLP Research Lab Part 1: Distributed Representations
NLP Research Lab Part 2: Skip-Gram Architecture Overview

Let's continue our treatment of the Skip-gram model by walking through a single example of feeding forward through a Skip-gram neural network: from an input target word, through a pr...

Published on January 12, 2017 15:20

December 31, 2016

Ten Things to Try in 2017

2016 marked a zenith in the data science renaissance. In the wake of a series of articles and editorials declaiming the shortage of data analysts, the internet responded in force, exploding with blog posts, tutorials, and listicles aimed at launching the beginner into the world of data science. And yet, in spite of all the claims that this language or that library make up the essential know-how of a "real" data scientist, if 2016 has taught us anything it's that the only essenti...

Published on December 31, 2016 02:25

December 29, 2016

Data Exploration with Python, Part 1

Exploratory data analysis (EDA) is an important pillar of data science, a critical step required to complete every project regardless of the domain or the type of data you are working with. It is the exploratory analysis that gives us a sense of what additional work should be performed to quantify and extract insights from our data. It also informs us as to what the end product of our analytical process should be. Yet, in the decade that I've been working in analytics and data science, I'...

Published on December 29, 2016 08:31

December 9, 2016

Exploring Bureau of Labor Statistics Time Series

Machine learning models benefit from an increased number of features: “more data beats better algorithms”. In the financial and social domains, macroeconomic indicators are routinely added to models, particularly those that contain a discrete time or date. For example, loan or credit analyses that predict the likelihood of default can benefit from unemployment indicators, or a model that attempts to quantify pay gaps between genders can benefit from demographic employment sta...

Published on December 09, 2016 07:32

December 4, 2016

The Trends Behind What's Trending

Editor's Note: This article highlights one of the capstone projects from the Georgetown Data Science Certificate program, where several of the DDL faculty teach. We've invited groups with interesting projects to share an overview of their work here on the DDL blog. We hope you find their projects interesting and are able to learn from their experiences.

Producing online content that goes viral continues to be more art than science. Often, the virality of content depends heavily on cul...

Published on December 04, 2016 19:10

Python Exception Handling Basics

Exceptions are a crucial part of higher level languages, and although exceptions might be frustrating when they occur, they are your friend. The alternative to an exception is a panic — an error in execution that at best simply makes the program die and at worst can cause a blue screen of death. Exceptions, on the other hand, are tools of communication; they allow the program to tell you what, why, and how something went wrong and then they gracefully terminate your program without dest...

Published on December 04, 2016 08:22

November 5, 2016

Getting Started in Open Source

I really, honestly love programming... I also love collaborating, exchanging ideas, learning better and faster ways to accomplish things that I'm already familiar with or, even better, learning completely new things that broaden my horizons as a developer or person... I enjoy getting feedback from friends - or programmers I'm building friendships with on GitHub, hearing their thoughts about my code and what I could have improved.

But what shouldn't come with the territory - and n...

Published on November 05, 2016 06:35