Martin Fowler's Blog
December 13, 2013
Datensparsamkeit is a German word that's difficult to translate
properly into English. It's an attitude to how we capture and store
data, saying that we should only handle data that we really need.
These days there's a lot of hype around the idea of Big Data -
and with it the notion that we should capture and store every bit of
data we can get our hands on. We might not have an immediate use for
the contacts our users store in their address books, but we'll ask
for it anyway in case it comes in useful later. We'll record every
click on our website and squirrel it away in case we want to trawl
it later. We set up our smartphone app to ask for location information so
if we come up with some way to use that data later, we can. After
all, storage is cheap - so why not?
The problem with the "capture-it-all" approach is that it raises
serious questions of privacy. Even if we trust ourselves to not
abuse the data we collect, each data store represents a target for
criminals or government surveillance agencies. This issue is
particularly fraught in Germany, which has seen successive regimes
where governments have carried out extensive surveillance of their
citizens in order to control them. Germany consequently has strong
data privacy laws.
Datensparsamkeit is a concept from these privacy laws that is the
opposite philosophy to "capture-all-the-things". A translation isn't
straightforward (which is why I've retained the German word) but
loosely you might translate it as something like "data austerity",
"data minimization", "data parsimony", or "data frugality". It means
that you should always ask yourself why you are capturing or storing
data, and look to handle only the minimum amount of data you need
for your purpose.
An example of this is tracking users on your web site to
determine how many unique visitors you have. If the same person
accesses several pages within a few hours, you want to count that as
one visit. If they visit several times a month, you still only want
to count them as a single visitor. One way to do this is to log
IP addresses and count each IP address as a single person. But an IP
address is very revealing, and could be used for much more than
counting visitors. Datensparsamkeit suggests that you shouldn't store
the IP address directly; perhaps instead you should hash it and only
store the hash.
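As a rough illustration of the hashing idea, here is a minimal Python sketch. The function names, the sample addresses, and the secret key are all hypothetical and not part of the original example. It uses a keyed hash (HMAC-SHA256) rather than a bare hash, on the assumption that the IPv4 address space is small enough for unkeyed hashes to be guessed by brute force.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption of this sketch). In practice it
# would be generated randomly and stored separately from the logs.
SECRET_KEY = b"replace-with-a-random-secret"


def visitor_token(ip_address: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest of the IP address as a hex string."""
    return hmac.new(key, ip_address.encode("ascii"), hashlib.sha256).hexdigest()


def count_unique_visitors(ip_addresses) -> int:
    """Count distinct visitors while keeping only the hashed tokens."""
    seen = {visitor_token(ip) for ip in ip_addresses}
    return len(seen)
```

The same address always maps to the same token, so repeat visits still collapse into one visitor, but the stored value no longer reveals the address to someone who reads the log without the key.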
A similar example involving IP addresses is using them to infer
demographic information such as region and country. You can get most
of this information and practice datensparsamkeit by just logging the first
three octets of the IP address.
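Keeping only the first three octets can be sketched in a few lines of Python with the standard `ipaddress` module. The function name is hypothetical, and the sketch assumes IPv4 addresses only; it zeroes the last octet before anything is written to a log.

```python
import ipaddress


def truncate_ipv4(ip_address: str) -> str:
    """Keep the first three octets of an IPv4 address and zero the last one."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.IPv4Network(f"{ip_address}/24", strict=False)
    return str(network.network_address)
```

For example, `truncate_ipv4("203.0.113.77")` returns `"203.0.113.0"`, which is still enough for a coarse region lookup but no longer identifies a single host.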
Datensparsamkeit isn't just about bad people stealing data, it's
also about your relationship with the company that captures the data
in the first place.
The default attitude at the moment is that any data you generate is
not just freely usable by the capturer but furthermore becomes their
valuable commercial property. Privacy advocates,
including me, think this assumption needs to be changed. Companies
should only capture what they need and the burden of demonstrating
need should fall on them. In addition, of course, they must be
completely transparent about what they capture, what they store, and
who they share their data with. Any breaches of data security must
be immediately publicized (instead of covered up, which is the
usual practice).
Even if you don't share my views on personal control of our own
data, the risks of security breaches mean that datensparsamkeit is a
wise course of action. If you hold data that you don't need, and
someone steals it and causes damage, shouldn't you be liable for
that damage? Even if there's no legal liability the publicity will
have serious consequences - and thus there is risk for anyone who
doesn't practice datensparsamkeit.
introduced me to Datensparsamkeit. The meme "… all the things"
seems to have been around forever (at least a decade) so I'm glad
Korny Sietsma taught me that it started in 2010.
Here's some help on pronunciation
I realize that with Network Address Translation, things are
rather more involved than this, but I wanted a simple example.
December 10, 2013
After I finished Refactoring, I created a small web site at refactoring.com to act as an online summary of the refactorings in the book. Its mainstay is a catalog which provides a brief summary of each refactoring. I built it around 2000 and until today it was still very much a product of the late 90’s. Over the last few weeks I’ve changed its appearance, made a more dynamic catalog page with some interactive filters, redrawn the diagrams, and added some deep links to the relevant pages of Safari Books Online. I’ve also added links and refactorings from the Ruby edition of the book.
November 12, 2013
Brandon Byars concludes his article on using REST-style services for enterprise integration by describing how to coordinate features across services. His advice is to use the agile planning notions of stories and epics - using stories within a service and epics to coordinate across services. When doing this it’s important to track progress at the program level because you only deliver business value when all the services involved in an epic are done.
November 10, 2013
November 8, 2013
A common mistake in service-oriented architectures is to let a single service own all of the data about widely used entities, such as products or customers. In his latest installment of enterprise REST Brandon Byars explains why this is an error and shows how to use the DDD notion of bounded contexts for a better approach.
October 31, 2013
Brandon Byars’s article on enterprise REST now turns its attention to using consumer-based testing to find integration problems. This is the approach, counter-intuitive to many, where we get service consumers to write tests which are incorporated into the service’s deployment pipeline.
October 25, 2013
Brandon Byars continues his article on using REST-style services for enterprise integration by discussing versioning. Or more to the point, explaining why you shouldn’t use versioning unless you really need to. You can reduce the pain of versioning by using techniques like tolerant readers, semantic versioning, and writing stories at service boundaries.
October 21, 2013
My colleague Brandon Byars has been involved in several projects in large enterprises where they are trying to upgrade and replace their legacy systems. A strategy that’s been useful in these situations is using REST-style web-services as an integration mechanism. Brandon is writing an article to discuss the lessons learned from this, which talks less about things like HATEOAS and more about the role of environment management, versioning, and contract evolution. This first edition of the article talks about the importance of defining logical environments, and he’ll be publishing more sections to the article over the next few weeks.
October 11, 2013
From time to time, I've written on this site about the
problematic DiversityImbalance in the software
development profession, and how we need to take deliberate action to
increase the proportion of underrepresented groups. This is all well
and good, but naturally leads to the question of which
underrepresented groups we should be concerned about. In
ThoughtWorks we've been using the term
"historically-discriminated-against" to
help focus our thinking on one of the main drivers for embracing
diversity.
Humanity has a consistently sad record of pushing groups of
humans down. Historically-discriminated-against groups include women
pretty much everywhere, African-Americans and Native Americans in the
United States, lower castes in India, aboriginals in Australia,
homosexuals everywhere… sadly the list is long.
Historically-discriminated-against groups are often minorities,
but not always. Blacks in South Africa have always been a
considerable majority, but are historically-discriminated-against.
Often historically-discriminated-against people are visibly
different, based on race or gender. But they can equally well not be
as visible, such as religious groups or homosexuals.
The historic discrimination is the essence of why we should work
to support fixing problems. One might argue that the
under-representation of men in nursing is as problematic a diversity
imbalance as that of women in tech. While I don't see a lack of male
nurses as a good thing, I think it's less of a concern because
men have not suffered the historic discrimination that women have.
Similarly if someone discovered that green-eyed people were
disproportionately rare in the software industry, again my concern
would be less because of the lack of historic discrimination.
(Although I would still find such a disproportion intriguing.)
There's an important point about the "historical" in this
definition. Many historically-discriminated-against groups have
legal and social protections from discrimination these days. In
America it's now socially unacceptable to make public racist or sexist
comments, and illegal to carry out most forms of racist or sexist
discrimination. But you can't cure centuries of historic wrongs in
just a few years. This is the fallacy of people who advocate for
gender-blind and race-blind policies. It takes many generations to
undo the effect of centuries of discrimination, so just because the
law and society are beginning to catch up doesn't mean the work of
supporting historically-discriminated-against groups can stop.
So when we see only 27%
of software developers are women in a world of 50% women, the fact
that women are historically-discriminated-against is evidence that
the effects of their historical oppression haven't yet been
corrected. While some may argue that there's a biological
explanation for the lack of women programmers, I consider that to be
a treacherous argument. There's no evidence to support that women
are less capable programmers (other than the circular one of their
lack of numbers). Worse still, such biological and cultural
arguments have a long history of being used to justify
discrimination. So unless credible evidence appears, I think it's
wise to consider that an underrepresentation of a
historically-discriminated-against group is a sign that we haven't
yet finished the task of correcting a long-running wrong. And until
we do, our vision of a meritocratic profession will be undermined by
the reality of the imbalance.
The precise term was coined by Bill Kimmel, inspired by one of
our values statements of Social Responsibility: "We strive to
redress historic discrimination, including that of race, gender
and sexual orientation." Similar phrases appear in various
parts of the world, but this is the origin of our usage.
There are many aspects to diversity, which is why "diversity" is
such a tricky word to work with. I often see articles extolling
the benefits of diverse teams, where this diversity is looking
for diversity of thought. This is valuable but different from a
focus on the historically-discriminated-against. Fortunately
these various aspects of diversity usually go together.
October 8, 2013
A few months ago, I bought a Google
Nexus 7 tablet. I like to wait until I've used a device for
a while before I post my experiences of it, but the disadvantage of
that policy is that now the tablet I'm talking about has been
superseded. That said, I'll pass on my comments anyway, since they
may still be helpful to others considering their future tablet
purchases.
My driver for getting this device was two-fold. I got an Apple
iPad just a couple of months after it first appeared.
It's been a constant companion, but its age is showing. It's not
just that it cannot run anything newer than iOS 5 (that's not a big
deal to me); the major problem is that many websites will crash the browser
these days (which I gather is due to memory limits). The second
rationale was experimentation: I wanted to try Android and also try the smaller 7" tablet form factor.
On the whole I really like the Nexus 7. The base Google UI
is a touch better than Apple's. I miss the
cross-application back button when I go back to the iPad. I also
prefer Android's approach to completion in typing where they give
you three words to choose from and don't automatically complete with
the space bar (which regularly annoys me with iOS).
My feelings are more mixed about the 7" tablet size. I find that
most of the time I mildly prefer the 7" due to its smaller size and lighter
weight. However some of the time I strongly prefer the 10" size
because of the bigger screen. There are times - some websites, PDF
documents, or books with code or graphics - when that bigger screen
is essential. The consequence of this is that despite the advantages
of the smaller size I end up taking the 10" device on my travels for
those occasions where I need the larger screen. If I could only have
one tablet, it would have to be a 10", but I do use the 7" more
often at home.
The other downside to Android lies in the applications. There are
some nice applications for iOS for which I can't find alternatives I
like on Android. To be fair this could be due to familiarity and a
need to spend a bit more time searching. I need to put more effort
into checking out what's available before I decide whether to use
Apple or Android when I buy a 10" device to replace my iPad.
I first got an Android device at the Google I/O conference in
2009 (where Rebecca and I spoke
about cloud computing). This was what's since been called
Ion. I enjoyed having a smart phone and the experience
led me to upgrade my phone account to handle 3G - but the only
way to do that was to get an iPhone, so that was the end of that