Michael Feathers's Blog
May 29, 2012

If you’re a subscriber to Safari Books Online, you can soon see a copy of a rough cut of NoSQL Distilled. I can see the book on my subscription, but others are having problems. I’m not sure why but am told that it should appear for everyone in the next few days. In any case the rough cut will only be available for subscribers with a subscription that includes rough cuts.
It’s a rough cut, so it’s still early in the production process, with few copy-edits done. But it gives you an early chance to take a look at the text and to pass on comments.
Safari Books Online is an online library that gives access to lots of software books, including those of Pearson and O’Reilly. They have various levels of subscription, only some of which include rough cuts. For example there are two levels of individual subscription: bookshelf and library. Only the (more expensive) library individual subscription gives access to rough cuts.
May 24, 2012
Retread of post originally made on 14 Nov 2008
Let's imagine a pretty world of SOA-happiness where the computing
needs of an enterprise are split into many small applications that
provide services to each other to allow effective collaboration. One
fine morning a consumer service needs some information from a supplier
service. The twist is that although the supplier service has the
necessary data and processing logic to get this information, it
doesn't yet expose that information through a service interface. The
supplier has a potential service, but it isn't actually there yet.
In an ideal world the developers of the consumer service just ask
the supplier service to develop the potential service and all is
dandy. But life is not ideal - the sticking point here is that the
developers of the supplier service have other things to do, usually
things that are more important to their customer and management than
helping out the consumer service team.
Recently I was chatting with my colleague Erik Dörnenburg and he
told me about an approach he saw a client use to deal with
this problem. They took a leaf out of the open source play-book and
made all their services into internal open source systems. This
allows consumer service developers to write the service themselves.
I'm sure many readers are rolling their eyes at the visions of
chaos this would cause, but just as open source projects don't allow
just anyone to edit anything, this client uses open-source-style control
mechanisms. In particular each service has a couple of custodians -
people whose responsibility it is to keep the service in a healthy
state. In the normal course of events the consumer developer wouldn't
actually commit changes to the supplier source tree directly;
instead they send a patch to the custodian. Just like an open-source
maintainer, the custodian receives the patch and reviews it to see
if it's good enough to commit. If not there's a dialog with the
consumer developer.
As Erik knows well from his own open
source work, reviewing a patch is much less effort than making
a change yourself. So although the custodian approach doesn't
entirely eliminate the problem of consumer developers needing to wait
on supplier developers, it does a lot to reduce the difficulty. And
again following the open-source model, a consumer developer can be
made a committer once the custodians are comfortable. This
still means that commits can get reviewed by the custodians, but avoids
the custodians becoming a bottleneck.
Related to this was their approach to a service registry. We've
seen a lot of fancy products being sold to provide service registry
capabilities so that people can look up services and see how to use
them. This client discarded them and used a
HumaneRegistry instead.
reposted on 24 May 2012
May 22, 2012
Here’s the information on my talks in China (the page is in Chinese). I’ll be giving talks in Chengdu 成都 on the 10th and 11th, Wuhan 武汉 on the 12th and Xi’an 西安 on the 13th.
May 21, 2012
Over the last couple of years, we’ve seen mobile development become an increasing part of our work at ThoughtWorks. One question that clients regularly have is how to make the decision about which devices to support and what proportion of effort should go to each device. In this article Giles Alexander outlines two opening gambits - laser and cover-your-bases - and how to choose between and build on these approaches.

Pramod and I now have a cover for our book NoSQL Distilled. Obsessive readers of my web site may recognize the photo.
Next month (June 7) I’ll be speaking at a ThoughtWorks Quarterly briefing in Singapore. I’ll be doing a suite of talks, one of which I expect will be NoSQL-oriented. Also on the bill is Vivek Prahlad, a long-term colleague at ThoughtWorks, who I’m looking forward to catching up with. This will be my first time in Singapore (other than the airport).
I’m then going on to talks in China: Chengdu, Wuhan, and Xi’an. I’ll post information on those when I get some URIs.
May 10, 2012
Retread of post originally made on 06 Sep 2004
I've heard a couple of questions recently about coming up with a
standard story point mechanism for multiple teams using extreme
programming's planning approach. The hope is to have several teams all
using equivalent story points, so that three story points of effort
on one team is the same as on another.
I think trying to come up with this is at best of limited value, and
at worst dangerous.
The estimating system of extreme programming is based on
XpVelocity and YesterdaysWeather. Inherent
in this is the idea that when you make estimates, the actual units
you estimate aren't important - what's important is you estimate
by rough comparative value and use YesterdaysWeather
for calibration.
In this situation the story points act as an anchor for the
feedback loop that Yesterday's Weather provides - nothing more.
Baked into them are all sorts of assumptions about the nature of
the team's task, the capability of the team, and whether the team
are optimistic or pessimistic estimators. Once you try to come up
with a standard across teams you are trying to normalize all of
these factors. Trying to do this sounds very hard to me, and I'm
not aware of anyone who has done this effectively. This doesn't
mean it's impossible, it just means it's hard.
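To make the point about units concrete, here is a minimal sketch in Python. The `Story` class, the `plan_iteration` helper and all the numbers are hypothetical, invented for illustration; the planning rule (take stories in priority order until last iteration's velocity is used up) is a simplification of Yesterday's Weather, not a full planning process. It shows that if a team estimated everything in units three times larger, its velocity would scale by the same factor and the resulting plan would not change.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    points: int  # relative size, in whatever unit this team happens to use


def plan_iteration(backlog, last_velocity):
    """Take stories in priority order until the next one would exceed last_velocity."""
    planned, used = [], 0
    for story in backlog:
        if used + story.points > last_velocity:
            break
        planned.append(story.name)
        used += story.points
    return planned


def rescale(stories, factor):
    """The same stories, estimated in a unit `factor` times larger."""
    return [Story(s.name, s.points * factor) for s in stories]


# Hypothetical data: what the team finished last iteration, and what is queued up.
done_last_iteration = [Story("login", 2), Story("search", 3), Story("export", 1)]
backlog = [Story("reports", 3), Story("audit log", 2), Story("sso", 2), Story("themes", 1)]

# Yesterday's Weather: assume we will get done what we got done last time.
velocity = sum(s.points for s in done_last_iteration)
plan = plan_iteration(backlog, velocity)

# Same team, same work, different point scale.
velocity_x3 = sum(s.points for s in rescale(done_last_iteration, 3))
plan_x3 = plan_iteration(rescale(backlog, 3), velocity_x3)

print(plan, plan_x3)
```

The two plans come out identical because the scale cancels out inside the feedback loop. That is also why a number taken from one team's loop says nothing when set next to another team's number.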
The danger is that once you have a standard unit for
measurement across teams, someone is inevitably going to use it to
compare the performance of teams. Even if everyone swears till they
are blue in the face that they won't use it for cross team
measurement, there will always be the suspicion that this will
happen eventually. This will cause teams to distort their
measurements so that it seems that they get more story points
done. My fear is that this will break the feedback loop of
yesterday's weather and knock the planning process off kilter. I'm
always suspicious about these things because while it would be
incredibly valuable to have a way to measure productivity I think
the nature of software is such that we CannotMeasureProductivity.
So to be worth trying, this has to yield some valuable benefits -
but I don't see any. One reason that I've heard is to help people
move onto teams and estimate more quickly. But you can't estimate on
a new team until you get reasonably familiar with the problem and
the current code base. As you do that you'll also get a feel for the
relative sizes of story points on that team. We move people around
between projects at ThoughtWorks and I've never heard anyone
complain about difficulty of estimating when coming onto a new team
due to differences in story points.
reposted on 10 May 2012
May 8, 2012
While I was at the QCon conference in London a couple of months
ago, it seemed that every talk included some snarky remarks about
Object/Relational mapping (ORM) tools. I guess I should read the
conference emails sent to speakers more carefully; doubtless there
was something in there telling us all to heap scorn upon ORMs at
least once every 45 minutes. But as you can tell, I want to push
back a bit against this ORM hate - because I think a lot of it is unwarranted.
The charges against them can be summarized as follows: they are
complex, and provide only a leaky abstraction over a relational data
store. Their complexity implies a grueling learning curve and often
systems using an ORM perform badly - often due to naive interactions
with the underlying database.
There is a lot of truth to these charges, but such charges miss a
vital piece of context. The object/relational mapping problem is
hard. Essentially what you are doing is synchronizing between
two quite different representations of data, one in the relational
database, and the other in-memory. Although this is usually referred
to as object-relational mapping, it really has nothing to do with
objects. By rights it should be referred to as the
in-memory/relational mapping problem, because it's true of mapping
RDBMSs to any in-memory data structure. In-memory data structures
offer much more flexibility than relational models, so to program
effectively most people want to use the more varied in-memory
structures and thus are faced with mapping that back to relations
for the database.

The mapping is further complicated because you can make changes
on either side that have to be mapped to the other. More
complication arrives since you
can have multiple people accessing and modifying the database
simultaneously. The ORM has to handle this concurrency because you can't just rely
on transactions - in most cases, you can't hold transactions
open while you fiddle with the data in-memory.
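One common way to handle this without a long-running transaction is an optimistic check against a version column. The sketch below uses Python's standard `sqlite3` module; the `customer` table, the `version` column, and the `load_customer`, `save_customer` and `StaleDataError` names are all assumptions made up for this illustration, and it is not meant to describe how any particular ORM implements the idea.

```python
import sqlite3


class StaleDataError(Exception):
    """Raised when the row changed after we read it."""


def load_customer(conn, customer_id):
    row = conn.execute(
        "SELECT id, name, version FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": row[0], "name": row[1], "version": row[2]}


def save_customer(conn, customer):
    # The UPDATE only matches if nobody bumped the version since we read the row.
    cursor = conn.execute(
        "UPDATE customer SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (customer["name"], customer["id"], customer["version"]),
    )
    conn.commit()
    if cursor.rowcount == 0:
        raise StaleDataError(f"customer {customer['id']} was modified by someone else")
    customer["version"] += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 1)")
conn.commit()

# Two users read the same row; no transaction is held open while they edit.
alice = load_customer(conn, 1)
bob = load_customer(conn, 1)

alice["name"] = "Ada L."
save_customer(conn, alice)  # succeeds and bumps the stored version

bob["name"] = "Ada Lovelace"
try:
    save_customer(conn, bob)  # bob still holds the old version number
    conflict = False
except StaleDataError:
    conflict = True

print(conflict, load_customer(conn, 1))
```

Even this small version shows where the effort goes: every mapped table needs the extra column, every write needs the check, and somebody has to decide what happens to Bob's edit once the conflict is detected.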
I think that if you're going to dump on something in the
way many people do about ORMs, you have to state the alternative.
What do you do instead of an ORM? The cheap shots I usually hear
ignore this, because this is where it gets messy. Basically it
boils down to two strategies: solve the problem differently (and
better), or avoid the problem. Both of these have significant
flaws.
A better solution
Listening to some critics, you'd think that the best thing for a
modern software developer to do is roll their own ORM. The
implication is that tools like
Hibernate and Active Record have just become bloatware, so you
should come up
with your own lightweight alternative. Now I've spent many an hour
griping at bloatware, but ORMs really don't fit the bill - and I say
this with bitter memory. For much of the 90's I saw project after
project deal with the object/relational mapping problem by writing
their own framework - it was always much tougher than people
imagined. Usually you'd get enough early success to commit deeply to
the framework and only after a while did you realize you were in a
quagmire - this is where I sympathize greatly with Ted Neward's
famous quote that object-relational mapping is the Vietnam
of Computer Science[1].
The widely available open source ORMs (such as iBatis, Hibernate,
and Active Record) did a great deal to remove this problem [2].
Certainly they are not trivial tools to use, as I said the
underlying problem is hard, but you don't have to deal with the full
experience of writing that stuff (the horror, the horror). However much
you may hate using an ORM, take my word for it - you're better off.
I've often felt that much of the frustration with ORMs is about
inflated expectations. Many people treat the relational database "like a
crazy aunt who's shut up in an attic and whom nobody wants to talk
about"[3]. In this world-view they just want to deal
with in-memory data-structures and let the ORM deal with the
database. This way of thinking can work for small applications and
loads, but it soon falls apart once the going gets tough.
Essentially the ORM can handle about 80-90% of the mapping problems,
but that last chunk always needs careful work by somebody who really
understands how a relational database works.
This is where the criticism comes that ORM is a leaky
abstraction. This is true, but isn't necessarily a reason to avoid
them. Mapping to a relational database involves lots of repetitive,
boiler-plate code. A framework that allows me to avoid 80% of that
is worthwhile even if it is only 80%. The problem is in me for
pretending it's 100% when it isn't. David Heinemeier Hansson, of
Active Record fame, has always argued that if you are writing an
application backed by a relational database you should damn well
know how a relational database works. Active Record is designed with
that in mind, it takes care of boring stuff, but provides manholes
so you can get down with the SQL when you have to. That's a far
better approach to thinking about the role an ORM should play.
There's a consequence to this more limited expectation of what
an ORM should do. I often hear people complain that they are
forced to compromise their object model to make it more relational
in order to please the ORM. Actually I think this is an inevitable consequence
of using a relational database - you either have to make your
in-memory model more relational, or you complicate your mapping
code. I think it's perfectly reasonable to have a
more relational domain model in order to simplify your
object-relational mapping. That doesn't mean you should always
follow the relational model exactly, but it does mean that you
take into account the mapping complexity as part of your domain
model design.
So am I saying that you should always use an existing ORM rather
than doing something yourself? Well I've learned to always avoid
saying "always". One exception that comes to mind is when you're
only reading from the database. ORMs are complex because they
have to handle a bi-directional mapping. A uni-directional problem is
much easier to work with, particularly if your needs aren't too
complex and you are comfortable with SQL. This is one of the
arguments for CQRS.
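As a rough illustration of how little a read-only mapping needs, here is a sketch using Python's standard `sqlite3` and `dataclasses` modules. The `orders` and `line_item` tables and the `OrderSummary` type are hypothetical, chosen only for this example. The query result is copied straight into an immutable value, so there is no change tracking, no identity map and nothing to write back.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummary:
    """Read-only view of an order; nothing here ever flows back to the database."""
    order_id: int
    customer: str
    total: float


def order_summaries(conn):
    # One hand-written query, shaped for exactly what the reader of the data needs.
    rows = conn.execute(
        "SELECT o.id, o.customer, SUM(l.quantity * l.unit_price) "
        "FROM orders o JOIN line_item l ON l.order_id = o.id "
        "GROUP BY o.id, o.customer ORDER BY o.id"
    )
    return [OrderSummary(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE line_item (order_id INTEGER, quantity INTEGER, unit_price REAL);
    INSERT INTO orders VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO line_item VALUES (1, 2, 10.0), (1, 1, 5.0), (2, 3, 4.0);
""")

summaries = order_summaries(conn)
print(summaries)
```

The catch is the one already stated: this stays simple only while the needs are modest and you are comfortable writing the SQL yourself.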
So most of the time the mapping is a complicated problem, and
you're better off using an admittedly complicated tool than starting a
land war in Asia. But then there is the second alternative I
mentioned earlier - can you avoid the problem?
Avoiding the problem
To avoid the mapping problem you have two alternatives. Either you use
the relational model in memory, or you don't use it in the database.
To use a relational model in memory basically means programming
in terms of relations, right the way through your application. In
many ways this is what the 90's CRUD tools gave you. They work very
well for applications where you're just pushing data to the screen
and back, or for applications where your logic is well expressed in
terms of SQL queries. Some problems are well suited for this
approach, so if you can do this, you should. But
its flaw is that often you can't.
When it comes to not using relational databases on the disk, there
rises a whole bunch of new champions and old memories. In the 90's
many of us (yes including me) thought that object databases would
solve the problem by eliminating relations on the disk. We all know
how that worked out. But there is now the new crew of NoSQL
databases - will these allow us to finesse the ORM quagmire and
shock-and-awe our data storage?
As you might have gathered, I think NoSQL is a technology to be
taken very seriously. If you have an application problem that maps
well to a NoSQL data model - such as aggregates or graphs - then you
can avoid the nastiness of mapping completely. Indeed this is often
a reason I've heard teams go with a NoSQL solution. This is, I
think, a viable route to go - hence my interest in increasing our
understanding of NoSQL systems. But even so it only works when the
fit between the application model and the NoSQL data model is good.
Not all problems are technically suitable for a NoSQL database. And
of course there are many situations where you're stuck with a
relational model anyway. Maybe it's a corporate standard that you
can't jump over, maybe you can't persuade your colleagues to accept
the risks of an immature technology. In this case you can't avoid
the mapping problem.
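For the case where the fit is good, a small sketch shows what avoiding the mapping looks like with an aggregate-oriented store. A plain Python dictionary stands in for the database here, and the order structure, `save` and `load` are hypothetical names for illustration only; a real store would add networking, indexing and consistency concerns that this ignores.

```python
import json

# Stand-in for a key-value or document store (an assumption for this sketch).
store = {}

# The aggregate is stored as one unit, nested exactly as the application uses it.
order = {
    "id": "order-1",
    "customer": "Ada",
    "lines": [
        {"product": "keyboard", "quantity": 1, "unit_price": 40.0},
        {"product": "cable", "quantity": 2, "unit_price": 5.0},
    ],
}


def save(aggregate):
    store[aggregate["id"]] = json.dumps(aggregate)


def load(key):
    return json.loads(store[key])


save(order)
loaded = load("order-1")
total = sum(line["quantity"] * line["unit_price"] for line in loaded["lines"])
print(total)
```

There are no tables or joins to reassemble because the stored shape is the in-memory shape. The moment you need to slice the data differently, say all lines for one product across every order, that convenience stops helping.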
So ORMs help us deal with a very real problem for most enterprise
applications. It's true they are often misused, and sometimes the
underlying problem
can be avoided. They aren't pretty tools, but then the problem they
tackle isn't exactly cuddly either. I think they deserve a little
more respect and a lot more understanding.
1:
I have to confess a deep sense of conflict with the Vietnam
analogy. At one level it seems like a case of the pathetic
overblowing of software development's problems to compare a
tricky technology to war. Nasty the programming may be, but
you're still in a relatively comfy chair, usually with air
conditioning, and bug-hunting doesn't involve bullets coming at
you. But on another level, the phrase certainly resonates with
the feeling of being sucked into a quagmire.
2:
There were also commercial ORMs, such as TOPLink and Kodo. But the
approachability of open source tools meant they became dominant.
3:
I like this phrase so much I feel
compelled to subject it to re-use.


