Rethinking Productivity in Software Engineering
Read between September 16 - November 9, 2025
15%
We argue that there is no metric that adequately captures the full space of developer productivity and that attempting to find one is counterproductive. Instead, we encourage the design of a set of metrics tailored for answering a specific goal.
15%
When we create a metric, we are examining a thin slice of a developer’s overall time and output. Developers engage in a variety of other development tasks beyond just writing code, including providing guidance and reviewing code for other developers, designing systems and features, and managing releases and configuration of software systems. Developers also engage in a variety of social tasks such as mentoring or coordination that can have a significant impact on overall team or organization output.
17%
this omniscient vision of software development work still comes with significant unintended consequences. First, if this monitoring were done at a team or organization level by managers, how would being monitored change developers’ behavior? The effect of being observed so thoroughly might actually result in developers self-monitoring their every action, unintentionally reducing productivity. Even if this were a net increase in productivity, it might also lead to developers leaving the organization, moving to organizations that were a little less like Big Brother.
18%
Such a rich model of productivity would be incredibly powerful if developers, teams, and organizations were relatively stable phenomena to model. But new developers arrive all the time, changing team dynamics. Teams disband and reform. Organizations decide to enter a new market and leave an old one. All of these changes mean that the phenomena one might model are under constant change, meaning that whatever policy recommendations our rich model might suggest would likely need to change again in response to these external forces. It’s even possible that by having such a seamless ability to …
Hussain Abbas
This explains that even if we measure productivity accurately, in both quantitative and qualitative terms, it still does not serve the end goal of improving productivity.
18%
regardless of how accurately or elaborately one can measure productivity, the ultimate bottleneck in realizing productivity improvements is behavior change.
18%
And if our productivity utopia relies on developer insight into their own productivity to identify opportunities for individuals to change, why not just focus on developers in the first place, working with them individually and in teams to identify opportunities for increased productivity, whatever the team and organizational goals? This would be a lot cheaper than trying to measure productivity accurately, holistically, and at scale. It would also better recognize the humanity and expertise of the people ultimately responsible for achieving productivity.
18%
A focus on developers’ experiences with productivity also leaves room for all the indirect components of productivity that are far too difficult to observe, including factors such as developers’ motivation, engagement, happiness, trust, and attitudes toward the work they are doing. These factors, likely more than anything...
19%
One of the core principles of agile development is to create customer value . Hence, many aspects of agile development aim to focus on this value generation. One example is the evolution from continuous integration to continuous delivery [13], i.e., to deliver value to customers not at the end of the project or a sprint but continuously.
19%
Another aspect related to productivity brought in by agile development was the counting of story points and the calculation of velocity as the number of story points per sprint. However, many proponents of agile development recommend against using velocity as a productivity measure because it can lead to unwanted effects. For example, Jeffries [15] states, “Velocity is so easy to misuse that one cannot recommend it.” The effects can include story points being inflated instead of used as a means to identify overly large stories, and keeping developers from working on stories with a …
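As an illustrative sketch (all numbers invented, not from the chapter), velocity is simply story points completed per sprint, and the sketch below shows why inflated estimates raise it without any extra value being delivered:

```python
# Illustrative sketch: velocity as story points completed per sprint,
# and why averaging it is fragile as a productivity measure.

def velocity(completed_story_points):
    """Average story points completed per sprint."""
    if not completed_story_points:
        return 0.0
    return sum(completed_story_points) / len(completed_story_points)

# Three sprints: inflating estimates (e.g., 8 -> 13) raises "velocity"
# without any change in delivered value -- the misuse warned about above.
honest = [8, 10, 9]
inflated = [13, 16, 14]
print(velocity(honest))    # 9.0
print(velocity(inflated))  # higher, yet the same work was done
```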
20%
The difference between efficiency and effectiveness is usually explained informally as “efficiency is doing things right” and “effectiveness is doing the right things.” While there are numerous other definitions [12], an agreement prevails that efficiency refers to the utilization of resources and mainly influences the required input of the productivity ratio. Effectiveness mainly aims at the usefulness and appropriateness of the output as it has direct consequences for the customer.
21%
We add in the PE Model that productivity is expressed as the combination of effectiveness and efficiency: a team can be productive only if it is effective and efficient! We would neither consider a software team productive if it was not building the features needed by the customers nor if it spent an unnecessary amount of effort on building the software. For effectiveness, we need to consider the purpose, functionality, and quality of the software. For efficiency, we further consider costs.
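The ratio view behind the PE Model can be made concrete in a toy calculation (all numbers invented): effectiveness discounts the output to what customers actually need, while efficiency governs the resources consumed.

```python
# Toy sketch of productivity as an output/input ratio. Effectiveness
# acts on the output side; efficiency acts on the input side.

def productivity(useful_output, resource_input):
    return useful_output / resource_input

raw_features = 10       # everything the team built
useful_share = 0.6      # effectiveness: only 6 features matter to customers
person_months = 12      # efficiency: resources spent

print(productivity(raw_features * useful_share, person_months))  # 0.5
```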
22%
The three dimensions in the proposed productivity framework for software engineering are as follows:
- Velocity: How fast work gets done
- Quality: How well work gets done
- Satisfaction: How satisfying the work is
23%
Given a particular high-level productivity goal, a common desire is to derive specific metrics that track such a goal. Unfortunately, going from goals to metrics is not trivial as metrics are typically proxies for specific aspects of a goal. One technique to bridge this divide is to have an intermediate state under consideration. For example, the goal-question-metric (GQM) approach for understanding and measuring the software process [1, 2] works by first generating “questions” that define goals and then specifying measures that could answer those questions.
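One way to picture the GQM breakdown is as plain data, with questions as the intermediate layer between a goal and its metrics (the goal, questions, and metrics below are invented examples, not taken from the chapter):

```python
# Hedged sketch: recording a goal-question-metric (GQM) breakdown as
# plain data. Questions refine the goal; metrics are proxies that
# answer the questions.

gqm = {
    "goal": "Improve code review turnaround for the payments team",
    "questions": {
        "How long do reviews wait before the first response?": [
            "median time from review request to first comment",
        ],
        "Are reviews blocking releases?": [
            "number of releases delayed by pending reviews",
        ],
    },
}

# Flatten the metric layer to see what would actually be measured.
all_metrics = [m for ms in gqm["questions"].values() for m in ms]
print(len(all_metrics))  # 2
```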
23%
Inspired by the way that the HEART framework involves both splitting by dimensions and breaking down from goals to metrics, we propose splitting into goals, questions, and metrics in combination with the productivity dimensions and lenses. This technique can guide the development of specific questions and metrics toward the concrete productivity goals identified.
24%
When the dimensions framework is used with GQM, it may not be immediately evident to the researcher or practitioner what should be framed as a goal and what should be framed as one or more questions, as a goal could be stated as a research question or vice versa. As mentioned earlier, the HEART framework offers an alternative of using signals instead of questions. We have found it useful in practice to iteratively break down productivity measures along these three dimensions, and GQM is one approach for this.
25%
Research shows that team productivity is actually bounded not by how efficiently individual developers work but by communication and coordination overhead [5]. This is partly because teams work only as fast as decisions can be made, and many of the most important decisions are not made individually but collaboratively. However, this is also because even for individual decisions, developers often need information from teammates, which studies have shown is always one or two orders of magnitude slower to obtain than referencing documentation, logs, or other automatically retrievable content
26%
While it’s easy to assume that each individual in an organization might have to concern themselves with only one of these lenses, studies of software engineering expertise show that great developers are capable of reasoning about code through all of these lenses [5]. After all, when a developer writes or repairs a line of code, not only are they getting an engineering task done, they’re also meeting a team’s goals, achieving an organization’s strategic objectives, and ultimately enabling an organization to test its product’s value proposition in a market.
26%
Individuals, teams, organizations, and markets need their own metrics because the factors that affect performance at each of these levels are too complex to reduce to a single measure. I actually believe that individual developers, teams, organizations, and markets are so idiosyncratic that each may need its own unique measures of performance that capture a valid notion of their work output (productivity, speed, product quality, actual versus plan, etc.).
27%
Another refinement of these outcome-oriented techniques is using organizational economic output as the outcome, such as a company’s earnings. The main advantage of this approach is that economic output is arguably the most direct measure of productivity, at least at a large scale—if a developer’s work does not produce profit directly or indirectly, are they really being productive? The disadvantages of this approach are that, as Ramírez and Nembhard point out, tracing profits down to individual knowledge workers is difficult and also that present economic output is not necessarily indicative of …
29%
A productive knowledge worker is one who enjoys and is enthusiastic about their work, finds meaning and purpose in their work, is not continuously stressed, is appreciated, has a work-life balance, finds the work atmosphere pleasant, and resolves conflicts with co-workers quickly.
29%
There are four main techniques for measuring knowledge worker productivity: outcome-, process-, people-, and multi-oriented productivity measurement techniques. There are five categories of drivers that knowledge worker research suggests influence productivity: the physical environment, the virtual environment, the social environment, individual work practices, and well-being at work.
34%
Our taxonomy of factors influencing software development productivity is extremely diverse. The technical factors range from detailed product factors, such as execution time constraints, to general environment factors such as the use of software tools. The soft factors have been investigated on the corporate, team, project, and individual levels. For specific contexts, it will be necessary for practitioners to look into each of these factors in more detail. We hope that this chapter can be used as a starting point and checklist for productivity improvement in practice.
40%
Controlled experiments are designed to test a specific hypothesis, but there are challenges with designing the experiment so that it has ecological validity. Cognitive models offer a theoretical framework for explaining why and how things happen (e.g., how interruptions affect productivity), but these models can be complex and difficult to develop. Observational studies offer a rich description of situated activity, but these studies are resource intensive and can produce an overwhelming amount of data of which to make sense.
43%
But is it the case that happy software engineers = more productive software engineers? Moreover, are perks the way to go to make developers happy? Are developers happy at all? These questions are important to ask both from the perspective of productivity and from the perspective of sustainable software development and well-being in the workplace.
45%
The third most frequent cause of unhappiness is to work with bad code and, more specifically, with bad code practices. Developers are unhappy when they produce bad code, but they suffer tremendously when they meet bad code that could have been avoided in the first place.
47%
the link between happiness and productivity in software development is real. It is possible to quantify the happiness of software developers, and there are distinct patterns in the causes and consequences of their happiness.
50%
However, if we look at the first principle of “individuals and interaction over processes and tools,” we see a shift. The processes and tools created to structure the agile delivery were used to micromanage the software developers’ work in all the small details. We can view the global agile principles in our case as an algorithmic machine, with specific input and output features. The input measures are the numbers, the hours, and the deadlines for deliverables, which are then used to push people to maximize their efforts. Given the tools and processes of agile, the remote client is able to monitor …
51%
When software developers complain that they have to attend a meeting at 10 p.m. and are not able to leave work to pick up sick children, they are not complaining about agile development per se. Instead, they are complaining about the lack of power and decision-making within the organizational setup. Agile development works well for software developers in Scandinavia, Northern Europe, and the United States because the software teams are powerful and privileged. When clients demand agile development from software developers elsewhere, those developers are not empowered. Instead, the power to choose …
52%
Surprisingly, many developers still felt productive despite the high number of context switches. The follow-up interviews with the developers revealed that the cost of context switches varies. The cost or “harm” of a context switch depends on several factors: the duration of the switch, the reason for the switch, and the focus on the current task that is interrupted. A short switch from the IDE to respond to a Slack message is usually less costly than being interrupted from a task by a co-worker and discussing a topic unrelated to the main task for half an hour. Also, short context switches, …
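A purely illustrative cost model (factors and weights invented, not from the study) captures the shape of this finding: duration, relatedness to the current task, and depth of focus on the interrupted work all scale the harm of a switch.

```python
# Invented toy model: the "harm" of a context switch grows with its
# duration, whether the topic is unrelated, and whether it interrupts
# deep focus. Weights are arbitrary and only illustrate the ordering.

def switch_cost(minutes, unrelated_to_task, deep_focus):
    cost = minutes
    if unrelated_to_task:
        cost *= 2      # unrelated topics are harder to recover from
    if deep_focus:
        cost += 10     # refocusing penalty after deep work
    return cost

print(switch_cost(1, False, False))  # quick Slack reply: 1
print(switch_cost(30, True, True))   # half-hour unrelated chat: 70
```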
53%
In another effort to group developers with similar perceptions of productivity together, we asked participants to describe productive and unproductive workdays, rate their agreement with a list of factors that might affect productivity, and rate the interestingness of a list of productivity measures at work. We found that developers can be clustered into six groups: social, lone, focused, balanced, leading, and goal-oriented.
54%
By knowing the trends of developers’ perceived productivity and the activities they consider as particularly productive/unproductive, it might be possible to schedule the tasks and activities developers must perform in a way that best fits their work patterns. For example, if a developer is a morning person and considers coding particularly productive and meetings as impeding productivity, blocking calendar time in the morning for coding tasks and automatically assigning afternoon hours for meeting requests may allow the developer to best employ their capabilities over the whole day.
59%
rumination may involve repeated thinking that “I am worthless, I am a failure,” supplemented by recall of experiences, such as a poor evaluation of a piece of work you delivered. This thinking repeatedly intrudes into a person’s consciousness, thereby making it difficult for them to concentrate, one of the major complaints of people suffering from depression. Sticky mind-wandering can take the form of recurrent worries, for example, about not being good enough, about their children, their future, and so on.
61%
Contributing to a software project requires a multitude of different kinds of awareness, ranging from high-level status information (e.g., What is the overall status of the project? What are the current bottlenecks?) to more fine-grained information (e.g., Who else is working on the same file right now and has uncommitted changes? Who is affected by the source code I am writing at the moment?). Awareness includes both short-term, momentary awareness (awareness of events at this particular point in time, such as the current build status) and long-term, historical awareness (awareness of past …
62%
Software projects may go through different stages in their development cycle. According to our study participants, these variabilities from project to project make it difficult to devise any uniform, one-size-fits-all measurement system that would work across different project contexts and distinct development workflows.
62%
Another problem perceived by our study participants is that measures can be gamed so that any automatic system aimed at measuring productivity would be potentially exploitable. This applies in particular to simple measures such as the number of issues or number of commits: “A poor-quality developer may be able to close more tickets than anyone else, but a high-quality developer often closes fewer tickets but of those few, almost none get reopened or result in regressions. For these reasons, metrics should seek to track quality as much as they track quantity.”
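The participant’s suggestion can be sketched by pairing a quantity metric with a quality metric (developers and numbers invented for illustration):

```python
# Illustrative sketch: tickets closed (quantity) alongside the share
# of those tickets that were reopened (quality). Data is invented.

from collections import namedtuple

Dev = namedtuple("Dev", "name closed reopened")

devs = [
    Dev("fast-closer", closed=120, reopened=30),
    Dev("careful", closed=40, reopened=1),
]

def reopen_rate(dev):
    """Fraction of a developer's closed tickets that were reopened."""
    return dev.reopened / dev.closed if dev.closed else 0.0

for d in devs:
    print(d.name, d.closed, round(reopen_rate(d), 3))
# The "fast-closer" wins on quantity but loses badly on quality.
```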
63%
When we asked our participants about how to automatically detect such unexpected events, several examples were mentioned, in particular related to the commit history: “Commits that take particularly long might be interesting. If a developer hasn’t committed anything in a while, his first commit after a long silence could be particularly interesting, for example because it took him a long time to fix a bug. Also, important commits might have unusual commit messages, for example including smileys, lots of exclamation marks, or something like that…basically something indicating that the developer …
63%
Automated methods for aggregating awareness information are likely to produce quantitative over qualitative information since aggregating numbers (e.g., the number of issues per developer) is much easier than aggregating textual information (e.g., what kinds of issues a developer is working on). Unsurprisingly, measures such as lines of code and number of issues open/closed are available in most development tools, but many developers in our study found them too limited to be used as awareness information and worried that such simple numbers may act as a proxy of their productivity. In short, …
Hussain Abbas
This is changing now with LLMs, but effectiveness is a concern.
63%
While tools that help developers make sense of everything that goes on in a software project are necessary to enable developer awareness, these tools currently favor quantitative information over qualitative information. To accurately represent what goes on in a software project, awareness tools need to focus on summarizing instead of measuring information and be careful when presenting numbers that could be used as an unintended proxy for productivity measures. We argue for the use of natural language and text processing techniques to automatically summarize information from a software …
64%
Software developers produce many textual artifacts, ranging from source code and documentation to bug reports and code reviews. Therefore, it is unsurprising that dashboards used in software projects often combine different types of data, i.e., qualitative and quantitative data. A bar graph showing the number of open issues grouped by team would be a simple example of quantitative data, whereas a tag cloud of the most common words used in bug reports is a simple representation of some of the qualitative data present in a software repository.
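The qualitative side of such a dashboard can be approximated very simply, e.g., by counting the most common words in bug report titles (sample titles invented for illustration):

```python
# Minimal sketch of the qualitative data behind a tag cloud: word
# frequencies across bug report titles. Titles are invented examples.

from collections import Counter

bug_titles = [
    "crash on startup when config file missing",
    "crash when saving large file",
    "slow startup on cold cache",
]

stopwords = {"on", "when", "the", "a"}
words = Counter(
    w for title in bug_titles for w in title.split() if w not in stopwords
)
print(words.most_common(3))  # top words would size the largest tags
```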
64%
When creating a dashboard for a software project, many considerations have to be taken into account; e.g., should the dashboard feature enterprise-wide data or just data from a single project (bearing in mind that projects tend not to be independent)? Should each developer have their own personalized dashboard, or do all dashboards from a project look the same? In addition, dashboards can cover different timespans, such as the entire lifetime of a project, the current release, or the last week. In software projects, one week is not necessarily like any other. For example, development activity …
66%
Goodhart’s law—usually cited as “When a measure becomes a target, it ceases to be a good measure”—describes another risk of the use of dashboards in software development projects. For example, if a dashboard emphasizes the number of open issues, developers will become more careful about opening new issues, e.g., by combining several smaller issues into one. Similarly, if a dashboard conceptualizes productivity as time spent in the IDE, developers might become hesitant to look up information outside of the IDE.
66%
“Developers are the most capable people on Earth to game any system you create.”
66%
While the goal of an organization might be long-term value creation, dashboards often use relatively short time spans. Values such as customer satisfaction are not readily extractable from a software repository, even though they might actually align with the organization’s goal much better than the number of open issues in a project or time spent in the IDE.
68%
Software functionality consists of functional processes that must respond to events outside the software, detected by or generated by its functional users (defined as the “senders or intended recipients of data”). Functional users may be humans, hardware devices, or other pieces of software. Software does only two things. It moves data (entering from its functional users and exiting to them across the software boundary and from/to persistent storage), and it manipulates data.
70%
if we are ever to understand software productivity and use the measurements for estimating purposes, then we need a plausible, repeatable, technology-independent measure of work-output. The COSMIC method meets this need; sizes may be measured at any point in the life of a piece of software. It is up to each organization to determine the problem it is trying to solve and then decide for itself how and when to apply the COSMIC method and how to use the resulting measurements.
70%
The method’s fundamental design principles are valid for all time. The method definition [2] is mature and has been frozen for the foreseeable future. Automatic COSMIC size measurement is already happening. As a further consequence of the universality of the method’s underlying concepts, measured sizes should be easily understood and therefore acceptable to the software community whose performance is measured.
72%
Functional size is a measure of the amount of functionality provided by the software, derived by assigning numerical values to the user practices and procedures that the software must perform to fulfill the users’ needs, independent of any technical or quality considerations. The functional size is therefore a measure of what the software must do, not how it should work. This general process is described in the ISO/IEC 14143 standard. The COSMIC method measures the occurrences of Entries, Exits, Reads, and Writes.
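A minimal sketch of COSMIC counting, assuming the standard convention that each data movement (Entry, Exit, Read, Write) contributes one COSMIC Function Point; the functional process below ("show order status") is an invented example, not from the chapter:

```python
# Hedged sketch: COSMIC sizes a functional process by counting its
# data movements, one COSMIC Function Point (CFP) each.

movements = {
    "Entry": 1,   # user supplies an order id
    "Read": 2,    # read the order record, read the shipment record
    "Exit": 1,    # display the status to the user
    "Write": 0,   # nothing is persisted
}

cfp = sum(movements.values())  # functional size in CFP
print(cfp)  # 4
```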
72%
Benchmarking can also be done with an inward focus. The most common example of this type of benchmarking is the comparison of velocity in the last sprint to the velocity in previous sprints. The objective is usually to learn from earlier sprints what can be improved to reach a higher velocity. In Chapter 3, Amy J. Ko performs a thought experiment to argue that we should focus on good management rather than productivity measurement. The effects that good management will have on productivity are true for most successful organizations we have encountered. But the only way to prove that good …
74%
Benchmarking needs to be done on objects that can be normalized to be truly comparable. In software development this means a sprint, a release, a project, or a portfolio. You should not be benchmarking individuals. Why? The simple answer is that there is no way to normalize people.
76%
Up-front designs can be based on incorrect or out-of-date assumptions, leading to expensive rework, especially in rapidly changing circumstances. However, rushing into implementation can produce ineffective emergent designs, also leading to rework. Despite the emphasis on responsiveness in agile development, designers struggle to backtrack on important decisions and features.