Monday, 19 March 2018

Disruptive Work

Just over a year ago I instigated a move in my company to stop recording bugs. It was a contentious move; however, for the context in which we worked, tracking bugs just didn't seem to fit. I'd planned to write about it at the time but wanted to see the results first. In light of some recent internal discussion in my company in favour of reinstating bugs, now seems like as good a time as any to write about what we do instead.

A scheduling problem

My current company, River, has historically worked on an agency basis. Rather than having development teams devoted specifically to one product or feature area, each team has its time allocated on a sprint-by-sprint basis to one of the client programmes that we develop and maintain software for.

This presents a number of challenges to effective agile development, probably the most significant being scheduling work. When product teams work in agile sprints they commit to delivering a backlog of work for each sprint. Inevitably, in my experience, things didn't always go as planned, and work that had come in urgently from another channel, such as a Proof of Concept (POC) or the customer support team, might need to be brought into a sprint. In product teams I've worked on there was always an understanding across the business that these disruptive items would impact the team's ability to tackle its scheduled work and that subsequent sprints would be affected. As all of the sprints related to the same product this was manageable - we may have had to reduce the scope of a long-term release deliverable, but for the most part the work could be absorbed as an overhead of the team's long-term development work.

The challenge we faced at River with the agency model was that, for any given sprint, there was a good chance that a team's work would be completely unrelated to what they had done in the previous one. Items raised through other channels such as ad-hoc customer requests or support tickets might come from a different programme from the one that was the subject of the active sprint, and therefore the emergent work could not simply be slotted into the team backlog at the appropriate priority level.

We saw some challenging tensions between undertaking new planned developments and maintaining existing programmes. I'd personally struggled to make progress on two high-profile projects due to the impact of disruptions from existing ones, so I was keenly aware of the impact that unplanned work could have. A particular problem that I'd encountered was issue snowballing. As one sprint was disrupted by unplanned work, it would inevitably have a knock-on effect as promised work would be rushed or carried over and impact a later iteration. Timescales on the new programme would get squeezed, resulting in issues on that programme, which would consequently come in as disruptions which impacted on later sprints.

Being careful with metrics

Last year, across the development group, we set about establishing a set of meaningful goals around improving performance. Years of working in software testing have given me a strong mistrust of metrics, and I was keen to avoid false targets when it came to striving to improve the quality of the software. I'd never suggest that counting bugs provided any kind of useful metric to work to; however, I did feel that examining the amount of time spent on bugs across the business could give us some useful information on where we could valuably improve, and so I set about investigating the use of bugs across the company.

What I found on examining the backlogs of the various teams was that each team took a different approach to recording bugs. Some teams would capture bugs for issues they found during development; others would restrict their 'bug' categorisation to issues that had been encountered during live use. Some teams went further and raised nearly everything as a user story - yielding such interesting titles as "As a user I don't want the website to throw an error when I press this button".

The group involved in looking at useful testing metrics and I took a first-principles approach: we stepped back from deciding what to track about bugs and instead worked on actually understanding the problem that we were trying to solve. Whilst the absence of bugs was obviously a desirable characteristic in our programmes, more important was reducing the amount of disruption encountered within teams relating to software that they weren't working on at the time, whatever the cause, and so we decided that looking at disruptions would give us more value than looking at bugs alone.

The Tip of the Iceberg

What became apparent was that the causes of disruptions went far deeper than just classic 'bugs' or coding flaws. The work that was causing the most disruption included items such as ad-hoc demands from customers for reports, which we weren't negotiating strongly on, or changes requested as a result of customers demanding additional scope beyond what had been delivered. I would have been personally happy to consider anything that came in this way as a bug; however, I'm politically aware enough to know that describing an ad-hoc request from an account director as a 'bug' might be erring on the side of bloody-mindedness.

The approach that we decided to take was to categorise any item that had to be brought into an active sprint (i.e. something that had to impact the current sprint and could not be scheduled into a future sprint for the correct programme) as 'disruptive'. The idea was that we would then track the time teams spent on disruptive items and look to reduce this time by targeting improvements, not just in coding and testing, but in all of the processes that had an impact here, including prioritisation, product ownership and customer expectation management. A categorisation of 'bug' was insufficient to identify the range of different types of disruptive work that we encountered. We therefore established a new set of categories to better understand where our disruptive backlog items were coming from:

  • Coding flaws
  • Missing behaviour that we hadn't previously identified (but the customer expected)
  • Missing behaviour that we had identified but hadn't done (due potentially to incorrect prioritisation)
  • Missing behaviour that would be expected as standard for a user of that technology
  • Performance not meeting expectation
  • Security vulnerability
  • Problem resulting from customer data

Each item raised in the backlog would default to being a 'normal' backlog item, but anything could be raised as 'disruptive' if a decision was made to bring it into a team and impact a sprint in progress.
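
To make the scheme concrete, here is a minimal sketch in Python of how such a backlog item and the time breakdown might be modelled. To be clear, everything here - the BacklogItem and DisruptionCategory names, the hours_spent field, the disruption_time_by_category function - is a hypothetical illustration of the idea, not our actual tracking tooling:

    # Illustrative sketch only - not a real tracking system.
    from dataclasses import dataclass
    from enum import Enum
    from typing import Dict, List, Optional

    class DisruptionCategory(Enum):
        CODING_FLAW = "coding flaw"
        MISSING_UNIDENTIFIED = "missing behaviour we hadn't identified"
        MISSING_DEPRIORITISED = "missing behaviour identified but not done"
        MISSING_STANDARD = "behaviour expected as standard for the technology"
        PERFORMANCE = "performance not meeting expectation"
        SECURITY = "security vulnerability"
        CUSTOMER_DATA = "problem resulting from customer data"

    @dataclass
    class BacklogItem:
        title: str
        hours_spent: float = 0.0
        disruptive: bool = False                       # items default to 'normal'
        category: Optional[DisruptionCategory] = None  # set only when disruptive

    def disruption_time_by_category(
        items: List[BacklogItem],
    ) -> Dict[DisruptionCategory, float]:
        """Sum the hours spent on disruptive items, broken down by category."""
        totals: Dict[DisruptionCategory, float] = {}
        for item in items:
            if item.disruptive and item.category is not None:
                totals[item.category] = totals.get(item.category, 0.0) + item.hours_spent
        return totals

Tracking time rather than counts is the key design choice here: a single ad-hoc report request that eats three days of a sprint is a bigger problem than a dozen trivial coding flaws, and the breakdown by category points at which process to improve.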

Did it work?

After over a year of capturing disruptive work instead of bugs, we are in a good place to review whether it worked. Typically, and frustratingly, the answer is both yes and no.

Some of the things that did really help

  • No-one argued any more about whether something was a bug. I have gone over a year in a software company without hearing anyone argue about whether something was a bug or not. I cannot overstate how important I think that has been for relationships and morale.
  • We don't have big backlogs of bugs. The important stuff gets fixed as disruptive. The less important stuff gets backlogged. At my previous company we built up backlogs of hundreds of bugs that we knew would never get fixed as they weren't important enough; treating all items as backlog items avoids this 'bug stockpiling'.
  • All mistakes are equal. I've always disliked the situation where mistakes in coding are categorised and measured as bugs but mistakes in customer communication that have a far bigger impact are just 'backlog'. There is a very murky area, prompting some difficult conversations, when separating 'bugs' from refactoring, technical debt and enhancements. These conversations are only necessary if you force that distinction.

What didn't work so well

  • People know where they are with bugs. In many cases they are easy to define: many issues are so clearly coding flaws that categorising them as bugs is easy and allows clear conversations between roles, simply down to familiarity with bug management.
  • There was still inconsistency. As with bugs, different teams and product owners applied subtly different interpretations of what disruptive items were. Some were treating any items that impacted on their planned sprint as disruptive, even if they related to the programme that was the subject of that sprint, others only raised "disruptives" if the work was related to a different programme.
  • The disruptive category led to a small degree of gaming. Folks started asking for time at short notice to be planned into the subsequent sprint rather than the current one. This was still significantly disruptive to planning and ensuring profitable work, however it could be claimed that the items weren’t technically "disruptive items" as the time to address them had been booked in our planning system.

In Review

Right now the future of disruptive items is uncertain, as one of the product owners in my team this week raised the subject of whether we wanted to consider re-introducing bugs. Although I introduced the concept, I'm ambivalent on this front. Given the problems that we specifically faced at the time, tracking disruptive items was the right thing to do. Now that we have a larger product owner team and some more stability in the scheduling, disruptive work is not the debilitating problem that it was. At the same time I'm not convinced that moving back to 'bugs' is the right move. Instead my inclination is to once again go back to first principles and look at the problems that we are facing now, and track appropriately to address those, rather than defaulting to a categorisation that, for me, introduces as many problems as it solves.
