Friday, September 23, 2016

The 5 whys and escaped software defects

At a recent training there was some discussion about using the "5 whys" line of questioning, which comes from Toyota, to get closer to the root cause of problems, though Toyota wasn't mentioned. The context was software defects and tracing them back "further", so that in addition to fixing the immediate problem, we also find and fix the more systemic issues that cause some categories of defects to recur.

It got me thinking a bit. In the past, such as at SupportSoft, there would be discussion and even tracking and measurement of "escapes", that is, defects that made it into the field and were first reported by end users. An escape means the error was missed by all of developer testing, code review, test automation, and the QA test team. I always found it strange that all of the focus and serious-faced talk about escapes was on the test team. The escaped defects were always blamed on QA for "missing" them in their test plans.

We never did further analysis to trace defects back to the development team or the individual programmer who checked in code that didn't work. I wondered why the "blame" for escapes was always on QA. They didn't write the code that didn't work. Oh well. I'd say the "pinning" of escapes on QA comes from a mentality where quality is something that a test team either adds or fails to add as the final step in a software development process.

With 5 whys, things change in a more positive way. QA becomes one link in the chain of events that led to a runtime error in the field. If there are escapes, the entire team is accountable, including the original developers who checked in code that ultimately failed in the field.

It's a new mentality, different from other places I've worked. Before, it was understood and accepted that defects are a natural, unavoidable byproduct of advancing a software program. Development delivers code with defects, QA identifies the most important defects, development fixes them, the dev/QA iterations repeat, and eventually the code ships to production.

With 5 whys we implicitly reject this silo model, in which code thrown over the wall out of development is expected to include defects for testers to identify. With 5 whys and root cause analysis, the expectation becomes far higher initial quality out of development, with few or no defects, and far fewer iterations where software is "sent back" from QA to development for rework. There's also more of a team approach to responsibility for the quality of code delivered into production. Defects are owned by the entire team; they aren't something that testers either find or fail to find as the final step. Quality becomes more pervasive and end to end.
