Uh oh. My product is failing. Repeatedly 😬

Brian Schoolcraft
Sep 25, 2024

We’ve shipped product to customers. Now we find out if it actually works!

We don’t want to expect our products to fail, but if we’re planning well, we’re prepared for it when they do. What does that look like? Here are a few key areas to keep in mind when things get messy:

Be a collector - Get all the parts “on the table”

Be obsessive about collecting field failures. Bring them back to the lab to study, and try to identify failure modes.

Be patient - Don’t rush to classify the issues

If I have ten failures at the same time, it’s quite tempting to assume they all have the same root cause. I find it best to stay curious, and let the evidence speak for itself.

Be transparent - Make it easy to gather information

Don’t create barriers to communication, and don’t allow communication to happen in silos. Field techs, service engineers, and development engineers should be able to see (but not be bothered by) all the chatter for every failure.

Be systematic - Figure out how to turn the problem ON and OFF

This is our confirmation that the root cause we’ve identified is correct. If we can turn it on and off in the lab, we’ve got the information we need to start figuring out how to fix it.
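As a rough illustration of the ON/OFF check, here is a minimal Python sketch. The `Trial` record and the `toggles_failure` helper are hypothetical names made up for this example, and the all-or-nothing rule is an assumption; real lab data is usually noisier and may call for a statistical comparison instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One lab run (hypothetical record for illustration)."""
    factor_on: bool  # was the suspected root cause applied in this run?
    failed: bool     # did the unit fail in this run?


def toggles_failure(trials: List[Trial]) -> bool:
    """Return True only if every run with the factor applied failed
    and every run without it passed."""
    on_results = [t.failed for t in trials if t.factor_on]
    off_results = [t.failed for t in trials if not t.factor_on]
    # We need runs in both states before claiming anything.
    if not on_results or not off_results:
        return False
    return all(on_results) and not any(off_results)
```

If a single run breaks the pattern (a failure with the factor off, or a pass with it on), the check returns False, which is a cue to keep digging rather than declare the root cause found.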

Be responsible - Assign someone to own the process

Collecting, summarizing, and classifying failure data should be someone’s responsibility - not everyone’s. This might be a single person for all field issues, or possibly a team of people each focused on their own failure mode. A distributed approach is rarely capable of keeping issues from falling through the cracks.
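One way to picture that ownership is a shared failure log where every failure mode maps to a named owner, and anything not yet classified goes to a single triage owner. The sketch below is only an illustration: `FailureReport`, `summarize`, and the owner names are all hypothetical, not a description of any particular tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FailureReport:
    """One returned unit (hypothetical record for illustration)."""
    unit_id: str
    symptom: str
    # Left as None until the evidence supports a classification.
    failure_mode: Optional[str] = None


def summarize(
    reports: List[FailureReport],
    owners: Dict[str, str],
    triage_owner: str,
) -> Dict[str, Tuple[int, str]]:
    """Count reports per failure mode and attach the responsible owner.

    Modes with no assigned owner, including "unclassified",
    fall back to the triage owner so nothing is left unowned.
    """
    counts = Counter(r.failure_mode or "unclassified" for r in reports)
    return {
        mode: (count, owners.get(mode, triage_owner))
        for mode, count in counts.items()
    }
```

Keeping "unclassified" as its own bucket also supports the earlier point about patience: a report can sit in the log with an owner before anyone commits to a root cause.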

There’s no silver bullet for fixing a reliability issue, but focusing on these five areas will go a long way toward smoothing things out and getting to a fix fast.

-Brian Schoolcraft
