We've shipped product to customers. Now we find out if it actually works!
We don't want to expect our products to fail, but if we're planning well, we're prepared for it. What does that look like? Here are five key areas to keep in mind when things get messy:
Be a collector - Get all the parts "on the table"
Be obsessive about collecting field failures. Bring them back to the lab to study, and try to identify failure modes.
Be patient - Don't rush to classify the issues
If I have ten failures at the same time, it's quite tempting to assume they all have the same root cause. I find it best to stay curious, and let the evidence speak for itself.
Be transparent - Make it easy to gather information
Don't create barriers to communication, and don't allow communication to happen in silos. Field techs, service engineers, and development engineers should be able to see (but not be bothered by) all the chatter for every failure.
Be systematic - Figure out how to turn the problem ON and OFF
Being able to turn the problem on and off is our confirmation that the root cause we've identified is correct. If we can turn it on and off in the lab, we've got the information we need to start figuring out how to fix it.
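One way to keep the ON/OFF check honest is to write down each lab run and test the pattern explicitly. The Python sketch below is only an illustration under stated assumptions: the `Trial` record and the `toggles_failure` helper are hypothetical names, and the rule it applies (every run with the suspected cause fails, every run without it passes) is a simplification that does not handle intermittent failures.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One lab run (hypothetical record layout)."""
    cause_present: bool  # was the suspected root cause applied in this run?
    failed: bool         # did the unit show the field failure?


def toggles_failure(trials):
    """Return True if the failure shows up in every run with the suspected
    cause and in no run without it."""
    with_cause = [t for t in trials if t.cause_present]
    without_cause = [t for t in trials if not t.cause_present]
    if not with_cause or not without_cause:
        return False  # we need runs in both states before claiming anything
    return (all(t.failed for t in with_cause)
            and not any(t.failed for t in without_cause))


# Made-up example data: two runs with the cause applied, two without.
trials = [
    Trial(cause_present=True, failed=True),
    Trial(cause_present=False, failed=False),
    Trial(cause_present=True, failed=True),
    Trial(cause_present=False, failed=False),
]
print(toggles_failure(trials))
```

If the check comes back false, that is a hint to go back to collecting evidence rather than to start designing a fix.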
Be responsible - Assign someone to own the process
Collecting, summarizing, and classifying failure data should be someone's responsibility - not everyone's. This might be a single person for all field issues, or possibly a team of people each focused on their own failure mode. A distributed approach is rarely capable of keeping issues from falling through the cracks.
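As a rough sketch of what owning the process could look like in practice, the Python snippet below keeps a simple failure log, summarizes it by failure mode, and lists reports that nobody owns yet. Everything in it is an assumption made for illustration: the `FailureReport` fields, the `summarize` and `unowned` helpers, and the sample data are hypothetical, not a description of any particular tracking tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureReport:
    """One returned unit (hypothetical record layout)."""
    unit_id: str
    failure_mode: Optional[str] = None  # None until the evidence supports a classification
    owner: Optional[str] = None         # person responsible for driving this report


def summarize(reports):
    """Count reports per failure mode; unclassified ones are grouped together."""
    return Counter(r.failure_mode or "unclassified" for r in reports)


def unowned(reports):
    """Return the unit IDs of reports that have no owner assigned."""
    return [r.unit_id for r in reports if r.owner is None]


# Made-up example data.
log = [
    FailureReport("SN-001", failure_mode="connector corrosion", owner="alex"),
    FailureReport("SN-002", failure_mode="connector corrosion", owner="alex"),
    FailureReport("SN-003"),  # still on the bench, not yet classified or assigned
]
print(summarize(log))
print(unowned(log))
```

Leaving `failure_mode` empty by default fits the advice above about not rushing to classify, and the `unowned` list is the part that keeps issues from slipping through the cracks.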
There's no silver bullet to fix a reliability issue, but focusing on these five areas will go a long way toward smoothing things out and getting to a fix fast.
-Brian Schoolcraft