Recently I was asked to help troubleshoot an issue that had been plaguing a team of system administrators for far longer than it should have.
My first question to them was, “Have you looked at the logs?” Their expressions told me they hadn’t. They became defensive and insisted the logs wouldn’t tell us anything useful.
This exchange reminded me of a time when I was a kid. My older sister recommended a movie she had seen called ‘When a Stranger Calls’. The movie is about a girl who is babysitting. A killer gets into the house and tries to lure the girl upstairs by calling the house from the business line. When she answers, he asks, ‘Have you checked the children lately?’ Even though my sister had already seen the movie, she was punching my Dad’s leg and screaming at the TV, ‘Don’t go upstairs!’
During the discussion with the administrators, I found myself mentally punching the desk, banging my head on it, and screaming, ‘Have you checked the logs?’ Checking the logs is one of the first steps of troubleshooting.
When I was in the Navy, part of my technical training was learning the six steps of troubleshooting. After training, I worked on electronic equipment ranging from old and crappy to cutting-edge, much of which I had no specialized training on. What I learned is that the six steps can be adapted and applied to any situation in any profession.
So, if you have an issue that needs troubleshooting, follow the Navy’s six steps of troubleshooting (below) and you’ll resolve your issue in short order:
1. Symptom Recognition
2. Symptom Elaboration
3. List Probable Faulty Functions
4. Localize the Faulty Function (This includes checking logs…)
5. Localize the Faulty Component
6. Failure Analysis
**For a description of each step, read how the Navy’s six steps have become a part of academic and professional certification courses here.
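As a rough illustration of what “checking the logs” can look like in step 4, here is a minimal Python sketch that counts error entries per component, so you can see which function is complaining the most. The log format, the sample lines, and the name `summarize_errors` are all made up for this example. Real logs will almost certainly need a different pattern.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<date> <time> <LEVEL> <component>: <message>"
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+) (?P<component>[\w.-]+): (?P<message>.*)$"
)


def summarize_errors(lines, levels=("ERROR", "CRITICAL")):
    """Count log entries at the given levels, grouped by component."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines that do not fit the assumed format are skipped, not guessed at.
        if match and match.group("level") in levels:
            counts[match.group("component")] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    "2024-01-01 10:00:00 INFO web: request served",
    "2024-01-01 10:00:01 ERROR db: connection refused",
    "2024-01-01 10:00:02 ERROR db: connection refused",
    "2024-01-01 10:00:03 WARNING cache: slow response",
    "2024-01-01 10:00:04 ERROR web: upstream timeout",
]

# List components from most to fewest errors.
for component, count in summarize_errors(sample).most_common():
    print(f"{component}: {count}")
```

With the sample lines above, the `db` component stands out, which gives you data pointing at where to look next rather than a guess.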
Skipping the logs and making system changes without empirical data, in the hope of fixing the issue, is what we call ‘Easter egging’: trying this (looking under this rock) and trying that (behind that tree) in search of a solution. The result is a trail of changes that neither fix the problem nor get reverted, and that ripple through the system causing unknown, undesirable downstream behavior.
So I ask, ‘Have you checked the logs lately?’