Channel: THWACK: All Content - All Communities

The Trouble with Troubleshooting

As a longtime IT professional, I find that responding to problems in systems of one

type or another is old hat. It comes with the job description, and you

tend to develop habits and methodologies early on in your career. That

doesn't mean that I, or anyone else, have developed the best habits,

however, and often our methods are quite ineffective indeed.

 

Standard practice in the industry is for a network operations center (NOC)

to monitor some portion of the network for immediate or impending troubles.

Companies spend millions of dollars on entire rooms filled with

beautiful monitors mounted on walls, desks and workstations built to look

as futuristic as possible, low lights at just the right hue, and

comprehensive monitoring suites to keep track of it all. The trouble is,

this often makes people aware of a problem, but offers nothing in the way

of a troubleshooting methodology or tool to actually fix the problem.

 

As often as not, when some sort of event worth responding to grabs the

attention of a NOC engineer, they either call someone or start a trouble

ticket, or both. The lucky recipient of the aforementioned prodding then

digs into the problem or passes it on to the next person in the chain,

with each successive person having to start over in their own domain

(compute, network, security, etc.) with new tools and limited information.

 

This entire approach may seem logical and even expedient, though I suspect

that's largely due to a little bit of Stockholm syndrome and the

ever-popular "but this is the way we've always done it" argument. I'm not

saying that this is a bad approach, or at least that it has always been

a bad approach, given the historical dearth of cross-silo troubleshooting

tools available on the market. Most of us instinctively knew that this was

inefficient, but didn't have a good sense of what we could do about it.

 

Various tools and paradigms that attempted to fill the full-stack

troubleshooting void were suggested, developed, sold, and subsequently

put on shelves. Comprehensive network tools are one of the

favorites, offering a truly staggering array of dashboards, widgets,

alerts, and beautiful graphics in a noble attempt to present the most

information possible to the engineers tasked with fixing the relevant

problems. Many tools also exist for doing the same thing inside of virtual

environments, or on storage arrays, or the cloud, etc., and many are very

good at what they do... but they don't do what we need, which is to

collapse the silos between IT disciplines into one unified system. Until

now.

 

SolarWinds NPM, part of the Orion suite of products, has long been the

darling of NOCs everywhere, and with good reason. It is a comprehensive

and well-thought-out approach to network and systems monitoring.

Collapsing the silos in IT, however, requires more than just a great tool

for the NOC, or even a great tool for the network and systems teams. It

requires a tool which is not only useful for all of these teams, but

preserves the chain of data (of troubleshooting) as it moves between

specialties. In other words, if I'm the systems guy, I want to see the

data that the network team is seeing, and the steps they've taken to

resolve the problem; and I want to see it in the system, not in a hastily or

poorly crafted email, which is the equivalent of tossing a flaming bag of

excrement over the wall on our way out.

 

NPM 12.1 has taken a stab--a good stab--at solving these problems with the

inclusion of a tool called PerfStack. I'll be exploring what this tool can

do, and where in the troubleshooting process it fits, in a series of blog

posts over the coming weeks. I'll likely also toss in some of my own

personal horror stories of troubleshooting problems, as I've had more than my

fair share in my past, and confession is cathartic. In the meantime,

I'd encourage everyone to check out this already fantastic series of posts

on the new tool:

 

https://thwack.solarwinds.com/community/solarwinds-community/product-blog/blog

