In this episode of Defrag This, we share stories from the trenches of IT and how we found unlikely solutions to troubleshoot business networks.
Troubleshooting Business Networks - Part 1
Greg: Hey, everyone. Welcome to this week's episode of "Defrag This." I'm your host, Greg Mooney. This is Episode 4. Today, we're gonna be talking about troubleshooting your business networks. You'd be surprised by the number of variables that can ultimately take down parts of your entire network, from shadow IT to microwaves knocking out your Wi-Fi. Network issues come from the strangest places. I'm joined today by a seasoned IT vet and Principal Product Manager at Ipswitch, Jim Cashman. Welcome, Jim.
Jim: Glad to be here, Greg.
Greg: Yeah. So we're gonna share some of our stories from the trenches of IT and how we ultimately found solutions to this myriad of problems out there. Some of the stories may be humorous, I know mine are, while some of them may be some of the most traumatizing moments of our lives. Mine is actually kinda both.
So, Jim, we'd love to get a synopsis of your background in IT for the audience. What's your experience?
Jim: Sure. I was an IT Director for over 20 years. I worked for many small companies, some small engineering firms, where I was a one-man shop. I did a lot of everything. And a bulk of my career was spent at a software startup, and I got there in the early phases, where I was the first IT hire, and, you know, got to build their systems and their networks through really rapid growth where they went from a venture-funded startup through to being bought for an awful lot of money by a large French conglomerate.
So that really gave me so much experience, you know, in building networks all over the world and staffing and so on and so...budgeting, a lot.
Greg: So you were pretty much just thrown into the fire.
Jim: Thrown into the fire, and it was a lot of fun. You know, I was responsible for helpdesk, storage, servers, networking, telephony, security, you know, budgeting, hiring, firing, etc.
Greg: Yeah. I guess my story is a little different. I had been working for this guy in this audio-electronics startup. It was basically a lean manufacturing facility, but it was such a small startup that we basically had to build everything from the ground up. So that also included doing IT. So we were wiring things to pipes and drilling holes in the ground just setting up the systems.
And, basically, from there, I started doing a lot of consultant work. This was right around 2008, so I was right out of college, so there was no work around for somebody like me. So it's like "Oh, I guess I'm gonna go work for a staffing agency and just do random IT jobs."
The one I was at the longest was this architect firm where I did a lot of system administration. And obviously, when I first started working there, I didn't know what the heck I was doing.
Jim: Sure, sure. So did you do a lot with AutoCAD?
Greg: We actually had a software engineer...not a software engineer, but a software support guy, who I worked with. So he dealt more with those issues, like AutoCAD. Yeah, they were using the whole Autodesk suite.
Jim: Yeah, because in the first two jobs I had, they were engineering firms, and, you know, in addition to being the IT guy, I was expected to be the AutoCAD guru for the firm. And I just loved that product and even created lots of little programs to help automate things that the engineers did. It was a really rewarding experience.
Greg: Yes, that was actually where I first started using PowerShell and, you know, actually doing scripting. So I set up these scripts where basically, it would open up the program and then run through the 3D imaging. It basically plugged in all these data points into the actual program automatically. And then, at the same time, it would also be monitoring the hardware usage of the machine. And, like, it was basically just kind of like having three programs run at the same time, one of them being the actual...you know, the log monitor, and it would log how the machine was performing and all that stuff. It was actually extremely complicated, being the first script. Maybe I should have started off with opening a file on the desktop.
Jim: Well, it's funny you bring this up, because I hate to say something that's just like what you said. But I had an occasion in the engineering firm where I had a server that was misbehaving. And I thought I fixed it, and I wanted to make sure I fixed it, but I knew the problem only happened under load. And so what I did was I created a couple of simple scripts for AutoCAD that would basically, you know, run AutoCAD, load the drawing with a bunch of external references, close it, regenerate, load it again, close it, load it again, and regenerate, etc. And then it was in a big batch file loop, which just kept going and going and going. And then I had that little job running on 20 workstations at once. I did this, you know, late at night one time. So I think we had a similar, a very similar experience.
Greg: Seems like it, interesting.
Jim: The funny thing was I put the system under all that stress, and everything worked great. And then I said, "Okay, I've solved the problem." When I came in the next day by about 9:30, the server had crashed again. So I hadn't found the problem. But such is life.
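A rough sketch of the kind of load loop Jim describes, for anyone who wants to try the idea. This is not his batch file: it swaps AutoCAD for a plain open, read, and close of a file, and uses threads in place of the 20 workstations. The function names and the default counts are assumptions made for illustration.

```python
import concurrent.futures
import os
import tempfile


def load_cycle(path):
    """Open the file, read it fully, close it; return the byte count."""
    with open(path, "rb") as handle:
        return len(handle.read())


def worker(path, iterations):
    """One simulated workstation: repeat the load cycle in a loop."""
    total = 0
    for _ in range(iterations):
        total += load_cycle(path)
    return total


def run_load_test(path, workstations=20, iterations=50):
    """Run the workers in parallel and return the total bytes read."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workstations) as pool:
        futures = [pool.submit(worker, path, iterations) for _ in range(workstations)]
        return sum(future.result() for future in futures)


if __name__ == "__main__":
    # Stand-in for a drawing on the file server (hypothetical test file).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 1024)
    try:
        print("bytes read:", run_load_test(tmp.name))
    finally:
        os.remove(tmp.name)
```

As the story shows, a clean run under synthetic load does not prove the fault is gone; it only rules out the load pattern that was tested.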
Greg: Yep. You know, we have to kinda fall flat on our faces to learn something every once in a while. I guess I wanna bring out the craziest... Well, the most traumatizing moment I guess I ever had was I had this switch... So I used to line up all these computers under the desk to image them before I actually sent them out to users, because we had a very stringent testing protocol where we had to test all the programs to make sure everything worked. I'm sure most guys out there know that when you don't test something and then you deploy it into production, something is usually gonna break. You know, I had to learn this the hard way.
But, anyways, I had this little switch that I'd just plug into the network, and then I would be able to image the machines from my computer, and then I would just let that run. So there were a lot of cables under there. And one time, I accidentally looped one of the cables. And we didn't have Spanning Tree Protocol, STP, on the network.
So, basically, when you set up a loop like that, it essentially takes down the entire network. And we were scrambling for two hours, people sending tickets and screaming at us, like "I can't do my job. What the hell is going on?"
I'm not thinking I did anything wrong because I didn't think I plugged something in wrong. I'm just trying to backtrack my day. I'm like "Oh, what did we install? What did we do that could have caused this."
The sysadmin I was working with, he, like, went down. He looked under the desk. He pulled it out. The whole thing came back on. And this was like an hour and a half of panic. And, essentially, I had inadvertently looped a signal on the switch, and I'm like "Oh, well, I don't know if that actually brought back the network," blah, blah, blah. And the CIO was like "Oh, that was you, wasn't it?" Like, she was pissed. But she couldn't log in to get the log files off the switch because she forgot the password. So I was like "You don't have any proof it was me." So I got away with it.
Jim: I don't know why she would be so concerned other than the fact that the problem was fixed. I used to have similar problems back in the day, you know, before wireless. You know, when everybody started showing up in conference rooms with laptops, you never had enough network connections. So, invariably, you'd put a little, you know, Netgear, a little blue eight-port switch in. And, yeah, on occasion, people would loop those just by mistake, you know, just the regular business users. But, yeah, as you say, the switches these days will pick that up and shut it down, but not 15 years ago.
Greg: Yeah, this was probably back in 2010. Yeah, it was a really old infrastructure. Like, we still used the tape decks and all that stuff, which I think that a lot of people still use today. But, to me, it just seemed really old-fashioned. They had machines from the '90s still running in the server room.
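To see why a looped cable without STP is so destructive, here is a toy simulation. It assumes a simplified model in which each switch forwards a broadcast frame out of every link except the one it arrived on, and it uses a plain breadth-first tree as a stand-in for spanning tree. Real STP elects a root bridge and exchanges BPDUs, none of which is modeled here; the switch names are hypothetical.

```python
from collections import deque


def build_neighbors(links):
    """Turn a list of (switch, switch) links into an adjacency map."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    return neighbors


def flood(links, source, max_steps=10):
    """Flood one broadcast frame; return the copies in flight after each step."""
    neighbors = build_neighbors(links)
    # Each in-flight copy is (switch holding the frame, switch it came from).
    in_flight = [(source, None)]
    history = []
    for _ in range(max_steps):
        next_hop = []
        for switch, came_from in in_flight:
            for peer in neighbors.get(switch, []):
                if peer != came_from:
                    next_hop.append((peer, switch))
        history.append(len(next_hop))
        in_flight = next_hop
        if not in_flight:
            break
    return history


def spanning_tree(links, root):
    """Keep only the links of a breadth-first tree, dropping redundant ones."""
    neighbors = build_neighbors(links)
    seen = {root}
    tree = []
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        for peer in neighbors.get(switch, []):
            if peer not in seen:
                seen.add(peer)
                tree.append((switch, peer))
                queue.append(peer)
    return tree


if __name__ == "__main__":
    looped = [("A", "B"), ("B", "C"), ("C", "A")]
    print("with loop:   ", flood(looped, "A"))
    print("loop removed:", flood(spanning_tree(looped, "A"), "A"))
```

With the loop in place the copy count never drops to zero, so the frame circulates until someone pulls the cable. Once the redundant link is blocked, the flood dies out after a couple of hops.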
So what are some of the craziest things you've run into as an IT professional?
Jim: Well, I was thinking about that question, and, you know, a few things come to mind, some, more serious than others, you know, some, very simple. You know, you mentioned at the beginning of this about, you know, how microwave ovens can wreak havoc with things. I had a hardware issue like that.
I had an engineer who kept complaining that his network communication would go out periodically from the box. And I noticed when I was in his office that his screen always showed, you know, kinda funny patterns, something, you know, electro-mechanic or electromagnetic was going on in the office. And I couldn't figure it out for the longest time. And, you know, I brought people in to take a look at...is there something wrong with the network jack? Is there something wrong with the way that the power plug is cabled and so forth? And it was kinda spotty when it happened, so it was a hard problem to diagnose.
What I wound up doing to kinda prove to the building ownership that there was something physically wrong in the room, I actually put the guy's PC on a cart with the screen. But I plugged it in the room, and I rolled the cart out of the room still plugged in. And when it was outside in the hallway, everything was fine. And when I rolled it back into the room, everything was not fine.
Anyway, to make a long story short, what it turned out to be is it wasn't the wall. They had wired the office with the wrong kind of wires. I recall they wired it with aluminum. I was pretty sure that stuff was illegal to run wires. You know, I thought wires all had to be copper. But, anyway, they had an electrician come in, change the wires in the wall, and the problem went away. So that was one.
Another story that comes to mind is I was the subject of a cable cut. I think all IT Directors worry about cable cuts, something out in the street where you lose your internet connectivity. You know, the canonical example is the backhoe in the street and some guys are working and they cut cables on the street. Well, this was a little bit different. So I got woken up in the middle of night by one of our international staff who said, you know, "I can't log into the office."
And what happened was I did a quick check. I wasn't able to dial in on the telephone to our main PBX through the digital circuits, but I was able to dial in on the analog circuits. And I wasn't able to get in on the T1. So I said, "Ha! Whatever has happened has just happened to the digital stuff." And what happened was that some thief was trying to break into a medical clinic that was in the same building that we were in. And he took a hacksaw to a bundle of cables coming into the building thinking he was cutting out the alarm circuits, which is kind of funny in the end, because he cut out all the digital circuits, but the analog circuits were coming in a different bundle, and he didn't get those. And that's generally how the alarm alerts.
Greg: I guess he learned the hard way too.
Jim: Yeah, he learned the hard way. But I was able to diagnose that from my home and figuring out "Oh, it's the digital circuits. I know where that comes in." I was able to call the phone company in the middle of the night. And, you know, by the time I got to work the next morning, there was already a Verizon truck, one of those trucks that has the fiber...I don't know what you call it...splicing where they actually take the fiber into the truck and do the splicing. So that was already onsite.
Greg: I think that beats my story. I don't know if I can beat that one.
Jim: Yeah, no, that was good. I guess the other lesson I would take from the trenches is I was often asked to solve what I call "people problems with technology." And I'll give you two examples of that.
Greg: Is this a PICNIC? They call it PICNIC. It's problem in chair, not...
Jim: Oh, PEBCAK, yeah. No, this is more management. So in two different cases, I had one guy come to me. He was the manager of the customer service department. And he felt that his staff was extending their lunch hour playing solitaire. So he would find them at 1:15 in the afternoon still playing solitaire. And he came to me, and he said, "I want you to take all the games off of all my people's computers." And, you know, I thought about it for 30 seconds. And I said, "I'm not gonna do that. The appropriate way to deal with this problem is to tell your employees to not play games beyond their lunch hour."
And so, you know, that was a case of a people problem where somebody was trying to solve it using technology. It's what I'd call passive-aggressive as opposed to just acting like an adult and telling one of your employees to knock something off, which occasionally has to happen, right?
Greg: Yeah, but it sounds like a cultural issue on the team more so than anything.
Jim: So similarly, another manager came to me and said, you know, "I think one of my employees is surfing to bad websites." You know, back in the day of the early internet, people used to talk about "Oh, I can surf on the internet and find out how to make a bomb." This guy was actually looking at how to build weapons on the web.
Greg: Oh, while at work?
Jim: Yeah, while at work. This was in the late '90s. And she said, "Oh, you've got to lock him away from doing such and such." Again, it was a startup, and we had a freewheeling culture, a lot of freedom. And I said, "I'll do no such thing. I'm not gonna, you know, start to worry about filtering this guy." I said, "All you need to do is go and talk to him and see what's going on. Tell him to knock it off."
Greg: And maybe get him some therapy too.
Jim: Perhaps, yeah. He didn't last long. But, anyway, so I always cautioned IT people to not try that sort of passive-aggressive use of technology to solve what are management problems and to err on the side of giving your users freedom and assume that 99.9% of the people that work at a given company are pretty honest, hardworking folks.
Greg: No, that's a fair assumption. So what were some of the most creative solutions you had to come up with to fix an extremely strange problem?
Jim: Yeah. Well, one thing that comes to mind...I don't know if this is a strange problem. It's a fairly typical problem, but I think I solved it in a creative way. It was in the early days of the Internet, and we were doing file downloads for our software product. We were a software company. And, you know, in the early days of the Internet, hosting and datacenters were just coming online, and you usually did everything right out of your building over your T1 or T3.
So, you know, we had a file server that served out our software. And it was getting overtaxed. And we knew with every release, the period of time that our T1 was pegged got longer and longer. And I said, you know, "The next time we have a big release, this thing is gonna be pegged for weeks, and everybody is gonna be upset."
So we did a contract with a hosting company, one of the earlier ones that specialized in file downloads, a company called Connection. But once I got everything out there, I needed to learn how to manage it. It was a geographically distributed set of servers. There were three Sun servers. And I had to do some clean up, some sysadmin, if you will, on a daily or hourly basis, to clean up some temp files and things like that. But I wanted to control when I did it. And I didn't wanna control it with cron jobs on the actual Sun servers.
So I actually wrote little web apps. They're not web apps. They were Perl scripts that responded at a webpage that did this system administration work for me. And I actually used WhatsUp Gold, our monitoring product, as, you might say, a job scheduling engine. So WhatsUp Gold, you know, I had it monitor these webpages on a periodic basis. And just simply by hitting the webpage, they were running this script. It was returning some information. It was putting it into the logging of WhatsUp Gold, so kinda using WhatsUp Gold as a job engine more than just a monitoring engine.
For those that think I was creating all sorts of security holes, I had it fairly well tied down so that these webpages could only be accessed from a certain machine on my network, and that was that. But it worked really well for me. And that way, I could keep all my scheduling of these sysadmin tasks within a piece of software I was comfortable with and didn't worry so much about the Sun servers.
I should also add, by the way, now that I think of it, one of the reasons why I did that was these were shared servers, and I did not have access to a lot on those particular servers at the time. I had very limited access.
Greg: Access to?
Jim: To the actual Sun servers to, like, perform this file maintenance. So I had enough rights from within the web server to work on files within my little piece of the file tree. But beyond that, I didn't have much permission at all. So I needed to get crafty and actually, you know, create a script as a webpage, hit the webpage on a periodic basis. Yeah, it's kinda hacky. But, you know, back in the early days of the internet, those are the things we did.
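For the curious, the pattern Jim describes (a maintenance script exposed as a webpage, triggered whenever a monitor polls it, and reachable from one machine only) might look roughly like this. Jim's originals were Perl scripts; this is a Python sketch using only the standard library, and the address, directory, port, and age threshold are all hypothetical values chosen for illustration.

```python
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

ALLOWED_CLIENT = "127.0.0.1"      # hypothetical: address of the monitoring host
TEMP_DIR = "/tmp/download-cache"  # hypothetical: directory to clean
MAX_AGE_SECONDS = 3600            # hypothetical: delete files older than an hour


def clean_temp_files(directory, max_age_seconds, now=None):
    """Delete regular files older than max_age_seconds; return the names removed."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed


class MaintenanceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the monitoring host may trigger the job.
        if self.client_address[0] != ALLOWED_CLIENT:
            self.send_error(403, "Forbidden")
            return
        removed = clean_temp_files(TEMP_DIR, MAX_AGE_SECONDS)
        # The response body is what the monitor would record in its own log.
        body = f"OK removed={len(removed)}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    os.makedirs(TEMP_DIR, exist_ok=True)
    HTTPServer(("", 8080), MaintenanceHandler).serve_forever()
```

The schedule then lives entirely in the monitoring tool: whatever interval it polls the page at is how often the cleanup runs. An address check like this is a thin safeguard by modern standards and would normally be paired with authentication.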
Greg: Yeah. And I'm sure there's plenty of stuff that people do nowadays that are, you know, just like that. I mean, just thinking about all the scripts that IT teams maintain nowadays, I mean, it's kind of the same thing. It's not always, you know, using solutions that they're not even built to be actually used for. I mean, that's just being crafty. That's what you need to do as an IT person. I mean, rule number one is troubleshooting and figuring that stuff out.
Jim: And I would offer out to anyone who's listening to this podcast that's a WhatsUp Gold user, if you're using the product in nonstandard ways, you know, reach out to us. Let us know. We'd love to hear it. It could be that the way you're using it other customers would like, and maybe we can add the functionality directly to the software to make it a little easier for you to do what you're trying to do and then to let other customers do the same thing.
Greg: No, that's a great point. If any of you guys are out there, you can always email us at firstname.lastname@example.org with questions. We'd also like to hear your creative solutions, like Jim said. And we have the ability to have people call in here, and I really wanna test that functionality with our new little podcast radio station here.
Jim: Sure you don't want me to tell you about my security problems? It's a great story.
Greg: How about this? We will take a short break, and then we will come back for the security problem, all right? So stay tuned, folks, to "Defrag This." We'll be right back.
Troubleshooting Business Networks - Part 2
Greg: All right, welcome back to Defrag This, part two of episode four. I'm here with Jim Cashman and we were just about to get into security challenges.
We've talked about a lot of the other challenges of troubleshooting business networks but obviously nowadays, security is probably one of the most important jobs of any IT team, especially if you work in a regulated industry. I know a lot of our customers work in health care, finance, and government, so security's a big problem to tackle. And so Jim, what's one of the biggest security challenges you've ever had to overcome?
Jim: Well, one I recall was a pretty big deal at my company and it goes back to this idea that, you know, these things often are people problems and not necessarily technology problems that you need to be aware of. So I once had my security guy, who was responsible for my firewalls, go rogue. He was trying to run some sort of business on the side, which I didn't know about, and he had some spat with his partner and the partner called me and said that he believed that my security guy was attempting to hack him from within my network. And the partner even got the FBI involved, so it really became...
Greg: Oh, wow...
Jim: Yeah, it became...
Greg: Became a mess.
Jim: ...a big mess. And you know, I had no reason to doubt my security guy. I hired him, he seemed like a very upstanding fellow. But when he started to tell me his story, to try to...you know, I went to him, I said, "You know, this guy has called me and he's saying this. Does this mean anything to you?" And, you know, the hair on the back of your neck goes up and something about his story just didn't seem right. And then I said, "Well, let's take a look at the firewall logs and see, you know, is somebody in this company, you know, let's see what they're doing." And he said he'd deleted all the firewall logs accidentally.
Greg: Oh, that's strange.
Jim: You know, while he was investigating. And so this can get really bad, when your security guy does this, and it causes you to ask, "Who's watching the watcher?"
Greg: Now were you in a regulated industry at this time?
Jim: No, no. This was the software startup.
Greg: Yeah, okay.
Jim: Now in terms of the firewall logs, I knew that I had worked with him in the previous months to set up a way to back up our firewall logs, just for something like this. And take like, say, a daily backup of the logs and put them on another server in the network in case there was some issue like this. And I was almost 100% certain that he had actually done this. He and I had talked about it. I hadn't checked it at the time, but I said, "Where are the backups?" And he said, "Oh, you know, I know we talked about it, but I never got the chance to set that little process up."
Greg: Oh, that's very convenient.
Jim: So again, no firewall logs. But he failed to cover one of his tracks. And there is an odd log file in older versions of Windows and it might still be there...and if I recall correctly, it wasn't in the normal event logs. It was some strange log, just sitting in System32 or something, that kept track of changes that a user would make to the scheduled jobs part of Windows where, you know, you can set up scheduled daily, hourly jobs. And so I found this log. He knew nothing about it and I did. And when I looked at it, the log told me that there was a job on that system the day before.
Greg: To back everything up.
Jim: Set up to back everything up. So I was correct in that he had set it up and then, covering his tracks, he got rid of the logs, he got rid of the backups, and he was smart enough to take the actual scheduled job off. But what he didn't realize is that the fact that he deleted that scheduled job was also logged someplace else and that's what I found.
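The daily log backup Jim and his security guy had discussed could be sketched along these lines. This is an illustration only, not what was actually deployed: the function names, the date-stamped naming scheme, and the checksum file are assumptions. The checksum is included because the story turns on whether a log can be trusted after the fact.

```python
import hashlib
import os
import shutil
from datetime import date


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_log(log_path, backup_dir, today=None):
    """Copy the log into backup_dir under a date-stamped name, plus a checksum file."""
    today = today or date.today()
    os.makedirs(backup_dir, exist_ok=True)
    base = os.path.basename(log_path)
    target = os.path.join(backup_dir, f"{today.isoformat()}-{base}")
    shutil.copy2(log_path, target)
    with open(target + ".sha256", "w", encoding="utf-8") as handle:
        handle.write(sha256_of(target) + "\n")
    return target


def verify_backup(target):
    """Return True if the backup still matches the checksum recorded at copy time."""
    with open(target + ".sha256", encoding="utf-8") as handle:
        return handle.read().strip() == sha256_of(target)
```

Run once a day by a scheduler, with `backup_dir` on a server the firewall administrator cannot write to, a job like this addresses the "who's watching the watcher" problem better than a copy the same person can delete.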
Greg: Now does the log actually tell you which, you know, which user or which...
Jim: Well, at the time, since we weren't requiring anybody to sort of login in order to get out of the company, it was basically all done on an IP address basis.
Jim: But we would've known...Oh, you're asking about this particular log?
Greg: Yeah, yeah.
Jim: I thought you were talking about firewall logs in general. This particular log, I don't remember what it showed. It may have shown the user. He would've probably logged in as an admin account anyways, but it definitely showed that somebody logged in and deleted this scheduled job.
Greg: So but you essentially caught him in a lie because he said he never had backed it up to begin with.
Jim: Exactly. And it's like, who else was gonna do this? You know, he was the only other guy that had the technical level of expertise to even be working with these kinds of things. And so I'm like, "Who else would've done this, if it weren't you?"
Greg: Caught red-handed.
Jim: So I confronted him and he finally realized the game was up and admitted to it but, you know, he made me twist, as it were, with the FBI for a good week. He watched me go around with the FBI and do investigations and looking for this and looking for that and, you know, I was pretty angry with him.
I'm a very nice guy and give people second and third chances and so forth, but here I had a guy who let me spin, you know, twist in the wind for a week, trying to resolve this problem, when he knew that he had done it. And in addition, he's your firewall guy and he's deleting firewall logs, which is kind of, it's like a doctor trying to kill his patient kind of thing, you know? So we had to let him go, there was just no other way.
Greg: Oh, of course. Yeah.
Jim: I don't think there were any legal repercussions. I have a feeling that the whole FBI involvement just went away. But I think the moral of this story is that security incidents, most of the time, I've found, are internal to your company. They're people you know, they're processes you know.
Greg: Yeah, and I mean that also relates, even if it is a hacker, too, usually the way they get in is from social engineering. So...
Jim: That's right.
Greg: ...it's somebody from within the company clicking on something they shouldn't be clicking on...
Jim: That's right.
Greg: ...or doing something, whether they know it or not, yeah.
Jim: Yeah, everybody thinks...
Greg: It's 9 times out of 10.
Jim: Everybody thinks it's some rocket scientist who's found some secret backdoor in a Check Point firewall, and that's rarely the case.