Long time.. How do you like your fiber… cut?

So it’s been a while.. I was really getting into doing a post once a day, then life/work got in the way and totally took the wind out of my sails..

Anyway.. a ton has happened since my last post.. The most interesting thing I ran into over the last few weeks was a fiber cut at a major ISP..

For those of you who don’t know how networking is set up, there are different ways to get your data traffic around: copper (Ethernet, coax), fiber, and air (Wi-Fi, long-distance point-to-point directional wireless). A week or so back, two clients called in to report that their internet was down.. Well, the good news was we already knew their internet was down, thanks to our in-house monitoring system (gosh, is it awesome..).

So.. after an hour or two of nothing coming back up, I called their provider, which in this case was Vinakom.. Normally I have no issue with a company selling T1s and charging $300+ a month for them; what I DO have an issue with is when clients in Evanston and Elgin, IL both lose their connectivity to the world at the same time.. Wait.. the whole world?!

Yes. Locations more than 35 miles apart lost connectivity; they were 100% down.. When I called the ISP, I was greeted promptly, but was told that one of their fiber lines had been cut, that it was minimal and would only affect a few clients.. hah, wait, only a few!? While the woman answering the phone was nice, I was NOT impressed. How can you, as an ISP, have a single fiber cut and then downplay the effect on your clients?! Ugh.

Needless to say, the outage lasted 7 hours; connectivity didn’t come back until 4:30pm…

Minor fiber cut, my butt.. To this day I still believe Vinakom only has one upstream provider (single-homed), which is an idiotic thing to do.. especially when connectivity is your lifeblood.

Either that.. or the cut hit the pipe they use to get to the carrier hotel at 350 Cermak.. now THAT would be hilarious..

Oh well.. all’s well that ends well.. I would have advised my clients to jump ship and use the outage as a way out of their contracts.. but at both locations the only options are ADSL or a T1.. bummer.

Well, that’s all for now. I’ll make more of a point to get on here and post updates.. The business of IT changes all the time, and so does the amount of STUFF we see.

Be back soon.

-b

iPhones.. more reason to hate them..

As those who know me know, I despise Apple products. While I respect their creative usefulness, I despise Apple’s marketing and the general way the company operates. They just suck.

Today gave me one more piece of ammo in my fight against Apple.

As everyone should be aware (in one shape or form), Apple released iOS 7 yesterday. Woo…. Well, this morning I took a call from a client reporting intermittent data connectivity issues. Being the tech I am, I checked their connection and found extreme packet loss at the Comcast node one hop from their modem. I explained to the client that it looked like Comcast was having issues and that they would need to be contacted.

She explained that she had already talked to them and, to her dismay, had been told the problem was coming from within her own network because the modem looked fine.. even though it clearly wasn’t. Things like that enrage me: a tech just telling an end user that it’s their network and that Comcast has no issues -_-

So I called Comcast myself and found that they were having issues at the node above the modem, although their tech didn’t see any issues with the modem itself. The tech told me they already had someone dispatched who would be at the location in the afternoon. I called the client back and told her we would just have to see what their tech found.

Fast forward to 4:45: I got a call from the same client saying the tech was onsite and wanted to talk to me. The tech had been running tests and found that when he unplugged the uplink to our firewall, connectivity and response times went back to normal levels. The hell, right?

Well, since the problem was inside the network, I let the tech go and started working with the end user. One of the things I’ve learned working with Junipers is that they have a superb logging system, at least in my opinion..

I turned on logging on the outside port and, to my surprise, watched large transfers running to Microsoft and Apple IPs. After figuring out that the Microsoft IPs were Windows Update traffic, I narrowed in on the Apple IPs. The rDNS for the Apple IPs resolved to a CDN dedicated to iOS 7 distribution, so I pushed a little further and asked the end user what kind of phones they had in the office. She said everyone had an iPhone. Great. I had found the culprit; now I just had to figure out a way to tell the end user that their phones were killing their network.
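If you want to do the same kind of triage from an exported traffic log, here’s a minimal Python sketch that totals bytes per destination IP, lists the top talkers, and does the reverse DNS lookup with the standard library’s `socket.gethostbyaddr`. It assumes a hypothetical log format of one `src_ip dst_ip bytes` entry per line, which is NOT what a Juniper actually exports, so you’d have to adapt the parsing to your firewall’s real logs.

```python
import socket
from collections import Counter
from typing import Iterable


def top_talkers(lines: Iterable[str], n: int = 5) -> list[tuple[str, int]]:
    """Sum bytes per destination IP and return the n biggest destinations.

    Assumes a hypothetical log format: "src_ip dst_ip bytes" on each line.
    """
    totals: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip headers and malformed lines
        _src, dst, nbytes = parts
        totals[dst] += int(nbytes)
    return totals.most_common(n)


def reverse_dns(ip: str) -> str:
    """Return the PTR hostname for ip, or the ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return ip


if __name__ == "__main__":
    # Sample entries use reserved documentation IP ranges, not real Apple/Microsoft IPs.
    sample = [
        "10.0.0.5 203.0.113.10 500000",
        "10.0.0.6 203.0.113.10 700000",
        "10.0.0.5 198.51.100.20 200000",
    ]
    for ip, total in top_talkers(sample, 2):
        print(ip, total)
```

In real use you’d pass each of the top destinations to `reverse_dns`; that’s the step that turns an anonymous IP into a CDN hostname you can recognize.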

Lucky for me, the end user I work with at this location is great to work with; she stays relaxed even when there are real, legit issues. So I explained to her that it appeared the iPhones were downloading updates, which was causing the network traffic to go crazy. She asked around and checked a few phones: everyone was connected to the office Wi-Fi, and some were even in the middle of downloading the system and app updates!

So that’s where we ended: the end user said she was going to get all of the phones off the network, and then we would see how things settle out over the next day or so.

tl;dr: iPhone updates suck. Apple didn’t do rolling updates or anything else that would stagger the downloads to spare networks; they just pushed it out and expected everyone to understand.. Damn iPhones.
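For what it’s worth, “staggering” an update is usually done by hashing each device into a stable bucket and only offering the update to buckets under a rollout percentage that gets raised over time. Here’s a minimal Python sketch of that idea; the device IDs and percentages are hypothetical, and this is not a description of how Apple actually distributes iOS updates.

```python
import hashlib


def rollout_bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket from 0 to 99 via SHA-256."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the modulo bias is negligible.
    return int.from_bytes(digest[:8], "big") % 100


def update_offered(device_id: str, rollout_percent: int) -> bool:
    """Offer the update only if the device's bucket is under the current percentage."""
    return rollout_bucket(device_id) < rollout_percent


if __name__ == "__main__":
    # Hypothetical fleet of 1,000 phones; widen the rollout in stages.
    phones = [f"phone-{i}" for i in range(1000)]
    for percent in (1, 10, 50, 100):
        offered = sum(update_offered(p, percent) for p in phones)
        print(f"{percent:>3}% rollout: {offered} phones offered the update")
```

Because the bucket is deterministic, a phone that was offered the update at 10% is still offered it at 50%, and the download load ramps up in stages instead of every phone in the office pulling it at once.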

-B

*Sigh* … Hump Day Crazies..

So.. I ran out of time yesterday and didn’t get to post my findings.. The highlight of the day was going to a client’s office and going through their pile of old hardware. Normally that’s not exactly a day-maker, but I found some interesting items..

First was a 128MB Palm Pilot SD card.. I haven’t seen cards that small in YEARS… boy, did it take me back.. *will post picture later*

Well, I guess there is no second.. nothing else from yesterday was out of the norm enough to post about…

Today I got to deal with a customer who had ZERO concept of how their internet connection worked. Normally this doesn’t bug me, but I explained that their main line had gone down, so the backup DSL line had taken over, and that was what was causing their VPN issues. She then proceeded to ask about the main Comcast connection.. every time I told her there was no correlation, that the issue was simply one line going down, she would veer off on a tangent about the other connection and ask why it wasn’t working.. basically a giant circle.. UGH. Did I mention.. this woman worked for a finance company? Yeah. Just the type of person I want handling my money.. HAH.

Not so much ranting as just annoyance.. nothing else exciting to report.. gotta put the computer away; I’ve been going since 7am 🙁 blah, Wednesdays..

-B

Encrypted Emails are Encrypted for a Reason..

So.. when you have a client that uses Cisco’s encryption system for email.. that usually means they are trying to be secure, or just don’t know any better, right?

Those 9MB encrypted attachments are AWESOME, especially when the recipient just turns around and forwards them to an email address to be decrypted.. wait, what? Yup. Gotta love clueless end users. I got to work with one who had no concept of the term “Encrypted”. Instead of opening the attachment and following the directions Cisco so nicely puts on the page for you, apparently you’re supposed to skip the directions on how to open it properly and just send it back so their system does it for you.

I understand the email forwarding service exists because some browsers don’t open the encrypted attachments correctly, so you can send the message to that address and have it decrypted for you. But not even trying to open it properly first.. come on, man!

</enduserrants>

Technology is only as “good” as the end user utilizing it; I just love it when tech isn’t used effectively.. yeesh..

Anywho, first day back after being in wire-less Georg- I mean, Georgia. Catching up on everything after coming back is a pain; it’s like I went on vacation and had to restart all of my thought processes from before I left.. except.. I never went on vacation -_- oh well.

We had fun with our PRI provider today, and by fun I mean we were ready to go to their office and choke someone out! When you have a 24-channel PRI handling an enterprise volume of faxes, you’d expect a service that won’t just randomly fail during the day.. right?

Apparently not.. At first we thought our Windows 2003 server was starting to go on us, so I rebooted it.. After the reboot there was still a line issue, so I called our provider, who told me they already had a ticket open and someone would call back in half an hour or so.. great.

Fast forward 2 hours: we were still having issues, and we actually had to pause all of the fax ports!?! If we hadn’t, outbound faxes would have failed to send and the system would have alerted the end users.. great, more emails from people..

After one of our engineers spent a while fighting with one of theirs, we found out they were going to dispatch a tech. Great, someone was going to go onsite to our datacenter and look at the line, no ETA. Lovely.

Within half an hour, we tried sending faxes out again, hoping the system had come back. To our amazement, it had!

From what I heard, they didn’t do anything; the system just decided to come back and start working again (lol). That’s always what I want to hear when I’m dealing with mission-critical stuff, yup, always..

So yeah, faxing issues and dumb end users: the highlights of an otherwise slow day..

Be back tomorrow!

-B

RAID Array only 67% rebuilt? No problem!

So one of the fun parts of my job is that I get to work with some pretty cool stuff on a daily basis.

One of those things is servers. Today, while my boss and I were driving to a convention in Georgia (a whole different story; the 12-hour drive SUCKED), one of our clients’ servers crashed for the second day in a row.

The kicker is this: after fighting with Dell to RMA a 600GB SAS drive and get it next-day’ed to the site, a tech had replaced the known bad drive with the new, working one just 3 hours earlier. So our first response was, WTF?! We just replaced the bad drive!

One of our techs ran onsite, and working with him, we found that the new drive was marked Offline in the Dell H700 BIOS (uh-oh..), and the other drive in the RAID 1 was showing as failed (SHIT!). I had my tech set Drive 0:1 to Online; normally this should be a fruitless effort that does no more than turn the drive on.

The back story for my decision: one of my own servers, a Dell R410 that also runs on an H700, had a similar issue where its RAID 5 had two drive failures due to excessive heat. The fix, with NO issues, was to set the offline drive back to Online and re-establish the RAID. With that thinking, I had the tech try just bringing the new, offline drive online.

When he rebooted the server, he got a Windows prompt saying the server had shut down incorrectly (YESSS!!). So far so good: it meant the rebuild of the RAID 1 array was at least somewhat successful (if it wasn’t, it shouldn’t have booted up, right?). When the system finished booting, there were no errors and no signs of applications failing to start… So basically, a single 600GB (SAS 6Gbps) drive somehow managed to rebuild its ENTIRE array in just under 3 hours.. bull shit, I say, but how else could we explain this system being online without a complete RAID rebuild..

Well, here is where it gets interesting.. After my tech confirmed with the client that they could get in and work properly, one of my in-office techs called Dell, and after a lengthy chat with their tech found that the RAID 1 that made up array 0 of the RAID 10 configuration had actually NOT finished rebuilding. The logs from the controller showed it got to 66.7% of the rebuild when the first drive in the array failed.. wait.. 67%?! Yeah, my thoughts exactly. A drive in array 0 of a RAID 10 somehow got to 67% completion, lost the master drive it was rebuilding from, and was still booting up..? Yeah. It makes no sense to us either; from my boss to the other techs, everyone says the system should be DITW (Dead In The Water), yet here it is, alive and kicking.

With that bit of fun, we have to be careful: I suspect there HAS to be some data missing, anything from a Windows update file to some system file that hasn’t been accessed yet, and we will see it down the line. Zombie Server. Ugh. Thankfully, we have a contingency plan in place in case the system decides to die for good..
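To put that 66.7% in perspective, here’s some back-of-the-envelope arithmetic in Python. It assumes the controller rebuilds the mirror sequentially and that the percentage maps linearly onto the drive’s 600GB; both are my assumptions, not something the controller logs told us.

```python
def uncopied_gb(capacity_gb: float, percent_rebuilt: float) -> float:
    """Estimate how much of a mirror member was never copied during a rebuild.

    Assumes the rebuild percentage maps linearly onto drive capacity.
    """
    if not 0 <= percent_rebuilt <= 100:
        raise ValueError("percent_rebuilt must be between 0 and 100")
    return capacity_gb * (1 - percent_rebuilt / 100)


print(f"{uncopied_gb(600, 66.7):.1f} GB never copied")
```

That prints 199.8 GB: roughly a third of that mirror member was never written, though whether that region actually held live data is another question.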

And that’s all for today. Nothing better than driving for 6 hours and having a server go belly up, awesome stuff..

We’re in Atlanta now and I’m exhausted. Hopefully tomorrow isn’t so exciting, but I’ll post back either way 😀

-B