serinde: ("What fresh hell?")
[personal profile] serinde
Here I am, the only one in.

Suddenly, my SSH sessions to the userhosts freeze up. Another #*%& xterm crash? No, Mr. Xterm is fine. Network? No, gaim and my sessions to zen and absinthe are fine. And then netmon starts spewing alerts in a tide of glowing orange: all userhosts, both main web servers, and a mailhost. Clearly something going on at Navisite. (But not entirely, because some other hosts there are up.)

Amidst my preparations for hara-kiri, [livejournal.com profile] sweh points out that some are already coming back. Everything in fact returned cleanly. Still don't know what the fuck happened. Power hit?
Now, I'm not in charge of this stuff and I have never seen the contract, but I thought one of the points of colocation was reliable power.

Date: 2007-04-12 02:31 pm (UTC)
From: [identity profile] arkham1010.livejournal.com
What's the uptime on the alerting boxes? Did they all just reboot?

Maybe there was a router/switch failure and failover took a long time?

Date: 2007-04-12 02:33 pm (UTC)
From: [identity profile] syringavulgaris.livejournal.com
They all rebooted simultaneously, yes.

Date: 2007-04-12 02:46 pm (UTC)
From: [identity profile] arkham1010.livejournal.com
PDU issues, eh?

Spank them. Not nicely.

Date: 2007-04-12 03:45 pm (UTC)
From: [identity profile] briony530.livejournal.com
The 1/4 of that I understood sounds very frustrating. If it helps, blame it on me.

Date: 2007-04-12 06:57 pm (UTC)
From: [personal profile] lillilah
I thought the point of colocation was that if there is a problem with a machine you have to run across town to deal with it.

Date: 2007-04-13 01:08 am (UTC)
From: [identity profile] blarglefiend.livejournal.com
The points of co-location include "reliable power", but what usually happens in my experience is that management pick the lowest bidder who is then unable to manage little things like that.

We've had two major power outages at our co-location facility in the past eighteen months. This is with millions of dollars of equipment that doesn't just come back up after a power outage; typically it involves having to get a Sun engineer out to untangle the SAN.
