I woke up on Saturday morning ready to update your newsletter to the June edition only to find that the server that holds this month’s newsletter was inaccessible.
We understand the importance of your newsletter, so saying I was concerned would be a bit of an understatement. The newsletter is a cornerstone of our websites for accountants.
While checking why this server was down I found that the datacenter where the server is located had a major disaster.
One of their high power transformers exploded and blew out three of the walls in their electrical room. So many wires were damaged that the backup generators wouldn’t work either.
The bad news is they don’t have power. And datacenters use a lot of power for the servers and the A/C cooling systems.
The good news is that none of the servers were damaged.
ThePlanet, which owns the datacenter in Houston, Texas, has been working around the clock to repair it and bring its more than 9,000 servers back online.
Last night they were able to bring up the 6,000 servers located on the second floor of their facility. I spent all day and night yesterday calling, IMing and posting tickets trying to find out if our server was one of the lucky ones located on the second floor.
Murphy’s Law prevails… and it turns out that our server is one of the unlucky ones located on the first floor where the damage is the worst.
ThePlanet tells us that they’ll have temporary power to the servers on the first floor tonight.
What’s Affected?
We don’t keep all our eggs in one basket. We utilize 4 separate datacenters to securely host our 10 servers. That way if one server goes down you still have some functionality.
Each server has its own function. Some handle your website, some handle email, some handle Secure File Exchange, while others handle DNS and other important services.
This strategy worked very well in this disaster. Many of the other companies who have servers in the Houston datacenter have full outages… Meaning their customers are totally without their website and email.
The only server that is affected by this disaster is the Email Marketing System server. That server holds your list of subscribers and the online newsletters.
Your data is safe… just not accessible until tomorrow morning.
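For the technically curious, here is a rough sketch of the "eggs in separate baskets" idea described above. It is only an illustration: the server names, the datacenter names and the `services_down` helper are all made up for this example and do not reflect our real setup. Only the list of services comes from this post.

```python
# Hypothetical layout: server and datacenter names are invented for
# illustration. Only the service names are taken from this post.
SERVERS = {
    "web-1": {"datacenter": "dc-a", "service": "Website"},
    "mail-1": {"datacenter": "dc-b", "service": "Email"},
    "files-1": {"datacenter": "dc-c", "service": "Secure File Exchange"},
    "dns-1": {"datacenter": "dc-a", "service": "DNS"},
    "dns-2": {"datacenter": "dc-c", "service": "DNS"},
    "ems-1": {"datacenter": "houston", "service": "Email Marketing System"},
}


def services_down(servers, failed_datacenter):
    """Return the services left with no running server if one datacenter fails."""
    all_services = {info["service"] for info in servers.values()}
    # A service survives if at least one of its servers lives elsewhere.
    surviving = {
        info["service"]
        for info in servers.values()
        if info["datacenter"] != failed_datacenter
    }
    return sorted(all_services - surviving)


if __name__ == "__main__":
    print(services_down(SERVERS, "houston"))
```

In this made-up layout, losing the Houston datacenter takes down only the Email Marketing System, while a service that has servers in two datacenters (DNS in the sketch) keeps running when either one fails.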
What This Means
This means that your June newsletter won’t be updated on your site until Tuesday June 3rd. It also means that you can’t change your emailed newsletter subscription list today.
What We’ve Learned
One of the biggest lessons I’ve learned from this is that no matter how well a datacenter is built… no matter how many redundancies they have in place… disasters still happen.
I’m calling in a server disaster recovery expert to review our current plans and make recommendations on what we can do to prevent or at least minimize the downtime if this happens in one of our other datacenters.
When it comes to making changes to servers, I’ve found that it’s wise to thoroughly evaluate the implications of the changes. Then thoroughly test the new plan before making it live.
That means it takes time. Over the next few months I will keep you informed on our new and improved disaster recovery plans.
Update - Monday June 2nd at 5:00 PM EST
Ok, I have a bunch of good news…
First, I found a backup of the June 2008 Newsletter, so I installed it on your website. The cross-server backup was there all along. I had simply forgotten about it. My brain doesn’t work too well on 2 hours of sleep.
Second, ThePlanet just got their servers back online, so the Email Marketing System is up and running again. ThePlanet did an outstanding job recovering from this disaster. I am happy that this recovery went so smoothly.
The fact of the matter is… "Disasters Happen". That’s why we have contingency plans. It just gives me a happy warm fuzzy feeling when our disaster recovery plans work.
Update - Tuesday June 3rd at 1:00 PM EST
ThePlanet’s Woes Continue…
When it rains… it pours! The temporary fix ThePlanet put into place yesterday failed today. They installed a gigawatt generator to power the A/C and the 3,000 servers located on the first floor of the datacenter.
This huge generator broke down today. So the datacenter is out of power again. That means the Email Marketing System is offline now.
It seems to me that we need to move our server out of this ailing datacenter, so we’ve put that in motion. We’ve hired someone to physically pick up our server and drive it to another datacenter just 3 miles away.
The Email Marketing System will be back online once the server is hooked up and the necessary programming changes are made. We are working on the programming now so that won’t hold us up once the server is ready.
The ETA on the Email Marketing System being back online is sometime tonight. I wish I could be more specific but there are too many variables at play. Please come back to this blog post as I will keep it updated on our progress.
Update - Wednesday June 4th at 9:00 AM EST
Our server has been moved to the new datacenter and it’s in the queue to get connected and powered up. Unfortunately there are 500 servers that joined us in the move. The datacenter cannot tell us where we are in the list, but they are working as hard as they can to get our server online. We expect it will be up sometime today.
The good news is that once the server is connected we will not have any more datacenter problems. If we had kept the server in the wounded datacenter, who knows what other issues would have arisen.
Update - Thursday June 5th at 5:00 PM EST
ThePlanet datacenter is having more problems. They promised to move our server to a new datacenter on Tuesday. We called them Tuesday night and asked when our server would be up and they said tomorrow morning.
When we called on Wednesday morning they said they weren’t sure but it was likely that our server was sitting in the new datacenter. They also said that they have a lot of servers to install and we would have to wait.
This morning when we called we got a totally different answer. They said that they haven’t moved our server because they have the power on now and they are in the process of booting it up.
This really pissed me off because we spent hours modifying our programs to adjust for the server being located at another IP address.
So we waited for our server to come up. But it never did. We called them every hour today asking them when the server would be up. And each time they told us it should be soon.
My frustration level is at its max. And I’m not the only one. There are thousands of other businesses that have been severely affected by this outage. I’m sure the people at ThePlanet are doing the best they can, but I wish they would just tell us the truth.
I hope this is all fixed in the morning…
Update - Thursday June 13th at 7:00 PM EST
Well the Email Marketing System is up and running again, and the June Newsletter was only a day late. It seems to have gone off pretty much without a hitch.
I am terribly sorry for the complications with this month’s send.