Outage caused by Amazon Web Services problems

Status
Not open for further replies.

blantonl

Founder and CEO
Staff member
Super Moderator
Joined
Dec 9, 2000
Messages
11,115
Location
San Antonio, Whitefish, New Orleans
Folks,

As you are probably aware, we had an outage today that lasted almost 5 hours, due to our primary hosting provider (Amazon Web Services) experiencing significant network connectivity problems with their EBS (Elastic Block Store), which we use to store our master database server data.

Once the problems crept up at 11:30 AM CST, we brought the site offline to prevent other major problems on our back-end infrastructure. We finally made the decision to promote one of our backup slave servers to the new master at about 4:30 PM CST. This means that the site is currently running in an emergency recovery configuration, so expect to see maintenance during the day tomorrow while we get back to a more robust database implementation.

In the meantime, you can check the Amazon Web Services status here:

AWS Service Health Dashboard - Oct 22, 2012

Also note that this issue was not unique to RadioReference.com. Many of the largest Web properties were impacted.

Update: Amazon Web Services Down In North Virginia

Thanks for all your patience during this outage. It was one of the largest we've had in the history of the site.
 

fyrrsq4

Member
Premium Subscriber
Joined
Dec 19, 2002
Messages
7
Location
RI
Thanks for getting the backup online and good luck with the work tomorrow!
 

SCPD

QRT
Joined
Feb 24, 2001
Messages
0
Location
Virginia
Welcome back!

I was reading a little about Amazon hosting while RR was down. RR goes down on occasion, but I can't remember it ever being down for 5+ hours (with the exception of announced maintenance).

Because of what radioreference.com is to the scanning and communications world, have you considered a secondary site to keep operations flowing, even if the secondary only hosts the reference database and not the other perks such as Forums and Broadcastify? Because radioreference.com now comes pre-loaded, any downtime could have a negative impact on scanning/scanner sales worldwide (though that's highly unlikely unless the servers were down for days).

Anyway, thank you for keeping us informed, and glad you're back! I'm sure you guys have lotsa work to do (and some rag chewing with the Amazon reps).

Cheers!
 

blantonl

Founder and CEO
Staff member
Super Moderator
Joined
Dec 9, 2000
Messages
11,115
Location
San Antonio, Whitefish, New Orleans
Hehehe...

RadioReference is a serious infrastructure operation. We have anywhere from 16-20 servers going at one time with Amazon Web Services, 1 server with ServerBeach, and 8 servers with 1000TB.com. I'm not sure a lot of people realize what it takes to run this platform from end to end. I had almost 10 people email me today asking if I wanted space on their GoDaddy hosting platform. I appreciate the thought, but.... :)

We are backed up against disaster, but we can certainly do a better job of mitigating issues when something like this happens.

But let's not forget - RadioReference sees very little downtime - it is a rare occurrence for us to have problems. We've designed the back end very well to be resilient, but even the largest Web businesses in the world get caught by unexpected problems. We are no exception.

That doesn't excuse it for us, because ultimately we are responsible for what happened - no one else.
 

Laxie

Member
Joined
Feb 25, 2012
Messages
2
Location
St. Paul, MN
Thanks for the update!

I couldn't figure out why my cell phone could get every other site but yours. I too was getting shaky, but then I brought the site up on my PC and saw the notice about RR being down.

Thank you for posting the full info about what happened. It's great to see a site that gives us more than "our servers were down."

Keep up the great work and have an adult beverage tonight and relax. :)
 

mjbjr

Member
Joined
Dec 11, 2009
Messages
657
Location
Macon,Ga USA

Didn't you make a post once about the ins and outs of RR? I can't seem to find it.
 

SCPD

QRT
Joined
Feb 24, 2001
Messages
0
Location
Virginia
Saw those on FB

I'm not sure a lot of people realize what it takes to run this platform from end to end. I had almost 10 people email me today asking if I wanted space on their GoDaddy hosting platform. I appreciate the thought, but.... :)

I doubt most people have any idea how complex the radioreference.com backbone really is. I didn't till I got into this, and it blows me away how far you can take it if you have the working capital. I'm a happy Go Daddy subscriber but am still looking into Amazon hosting. Tbh, Amazon even has less downtime than several of the other "Big Boys"; it's one of those occasional 'stuff happens' deals.

These little blips in the system will probably have you and the crew looking at new contingencies to implement down the road....
 

gr8rcall

Member
Premium Subscriber
Joined
Jun 17, 2012
Messages
727
Location
Alamance County, NC
I got kinda scared yesterday (almost had a heart attack) when I saw:
"RadioReference.com is offline due to Amazon EC2 hosting problems"

Glad it's working again!!!

I kind of felt like "troymail's" AVATAR!
 

gewecke

Completely Banned for the Greater Good
Banned
Joined
Jan 29, 2006
Messages
7,452
Location
Illinois
Thanks for the update!
...Had serious withdrawal symptoms. :(

73,
n9zas
 

One13Truck

Member
Feed Provider
Joined
Jul 2, 2004
Messages
916
Location
My home 20 eating pizza.
I was browsing the forums when it went down. At first I thought the internet went down when I couldn't get anything to load. But working in the communications industry, I know sometimes crap happens. Not much can be done about it. Glad to see it seems to be working better again today, though.
 

luke-1

Member
Joined
Jan 10, 2003
Messages
612
Location
Parker, CO
I just received an email stating my feed was down. I checked it and it is up and running fine.

Anything to do with this?
 

scannerfreak

Moderator
Database Admin
Joined
Jul 3, 2003
Messages
5,193
Location
Indiana
From yesterday's initial post:

This means that the site is currently running in an emergency recovery configuration - therefore expect to see maintenance during the day tomorrow so we can get back to a more robust database implementation.

So the site was taken down today for a bit for said maintenance.
 

One13Truck

Member
Feed Provider
Joined
Jul 2, 2004
Messages
916
Location
My home 20 eating pizza.
Yep. I got an email as well. But I actually DID take mine down for a bit to do some upgrades to some stuff on the computer. Can I still blame Amazon for it though? ;)

I'm joking. But wow, I can't believe how seriously some people took the outage when I was checking on the FB site yesterday. You'd think it was down for weeks and happened all the time. With the size of the site and the traffic it gets, I'm surprised it stays up as much as it does. Just another example of the fine work done by everybody here to keep things running smoothly. The hamster in the wheel in the back room must be one heck of a rodent to keep the place going!
 