Here are the stats from this incident. This is relevant since RadioReference and Broadcastify still share some infrastructure, so major incidents will impact both Web properties. (Note: this will change when we completely migrate Broadcastify to its own infrastructure.)
First - we stayed up throughout the incident and saw record traffic today, which is a big win for Broadcastify and RadioReference. Previous major incidents flushed out bugs and issues that, when left alone, tend to magnify greatly when we receive record traffic, and they typically brought down the site for one embarrassing reason or another.
At one point we were serving over 4,000 requests a second to our Web infrastructure. Things got slow for a while at the beginning, but we provisioned 4 extra Web servers and reconfigured the proxy, and everything went along nicely. We also disabled the forums for a while to reduce load while we scaled up the Web servers.
The top feed was:
Broadcastify - Boston Police, Fire and EMS
This feed reached 60,423 listeners at 2013-04-15 16:26:15 CST
Next highest feed was:
Broadcastify - Boston Fire Department
This feed reached 5,379 listeners at 2013-04-15 15:32:39
As far as total listeners:
At 2013-04-15 16:30:01 we reached 88,904 listeners, which is a new record.
Hope this answers the questions that many of you will certainly have...
With that said: we had some lessons learned that we will take forward - not everything was 100% smooth.
1) Our audio relay servers, which serve audio directly to clients, had settings that caused clients to be rejected once each audio server reached about 15,000 listeners. This caused intermittent client rejections. We immediately rolled out a configuration change to fix that, and our listener counts went from 60K to 80K within 15 minutes. Lesson learned.
2) The forums can cause serious performance issues for the overall site, since the entire site shares the same database infrastructure. This is due to a number of reasons, but suffice it to say that until we move the forums to their own database, if we experience a large traffic incident we'll have to take the forums offline for a 15-30 minute period to help out the overall infrastructure.
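To put the relay limit from point 1 in rough numbers, here is a minimal sketch of how a per-server listener cap turns into rejected clients. It is only an illustration: the function name, the server count of 4, the demand of 80,000, and the raised cap of 25,000 are all assumptions made up for the example, not our actual configuration. Only the figure of about 15,000 listeners per server comes from the incident.

```python
def admitted_listeners(demand, servers, per_server_cap):
    """Return (admitted, rejected) for a pool of relay servers.

    Assumes listeners are spread evenly across servers and that each
    server refuses new clients once it reaches per_server_cap.
    """
    capacity = servers * per_server_cap
    admitted = min(demand, capacity)
    return admitted, demand - admitted


if __name__ == "__main__":
    # Hypothetical pool of 4 relays with the ~15,000 cap from the incident.
    print(admitted_listeners(80_000, 4, 15_000))
    # Same hypothetical pool after raising the cap (25,000 is illustrative).
    print(admitted_listeners(80_000, 4, 25_000))
```

The point is simply that total capacity is the server count times the per-server cap, so once demand passes that ceiling the only fixes are raising the cap or adding servers.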
Finally, we're learning a lot about "incident" traffic. When an incident happens, we'll see a slow but steady step up in traffic, then an absolute deluge of traffic driven by traditional and social media, then a long wind-down. We're beginning to handle that middle deluge better as we get more experience profiling traffic patterns and behaviors. Today, we used the lessons learned from previous incidents to keep everything up and running.
Thanks,