A Site With Over 120,000 Hits Per Hour? Here's How We Managed the Demand Using a Custom 404 Pipeline.

Going live with a new Sitecore implementation and getting 40 times the load you were expecting can be a bit of a shock! When we saw 120,000 hits per HOUR instead of the 3,000 we had expected, we needed to make adjustments beyond scaling, and fast.

There's a lot that can be done when you switch to a new platform that also includes a site redesign. Mapping old URLs and 301 redirecting them to their new locations is one of the most effective things you can do, but this site was also seeing a tremendous number of requests for seemingly random, completely unknown pages.
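As a minimal illustration of the redirect mapping mentioned above, here is a sketch in Python (the site itself runs on Sitecore/.NET, so this is not the production code). The map contents and the names `LEGACY_REDIRECTS`, `normalize`, and `legacy_redirect` are hypothetical, and normalizing case, query strings, and trailing slashes before the lookup is an assumption about how old URLs would be matched.

```python
from typing import Optional, Tuple

# Hypothetical legacy-to-new URL map; in practice it would be built from
# the old site's URL inventory during the migration.
LEGACY_REDIRECTS = {
    "/old-blog/welcome": "/en/blog/welcome",
    "/fr/ancien-blogue/bienvenue": "/fr/blogue/bienvenue",
}


def normalize(path: str) -> str:
    """Lowercase the path and drop the query string and trailing slash."""
    path = path.split("?", 1)[0].lower()
    return path.rstrip("/") or "/"


def legacy_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_location) for a mapped legacy URL, else None."""
    target = LEGACY_REDIRECTS.get(normalize(path))
    return (301, target) if target else None
```

Anything that falls through the map (returns `None`) still ends up in 404 handling, which is where this site's random, unknown URLs landed.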

This site also had a unique 404 page system that couldn't use standard caching, because it needed a query on every request. When a page was missing, the client wanted Sitecore to check whether it existed in the other language (English or French). If it did, the site would serve a custom 404 page saying "Sorry, but hey, at least there's a version in language X." The client also wanted dedicated 404 pages inside the navigation tree, so if you didn't find a blog article, you got a specific blog 404 page. Combine that with a multisite, multilingual instance that has more than one 404 page per language, and we end up with a total of 12 different 404 pages being served. Fun.
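The resolution rules described above can be sketched as follows, with loud assumptions: the content inventory (`EXISTING_ITEMS`), the section 404 map (`NOT_FOUND_PAGES`), and every function name are hypothetical, and the sketch assumes both language versions of an item share the same item path, as Sitecore language versions do. It walks up the requested path to the nearest section that owns a 404 page, then reports whether the other language has the item.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical content inventory: which item paths exist in which language.
EXISTING_ITEMS: Dict[str, Set[str]] = {
    "en": {"/blog/launch-day", "/about"},
    "fr": {"/blog/launch-day", "/blog/fr-only-post"},
}

# Hypothetical section-level 404 pages keyed by (section path, language).
# "/" is the site-wide fallback.
NOT_FOUND_PAGES: Dict[Tuple[str, str], str] = {
    ("/", "en"): "404-site-en",
    ("/", "fr"): "404-site-fr",
    ("/blog", "en"): "404-blog-en",
    ("/blog", "fr"): "404-blog-fr",
}


def other_language(lang: str) -> str:
    return "fr" if lang == "en" else "en"


def nearest_section_404(path: str, lang: str) -> str:
    """Walk up the path until a section with its own 404 page is found."""
    segments = path.strip("/").split("/")
    while segments:
        section = "/" + "/".join(segments)
        page = NOT_FOUND_PAGES.get((section, lang))
        if page:
            return page
        segments.pop()  # move up one level in the tree
    return NOT_FOUND_PAGES[("/", lang)]


def resolve_404(path: str, lang: str) -> Tuple[str, Optional[str]]:
    """Return (404 page ID, alternate language or None) for a missing item."""
    alt = other_language(lang)
    alt_exists = path in EXISTING_ITEMS.get(alt, set())
    return nearest_section_404(path, lang), (alt if alt_exists else None)
```

Every call does real lookup work against the content tree, which is the per-request cost that made this 404 system hard to cache conventionally.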

Before we go any further, it's important to note that I did advise the client that this was an inefficient practice, and because of its unusual characteristics, this 404 feature was load tested. Everything looked great during the tests, with a time to first byte (TTFB) of less than a tenth of a second. Launch day had other plans for us, and once we saw the true traffic we were receiving, something had to be done to save the web database.

Taking the 404 Process Out of Sitecore

What's faster than a Sitecore query and a page load? Not doing them at all. To get past this bottleneck, I created a static list mapping each requested URL to the ID of the 404 page it needed. Each time a 404 request comes in, the list is checked first; if the URL isn't there, the Sitecore query runs and its result is added to the list. Any subsequent request for the same URL skips the query, since the list already holds the page ID. A second list stores the rendered 404 pages themselves. Because we don't want to redirect the client, the 404 content has to be served at the requested URL, and caching the rendered pages means WebUtil.ExecuteWebPage isn't needed to fetch that content on every request.
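Here is a minimal sketch of the two-list approach in Python, assuming `resolve_page_id` stands in for the Sitecore query and `render_page` for rendering the 404 item; both are hypothetical callables, and `NotFoundCache` is an illustrative name rather than the production class. One deliberate departure from the description: the URL-to-page-ID map is bounded (`max_urls`, a hypothetical parameter that evicts the least recently used entry), because with bots requesting random URLs an unbounded list grows with every distinct URL. The rendered-page map stays unbounded, since on this site it only ever holds the 12 distinct 404 pages.

```python
import threading
from collections import OrderedDict
from typing import Callable, Dict


class NotFoundCache:
    """Two-level cache for 404 responses.

    Level 1: requested URL -> ID of the 404 page it should get (LRU-bounded).
    Level 2: 404 page ID -> rendered HTML for that page.
    """

    def __init__(
        self,
        resolve_page_id: Callable[[str], str],
        render_page: Callable[[str], str],
        max_urls: int = 50_000,
    ) -> None:
        self._resolve_page_id = resolve_page_id  # stand-in for the Sitecore query
        self._render_page = render_page          # stand-in for rendering the 404 item
        self._max_urls = max_urls
        self._url_to_page: "OrderedDict[str, str]" = OrderedDict()
        self._rendered: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, url: str) -> str:
        with self._lock:
            page_id = self._url_to_page.get(url)
            if page_id is not None:
                self._url_to_page.move_to_end(url)  # mark as recently used
        if page_id is None:
            # Miss: run the expensive lookup outside the lock, then remember it.
            page_id = self._resolve_page_id(url)
            with self._lock:
                self._url_to_page[url] = page_id
                if len(self._url_to_page) > self._max_urls:
                    self._url_to_page.popitem(last=False)  # evict least recently used
        with self._lock:
            html = self._rendered.get(page_id)
        if html is None:
            # First request for this 404 page: render once and keep the HTML.
            html = self._render_page(page_id)
            with self._lock:
                html = self._rendered.setdefault(page_id, html)
        return html
```

Two threads that miss on the same URL at the same moment may both run the query; the sketch accepts that benign race rather than holding the lock during the slow lookup.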

Seeing Performance Improvement Under Load

So, what does this look like? The traffic arrived in spikes, from random locations at random times. This was a fix I got out in a hurry, but as you can see here, response times improved dramatically during the traffic spikes, because the static list delivered the stored page instead of Sitecore rendering it.

Everyone was able to sleep at night with this buffer in place. Bring it on, you pesky bots! This odd behaviour dropped off after a couple of months, and to this day the client has no idea where the unrecognized URLs came from.

If you have any similar stories or questions about traffic spike mitigation let me know!