As a software developer and technology manager, I’ve had a lot of experience with variable website traffic. I’ve worked on business-to-consumer information websites, on payment gateways, and even on a few ecommerce sites.
In this post, you’ll learn what traffic spikes are and pick up some practical ideas for dealing with them.
What is a traffic spike?
There are a lot of things that can come up as a surprise when traffic dramatically spikes. A traffic spike is any increase in traffic that occurs well above the average for that site. In the 2000s, we used to talk about getting slashdotted. 🙂
The more a website or API grows in traffic or functionality, the more it will be expected to scale. Designing for scale without dramatically increasing costs or complexity can be challenging. The most important things to watch on a scaling website or API are:
- Database load
- Coping with a large volume of concurrent users
- Storing data efficiently
- Dealing with multiple HTTP requests
- Implementing caching layers
- Media assets
The first thing to break is almost always the one thing that isn’t redundant. In my experience, this is usually the database, because it’s really difficult to design one well enough for scale.
When I first joined Caddle, it was immediately obvious that they were having scaling problems due to the way the database was set up and the way the servers were being used.
Hardware Should Serve Only One Purpose
To keep costs low when starting a company, it’s really common to have one server perform many functions, e.g. serving the website (web server), running the database (database server), and storing static files (file server).
For a while, this is an okay approach to reduce general IT spending, but as your website or app gains more and more traffic it becomes the first thing to break.
Running and maintaining a database server can be really challenging, so it typically makes sense to use a cloud provider’s managed platform instead of setting up your own instance. Amazon Relational Database Service (RDS) is pretty low cost and scales pretty well, while freeing the team from boring and mundane tasks like patching, backups, and setup.
You Will Lose Traffic
In the case of really large spikes, you are likely to lose some traffic, and that has to be okay. If autoscaling is being used, there should only be a few minutes where some traffic loss occurs, but it’s still something that you need to expect.
Rarely is advice about using other services, frameworks or technologies correct or helpful. I genuinely believe that using serverless technologies like AWS Lambda will help but there is a large potential cost from that.
I’ve blogged quite a bit about serverless technologies; one of my favourite posts is How to Adapt to Serverless Technologies.
Autoscaling Works Well
Autoscaling works really well, but it’s not generally immediate. The best way to do autoscaling is to build a system that is unique to your business to predict your traffic, and then use AWS’s autoscaling as your backup for when you get your prediction wrong. If you’re using AWS there can be a lot of surprises that you didn’t know about.
For example, if the target CPU utilization is 80% and you suddenly peak to 90% or 95% you will only scale out just enough to get back to 80% and this will likely take a couple of minutes to achieve.
If your traffic were suddenly to peak by 200% of the norm, it would keep adjusting every few minutes to get back down to an average CPU utilization of 80%.
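The arithmetic behind that behaviour can be sketched in a few lines of Python. This is a simplified proportional model, not AWS's actual algorithm: it assumes load spreads evenly across instances, that a spike of "200% of the norm" means three times the usual load, and that an instance never reports more than 100% CPU. The function names `scale_out_size` and `simulate_spike` are made up for illustration.

```python
import math
from fractions import Fraction


def scale_out_size(instances: int, observed_cpu, target_cpu) -> int:
    """Fleet size that brings average CPU back to the target (even load assumed)."""
    # Fractions keep the arithmetic exact, so ceil() never trips on float noise.
    return math.ceil(Fraction(instances) * Fraction(observed_cpu) / Fraction(target_cpu))


def simulate_spike(instances: int, load_multiplier: int, target_cpu: int = 80) -> list[int]:
    """Fleet size after each scaling step until CPU is back at or below target."""
    # Total work in "instance-percent" units; the fleet starts exactly at target.
    total_work = instances * target_cpu * load_multiplier
    sizes = [instances]
    while True:
        # CPU saturates at 100%, so a huge spike looks the same to the
        # scaler as a moderate one and it can only grow in small steps.
        observed = min(Fraction(100), Fraction(total_work, instances))
        if observed <= target_cpu:
            return sizes
        instances = scale_out_size(instances, observed, target_cpu)
        sizes.append(instances)


print(scale_out_size(10, 95, 80))  # 12
print(simulate_spike(10, 3))       # [10, 13, 17, 22, 28, 30]
```

In this model, a fleet of 10 that suddenly sees triple the load needs five separate scaling steps to reach the 30 instances it actually requires. If each step takes a couple of minutes, that is where the lost traffic comes from.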
And finally, one thing I’ve learned is that you should be really cautious about relying on CloudWatch alarms. With basic monitoring, EC2 metrics only arrive every 5 minutes (detailed monitoring shortens that to 1 minute), which means an alarm can fire far too late.
Stop Serving Resources That You Don’t Have to
Caching and public content delivery networks (CDNs) are a godsend when dealing with large traffic spikes. By utilizing caching and content delivery networks, you can significantly reduce the amount of traffic that your systems need to handle.
Caching is the act of temporarily storing files for faster usage. There are really two types of caching that I’m going to talk about: browser caching and caching services.
Setting up a cache can be really difficult if all of your site is dynamic. Thankfully, most sites aren’t really all that dynamic.
You have to be really careful when enabling browser caching, as it’s really easy to break your website for some users after you make updates: a browser that still holds a cached copy won’t check whether the resource has changed until that copy expires.
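One common way around this problem is to put a hash of the file's contents into its filename, so an updated file gets a new URL and a stale cached copy is simply never requested again. Here is a minimal Python sketch of that idea. The function names and the header policy are assumptions for illustration, not something tied to a particular framework or build tool.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes) -> str:
    """Insert a short content hash before the extension: css/site.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value (this policy is an assumption)."""
    if path.endswith(".html"):
        # HTML points at the fingerprinted assets, so browsers should
        # revalidate it on every request.
        return "no-cache"
    # A fingerprinted asset never changes under the same URL, so it can
    # be cached for a year.
    return "public, max-age=31536000, immutable"


old = fingerprinted_name("css/site.css", b"body { color: black; }")
new = fingerprinted_name("css/site.css", b"body { color: navy; }")
print(old != new)  # True: the edited stylesheet gets a different URL
```

The trade-off is that the pages referencing those assets have to be regenerated on every deploy so they point at the new filenames.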
Cloudflare is an excellent option for caching. It should be possible to use the free tier and significantly reduce the amount of traffic that reaches your servers.
The most beautiful part of Cloudflare is the caching rules. In Cloudflare, you’re able to set caching rules which can potentially eliminate the need to even connect with your server.
Content Delivery Networks
CDNs out of the box are built for high availability and high performance because they are a system of distributed servers that deliver pages, files, and other web content to the user/machine based on where the closest machine is. Generally, you can say that the closer the machines are physically to the user the faster the user will get the content.
That said, the advantage is actually that a CDN can provide a lot of protection when large surges of traffic happen because they can effectively distribute the load.
CDNjs is a fantastic resource for using externally hosted files and resources.
Remove the Crap!
As sites and APIs age, they tend to keep accumulating more and more code and plugins, because nobody has the time to remove a plugin or third-party tracking code. Generally, companies try a new tracking company and eventually turn it off but don’t remove the code or plugin.
All of these third-party libraries or plugins tend to slow down the site or cause more and more HTTP traffic. It’s really important to audit sites for code that’s not called, broken scripts, or plugins that aren’t being used, because these will cause slowdowns and maybe even increased costs.
There’s almost never a need to ship comments out to users every time their browser asks for content. As part of your build process, you should strip out comments.
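As a rough illustration of such a build step, here is a small Python sketch that strips HTML comments. It is deliberately naive and is an assumption about how a build script might look: it would also remove comment-like text inside `<script>` or `<pre>` blocks, so a real pipeline is usually better served by a proper minifier.

```python
import re

# Matches <!-- ... --> across lines. Conditional comments such as
# <!--[if IE]> ... <![endif]--> are left alone because old browsers
# act on them.
_HTML_COMMENT = re.compile(r"<!--(?!\[if\b).*?-->", re.DOTALL)


def strip_html_comments(html: str) -> str:
    """Return the markup with ordinary HTML comments removed."""
    return _HTML_COMMENT.sub("", html)


page = "<p>Hello</p><!-- TODO: remove old tracker -->\n<p>World</p>"
print(strip_html_comments(page))
```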
Reduce Database Usage
As mentioned previously, the first place to generally have issues during a traffic spike is the database so it’s usually the first place I would look at making changes.
It’s usually best to start with really small changes by looking at which queries are executed most frequently. In a lot of systems, you should probably look at whatever runs immediately when a page loads. Often these are permission checks, configuration objects, and even menu options.
A lot of this can be easily queried once and then cached for later. Configuration options are unlikely to change much, so they can be stored in the code or somewhere else. When working on apps, I like to use Firebase’s Remote Config.
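The "query once, cache for later" idea can be sketched as a tiny time-based cache in Python. The class name `TTLCache` and the `load_site_config` loader are hypothetical stand-ins. In a real system the loader would run the actual database query, and the right expiry time depends on how stale the data is allowed to be.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Keep the result of an expensive loader for a fixed number of seconds."""

    def __init__(self, loader: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value: Any = None
        self._expires_at = float("-inf")  # forces a load on first use

    def get(self) -> Any:
        now = self._clock()
        if now >= self._expires_at:
            # Cache is empty or stale: hit the database once and remember it.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


def load_site_config() -> dict:
    """Hypothetical stand-in for a configuration query."""
    print("querying database...")
    return {"theme": "dark", "items_per_page": 20}


config = TTLCache(load_site_config, ttl_seconds=300)
config.get()  # runs the loader
config.get()  # served from memory, no query
```

With a five-minute expiry, a page that loads a thousand times a minute goes from a thousand configuration queries a minute to one every five minutes per process.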
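For those that don’t know, many databases keep frequently used data in memory, which can dramatically improve performance for systems that execute the same queries constantly. In older versions of MySQL, the query cache was one of the easy ways of reducing database load in really read-heavy environments. It was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, though, so on current versions that caching has to happen in the application or in a separate caching service.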
Don’t Call the Database
In a lot of cases, WordPress and other content management systems call the database every time a page is requested because developers didn’t set up any sort of caching.
It’s usually pretty easy to set up a plugin that builds the contents of the pages ahead of time and doesn’t call the database more than a few times a day. W3 Total Cache and WP Cache are pretty good options for WordPress.
Perform Regular Monitoring
Effectively monitoring your website or application can only be done after an audit of what the system looks like. You will need to dig in and figure out what all the parts of the infrastructure are so you can streamline them and monitor them more easily.
Every website and application has a lot of complexity in the infrastructure partly because we are all using DNS, third parties, cloud hosting, and generally frameworks or libraries that we haven’t written.
Getting massive amounts of traffic or a large traffic spike can be a tremendously stressful time but also a great learning experience. It’s perfectly normal to lose some traffic while turning things up, and you shouldn’t worry too much about it.