Cloud Design Principles

Cloud computing is, at its core, using servers over the internet, whether for databases, storage, applications, or something else. The biggest cloud providers are AWS, Azure, and Google Cloud. You can read more about cloud computing in my post “What is Cloud Computing?“.

For businesses, the biggest draw to cloud computing is the potential to save money. Cloud computing’s inherent strengths are elasticity, the ability to automate infrastructure management, enhanced reliability, and reduced cost.

Good cloud architecture is reliable, high performing, cost efficient, and most importantly secure.

Embracing Elasticity

A well designed cloud system should be able to grow and contract as the number of users grows or shrinks, with minimal drop in performance. Near-linear scalability should be achievable when additional resources are added automatically, for example through AWS Auto Scaling behind a load balancer.
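As a rough illustration, here is a minimal sketch using Python and boto3 that attaches a target-tracking scaling policy to an Auto Scaling group. The group name, policy name, and target CPU value are placeholders, and the group is assumed to already exist behind a load balancer.

import boto3

# Assumes an Auto Scaling group named "web-asg" already exists behind a
# load balancer; the names below are placeholders.
autoscaling = boto3.client("autoscaling")

# Keep average CPU around 50% by adding or removing instances
# automatically as traffic grows and shrinks.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="keep-cpu-at-50-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)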

There are two primary ways to scale a system: vertical scaling and horizontal scaling. Either way, scaling can introduce a lot of overhead and additional complexity.

Horizontal Scaling

Horizontal scaling means that the system scales by adding additional machines with the software installed on them. A new server is added alongside the existing ones, ideally with the same capacity as the current system. The problem is that unless all the machines are bought at the same time, they are unlikely to be exactly identical.

Horizontal scaling is also really difficult to build for: systems basically have to be stateless, because you probably won’t be able to guarantee that the same machine handles every request from a given user.
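One common way to keep web servers stateless is to push session data into a shared store so that any machine can handle any request. Here is a minimal sketch using the redis Python client; the hostname, key format, and TTL are assumptions for illustration.

import json
import redis

# Shared session store reachable by every web server; the hostname is a placeholder.
store = redis.Redis(host="sessions.example.internal", port=6379)

def save_session(session_id, data, ttl_seconds=1800):
    # Any server can write the session; it lives in Redis, not in local memory.
    store.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id):
    # Any server can read it back, so requests can land on any machine.
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else None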

Vertical Scaling

Vertical scaling means that you scale up the system by moving it to an increasingly better server. Each time, the new server will likely have a faster CPU or more memory than the machine before it.

Vertical scaling is a lot easier from a development perspective, but it hits limits quickly because there are only so many CPUs, cores, memory modules, and hard disks that can be added to a single machine. Most software also isn’t designed to take advantage of multiple cores or CPUs, so you are unlikely to actually use most of the new capacity.

Servers Should be Treated As Disposable

I love that cloud computing allows us to build, deploy, and delete servers quickly, because this lets us treat servers as disposable rather than as fixed resources. All servers should be stateless and quick to replace. Configuration, code deployment, and installation should all be automated so that standing up a new environment happens quickly and without manual intervention.

Servers shouldn’t be a dumping ground; they should be able to be scaled up and down quickly. If your team rebuilds and deploys this way constantly, the process stays repeatable and well tested, which dramatically reduces the risk of human error.
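For example, a replacement server can be launched with all of its configuration applied automatically at boot. A hedged sketch with boto3; the AMI ID, instance type, and bootstrap commands below are placeholders.

import boto3

ec2 = boto3.client("ec2")

# Everything the server needs is installed by this user-data script at boot,
# so the instance can be thrown away and recreated at any time.
bootstrap = """#!/bin/bash
yum install -y httpd
systemctl enable --now httpd
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=bootstrap,
)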

Automated Infrastructure

The cloud has truly enabled infrastructure as code. We can automate the entire process of deploying and maintaining software, which dramatically improves system uptime by reducing the risk of human error, and it allows a system to be incredibly scalable.

Gone are the days of waiting weeks for new blade servers to arrive from Dell or some other service provider!

On Amazon Web Services (AWS) there are a number of services that can be completely automated and used to test and manage systems.

For example, CloudWatch Alarms and CloudWatch Events allow us to do some pretty amazing automations without staff having to do anything. An alarm can send a message to a notification service, which can then kick off some fairly sophisticated processing when certain conditions occur.
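As a small sketch of that idea, the snippet below creates a CloudWatch alarm that publishes to an SNS topic when CPU on an instance stays high; the alarm name, instance ID, topic ARN, and thresholds are all placeholders.

import boto3

cloudwatch = boto3.client("cloudwatch")

# When average CPU exceeds 80% for two 5-minute periods, notify an SNS topic.
# Whatever subscribes to the topic (email, a Lambda function, etc.) handles the rest.
cloudwatch.put_metric_alarm(
    AlarmName="web-server-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)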

Caching

Caching is the process of storing copies of files in a high-speed data storage layer so that specific data can be accessed more quickly. Caching is a great way to make an application feel faster and save some additional cost.

In web-based applications there are four major caching types: Web Caching (Browser or Proxy), Data Caching, Output Caching, and Distributed Caching.

Web Caching

Web caching works by caching HTTP responses for certain documents like images, JavaScript, or CSS. These caches usually work off of HTTP headers and are a great way to dramatically reduce server load when a user requests a document a second time.

Most web browsers support caching images, JavaScript, and CSS out of the box with very little setup required on the server. In Apache we can do this in an .htaccess file which tells clients to keep all files cached for a day.

To set this up in a web server we end up doing something like this:

<IfModule mod_expires.c>
        # Tell browsers to cache every response for one day by default
        ExpiresActive On
        ExpiresDefault "access plus 1 day"
</IfModule>

If your company is using Apache and you’re not sure how to setup caching I’ve created a blog post called “How to Setup Caching in Apache.”

Data Caching

Data caching is a technique of storing data in memory or on disk so that fetching it from the database or recalculating it can be avoided. It’s a good technique for reducing database load.

A really good use case for this is data that only changes once or twice a day and is only available from another system. A cache makes perfect sense there: the data won’t change very often, and it will make the system feel a lot faster.

Memcached is a pretty good option on a single machine, but to spread the load out and avoid adding state to individual servers I prefer AWS’ ElastiCache. Azure Cache for Redis works much the same way as ElastiCache.
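A common pattern here is cache-aside: check the cache first, and only hit the database on a miss. Below is a rough sketch using a Redis client (which is essentially what ElastiCache or Azure Cache for Redis gives you); the hostname, key, TTL, and fetch_rates_from_database stub are all stand-ins for illustration.

import json
import redis

cache = redis.Redis(host="cache.example.internal", port=6379)

def fetch_rates_from_database():
    # Stand-in for the expensive database or external-system call.
    return {"USD": 1.0, "CAD": 1.36}

def get_exchange_rates():
    # Serve from the cache when possible; the data only changes once or twice a day.
    cached = cache.get("exchange-rates")
    if cached:
        return json.loads(cached)

    rates = fetch_rates_from_database()
    cache.setex("exchange-rates", 12 * 3600, json.dumps(rates))  # keep for 12 hours
    return rates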

Output Caching

Output caching stores the final copy of HTML pages, or parts of pages, that will be sent to the client; the idea is that sending a cached copy saves the time and load of regenerating the page. In PHP and ASP.NET this becomes a really important concept on sites that get a lot of traffic. Many of the WordPress caching plugins are actually doing exactly this.
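The examples above are PHP and ASP.NET, but the idea translates to any stack: keep the rendered HTML around for a short while and serve it instead of rebuilding the page on every request. A generic Python sketch, with render_page standing in for the expensive page-building work:

import time

_page_cache = {}  # url -> (rendered_html, expires_at)

def render_page(url):
    # Stand-in for the expensive template and database work that builds the page.
    return f"<html><body>Content for {url}</body></html>"

def get_page(url, ttl_seconds=60):
    html, expires_at = _page_cache.get(url, (None, 0))
    if time.time() < expires_at:
        return html                      # serve the cached copy
    html = render_page(url)              # regenerate and cache for next time
    _page_cache[url] = (html, time.time() + ttl_seconds)
    return html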

Security Should Be Baked In

It feels like every week I get yet another email about a security breach that happened because things weren’t set up correctly in a particular cloud provider.

Most cloud providers work under a shared responsibility model, which means you are responsible for securing your workloads and the cloud provider is responsible for securing the underlying infrastructure.

The key to staying safe on any cloud platform is to test and audit frequently. The testing and auditing should be automated through technologies like CloudFormation or Terraform. AWS also has tools such as Amazon Inspector and AWS Trusted Advisor that can monitor for vulnerabilities.
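Audits like this can be scripted as well. As one small, hedged example, the snippet below lists S3 buckets and flags any that don’t have a public access block configured; this is just one of many checks a real audit would automate.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Flag buckets with no public access block; a very small slice of a real audit.
for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        s3.get_public_access_block(Bucket=name)
    except ClientError as err:
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            print(f"WARNING: {name} has no public access block configured")
        else:
            raise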

Single Points of Failure Should Be Reduced

A system is highly available when it can withstand the failure of multiple individual components (servers, network, hard disks, etc.). A well designed cloud system has automated recovery set up at every layer of the architecture.

Redundancy can be introduced by setting up multiple resources for the same task and running them in active mode (load balanced) or in standby mode (waiting for a failover to occur).

It’s a given that a failure is most likely to occur at the worst time, so setting up automation to recover automatically is really important.

Using multiple Availability Zones spreads a system across multiple data centers, reducing the impact of a failure or an event in one area, e.g. a natural disaster.
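For example, a managed database can be created with a synchronous standby in a second Availability Zone so failover happens automatically. A hedged boto3 sketch; the identifier, engine, size, and credentials are placeholders.

import boto3

rds = boto3.client("rds")

# MultiAZ=True provisions a standby replica in another Availability Zone
# and fails over to it automatically if the primary becomes unavailable.
rds.create_db_instance(
    DBInstanceIdentifier="app-db",
    Engine="mysql",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=50,
    MasterUsername="admin",
    MasterUserPassword="change-me-please",   # placeholder credential
    MultiAZ=True,
)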

Optimizing for Cost

Lower cost and increased flexibility are the reasons cloud computing makes sense for most businesses. Optimizing for cost is really difficult until you’ve been on the cloud provider for at least a few months and have real usage data to work from.

A lot of the cloud providers have automated services, such as AWS Trusted Advisor, that can make suggestions for cost reductions.
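Pulling the raw numbers is also easy to automate. Here is a small sketch using the Cost Explorer API to break down a month’s spend by service; the date range is a placeholder.

import boto3

ce = boto3.client("ce")

# Monthly spend grouped by service; a starting point for spotting waste.
report = ce.get_cost_and_usage(
    TimePeriod={"Start": "2020-01-01", "End": "2020-02-01"},  # placeholder dates
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for group in report["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = group["Metrics"]["UnblendedCost"]["Amount"]
    print(f"{service}: ${float(amount):.2f}")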

Wrapping it up

Within most cloud providers there are a bunch of different services that can be used in each of these circumstances.
