Experiments in the Cloud, Part 2: Traditional vs Cloud Architectures

After a long hiatus, I return to continue this series. I’ve changed slightly what I’m doing with this blog; the core remains the same, but I’m using one less external service. I have to preface this article again by saying that hosting this blog this way is more about learning these AWS services than anything else. Even in the five-month break between the first part and this one, AWS has come up with some great additions to its services. So, before I dive in, let me reassert that I know I could be using github/gitlab pages to host a static site, and I know this is overkill. Now, let’s dive in.

Picking a static site generator

There are a lot of these out there, but I ended up picking Hugo partly because of my desire to investigate golang a bit more. My criteria were that I wanted something very simple, fast for local testing, and built on Markdown. Hugo is insanely fast. There are plenty of tutorials on how to get going with Hugo, so I won’t dive into that in this article.
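For context on the commands that show up later, though, here is a minimal sketch of the Hugo workflow (the site and post names are just examples). The build step writes the finished site into a public/ directory, which is what gets uploaded.

    # Create a new site skeleton and a first post (names are illustrative)
    hugo new site blog
    cd blog
    hugo new posts/first-post.md

    # Preview locally with live reload while writing
    hugo server

    # Build the site; the output lands in public/
    hugo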

Hosting: Static Delivery

AWS offers its S3 service to store static files. With S3 you can enable static website hosting, which turns a bucket into a highly available static file host. This is the clear choice for hosting the website generated by Hugo.

To begin with S3, you create a bucket. Bucket names are visible if you expose S3 to the internet. For example, this site is hosted on S3 at BUCKET-NAME.s3-website-us-west-2.amazonaws.com. I’m purposely not linking to it so as not to confuse webcrawlers. Uploading is very simple with the awscli tool. After configuring it with your access keys, it’s as simple as aws s3 sync public/ s3://BUCKET-NAME/.
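For reference, the whole S3 side of the setup boils down to a few awscli commands. This is a sketch with placeholder bucket and document names, not my exact configuration.

    # Create the bucket in the desired region (name and region are placeholders)
    aws s3 mb s3://BUCKET-NAME --region us-west-2

    # Enable static website hosting with index and error documents
    aws s3 website s3://BUCKET-NAME --index-document index.html --error-document 404.html

    # Upload the generated site
    aws s3 sync public/ s3://BUCKET-NAME/

    # Note: objects also need to be publicly readable, for example via a
    # bucket policy that grants s3:GetObject to everyone.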

There’s a hidden detail mentioned above with regard to high availability. In the previous installment, we discussed the concepts of high availability on AWS. If you look at the domain name format for the S3 buckets, you’ll notice ‘s3-website-us-west-2’. ‘us-west-2’ is the name of a region inside AWS, whereas ‘us-west-2a’ would be a specific AZ. From this domain name format we can infer that S3 is highly available within the region, and sure enough, if you dig into the documentation, you’ll confirm that. Because of this, our static site is highly available out of the box. If I wanted to host this on EC2 instances, I would have to stand up at least two instances in separate AZs and put a load balancer in front of them to ensure high availability.

Hosting: Domain Names

There’s a problem with our setup – the domain name needed to access the website is horribly ugly! You can set up a CNAME record that points at the S3 website endpoint. This is pretty easy and solves the first problem. The second problem is SSL. AWS only serves SSL for amazonaws.com, which means that if you set up your CNAME record and try to serve the site over HTTPS, visitors will get certificate warnings. S3 has no built-in support for custom SSL hosting. For that, we need CloudFront.

CloudFront also has the added benefit of turning our website from regionally highly available to globally available. CloudFront is a CDN service that provides “edge” caching, which means there are edge locations distributed around the globe, and your requests are routed to one of them automatically based on DNS geolocation. If you’re loading this website from Germany, for example, you’re actually loading it from an edge within Europe rather than from my west-coast AWS region. The CDN honors your origin’s Cache-Control headers, or you can override them as well as enforce maximum and minimum TTLs.
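One detail that pairs well with this: awscli can set Cache-Control headers on the objects as it uploads them, so CloudFront and browsers know how long to cache each file. A sketch, with a placeholder bucket name and an arbitrary one-hour lifetime:

    # Hypothetical: upload the site with a one-hour cache lifetime on every object
    aws s3 sync public/ s3://BUCKET-NAME/ --cache-control "max-age=3600"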

Since I originally set up this blog, AWS has also introduced Certificate Manager, which provides free SSL certificates for use with AWS services. It’s dead simple to set up, so as of this deployment, this website is not only globally available and low-latency, it’s also served over SSL.
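If you prefer the CLI to the console, requesting a certificate looks roughly like this; the domain names are placeholders, and one gotcha worth noting is that certificates used with CloudFront have to be requested in the us-east-1 region.

    # Hypothetical: request a free certificate for a custom domain
    # (CloudFront only accepts ACM certificates issued in us-east-1)
    aws acm request-certificate \
      --domain-name example.com \
      --subject-alternative-names www.example.com \
      --validation-method DNS \
      --region us-east-1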

The last step I didn’t really call out is DNS for the distribution: you can point a CNAME record at CloudFront, or, if you use Route 53, you can use an ALIAS record, which manages the actual DNS records automatically. Because I was evaluating Route 53 for my business, I picked Route 53 as the DNS host for nilobject.com and have this website set up via an ALIAS record.
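For the curious, an ALIAS record is expressed in Route 53 as an A record with an AliasTarget. A sketch via the CLI, with a placeholder zone ID, hostname, and distribution domain (the HostedZoneId inside AliasTarget is the fixed ID Route 53 expects for CloudFront targets):

    # Hypothetical ALIAS record pointing the apex domain at a CloudFront distribution
    aws route53 change-resource-record-sets \
      --hosted-zone-id ZEXAMPLE123 \
      --change-batch '{
        "Changes": [{
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "example.com",
            "Type": "A",
            "AliasTarget": {
              "HostedZoneId": "Z2FDTNDATAQYW2",
              "DNSName": "dEXAMPLE.cloudfront.net",
              "EvaluateTargetHealth": false
            }
          }
        }]
      }'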

Uploading a new blog post

By introducing the CDN, we’ve actually complicated our deployment process. The CDN will cache static files for a period of time, by default up to 24 hours depending on the Cache-Control headers. That means that if I upload a new post to my S3 bucket, some people may not see it for up to 24 hours. That, to me, was unacceptable. Luckily, CloudFront offers invalidations, which let you purge the caches across all the edge locations, and you can purge specific paths with wildcard matches. The only limit to invalidations is that they cost money past 1,000 paths per month; beyond that it’s $0.005 per path invalidated. This could add up quickly for large websites, but for a blog that I update once every 5 months (hah), I can just blindly do full invalidations at no charge.
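That “blindly do full invalidations” step is a single command; the distribution ID below is a placeholder, and the /* wildcard counts as one path against the monthly allowance.

    # Hypothetical: purge everything in the distribution in one go
    aws cloudfront create-invalidation \
      --distribution-id EXAMPLEID123 \
      --paths "/*"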

Where do we stand?

At this point I had nilobject.com hosted on a global CDN and the ability to upload new content and invalidate the caches worldwide. However, this is not as nice as I was aiming for – it still requires me to use a computer with awscli installed to do the generation and uploading, and then either more awscli commands to perform the invalidation or logging into the AWS console and invalidating the distribution.

I wanted something automatic. I wanted to be able to log into github/gitlab, create a new markdown file, and have it automatically deployed. Or I wanted a simple git push to do the deployment automatically. At the time of writing part 1, I decided to use travis-ci and have the blog hosted on github.com. This morning, I switched to gitlab.com and used the integrated gitlab-ci service.

Continuous Integration: It’s not just for testing!

Continuous integration tools like travis-ci, circle-ci, gitlab-ci, and so on exist not only to provide automated testing but also to deploy your new builds automatically. The other great thing is that most are free for open-source projects. In this case, there isn’t much testing to be done, although I suppose it’s possible to put bad content into a post file and make Hugo’s build fail. The important part is that we can use this service to run our S3 and CloudFront commands automatically.

You can see the config file for this website here. You’ll notice there are no AWS access keys or any secrets except the bucket name – which isn’t really that secret. Most CI services, gitlab-ci included, provide a way to store secrets so that they show up in builds as environment variables. As long as I control who can contribute to my project, my AWS keys are only as exposed as the CI service’s own security allows. Beyond that, AWS has fine-grained permissions, so the access keys in this situation have write access to that single S3 bucket and the ability to invalidate the CloudFront distribution, but no other permissions.
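As a sketch of what the deploy step boils down to (this is not my exact config, and the variable names are illustrative): the CI service injects the credentials as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which awscli picks up automatically, and the job simply builds and pushes.

    #!/bin/sh
    # Hypothetical deploy script run by the CI job after a successful build.
    # AWS credentials come from CI secret variables via the standard
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
    set -e

    # Build the site into public/
    hugo

    # Push the build to S3 and purge the CDN
    aws s3 sync public/ "s3://$S3_BUCKET/" --delete
    aws cloudfront create-invalidation --distribution-id "$CLOUDFRONT_DISTRIBUTION_ID" --paths "/*"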

Conclusion

Now I have a completely overkill setup, but through it I’ve learned about the capabilities of Route 53, S3, CloudFront, and access credentials. Will I keep it set up this way long-term? I’m not sure, but I don’t have any reason for the time being to switch away.

As for this series, I am uncertain what the next topics will be. I want to cover AWS’s concepts for networking (VPCs) as well as go through a high-availability web application setup using ELB, EC2, and RDS. I also have a backlog of other topics I’ve been thinking of writing about, and today’s post is more to get me moving again. I may post on some other topics first before continuing this series.
