Breaking This Website!
December 23, 2021
Things fell down when I made a configuration change earlier in the week
So, unbeknownst to me, I introduced a change that left my personal website in a broken state. I’ve been doing a lot of work in the background lately on moving away from CloudFlare, after the incidents with them that I discussed in detail previously. Those changes included moving the websites I ran on CloudFlare over to AWS and CloudFront. I figured I would do a little post-mortem into what went wrong and come up with a set of actions to ensure something like this does not happen again.
What Happened?
I woke up on the 21st, as I do most mornings, and got set up for the day ahead. Being on holiday lately, I set aside my mornings for self-hosting and tech activities, and my afternoons and evenings are more about games. I checked Discord and saw that the status channel in the server I share with my friends, which gets pings from Uptime Kuma, had some unread messages. Initially I figured it might just be a Docker container on Endeavour that had died and come back up, but no! It was my personal website that was reporting as down.
The first report came in at around 05:57 UTC, with more reports after that, I’m guessing as DNS slowly propagated. The error message said something about the TLS certificate chain being invalid, which was already enough of a prompt to tell me what the issue was. Opening my website, I was greeted by the browser’s “Invalid TLS Certificate” warning. That is rather spooky for regular users, but oddly familiar to me thanks to years of organisations using self-signed certificates, which is bad!
What was the root cause, and how did I fix it?
What went wrong here was that I forgot to make a configuration change on my Application Load Balancer (ALB). Because I was using CloudFlare and CloudFlare’s TLS, I had manually uploaded CloudFlare’s certificates to AWS Certificate Manager (ACM) and associated them with the HTTPS listener on my ALB, so that the connection was encrypted end to end. When I switched my domain name’s DNS servers from CloudFlare to Route 53, CloudFlare was no longer proxying my traffic. That certificate is signed by CloudFlare itself rather than by a certificate authority that browsers trust, so it now showed as invalid to the rest of the world.
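To see the site the way the rest of the world does, and the way a monitor arguably should, a check has to validate the certificate chain against a public trust store instead of skipping verification. Below is a minimal sketch of that idea in Python using only the standard library. The function name `check_tls`, its return format and the timeout are hypothetical choices for illustration; this is not how Uptime Kuma or Uptime Robot implement their checks.

```python
from __future__ import annotations

import socket
import ssl


def check_tls(host: str, port: int = 443, timeout: float = 5.0) -> tuple[bool, str]:
    """Attempt a TLS handshake with host:port and report whether the chain validates.

    Hypothetical helper: returns (ok, detail) instead of raising.
    """
    # create_default_context() loads the system CA bundle and enables
    # CERT_REQUIRED plus hostname checking, which is roughly what a browser does.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                # notAfter is a string such as 'Mar 21 23:59:59 2022 GMT'
                return True, f"chain valid, expires {cert.get('notAfter', 'unknown')}"
    except ssl.SSLCertVerificationError as exc:
        # Untrusted issuer (e.g. a certificate only a CDN proxy trusts),
        # hostname mismatch, expired certificate, and so on.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failure, refused connection, timeout or any other TLS error
        # (ssl.SSLError is a subclass of OSError).
        return False, f"connection or handshake failed: {exc}"
```

Calling something like `check_tls("example.com")`, with `example.com` standing in for the real domain, against a server still presenting a certificate that only CloudFlare trusts should come back as `False` with a verification error, while an ACM-issued certificate should come back as valid.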
The fix was to request a new certificate in AWS Certificate Manager and associate it with the ALB. At some point I had also baked the CloudFlare certificates into my Docker container’s base image, so I removed those as part of this. I noticed the issue at 09:32 UTC and my website was reporting healthy at 10:00 UTC. I took a rather relaxed approach to it all, but I’m still pretty happy with that turnaround time.
What am I going to change going forward?
Honestly, I just need to be a bit smarter about remembering what I have configured in the past! But I also think my website monitoring did not fully work here. My Discord server did get the messages, but for something like my personal website I think a push notification or SMS to my phone is warranted. Not so much a page to wake me up to fix it, just something I would see as soon as I wake up because it’s on my phone. It’s also worth noting that I use Uptime Robot for uptime monitoring as well, and as far as that service was concerned, everything was green. So I need to check whether it was ignoring TLS errors or whether something else went wrong there.
The other, larger action is that my website stack is currently rather ‘bespoke’ compared to everything else, since it runs on Fargate, and I want to investigate moving it to S3 and CloudFront. CloudFront Functions were released not too long ago, so I want to see whether they could replace the Nginx URL rewrites I currently have in a Docker container. If they can, it would be a great way to use the same set of Terraform code I use for everything else while also saving some money!
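Assuming the Nginx rewrites are the common static-site kind, where a “pretty” URL such as `/posts/example` is served from an `index.html` object stored under that path, the rule a viewer-request CloudFront Function would need can be sketched as below. CloudFront Functions themselves are written in JavaScript, so this Python version only illustrates the logic; `rewrite_uri` and the example paths are hypothetical.

```python
def rewrite_uri(uri: str) -> str:
    """Map a 'pretty' request path to the index.html object a static site bucket holds.

    Hypothetical rules: directory-style paths get index.html appended, and paths
    whose last segment has no file extension are treated as directories.
    The uri is the path only, without any query string.
    """
    if uri.endswith("/"):
        # "/posts/example/" -> "/posts/example/index.html"
        return uri + "index.html"
    last_segment = uri.rsplit("/", 1)[-1]
    if "." not in last_segment:
        # "/posts/example" -> "/posts/example/index.html"
        return uri + "/index.html"
    # Looks like a real file (style.css, photo.jpg, feed.xml): leave it alone
    return uri
```

The file-extension heuristic is the weak spot of this approach: a page slug that happens to contain a dot would be passed through untouched, so any real rules would need checking against the existing Nginx configuration.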
Thank you!
You could have consumed content on any website, but you went ahead and consumed mine, so I'm very grateful! If you liked this, then you might like this other piece of content I worked on.
My initial status page configuration
Photographer
I've no real claim to fame when it comes to good photos, which is why the header photo for this post was shot by Marc-Olivier Jodoin. You can find more photos from them on Unsplash. Unsplash is a great place to source photos for your website, presentation and more! But it wouldn't be anything without the photographers who put in the work.
Support what I do
I write for the love and passion I have for technology. Just reading and sharing my articles is more than enough. But if you want to offer more direct support, then you can support the running costs of my website by donating via Stripe. Only do so if you feel I have truly delivered value, but as I said, your readership is more than enough already. Thank you :)