Kubernetes at the speed of Fastly
When most people need to scale, they go to Kubernetes. But if you’re the team behind Kubernetes and need to improve your scaling and efficiency, you turn to Fastly. The team needed a new and improved solution to help them serve over 5 petabytes of data in downloads from their official binaries service each month, to people all over the world. We started by speeding up and scaling downloads for them — and what’s more, we helped them understand the full scope of a surprising problem.
Come for the speed, stay for the real-time logging
The Kubernetes team adopted Fastly so that they could speed up their downloads, save money on egress charges, and improve their ability to make changes on the backend without disrupting service. But it turns out that Fastly’s real-time Observability — our logging capabilities, access to metrics and more — was the killer feature that helped diagnose something surprising when they cut their download traffic over to Fastly. You can read the announcement from the Kubernetes team and hear their take on why they chose us as well.
You can watch the full video of the live traffic cutover to see exactly what happened. It was a big surprise when the traffic coming through was just a trickle compared to the volume that we all expected! Up until this point, the Kubernetes team had never had good monitoring or visibility into their traffic, and nothing close to real-time. DNS and network engineering can be incredibly challenging and opaque, but with Fastly, they can now see their traffic patterns. From the instant they switched, they knew that something was off. Together, the Kubernetes team and Fastly’s Mission Control team were able to quickly diagnose that a huge volume of requests was pointing directly at the Google Cloud Storage (GCS) bucket for the binaries download, not the new, abstracted URL they were supposed to be pointing to (cdn.dl.k8s.io). It turns out that very early in the project’s history the origin hostname had been advertised publicly, and despite cleanup efforts within the project itself and among the broader community, many hardcoded instances of the GCS bucket’s hostname still exist. Their move to Fastly, and the straightforward, direct visibility we provide to Kubernetes, brought this to light for the first time in the project’s history.
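For anyone who wants to check their own scripts for a hardcoded bucket hostname, here is a minimal Python sketch. The hostname `legacy-bucket.example.com` is a hypothetical placeholder (this post does not name the bucket's hostname), and the function `find_legacy_urls` is illustrative only. It reports matching lines rather than rewriting them, because the path mapping between the old and new hosts is not described here.

```python
import re

# Hypothetical placeholder: this post does not name the legacy bucket hostname.
LEGACY_HOST = "legacy-bucket.example.com"


def find_legacy_urls(text: str, legacy_host: str = LEGACY_HOST) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that contain a URL on the legacy host."""
    # The lookahead stops the match from hitting longer hostnames that merely
    # start with the legacy host (for example "legacy-bucket.example.com.evil").
    pattern = re.compile(
        r"https?://" + re.escape(legacy_host) + r"(?=[/:\s\"']|$)"
    )
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    script = (
        "curl -LO https://legacy-bucket.example.com/release/v1/kubectl\n"
        "wget https://cdn.dl.k8s.io/release/v1/kubectl\n"
    )
    for number, line in find_legacy_urls(script):
        print(f"line {number}: {line}")
```

Running a check like this over CI configuration and install scripts is one way to find downloads that still bypass the CDN endpoint.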
With this information, the Kubernetes team will start a new effort to get people around the planet to point at the new domain for downloading their binaries, and they can watch the traffic through Fastly’s CDN tick up in real time as they succeed. With their binaries now served by Fastly, they have freed themselves to make changes, and even migrate to a new, community-owned GCS bucket in the future, without disturbing service or requiring any more changes from anyone attempting to download K8s. Downloads are already being served faster, and the project is saving tons of money in egress charges too. You can take a peek at their real-time metrics: we’ve got the Kubernetes dashboard embedded on our developer page.
Why Fastly is better at real-time visibility
This kind of real-time visibility within our UI may not seem groundbreaking at first glance, but most CDNs are not able to offer it. Instead, they require their customers to work with batched log data delivered with serious lag. You can’t see live data in their dashboards, and you can’t export or stream it into your other tools (we make that easy for you too).
At Fastly we’ve designed our platform to be completely software-defined, which means that we can do a ton of logging that would be impossible in a legacy architecture that relies heavily on hardware switches, less adaptable components, and generally less modern technology. Fastly takes real-time data and observability seriously. You get access from the moment you flip the switch to run on our network, and we make it easy for you to integrate it into other tools that most developers already use, like Splunk, Datadog, and many more. We know you may prefer third-party observability dashboards, so we offer tons of logging endpoints, and stream the data in real time. It’s your data, and you should use it however you want.
Network engineering and optimization, DNS, content delivery, and global caching infrastructure are all really complicated to manage. Kubernetes has an incredible Infrastructure Special Interest Group, but most development teams and open source projects don’t have an in-house team of experts for this kind of work. Additionally, Fastly is unique in the incredibly high level of data and configurability we provide. Most other CDNs don’t expose this much information or configurability. That openness helps our customers fine-tune their configurations to their exact specifications, and avoid big professional services bills. Fastly lets you see what’s happening, make the changes you want, and push them to production. If you do have questions, you’ve got our world-class customer support at your service, but we’ve made it easier for you to do more on your own, and see that it’s working.
What’s next for Kubernetes on Fastly
There is ongoing outreach to projects and teams about switching to the new CDN endpoint (cdn.dl.k8s.io), and this will continue into the future. A lot of people build on Kubernetes, so it’s a big undertaking. While that’s continuing, Kubernetes is looking forward to moving their backend to a different, community-owned GCS bucket without any disruption to people’s access, thanks to the new CDN endpoint. Kubernetes binaries don’t change much (compared to a lot of content on the rest of the internet), so they’re going to maximize their “time to live” (TTL) in Fastly’s cache to help them offload trips back to their origin as much as possible.
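One common way to express a long edge TTL is through response headers from the origin. Fastly's cache honors `Surrogate-Control: max-age` ahead of `Cache-Control`, so the edge TTL can be set separately from the TTL that browsers and other clients see. The Python sketch below only builds such a header pair. The helper name `cache_headers` and the one-year value are assumptions for illustration, not the settings the Kubernetes project actually uses.

```python
# Illustrative value only: one year, expressed in seconds.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_headers(edge_ttl_seconds: int, client_ttl_seconds: int) -> dict[str, str]:
    """Build response headers with separate edge and client TTLs.

    Surrogate-Control is read by the CDN cache; Cache-Control is what
    downstream clients see.
    """
    if edge_ttl_seconds < 0 or client_ttl_seconds < 0:
        raise ValueError("TTL values must be non-negative")
    return {
        "Surrogate-Control": f"max-age={edge_ttl_seconds}",
        "Cache-Control": f"public, max-age={client_ttl_seconds}",
    }


if __name__ == "__main__":
    # A release binary that rarely changes: long edge TTL, shorter client TTL.
    for name, value in cache_headers(ONE_YEAR_SECONDS, 3600).items():
        print(f"{name}: {value}")
```

Keeping the two TTLs separate means the cache can hold a binary for a long time while clients still revalidate on a shorter cycle.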
Eventually, Kubernetes plans to use their new access to information about their users and traffic patterns to implement Fastly’s origin shielding feature in a really interesting way. Once people have built Kubernetes into their infrastructure, they tend to stick to a certain release version. What’s more, most of the binary downloads are being done by automation tools, especially scripts running in GitHub Actions. GitHub Actions run exclusively in a few specific and predictable Microsoft Azure regions, resulting in certain releases being requested disproportionately from certain geographic locations. On Fastly, Kubernetes can start to optimize their origin shielding strategy to match Fastly POPs against where the GitHub Actions are being run, and which Kubernetes releases are being requested.
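A first step toward that kind of analysis might be counting which POPs serve the most requests for each release. The Python sketch below assumes log records have already been parsed into dictionaries with hypothetical `pop` and `release` fields. The field names, the POP labels, and the function `top_pops_by_release` are all illustrative, not Fastly's log format or API.

```python
from collections import Counter


def top_pops_by_release(records: list[dict[str, str]], top_n: int = 1) -> dict[str, list[str]]:
    """Map each release to the POPs that served the most requests for it."""
    counts: dict[str, Counter] = {}
    for record in records:
        # One Counter of POP request counts per release.
        counts.setdefault(record["release"], Counter())[record["pop"]] += 1
    return {
        release: [pop for pop, _ in pop_counts.most_common(top_n)]
        for release, pop_counts in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical parsed log records; POP labels are placeholders.
    sample = [
        {"pop": "POP-A", "release": "v1.27.0"},
        {"pop": "POP-A", "release": "v1.27.0"},
        {"pop": "POP-B", "release": "v1.27.0"},
        {"pop": "POP-B", "release": "v1.26.5"},
    ]
    print(top_pops_by_release(sample))
```

The counts only show where demand for each release concentrates. Choosing a shield location would still depend on where the origin lives and on how Fastly's shielding is configured for the service.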