Clustering

In normal operation, a request directed to a Fastly POP will be handled by two separate cache servers acting together in a process we call clustering. This architecture allows Fastly to scale efficiently in a number of ways:

  1. We can distribute cached objects evenly throughout the POP and reduce duplication
  2. We can significantly increase the likelihood of a cache hit
  3. We can use request collapsing to handle concurrent requests for objects that are in high demand

Enabling and disabling clustering

By default, all GET and HEAD requests are subject to clustering. Requests using any other HTTP method are not clustered, because their responses are normally not cacheable. This applies even to requests using other methods that have been made cacheable by running return(lookup); in vcl_recv. Clustering is also disabled automatically after a restart.

To specifically enable or disable clustering, set the values of the following HTTP headers to a truthy value, for example "1":

WARNING: Despite their names, these headers do not relate to shielding.

It is always possible to disable clustering. If clustering is disabled by your own code or by a restart, it is possible to enable it for requests that do not use a GET or HEAD method.

The delivery node and the fetch node

An incoming request is routed to a random cache server within the POP. That server becomes responsible for accepting the inbound request and, ultimately, for forming and dispatching the response. In this role, it is known as the delivery node. However, the delivery node is usually not the server on which the requested object is stored.

To improve the likelihood of a cache hit, objects are assigned to specific storage servers using a mechanism known as consistent hashing: each server handles a portion of the possible address space for cache keys. After calculating the object's cache key, the delivery node forwards the request to the server that owns the portion of the address space containing that cache key. This second server is responsible for fetching the object from origin and storing it, and in this role is known as the fetch node.

The delivery node may determine that the appropriate fetch node for an object is itself (i.e., the same server that is already handling the request). In this situation, the delivery node will shift to a designated secondary fetch node in order to avoid load hot spots within the POP.

Figure: Delivery and fetch nodes in a Fastly POP

If clustering is disabled, a single server performs the role of both delivery and fetch, but the likelihood of a cache hit is significantly reduced. All Fastly cache nodes can perform both delivery and fetch roles, but each object will only have one server that is its primary storage location. Objects cached on servers other than their primary storage server will be subject to more aggressive eviction. Even disregarding this, without clustering, an object would need to be cached on every server within a POP before you would be guaranteed to get a cache hit.

So the primary purpose of clustering is to efficiently distribute cached objects across the available storage within a POP, and to ensure that regardless of which server initially receives and handles a request, the object can be found in the cache.

Another benefit of clustering is that for exceptionally popular objects, especially those that cannot be cached for long periods (e.g. video streams), Fastly performs request collapsing between the delivery node and the fetch node. For example, a stream being watched by 10,000 or more users is likely to generate concurrent requests for the same object on every cache node in the POP. Each cache node, acting as a delivery node, forwards just one request to the designated fetch node for that object, and the fetch node in turn aggregates the requests from all the delivery nodes and forwards just one to origin.

Disabling clustering significantly reduces request collapsing capacity and may result in orders of magnitude more origin traffic for exceptionally popular objects (and potentially even client timeouts).

Cluster-aware coding

When a request is handed off from a delivery node to a fetch node, the fetch node receives, and operates on, a copy of the request. When control passes back from the fetch node to the delivery node, the fetch node passes back the response object. As a result, stages of the VCL lifecycle that execute on the fetch node may modify the request, but those modifications will not persist through the transition back to the delivery node.
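A minimal sketch of the pitfall described above, assuming a hypothetical X-Origin-Status request header: the value is set on req in vcl_fetch, which on a clustered request runs on the fetch node and therefore changes only the fetch node's copy of the request. By the time vcl_deliver runs on the delivery node, that change is gone.

```vcl
sub vcl_fetch {
  #FASTLY fetch
  # Anti-pattern: on a clustered request, this modifies the fetch node's
  # copy of the request. X-Origin-Status is a hypothetical header name.
  set req.http.X-Origin-Status = beresp.status;
}

sub vcl_deliver {
  #FASTLY deliver
  # On the delivery node, the change made in vcl_fetch has not persisted,
  # so for clustered requests this header is typically empty.
  set resp.http.X-Origin-Status = req.http.X-Origin-Status;
}
```

The patterns below avoid depending on changes to req made on the fetch node.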

Some common patterns of clustering-aware code include:

Piggybacking data on the response

Where a response is not from cache, data generated in vcl_fetch can be made available in vcl_deliver by attaching it to the object as a response header. However, take care not to use this data when the object is served from cache, because the data will still be attached to the cached object:

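sub vcl_fetch {
  #FASTLY fetch
  # Runs on the fetch node: attach the log data to the object as a header
  set beresp.http.tmpLog = req.http.tmpLog + " fetch[" + beresp.backend.name + ":" + beresp.backend.ip + "]";
}

sub vcl_deliver {
  #FASTLY deliver
  # Runs on the delivery node: copy the data back to the request only when the
  # response was not a cache hit, since a cached object still carries the header
  if (fastly_info.state !~ "^HIT") {
    set req.http.tmpLog = resp.http.tmpLog;
  }
  # Never expose the temporary header to the client
  unset resp.http.tmpLog;
}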

Restart in vcl_deliver instead of vcl_fetch

The restart statement returns VCL control flow to the vcl_recv subroutine. If restart is called from a subroutine executing on a fetch node, such as vcl_fetch, the restarted request resumes at vcl_recv on the delivery node, but there is no mechanism to communicate the reason for the restart, because changes made to the request on the fetch node are not passed back. A temporary request header set on req.http is commonly used to carry the reason, and this works if the restart is performed in vcl_deliver instead of vcl_fetch, which guarantees that the restart is triggered on the delivery node.

Note that this means the response will be cached, if it is eligible for caching, before the restart takes place.
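A minimal sketch of this pattern, assuming a hypothetical X-Restart-Reason header and a hypothetical backend named F_Fallback_Origin. Both the status check and the restart happen in vcl_deliver, so the reason header is set on the delivery node's request and is still present when vcl_recv runs again.

```vcl
sub vcl_recv {
  #FASTLY recv
  # After the restart, the reason set in vcl_deliver is still on the request.
  # F_Fallback_Origin is a hypothetical backend name.
  if (req.restarts > 0 && req.http.X-Restart-Reason == "origin-5xx") {
    set req.backend = F_Fallback_Origin;
  }
}

sub vcl_deliver {
  #FASTLY deliver
  # vcl_deliver runs on the delivery node, so this req.http change survives
  # the restart. Checking req.restarts limits this to one restart.
  if (resp.status >= 500 && req.restarts == 0) {
    set req.http.X-Restart-Reason = "origin-5xx";
    restart;
  }
}
```

Because clustering is disabled automatically after a restart, the restarted request is handled by a single server acting as both delivery and fetch node.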