GitHub provides a first-party action for caching in your workflows.
It's relatively easy to use. You configure it with a path to cache and a key to identify when the cache should be restored.
Here's an example that caches the `vendor` directory in a PHP project:
```yaml
- name: Cache composer dependencies
  uses: actions/cache@v4
  with:
    path: vendor
    key: composer-${{ hashFiles('composer.lock') }}
```
If nothing changes in your `composer.lock` file, your workflow can simply reload the `vendor` folder from the last time your workflow ran. This can save quite a bit of time on your CI runs.
This is great for tools like Composer and npm that have distinct lock files, but what about tools that use a cache to speed up operations but don't have a distinct mechanism for detecting changes?
For example, we use PHPStan (via Larastan) on all our projects. It can take a while to run, so it uses a cache folder to track files it has already scanned that haven't changed. How can we leverage the GitHub cache action for this?
We can't hash a folder or location, but we can rely on a feature of the GitHub cache action that allows us to specify more than one key to identify a cache. I'll show an example, and then explain it in more detail:
```yaml
- name: Cache Larastan result cache
  uses: actions/cache@v4
  with:
    path: .phpstan.cache
    key: "phpstan-result-cache-${{ github.run_id }}"
    restore-keys: |
      phpstan-result-cache-
```
What is going on with the `key` and `restore-keys` here?
GitHub uses these values to decide which cache to restore during this run. It first checks for a cache result that matches our `key` value, but it only considers it a match if it is a complete, exact match. If it doesn't find one, it then checks the one or more `restore-keys` values, in order. When checking these values, though, it only has to find a partial prefix match. The first match it finds will be restored, and if multiple cache results match our prefix, it will use the most recent one.
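As a sketch of how these rules combine, here's a hypothetical npm cache step (the path and key names are just for illustration, not from our actual workflows) that layers an exact key over two fallback prefixes:

```yaml
- name: Cache npm dependencies
  uses: actions/cache@v4
  with:
    path: node_modules
    # Exact match: same OS and identical package-lock.json
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    # Fallbacks, checked in order: most recent cache for this OS,
    # then the most recent npm cache from any OS
    restore-keys: |
      npm-${{ runner.os }}-
      npm-
```

If the exact key misses, GitHub restores the newest cache whose key starts with the first prefix; only if that also misses does it try the second.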
Knowing how this works, notice how the main `key` is set to a value that includes the current `run_id`. This value is unique for every single run. Because of this, we know the `key` will never match, and therefore it will always fall back to our `restore-keys` value.
Why would we set it up this way? It might make more sense with a concrete example:
Let's say our CI run has a `run_id` of `123` and the previous run had a `run_id` of `122`. So when this run starts, we'll have a value in our cache with the key `phpstan-result-cache-122` from the previous run.
Because our current run is `123`, GitHub will first try to fetch a cache with the key `phpstan-result-cache-123`. And because there is no match, it will fall back to our `restore-keys` value of `phpstan-result-cache-`. Remember, this only has to match a prefix of the key, so it will find our cache with the key `phpstan-result-cache-122` and restore it.
When this run completes, it will save a new cache with the key `phpstan-result-cache-123`, which will in turn be used by our next run.
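One detail the cache step alone doesn't cover: PHPStan won't write to `.phpstan.cache` unless you point it there. Assuming a `phpstan.neon` that sets PHPStan's `tmpDir` parameter to that folder, the relevant job steps might look like this sketch:

```yaml
# Assumes phpstan.neon contains:
#   parameters:
#     tmpDir: .phpstan.cache
- name: Cache Larastan result cache
  uses: actions/cache@v4
  with:
    path: .phpstan.cache
    key: "phpstan-result-cache-${{ github.run_id }}"
    restore-keys: |
      phpstan-result-cache-

- name: Run Larastan
  run: vendor/bin/phpstan analyse
```

With this in place, the restored folder is exactly the result cache PHPStan consults, so only changed files get re-analyzed.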
This gives us the best of both worlds. We're able to restore the PHPStan cache from the previous run, speeding up our current run. But we're also constantly saving a new cache with the current run's results. This way, our cache doesn't drift further and further out of date.
This same technique works well with a number of different tools. We use it with Rector and PHP CS Fixer as well.
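For instance, a PHP CS Fixer step could follow the same pattern. Assuming caching is enabled and the cache file lives at its default location of `.php-cs-fixer.cache` in the project root, a sketch might look like:

```yaml
- name: Cache PHP CS Fixer results
  uses: actions/cache@v4
  with:
    path: .php-cs-fixer.cache
    key: "php-cs-fixer-cache-${{ github.run_id }}"
    restore-keys: |
      php-cs-fixer-cache-
```

Only the `path` and key names change; the always-miss `key` plus prefix `restore-keys` trick is identical.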
Hope this helps,
Joel
P.S. Can you imagine how awesome your project would be if we were working with you, adding improvements like this every week?