** Warning: This builder is experimental and is very likely to change in breaking ways at this time. **
This tool fetches objects from Google Cloud Storage, either in the form of a .zip archive, or based on the contents of a source manifest file.
A source manifest is a JSON object in Cloud Storage listing other objects in Cloud Storage that should be fetched. The format of the manifest is a mapping of destination file path to the location in Cloud Storage where the file's contents can be found.
The following is an example source manifest:
{
"Dockerfile": {
"sourceUrl": "gs://my-bucket/abcdef",
"sha1sum": "<sha-1 digest>"
},
"path/to/main.go": {
"sourceUrl": "gs://my-bucket/ghijk",
"sha1sum": "<sha-1 digest>"
}
}
To process the above manifest, the GCS Fetcher tool processes each element:
- Fetch the object located at
sourceUrl
- Verify the object's SHA-1 matches the expected digest
- Write the file contents to the path indicated by the object key
So in the above example, the tool fetches gs://my-bucket/abcdef
, verifies its
SHA-1 digest, and places the file in the working directory named as
Dockerfile
. It then fetches gs://my-bucket/ghijk
, verifies its SHA-1 digest,
and places the file in the working directory at path/to/main.go
.
The main benefit to source manifests are in enabling incremental upload of sources from a client.
If a user uploads the two files in the example above, then edits the contents of
their Dockerfile
, only that new file has to be uploaded to Cloud Storage, and
the manifest for the next build can reuse the entry for path/to/main.go
,
only writing a new entry for Dockerfile
to specify the new file's location in
Cloud Storage.
In the case of a file move, where file contents don't change, no new files have to be uploaded. The new manifest simply changes the key in the object describing the path to place the file fetched from Cloud Storage.
To fetch source described in a source manifest, add the following line to your build config:
steps:
- name: 'gcr.io/cloud-builders/gcs-fetcher'
args:
- '--type=Manifest'
- '--location=gs://${PROJECT_ID}_cloudbuild/manifest-foo.json'
It may also be useful to produce and upload source manifests describing some
source, which you can do with gcr.io/cloud-builders/gcs-uploader
:
steps:
- name: 'gcr.io/cloud-builders/gcs-uploader'
args: ['--location=${PROJECT_ID}_cloudbuild/manifest-${BUILD_ID}.json']
This will upload the contents of the workspace directory, ignoring objects that
are already present in Cloud Storage, and upload a manifest JSON object named
manifest-${BUILD_ID}.json
to the same Cloud Storage bucket.
gcs-uploader
will not delete remote objects that are not present locally.
gcs-fetcher
and gcs-uploader
can be used together to provide simple
cross-build caching functionality, by optimistically fetching files that are
expensive to generate at the beginning of the build, and by uploading those
files at the end of the build.
In order to benefit from this you would need to define a well-known reusable manifest file location.
steps:
# Attempt to fetch whatever files are available.
- name: 'gcr.io/cloud-builders/gcs-fetcher'
args:
- '--type=Manifest'
- '--location=gs://${PROJECT_ID}_cloudbuild_cache/manifest-foo.json'
# Generate new files, ignoring those that already exist.
# - name: 'generate-new-files'
# ...
# Upload all files; only new and changed files will be uploaded to
# Cloud Storage.
- name: 'gcr.io/cloud-builders/gcs-uploader'
- '--bucket=${PROJECT_ID}_cloudbuild_cache'
- '--manifest_file=manifest-foo.json'
Tips and Caveats
- You can use Cloud Storage object lifecycle management to automatically delete objects after a certain amount of time, which may reduce storage costs but negatively impact cache hit rates.
- Two ongoing builds that fetch from and upload to the same manifest file may
interact poorly with each other, leading to confusing bugs. You may want to
include
${BRANCH_NAME}
or some other unique value in the manifest file location to avoid this. - Even with incremental upload, you may find that generating the files is just as fast or faster than fetching from Cloud Storage. Caching is not a magic bullet, and can add more complexity than it removes.
- .tar.gz support, depending on object name extension
- Unit tests to cover generation parsing/formatting
- Actually verify SHA-1