Buckets

The Bucket API defines a Source to produce an Artifact for objects from storage solutions like Amazon S3, Google Cloud Storage buckets, or any other solution with an S3-compatible API such as Minio, Alibaba Cloud OSS and others.

Example

The following is an example of a Bucket. It creates a tarball (.tar.gz) Artifact with the fetched objects from an object storage with an S3 compatible API (e.g. Minio):

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: minio-bucket
  namespace: default
spec:
  interval: 5m0s
  endpoint: minio.example.com
  insecure: true
  secretRef:
    name: minio-bucket-secret
  bucketName: example
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-bucket-secret
  namespace: default
type: Opaque
stringData:
  accesskey: <access key>
  secretkey: <secret key>

In the above example:

A Bucket named minio-bucket is created, indicated by the .metadata.name field.
The source-controller checks the object storage bucket every five minutes, indicated by the .spec.interval field.
It authenticates to the minio.example.com endpoint with the static credentials from the minio-bucket-secret Secret data, indicated by the .spec.endpoint and .spec.secretRef.name fields.
A list of object keys and their etags in the .spec.bucketName bucket is compiled, while filtering the keys using default ignore rules.
The digest (algorithm defaults to SHA256) of the list is used as Artifact revision, reported in-cluster in the .status.artifact.revision field.
When the current Bucket revision differs from the latest calculated revision, all objects are fetched and archived.
The new Artifact is reported in the .status.artifact field.

You can run this example by saving the manifest into bucket.yaml, and changing the Bucket and Secret values to target a Minio instance you have control over.

Note: For more advanced examples targeting e.g. Amazon S3 or GCP, see Provider.

Apply the resource on the cluster:
```
kubectl apply -f bucket.yaml
```

Run kubectl get buckets to see the Bucket:

NAME           ENDPOINT            AGE   READY   STATUS                                                                                         
minio-bucket   minio.example.com   34s   True    stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'

Run kubectl describe bucket minio-bucket to see the Artifact and Conditions in the Bucket's Status:

...
Status:
  Artifact:
    Digest:            sha256:72aa638abb455ca5f9ef4825b949fd2de4d4be0a74895bf7ed2338622cd12686
    Last Update Time:  2024-02-01T23:43:38Z
    Path:              bucket/default/minio-bucket/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.tar.gz
    Revision:          sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    Size:              38099
    URL:               http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.tar.gz
  Conditions:
    Last Transition Time:  2024-02-01T23:43:38Z
    Message:               stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
    Observed Generation:   1
    Reason:                Succeeded
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-02-01T23:43:38Z
    Message:               stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
    Observed Generation:   1
    Reason:                Succeeded
    Status:                True
    Type:                  ArtifactInStorage
  Observed Generation:     1
  URL:                     http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/latest.tar.gz
Events:
  Type    Reason                  Age   From               Message
  ----    ------                  ----  ----               -------
  Normal  NewArtifact             82s   source-controller  stored artifact with 16 fetched files from 'example' bucket

Writing a Bucket spec

As with all other Kubernetes config, a Bucket needs apiVersion, kind, and metadata fields. The name of a Bucket object must be a valid DNS subdomain name.

A Bucket also needs a .spec section.

Provider

The .spec.provider field allows for specifying a Provider to enable provider specific configurations, for example to communicate with a non-S3 compatible API endpoint, or to change the authentication method.

Supported options are:

Generic
AWS
Azure
GCP

If you do not specify .spec.provider, it defaults to generic.

Generic

When a Bucket's .spec.provider is set to generic, the controller will attempt to communicate with the specified Endpoint using the Minio Client SDK, which can communicate with any Amazon S3 compatible object storage (including GCS, Wasabi, and many others).

The generic Provider requires a Secret reference to a Secret with .data.accesskey and .data.secretkey values, used to authenticate with static credentials.

The Provider allows for specifying a region the bucket is in using the .spec.region field, if required by the Endpoint.

Generic example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: generic-insecure
  namespace: default
spec:
  provider: generic
  interval: 5m0s
  bucketName: podinfo
  endpoint: minio.minio.svc.cluster.local:9000
  timeout: 60s
  insecure: true
  secretRef:
    name: minio-credentials
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
  namespace: default
type: Opaque
data:
  accesskey: <BASE64>
  secretkey: <BASE64>

AWS

When a Bucket's .spec.provider field is set to aws, the source-controller will attempt to communicate with the specified Endpoint using the Minio Client SDK.

Without a Secret reference, authorization using credentials retrieved from the AWS EC2 service is attempted by default. When a reference is specified, it expects a Secret with .data.accesskey and .data.secretkey values, used to authenticate with static credentials.

The Provider allows for specifying the Amazon AWS Region using the .spec.region field.

AWS EC2 example

Note: On EKS you have to create an IAM role for the source-controller service account that grants access to the bucket.

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: aws
  namespace: default
spec:
  interval: 5m0s
  provider: aws
  bucketName: podinfo
  endpoint: s3.amazonaws.com
  region: us-east-1
  timeout: 30s

AWS IAM role example

Replace <bucket-name> with the specified .spec.bucketName.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<bucket-name>/*"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<bucket-name>"
        }
    ]
}
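
On EKS, one common way to hand such a role to the source-controller is IAM Roles for Service Accounts (IRSA). The following is a minimal sketch, not the only possible setup. It assumes an IAM OIDC provider is already associated with the cluster, that the policy above is attached to a role whose trust policy allows the flux-system/source-controller ServiceAccount, and that Flux was installed with the usual gotk-components.yaml and gotk-sync.yaml files. <ACCOUNT_ID> and <ROLE_NAME> are placeholders introduced for illustration.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |-
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: source-controller
        namespace: flux-system
        annotations:
          # Assumed role ARN; replace with the role that carries the policy above.
          eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>
```

With the annotation in place, a Bucket with provider: aws and no .spec.secretRef should be able to obtain credentials without a static Secret. The source-controller Pod typically has to be restarted after the ServiceAccount changes so that the injected credentials are picked up.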

AWS static auth example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: aws
  namespace: default
spec:
  interval: 5m0s
  provider: aws
  bucketName: podinfo
  endpoint: s3.amazonaws.com
  region: us-east-1
  secretRef:
    name: aws-credentials
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: default
type: Opaque
data:
  accesskey: <BASE64>
  secretkey: <BASE64>

Azure

When a Bucket's .spec.provider is set to azure, the source-controller will attempt to communicate with the specified Endpoint using the Azure Blob Storage SDK for Go.

Without a Secret reference, authentication is attempted by default using a chain with:

Environment credentials
Workload Identity
Managed Identity with the AZURE_CLIENT_ID
Managed Identity with a system-assigned identity

If no chain can be established, the bucket is assumed to be publicly reachable.

When a reference is specified, it expects a Secret with one of the following sets of .data fields:

tenantId, clientId and clientSecret for authenticating a Service Principal with a secret.
tenantId, clientId and clientCertificate (plus optionally clientCertificatePassword and/or clientCertificateSendChain) for authenticating a Service Principal with a certificate.
clientId for authenticating using a Managed Identity.
accountKey for authenticating using a Shared Key.
sasKey for authenticating using a SAS Token.

For any Managed Identity and/or Azure Active Directory authentication method, the base URL can be configured using .data.authorityHost. If not supplied, AzurePublicCloud is assumed.
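
As a sketch of where .data.authorityHost fits, the Secret below adds an authority host to the Service Principal secret fields. It assumes the Azure US Government authority host https://login.microsoftonline.us/ purely as an illustration, and uses stringData so the values can be written unencoded. The Secret name is hypothetical.

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-sp-auth-usgov   # hypothetical name
  namespace: default
type: Opaque
stringData:
  tenantId: <tenant id>
  clientId: <client id>
  clientSecret: <client secret>
  # Assumed value for a non-public cloud; omit the key to fall back to AzurePublicCloud.
  authorityHost: https://login.microsoftonline.us/
```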

Azure example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-public
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: podinfo
  endpoint: https://podinfoaccount.blob.core.windows.net
  timeout: 30s

Azure Service Principal Secret example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-service-principal-secret
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-sp-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-sp-auth
  namespace: default
type: Opaque
data:
  tenantId: <BASE64>
  clientId: <BASE64>
  clientSecret: <BASE64>

Azure Service Principal Certificate example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-service-principal-cert
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-sp-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-sp-auth
  namespace: default
type: Opaque
data:
  tenantId: <BASE64>
  clientId: <BASE64>
  clientCertificate: <BASE64>
  # Plus optionally
  clientCertificatePassword: <BASE64>
  clientCertificateSendChain: <BASE64> # either "1" or "true"

Azure Managed Identity with Client ID example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-managed-identity
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-smi-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-smi-auth
  namespace: default
type: Opaque
data:
  clientId: <BASE64>

Azure Blob Shared Key example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-shared-key
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-key
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-key
  namespace: default
type: Opaque
data:
  accountKey: <BASE64>

Workload Identity

If you have Workload Identity set up on your cluster, you need to create an Azure Identity and give it access to Azure Blob Storage.

export IDENTITY_NAME="blob-access"

az role assignment create --role "Storage Blob Data Reader" \
--assignee-object-id "$(az identity show -n $IDENTITY_NAME  -o tsv --query principalId  -g $RESOURCE_GROUP)" \
--scope "/subscriptions/<SUBSCRIPTION-ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<account-name>/blobServices/default/containers/<container-name>"

Establish a federated identity between the Identity and the source-controller ServiceAccount.

export SERVICE_ACCOUNT_ISSUER="$(az aks show --resource-group <RESOURCE_GROUP> --name <CLUSTER-NAME> --query "oidcIssuerProfile.issuerUrl" -otsv)"

az identity federated-credential create \
  --name "kubernetes-federated-credential" \
  --identity-name "${IDENTITY_NAME}" \
  --resource-group "${RESOURCE_GROUP}" \
  --issuer "${SERVICE_ACCOUNT_ISSUER}" \
  --subject "system:serviceaccount:flux-system:source-controller"

Add a patch to label and annotate the source-controller Deployment and ServiceAccount correctly so that it can match an identity binding:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |-
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: source-controller
        namespace: flux-system
        annotations:
          azure.workload.identity/client-id: <AZURE_CLIENT_ID>
        labels:
          azure.workload.identity/use: "true"
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: source-controller
        namespace: flux-system
        labels:
          azure.workload.identity/use: "true"
      spec:
        template:
          metadata:
            labels:
              azure.workload.identity/use: "true"

If you have set up Workload Identity correctly and labeled the source-controller Deployment and ServiceAccount, then you don't need to reference a Secret. For more information, please see documentation.

apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-bucket
  namespace: flux-system
spec:
  interval: 5m0s
  provider: azure
  bucketName: testsas
  endpoint: https://testfluxsas.blob.core.windows.net

Deprecated: Managed Identity with AAD Pod Identity

If you are using aad-pod-identity, you need to create an Azure Identity and give it access to Azure Blob Storage.

export IDENTITY_NAME="blob-access"

az role assignment create --role "Storage Blob Data Reader"  \
--assignee-object-id "$(az identity show -n $IDENTITY_NAME -o tsv --query principalId  -g $RESOURCE_GROUP)" \
--scope "/subscriptions/<SUBSCRIPTION-ID>/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/<account-name>/blobServices/default/containers/<container-name>"

export IDENTITY_CLIENT_ID="$(az identity show -n ${IDENTITY_NAME} -g ${RESOURCE_GROUP} -otsv --query clientId)"
export IDENTITY_RESOURCE_ID="$(az identity show -n ${IDENTITY_NAME} -g ${RESOURCE_GROUP} -otsv --query id)"

Create an AzureIdentity object that references the identity created above:

---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: ${IDENTITY_NAME}  # source-controller label will match this name
  namespace: flux-system
spec:
  clientID: <IDENTITY_CLIENT_ID>
  resourceID: <IDENTITY_RESOURCE_ID>
  type: 0  # user-managed identity

Create an AzureIdentityBinding object that binds Pods with a specific selector with the AzureIdentity created:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: ${IDENTITY_NAME}-binding
spec:
  azureIdentity: ${IDENTITY_NAME}
  selector: ${IDENTITY_NAME}

Label the source-controller Deployment correctly so that it can match an identity binding:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: source-controller
  namespace: flux-system
spec:
  template:
    metadata:
      labels:
        aadpodidbinding: ${IDENTITY_NAME}  # match the AzureIdentity name

If you have set up aad-pod-identity correctly and labeled the source-controller Deployment, then you don't need to reference a Secret.

apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-bucket
  namespace: flux-system
spec:
  interval: 5m0s
  provider: azure
  bucketName: testsas
  endpoint: https://testfluxsas.blob.core.windows.net

Azure Blob SAS Token example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-sas-token
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-key
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-key
  namespace: default
type: Opaque
data:
  sasKey: <BASE64>

The sasKey only contains the SAS token, e.g. ?sv=2020-08-0&ss=bfqt&srt=co&sp=rwdlacupitfx&se=2022-05-26T21:55:35Z&st=2022-05.... The leading question mark (?) is optional. The query values from the sasKey data field in the Secret are merged with the ones in the .spec.endpoint of the Bucket. If the same key is present in both, the value in the sasKey takes precedence.

Note: The SAS token has an expiry date, and it must be updated before it expires to allow Flux to continue to access Azure Storage. It is allowed to use an account-level or container-level SAS token.

The minimum permissions for an account-level SAS token are:

Allowed services: Blob
Allowed resource types: Container, Object
Allowed permissions: Read, List

The minimum permissions for a container-level SAS token are:

Allowed permissions: Read, List

Refer to the Azure documentation for a full overview on permissions.

GCP

When a Bucket's .spec.provider is set to gcp, the source-controller will attempt to communicate with the specified Endpoint using the Google Client SDK.

Without a Secret reference, authorization using a workload identity is attempted by default. The workload identity is obtained using the GOOGLE_APPLICATION_CREDENTIALS environment variable, falling back to the Google Application Credential file in the config directory. When a reference is specified, it expects a Secret with a .data.serviceaccount value with a GCP service account JSON file.

The Provider allows for specifying the Bucket location using the .spec.region field.

GCP example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: gcp-workload-identity
  namespace: default
spec:
  interval: 5m0s
  provider: gcp
  bucketName: podinfo
  endpoint: storage.googleapis.com
  region: us-east1
  timeout: 30s

GCP static auth example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: gcp-secret
  namespace: default
spec:
  interval: 5m0s
  provider: gcp
  bucketName: <bucket-name>
  endpoint: storage.googleapis.com
  region: <bucket-region>
  secretRef:
    name: gcp-service-account
---
apiVersion: v1
kind: Secret
metadata:
  name: gcp-service-account
  namespace: default
type: Opaque
data:
  serviceaccount: <BASE64>

Where the (base64 decoded) value of .data.serviceaccount looks like this:

{
  "type": "service_account",
  "project_id": "example",
  "private_key_id": "28qwgh3gdf5hj3gb5fj3gsu5yfgh34f45324568hy2",
  "private_key": "-----BEGIN PRIVATE KEY-----\nHwethgy123hugghhhbdcu6356dgyjhsvgvGFDHYgcdjbvcdhbsx63c\n76tgycfehuhVGTFYfw6t7ydgyVgydheyhuggycuhejwy6t35fthyuhegvcetf\nTFUHGTygghubhxe65ygt6tgyedgy326hucyvsuhbhcvcsjhcsjhcsvgdtHFCGi\nHcye6tyyg3gfyuhchcsbhygcijdbhyyTF66tuhcevuhdcbhuhhvftcuhbh3uh7t6y\nggvftUHbh6t5rfthhuGVRtfjhbfcrd5r67yuhuvgFTYjgvtfyghbfcdrhyjhbfctfdfyhvfg\ntgvggtfyghvft6tugvTF5r66tujhgvfrtyhhgfct6y7ytfr5ctvghbhhvtghhjvcttfycf\nffxfghjbvgcgyt67ujbgvctfyhVC7uhvgcyjvhhjvyujc\ncgghgvgcfhgg765454tcfthhgftyhhvvyvvffgfryyu77reredswfthhgfcftycfdrttfhf/\n-----END PRIVATE KEY-----\n",
  "client_email": "test@example.iam.gserviceaccount.com",
  "client_id": "32657634678762536746",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%40podinfo.iam.gserviceaccount.com"
}

Interval

.spec.interval is a required field that specifies the interval at which the object storage bucket must be consulted.

After successfully reconciling a Bucket object, the source-controller requeues the object for inspection after the specified interval. The value must be in a Go recognized duration string format, e.g. 10m0s to look at the object storage bucket every 10 minutes.

If the .metadata.generation of a resource changes (due to e.g. the apply of a change to the spec), this is handled instantly outside the interval window.

Note: The controller can be configured to apply a jitter to the interval in order to distribute the load more evenly when multiple Bucket objects are set up with the same interval. For more information, please refer to the source-controller configuration options.

Endpoint

.spec.endpoint is a required field that specifies the HTTP/S object storage endpoint to connect to and fetch objects from. Connecting to an (insecure) HTTP endpoint requires enabling .spec.insecure.

Some endpoints require the specification of a .spec.region, see Provider for more (provider specific) examples.

STS

.spec.sts is an optional field for specifying the Security Token Service configuration. A Security Token Service (STS) is a web service that issues temporary security credentials. By adding this field, one may specify the STS endpoint from where temporary credentials will be fetched.

This field is only supported for the aws and generic bucket providers.

If using .spec.sts, the following fields are required:

.spec.sts.provider, the Security Token Service provider. The only supported option for the generic bucket provider is ldap. The only supported option for the aws bucket provider is aws.
.spec.sts.endpoint, the HTTP/S endpoint of the Security Token Service. In the case of aws this can be https://sts.amazonaws.com, or a Regional STS Endpoint, or an Interface Endpoint created inside a VPC. In the case of ldap this must be the LDAP server endpoint.

When using the ldap provider, the following fields may also be specified:

.spec.sts.secretRef.name, the name of the Secret containing the LDAP credentials. The Secret must contain the following keys:
- username, the username to authenticate with.
- password, the password to authenticate with.
.spec.sts.certSecretRef.name, the name of the Secret containing the TLS configuration for communicating with the STS endpoint. The contents of this Secret must follow the same structure as .spec.certSecretRef.name.

If .spec.proxySecretRef.name is specified, the proxy configuration will be used for communicating with the STS endpoint.

Example for the ldap provider:

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: example
  namespace: example
spec:
  interval: 5m
  bucketName: example
  provider: generic
  endpoint: minio.example.com
  sts:
    provider: ldap
    endpoint: https://ldap.example.com
    secretRef:
      name: ldap-credentials
    certSecretRef:
      name: ldap-tls
---
apiVersion: v1
kind: Secret
metadata:
  name: ldap-credentials
  namespace: example
type: Opaque
stringData:
  username: <username>
  password: <password>
---
apiVersion: v1
kind: Secret
metadata:
  name: ldap-tls
  namespace: example
type: kubernetes.io/tls # or Opaque
stringData:
  tls.crt: <PEM-encoded cert>
  tls.key: <PEM-encoded key>
  ca.crt: <PEM-encoded cert>
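
For comparison, a Bucket using the aws STS provider might look like the following sketch. It assumes the global https://sts.amazonaws.com endpoint mentioned above and reuses the S3 endpoint and region from the AWS examples. The secretRef and certSecretRef fields under .spec.sts are left out, since they are described only for the ldap provider.

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: example
  namespace: example
spec:
  interval: 5m
  bucketName: example
  provider: aws
  endpoint: s3.amazonaws.com
  region: us-east-1
  sts:
    provider: aws
    # Could also be a Regional STS Endpoint or a VPC Interface Endpoint.
    endpoint: https://sts.amazonaws.com
```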

Bucket name

.spec.bucketName is a required field that specifies which object storage bucket on the Endpoint objects should be fetched from.

See Provider for more (provider specific) examples.

Region

.spec.region is an optional field to specify the region a .spec.bucketName is located in.

See Provider for more (provider specific) examples.

Cert secret reference

.spec.certSecretRef.name is an optional field to specify a secret containing TLS certificate data. The secret can contain the following keys:

tls.crt and tls.key, to specify the client certificate and private key used for TLS client authentication. These must be used in conjunction, i.e. specifying one without the other will lead to an error.
ca.crt, to specify the CA certificate used to verify the server, which is required if the server is using a self-signed certificate.

If the server is using a self-signed certificate and has TLS client authentication enabled, all three values are required.

The Secret should be of type Opaque or kubernetes.io/tls. All the files in the Secret are expected to be PEM-encoded. Assuming you have three files, client.key, client.crt and ca.crt, for the client private key, client certificate and CA certificate respectively, you can generate the required Secret using the flux create secret tls command:

flux create secret tls minio-tls --tls-key-file=client.key --tls-crt-file=client.crt --ca-crt-file=ca.crt

If TLS client authentication is not required, you can generate the secret with:

flux create secret tls minio-tls --ca-crt-file=ca.crt

This API is only supported for the generic provider.

Example usage:

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: example
  namespace: example
spec:
  interval: 5m
  bucketName: example
  provider: generic
  endpoint: minio.example.com
  certSecretRef:
    name: minio-tls
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-tls
  namespace: example
type: kubernetes.io/tls # or Opaque
stringData:
  tls.crt: <PEM-encoded cert>
  tls.key: <PEM-encoded key>
  ca.crt: <PEM-encoded cert>

Proxy secret reference

.spec.proxySecretRef.name is an optional field used to specify the name of a Secret that contains the proxy settings for the object. These settings are used for all the remote operations related to the Bucket. The Secret can contain three keys:

address, to specify the address of the proxy server. This is a required key.
username, to specify the username to use if the proxy server is protected by basic authentication. This is an optional key.
password, to specify the password to use if the proxy server is protected by basic authentication. This is an optional key.

Example:

---
apiVersion: v1
kind: Secret
metadata:
  name: http-proxy
type: Opaque
stringData:
  address: http://proxy.com
  username: mandalorian
  password: grogu

Proxying can also be configured in the source-controller Deployment directly by using the standard environment variables such as HTTPS_PROXY, ALL_PROXY, etc.

.spec.proxySecretRef.name takes precedence over all environment variables.
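
To tie the pieces together, a Bucket can point at the proxy Secret as in the sketch below. It assumes the http-proxy Secret from the example above lives in the same namespace as the Bucket, and it reuses the hypothetical Minio endpoint and credentials Secret names from the earlier examples.

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: example
  namespace: example
spec:
  interval: 5m
  bucketName: example
  provider: generic
  endpoint: minio.example.com
  secretRef:
    name: minio-credentials
  proxySecretRef:
    name: http-proxy   # assumed to exist in the example namespace
```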

Insecure

.spec.insecure is an optional field to allow connecting to an insecure (HTTP) endpoint, if set to true. The default value is false, denying insecure (HTTP) connections.

Timeout

.spec.timeout is an optional field to specify a timeout for object storage fetch operations. The value must be in a Go recognized duration string format, e.g. 1m30s for a timeout of one minute and thirty seconds. The default value is 60s.

Secret reference

.spec.secretRef.name is an optional field to specify a name reference to a Secret in the same namespace as the Bucket, containing authentication credentials for the object storage. For some .spec.provider implementations the presence of the field is required, see Provider for more details and examples.

Prefix

.spec.prefix is an optional field to enable server-side filtering of files in the Bucket.

Note: The server-side filtering works only with the generic, aws and gcp providers and is preferred over .spec.ignore as a more efficient way of excluding files.
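
As an illustration, the following sketch limits fetching to objects whose keys start with a hypothetical deploy/ prefix. The bucket name, endpoint and Secret are reused from the generic example; only .spec.prefix is new, and the metadata name is made up for this example.

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: generic-prefix   # hypothetical name
  namespace: default
spec:
  provider: generic
  interval: 5m0s
  bucketName: podinfo
  endpoint: minio.minio.svc.cluster.local:9000
  insecure: true
  prefix: deploy/   # hypothetical key prefix, filtered on the server side
  secretRef:
    name: minio-credentials
```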

Ignore

.spec.ignore is an optional field to specify rules in the .gitignore pattern format. Storage objects whose keys match the defined rules are excluded while fetching.

When specified, .spec.ignore overrides the default exclusion list, and may overrule the .sourceignore file exclusions. See excluding files for more information.

Suspend

.spec.suspend is an optional field to suspend the reconciliation of a Bucket. When set to true, the controller will stop reconciling the Bucket, and changes to the resource or in the object storage bucket will not result in a new Artifact. When the field is set to false or removed, it will resume.

For practical information, see suspending and resuming.

Working with Buckets

Excluding files

By default, storage bucket objects which match the default exclusion rules are excluded while fetching. It is possible to overwrite and/or overrule the default exclusions using a file in the bucket and/or an in-spec set of rules.

`.sourceignore` file

Excluding files is possible by adding a .sourceignore file in the root of the object storage bucket. The .sourceignore file follows the .gitignore pattern format, and pattern entries may overrule default exclusions.

Ignore spec

Another option is to define the exclusions within the Bucket spec, using the .spec.ignore field. Specified rules override the default exclusion list, and may overrule .sourceignore file exclusions.

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
spec:
  ignore: |
    # exclude all
    /*
    # include deploy dir
    !/deploy
    # exclude file extensions from deploy dir
    /deploy/**/*.md
    /deploy/**/*.txt

Triggering a reconcile

To manually tell the source-controller to reconcile a Bucket outside the specified interval window, a Bucket can be annotated with reconcile.fluxcd.io/requestedAt: <arbitrary-value>. Annotating the resource queues the Bucket for reconciliation if the <arbitrary-value> differs from the last value the controller acted on, as reported in .status.lastHandledReconcileAt.

Using kubectl:

kubectl annotate --field-manager=flux-client-side-apply --overwrite  bucket/<bucket-name> reconcile.fluxcd.io/requestedAt="$(date +%s)"

Using flux:

flux reconcile source bucket <bucket-name>
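
For reference, once the annotation has been applied and handled, the relevant parts of the object might look like the sketch below. The value shown is an arbitrary illustration (what date +%s could print at some moment); the controller only compares it with the last value it acted on.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
  annotations:
    reconcile.fluxcd.io/requestedAt: "1706831018"   # arbitrary value
status:
  lastHandledReconcileAt: "1706831018"   # matches the annotation once handled
```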

Waiting for `Ready`

When a change is applied, it is possible to wait for the Bucket to reach a ready state using kubectl:

kubectl wait bucket/<bucket-name> --for=condition=ready --timeout=1m

Suspending and resuming

When you find yourself in a situation where you temporarily want to pause the reconciliation of a Bucket, you can suspend it using the .spec.suspend field.

Suspend a Bucket

In your YAML declaration:

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
spec:
  suspend: true

Using kubectl:

kubectl patch bucket <bucket-name> --field-manager=flux-client-side-apply --type=merge -p '{"spec": {"suspend": true}}'

Using flux:

flux suspend source bucket <bucket-name>

Note: When a Bucket has an Artifact and is suspended, and this Artifact later disappears from the storage due to e.g. the source-controller Pod being evicted from a Node, this will not be reflected in the Bucket's Status until it is resumed.

Resume a Bucket

In your YAML declaration, comment out (or remove) the field:

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
spec:
  # suspend: true

Note: Setting the field value to false has the same effect as removing it, but does not allow for "hot patching" using e.g. kubectl while practicing GitOps; as the manually applied patch would be overwritten by the declared state in Git.

Using kubectl:

kubectl patch bucket <bucket-name> --field-manager=flux-client-side-apply --type=merge -p '{"spec": {"suspend": false}}'

Using flux:

flux resume source bucket <bucket-name>

Debugging a Bucket

There are several ways to gather information about a Bucket for debugging purposes.

Describe the Bucket

Describing a Bucket using kubectl describe bucket <bucket-name> displays the latest recorded information for the resource in the Status and Events sections:

...
Status:
...
  Conditions:
    Last Transition Time:  2024-02-02T13:26:55Z
    Message:               processing object: new generation 1 -> 2
    Observed Generation:   2
    Reason:                ProgressingWithRetry
    Status:                True
    Type:                  Reconciling
    Last Transition Time:  2024-02-02T13:26:55Z
    Message:               bucket 'my-new-bucket' does not exist
    Observed Generation:   2
    Reason:                BucketOperationFailed
    Status:                False
    Type:                  Ready
    Last Transition Time:  2024-02-02T13:26:55Z
    Message:               bucket 'my-new-bucket' does not exist
    Observed Generation:   2
    Reason:                BucketOperationFailed
    Status:                True
    Type:                  FetchFailed
  Observed Generation:     1
  URL:                     http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/latest.tar.gz
Events:
  Type     Reason                      Age                 From               Message
  ----     ------                      ----                ----               -------
  Warning  BucketOperationFailed       37s (x11 over 42s)  source-controller  bucket 'my-new-bucket' does not exist

Trace emitted Events

To view the Events for a specific Bucket, use kubectl events with the --for flag. For example, running

kubectl events --for Bucket/<bucket-name>

lists

LAST SEEN   TYPE      REASON                       OBJECT                 MESSAGE
2m30s       Normal    NewArtifact                  bucket/<bucket-name>   fetched 16 files with revision from 'my-new-bucket'
36s         Normal    ArtifactUpToDate             bucket/<bucket-name>   artifact up-to-date with remote revision: 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
18s         Warning   BucketOperationFailed        bucket/<bucket-name>   bucket 'my-new-bucket' does not exist

Besides being reported in Events, reconciliation errors are also logged by the controller. The Flux CLI offers commands for filtering the logs for a specific Bucket, e.g. flux logs --level=error --kind=Bucket --name=<bucket-name>.

Bucket Status

Artifact

The Bucket reports the latest synchronized state from the object storage bucket as an Artifact object in the .status.artifact of the resource.

The Artifact file is a gzip compressed TAR archive (<calculated revision>.tar.gz), and can be retrieved in-cluster from the .status.artifact.url HTTP address.

Artifact example

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
status:
  artifact:
    digest: sha256:cbec34947cc2f36dee8adcdd12ee62ca6a8a36699fc6e56f6220385ad5bd421a
    lastUpdateTime: "2024-01-28T10:30:30Z"
    path: bucket/<namespace>/<bucket-name>/c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2.tar.gz
    revision: sha256:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2
    size: 38099
    url: http://source-controller.<namespace>.svc.cluster.local./bucket/<namespace>/<bucket-name>/c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2.tar.gz

Default exclusions

The following files and extensions are excluded from the Artifact by default:

Git files (.git/, .gitignore, .gitmodules, .gitattributes)
File extensions (.jpg, .jpeg, .gif, .png, .wmv, .flv, .tar.gz, .zip)
CI configs (.github/, .circleci/, .travis.yml, .gitlab-ci.yml, appveyor.yml, .drone.yml, cloudbuild.yaml, codeship-services.yml, codeship-steps.yml)
CLI configs (.goreleaser.yml, .sops.yaml)
Flux v1 config (.flux.yaml)

To define your own exclusion rules, see excluding files.
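
To get a rough idea of which object keys the default exclusions would drop, the list above can be approximated in Python. This is only a sketch: the function name is hypothetical, it matches plain names and suffixes, and the controller's own matching may differ in details such as case sensitivity.

```python
from pathlib import PurePosixPath

# Names and extensions copied from the default exclusion list above.
EXCLUDED_DIRS = {".git", ".github", ".circleci"}
EXCLUDED_FILES = {
    ".gitignore", ".gitmodules", ".gitattributes",
    ".travis.yml", ".gitlab-ci.yml", "appveyor.yml", ".drone.yml",
    "cloudbuild.yaml", "codeship-services.yml", "codeship-steps.yml",
    ".goreleaser.yml", ".sops.yaml", ".flux.yaml",
}
EXCLUDED_EXTENSIONS = (
    ".jpg", ".jpeg", ".gif", ".png", ".wmv", ".flv", ".tar.gz", ".zip",
)


def is_excluded_by_default(key: str) -> bool:
    """Approximate whether an object key falls under the default exclusions."""
    path = PurePosixPath(key)
    # A key below a directory with an excluded name is excluded.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return True
    if path.name in EXCLUDED_FILES:
        return True
    # str.endswith accepts a tuple of suffixes.
    return path.name.endswith(EXCLUDED_EXTENSIONS)
```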

Conditions

A Bucket enters various states during its lifecycle, reflected as Kubernetes Conditions. It can be reconciling while fetching storage objects, it can be ready, or it can fail during reconciliation.

The Bucket API is compatible with the kstatus specification, and reports Reconciling and Stalled conditions where applicable to provide better (timeout) support to solutions polling the Bucket to become Ready.

Reconciling Bucket

The source-controller marks a Bucket as reconciling when one of the following is true:

There is no current Artifact for the Bucket, or the reported Artifact is determined to have disappeared from the storage.
The generation of the Bucket is newer than the Observed Generation.
The newly calculated Artifact revision differs from the current Artifact.

While the Bucket is "reconciling", the Ready Condition status becomes Unknown if the controller detects drift, and the controller adds a Condition with the following attributes to the Bucket's .status.conditions:

type: Reconciling
status: "True"
reason: Progressing | reason: ProgressingWithRetry

If the reconciling state is due to a new revision, an additional Condition is added with the following attributes:

type: ArtifactOutdated
status: "True"
reason: NewRevision

Both Conditions have a "negative polarity", and are only present on the Bucket while their status value is "True".

Ready Bucket

The source-controller marks a Bucket as ready when it has the following characteristics:

The Bucket reports an Artifact.
The reported Artifact exists in the controller's Artifact storage.
The Bucket was able to communicate with the Bucket's object storage endpoint using the current spec.
The revision of the reported Artifact is up-to-date with the latest calculated revision of the object storage bucket.

When the Bucket is "ready", the controller sets a Condition with the following attributes in the Bucket's .status.conditions:

type: Ready
status: "True"
reason: Succeeded

This Ready Condition will retain a status value of "True" until the Bucket is marked as reconciling, or e.g. a transient error occurs due to a temporary network issue.

When the Bucket Artifact is archived in the controller's Artifact storage, the controller sets a Condition with the following attributes in the Bucket's .status.conditions:

type: ArtifactInStorage
status: "True"
reason: Succeeded

This ArtifactInStorage Condition will retain a status value of "True" until the Artifact in the storage no longer exists.

Failed Bucket

The source-controller may get stuck trying to produce an Artifact for a Bucket without completing. This can occur due to some of the following factors:

The object storage Endpoint is temporarily unavailable.
The specified object storage bucket does not exist.
The Secret reference contains a reference to a non-existing Secret.
The credentials in the referenced Secret are invalid.
The Bucket spec contains a generic misconfiguration.
A storage-related failure occurs while storing the Artifact.

When this happens, the controller sets the Ready Condition status to False, and adds a Condition with the following attributes to the Bucket's .status.conditions:

type: FetchFailed | type: StorageOperationFailed
status: "True"
reason: AuthenticationFailed | reason: BucketOperationFailed

This condition has a "negative polarity", and is only present on the Bucket while the status value is "True". The reason field may also take other values that describe the cause of the condition more precisely.

While the Bucket has this Condition, the controller will continue to attempt to produce an Artifact for the resource with an exponential backoff, until it succeeds and the Bucket is marked as ready.

Note that a Bucket can be reconciling while failing at the same time, for example due to a newly introduced configuration issue in the Bucket spec. When a reconciliation fails, the Reconciling Condition reason would be ProgressingWithRetry. When the reconciliation is performed again after the failure, the reason is updated to Progressing.
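
The condition semantics described in this section can be summarized in a small Python sketch. It assumes the Bucket's .status.conditions list has been loaded as a list of dicts (for example from kubectl get bucket <bucket-name> -o json); the function names and state labels are hypothetical.

```python
def find_condition(conditions, cond_type):
    """Return the condition dict with the given type, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def is_true(conditions, cond_type):
    """True when a condition of this type is present with status "True"."""
    cond = find_condition(conditions, cond_type)
    return cond is not None and cond.get("status") == "True"


def summarize(conditions):
    """Summarize a Bucket's conditions as a set of state labels."""
    states = set()
    if is_true(conditions, "Ready"):
        states.add("ready")
    if is_true(conditions, "Reconciling"):
        states.add("reconciling")
    # FetchFailed and StorageOperationFailed are the failure types named above.
    if is_true(conditions, "FetchFailed") or is_true(
        conditions, "StorageOperationFailed"
    ):
        states.add("failed")
    return states
```

Returning a set rather than a single label reflects that a Bucket can be reconciling and failing at the same time.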

Observed Ignore

The source-controller reports an observed ignore in the Bucket's .status.observedIgnore. The observed ignore is the latest .spec.ignore value that resulted in either a ready state, or a stalled state due to an error the controller cannot recover from without human intervention. The value is the same as the ignore in the spec, and indicates the ignore rules used to build the current Artifact in storage.

Example:

status:
  ...
  observedIgnore: |
    hpa.yaml
    build
  ...

Observed Generation

The source-controller reports an observed generation in the Bucket's .status.observedGeneration. The observed generation is the latest .metadata.generation that resulted in either a ready state, or a stalled state due to an error the controller cannot recover from without human intervention.
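
A client polling a Bucket can compare the two generation values to tell whether the status it reads reflects the latest spec. A minimal Python sketch, assuming the Bucket object has been loaded into a dict (for example from kubectl get bucket <bucket-name> -o json); the function name is hypothetical:

```python
def spec_is_observed(bucket: dict) -> bool:
    """True when .status.observedGeneration matches .metadata.generation."""
    generation = bucket.get("metadata", {}).get("generation")
    observed = bucket.get("status", {}).get("observedGeneration")
    # A missing generation never counts as observed.
    return generation is not None and generation == observed
```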

Last Handled Reconcile At

The source-controller reports the last reconcile.fluxcd.io/requestedAt annotation value it acted on in the .status.lastHandledReconcileAt field.

For practical information about this field, see triggering a reconcile.
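
As a sketch of how a client might use this field, the following Python function reports whether a reconcile requested through the annotation is still waiting to be handled. It assumes the Bucket object has been loaded into a dict; the function name is hypothetical.

```python
REQUESTED_AT = "reconcile.fluxcd.io/requestedAt"


def reconcile_request_pending(bucket: dict) -> bool:
    """True when the requestedAt annotation is set but not yet in status."""
    annotations = bucket.get("metadata", {}).get("annotations") or {}
    requested = annotations.get(REQUESTED_AT)
    if requested is None:
        # No reconcile was requested through the annotation.
        return False
    handled = bucket.get("status", {}).get("lastHandledReconcileAt")
    return handled != requested
```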
