OCPNODE-2205: Lazy image pull support #1600

Open · wants to merge 1 commit into master
Conversation

@harche (Contributor) commented Mar 18, 2024

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2024
@openshift-ci openshift-ci bot requested review from russellb and sdodson March 18, 2024 19:53
@harche harche changed the title WIP: Lazy image pull support OCPNODE-2205: Lazy image pull support Mar 18, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 18, 2024
@openshift-ci-robot commented Mar 18, 2024

@harche: This pull request references OCPNODE-2205 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

xref : https://issues.redhat.com/browse/OCPNODE-2205

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2024
@harche (Contributor Author) commented Mar 18, 2024

/cc @sairameshv

# Non-Goals

* Enabling lazy image pulling on the control plane nodes.
* Replacement of the standard OCI image format; `eStargz` will be an additional supported format.
Member:

our tools do not support eStargz. We are not able to produce this format.

We can use the zstd:chunked format.

Same comment applies in other places in this patch

Contributor Author:

Updated. Thanks.

Reviewer:

Stargz Store performs lazy pulling if the image is eStargz or zstd:chunked, and CRI-O performs lazy pulling for them when Stargz Store is enabled. So I think the formats don't need to be limited to zstd:chunked; eStargz can still be listed as one of the lazy-pullable formats.

Contributor Author:

@ktock To begin with, only the zstd:chunked format will be officially supported by OCP. That is, we will not technically prevent anyone from using any other format, but official support will extend, at least initially, only to the zstd:chunked format.

name: cluster
spec:
lazyImagePools:
- "serverless-custom-pool1"
Contributor:

I think we should have the type in addition to the pool name. Just in case we want to change the type of the lazy image pull later.

[storage.options]
additionallayerstores = ["/var/lib/stargz-store/store:ref"]
```
As of now, that `c/storage` configuration file does not support drop-in config. The entire content of that file is [hard-coded in MCO](https://github.com/openshift/machine-config-operator/blob/release-4.16/templates/common/_base/files/container-storage.yaml). Without the drop-in config support, we risk overriding that critical file and potentially losing custom configuration that might be already present in there. A Request for adding drop-in config support for applying similar custom configurations to `c/storage` [already exists](https://github.com/containers/storage/issues/1306). Hence, this proposal advocates adding support for drop-in config in `c/storage`.
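
If `c/storage` gained drop-in support, MCO could ship only the fragment it owns instead of templating the whole file. A hypothetical sketch (the `storage.conf.d` directory name and merge semantics are assumptions; no such mechanism exists in `c/storage` today):

```toml
# /etc/containers/storage.conf.d/50-stargz-store.conf  (hypothetical drop-in)
# Merged over the base /etc/containers/storage.conf managed by MCO, so the
# additional layer store can be enabled without rewriting the whole file.
[storage.options]
additionallayerstores = ["/var/lib/stargz-store/store:ref"]
```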

Reviewer:

We would need this for KEP-4191 also, since we are going to modify the image store.


* **Faster Serverless:** Serverless endpoints need to serve cold requests as fast as possible. Lazy-pulling of images allows a serverless endpoint to start serving user requests without being blocked on the entire container image being downloaded to the host.
* **Improved User Experience:** Large images often lead to extended pod startup times. Lazy-pulling drastically reduces the initial delay, providing a smoother user experience.
* **Potential Benefits for Large Images:** If a container image is shared between various containers that each use only part of the image during execution, lazy image pulling speeds up container startup by downloading only the relevant bits of the image.

Reviewer:

It would be worth exploring what types of images would not work with stargz, and how we can deduce that.

i.e. can we understand why stargz is not working for a given image, and what kind of logs we can look at to determine that, for support/operational purposes?

Contributor Author:

That snapshotter supports the zstd:chunked and eStargz formats. OCP will support the zstd:chunked format only. You can enable debugging in /etc/containers/storage.conf to get more verbose output of the image handling.

@kannon92 commented Mar 29, 2024:

That isn't really answering my question.

Potential Benefits for Large Images: If the container image is shared between various containers which utilize only part of the image during their execution, lazy image pulling speeds up such container startup by downloading only the relevant bits from those images.

This makes it sound like the benefits of lazy pulling are not guaranteed. So I wanted to know how we could tell whether it is working or not, from a support and a user perspective.

@kannon92 commented Mar 29, 2024:

Will it always work if someone uses zstd:chunked for their containers? And in all cases, is that all that is needed?

Are there cases where all the bits would need to be downloaded for a container to start?

Contributor Author:

If the Stargz Store plugin is not running on the host, then images will not get lazy-pulled; i.e. container start times will be like those of regular container images.

If the plugin is running and users suspect a defect in it, they can look at its unit logs for details.

You can use an image in zstd:chunked format, but without the plugin the lazy part of image pulling won't work, so container startup times will be comparable to those of regular container images.

Reviewer:

I'm mostly asking because the language makes it sound like this isn't guaranteed.

I think that if the stargz service plugin is enabled, images will get lazy-pulled.

I would maybe just drop the "Potential".

But adding some logs on debugging this would be useful for docs/feature enablement.

Contributor:

Basically lazy pulling is trading throughput for latency — startup might be faster, but all operations are a bit slower due to the indirection required for the “is the file available already, or do I need to download it” checks.

So, the answer to “what types of images don’t benefit” from lazy pulling is “all well-designed ones” (when throughput matters): if all files in the image are actually necessary, and consumed over the lifetime of the application, then it is faster to pull and to use the image using an ordinary pull, rather than the FUSE slowly-download-files-on-demand lazy approach.

Lazy pulling should be beneficial only if many files in the container are not actually necessary for processing the typical request (and if the files that are necessary are so few that the per-file request latency is not overwhelming). I’m sure that happens frequently in practice, but it is quite likely an indication of something that could be fixed in the image.


(There are other benefits to zstd:chunked, like possibly sharing files across container images / layers, but those don’t depend on lazy pulling.)
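
The tradeoff described above can be sketched with a toy model; all numbers below are hypothetical, chosen only to illustrate the break-even behavior, not taken from this discussion:

```python
# Back-of-envelope model of the throughput-vs-latency tradeoff described
# above. All numbers are illustrative, not measurements from this thread.

GIB = 1024 ** 3
MIB = 1024 ** 2

def eager_start_s(image_bytes: float, bandwidth_bps: float) -> float:
    """Startup cost when the whole image is pulled before the container starts."""
    return image_bytes / bandwidth_bps

def lazy_start_s(files_used: int, bytes_used: float,
                 bandwidth_bps: float, per_file_latency_s: float) -> float:
    """Startup cost when only the files actually touched are fetched on
    demand, paying a per-file round-trip/FUSE latency for each one."""
    return files_used * per_file_latency_s + bytes_used / bandwidth_bps

bw = 12.5e6  # ~100 Mbit/s, expressed in bytes per second

# A sparse image: 2 GiB total, but only 200 files / 100 MiB needed at startup.
sparse = lazy_start_s(200, 100 * MIB, bw, 0.05)   # ~18 s
full = eager_start_s(2 * GIB, bw)                 # ~172 s

# A "well-designed" image where nearly everything is used: the per-file
# latency overwhelms any savings from skipping the up-front download.
dense = lazy_start_s(40_000, 2 * GIB, bw, 0.05)   # ~2172 s

print(f"eager: {full:.0f}s, lazy/sparse: {sparse:.0f}s, lazy/dense: {dense:.0f}s")
```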

Contributor Author:

Basically lazy pulling is trading throughput for latency

stargz-snapshotter will continue to download the rest of the image in the background.

Contributor:

The FUSE overhead is paid during every single filesystem operation during the container runtime, isn’t it? For a long-running container which is not running purely out of memory (e.g. a single statically linked binary) it will eventually overwhelm any possible savings.

@harche (Contributor Author) commented Mar 28, 2024

/test markdownlint

* Add support for drop-in config in [container storage config](https://github.com/containers/storage/blob/main/pkg/config/config.go)
* Package upstream [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) for the installation on RHCOS
* Enable lazy image pulling in Openshift using [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) with [zstd:chunked](https://github.com/containers/storage/pull/775) image format.
* Support lazy image pulling only on the worker nodes (or subset of worker nodes).
Member:

(why?)

Contributor Author:

For the dev preview phase we thought it would be better to limit the usage to worker nodes, because we still have an important pending issue of downloaded chunks not being verified by the backing c/storage library. @giuseppe, @ktock and @mtrmac are working on addressing that - containers/storage#795 (comment)

We will block on that before graduating this to TechPreview.

Contributor:

FWIW broken layer integrity enforcement is a security blocker.

I don’t know that it’s appropriate to ship to customers in any channel; at the very least it would require a well-advertised caveat that customers certainly see before actively opting in.

Contributor:

(Also, at the very least I am not actively working on that, I don’t know about the others listed.

As a blocker for this enhancement it probably should be tracked in more salient place than a closed upstream PR.)


* Add support for drop-in config in [container storage config](https://github.com/containers/storage/blob/main/pkg/config/config.go)
* Package upstream [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) for the installation on RHCOS
* Enable lazy image pulling in Openshift using [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) with [zstd:chunked](https://github.com/containers/storage/pull/775) image format.
Member:

I am curious how "battle tested" this is. I guess we'll find out in testing, but the amount of new moving parts in the code here leaves me with the instinct that this will require a maintainer.

Contributor Author:

I am curious how "battle tested" this is. I guess we'll find out in testing

Initial tests in OCP are promising. We are also proposing, in this EP, to add an OpenShift e2e job around lazy image pulling.

Contributor:

I agree, the plugin will certainly require a knowledgeable maintainer able to deal with customer support escalations. Even if there were no bugs in it (which I doubt), expertise would still be required to allow post-mortem analysis of broken systems.

That’s a non-trivial ongoing cost.

# Non-Goals

* Enabling lazy image pulling on the control plane nodes.
* Replacement of the standard OCI image format; `zstd:chunked` will be an additional supported format.
Member:

I know what you're trying to say here but it sounds like it's implying zstd:chunked is separate from OCI; as you know it's just an alternative compression scheme with metadata, but readers may not. How about

Replacement of the standard OCI image format; zstd:chunked is a variant of zstd compressed layers in OCI, which is supported by podman and moby

Contributor Author:

Sounds great, thanks. Updated.

lazyImagePull:
enabled: true
```
* Optionally, a list of the supported image formats for lazy pulling can also be specified. At the time of writing, OpenShift will only support the `zstd:chunked` image format. If the user doesn't explicitly specify the image format, the default value is set to `zstd:chunked`.
Member:

Since they can only specify one thing, why would this be mentioned in the config?

Contributor Author:

Currently the stargz-snapshotter supports the eStargz and zstd:chunked formats. We are going to support only the zstd:chunked format to begin with. But in the future, we may support other snapshotters with their own image formats, e.g. https://github.com/awslabs/soci-snapshotter

So we are just trying to make sure that in the future we are able to support more snapshotters and formats (if required).

/cc @rphillips
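
Putting the fragments in this thread together, the API shape under discussion might look roughly like this. This is a sketch only: the `lazyImagePull` and `format` field names come from the proposal text, while the surrounding `ContainerRuntimeConfig` wrapper and pool selector are assumed from how existing container runtime settings are applied:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-lazy-image-pull
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    lazyImagePull:
      enabled: true
      format:
        - "zstd:chunked"
```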


[Service]
Type=notify
Environment=HOME=/home/core
Member:

Why? That's bizarre.

Contributor Author:

Thanks, dropped it.

```toml=
[Unit]
Description=Stargz Store plugin
After=network.target
Member:

This means very little and I doubt it's necessary https://systemd.io/NETWORK_ONLINE/

Contributor Author:

Thanks, dropped it.

RestartSec=1

[Install]
WantedBy=multi-user.target
Member:

For us actually, WantedBy=crio.service or kubelet-dependencies.target.

Contributor Author:

Thanks, updated to crio.service

[Service]
Type=notify
Environment=HOME=/home/core
ExecStart=/usr/local/bin/stargz-store --log-level=debug --config=/etc/stargz-store/config.toml /var/lib/stargz-store/store
Member:

Right so here's where I think what we're doing in splitting binaries in CoreOS and configuration in MCO makes little sense.

I think if this is packaged as an RPM then this systemd unit should live in that RPM and not in the MCO.

And most importantly, it's ExecStart=/usr/bin/stargz-store.

(/usr/local is machine local state on CoreOS)

Contributor Author:

Updated ExecStart=/usr/bin/stargz-store, thanks.

If we ship the systemd unit with the RPM, is there a way to enable/disable the service using MCO? We would want this service only if the user explicitly wants to use lazy image pulling. Otherwise, there is no reason to have that service running by default.
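
For reference, folding the review feedback into the earlier unit file gives roughly the following sketch (the `Restart=` policy is an assumption; the other values come from the snippets discussed above):

```ini
# stargz-store.service (sketch; would ship in the RPM rather than via MCO)
[Unit]
Description=Stargz Store plugin

[Service]
Type=notify
ExecStart=/usr/bin/stargz-store --log-level=debug --config=/etc/stargz-store/config.toml /var/lib/stargz-store/store
Restart=on-failure
RestartSec=1

[Install]
WantedBy=crio.service
```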


#### Resource Consumption with FUSE

The stargz-snapshotter employs a FUSE filesystem, which can lead to increased resource usage on the node during intensive local disk I/O operations. It’s important to note that this does not affect containers performing heavy I/O with local volume mounts, as they will not experience performance degradation.
Contributor Author:

Thanks for that pointer, added a remark in that section about it at the end.


* Enabling lazy image pulling on the control plane nodes.
* Replacement of the standard OCI image format; `zstd:chunked` will be an additional supported format.
* Alterations to the image build process; developers can use standard OCI build tools, such as buildah to build and convert images to `zstd:chunked` format.
Member:
Yes. But...I think a notable callout here is going to need to be the docs work to describe how to do that.

In general it should be highlighted here more obviously that because zstd:chunked isn't the default, in reality customers will need to enable it in their image builds for their own images, and if they want the benefit for our images, they'll need to re-compress (which is probably not very viable).

Contributor Author:

Good idea, I will add a documentation related goal so that we won't miss it.

Contributor:

and if they want the benefit for our images, they'll need to re-compress (which is probably not very viable).

In particular recompressing the OpenShift product images is, AFAIK, not viable in production workloads, because the release image payload, and the image integrity design, is built around the release image containing a full list of image digests of the OpenShift images.

Contributor:

Similarly any signed image. And I’d expect operators / Helm charts to fairly frequently refer to images by digest.

So… this will most likely primarily apply to customer-internal images only.
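
For customer-built images, producing zstd:chunked layers is a push-time option in recent podman and skopeo releases (a hedged illustration; flag availability depends on the installed versions, and the image names below are placeholders):

```shell
# Push an image, recompressing its layers as zstd:chunked.
podman push --compression-format zstd:chunked quay.io/example/app:latest

# Or convert an existing gzip-compressed image to a new tag.
skopeo copy --dest-compress --dest-compress-format zstd:chunked \
    docker://quay.io/example/app:latest \
    docker://quay.io/example/app:chunked
```

Either path changes layer digests, which is why digest-pinned or signed images cannot simply be converted in place.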

containerRuntimeConfig:
lazyImagePull:
enabled: true
format:
Contributor:

format: zstd:chunked

Contributor Author:

That would support only one format, but if in the future we want to support more than one, a list would be helpful:

format:
    - "zstd:chunked"
    - "ystd:hunked"
    - "xstd:junked"


* Add support for drop-in config in [container storage config](https://github.com/containers/storage/blob/main/pkg/config/config.go)
* Package upstream [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) for the installation on RHCOS
* Enable lazy image pulling in Openshift using [Stargz Store plugin](https://github.com/containerd/stargz-snapshotter) with [zstd:chunked](https://github.com/containers/storage/pull/775) image format.
Reviewer:

Stargz Store also supports eStargz, and CRI-O performs lazy pulling for both zstd:chunked and eStargz, so can we list that format here as well?

Contributor Author:

@ktock Thanks for your reply. This enhancement proposal is aimed at introducing and supporting the new feature in OpenShift. While it is true that the stargz snapshotter supports both zstd:chunked and eStargz formats, from OpenShift's supportability point of view we have to evaluate the overall ecosystem around any feature and how best we can support our customers. At this point in time, Red Hat's main container image building tools, like podman and buildah, focus on supporting the zstd:chunked image format.

That said, we would definitely officially support more formats if the overall container tools from Red Hat support those formats as well. In fact, this is the reason we have added a format field to the API with this enhancement proposal.

@ktock commented Apr 5, 2024:

Thanks for the clarification!

This enhancement proposal is aimed at introducing and supporting the new feature in Openshift.

👍

While it is true that the stargz snapshotter supports both zstd:chunked and eStargz formats, from OpenShift's supportability point of view we have to evaluate the overall ecosystem around any feature and how best we can support our customers. At this point in time, Red Hat's main container image building tools, like podman and buildah, focus on supporting the zstd:chunked image format.
That said, we would definitely officially support more formats if the overall container tools from Red Hat support those formats as well. In fact, this is the reason we have added a format field to the API with this enhancement proposal.

So the runtime will have logic to decide whether it enables lazy pulling for the image being pulled, right? I have a question in the proposal, related to the implementation of that logic: PTAL #1600 (comment)

Contributor Author:

We have an operator in OpenShift called the Machine Config Operator (MCO). We use that operator to lay down node-specific configuration (e.g. config files in /etc). Our plan is to use that operator to lay down the required configuration for the c/storage library used by CRI-O to enable this feature. In fact, the config file for c/storage is already placed by MCO; we just need to update it with the lazy-image-related configuration (if the user decides to use this feature).

lazyImagePull:
enabled: true
format:
- "zstd:chunked"
Reviewer:

How is this value consumed by the runtime? i.e. How can the runtime use this value to decide whether to enable lazy pulling for the image being pulled? Maybe this should be a mediatype + manifest annotation?

@mtrmac (Contributor) commented Apr 9, 2024:

Right now, none of this exists at a generic level: the on-demand layer plugin is provided with an image reference and a layer digest, and that’s it. So if this had to be implemented today, that plugin would need to re-parse the image manifest, find the layer, and impose its own conditions on the layer’s metadata.

If we need to add a generic feature, that will need to be co-designed along with this enhancement.

Contributor Author:

How is this value consumed by the runtime? i.e. How can the runtime use this value to decide whether to enable lazy pulling for the image being pulled? Maybe this should be a mediatype + manifest annotation?

@ktock As of now, this value isn't directly used by the runtime. We use it in MCO to prepare the configuration file for the c/storage library.

Contributor:

c/storage AFAIK has no option to restrict ALS lookups by compression format right now.

Reviewer:

I think we can introduce a field additionallayerstore_target_mediatype in storage.conf: the layer media types for which lookup of the Additional Layer Store is enabled.

containerRuntimeConfig.lazyImagePull.format should also be a list of media types. Or, MCO can define aliases for them; e.g. "zstd:chunked" means application/vnd.oci.image.layer.v1.tar+zstd.
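
As a concrete sketch of that suggestion (the media-type field is hypothetical; it does not exist in `c/storage` today):

```toml
# storage.conf with the proposed media-type restriction (hypothetical field)
[storage.options]
additionallayerstores = ["/var/lib/stargz-store/store:ref"]
# Only consult the additional layer store for layers with these media types:
additionallayerstore_target_mediatype = [
    "application/vnd.oci.image.layer.v1.tar+zstd",
]
```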

Contributor:

My first instinct, from the runtime and c/storage perspective is to ask “why do we need that option at all?”.

I could, maybe, imagine that users might want to enable/disable lazy pulls based on the application (= on the registry namespace, on the K8S namespace, or as a Pod option). (To be clear, I don’t think we need that for a first version at all.)

But why would users want to make a choice based on the format? Isn’t that ~transparent to the running systems, other than performance?


It might be necessary to have an option that governs “what format backends (snapshotters?) should be installed on nodes” — maybe not for the first version, but it does make sense to ensure that we can add that option later.

But that option should govern the behavior of MCO, c/storage doesn’t need to know. storage.conf only contains an additionallayerstore option with a list of mount points; how those mount points get set up and what are the backing FUSE servers is not a storage.conf concern AFAICS.

Contributor Author:

I agree, we can drop this option.

Contributor Author:

Thanks for the review. Removed this option from the proposal.


## Upgrade / Downgrade Strategy

* Downgrade expectations - In the event of a cluster downgrade to a version that does not support the zstd:chunked image format, containers utilizing that format will be downloaded again, and their startup times will be comparable to the non-zstd:chunked equivalent container image. Even in the absence of the stargz-snapshotter, those containers will start, but without any performance gains.
@mtrmac (Contributor) commented Apr 9, 2024:

I don’t see any mechanism for containers to be downloaded again.

(Cc: @giuseppe to keep me honest about the c/storage aspects.)

Once an Image object exists in c/storage, it doesn’t go automatically away when a backend layer disappears; the image is just broken. Once a layer record exists in c/storage metadata, it doesn’t go automatically away when the layer store changes and the layer no longer exists; the layer is just broken (and might even “infect” future pulls that share this layer, because the metadata lookup will indicate that the layer already exists locally).

Unless the downgrade triggers a complete wipe? I don’t know whether it does, and the text above is not worded in a way consistent with that.

(Also, zstd in a non-chunked variant has been supported for a long time; and layers are locally stored uncompressed, so the compression format doesn’t matter all that much. But that’s all mostly irrelevant.)

Even in the absence of stargz-snapshotter, those container will start but without any performance gains.

Absolutely not; the snapshotted layers are stored in different paths from the point of view of the graph driver (and I very much hope that the backend plugins are not trying to store anything in the graph driver-managed locations).

See above about “once a layer record exists”…

This downgrade either needs to explicitly trigger a wipe (assuming that’s possible at all), or must be prevented.


And it’s not just downgrades: disabling the snapshotters on an already-running node would break the node for the same reasons.

Contributor Author:

Yes, you are correct; the images need to be cleaned up if the Stargz Store service is not enabled (or fails to start). I will update the proposal to explicitly call out that scenario. Thanks.


#### What consequences does it have on the cluster health?

Containers that use the zstd:chunked image format will see no associated performance gains in their startup time.
Contributor:

See the “downgrade” section; just disabling the option will break the node, AFAICS.

@mtrmac (Contributor) left a review comment:

  • Just to make sure OCPNODE-2205: Lazy image pull support #1600 (comment) is considered
  • I don’t see any discussion of registry credentials. The lazy-pull layer store does not currently get any credentials from the actual pull operation in c/image, they must be provided to the plugin via other means
    • I see no discussion of that happening here at all
    • Assuming the capabilities as of today, that would either mean that per-image pull secrets linked from the Pod object would not be used at all, at best this would work for the cluster global pull secret; that’s possible but a fairly significant downside for any production workloads; or maybe this particular snapshotter seems to support somehow capturing credentials by interposing a CRI proxy (ingenious but eww, and I don’t know how that could work in multi-tenant situations).
    • Alternatively, some new feature to provide the credentials directly from the pull operation would have to be built. And because providing credentials via file paths passed to FUSE is out of the question, that would be a major rearchitecture of the API, AFAICS

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 8, 2024
@ktock

ktock commented May 8, 2024

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 8, 2024
@openshift-ci-robot

openshift-ci-robot commented Aug 28, 2024

@harche: This pull request references OCPNODE-2205 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

xref : https://issues.redhat.com/browse/OCPNODE-2205

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Aug 28, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lmzuccarelli for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ktock

ktock commented Aug 29, 2024

@harche Thanks for reopening this and giving me access to this PR.

@mtrmac Thanks for the comment about downgrading/upgrading.

layer_integrity_state_file = "<location>"
layer_integrity_state_marker = ""

Thanks for the suggestion. This sounds good to me. Updated proposal to include this.

Comment on lines 220 to 234
CRI-O detects incompatible changes to the storage configuration using the following fields of `crio.conf`.

```
layer_integrity_state_file = "<location>"
layer_integrity_state_marker = "<contents>"
```

When CRI-O starts for the first time on the node, it creates a file (the "state file") at the location specified by `layer_integrity_state_file` with the contents of `layer_integrity_state_marker` (the "state marker").
Every time CRI-O restarts, it checks that `layer_integrity_state_marker` in `crio.conf` exactly matches the contents of the state file at `layer_integrity_state_file`.
If they don't match, CRI-O treats this as an incompatible change and starts repairing the storage as mentioned above.

The MCO populates the state marker with the configuration values whose changes trigger repair of the storage, e.g. `version=1;store-format=1;ALS="estargz:/var/"`.
The state marker must be an empty string if lazy pulling is disabled and must not be empty if lazy pulling is enabled.

Downgrading CRI-O to a version that doesn't support lazy pulling and state files requires manual wiping of images on the node.
Contributor
@mtrmac mtrmac Aug 29, 2024

Cc: @haircommander @saschagrunert

WDYT about this mechanism? Is it potentially valuable for other use cases / overengineered / insufficient?

Member

I think this type of thing should really be shared with podman and other c/storage consumers.

What this is trying to achieve is IMO very similar to "feature flags" as used by Linux filesystems for example for many years, with explicit concept of e.g. "write-incompat" and "incompat" flags too.

For example, composefs is currently expressed as a use_composefs = "true" flag, but when set up it's clearly on-disk incompatible with a c/storage consumer that predates support for it (or that doesn't have the flag enabled). Yet we don't really (AFAIK) express that state anywhere except implicitly in the filesystem layout for images that happened to be pulled with composefs.

One interesting tension I see here actually is the split between "config for storage" and "storage" - and again if you look at how Linux filesystems (and for that matter sqlite) do it is that these "feature flags" are embedded in the storage itself. It'd make more sense to me to store these things in /var/lib/containers/storage/meta.json or whatever instead of in a config file.

Member

yeah this almost seems like something that should happen in a separate service

Contributor

I think there are two elements to this:

  • Did the configuration change in unexpected ways? That could be stored in c/storage, although the c/storage storage is not packaged into a neat atomic bundle that makes it trivial (the config file[1] points at the “root” store location [where meta.json could live], and possible other stores, and the lazy pull mount point; the config could change the “root” store location, or to switch the primary “root” with a secondary root).
  • What to do about a detected configuration change? The policy “it is OK to just delete existing images” is specific to K8s or CRI-O, or possibly an even higher-level decision; it’s not something c/storage can, on its own, decide to do.

As an analogy, Podman has some kind of “essential config file entries” record in its primary database, and that does include (some of?) the c/storage config settings. I didn’t look into how that works in detail.

[1] and there is a RFE to allow drop-in config file stubs, so even the config file itself is not a natural location for a compatibility marker.


The reason I was thinking this should be a CRI-O feature is:

  • It needs to run before CRI-O starts really responding to requests
  • It needs to trigger CRI-O-visible changes, like deleting images (and containers?)
  • It needs to use exactly the same storage configuration as CRI-O
  • It needs to be ~stateless/idempotent, not a single-shot “wipe now”, so that MCO can ensure the wipe happens even if the node goes down during the first boot after a config change.

Maybe it can be a separate systemd service running before CRI-O; I don’t know whether that would make things any easier.

Member

(the config file[1] points at the “root” store location

...

As an analogy, Podman has some kind of “essential config file entries” record in its primary database, and that does include (some of?) the c/storage config settings. I didn’t look into how that works in detail.

Encoding the file paths in the storage itself is a broken concept and did break in practice, see containers/podman#23447

But that's distinct from the other config options. I think it could very much make sense to support e.g. podman image config set composefs=true which would end up looking up the storage root path from the global config, and then flipping a flag on there that lives in that storage, as opposed to changing the global storage.

In the short term nothing stops crio from doing this itself with /var/lib/containers/storage/crio-state.json or whatever.

Member

Would that work together with the latest CRI-O wipe enhancements @kwilczynski ?


@mtrmac @haircommander

a separate service

If this should be a separate service, I think we can add a subcommand to CRI-O, described as follows:

  • crio check-layer-state <layer_integrity_state_marker> <layer_integrity_state_file>

This will run as a separate service before CRI-O starts.
This does the same things as described in the original proposal.
Concretely:

  • When it starts for the first time on the node, it creates a file (call it "state file") at the location specified by layer_integrity_state_file with the contents of layer_integrity_state_marker (call it "state marker").
  • Every time this command starts on the node, it checks that the specified layer_integrity_state_marker exactly matches the contents of the state file at layer_integrity_state_file. If they don't match, it starts repairing the storage.

We can also avoid adding layer_integrity_state_file and layer_integrity_state_marker to crio.conf.
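The ordering could be expressed as a oneshot systemd unit, roughly like this (a sketch only; the unit name, marker contents, and state-file path are illustrative, not part of the proposal):

```ini
# crio-check-layer-state.service (hypothetical unit name)
[Unit]
Description=Repair CRI-O storage on incompatible configuration changes
Before=crio.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Marker and state-file path below are examples only.
ExecStart=/usr/bin/crio check-layer-state "version=1;store-format=1;ALS=\"estargz:/var/\"" /var/lib/crio/layer-integrity-state

[Install]
WantedBy=multi-user.target
```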

WDYT about this mechanism?

Member

that works for me


Thanks for the review. Added the description about this subcommand to the proposal.


@mtrmac @haircommander I've posted an implementation of this command to cri-o/cri-o#8589 .


When an upgrade or downgrade happens with incompatible changes in the store configuration (e.g. enabling/disabling Additional Layer Store), CRI-O handles this by rebuilding its image storage.
Rebuilding of the image storage is done following the existing `clean_shutdown_file` feature of CRI-O.
Concretely, CRI-O checks if layers in the storage are damaged and repairs the storage if there are damaged layers.
Member

it only does this on sudden reboot, and only as of 4.17 does it repair; before that it just removed all the storage

Member

I'm not sure the repair sequence knows how to fix stargz images

Contributor

Removing an inconsistent image would be sufficient. Removing all storage is, I guess, acceptable?

@harche
Contributor Author

harche commented Sep 10, 2024

@ktock Can we direct user to a blog like this one, https://blog.cubieserver.de/2022/experimenting-with-estargz-image-pulling-on-openshift/, to use lazy image pull in Openshift instead of going through this route?

@ktock

ktock commented Sep 10, 2024

@harche Thanks for your comment, but I think a blog can't achieve features discussed here:

  • Automation of deploying services & configurations (stargz-store service and c/storage config, etc.)
  • Item 1 in the Goals list: Package upstream Stargz Store plugin for the installation on RHCOS

@harche
Contributor Author

harche commented Sep 11, 2024

@ktock packaging is only required if it has to be shipped with RHCOS officially. Nothing stops someone from downloading and installing a binary like that blog mentions.

@ktock

ktock commented Sep 12, 2024

@harche

Nothing stops someone from downloading and installing a binary like that blog mentions.

But now we need several binaries and configurations (stargz-store and an auth helper), which makes this more complex. From the user's perspective, installing all of them correctly can be harder than it was when that blog was written. An officially automated and packaged installation can help make node maintenance easier.

@harche
Contributor Author

harche commented Sep 13, 2024

@harche

Nothing stops someone from downloading and installing a binary like that blog mentions.

But now we need several binaries and configurations (stargz-store and an auth helper), which makes this more complex. From the user's perspective, installing all of them correctly can be harder than it was when that blog was written. An officially automated and packaged installation can help make node maintenance easier.

I agree with you. However, supporting enterprise customers becomes challenging if there aren't sufficient resources allocated to maintaining these features (bugs, CVEs, etc.), especially since priorities have shifted a bit.

@ktock

ktock commented Sep 17, 2024

I agree with you. However, supporting enterprise customers becomes challenging if there aren't sufficient resources allocated to maintaining these features (bugs, CVEs, etc.), especially since priorities have shifted a bit.

@harche Thanks for the comment. Of course, I'm willing to continually maintain lazy-pulling-related features and stargz-store from the community side! One possible way to make it easier would be to have a repo under github.com/containers org for receiving issues and for maintaining lazy-pulling-related resources (e.g. helpers).

@sairameshv
Member

I agree with you. However, supporting enterprise customers becomes challenging if there aren't sufficient resources allocated to maintaining these features (bugs, CVEs, etc.), especially since priorities have shifted a bit.

@harche Thanks for the comment. Of course, I'm willing to continually maintain lazy-pulling-related features and stargz-store from the community side! One possible way to make it easier would be to have a repo under github.com/containers org for receiving issues and for maintaining lazy-pulling-related resources (e.g. helpers).

Hey, FYI, there is already a repo/fork for the stargz-snapshotter here
We are in the process of packaging the required rpms from this repository.

@ktock

ktock commented Sep 17, 2024

@sairameshv Thanks for the info! Please tell me anything I can help from my side.

@harche
Contributor Author

harche commented Sep 18, 2024

@ktock @sairameshv are there any open comments on this enhancement proposal preventing it from merging?

@ktock

ktock commented Sep 19, 2024

I think currently there is no comment that prevents it from merging. (cc @mtrmac)

@mtrmac
Contributor

mtrmac commented Sep 19, 2024

This LGTM as written, but I’m not an approver in this repo; and I can’t speak to support / production deployment implications.

```golang
type LazyImagePullConfiguration struct {
// +kubebuilder:validation:Required
Enable bool `json:"enable"`
```

Member
FYI, the OpenShift API no longer allows booleans because they're not extensible; you'll need to add an enum

Thanks for the review. Fixed to avoid using booleans.

Co-authored-by: Harshal Patil <[email protected]>
Co-authored-by: Kohei Tokunaga <[email protected]>
Contributor

openshift-ci bot commented Sep 20, 2024

@harche: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@haircommander
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 20, 2024
@ktock

ktock commented Sep 30, 2024

Thanks for your approval. Can we move this forward and merge this?

@harche
Contributor Author

harche commented Sep 30, 2024

Thanks for your approval. Can we move this forward and merge this?

cc @mrunalp for approval required to merge it.

@ktock

ktock commented Oct 8, 2024

@mrunalp Could you please take a look at this PR? 🙏

@harche
Contributor Author

harche commented Oct 8, 2024

@mrunalp Could you please take a look at this PR? 🙏

@ktock I think @mrunalp should be able to take a look at this when he is back next week.

Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.