Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libpod: Drop checks for paths in sqlite+boltdb #23447

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cgwalters
Copy link
Contributor

The original logic here is old, dating to
7eb5ce9 and got inherited when the sqlite database was added.

Since then, various changes have landed here especially around canonicalizing symbolic links.

However, this code still often causes problems; most recently in https://gitlab.com/fedora/bootc/base-images/-/issues/20 where it seems like the way Anaconda has the system set up trips this up again.

I can certainly believe that things can go wrong if one overrides/reconfigures e.g. the runtime state dir to be different. But there's also a lot of other ways to break podman...and it's trivial to subvert this check with a bind mount over the absolute path, pointing to some arbitrary different place.

In general, encoding file names into files that are potentially owned by the user is ugly...it can trip up basic things like migrating a home directory, etc.

Since I am not aware of a common misconfiguration that these checks block, and I am very aware of a lot of times they have incorrectly blocked correct situations...just drop the checks.

If we do need to do some more validation later, I think we could say encode the directory inodes for at least the volume dir. And the runtime dir could have the inode for the root, but not the other way around.

This version of podman no longer performs checks for directory paths stored in the database. A future version will likely drop them entirely.

@cgwalters cgwalters requested a review from mheon July 30, 2024 12:55
@openshift-ci openshift-ci bot added release-note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jul 30, 2024
Copy link
Contributor

openshift-ci bot commented Jul 30, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgwalters
Once this PR has been reviewed and has the lgtm label, please assign jnovy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

Cockpit tests failed for commit 389e7b7. @martinpitt, @jelly, @mvollmer please check.

The original logic here is old, dating to
containers@7eb5ce9
and got inherited when the sqlite database was added.

Since then, various changes have landed here especially
around canonicalizing symbolic links.

However, this code *still* often causes problems; most recently
in https://gitlab.com/fedora/bootc/base-images/-/issues/20
where it seems like the way Anaconda has the system set up
trips this up again.

I can certainly believe that things can go wrong if one
overrides/reconfigures e.g. the runtime state dir to be
different. But there's also a lot of other ways to break
podman...and it's trivial to subvert this check with a
bind mount over the absolute path, pointing to some
arbitrary different place.

In general, encoding file names into files that are potentially
owned by the user is ugly...it can trip up basic things like
migrating a home directory, etc.

Since I am not aware of a common misconfiguration that these
checks block, and I am *very* aware of a lot of times they
have incorrectly blocked correct situations...just drop the
checks.

If we *do* need to do some more validation later, I think we
could say encode the directory inodes for at least the volume
dir. And the runtime dir could have the inode for the root,
but not the other way around.

Signed-off-by: Colin Walters <[email protected]>
@rhatdan rhatdan added the No New Tests Allow PR to proceed without adding regression tests label Jul 30, 2024
@rhatdan
Copy link
Member

rhatdan commented Jul 30, 2024

LGTM
@mheon @Luap99 PTAL

@cgwalters
Copy link
Contributor Author

For compatibility we need to keep writing the values of course...but hopefully at some point we can just insert dummy values there instead for a further minor cleanup.

@mheon
Copy link
Member

mheon commented Jul 30, 2024

These do still fulfill a purpose in my mind - preventing users from causing a nonfunctional, difficult to recover from configuration by swapping paths in the config file while containers are active. They may be doing that intentionally - your home directory migration is an example - but our database has a lot of paths stored in it, so if you remove the check code and do the migration, things will still be very broken (containers unable to fully remove because we don't know where all their files live, for example). This forces folks to use system reset, a command which is wired to work around these issues by just removing the entire directory in question, no worries about incorrect cached paths in the database.

I have been considering adding a "trust me, I know what I'm doing" flag to podman system migrate that will overwrite the DB's stored config values with the current set, with the intention it only be used in cases where there are no containers present. Would that be acceptable?

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still the call to runtime.mergeDBConfig(dbConfig) so all the paths are overwritten silently anyway so the question to me is what exactly is this validation good for. I really don't know.

(Overall I am in favour of doing this)

In general it is impossible for me to the follow the initial path setup code with any reasonable logic. I have no idea what it is supposed to do in what places. I don't understand it and we just keep braking things in very weird ways, changes like 6293ec2 are just not understandable for me regardless of how much time I try to spend on this.

I guess there is the problem around the tmpdir handing when users log in with su without XDG_RUNTIME_DIR set and then later running a command in proper session with XDG_RUNTIME_DIR and then they end up with different paths which will certainly break a lot of stuff which is something we want to catch...

@cgwalters
Copy link
Contributor Author

These do still fulfill a purpose in my mind - preventing users from causing a nonfunctional, difficult to recover from configuration by swapping paths in the config file while containers are active.

And in practice, you have seen people do this? How often? More often than the false positives?

swapping paths

And which specific paths have you seen swapped? I'm guessing it's mostly the root (and not the runroot or volumes)?

while containers are active.

I think this is an interesting part - we hence don't need to check if there aren't any containers, which would let us move a lot of checking to a later dynamic point. If we just did that, it would help a lot I think.

Combining these...I am pretty sure we can add highly reliable sanity checking that the runtime dir is paired with its correct static root, e.g. again via inode comparison or so.

@cgwalters
Copy link
Contributor Author

sketching out runroot+root binding:

  • runroot contains an xattr/stamp file/json/whatever that has the device/inode of the static root (or even a serialized encoded file handle)
  • (static) root contains an xattr/stamp file/json/whatever that has the pair of (linux kernel boot uuid, encoded handle for runtime dir)

With something like that we could do verification both ways and it would detect things like bind mounts pointing to a new place unexpectedly, which file path comparison wouldn't.

@mheon
Copy link
Member

mheon commented Jul 30, 2024

I know support does see this a fair bit. Is it catching issues there, or being a hindrance? I think 50/50, but I'll ask to be sure.

Path moves usually seem to be tmpdir related - a result of systemd misconfiguration (no enable-linger, Podman starts to use /tmp, someone notices, lingering is enabled, now we need to swap back to /run/user/$UID or similar). Systemd misconfigurations like this are extremely common in the field from what I see.

@cgwalters
Copy link
Contributor Author

What about moving the tmpdir actually under the storage root, as its own tmpfs?

@mheon
Copy link
Member

mheon commented Jul 30, 2024

I would have concerns about rootless - I imagine we'd need a mount namespace attached to our userns/pause process, and killing the userns pause process would now kill that mountns and get rid of the tmpfs. I think that would cause problems, but I'm not sure?

@giuseppe @Luap99 Thoughts?

@Luap99
Copy link
Member

Luap99 commented Jul 30, 2024

What about moving the tmpdir actually under the storage root, as its own tmpfs?

How would we create the tmpfs mount? AS rootless we first would need to join the userns but in order to so we must read the pidfile from the tmpdir so catch 22. And creating the initial mount would be racy without locks so this complicates things more I think. And for rootless it will make the file inaccessible outside the user+mountns which makes it harder to look at them (not that there would be reason to so for a normal user but I think tests depend on it)

@Luap99
Copy link
Member

Luap99 commented Jul 30, 2024

sketching out runroot+root binding:

* runroot contains an xattr/stamp file/json/whatever that has the device/inode of the static root (or even a serialized encoded [file handle](https://man7.org/linux/man-pages/man2/open_by_handle_at.2.html))

* (static) root contains an xattr/stamp file/json/whatever that has the pair of (linux kernel boot uuid, encoded handle for runtime dir)

That sounds better, but keep in mind file handles must be supported by the fs and are privileged AFAIK thus will nor work rootless. But yes some device+inode mapping seems better if paths change, however as @mheon mentioned the actual db will store full paths in container configs and so on so if a path actually moved with the same inode it might still break much later. Although I know this is not a problem for you build use cases where this will work better so maybe it would be good enough?

@cgwalters
Copy link
Contributor Author

Can we just detect the situation when we have containers in the root, but the tmpdir is uninitialized(empty)/nonexistent, and make that the error case?

@cgwalters
Copy link
Contributor Author

db will store full paths in container configs

Eww?

Copy link

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Sep 3, 2024

@mheon @Luap99 @cgwalters what should we do with this one?

@github-actions github-actions bot removed the stale-pr label Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. No New Tests Allow PR to proceed without adding regression tests release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants