fedora_coreos_meeting
LOGS
16:29:29 <dustymabe> #startmeeting fedora_coreos_meeting
16:29:29 <zodbot> Meeting started Wed Feb 15 16:29:29 2023 UTC.
16:29:29 <zodbot> This meeting is logged and archived in a public location.
16:29:29 <zodbot> The chair is dustymabe. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:29:29 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:29:29 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:29:31 <dustymabe> #topic roll call
16:29:32 <dustymabe> .hi
16:29:33 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:30:13 <travier> .hello siosm
16:30:14 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:30:31 <bgilbert> .hi
16:30:31 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:30:59 <jmarrero> .hi
16:31:00 <zodbot> jmarrero: jmarrero 'Joseph Marrero' <jmarrero@redhat.com>
16:31:17 <jlebon> .hello2
16:31:18 <zodbot> jlebon: jlebon 'None' <jonathan@jlebon.com>
16:31:40 <aaradhak> .hi
16:31:41 <zodbot> aaradhak: aaradhak 'Aashish Radhakrishnan' <aaradhak@redhat.com>
16:32:08 <dustymabe> #chazir travier bgilbert jmarrero jlebon aaradhak
16:32:53 <copperi[m]> .hello copperi
16:32:54 <zodbot> copperi[m]: copperi 'Jan Kuparinen' <copper_fin@hotmail.com>
16:33:41 <dustymabe> #chair copperi[m]
16:33:41 <zodbot> Current chairs: copperi[m] dustymabe
16:33:44 <dustymabe> welcome all
16:33:48 <travier> #chair travier bgilbert jmarrero jlebon aaradhak
16:34:06 <travier> there was a spurious 'z' in the previous one :)
16:34:15 <dustymabe> travier: haha
16:34:18 <dustymabe> oops
16:34:27 <dustymabe> #topic Action items from last meeting
16:34:36 <dustymabe> * dustymabe will communicate our feedback on the website redesign
16:34:37 <travier> but I think you need to do it
16:34:47 <dustymabe> #chair travier bgilbert jmarrero jlebon aaradhak copperi[m]
16:34:47 <zodbot> Current chairs: aaradhak bgilbert copperi[m] dustymabe jlebon jmarrero travier
16:35:12 <dustymabe> #info dustymabe took the feedback from last meeting to the websites team: https://gitlab.com/fedora/websites-apps/fedora-websites/fedora-websites-3.0/-/issues/89#note_1271079731
16:35:57 <dustymabe> #topic New Package Request: audit
16:35:59 <spresti[m]> .hello spresti
16:36:01 <zodbot> spresti[m]: spresti 'Steven Presti' <spresti@redhat.com>
16:36:03 <spresti[m]> sorry I am late all!
16:36:04 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/1362
16:36:07 <dustymabe> #chair spresti[m]
16:36:07 <zodbot> Current chairs: aaradhak bgilbert copperi[m] dustymabe jlebon jmarrero spresti[m] travier
16:36:12 <dustymabe> welcome spresti[m]
16:36:54 <travier> Will introduce this one
16:37:19 <travier> The idea is that we want to include the audit package in the system as it's a base system tool
16:37:43 <travier> The problem is that it comes with the legacy `service` command line
16:38:25 <travier> it's required for "compliance" reasons as we can not use systemd to stop/restart the audit daemon directly
16:38:44 <travier> it has to be traceable which user asked for audit to stop
16:38:46 <mnguyen> .hello mnguyen
16:38:47 <zodbot> mnguyen: mnguyen 'Michael Nguyen' <mnguyen@redhat.com>
16:39:46 <travier> systemctl/systemd "by-passes" that as it uses a daemon/control model that does not directly link to the user via the audit id stored in the kernel for each process and assigned on login
16:40:10 <travier> so in the end, it needs to use a legacy script to perform operation on the service
16:40:29 <travier> So we have several options to move forward:
16:41:00 <travier> The audit package already includes the legacy scripts that are run by the service command
16:41:13 <travier> /usr/libexec/initscripts/legacy-actions/auditd/restart, etc.
16:41:24 <travier> Option A: The short option is thus just to remove the service binary and man page in a post-script.
16:41:34 <travier> Option B: The long option is to rewrite those as a proper standalone script that is not correlated to the service binary.
16:41:44 <travier> Option C: Another option is to move the service binary somewhere else and include a wrapper script that only accepts auditd as an option for calls to service auditd <stop|restart|...> and rejects everything else.
16:41:53 <travier> (eoi)
16:41:56 <travier> end of intro
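A minimal sketch of what the Option C wrapper could look like, for illustration only. It assumes the real `service` binary has been moved to a hypothetical path (`/usr/libexec/service.real`, not named in the meeting) and that the wrapper forwards any verb as long as the service name is `auditd`. It uses `exec` rather than spawning through a daemon, so the call stays in the invoking user's process.

```python
import os
import sys

# Hypothetical location of the relocated `service` binary (an assumption,
# not a path that was agreed on in the meeting).
REAL_SERVICE = "/usr/libexec/service.real"
ALLOWED_SERVICE = "auditd"


def build_command(args):
    """Return the argv to exec for `service <args...>`, or raise ValueError."""
    if len(args) < 2:
        raise ValueError("usage: service auditd <verb>")
    if args[0] != ALLOWED_SERVICE:
        raise ValueError(f"only '{ALLOWED_SERVICE}' is supported, got '{args[0]}'")
    # Forward the service name and verb unchanged to the real binary.
    return [REAL_SERVICE, *args]


def main(argv):
    """Validate the arguments, then replace this process with the real binary."""
    try:
        cmd = build_command(argv)
    except ValueError as err:
        print(err, file=sys.stderr)
        return 2
    os.execv(cmd[0], cmd)  # does not return on success
```

Installed as the `service` entry point, this would be invoked as `sys.exit(main(sys.argv[1:]))`; `service auditd stop` is forwarded and anything else is rejected with exit code 2.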
16:42:43 <jlebon> ideally we'd fix the audit package itself, so not A or C
16:42:52 <dustymabe> travier: mind if I ask.. what has changed since the last time we discussed this?
16:43:37 <travier> The last time was a while ago and there was things that needed to be removed from the package. We're now down to just this issue
16:43:44 <travier> were* things
16:44:43 <jlebon> maybe audit can rework the scripts so they're shipped in /usr/sbin instead and then make the service pkg a weak dep
16:44:50 <travier> Option B has the problem that we might be told to "just ship service"
16:44:50 <dustymabe> i.e. there were some other things (like python scripts) that were removed from the package ?
16:45:10 <travier> and the docs everywhere on the net mention service as a workaround
16:45:22 <travier> not problem -> downside
16:46:49 <travier> dustymabe: yes, there were some python deps that got removed / split
16:47:00 <dustymabe> travier: 👍
16:47:18 <travier> and the full initscript package got split into scripts & service sub packages
16:48:12 <dustymabe> Option A isn't ideal, because we've said in the past we wanted to minimize postprocess hacking and slashing
16:48:36 <jlebon> so e.g. have a `/usr/sbin/auditdctl [verb]` which is called by the service wrapper to not break people who still want to use it, but could also be called directly (which we'd recommend on FCOS)
16:48:37 <dustymabe> I think I like Option B the best, but maybe not move it somewhere else
16:48:41 <travier> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/sec-starting_the_audit_service
16:48:57 <dustymabe> sorry Option C
16:49:11 <travier> https://access.redhat.com/solutions/2664811
16:49:12 <dustymabe> (given that option B is #harder)
16:49:37 <jmarrero> If we need this fast, I would be OK with A temporarily while B is implemented.
16:50:46 <jmarrero> *but no A if B is never gonna be implemented.
16:51:02 <dustymabe> i advocate for option C
16:51:10 <jlebon> i think we should get together with the maintainer and chat about options
16:51:14 <dustymabe> leave service binary in place but patch it to only support `audit`
16:51:55 <jlebon> i don't think we should do any processing FCOS-side without trying to do this in at the packaging level first
16:52:05 <travier> jlebon: agree, I should reach out to the audit maintainers to see which approach they would accept upstream
16:52:22 <jlebon> we're not the only ones who want to drop the dep on service
16:53:10 <dustymabe> jlebon: I fully support engagement upstream.. but that's kind of where we were two years ago (I think it was that long)
16:53:18 <dustymabe> and we're back here
16:53:47 <jlebon> dustymabe: IIUC, i think two years ago what was attempted was advocating for systemctl, which led to no movement
16:54:28 <jlebon> obviously that'd still be the ideal fix, but barring that, there's still room for cleaner solutions
16:54:29 <dustymabe> i think at the time the author was open to an auditctl (some controlling utility)
16:54:37 <dustymabe> but wasn't willing to work on it
16:54:42 <dustymabe> which is OK
16:54:49 <dustymabe> let me see if I can find a link
16:57:16 <dustymabe> hmm. can't seem to find it right now
16:57:20 <dustymabe> maybe it was in an email
16:57:50 <travier> it was likely in the service split bz
16:58:09 <travier> https://bugzilla.redhat.com/show_bug.cgi?id=1768815 maybe
16:59:35 <travier> let's move to something else
16:59:43 <travier> I'll reach out to the audit maintainer
16:59:51 <dustymabe> travier: +1
16:59:52 <jlebon> travier: +1
17:00:01 <dustymabe> i think after that discussion then we can make a decision
17:00:10 <dustymabe> but the added (new) context will help
17:00:12 <travier> #proposed We'll reach out to the audit maintainer to try option B, while keeping option C as a backup
17:00:30 <dustymabe> #topic Ship the shimx64 binary in the CoreOS ISO image kind/enhancement
17:00:36 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/1413
17:00:46 <dustymabe> cc bgilbert
17:01:14 <bgilbert> there's a PXE use case that we don't really document
17:01:27 <bgilbert> which is: netbooting from UEFI with Secure Boot enabled
17:01:36 <bgilbert> (I'm not sure if that's still literally PXE but anyway)
17:02:00 <bgilbert> it requires shim, to chain from the MS signing keys in the firmware to the Fedora keys
17:02:33 <bgilbert> and of course, any random shim won't work.  it needs to be a Fedora one with the Fedora keys
17:02:40 <bgilbert> (unlike, say, pxelinux.0, which can come from anywhere)
17:03:07 <bgilbert> and we don't currently provide a way to get the Fedora shim, without resorting to things like "finding the RPM"
17:03:29 <bgilbert> for a while it was accidentally possible to extract efiboot.img from the ISO and shim from efiboot.img
17:03:56 <bgilbert> but that was a redundant copy and we removed it.  shim is still in the ISO, under the unhelpful name BOOTX64.EFI
17:04:20 <bgilbert> (and technically there's no contract that that file is shim)
17:04:27 <dustymabe> this is a problem :( but I wonder if it's one we can just throw a sledge hammer at rather than giving an elegant solution
17:04:38 <bgilbert> yeah, the question is the size of the hammer
17:05:00 <dustymabe> podman cp quay.io/fedora/fedora-coreos:stable:/path/to/shim-binary ./shim-binary
17:05:00 <bgilbert> none of the proposed solutions is very much work
17:05:18 <bgilbert> hmm
17:05:30 <dustymabe> it's downloading a huge amount of data for a tiny piece of it
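A hedged sketch of the container extraction idea. As far as I know `podman cp` copies from a container rather than directly from an image reference, so this assumes a create / cp / rm sequence. The container name `shim-extract` and the `/bin/true` placeholder command are illustrative assumptions, and the path inside the image is left as a parameter because it was not given in the discussion.

```python
import subprocess


def extraction_commands(image, path_in_image, dest, name="shim-extract"):
    """Build the podman commands that copy one file out of an image."""
    create = ["podman", "create", "--name", name, image, "/bin/true"]
    copy = ["podman", "cp", f"{name}:{path_in_image}", dest]
    remove = ["podman", "rm", name]
    return create, copy, remove


def extract_file(image, path_in_image, dest):
    """Create a stopped container, copy the file out, then always remove it."""
    create, copy, remove = extraction_commands(image, path_in_image, dest)
    subprocess.run(create, check=True)
    try:
        subprocess.run(copy, check=True)
    finally:
        # Clean up the temporary container even if the copy failed.
        subprocess.run(remove, check=True)
```

For example, `extract_file("quay.io/fedora/fedora-coreos:stable", "/path/to/shim-binary", "./shim-binary")` mirrors the one-liner above. The whole image is still pulled, which is the cost noted in the discussion.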
17:05:49 <bgilbert> the furthest we could go is: add shim to the stream metadata as a fourth PXE artifact, and to the ISO image in /images/pxeboot for the same reason
17:05:56 <bgilbert> the latter so that `coreos-installer iso extract pxe` will work
17:06:19 <bgilbert> and the former so that `coreos-installer download -f pxe` will work, and the website will list it
17:06:56 <dustymabe> bgilbert: yeah, that sounds ideal. that would be the solution I would go with if we had infinite resources
17:07:08 <bgilbert> dustymabe, as you say, we could document a hack for extracting from the image.  shim doesn't change very much
17:07:18 <dustymabe> though we shouldn't discount the size of the binary (i.e. extra size in ISO)
17:07:24 <bgilbert> shim is small
17:07:30 <dustymabe> 👍
17:07:32 <bgilbert> the main reason I'm hesitating is user confusion
17:07:39 <bgilbert> "what's this artifact?  what should I do with it?"
17:07:52 <dustymabe> "If you have to ask, you can't afford it" :)
17:07:56 <bgilbert> and there might be scripts that would be confused by the installer downloading an extra thing
17:08:38 <dustymabe> i'm not too worried about user confusion on this front, but maybe I should be
17:08:52 <jlebon> bgilbert: you mentioned we're already shipping it in the ISO? could we just document how to extract it from there?
17:09:14 <bgilbert> there's multiple reasons not to document extracting it from its current path
17:09:25 <travier> I don't think it's likely that BOOTX64.efi would not be shim any time soon
17:09:26 <bgilbert> it's inside a VFAT image file inside the ISO, and it has a generic name
17:09:36 <bgilbert> but we could put a second copy inside the ISO directly
17:09:52 <bgilbert> if we put it in /images/efiboot, `coreos-installer iso extract pxe` will automatically extract it
17:10:00 <bgilbert> */images/pxeboot
17:10:33 <bgilbert> it's 925K
17:10:51 <jlebon> that sounds reasonable to me
17:10:55 <bgilbert> which?
17:10:56 <dustymabe> ok so order of preference:
17:11:06 <dustymabe> 1. make `coreos-installer iso extract pxe` work
17:11:23 <dustymabe> 2. add to pxe artifacts so downloading pxe using coreos-installer gives you shim
17:11:29 <jlebon> bgilbert: putting it in /images/efiboot
17:11:44 <dustymabe> we could always do 2. later if demand increases
17:12:10 <bgilbert> `iso extract pxe` is "supposed" to deliver the same artifacts as `coreos-installer download -f pxe` fwiw
17:12:15 <bgilbert> I'm not sure how important that is
17:12:32 <dustymabe> yeah, would be nice to keep it consistent
17:12:39 <jlebon> so the argument for 2. is that right now users don't need the ISO at all for pxe booting, and this would change that?
17:12:46 <jlebon> and consistency with `coreos-installer download`
17:12:54 <jlebon> gotcha
17:13:06 <bgilbert> eh, I'm not so concerned about needing the ISO to extract shim
17:13:18 <bgilbert> it's a relatively obscure use case (though maybe it shouldn't be) and shim changes seldom
17:13:25 <bgilbert> unlike the other artifacts, which change on every release
17:14:14 <bgilbert> re compat, it does feel odd to constrain ourselves not to add artifacts
17:14:17 <dustymabe> for me it's 1. - and then do 2. if you have time
17:14:32 <dustymabe> or really - we could just document this
17:14:33 <travier> We could also have a one liner dnf download from a container to get the RPM and extract the binary from it
17:14:43 <dustymabe> i.e. tell the users how to get shim from the RPM
17:14:56 <dustymabe> travier: yeah
17:15:07 <dustymabe> i think I highlighted an easier way above with my `podman cp`
17:15:20 <bgilbert> (I should maybe mention that the original reporter actually wants this for RHCOS, so FCOS docs don't 100% solve the problem)
17:15:26 <travier> I agree that the other options are "cleaner" but are they worth it?
17:15:34 <jlebon> and since it changes rarely, it's a one time cost when setting up your PXE server
17:15:45 <dustymabe> bgilbert: yeah, that's important - things aren't as easily accessible in that scenario
17:15:57 <travier> If we ship it as an artifact then this adds up in storage, etc. for everybody
17:16:09 <jlebon> but it's awkward to add a requirement on a new tool where before just `coreos-installer` sufficed. 1. maintains that property which is nice
17:16:15 <travier> but agree it's not much storage compared to the rest
17:16:56 <dustymabe> bgilbert: thoughts on a #proposed here?
17:17:35 <bgilbert> none, really.  there's benefits to any of the approaches
17:17:45 <bgilbert> anyone feel strongly about an option?
17:18:15 <dustymabe> well - `podman cp` or `download the RPM and extract` aren't really good options for RHCOS
17:18:19 <jlebon> i'm ok with either 1 or documenting hacks to get it, but the UX for 1 is much nicer
17:19:02 <dustymabe> I'm ok with 1. (and 2. being optional TBH, though consistency is nice)
17:19:09 <jmarrero> downloading the rpm is less data but the podman example seems like something most people will get right away.
17:19:09 <dustymabe> either way we need docs to mention this use case
17:19:33 <bgilbert> jlebon: views on 2?
17:19:40 <travier> in the RHCOS case you already have the container image from the release image
17:19:56 <travier> well, not on your system
17:19:57 <jlebon> bgilbert: seems premature for now
17:20:41 <dustymabe> maybe we do 1. and then open an issue for 2. detailing steps and rationale (and link from the code implementing 1.)  - then we can implement it later if we want to
17:21:08 <bgilbert> I'm vaguely uncomfortable with `iso extract pxe` not matching `download`
17:21:12 <bgilbert> seems gratuitous
17:21:23 <bgilbert> (though we could put shim elsewhere in the ISO for manual extraction)
17:21:31 <bgilbert> `iso extract shim` :-P
17:21:35 <dustymabe> :)
17:21:56 <dustymabe> i'm cool with 2. too - just from my end didn't want to require it, but also cool if we as a group do decide to require it
17:23:02 <jlebon> bgilbert: that's an interesting idea actually. it emphasizes the fact that you shouldn't normally need this
17:23:15 <bgilbert> yeah, it's tempting
17:23:25 <dustymabe> jlebon: am I correct in understanding that you're not opposed to 2. - just maybe don't see the benefits of the extra effort ?
17:23:49 <dustymabe> jlebon: I guess that depends - "you shouldn't normally need this" - are we encouraging people to use Secure Boot or not?
17:24:03 <jlebon> dustymabe: that, and also whether we're ok with the messaging it implies
17:24:07 <travier> dustymabe: it's needed only for PXE Secure Boot
17:24:12 <bgilbert> that's the thing.  we "should" probably be rewriting our docs to encourage UEFI SB
17:24:31 <dustymabe> bgilbert: in which case we'd want to encourage this workflow more?
17:24:38 <bgilbert> yeah
17:24:58 <dustymabe> seems like supporting arguments for making it more a part of the "normal PXE workflow"
17:25:28 <jlebon> indeed :)
17:25:44 <bgilbert> maybe we should defer user-visible changes until we have draft docs
17:25:55 <bgilbert> so we don't get ahead of our understanding of the workflow
17:26:25 <dustymabe> bgilbert: WFM - do you have proposed next steps?
17:26:49 <bgilbert> find someone to work on rewriting the PXE docs to add and emphasize a SB section
17:27:04 <bgilbert> I'm not planning to work on it soon
17:27:27 <bgilbert> and put a note in the issue to consult this discussion re ways to expose shim to users
17:27:36 <dustymabe> ok so what should we take to the ticket?
17:27:58 <dustymabe> should we doc a workaround in the ticket for now?
17:28:27 <bgilbert> container extraction workaround seems fine as a workaround
17:28:36 <dustymabe> +1
17:28:36 <jlebon> +1
17:28:37 <bgilbert> first draft of the docs can use that too
17:28:54 <jmarrero> +1
17:28:56 <dustymabe> bgilbert: do you want to update the ticket or should I?
17:29:07 <bgilbert> do you want it?
17:29:21 <dustymabe> not really, but I am running the meeting so I will
17:29:26 <bgilbert> okay, ty
17:29:29 <bgilbert> I think this discussion was still useful to explore the option space, even if we're deferring a long-term decision
17:29:34 <dustymabe> +1
17:29:39 * dustymabe will update the ticket
17:29:41 <jlebon> yeah, agreed
17:29:48 <dustymabe> #topic open floor
17:30:30 <dustymabe> i went to open floor because we're out of time but I do want to point out that we seem to be at a dead end for our two paths we were pursuing for https://github.com/coreos/fedora-coreos-tracker/issues/1247 (which blocks ppc64le being released in our prod streams)
17:30:59 <dustymabe> the kernel package itself looks unlikely to change because of limitations with various possible boot firmwares for ppc64le
17:31:28 <dustymabe> and jmarrero reported that the opportunistic cleanup in rpm-ostree is sufficiently complex that we don't want to pursue that path either
17:32:02 <dustymabe> so I guess we're #opentoideas on next steps to take, I guess we can start to explore resizing our boot partition again
17:32:12 <dustymabe> i was just hoping not to have to wait for that to ship ppc64le
17:32:14 <jlebon> "Getting petitboot updated so it can boot a gzipped vmlinux could be done, but AFAIK petitboot is mostly unmaintained these days." is sad
17:32:25 <dustymabe> jlebon: indeed
17:32:37 <dustymabe> maybe we should just call it a day and drop the arch altogether
17:32:56 <bgilbert> can we drop the others too?
17:33:04 <jlebon> why again is this not a problem in RHCOS?
17:33:06 <jmarrero> +1
17:33:19 <dustymabe> jlebon: I think it is
17:33:53 <dustymabe> https://bugzilla.redhat.com/show_bug.cgi?id=2104619
17:33:55 <jlebon> dustymabe: ok. so we do have to find a solution for this regardless
17:33:58 <jmarrero> RHCOS is where we saw it initially IIRC
17:34:20 <jlebon> oh right, it was hacked around in the MCO
17:34:30 <jlebon> which... is something zincati could do too
17:34:40 <jlebon> cleanup the rollback before staging a new deployment
17:34:40 <dustymabe> what's the hack?
17:35:11 <dustymabe> yeah, that's kind of what we wanted rpm-ostree to do, but only if it was needed
17:35:12 <jmarrero> MCO cleans up the old deployment
17:35:14 <jlebon> obviously the inconsistency there is not great, and it has reliability implications
17:35:43 <dustymabe> jmarrero: does the MCO only do that on ppc64le ?
17:35:57 <jlebon> doing it in zincati would be worse because it increases the window between the point of no return and finding out the deployment you're on is broken
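A rough sketch of the "clean up the rollback only if it is needed" idea, under the assumption that the trigger is free space on `/boot`. The size threshold and the helper names are hypothetical. `rpm-ostree cleanup --rollback` is the existing command that removes the rollback deployment; this is not the logic of rpm-ostree, zincati, or the MCO.

```python
import shutil
import subprocess


def needs_rollback_cleanup(free_bytes, required_bytes):
    """True when there is not enough free space for the new deployment."""
    return free_bytes < required_bytes


def cleanup_rollback_if_needed(required_bytes, boot_path="/boot"):
    """Remove the rollback deployment only when boot_path is too full."""
    free = shutil.disk_usage(boot_path).free
    if needs_rollback_cleanup(free, required_bytes):
        # Point of no return: the rollback deployment is gone after this.
        subprocess.run(["rpm-ostree", "cleanup", "--rollback"], check=True)
        return True
    return False
```

Running this at staging time rather than at reboot is what widens the window discussed above, since the rollback is removed before the new deployment has been booted.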
17:36:19 <jmarrero> mmm let me dig the PR
17:36:21 <dustymabe> jlebon: indeed - i.e. if your upgrade window isn't until the weekend - zincati stages today
17:36:47 <dustymabe> though I guess that wouldn't help even if it was implemented in rpm-ostree would it?
17:36:57 <jlebon> https://github.com/openshift/machine-config-operator/pull/3243/
17:36:58 <dustymabe> or does this step only happen in the finalize-staged step ?
17:38:01 <jlebon> dustymabe: not sure i follow
17:38:12 <jlebon> maybe we can continue in #fedora-coreos
17:38:14 <travier> the MCO does it on all arches AFAIK
17:38:25 <jmarrero> It happens for all Arches
17:38:44 <travier> https://github.com/openshift/machine-config-operator/pull/3243/#issuecomment-1180668694
17:38:47 <dustymabe> i.e. if we implemented "opportunistic cleanup" in rpm-ostree (like we started to try) would that have the same problem of "staged update today but not going to reboot to apply until saturday"
17:39:31 * dustymabe closes this meeting soon and we'll head over to #fedora-coreos to discuss more
17:39:34 <travier> I think we need to consider the "increase" /boot option
17:39:42 <dustymabe> travier: yeah
17:39:49 <jlebon> dustymabe: yeah let's chat there :)
17:39:54 <dustymabe> #endmeeting