fedora_coreos_meeting
LOGS
16:32:15 <dustymabe> #startmeeting fedora_coreos_meeting
16:32:15 <zodbot> Meeting started Wed Aug 24 16:32:15 2022 UTC.
16:32:15 <zodbot> This meeting is logged and archived in a public location.
16:32:15 <zodbot> The chair is dustymabe. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:32:15 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:32:15 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:32:19 <dustymabe> #topic roll call
16:32:21 <dustymabe> .hi
16:32:22 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:32:32 <jmarrero> .hi
16:32:33 <zodbot> jmarrero: jmarrero 'Joseph Marrero' <jmarrero@redhat.com>
16:32:49 <jbrooks> .hello jasonbrooks
16:32:49 <zodbot> jbrooks: jasonbrooks 'Jason Brooks' <jbrooks@redhat.com>
16:33:29 <lorbus> .hi
16:33:30 <zodbot> lorbus: lorbus 'Christian Glombek' <cglombek@redhat.com>
16:33:40 <dustymabe> #chair jmarrero jbrooks lorbus
16:33:40 <zodbot> Current chairs: dustymabe jbrooks jmarrero lorbus
16:34:21 <ravanelli> .hi
16:34:22 <zodbot> ravanelli: ravanelli 'Renata Ravanelli' <renata.ravanelli@gmail.com>
16:34:46 <jlebon> .hello2
16:34:47 <zodbot> jlebon: jlebon 'None' <jonathan@jlebon.com>
16:34:53 <dustymabe> #chari jlebon ravanelli
16:35:08 * dustymabe is hoping to have bgilbert and walters around to discuss a few things
16:35:48 <aaradhak> .hi
16:35:49 <zodbot> aaradhak: aaradhak 'Aashish Radhakrishnan' <aaradhak@redhat.com>
16:36:09 <dustymabe> #chair aaradhak
16:36:09 <zodbot> Current chairs: aaradhak dustymabe jbrooks jmarrero lorbus
16:36:13 <bgilbert> .hi
16:36:14 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:36:20 <dustymabe> #chair bgilbert travier
16:36:20 <zodbot> Current chairs: aaradhak bgilbert dustymabe jbrooks jmarrero lorbus travier
16:36:30 <dustymabe> ok let's get started
16:36:36 <dustymabe> #topic Action items from last meeting
16:36:49 <dustymabe> #info no action items from last meeting!
16:37:00 <dustymabe> #topic tracker: Fedora 37 changes considerations
16:37:16 <travier> .hi siosm
16:37:16 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/1222
16:37:17 <zodbot> travier: Sorry, but user 'travier' does not exist
16:37:24 <travier> .hello siosm
16:37:25 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:38:23 <dustymabe> ok a few updates on this topic.. the macaddresspolicy change fell out (thomas got busy and went on vacation) but we'll get it into F38. https://fedoraproject.org/wiki/Changes/MAC_Address_Policy_none
16:39:12 <dustymabe> #info the hostname change got into f38/f37: https://github.com/coreos/fedora-coreos-tracker/issues/902#issuecomment-1225839825
16:39:19 <dustymabe> #info the macaddresspolicy change fell out (thomas got busy and went on vacation) but we'll get it into F38. https://fedoraproject.org/wiki/Changes/MAC_Address_Policy_none
16:39:35 <dustymabe> we have a few new self contained changes to review:
16:39:52 <dustymabe> subtopic 225. Haskell GHC 8.10.7 & Stackage LTS 18.28
16:40:06 <jlebon> i'll have to leave at the halfway point for another meeting
16:40:31 <dustymabe> nothing for us to do here. we don't ship those
16:40:40 <dustymabe> subtopic 226. Mumble 1.4
16:40:47 <dustymabe> nothing for us to do here. we don't ship mumble
16:40:48 <jlebon> dustymabe: nice work on the hostname change! that was a long road :)
16:40:59 <dustymabe> jlebon: thanks :)
16:41:08 <dustymabe> subtopic 227. Emacs 28
16:41:14 <dustymabe> nothing for us to do here. we don't ship emacs
16:41:26 <dustymabe> ok that's all the new items
16:41:35 <dustymabe> anything else change related that we should discuss?
16:41:38 <travier> 👍👍
16:41:44 <travier> looks good
16:41:54 <dustymabe> there is the "Preset All Systemd Units on First Boot" - should we discuss status on that one?
16:42:31 <jlebon> it's in, I still need to verify that it works for us, but we're not dependent on it. we're still carrying the workaround in Ignition for now
16:42:47 <jlebon> i don't think there's anything new to discuss
16:43:13 <dustymabe> there is also 118.  BIOS boot.iso with GRUB2
16:43:26 <dustymabe> which doesn't affect us directly
16:43:46 <dustymabe> but we should consider trying to revisit these topics so we don't just drop them forever
16:44:13 <dustymabe> once they fall off the current view of the world it's hard to remember to go back to them
16:45:10 <jlebon> let's file an issue for it?
16:45:11 <travier> 👍
16:45:11 <dustymabe> ok i'll move on to the next ticket
16:45:19 <travier> we have https://github.com/coreos/fedora-coreos-tracker/issues/1231
16:45:34 <jlebon> ack, nice
16:45:51 <dustymabe> yeah we have an issue. we just need to remember to followup
16:45:59 <dustymabe> #topic Document /boot requirements and constraints when installing/upgrading kernels
16:46:03 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/1247
16:46:28 <dustymabe> ok I tagged this one with the meeting label
16:47:06 <dustymabe> basically we have a few efforts underway here to help change our /boot constraints
16:47:13 <dustymabe> AFAIK there is
16:47:23 <dustymabe> 1. change compression algorithm (underway)
16:47:47 <dustymabe> 2. change rpm-ostree behavior to opportunistically cleanup rollback deployment if needed (in discussion)
16:47:55 <dustymabe> 3. change the size of /boot/ partition
16:48:01 <dustymabe> (in discussion)
16:48:15 <dustymabe> anything else I'm missing that hasn't already been disqualified?
16:49:04 <dustymabe> (also, do those 3 look correctly characterized?)
16:49:46 <jlebon> seems right
16:49:58 <jmarrero> yeah
16:50:19 <dustymabe> ok so let's continue the discussion here
16:51:06 <dustymabe> the reason I'm bringing this up is because we are looking to add ppc64le and the /boot contents there are larger than the other platforms (see https://github.com/coreos/fedora-coreos-tracker/issues/987#issuecomment-1221438641)
16:51:30 <dustymabe> so it would be nice to have at least one of these mitigations in place before we ship that arch
16:51:44 <dustymabe> so let's skip discussion on 1. since it's already in progress
16:52:49 <dustymabe> for 3. are we seriously considering changing the /boot/ partition size and what all would that take (I imagine we'd want to do it across the board to retain symmetry among most of our arches like we've had before)
16:53:22 <jlebon> i touched on this in https://github.com/coreos/fedora-coreos-tracker/issues/1247#issuecomment-1177907272
16:53:41 <jlebon> i think we should, but we also need to be very careful about it given https://github.com/coreos/fedora-coreos-tracker/issues/1247#issuecomment-1190704602
16:54:02 <bgilbert> I'm not sure we can
16:54:30 <bgilbert> historically, at different times, we've documented two approaches for setting the rootfs size
16:54:53 <bgilbert> (when creating an additional data partition on the same disk)
16:55:03 <bgilbert> the older one was to set the starting offset of the data partition
16:55:21 <bgilbert> and the current one is to use "resize: true" (now that we have that) and set the size of the rootfs directly
16:56:14 <travier> I'd say we should unify the docs, prepare the change to a bigger /boot, announce it and wait 6 months with reminders at 3 months?
16:56:18 <bgilbert> if we had only done the older one, we could resize the bootfs up and the rootfs down and not risk clobbering anything
16:56:44 <bgilbert> but with the newer one, we're stuck.  we've carefully provided advice on how to write Ignition configs so the data partition won't be clobbered, and the advice was bad
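The difference between the two documented approaches can be worked through with simple arithmetic. The sketch below is a simplified model, not the actual behavior of Ignition or coreos-installer: it assumes a disk laid out as earlier partitions, /boot, root, then a data partition. Only the 384 MiB /boot size comes from the discussion; `PRE_BOOT`, `NEW_BOOT`, `ROOT_SIZE` and both helper functions are hypothetical names and values chosen for illustration.

```python
# Simplified model of a disk: [earlier partitions][/boot][root][data].
# All sizes are in MiB. Only the 384 MiB /boot size comes from the meeting;
# every other number is a made-up illustration.

PRE_BOOT = 128     # hypothetical space used before /boot
OLD_BOOT = 384     # /boot size mentioned in the meeting
NEW_BOOT = 1024    # hypothetical larger /boot
ROOT_SIZE = 8192   # hypothetical root size pinned in a user's config


def root_span_fixed_size(boot_size, root_size):
    """Newer advice: the config pins the root size; data follows root."""
    start = PRE_BOOT + boot_size
    return start, start + root_size


def root_span_fixed_data_start(boot_size, data_start):
    """Older advice: the config pins where data starts; root fills the gap."""
    start = PRE_BOOT + boot_size
    return start, data_start


# Where the data partition landed on the originally provisioned disk.
_, data_start = root_span_fixed_size(OLD_BOOT, ROOT_SIZE)

# Reprovision with a larger /boot while the old data partition stays in place.
_, new_root_end = root_span_fixed_size(NEW_BOOT, ROOT_SIZE)
overlap_fixed_size = max(0, new_root_end - data_start)

start, end = root_span_fixed_data_start(NEW_BOOT, data_start)
overlap_fixed_start = max(0, end - data_start)
shrunk_root = end - start

print(f"pinned root size:  root overlaps data by {overlap_fixed_size} MiB")
print(f"pinned data start: overlap {overlap_fixed_start} MiB, "
      f"root shrinks to {shrunk_root} MiB")
```

Under these assumed numbers, a config that pins the root size pushes root 640 MiB into the old data partition, while a config that pins the data start offset only shrinks root.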
16:57:09 <bgilbert> travier: the old docs are gone, but old deployed Ignition configs may not be
16:57:58 <dustymabe> bgilbert: and there is no way we can detect a "reprovision" and safely error?
16:58:01 <travier> yes, I understand that it requires our users to interact with their systems or change their configs
16:58:04 <jlebon> bgilbert: hmm, are you saying we can't require users to change their configs?
16:58:25 <bgilbert> announcements are all well and good, but clobbering user data on systems configured according to our advice is Very Bad, and we can't guarantee that everyone will see an announcement
16:58:34 <ravanelli> dustymabe: boot changes in general may also break ppc64le, since it uses petitboot. I don't think size changes will break it, but it is good to check whether the commands for GRUB2 will change
16:59:04 <bgilbert> if we can automatically detect & error out, that would address my concern, but in general I don't think we can
16:59:05 <travier> ravanelli: but power has not been shipped yet?
16:59:09 <travier> for fcos
16:59:20 <ravanelli> no, I think the idea is to
16:59:24 <bgilbert> the user isn't required to indicate their intention to reprovision nondestructively, and reprovisioning destructively is a valid operation
16:59:37 <bgilbert> (plus Ignition doesn't even know the new partition is there, so it'd have to happen somehow in coreos-installer)
17:00:29 <travier> there is indeed a lot of potential for tricky issues
17:01:42 <travier> Can we have an opt-in flag to resize /boot ?
17:01:44 <dustymabe> it might help if we enumerate (offline) the cases
17:01:57 <dustymabe> and indicate in which cases you end up with data loss
17:01:59 <travier> so that we don't touch the default but let folks opt-in for it?
17:02:23 <bgilbert> travier: yes, if we're willing to incur a transposefs run for everyone who sets it
17:02:24 <dustymabe> travier: i mean, people can already resize boot today
17:02:30 <jlebon> i see the concern, but I'm also concerned about being handcuffed forever
17:02:46 <travier> sure you can resize today but it's very manual
17:03:04 <jlebon> moving down gigs of data on first boot isn't a great UX
17:03:06 <dustymabe> travier: as in a manually crafted butane config?
17:03:17 <dustymabe> (still automated, though)?
17:03:18 <bgilbert> Butane could certainly have a flag that desugars to resizing boot and recreating root at firstboot
17:03:39 <travier> https://github.com/coreos/fedora-coreos-docs/issues/410
17:03:48 <bgilbert> (or we could ship two images, ugh)
17:03:48 <dustymabe> correct
17:03:52 <travier> https://github.com/coreos/fedora-coreos-tracker/issues/1196#issuecomment-1132428498
17:04:00 <dustymabe> travier: right
17:04:11 <dustymabe> bgilbert: I'd definitely prefer not :)
17:04:14 <bgilbert> me too
17:04:42 <dustymabe> we could push this problem down the road
17:04:52 <dustymabe> honestly a combination of 1 & 2 would probably suffice
17:05:00 <travier> shipping two images would have the advantage that we could say that the previous one is deprecated and announce a 1 year switch
17:05:00 <dustymabe> perhaps we could also look at other potential space savings
17:05:03 <travier> for example
17:05:10 <travier> thus folks would have to look at it
17:05:13 <travier> could not just ignore it
17:05:17 <dustymabe> travier: I don't really think that solves the problem
17:05:27 <bgilbert> (I don't think we're special in this regard.  e.g. Fedora has to continue working with old small /boot partitions forever)
17:05:57 * dustymabe goes back to old me and whispers in his ear to set /boot to at least 512M
17:06:19 <dustymabe> we are slightly special in that we bake in a bunch of statically compiled (larger) files in our initramfs
17:06:37 <bgilbert> (well okay, Anaconda-based Fedora can make new installations larger)
17:07:48 <bgilbert> travier: if we update the coreos-installer defaults, it doesn't solve the problem, yeah
17:08:41 <travier> we would not be able to. it would be another "platform"
17:08:44 <travier> qemu2
17:08:46 * dustymabe wonders if we could take any cue based on the Ignition config version
17:08:52 <travier> but I agree that it's ugly
17:08:58 <dustymabe> probably not
17:09:12 <bgilbert> yeah, I don't think so
17:09:15 <dustymabe> would require a lot of assumptions
17:09:27 <jlebon> bgilbert: but unlike Anaconda-based Fedora, we put a lot more emphasis on reprovisionability
17:09:31 <bgilbert> yup
17:09:50 <bgilbert> this has me thinking about Colin's split-initramfs approach again
17:10:00 <bgilbert> which I argued against pretty strongly on complexity grounds
17:10:09 <dustymabe> yeah, I was with you
17:10:19 <bgilbert> but it does have the advantage that we're handling the consequences of our decisions ourselves rather than pushing them onto the user
17:11:30 <dustymabe> i'll call that...
17:11:40 <dustymabe> 4. split-initramfs/rootfs binaries
17:11:52 <dustymabe> seriously though.. can we revisit 2. ?
17:11:58 <bgilbert> do we have a sense of when to stop?  i.e., if zstd gives us some space back, and the rpm-ostree changes give us some flexibility, etc.,
17:12:04 <bgilbert> when do we call it good enough?
17:12:14 <bgilbert> dustymabe: yes, let's
17:12:55 <dustymabe> so IIUC 2. basically says "if we need extra space to finalize deployment, then we clean up the rollback files first"
17:12:59 <jlebon> the rpm-ostree change alone would fix this, but we don't want to rely on it too much
17:13:29 * dustymabe brb - please continue discussion
17:13:45 <jlebon> because you'd lose your rollback
17:14:45 <jlebon> https://github.com/ostreedev/ostree/issues/2670#issuecomment-1179341883
17:15:33 <bgilbert> yeah
17:15:47 <jlebon> bgilbert: i'm still not over the fact that we can't change new images for this
17:15:54 * dustymabe back
17:15:55 <jlebon> i feel like we should have that freedom
17:16:21 <jlebon> it's unlikely to be the last of its kind
17:16:53 <dustymabe> honestly I think if you lose your rollback you're in the same position as if you can't upgrade because you don't have enough space
17:17:16 <bgilbert> jlebon: image-based auto-upgrading OSes don't have infinite degrees of freedom, sadly :-(
17:17:27 <dustymabe> keep in mind here that the rollback you are losing is the one that you already haven't been running for two weeks
17:17:43 <bgilbert> jlebon: we've known that
17:17:48 <jlebon> dustymabe: the reason you can't upgrade may have nothing to do with the ENOSPC check
17:18:35 <dustymabe> but wouldn't we only be cleaning it up if we progressed enough in the upgrade to get to the final stage ?
17:19:05 <dustymabe> i.e. most "upgrade" problems would have been cleared by that point
17:19:56 * dustymabe has a loose approximation about how rpm-ostree works, so clearly people who know better can tell me where I'm wrong
17:20:00 <jlebon> there's still things it does afterwards that could fail, but i think that's true, yes
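The idea behind option 2, as summarized above ("if we need extra space to finalize deployment, then we clean up the rollback files first"), can be sketched as a small decision function. This is only an illustration of the policy under discussion, not how rpm-ostree or ostree implement it; `Action`, `plan_boot_update` and the byte counts in the usage line are hypothetical.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    PRUNE_ROLLBACK = "prune rollback, then proceed"
    FAIL_NO_SPACE = "fail: not enough space"


def plan_boot_update(free_bytes, new_deployment_bytes, rollback_bytes):
    """Decide how to fit a new deployment's kernel/initramfs into /boot.

    The rollback is only given up when the new files would not fit otherwise,
    so systems with enough free space keep their rollback.
    """
    if new_deployment_bytes <= free_bytes:
        return Action.PROCEED
    if new_deployment_bytes <= free_bytes + rollback_bytes:
        return Action.PRUNE_ROLLBACK
    return Action.FAIL_NO_SPACE


# Hypothetical numbers: 60 MB free, 90 MB needed, 80 MB held by the rollback.
print(plan_boot_update(60_000_000, 90_000_000, 80_000_000).value)
```

The middle branch is the trade-off raised in the discussion: the upgrade can finish, but the rollback deployment is gone.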
17:20:51 <dustymabe> ok we're running short on time..
17:20:54 <jlebon> anyway, don't want to belabor this. we can chat more in the ticket!
17:20:58 <dustymabe> any conclusions we want to draw at this point?
17:21:13 <dustymabe> or paths forward (i.e. more investigation here or there?)
17:21:38 <jmarrero> +1 on we should have that freedom. It feels like we should not be limited forever. We need some sort of system for these "breaking changes" if we don't have one already. We could detect if there is enough space to upgrade and change the partition size and if no space to resize, then error out while upgrading and leave it alone.
17:21:43 <jmarrero> I think more investigation makes sense before drawing the line.
17:22:16 <bgilbert> jmarrero: we don't have a mechanism for restructuring partitions at runtime
17:22:19 <bgilbert> FCOS doesn't use LVM
17:22:28 <bgilbert> (and can't really, Ignition doesn't support it)
17:22:44 <bgilbert> dustymabe: there's still the ppc question
17:23:00 <bgilbert> in principle we can have different partition sizes for different platforms, though I think we previously decided not to do that
17:23:07 <dustymabe> right. IOW we're going to still need other solutions other than "resize /boot" unless we want to force people to reprovision existing systems
17:23:33 <dustymabe> bgilbert: correct. I'd prefer to keep them in line if we can
17:23:34 <jlebon> if we're committing long-term to deal with it on existing platforms, then i'd say it's probably not worth diverging
17:24:00 <bgilbert> "how much fix" do we need to be comfortable shipping ppc?
17:24:04 <dustymabe> jlebon: i.e. might as well ship ppc64le since we have to deal with it anyway?
17:24:17 <dustymabe> bgilbert: I assume the compression fix would be sufficient
17:24:22 <bgilbert> okay
17:24:30 <dustymabe> but would need to do some final testing
17:24:34 <jlebon> dustymabe: right yeah. and RHCOS is already shipping ppc64le with that layout
17:24:48 <dustymabe> jlebon: the 384M layout?
17:25:11 <jlebon> dustymabe: yup
17:25:20 <dustymabe> interesting..
17:25:32 <dustymabe> ok let me try to summarize
17:27:17 <dustymabe> #proposed we discussed the different options for solving this general problem here today. Right now we don't see a clear path forward for changing the /boot/ partition size without risking data loss while re-provisioning systems. We're going to investigate the other options and also brainstorm on how we can increase the /boot partition in the future. For now we'll try to get at least the
17:27:18 <dustymabe> compression mitigation in place and move forward with shipping ppc64le.
17:28:08 <dustymabe> I'll add some more context in the ticket too
17:28:12 <dustymabe> ack/nack?
17:28:34 <jlebon> ack
17:28:49 <bgilbert> ack
17:28:58 <jmarrero> ack
17:29:13 <travier> disagree about not diverging
17:29:22 <travier> we're hitting this on ppc now
17:29:35 <travier> so clearly the size requirements are not the same for each platform
17:29:49 <travier> but ack for the proposed
17:30:01 <travier> well not the ppc part
17:30:13 <travier> I think we should fix it now as we have the option
17:30:17 <dustymabe> I guess we were focused on FCOS here (which makes sense)
17:30:34 <travier> sure, but the size issues are the same
17:30:38 <dustymabe> but it's possible the compression solution might not work there (does moving to zstd work there?)
17:30:44 <bgilbert> travier: that'd mean we'd need to conditionalize the ppc boot size for FCOS/RHCOS, right?
17:30:47 <dustymabe> travier: ^^ that's the difference
17:30:58 <bgilbert> dustymabe: no zstd in RHEL 8
17:31:18 <dustymabe> could we use more aggressive xz there?
17:31:41 <travier> (have to go sorry)
17:31:43 <dustymabe> I guess we could require 1 to be in place for FCOS and 2. would solve the problem for RHCOS
17:31:45 <bgilbert> dustymabe: we don't use xz at all right now.  yes, but it slows down boot by seconds.
17:32:04 <dustymabe> bgilbert: yeah, could be OK for one platform
17:32:11 <dustymabe> sorry, one arch
17:32:32 <dustymabe> ok I'll mark this as agreed and we'll continue to discuss options for RHCOS in the appropriate places for that
17:32:38 <dustymabe> #agreed we discussed the different options for solving this general problem here today. Right now we don't see a clear path forward for changing the /boot/ partition size without risking data loss while re-provisioning systems. We're going to investigate the other options and also brainstorm on how we can increase the /boot partition in the future. For now we'll try to get at least the
17:32:40 <dustymabe> compression mitigation in place and move forward with shipping ppc64le.
17:33:07 <bgilbert> yeah, one arch in RHCOS for the lifetime of RHEL 8 could be okay
17:33:16 <dustymabe> anyone able to hang around for the other two meeting topics? was hoping to get to them since jlebon is going to be AFK for some weeks?
17:33:35 <bgilbert> I can
17:33:47 <jlebon> I can also
17:33:49 <dustymabe> #topic NetworkManager: consider defaulting to EUI-64 for IPv6 SLAAC (at least on OpenStack)
17:33:58 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/907
17:34:06 <dustymabe> jlebon: you tagged this one I think
17:34:20 <jlebon> yup
17:34:53 <jlebon> i've talked to an SME and put the TL;DR in https://github.com/coreos/fedora-coreos-tracker/issues/907#issuecomment-1210894052
17:35:54 <jlebon> so I think that leans us towards doing the change, but for OpenStack only since the platform expects it
17:36:13 <dustymabe> WFM
17:36:46 * dustymabe wonders if we need to consider "upgrading" systems to be different than "newly deployed" ones here
17:37:57 <jlebon> i was thinking it'd be for newly deployed systems only
17:38:19 <jlebon> via runtime conditionals on firstboot
17:38:24 <dustymabe> which means we probably need a barrier that writes out a config describing the current behavior
17:38:57 <dustymabe> ahh "runtime conditionals" meaning we dynamically apply the config?
17:39:02 <dustymabe> on first boot
17:39:16 <jlebon> yeah, in e.g. `coreos-teardown-initramfs` where we have other config propagation bits
17:39:42 <jlebon> actually, better done before ignition-files
17:39:50 <jlebon> so it can be overridden if one really wants
17:39:58 <dustymabe> maybe let's leave the implementation to a followup discussion
17:40:02 <dustymabe> #proposed we will set ipv6.addr-gen-mode=eui64 as the default on our OpenStack platform since the platform expects this to be the case. We will attempt to leave currently deployed systems alone so that we don't change an existing system's IP address.
17:40:19 <jlebon> ack
17:40:41 <bgilbert> wfm
17:40:46 <dustymabe> any opposed?
17:40:48 <dustymabe> ack from me :)
17:41:17 <dustymabe> #agreed we will set ipv6.addr-gen-mode=eui64 as the default on our OpenStack platform since the platform expects this to be the case. We will attempt to leave currently deployed systems alone so that we don't change an existing system's IP address.
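For context on why the address-generation mode matters: with EUI-64, the IPv6 interface identifier is derived from the MAC address (modified EUI-64, RFC 4291 Appendix A), so the SLAAC address is predictable from the MAC alone. The sketch below shows that derivation using only the Python standard library; the function names, the documentation prefix `2001:db8::/64` and the example MAC are illustrative assumptions, not values from the meeting.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Build a modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    # Insert ff:fe between the two halves of the MAC address.
    eui = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    # Flip the universal/local bit of the first octet.
    eui[0] ^= 0x02
    return int.from_bytes(eui, "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the EUI-64 interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 expects a /64 prefix")
    return network[eui64_interface_id(mac)]


# Example MAC and prefix are placeholders for illustration.
print(slaac_address("2001:db8::/64", "fa:16:3e:12:34:56"))
```

Because the result depends only on the prefix and the MAC, switching an already deployed system to this mode would change its address, which is why the agreement above leaves existing systems alone.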
17:41:27 <dustymabe> #topic Pinning coreos-assembler in FCOS releases
17:41:31 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/1068
17:41:35 <dustymabe> jlebon: you again :)
17:41:48 <dustymabe> we can push this one if you'd like, or can discuss it today
17:42:40 <jlebon> the TL;DR is: this is unblocked now. let's fix it! i added a strawman at the end
17:42:49 <jlebon> cool to discuss here or keep it there
17:43:27 <dustymabe> I think there are some details here that might be tricky to get right
17:43:45 <jlebon> agree
17:44:25 <dustymabe> maybe since we are over time let's push it and maybe work out some of the details offline and bring back a better proposal to the meeting
17:44:33 <dustymabe> in short, though. I think I'm in favor of pinning
17:44:36 <dustymabe> to make things more reliable
17:45:09 <jlebon> sure, SGTM.  i think if we keep it simple, it'll be likelier to get implemented :)
17:45:21 <dustymabe> always true :)
17:45:24 <dustymabe> #topic open floor
17:45:32 <dustymabe> anyone have topics for open floor (sorry about the late meeting)
17:46:01 <dustymabe> #info f37 test week for fcos is tentatively sept 19-23
17:46:22 <dustymabe> which.. /me checks - happens to be when he has scheduled vacation
17:46:36 <dustymabe> sigh.. might need to be the week before that :)
17:46:46 <dustymabe> #undo
17:46:46 <zodbot> Removing item from minutes: INFO by dustymabe at 17:46:01 : f37 test week for fcos is tentatively sept 19-23
17:46:54 <dustymabe> i'll circle back with SumantroMukherje on that
17:47:00 <dustymabe> any other topics for open floor?
17:48:17 <dustymabe> #endmeeting