fedora_coreos_meeting
LOGS
16:30:47 <dustymabe> #startmeeting fedora_coreos_meeting
16:30:47 <zodbot> Meeting started Wed Aug 11 16:30:47 2021 UTC.
16:30:47 <zodbot> This meeting is logged and archived in a public location.
16:30:47 <zodbot> The chair is dustymabe. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:47 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:47 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:30:52 <dustymabe> #topic roll call
16:30:55 <dustymabe> .hi
16:30:56 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:31:04 <jaimelm> .hello2
16:31:05 <zodbot> jaimelm: jaimelm 'Jaime Magiera' <jaimelm@umich.edu>
16:31:09 <darkmuggle> .hi
16:31:10 <zodbot> darkmuggle: darkmuggle 'None' <me@muggle.dev>
16:32:31 <travier> .hello siosm
16:32:32 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:32:48 <bgilbert> .hi
16:32:49 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:32:53 <miabbott> .hello2
16:32:54 <zodbot> miabbott: miabbott 'Micah Abbott' <miabbott@redhat.com>
16:34:12 <jbrooks> .hello jasonbrooks
16:34:13 <zodbot> jbrooks: jasonbrooks 'Jason Brooks' <jbrooks@redhat.com>
16:35:13 <dustymabe> #chair jaimelm darkmuggle travier bgilbert miabbott jbrooks
16:35:13 <zodbot> Current chairs: bgilbert darkmuggle dustymabe jaimelm jbrooks miabbott travier
16:35:45 <dustymabe> #topic Action items from last meeting
16:35:54 <dustymabe> * dustymabe to re-index and look for newly submitted change proposals
16:35:55 <dustymabe> for f35 that we need to consider
16:35:57 <dustymabe> * dustymabe to figure out how the cloud edition is handling the
16:35:59 <dustymabe> ipv6.addr-gen-mode=stable-privacy problem
16:36:19 <dustymabe> #info dustymabe re-indexed and found new items. See https://github.com/coreos/fedora-coreos-tracker/issues/856#issuecomment-896976066
16:37:00 <dustymabe> #info dustymabe added info about how fedora cloud base handles ipv6 addr gen mode in https://github.com/coreos/fedora-coreos-tracker/issues/907#issuecomment-895455009
16:37:17 <dustymabe> hopefully I don't just give myself action items this meeting
16:37:43 <dustymabe> #topic Differing behavior for aarch64 vs x86_64 disk images
16:37:48 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/855
16:37:57 * dustymabe waves at bgilbert
16:38:15 <bgilbert> right
16:38:36 <bgilbert> so, we decided to handle this via documentation, and then a bug happened.
16:38:51 <dustymabe> 🐛
16:39:06 <bgilbert> it turns out that on aarch64 and ppc64le, Butane generates boot-disk RAID templates that don't match what we actually ship
16:39:29 <bgilbert> specifically, they don't skip unused partition numbers.
16:40:08 <bgilbert> that would be a trivial fix, _except_ for another Ignition behavior.  if Ignition sees a partition number, it stops matching against the partition label when doing config merges.
16:40:46 <bgilbert> so, if we explicitly specify partition numbers in the RAID templates, everyone who is overriding the RAID template to set the partition size of the root partition would need to add "number: 4" to their override.
16:41:10 <bgilbert> both OCP (as of 4.8) and FCOS have existing docs telling people to do the override by label only.
16:41:37 <bgilbert> and changing that Ignition behavior is a breaking change.
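The merge behavior bgilbert describes above can be sketched roughly as follows. This is a minimal illustration in Python, not Ignition's actual code: the `Partition` class, the `merge_key` and `merge_partitions` helpers, the label `root`, and the size value are all hypothetical names chosen for the example. The only rule taken from the discussion is that an entry with a partition number is matched by number alone, and the label is used only when no number is given.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    # Hypothetical model of one partition entry in a config.
    label: Optional[str] = None
    number: int = 0               # 0 stands for "not specified"
    size_mib: Optional[int] = None


def merge_key(p: Partition) -> str:
    # Assumed rule from the discussion: a partition number, when present,
    # is the only identifier; the label is consulted only as a fallback.
    if p.number != 0:
        return f"number:{p.number}"
    return f"label:{p.label}"


def merge_partitions(base: List[Partition], override: List[Partition]) -> List[Partition]:
    # Entries with the same key are combined field by field;
    # entries with a new key are appended as separate partitions.
    merged = {merge_key(p): p for p in base}
    for o in override:
        key = merge_key(o)
        if key in merged:
            b = merged[key]
            merged[key] = Partition(
                label=o.label if o.label is not None else b.label,
                number=o.number or b.number,
                size_mib=o.size_mib if o.size_mib is not None else b.size_mib,
            )
        else:
            merged[key] = o
    return list(merged.values())


# A label-only override, as the existing docs recommend (values are illustrative).
override_by_label = [Partition(label="root", size_mib=8192)]

# Template without a number: the override matches by label and is merged in.
unnumbered = merge_partitions([Partition(label="root")], override_by_label)

# Template with an explicit number: the same override no longer matches.
numbered = merge_partitions([Partition(label="root", number=4)], override_by_label)

# Adding the number to the override restores the match.
fixed = merge_partitions(
    [Partition(label="root", number=4)],
    [Partition(label="root", number=4, size_mib=8192)],
)
```

Under these assumptions, `unnumbered` and `fixed` each contain a single resized partition, while `numbered` ends up with two separate entries for the same label. That is why numbering the templates would force everyone with a label-only override to add "number: 4".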
16:42:18 <bgilbert> so our options are:
16:43:04 <bgilbert> 1. break existing RAID template overrides (probably needs a Butane spec bump?)
16:43:22 <bgilbert> 2. change the Ignition matching behavior (needs Ignition spec 4 AFAICT)
16:43:53 <bgilbert> 3. ship empty partitions on aarch64/ppc64le so we don't skip any partition numbers
16:44:02 <bgilbert> 4. live with the inconsistency and update kola tests
16:44:11 <bgilbert> questions?
16:44:51 <darkmuggle> What is your recommendation bgilbert
16:44:57 <dustymabe> of course.. I was advocating for 3. before we found the bug bgilbert mentioned :)
16:45:03 <travier> Could 3 also enable us to do https://github.com/coreos/fedora-coreos-tracker/issues/855#issuecomment-889946742? Sorry I'm not familiar enough with aarch64/uboot to answer that
16:45:14 <travier> https://github.com/coreos/fedora-coreos-tracker/issues/855#issuecomment-889946742 (fixed link)
16:45:31 <bgilbert> travier: yes
16:46:30 <bgilbert> darkmuggle: I raised this because I'm wondering whether we should just ship the empty partitions.  it's hacky and I don't like it, but it's the smallest fix and it's backward-compatible
16:47:29 <miabbott> is there a follow-on change to fix this "for real", if we decide to choose a shorter term workaround?
16:47:44 <dustymabe> and.. bgilbert wanted to make dusty happy anyway so killing two birds with one stone
16:47:45 <darkmuggle> Right... 3 is the simplest fix. I thought I was missing some nuance, hence why I asked. That said, 3 seems reasonable with a backlog to fix in Spec4.
16:47:49 <miabbott> i.e. do the empty partitions now and then something more complete later
16:48:36 <bgilbert> in the long run I do think we need to fix the Ignition matching behavior
16:49:15 <bgilbert> Ignition treats labels as second-tier identifiers but our messaging has consistently been that people should use them over partnums
16:49:30 <miabbott> if we have a way out of option 3 in the long term, then it seems like that is the best choice
16:49:47 <dustymabe> miabbott: but I don't assume we'd want to change out of option 3
16:49:54 <dustymabe> it has other benefits
16:50:01 <travier> (I'm trying to figure out if we need some small empty partition space for u-boot to UEFI in arm32/64 setups but my knowledge is lacking)
16:50:22 <bgilbert> once we start shipping the empty partitions, it'll be hard to justify the work to remove them, even without other benefits
16:50:35 <miabbott> dustymabe: understood. i just don't want us to get locked into something we can't escape
16:50:59 <bgilbert> if we were to ship empty partitions, there's the question of how to do it
16:51:28 <dustymabe> so.. official turn in conversation focusing on option 3
16:51:29 <bgilbert> it wouldn't affect x86_64 at all, but Butane would need to be updated for the new partmaps in aarch64 and ppc64le
16:51:57 <bgilbert> dustymabe: maybe not, just exploring the option space
16:52:04 <dustymabe> +1
16:52:16 <dustymabe> yeah we can go back to the higher level too, just wanted to note the change
16:52:24 <bgilbert> +1
16:52:42 <dustymabe> bgilbert: and we assume that butane update would be less disruptive?
16:52:51 <bgilbert> so now we'd have a situation where newer Butane can't be used with older FCOS on those arches
16:52:57 <bgilbert> (FCOS and RHCOS)
16:53:06 <bgilbert> *can't be used for boot disk RAID
16:53:17 <dustymabe> which, for FCOS, is a non-issue
16:53:21 <bgilbert> yup
16:53:24 <dustymabe> since we don't ship those arches yet
16:53:47 <bgilbert> in OCP, well
16:54:01 <dustymabe> so the problem set is OCP 4.8 (first time butane was supported) + aarch64/ppc64le + boot disk RAID
16:54:21 <bgilbert> we don't bind Butane releases to OCP releases, but that also doesn't matter because users don't update their bootimages so they're stuck with old Ignition forever
16:54:38 <bgilbert> dustymabe: Butane was supported in 4.7 specifically for the boot RAID case
16:54:46 <dustymabe> ahh, I didn't know that
16:54:53 <dustymabe> thought it was new with 4.8
16:54:59 <bgilbert> it was generalized for 4.8
16:55:20 <dustymabe> but.. was aarch64 supported with 4.7 ?
16:55:30 <dustymabe> that's relatively new
16:55:39 <bgilbert> no, but I think ppc64le was
16:55:44 <dustymabe> got ya
16:55:44 <miabbott> aarch64 isn't supported in OCP until 4.9
16:55:58 <bgilbert> we can do backports, but again, old bootimages
16:56:12 <bgilbert> new image/old Butane should be fine.  the problem is only old image/new Butane
16:56:19 <jaimelm> https://docs.openshift.com/container-platform/4.7/installing/install_config/installing-customizing.html
16:56:37 <bgilbert> maybe we decide not to care about old bootimages, since we've sorta done that for every new Ignition feature as-is
16:57:09 <bgilbert> okay, end of tangent I think
16:57:26 <dustymabe> back to the higher level
16:57:29 <dustymabe> 1. break existing RAID template overrides (probably needs a Butane spec bump?)
16:57:31 <dustymabe> 2. change the Ignition matching behavior (needs Ignition spec 4 AFAICT)
16:57:33 <dustymabe> 3. ship empty partitions on aarch64/ppc64le so we don't skip any partition numbers
16:57:35 <dustymabe> 4. live with the inconsistency and update kola tests
16:57:43 <bgilbert> interested in people's general thoughts among those options
16:57:46 <dustymabe> want to remove an option or two from the list because ETOOHARD ?
16:58:06 <bgilbert> I don't think we should try to do 2 right now
16:58:29 <dustymabe> ❌ 2.
16:59:48 <dustymabe> i'll abstain from commenting because I originally opened the $topic issue and my desires were clear
17:00:15 <bgilbert> also interested in jlebon's thoughts but he's AFK
17:00:34 <dustymabe> yeah - we can take this to next meeting, though.. how time sensitive is it?
17:01:36 <dustymabe> then again.. there seems to be a lot of good reason to go with 3.
17:01:44 <bgilbert> the consequences of the current behavior are: 1) we had to disable kola tests on aarch64/ppc64le; 2) boot disk RAID puts the rootfs on partition 3
17:02:06 <dustymabe> even more so if we could accommodate the uboot stuff that travier mentioned (though that shouldn't be a primary motivator IMO)
17:02:06 <bgilbert> some tools hardcode a partition 4 assumption but I don't think anything in the OS does
17:02:42 <bgilbert> i.e., not that time sensitive AFAIK, except for the 4.9 code freeze
17:03:17 <dustymabe> is there a good reason to solve this in another way than 3. considering the tangential benefits ?
17:03:21 <travier> I think we need a chat with arm-aware folks to figure out if this is needed / if that will help
17:03:31 <travier> (will set that up)
17:03:31 <jaimelm> ^^
17:03:32 <dustymabe> travier: yeah
17:03:50 <dustymabe> unfortunately if what we're proposing wouldn't help I don't see us making any other changes
17:04:51 <bgilbert> dustymabe: I don't hear anyone arguing for breaking existing Butane configs, so
17:05:01 <bgilbert> dustymabe: I think it comes down to fix bug vs. live with bug
17:05:11 <bgilbert> s/fix/work around/
17:05:42 <bgilbert> travier: +1
17:05:52 <dustymabe> bgilbert: and to be clear.. 3. would break existing butane configs
17:06:23 <bgilbert> dustymabe: no.  3 would break existing releases of Butane.
17:06:37 <bgilbert> ...except
17:07:17 <bgilbert> wait, sorry, notional %undo
17:07:45 <dustymabe> ok I'll try to update the issue with this discussion (unless bgilbert wants to)
17:08:28 <bgilbert> using this feature with new Butane releases would not work with old OSes
17:09:03 <bgilbert> old rendered configs would not change behavior, and old Butane configs would seamlessly switch to the new behavior when recompiled
17:09:14 <bgilbert> dustymabe: +1, thanks
17:09:18 <dustymabe> hopefully it would fail hard :)
17:09:23 <bgilbert> dustymabe: yes
17:09:31 <dustymabe> #topic F35: CHANGE: CompilerPolicy Change
17:09:32 <bgilbert> at provisioning time
17:09:36 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/872
17:09:46 <dustymabe> jaimelm: want to speak to this one?
17:11:17 <jaimelm> Well, your description pretty much lays it out in a nutshell. There will be more leeway now in terms of Clang/LLVM
17:11:36 <jaimelm> Cursory look shows no downside.
17:12:10 <dustymabe> i'm assuming there's no issue there for us (our tools specifically)
17:12:21 <dustymabe> preferring clang/llvm isn't really a thing
17:12:30 <jaimelm> right.
17:12:43 <dustymabe> and for other tools we pull from the rest of Fedora, we don't have control over that other than reporting new bugs we find
17:12:48 <dustymabe> so we should be good there
17:13:14 <dustymabe> ok i'll move on to the next issue
17:13:33 <dustymabe> #topic F35: CHANGE: More flexible use of SSSD fast cache for local users
17:13:37 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/875
17:13:42 <dustymabe> darkmuggle: let me know if you want me to punt
17:14:05 <jaimelm> BTW, the bugzilla was updated yesterday for code complete deadline
17:14:12 <jaimelm> no update after that
17:15:17 <dustymabe> ok we'll punt
17:15:29 <dustymabe> #topic tracker: Fedora 35 changes considerations
17:15:35 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/856
17:15:42 <dustymabe> this is the high level rollup issue
17:15:53 <dustymabe> I just updated the description with all of the new accepted changes
17:16:04 <dustymabe> the ones we haven't looked at yet are marked with a ❌
17:16:39 <jaimelm> dustymabe is really making use of that red x.
17:16:40 <dustymabe> we'll spend time here going through to see if we can skip or need to investigate them
17:16:53 <dustymabe> item: 1.7 x TRIAGE Boost 1.76 upgrade
17:17:04 <dustymabe> skip or investigate ?
17:18:14 <jaimelm> I can't think of anything this would touch on
17:18:36 <dustymabe> yeah I think this is mostly introducing the new change and getting all dependent packages to compile/build
17:19:02 <dustymabe> #info skipping Boost 1.76 upgrade because it should be contained to the build system (making sure dependent packages compile)
17:19:17 <dustymabe> item: 1.8 x TRIAGE MinGW environment and toolchain update
17:20:03 <dustymabe> i don't think I've ever come across MinGW - anyone know?
17:20:23 <jaimelm> nope
17:21:02 <dustymabe> created to support the GCC compiler on Windows systems
17:21:22 <bgilbert> yup, it's a Windows cross-compiler
17:21:35 <bgilbert> not relevant to us
17:21:44 <dustymabe> #info skipping MinGW environment and toolchain update because it's a Windows cross-compiler, not relevant.
17:22:02 <dustymabe> item: 1.10 x TRIAGE Make btrfs the default file system for Fedora Cloud
17:22:33 <dustymabe> #info skipping "Make btrfs the default file system for Fedora Cloud" as it is only for Cloud edition.
17:22:48 <dustymabe> item: 1.11 x TRIAGE Build Fedora Cloud Images with Hybrid BIOS+UEFI Boot Support
17:22:58 <bgilbert> :-D
17:23:04 <dustymabe> #info skipping "Build Fedora Cloud Images with Hybrid BIOS+UEFI Boot Support" as it is only for Cloud edition.
17:23:18 <dustymabe> item: 1.12 x TRIAGE Adding Selected Flathub Applications
17:23:42 <dustymabe> #info skipping "Adding Selected Flathub Applications" as we don't use flatpaks
17:23:52 <dustymabe> item: 1.13 x TRIAGE Update firewalld to v1.0.0
17:24:21 <dustymabe> while i'm personally interested in this one.. not relevant to FCOS
17:24:25 <jaimelm> nope
17:24:26 <dustymabe> #info skipping "Update firewalld to v1.0.0" as we don't use firewalld
17:24:35 <dustymabe> item: 1.15 x TRIAGE Gconv package split in glibc
17:25:23 <dustymabe> do we use glibc-gconv-extra for anything ?
17:26:10 <dustymabe> weird. I don't see it installed on my FCOS machine
17:27:20 <dustymabe> i'll look a little more into this one
17:27:41 <dustymabe> we'll pick the item list up next time
17:27:45 <dustymabe> #topic open floor
17:27:49 <dustymabe> sorry that was a bit dry at the end
17:28:21 <jaimelm> Well, better to go through it to be sure.
17:28:21 <dustymabe> #info multi-arch pipeline work is ongoing. The first bits landed and the first runs are going now (finding and fixing issues along the way)
17:28:33 <dustymabe> jaimelm++
17:28:56 <jaimelm> cool
17:29:10 <dustymabe> bgilbert: you were just a day or two ahead of me enabling metal image builds for aarch64 and hitting the raid test failures
17:29:41 <dustymabe> i guess it would happen for qemu too, but I'm just running basic tests for now (not the full suite)
17:29:46 <bgilbert> dustymabe: wasn't me; the multi-arch folks hit it after RHCOS was rebased to cosa main
17:30:01 <bgilbert> *switched to
17:30:01 <dustymabe> dang - i was so close to finding it first in FCOS
17:30:24 <dustymabe> oh well
17:30:26 <dustymabe> we're at time
17:30:35 <dustymabe> will close out in 30s unless more topics come up
17:32:14 <dustymabe> #endmeeting