fedora_coreos_meeting
LOGS
16:30:51 <lucab> #startmeeting fedora_coreos_meeting
16:30:51 <zodbot> Meeting started Wed Oct 13 16:30:51 2021 UTC.
16:30:51 <zodbot> This meeting is logged and archived in a public location.
16:30:51 <zodbot> The chair is lucab. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:30:51 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:51 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:31:02 <lucab> #topic roll call
16:31:09 <lorbus> .hi
16:31:10 <zodbot> lorbus: lorbus 'Christian Glombek' <cglombek@redhat.com>
16:31:17 <jaimelm> .hello2
16:31:18 <zodbot> jaimelm: jaimelm 'Jaime Magiera' <jaimelm@umich.edu>
16:32:09 <dustymabe> .hi
16:32:10 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:32:15 <ravanelli> .hi
16:32:16 <zodbot> ravanelli: ravanelli 'Renata Andrade Matos Ravanelii' <renata.ravanelli@gmail.com>
16:32:50 <travier> .hello siosm
16:32:51 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:32:54 <jbrooks> .hello jasonbrooks
16:32:56 <zodbot> jbrooks: jasonbrooks 'Jason Brooks' <jbrooks@redhat.com>
16:33:21 <lucab> #chair lorbus jaimelm ravanelli dustymabe travier jbrooks
16:33:21 <zodbot> Current chairs: dustymabe jaimelm jbrooks lorbus lucab ravanelli travier
16:33:29 <bgilbert> .hi
16:33:30 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:34:28 <lucab> #chair bgilbert
16:34:28 <zodbot> Current chairs: bgilbert dustymabe jaimelm jbrooks lorbus lucab ravanelli travier
16:34:37 <scorreia_> .hi
16:34:39 <zodbot> scorreia_: Sorry, but user 'scorreia_' does not exist
16:34:58 <lucab> ok I'll start
16:35:11 <bgilbert> scorreia_: you can say ".hello2 your-FAS-account-name" if you have a FAS account
16:35:17 <lucab> #topic Action items from last meeting
16:35:26 <bgilbert> * ".hello FAS-account"
16:35:48 <lucab> lorbus and jlebon to reach out to the containers team to discuss what cri-o versions will be supported and how at the modular level to pull it off (context: https://github.com/coreos/fedora-coreos-tracker/issues/767)
16:35:50 <lucab> dustymabe to talk to sumantro to try to get FCOS testing on the week of oct 11
16:35:57 <lucab> dustymabe to schedule some time on Monday to identify docs test cases and prepare for F35 test day
16:36:00 <scorreia> .hello scorreia
16:36:01 <zodbot> scorreia: scorreia 'Sergio Correia' <scorreia@redhat.com>
16:36:32 <lucab> jlebon to add some design details to the proposal in the ticket (#676) and also reach out to dghubble to see if this can be handled in typhoon
16:36:45 <dustymabe> lucab: yep all of those were handled (see notes from last week's video meeting in the hackmd: https://hackmd.io/vMWlKGH5TAOsLKYiqce61Q?view)
16:36:56 <lucab> uhm, although the last meeting was actually a video one
16:36:57 <dustymabe> from last week I think the only tangible thing we had was...
16:37:05 <dustymabe> #info jlebon created #990 to track running kubernetes node e2e tests in our CI
16:37:07 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/990
16:37:28 <lucab> oh yes sorry, I just went directly to the meetbot archive
16:37:37 <dustymabe> +1 - yeah
16:37:52 <dustymabe> we might ought to try to force some creation of meeting info in meetbot for video meetings
16:38:16 <dustymabe> we'll improve over time
16:38:29 <lucab> np, I only realized at the end of my copy-pasting
16:39:18 <lorbus> #info re k8s and cri-o: https://github.com/coreos/fedora-coreos-tracker/issues/767
16:39:19 <lucab> I actually had an action item myself, let me find it and link it
16:39:36 <lorbus> Copying Jonathan's comment over:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/972e3e6feb0ec3739c2c35fdc478ba6f7c5c5d53)
16:40:06 <lorbus> (I hope all clients can see properly what I pasted)
16:40:22 <dustymabe> it's a link - but yeah, it's clickable
16:40:41 <lucab> #info FCOS auto-updates are now disabled on k8s e2e CI https://github.com/kubernetes/test-infra/pull/23926
16:42:07 <lucab> ok, I think that was all
16:43:05 <lucab> I'll pick the EFI FS ticket first
16:43:14 <dustymabe> thanks for doing that lucab
16:43:18 <lucab> #topic Change EFI-System partition format from fat16 to fat32
16:43:37 <lucab> #link https://github.com/coreos/fedora-coreos-tracker/issues/993
16:43:45 <lucab> bgilbert: ^^^
16:44:00 <bgilbert> so, the UEFI spec technically requires that the EFI System Partition (ESP) use FAT32.  ours uses FAT16.
16:44:40 <bgilbert> most firmware doesn't care, though.  I'm aware of two pieces of hardware that have failed to boot with a FAT16 ESP:
16:45:10 <bgilbert> an ASUS laptop in https://github.com/coreos/bugs/issues/2246, and a Raspberry Pi with older firmware in https://github.com/coreos/fedora-coreos-tracker/issues/993.
16:45:19 <bgilbert> presumably there are others that go unreported.
16:45:42 <bgilbert> "it's a one-line fix", except it turns out it isn't.
16:46:24 <pjones> does seem to imply either a) they've written their own fat driver (like lunatics) or b) they're using a really, really old binary fat driver from EDK I (like lunatics)
16:46:25 <bgilbert> a filesystem is defined to be FAT12/FAT16/FAT32 based on the number of clusters in the filesystem, which is related to filesystem size.
16:46:40 * bgilbert waves at pjones
16:47:15 <bgilbert> our disk image uses a 127 MB ESP, but on 4K-sector disks, the minimum FS size for FAT32 is 257 MB.
16:47:22 <pjones> I could swear I fixed the default for this in mkfs.vfat like... 10 years ago
16:47:36 <pjones> ah, right, it can't do it in that case.
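[Editor's note: the size constraint bgilbert describes above can be checked with a rough calculation. The sketch below is illustrative and not taken from the meeting. It uses the cluster-count thresholds from the Microsoft FAT specification (fewer than 4085 clusters is FAT12, fewer than 65525 is FAT16, otherwise FAT32) and assumes one sector per cluster, 32 reserved sectors and two FAT copies. Those layout values are assumptions about typical formatting defaults, and the helper names are hypothetical. Sizes are treated as MiB.]

```python
import math

MIB = 1024 * 1024
FAT12_LIMIT = 4085    # cluster counts below this are FAT12
FAT32_MIN = 65525     # cluster counts at or above this are FAT32


def fat_type(cluster_count):
    """Classify a FAT filesystem purely by its cluster count."""
    if cluster_count < FAT12_LIMIT:
        return "FAT12"
    if cluster_count < FAT32_MIN:
        return "FAT16"
    return "FAT32"


def max_clusters(fs_bytes, sector_bytes):
    """Upper bound on clusters: one sector per cluster, metadata ignored."""
    return fs_bytes // sector_bytes


def min_fat32_bytes(sector_bytes, reserved_sectors=32, num_fats=2):
    """Smallest filesystem that still holds FAT32_MIN one-sector clusters.

    reserved_sectors and num_fats are assumed typical values, not
    something stated in the meeting.
    """
    # Each FAT needs a 4-byte entry per cluster plus 2 reserved entries.
    fat_bytes = (FAT32_MIN + 2) * 4
    fat_sectors = math.ceil(fat_bytes / sector_bytes)
    total_sectors = reserved_sectors + num_fats * fat_sectors + FAT32_MIN
    return total_sectors * sector_bytes


if __name__ == "__main__":
    esp = 127 * MIB
    for sector in (512, 4096):
        print(sector, fat_type(max_clusters(esp, sector)),
              math.ceil(min_fat32_bytes(sector) / MIB), "MiB minimum for FAT32")
```

[Under these assumptions a 127 MiB ESP can hold enough clusters for FAT32 with 512-byte sectors but not with 4096-byte sectors, where the minimum rounds up to 257 MiB, consistent with the figures quoted in the discussion.]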
16:48:05 <bgilbert> so fixing this completely requires a partition layout change, which is difficult because the layout is hardcoded in Butane to handle the boot-disk RAID case.
16:48:32 <bgilbert> we can do it, but it's non-trivial.  so the question is what strategy we want to use here.
16:48:46 <bgilbert> pjones, I'd be very interested in your thoughts.  how much effort is it worth to switch the ESP to FAT32?
16:50:47 <dustymabe> bgilbert: if we were to bump size, we'd want to do it everywhere?
16:51:25 <bgilbert> we could do it for only the 512b image.  but then in the boot-disk-RAID case, the ESP would be reverted to FAT16.
16:51:28 <dustymabe> and what would the size be? `257 MB` or would we go higher to a nicer round number like 512?
16:51:41 <lucab> I know you are going to hate this because of the Butane pain, but a larger ESP (uniform for all images) would be a good thing IMHO
16:51:44 <bgilbert> I don't think we need our ESP to be large, so I'd be inclined to go for the minimum
16:52:05 <bgilbert> Butane can't split the difference and select FAT16/FAT32 at runtime because it doesn't know enough
16:53:22 <dustymabe> so.. I guess some of this depends..
16:53:34 <bgilbert> lucab: do you have a use case in mind?
16:53:38 <travier> Is anything here a breaking change?
16:53:41 <lucab> rationale is, I do expect sooner or later somebody to come up with a usecase where they hit the ceiling at 127MiB
16:53:48 <dustymabe> how much of a problem we really think it is.. and how painful changing the sizes $everywhere would be
16:53:53 <bgilbert> travier: kinda-sorta?
16:54:20 <lucab> bgilbert: not a sane one and specific right now, no. Mostly gut feeling.
16:54:28 <pjones> bgilbert: I guess it depends how much you care about the devices you've found?  "there are presumably more" is true but also they're probably similar classes of devices as the ones you've already found.
16:54:33 <dustymabe> yeah if we change the partition sizes I imagine the "re-install over previous install of FCOS and don't overwrite some FS" use case would be affected
16:54:38 <bgilbert> there's the issue of version skew between Butane and cosa, which isn't breaking so much as surprising/inconsistent.  but in principle, partition layouts in user Ignition configs could make assumptions about where the rootfs starts
16:54:42 <travier> Could we do a "manual" workaround as a butane config option?
16:55:24 <travier> that we would flip to be the default the first time we do a truly breaking change?
16:55:46 <travier> like "ESP: FAT32" (FAT16 being the default
16:55:47 <bgilbert> ahh, yes, https://docs.fedoraproject.org/en-US/fedora-coreos/storage/ requires the _rootfs_ to be 8 GiB or larger
16:55:48 <travier> )
16:56:48 <bgilbert> so the expansion buffer we've defined would not be used to cover this case
16:57:31 <travier> (not a butane config option but a butane spec entry)
16:57:45 <bgilbert> travier: there's a "layout" field; we could define e.g. "layout: x86_64_fat32"
16:58:02 <travier> my bad, we can not do that as we need this to work *before* we boot up
16:58:07 <bgilbert> boot_device.layout
16:59:16 <bgilbert> I think we should consider doing nothing
16:59:20 <bgilbert> or, as a middle ground, doing nothing for now
16:59:34 <dustymabe> yeah, "doing nothing" is an option
16:59:45 <bgilbert> for the specific case of the Pi, it sounds like a firmware upgrade will avoid the issue
16:59:52 <travier> Is there an easy manual workaround for people hitting this issue?
17:00:09 <bgilbert> we've gone multiple years without any reports, and when this happened in Container Linux, it also went multiple years without reports
17:00:16 <travier> If there is one, documenting it could be enough for now
17:00:56 <bgilbert> travier: booting in BIOS mode, or copying out the contents of the ESP + reformatting FAT32 + copying back.  both of those only work on 512b sectors.
17:01:24 <bgilbert> the middle-ground option is: make a note of this, and fix it next time we have to redo the partition layout :-P
17:01:31 <bgilbert> s/the/a/
17:01:33 <dustymabe> bgilbert: I think it's fine if we want to do nothing for now, but maybe nice to 1. document what we would do if we did want to fix the problem (requires discussion) 2. add docs for working around the problem
17:02:06 <jaimelm> ^^
17:02:22 <travier> If it's "just" a "dd partitions and re-create one larger and dd back" then this is a reasonable workaround
17:02:29 <lucab> I do expect the RPi case to be common enough to justify a short FAQ with "update your FW first"
17:02:38 <bgilbert> travier: filesystem-level copy, not dd
17:02:46 <dustymabe> lucab: i'm actually working on that
17:02:49 <bgilbert> lucab: +1
17:02:50 <travier> copy in case of the ESP one instead of dd but dd for the rootfs?
17:03:00 <lucab> dustymabe++
17:03:00 <zodbot> lucab: Karma for dustymabe changed to 9 (for the current release cycle):  https://badges.fedoraproject.org/tags/cookie/any
17:03:14 <jaimelm> dustymabe: +1
17:03:29 <dustymabe> i've got some draft notes, but trying to get them polished and a PR submitted to the docs
17:03:30 <jaimelm> That would be a good addition
17:03:36 <bgilbert> travier: I wouldn't be surprised if no one ever hits this for the 4K-sector case
17:03:59 <lucab> technically, are we breaking the EFI spec? Or is it a case of firmwares blindly assuming FAT32?
17:04:10 <bgilbert> lucab: I _believe_ we are technically breaking the spec
17:04:11 <dustymabe> bgilbert: i.e. one option is to fat32 on 512b and fat16 on 4kn ?
17:04:30 <bgilbert> dustymabe: no, I'm saying I think we only need to document a workaround for 512b
17:04:35 <lucab> bgilbert: for the removable vs non-removable point?
17:04:40 <bgilbert> lucab: yes
17:04:45 <travier> I'm +1 for documenting, providing a workaround and noting that somewhere for the future if we ever change the layout
17:05:10 <dustymabe> bgilbert: right, but i'm saying - if you think no one will ever hit this on 4kn, then maybe "fat32 on 512b and fat16 on 4kn" would be an option
17:05:21 <bgilbert> dustymabe: changing 512b to FAT32 for everyone means the Butane RAID case will change it back, which seems weird and obscure
17:05:26 <jaimelm> It would be nice if down the road FCOS wasn't "technically" breaking the spec. So, a set time to revisit might be good.
17:05:49 <dustymabe> bgilbert: but we can update that in a follow up, right?
17:05:57 <bgilbert> dustymabe: how do you mean?
17:06:08 <dustymabe> can we not update the butane RAID definition?
17:06:23 <travier> If we make a change, I think we should change everything at once
17:06:26 <bgilbert> Butane doesn't know whether the target disk is 512b or 4Kn
17:06:32 <dustymabe> i see
17:06:33 <bgilbert> and we can't ask the user because the user may not know either
17:06:39 <jaimelm> true
17:06:40 <travier> (but I don't think it's worth changing for that)
17:06:52 <dustymabe> ok, yeah. that foils the plan
17:07:02 <dustymabe> otherwise would have been a nice compromise
17:07:05 <dustymabe> :)
17:07:16 <bgilbert> yeah, it's a messy situation
17:07:30 <dustymabe> ok so.. 1. let's do nothing for now and document how to get out of the situation if it's hit
17:07:40 <pjones> I don't think you're technically breaking the spec, but that's because you're not producing an artifact that the spec speaks to directly.
17:07:51 <dustymabe> 2. let's agree that if we were to change it we'd bump the ESP size everywhere to XYZ MiB
17:08:14 <bgilbert> pjones: could you expand on that?
17:08:28 <bgilbert> I noticed that the spec language was careful not to forbid FAT12/16, just to declare it out of scope
17:08:32 <lucab> dustymabe: yes, I think that's the sanest plan right now
17:08:35 <pjones> bgilbert: the spec says what the firmware should do, and provides interfaces to talk to it.
17:08:53 <pjones> bgilbert: so this could be a thing a specific firmware doesn't support, but there's no /violation/ either way
17:08:59 <bgilbert> got it
17:09:21 <pjones> (although 2.8 does say "EFI encompasses the use of FAT32 for a system partition, and FAT12 or FAT16 for removable media." in section 13.3)
17:09:31 <jaimelm> heh
17:09:33 <pjones> I would not assume this part of the spec is, erm, very well written.
17:10:04 <bgilbert> yeah, "encompasses" is what I was referring to ^.  I read it as 'FAT12/16 is not contemplated but not forbidden'.
17:10:29 <pjones> Oh, no there's another bit that's normative about it: "The EFI firmware must support the FAT32, FAT16, and FAT12 variants of the EFI file system. What variant of EFI FAT to use is defined by the size of the media."
17:11:21 <pjones> so those firmwares are broken with respect to that (though a careful reading of chapter two may tell you this is all optional...)
17:11:27 <bgilbert> pjones: (rules lawyering) those aren't in conflict, though, right?  "must support" because of removable media, and naturally a FAT32 ESP must be large enough to use FAT32.
17:11:35 <pjones> righty
17:12:06 <pjones> I also don't know the last time I saw removable media that was smaller than 260MB, but... I mean sure.
17:12:24 <bgilbert> heh
17:12:42 * jaimelm pulls out a 100MB zip drive
17:12:50 <pjones> yeah, it's been a while
17:13:01 <bgilbert> jaimelm: <click>
17:13:04 <jaimelm> LOL
17:13:10 <jaimelm> I was literally just going to type that
17:13:18 <bgilbert> :-D
17:13:31 <lucab> sorry but I'll try to close the topic here
17:13:42 <bgilbert> +1 to dustymabe's proposal
17:13:49 <jaimelm> +1
17:14:01 <lucab> I think dustymabe basically proposed to do nothing now and agree on what to possibly do in the future
17:14:13 <jaimelm> and docs
17:14:38 <lucab> which would be uniformly enlarge ESP so that we get FAT32 everywhere
17:14:50 <bgilbert> if we want to define the target size now, I propose 257 MB (or 260 if we want; that's what MS recommends).  but I'm okay deferring details until we get there.
17:15:34 <dustymabe> 260M seems reasonable
17:16:02 <lucab> I think "at least minimum to meet FAT32 according to spec" would do for now, we'll likely discover some new fun quirk at that point
17:16:25 <travier> +1
17:17:03 <bgilbert> #proposed For now, we will document workarounds for systems that can't boot from FAT16, including older Raspberry Pi firmware.  Next time we change the partition layout, we will switch to FAT32 in both the 512b and 4Kn disks, selecting an ESP size of at least 257 MB to fit a valid FAT32 filesystem.
17:17:14 <bgilbert> s/disks/images/
17:17:17 <travier> +1
17:17:17 <lucab> +!
17:17:26 <jaimelm> +1
17:17:27 <lucab> +1
17:17:46 <jaimelm> well done
17:18:20 <lucab> dustymabe: self-action the doc writing?
17:18:25 <bgilbert> dustymabe?
17:20:02 <dustymabe> oops sorry
17:20:04 <lucab> bgilbert: I think we lost him, but I'd say there is overall consensus
17:20:19 <dustymabe> #action dustymabe to write docs for rpi4 including updating eeprom
17:20:26 <bgilbert> #agreed For now, we will document workarounds for systems that can't boot from a FAT16 ESP, including older Raspberry Pi firmware.  Next time we change the partition layout, we will switch the ESP to FAT32 in both the 512b and 4Kn images, selecting a size of at least 257 MB to fit a valid FAT32 filesystem.
17:20:29 <jlebon> .hello2
17:20:30 <bgilbert> ^ minor copyediting
17:20:30 <zodbot> jlebon: jlebon 'None' <jonathan@jlebon.com>
17:20:37 <dustymabe> I think we need a more generic FAQ entry for this, though and would prefer not to write that
17:20:39 <lucab> #chair jlebon
17:20:39 <zodbot> Current chairs: bgilbert dustymabe jaimelm jbrooks jlebon lorbus lucab ravanelli travier
17:20:45 <bgilbert> thanks for the input, pjones!
17:20:54 <pjones> hope it was helpful
17:21:06 <bgilbert> absolutely
17:21:31 <dustymabe> anybody else want to sign up for writing a FAW entry for this? including steps on how to convert to FAT32 if needed?
17:21:42 <dustymabe> also does everyone agree we need this ^^
17:21:48 <dustymabe> FAQ*
17:22:18 <lucab> dustymabe: I don't have the storage docs open now, but we can just directly put a NOTE there
17:22:20 <bgilbert> I think it'll be difficult for a user to know that the case applies to them
17:22:26 <bgilbert> agree with putting a NOTE in /storage/
17:22:40 <bgilbert> the FAQ is kind of a wasteland
17:22:41 <jaimelm> bgilbert: that's true
17:22:54 <dustymabe> i think of it more of a "user opens issue, we point them at FAQ" workflow
17:22:56 <lucab> bgilbert: I'll leave the wording of that to you
17:23:10 <jaimelm> yeah, I was starting on my 3rd party repo FAQ entry and saw how unorganized it is.
17:23:14 <dustymabe> I really doubt they'll find this in the "storage" docs.. their system just won't boot and they'll open an issue
17:23:14 <bgilbert> lucab: heh
17:23:14 <travier> pjones++
17:23:36 <bgilbert> #action bgilbert to write docs for switching to FAT32
17:23:44 <jaimelm> categories would be nice
17:23:50 <dustymabe> bgilbert++
17:23:57 <bgilbert> thanks for the discussion, all!
17:24:19 <bgilbert> jaimelm: a bunch of entries should probably be dropped
17:24:27 <lucab> ok it went a bit longer than expected
17:24:37 <jaimelm> it was good discussion though
17:24:37 <bgilbert> jaimelm: xref https://github.com/coreos/fedora-coreos-docs/issues/292
17:25:00 <lucab> do we want to touch another ticket (quickly)?
17:25:11 <dustymabe> probably out of time
17:25:14 <travier> I don't think we have time
17:25:15 <dustymabe> open floor?
17:25:20 <jaimelm> bgilbert: yeah
17:25:24 <lucab> yes, same feeling
17:25:51 <lucab> #topic Open floor
17:25:57 <bgilbert> for visibility: I think we may have a path forward on VirtualBox support
17:25:57 <dustymabe> #info Fedora CoreOS test day/week happening this week - please help execute test cases especially if you have access to obscure hardware platforms https://testdays.fedoraproject.org/events/122
17:26:00 <bgilbert> #link https://github.com/coreos/ignition/pull/1269
17:26:26 <dustymabe> bgilbert: nice
17:26:41 <scorreia> ueno, travier: looks like there is not enough time today, so can we discuss this one next meeting, perhaps?
17:26:44 <jaimelm> I'll be testing on vSphere and an old Mac Pro trash can tomorrow :)
17:26:46 <scorreia> https://github.com/coreos/fedora-coreos-tracker/issues/982
17:27:15 <travier> scorreia: whoops I'm really sorry, had not realized you were there
17:27:25 <dustymabe> scorreia: yep - we can make sure we get to it next time - sorry to have wasted your time
17:27:30 <travier> yes, let's make sure we discuss it next time
17:27:49 <dustymabe> bgilbert: does this pave way for vagrant support at all - was thinking on that a bit over the weekend
17:27:51 <lucab> same here sorry, I could have started with that otherwise
17:28:22 <scorreia> No worries :)
17:28:25 <bgilbert> dustymabe: in principle, yes.  I think we should have a discussion about whether we want to support Vagrant.
17:28:40 <dustymabe> fair
17:28:52 <dustymabe> probably for another day
17:28:55 <bgilbert> yup
17:28:57 <lucab> scorreia: personally, I'm missing the "at which point of the boot should this run" part
17:30:29 <jaimelm> can we extend to be respectful to the guest? Does anyone need to be anywhere?
17:30:54 <dustymabe> i can stay
17:31:13 <scorreia> actually I gotta leave. lucab: sure, I will check with ueno to address that before the discussion.
17:31:16 <lucab> scorreia: because if the goal is to run this through a systemd unit after network bootup, it may as well be run through a privileged podman
17:31:42 <jaimelm> scorreia: next time for sure. maybe the meeting will open with it to not make you wait.
17:32:05 <scorreia> alright, that sounds great. thanks everyone
17:32:06 <lucab> (this assumes the agent is tweaked so that it can comfortably run from within a container first)
17:32:18 <lucab> scorreia: np, I'll also note that in the ticket
17:33:00 <lucab> ok, I don't see anything else raised, closing now
17:33:19 <lucab> (I'll send notes/followups tomorrow)
17:33:19 <lucab> #endmeeting