fedora_coreos_meeting
LOGS
16:31:38 <travier> #startmeeting fedora_coreos_meeting
16:31:38 <zodbot> Meeting started Wed Jan 27 16:31:38 2021 UTC.
16:31:38 <zodbot> This meeting is logged and archived in a public location.
16:31:38 <zodbot> The chair is travier. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:31:38 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:31:38 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:31:48 <travier> #topic roll call
16:31:52 <lucab> .hello2
16:31:53 <zodbot> lucab: lucab 'Luca Bruno' <lucab@redhat.com>
16:32:01 <travier> .hello siosm
16:32:02 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:32:05 <jlebon> .hello2
16:32:06 <zodbot> jlebon: jlebon 'None' <jonathan@jlebon.com>
16:32:21 <PanGoat> .hello jaimelm
16:32:22 <zodbot> PanGoat: jaimelm 'Jaime Magiera' <jaimelm@umich.edu>
16:32:35 <travier> jlebon: Hello 'None' 😅
16:33:28 <jlebon> travier: :)  i've given up trying to figure that out
16:34:20 <travier> #chair jlebon lucab
16:34:20 <zodbot> Current chairs: jlebon lucab travier
16:34:29 <travier> #chair PanGoat
16:34:29 <zodbot> Current chairs: PanGoat jlebon lucab travier
16:34:38 <jbrooks> .hello jasonbrooks
16:34:39 <zodbot> jbrooks: jasonbrooks 'Jason Brooks' <jbrooks@redhat.com>
16:34:43 <travier> #chair jbrooks
16:34:43 <zodbot> Current chairs: PanGoat jbrooks jlebon lucab travier
16:35:31 <cyberpear> .hello2
16:35:32 <zodbot> cyberpear: cyberpear 'James Cassell' <fedoraproject@cyberpear.com>
16:35:42 <travier> #chair cyberpear
16:35:42 <zodbot> Current chairs: PanGoat cyberpear jbrooks jlebon lucab travier
16:35:49 <dustymabe> .hello2
16:35:50 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:35:56 <travier> #chair dustymabe
16:35:56 <zodbot> Current chairs: PanGoat cyberpear dustymabe jbrooks jlebon lucab travier
16:36:48 <travier> #topic Action items from last meeting
16:37:14 <travier> I did not find any action item in the last meeting. Is this correct?
16:37:24 <PanGoat> Yeah, he didn't add any
16:37:49 <travier> 👍
16:38:28 <travier> Let's move to meeting tickets
16:38:47 <travier> #topic sudo in FCOS allowing local privilege escalation (CVE-2021-3156)
16:38:49 <jlebon> hmm, some people might've dropped. guess we'll see :)
16:39:01 <dustymabe> 👋 - still here
16:39:07 <jlebon> i can speak to that one
16:39:15 <travier> jlebon: Am I moving too fast?
16:39:25 <jlebon> travier: no it's good :)
16:39:44 <travier> #link https://github.com/coreos/fedora-coreos-tracker/issues/725
16:39:57 <PanGoat> Do we have a mechanism to let people know that the issue is known and that we're waiting for upstream?
16:40:02 <PanGoat> (other than looking at issues)
16:40:03 <jlebon> TL;DR: a recent sudo CVE came out which allows any unprivileged user to become root
16:40:18 <jlebon> the fixed sudo is in the fedora repos already
16:40:33 <PanGoat> FCOS blog might be a good idea.
16:40:48 <travier> PanGoat: is there an FCOS blog?
16:40:52 <PanGoat> Many folks don't scan issues
16:40:58 <jlebon> so we need to decide whether to do an async release or just wait until releases next week
16:40:59 <PanGoat> travier: I'm saying there could be one
16:41:24 <jlebon> there isn't an FCOS blog. we try to post to fedmag instead
16:41:44 <PanGoat> How easy is it to get something up there?
16:41:57 <dustymabe> I'd say if it's worthy of a blog, it's worthy of a spin
16:42:26 <jlebon> i also lean towards async release personally
16:42:45 <travier> dustymabe: kind of agree with that. I don't think the vuln is really important for us but it's a bad one for those impacted
16:42:59 <jlebon> FWIW, for OCP they're planning to just roll it into the regular releases
16:43:02 <travier> no obvious workaround
16:43:02 <PanGoat> Communication is key, particularly for folks using FCOS because of OKD.
16:43:06 <lucab> I don't think this warrants a fedmag post. The CVE is widely discussed already and it's an LPE (like many others) that is unlikely to matter in usual FCOS usage
16:43:22 <travier> OKD is not really impacted just like OCP
16:43:55 <travier> This only impacts people using FCOS as a classic host, not part of a kube cluster
16:43:56 <jlebon> i wish we could get a heads up on these so we can also respin in time if we decide to
16:44:15 <lucab> jlebon: I also think an async release would be good, so that people will get the rollouts during the weekend
16:44:28 <PanGoat> ^^
16:45:12 <jlebon> ok cool, looks like there's agreement overall
16:45:20 <travier> OK, so let's schedule an async release?
16:45:24 <lucab> I think this will also cover the dnsmasq CVEs
16:46:08 <jlebon> travier: yeah, let's do that. we can discuss ownership after the meeting
16:46:29 <jlebon> lucab: link?
16:47:10 <travier> Good for next topic?
16:47:39 <lucab> jlebon: https://bodhi.fedoraproject.org/updates/FEDORA-2021-84440e87ba
16:47:39 <jlebon> travier: go for it :)
16:47:42 <travier> #topic 2021-02-03: gather status update for Fedora Council
16:47:52 <travier> #link https://github.com/coreos/fedora-coreos-tracker/issues/710
16:48:11 <travier> Oops
16:48:16 <travier> that's for next week?
16:48:19 <dustymabe> :)
16:48:20 <jlebon> lucab: thanks, yeah we should make sure we pick up that one too
16:48:26 <jlebon> travier: yeah, that one is special :)
16:48:41 <lucab> #link https://bodhi.fedoraproject.org/updates/FEDORA-2021-84440e87ba
16:48:49 <jlebon> i think we can go straight to open floor
16:49:05 <lucab> #link https://bodhi.fedoraproject.org/updates/FEDORA-2021-2cb63d912a
16:49:11 <dustymabe> I have a topic
16:49:19 <lucab> (so that we have both in minutes)
16:49:28 <dustymabe> travier: mind if I bring it up
16:49:34 <travier> dustymabe: go ahead!
16:49:43 <dustymabe> #topic cgroups v2 strategy
16:49:46 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/292
16:50:01 <dustymabe> I brought this up last week and it seems like we got some confirmation on the kubernetes support
16:50:44 <dustymabe> so kube 1.19+ has v2 support - are we good to press ahead for v2 in the switch to f34  (will have docker that supports v2)?
16:51:05 <travier> I think we are good
16:51:22 <jlebon> is OKD on 1.19 yet?
16:52:04 <lucab> jlebon: 4.6 is 1.19
16:52:05 <travier> 4.6 is 1.19
16:52:10 <PanGoat> 1.19.4
16:52:12 <jlebon> yup, just checked in a cluster as well :)
16:52:13 <lucab> travier: ^5
16:52:24 <jlebon> so yeah, i think we're good :)
16:52:35 <travier> lucab: ?
16:52:39 <dustymabe> OKD might decide to stick on v1 for some reason too (in the install they can do whatever they like)
16:53:20 * cyberpear hops in from webchat since IRCCloud disconnected
16:53:23 <travier> Go for cgroups v2 for F34?
16:53:25 <jlebon> to be safe, we should probably just do it for new installs
16:53:33 <dustymabe> jlebon: I *think* I agree
16:53:44 <dustymabe> but this brings up a point which we should probably go over
16:54:03 <jlebon> which is good because we're currently on v1 via a karg, which is a pain to manage post-firstboot
16:54:08 * dustymabe hasn't been keeping up, but where are we on the "manipulating kargs easily via Ignition/FCCT"?
16:54:12 <travier> This will be only for new installs unless we manually change the BLS
16:54:48 <jlebon> dustymabe: i think arithx has started some early work, but current state of the art is still "`rpm-ostree kargs` via systemd unit"
16:54:49 <travier> dustymabe: the source of truth for kargs is the BLS config stored in /boot.
16:55:10 <lucab> not sure that it is wise to keep old nodes at v1
16:55:40 <dustymabe> yeah, I imagine it would be good if we could pair this change with the enhanced ease of use of settings kargs from Ignition/FCCT.
16:55:47 <travier> If we do automatic migration we need to check if podman containers created with v1 work without action with v2
16:55:56 <jlebon> lucab: can you expand?
16:56:49 <lucab> jlebon: it splits the fleet of nodes into "scenarios our CI covers" and "things that exist but are invisible to our tests"
16:57:59 <dustymabe> I get both sides of the argument
16:58:02 <jlebon> lucab: i think we should test v1 too if we say it's supported
16:58:32 <PanGoat> +1 Supported implies tested
16:58:48 <lucab> jlebon: that would mean a straight 2x of all tests, no?
16:59:12 <travier> lucab: I would say only for containers tests?
16:59:14 <jlebon> hmm why rerun everything? i would think just a few, e.g. podman-related tests
16:59:33 <dustymabe> we can probably be a little more strategic than 2x, but yeah, we'd probably need to run the full suite of tests periodically
16:59:46 <travier> I'm not a fan either of splitting the fleet but we have to confirm that moving from 1 to 2 will not cause issues
17:00:23 <dustymabe> travier: the second part is what I don't know. Messing it up won't be good or look good for us.
17:01:03 <dustymabe> i guess we can always ask some "experts" who know more.
17:01:51 <travier> I vaguely remember cleaning up all my containers for the switch on Fedora :/
17:02:08 <travier> But that might have been due to a runc -> crun switch at the same time
17:03:14 <dustymabe> #proposed now that kubernetes 1.19+ and the docker available in f34+ support cgroups v2 we will switch FCOS to default to cgroups v2. Currently our plan is to default to it on new nodes and leave existing upgraded nodes on whatever cgroups version they have been running on. If we consult cgroups experts and they tell us that is not necessary and migrating everyone should be fine, then
17:03:15 <dustymabe> we'll consider migrating nodes.
17:04:11 <dustymabe> does that reflect the current conversation? ^^
17:04:20 <PanGoat> yes
17:04:20 <jlebon> ack
17:04:36 <lucab> sidenote, there is always the nuclear option of cutting a stream with a dead-end and starting fresh after that
17:05:02 <jlebon> i think there's a discussion to be had re. the risk of doing a migration if we want to support v1 anyway, but we can discuss that upstream
17:05:30 <dustymabe> yeah, part of this did make me think about a proposal some time ago that limited how old a node could be and forced someone to reprovision it (which would allow us to not support everything forever)
17:06:23 <dustymabe> any opposed to proposed?
17:06:25 <PanGoat> oooo
17:06:33 <PanGoat> That would be interesting
17:06:39 <jlebon> is supporting just v2 on the table? if so, that simplifies things a lot :)
17:07:09 <dustymabe> how would we enforce that? other than just by stating it?
17:07:10 <jlebon> dustymabe: +1 from me
17:07:21 <jlebon> right, exactly
17:07:28 <travier> +1 for me
17:08:03 <jlebon> dustymabe: we can discuss that with cgroups SMEs as well
17:08:28 <dustymabe> #agreed now that kubernetes 1.19+ and the docker available in f34+ support cgroups v2 we will switch FCOS to default to cgroups v2. Currently our plan is to default to it on new nodes and leave existing upgraded nodes on whatever cgroups version they have been running on. If we consult cgroups experts and they tell us that is not necessary and migrating everyone should be fine, then we'll
17:08:30 <dustymabe> consider migrating nodes.
17:09:12 <dustymabe> anyone want to take the action item to get more info from people who know
17:09:55 <PanGoat> who do we consider experts?
17:10:04 <dustymabe> any names of people we could/should reach out to?
17:10:06 <dustymabe> PanGoat: :)
17:10:23 <travier> podman team?
17:10:32 <travier> docker team?
17:10:41 <dustymabe> giuseppe knows a lot about cgroups v2 - might be good to pull in some kernel people
17:11:23 <dustymabe> jlebon: is that rawhide stream up and running?
17:11:45 <dustymabe> we could switch that over and see what breaks
17:11:59 <jlebon> dustymabe: more or less. hit a snag, but should be fixed in next compose
17:12:08 <jlebon> yeah, we can do that
17:12:32 <travier> well, the issue is not so much for fresh systems but for existing containers on an updated system
17:12:43 * dustymabe hands meeting gavel back over to travier
17:13:05 <travier> 😂
17:13:05 <jlebon> if we add the systemd service that does the removal, upgrade tests will cover this
17:13:26 <jlebon> we could make it a rawhide-only thing for now
17:13:51 <jlebon> well... not exactly. upgrade tests currently don't run e.g. container tests after reboot
17:13:55 <jlebon> but they could :)
17:14:02 <travier> :)
17:14:15 <lucab> travier: you mean existing exited container? or having to adjust the service running new ones?
17:14:55 <travier> We would need a test that creates "persistent" containers, then switch to v2 and re-runs the containers
17:15:24 <dustymabe> podman create; upgrade; reboot; podman start ?
17:15:36 <travier> lucab: I'd say we have to make sure that existing containers keep working after the switch
17:15:43 <dustymabe> yes!
17:16:01 <dustymabe> automatic upgrades, hopefully not breakage
17:16:06 <dustymabe> no*
17:17:25 <jlebon> shall we move to open floor?
17:17:32 <travier> I think we have covered the topic
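[Editor's note: the upgrade test discussed above ("podman create; upgrade; reboot; podman start") could be sketched roughly as follows. This is a minimal sketch, not the team's actual test. Assumptions not stated in the log: the karg pinning cgroups v1 is taken to be `systemd.unified_cgroup_hierarchy=0` (the log only says "a karg"), the check for v2 relies on the unified hierarchy exposing a `cgroup.controllers` file at its mount root, and the helper names, container name, and image are hypothetical.]

```python
from pathlib import Path

# Assumption: the karg used to pin cgroups v1; the log only mentions "a karg".
V1_KARG = "systemd.unified_cgroup_hierarchy=0"


def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if `root` looks like a unified (v2) hierarchy, else 1."""
    # A cgroups v2 mount exposes cgroup.controllers at its root.
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


def pinned_to_v1(cmdline):
    """Return True if the kernel command line carries the v1 karg."""
    return V1_KARG in cmdline.split()


def podman_steps(name, image):
    """Build the commands to run before and after rebooting into v2.

    `name` and `image` are hypothetical placeholders; the container
    command `true` assumes the image ships that binary.
    """
    before = ["podman", "create", "--name", name, image, "true"]
    after = ["podman", "start", "--attach", name]
    return before, after
```

[A test harness would run the first command, drop the karg and reboot, confirm `cgroup_version()` now reports 2, and then run the second command to check that the pre-existing container still starts.]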
17:17:35 <travier> #topic Open Floor
17:17:44 <PanGoat> I've got a short one
17:17:56 <PanGoat> #topic user systemd units
17:18:07 <travier> oups sorry
17:18:37 <travier> PanGoat: go ahead :)
17:18:46 <PanGoat> When I was writing up that short example, I noticed how much of a kludge doing a user unit was.
17:19:33 <PanGoat> I haven't looked at the ignition code, but what are folks' thoughts on adding the ability to directly define a user unit via the systemd object?
17:20:23 <jlebon> i'd say it's worth an Ignition github ticket
17:20:29 <travier> Maybe that should be FCCT suger?
17:20:31 <travier> sugar*
17:20:43 <lucab> PanGoat: is that https://github.com/coreos/fcct/issues/194 ?
17:21:06 <jlebon> lucab: i think PanGoat is saying expanding the spec
17:21:07 <PanGoat> for reference https://github.com/coreos/fedora-coreos-docs/blob/master/modules/ROOT/pages/tutorial-user-systemd-unit-on-boot.adoc
17:21:34 <PanGoat> There are a couple ways to get there
17:21:51 <PanGoat> Just wondering what folks think in general
17:22:45 <PanGoat> lucab: yeah, that's an example of the annoyance
17:23:36 <dustymabe> I don't think I have a strong opinion
17:24:02 <dustymabe> seems like a reasonable request, I guess it depends on how much complication it would add to the Ignition spec to do that
17:24:12 <lucab> I agree the FCCT sugar would be helpful. I can't see all the implications right now, but I wouldn't touch Ignition unless we find a blocker
17:24:37 <PanGoat> It's not a big issue, but something that seems like will become a bigger annoyance as more folks try to use FCOS.
17:26:05 <PanGoat> anyway, something to think about
17:26:18 <dustymabe> PanGoat: maybe add supporting comments in https://github.com/coreos/fcct/issues/194 ?
17:26:19 <jlebon> it's tricky, because user systemd is useful for doing systemd-y things unpriv, but if you're provisioning via Ignition anyway, you can just as easily use User=
17:26:55 <jlebon> of course, the semantics are different...
17:26:56 <PanGoat> dustymabe: yeah
17:27:18 <PanGoat> OK, nothing more from me on this.
17:27:36 <travier> +1 for more context on the existing issue so that we can grasp what's needed
17:27:44 <dustymabe> honestly I kind of like the `docker run` with the option to restart this container whenever the system starts
17:28:07 <dustymabe> the systemd unit file management gets hairy. podman generate systemd helps, but still hairy
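[Editor's note: jlebon's point about `User=` can be illustrated with a small sketch that renders a system-level unit running its command as an unprivileged user, instead of a true user unit. This is only an illustration under assumptions: the function name, description, user, and command are hypothetical, and as noted in the discussion the semantics differ from a unit managed by the user's own systemd instance.]

```python
def render_unit(description, user, exec_start):
    """Render a system unit whose service runs as `user` via User=."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "",
        "[Service]",
        # User= makes the system manager start the process as this account.
        f"User={user}",
        f"ExecStart={exec_start}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(render_unit("Example service", "core", "/usr/bin/sleep infinity"))
```

[The rendered text could then be provided as the contents of a regular system unit at provisioning time.]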
17:28:39 <travier> #topic Open Floor
17:28:52 <travier> Back to open floor for the last 2 minutes :)
17:29:31 <dustymabe> is devconf this week?
17:29:42 * dustymabe is so out of the loop
17:29:57 <travier> February 18-20, 2021
17:30:06 <dustymabe> ahh, cool
17:30:08 <dustymabe> thanks :)
17:30:36 <lucab> next week I think most of us have meetings conflicting with this timeslot
17:30:47 <PanGoat> Who from here is going to be rep'ing?
17:31:24 <travier> PanGoat: rep'ing?
17:31:33 <PanGoat> representing
17:31:38 <travier> at DevConf?
17:31:54 <travier> We have an FCOS workshop scheduled
17:32:00 <PanGoat> OK, great
17:32:19 <travier> Should we cancel next week's meeting?
17:32:52 <jlebon> lucab: indeed, good catch
17:32:55 <dustymabe> I think that was the meeting for us to put together the update for the council
17:33:14 <dustymabe> so we might want to do that async if not during the meeting
17:33:16 <PanGoat> that can be done offline, no?
17:33:28 <dustymabe> yep, the important part is that it gets done
17:33:38 <jlebon> dustymabe: hmm, i think it can wait a week no? last time, my PR took a while to merge :)
17:33:52 <dustymabe> yeah, as long as bcotton_ is good, i'm good
17:34:10 <jlebon> +1 ack
17:34:18 <dustymabe> one last thing.. anything for us to do for "Fedora Changes"
17:34:24 <dustymabe> for f34?
17:34:44 <dustymabe> on both sides - any to file? any to review?
17:35:05 <jlebon> filing deadline has passed i think
17:35:20 <travier> https://fedoraproject.org/wiki/Releases/34/ChangeSet#Enable_systemd-oomd_by_default_for_all_variants
17:35:37 <jlebon> i did some initial review in https://github.com/coreos/fedora-coreos-tracker/issues/704, but would appreciate more eyes
17:36:38 <dustymabe> +1
17:37:06 <dustymabe> thanks travier for running the meeting
17:37:25 <bcotton> yeah, i think we may stop doing the council updates, since FCOS is about the only group that is submitting those on the regular
17:37:29 <travier> dustymabe: 👍
17:37:39 <lucab> yay for us I guess?
17:37:43 <travier> :)
17:38:14 <jlebon> FCOS ^5!
17:38:16 <dustymabe> bcotton: I think they are useful even for us, nice to have a succinct set of things written down that we've been working on periodically
17:38:43 <PanGoat> It's disappointing in a way. Having consolodated updates is nice.
17:38:56 <PanGoat> consolidated*
17:39:25 * dustymabe has to run - nice seeing everyone!
17:40:09 <jlebon> travier: want to end the meeting?
17:40:15 <jlebon> dustymabe: see ya!
17:40:16 <travier> #endmeeting