fedora_coreos_meeting
LOGS
16:30:50 <dustymabe> #startmeeting fedora_coreos_meeting
16:30:50 <zodbot> Meeting started Wed Feb 26 16:30:50 2020 UTC.
16:30:50 <zodbot> This meeting is logged and archived in a public location.
16:30:50 <zodbot> The chair is dustymabe. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:30:50 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:30:50 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:30:54 <dustymabe> #topic roll call
16:30:58 <bgilbert> .hello2
16:30:59 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:31:01 <mnguyen_> .hello mnguyen
16:31:02 <zodbot> mnguyen_: mnguyen 'Michael Nguyen' <mnguyen@redhat.com>
16:31:05 <dustymabe> .hello2
16:31:06 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:31:34 <miabbott> .hello2
16:31:35 <zodbot> miabbott: miabbott 'Micah Abbott' <miabbott@redhat.com>
16:31:38 <cyberpear> .hello2
16:31:39 <zodbot> cyberpear: cyberpear 'James Cassell' <fedoraproject@cyberpear.com>
16:31:52 <ksinny> .hello sinnykumari
16:31:52 <zodbot> ksinny: sinnykumari 'Sinny Kumari' <ksinny@gmail.com>
16:32:43 <jlebon> .hello2
16:32:44 <zodbot> jlebon: jlebon 'None' <jonathan@jlebon.com>
16:32:44 <dustymabe> #chair mnguyen_ miabbott cyberpear ksinny darkmuggle jlebon
16:32:44 <zodbot> Current chairs: cyberpear darkmuggle dustymabe jlebon ksinny miabbott mnguyen_
16:33:09 <darkmuggle> .hello2
16:33:10 <zodbot> darkmuggle: darkmuggle 'None' <darkarts@utlemming.org>
16:33:53 <dustymabe> #chair bgilbert
16:33:53 <zodbot> Current chairs: bgilbert cyberpear darkmuggle dustymabe jlebon ksinny miabbott mnguyen_
16:33:58 <dustymabe> #topic Action items from last meeting
16:34:35 <dustymabe> ok only topic is the one about the infographic.. i've got that on my todo, i'm going to stop re-actioning it now as it's a reminder of my failure :)
16:34:38 <walters> .hello2
16:34:39 <zodbot> walters: walters 'Colin Walters' <walters@redhat.com>
16:34:43 <dustymabe> welcome walters
16:34:48 <dustymabe> #chair walters
16:34:48 <zodbot> Current chairs: bgilbert cyberpear darkmuggle dustymabe jlebon ksinny miabbott mnguyen_ walters
16:36:06 <dustymabe> #topic cosa/mantle integration
16:36:19 <dustymabe> #link https://github.com/coreos/coreos-assembler/issues/163
16:36:47 <jdoss> .hello2
16:36:48 <zodbot> jdoss: jdoss 'Joe Doss' <joe@solidadmin.com>
16:36:49 <cyberpear> +1 to shrink the 3G cosa container
16:37:15 <dustymabe> Our teams have been working with the coreos-assembler and mantle repositories to build and deliver Fedora CoreOS and Red Hat CoreOS
16:37:36 <dustymabe> mantle is a repo that was built up in the CoreOS Inc days to deliver container linux
16:37:53 <bgilbert> #chair jdoss
16:37:53 <zodbot> Current chairs: bgilbert cyberpear darkmuggle dustymabe jdoss jlebon ksinny miabbott mnguyen_ walters
16:38:01 <dustymabe> we've borrowed many things from it, but now that Container Linux is going away we've decided to evaluate merging COSA and mantle together
16:38:29 <dustymabe> here is a summary of a discussion the team had yesterday: https://github.com/coreos/coreos-assembler/issues/163#issuecomment-590995844
16:38:39 <jlebon> cyberpear: i don't think this will change the cosa img size much :)
16:38:58 <dustymabe> and there is a WIP PR to merge mantle into COSA: https://github.com/coreos/coreos-assembler/pull/1152
16:39:41 <dustymabe> This should help us with the "does this new feature belong in mantle or COSA" dilemma we've had
16:40:09 <dustymabe> anyone have any questions or comments or things to add to what I said above ?
16:40:40 <darkmuggle> The motivation is important to highlight
16:40:52 * lorbus says whoops he's late
16:40:53 <lorbus> .hello2
16:40:54 <zodbot> lorbus: lorbus 'Christian Glombek' <cglombek@redhat.com>
16:41:00 <bgilbert> #chair lorbus
16:41:00 <zodbot> Current chairs: bgilbert cyberpear darkmuggle dustymabe jdoss jlebon ksinny lorbus miabbott mnguyen_ walters
16:41:24 <darkmuggle> We have been slouching towards a tight coupling and we realized that the divide between them is no longer useful to developers and users.
16:42:12 <dustymabe> #info we're evaluating merging the mantle repo into coreos-assembler as the line between the two is blurring. With Container Linux going away the need to have them separate is even less. See https://github.com/coreos/coreos-assembler/issues/163
16:42:25 <dustymabe> darkmuggle: agreed
16:42:46 * dustymabe waits briefly for more comments
16:44:04 <dustymabe> #topic Request for usbguard package inclusion
16:44:11 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/326
16:44:27 <dustymabe> ok, we've bounced this one back and forth a bit
16:45:22 <dustymabe> assuming it doesn't add a bunch of deps and doesn't run anything by default, should we include it so we can scratch this itch ?
16:46:37 <walters> I think every time we do that other projects (wireguard, openvswitch, kata, ...) are going to ask why we don't do the same for them since they need to support traditional systems too and don't want to maintain a container too
16:46:53 <walters> did anyone have opinions on my "semi-supported packages to layer" approach?
16:47:34 <dustymabe> walters: i've wanted something like that for a while
16:47:37 <cyberpear> I think it'd be good to have a list of "if you need these packages, it's okay to layer them rather than try containerizing them"
16:47:57 <dustymabe> except it'd be nice if it was just some sort of 'addon' you enabled
16:48:10 <walters> if we filtered the fedora repo down it would also likely cut the rpm-md size down from like 70MB to 1-2
16:48:11 <cyberpear> and maybe eventually have a "fat" version of the distro and a "skinny" version
16:48:19 <dustymabe> i.e. we have the base ostree.. and then we have ONE other layer that includes stuff that fits in this grey area
16:48:30 <jdoss> As an end user I have heard the "Don't add a layer or install things" because it will make automated updates not work? I am not sure how much that is true but it makes me not want to deal with it.
16:48:36 <jlebon> the issue with package layering isn't really "whitelisting" good ones as much as it's about changing the nature of automatic updates
16:48:50 <dustymabe> jlebon: my suggestion would help there I think
16:49:42 <jlebon> dustymabe: hmm, so e.g. we'd ship two OSTrees per stream each release?
16:49:54 <walters> it does in *some* cases introduce https://github.com/projectatomic/rpm-ostree/issues/415 - in others though, e.g. wireguard would be extremely unlikely (IMO) to break anything
16:49:54 <dustymabe> jlebon: something like that, yes
16:50:16 <dustymabe> jlebon: and we'd try to test them both
16:50:49 <dustymabe> i.e. rather than having a yum repo of "ok to layer packages" we have a base ostree with an addons layer
16:51:07 <dustymabe> so we just have 1 extra set of tests
16:51:18 <walters> in https://blog.verbum.org/2019/12/23/starting-from-open-and-foss/ I argue for thinking of RPM layering like "Firefox extensions for the OS" - the problem domains around updates breaking extensions are quite analogous
16:51:20 <dustymabe> rather than N extra sets of possible tests
16:51:31 <walters> hmm
16:51:52 * dustymabe notes this is mostly a random idea
16:52:02 <jlebon> hmm, ok so you're saying an "overlay" OSTree rather than a whole other tree?
16:52:17 <dustymabe> right, would prefer it derive from the real thing
16:52:38 <walters> i think the problem root is lifecycle binding/versioning/testing extensions (rpms) with the OS - we could also (as discussed in that thread) teach rpm-ostree how to find an rpm-md repo whose "version" corresponds with the OS
16:52:42 <dustymabe> but, of course. there would be a lot of details to work out too
16:52:47 <walters> shipping an overlay ostree is one implementation of binding
16:53:26 <dustymabe> walters: so if we solved the yum repo versioning problem you think it would solve this problem ?
16:53:35 <jlebon> walters: yeah agreed. i think if we had the binding story figured out, IMO it'd be much easier to recommend pkglayering
16:53:51 <walters> i'm not opposed to the "extras ref"; if e.g. we say that nothing in there does anything by default (e.g. perhaps one has to explicitly turn on the units via Ignition) then it seems safer
16:54:10 <jlebon> i know otaylor was looking at this a while back, as it's relevant to Silverblue, though not sure if something came of that
16:54:17 <walters> (I'd be surprised if e.g. wanting wireguard suddenly started running openvswitch)
16:54:24 <dustymabe> jlebon: yeah and lorbus just saw a customer hit it too
16:54:59 <dustymabe> tangent: appropriately the solution we've been talking about applies to FCOS
16:55:12 <dustymabe> I do believe the request for usbguard has also been made for RHCOS
16:55:31 <dustymabe> so if we were to say "package layer it" in FCOS, how would that apply to FCOS?
16:55:51 <lorbus> on a side note: SUSE also has a way of doing transactional rpm installs client-side...not sure if we can learn anything from the way they're doing it
16:55:52 <dustymabe> sorry, RHCOS
16:56:12 <dustymabe> lorbus: they're using btrfs snapshots IIUC
16:56:20 <walters> lorbus: right, the microOS thing is much closer to traditional RPM - i.e. depsolving *always* happens client side
16:56:23 <dustymabe> so they're just using the package manager (zypper)
16:56:33 <walters> it means you can get state drift; it's not really an "image system" by default
16:56:50 <lorbus> true..https://github.com/openSUSE/transactional-update
16:57:02 <otaylor> jlebon: Nothing came of it, my attention went elsewhere, still an outstanding problem :-) ... most approaches would require somewhat significant Fedora infrastructure work
16:57:03 <lorbus> nvmd then
16:57:04 <walters> https://github.com/openSUSE/transactional-update#caveats is basically all stuff rpm-ostree fixes
16:57:09 <jlebon> dustymabe: right, it'd probably end up having to be baked in or just shipped as RPMs a-la-kernel-rt
16:57:18 <miabbott> imo, since RHCOS use case is much more narrow than FCOS, we have different requirements/expectations from customers.  it's good to have the discussion about including new pkgs in the RHCOS base, but i don't think the decision tree is going to be the same as FCOS
16:58:05 <walters> Well, the other big RHCOS distinction is that it's part of OpenShift 4 which is oriented around container images entirely - so shipping extensions there would likely be via container images
16:58:17 <walters> i mean, it already is with kernel-rt
16:58:19 <jlebon> otaylor: :(   i think this is going to become much more relevant again once silverblue starts slowing down its cadence
16:59:02 <dustymabe> ok, let's try to wrangle this discussion back in
16:59:13 <dustymabe> anybody with a suggested way forward here ?
16:59:28 <dustymabe> we obviously have a macro problem, which is we can't include the world in the host
16:59:52 <dustymabe> but we do have a lot of small needs that aren't easily suited or desired to be run in a container
16:59:53 <jlebon> hmm, one path is to own the side yum repo ourselves
16:59:59 <jdoss> I really think for FCOS we need a good framework for helping users customize the OS in a best practice kind of way. I am still not sure what that really is from my angle besides shove it in a container and do a bunch of work on my end.
17:00:30 <jlebon> so basically every release also dumps known to match "good" RPMs into a yumrepo. and we ship the repo config in the host and disable the others by default
17:00:58 <dustymabe> jlebon: define "good" ?
17:01:35 <jlebon> essentially walters' idea of whitelisting, and the whitelist controls what goes in the repo
17:01:35 <walters> something like a yaml file with a list of packages, we run a service that depsolves vs base OS and maintains multiple versions of them?
17:01:43 <lorbus> I think the kernel-rt way walters mentioned may be a good way to go here? expand that to work for other RPMs as well
17:02:00 <dustymabe> to me good could mean two different things
17:02:01 <jlebon> the crucial part is that we *know* it layers/depsolves successfully at compose time
17:02:13 <dustymabe> 1. packages in this blessed list can be installed
17:02:23 <dustymabe> 2. packages in this blessed list have been tested to work
17:02:42 <jlebon> right, this would unlock meaningful testing too
17:02:50 <walters> and of course we could even run tests associated with that package but before we run there we need to integrate with the existing gating for the base probably...
17:03:38 <jlebon> basically, similar to RHCOS' approach of shipping the kernel-rt RPMs in the container, but we ship it in a yum repo :)
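A minimal sketch of the allowlist idea discussed above: at compose time, check that each allowlisted package depsolves against the base OS, and publish only those packages and their extra dependencies in the side repo. This is a toy model, not how rpm-ostree or any Fedora tooling works; the function names, the `libexample`, `brokenpkg` and `missinglib` packages, and the dependency data are all hypothetical.

```python
def depsolve(name, base, requires, _seen=None):
    """Return the set of extra packages needed to layer `name` on top of
    `base`, or None if some dependency cannot be satisfied."""
    seen = set() if _seen is None else _seen
    if name in base or name in seen:
        return set()  # already provided by the base OS, or already handled
    if name not in requires:
        return None  # unknown package: cannot be satisfied
    seen.add(name)
    needed = {name}
    for dep in requires[name]:
        sub = depsolve(dep, base, requires, seen)
        if sub is None:
            return None
        needed |= sub
    return needed


def compose_extras_repo(allowlist, base, requires):
    """Build the side repo contents from the allowlist; packages that do
    not depsolve against the base are rejected at compose time."""
    repo, rejected = set(), []
    for name in allowlist:
        needed = depsolve(name, base, requires)
        if needed is None:
            rejected.append(name)
        else:
            repo |= needed
    return repo, rejected


# Hypothetical data, for illustration only.
BASE = {"glibc", "systemd"}
REQUIRES = {
    "usbguard": ["glibc", "libexample"],
    "libexample": ["glibc"],
    "brokenpkg": ["missinglib"],
}
repo, rejected = compose_extras_repo(["usbguard", "brokenpkg"], BASE, REQUIRES)
print(sorted(repo), rejected)
```

The point of the sketch is jlebon's "we *know* it layers/depsolves successfully at compose time": a failure shows up when the release is built, not on a user's node during an automatic update.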
17:04:12 <dustymabe> ok. I like the brainstorming we're doing here, but I do think implementing anything like this is going to be some time off
17:04:15 <dustymabe> would anyone disagree ?
17:04:15 <otaylor> (For reference: https://docs.google.com/document/d/1yS0PTaUPmD-CkQkdlJ9OvY-Z5hBOIMlRKDyqutEG_ks/edit?usp=sharing was my analysis of fixing the issue on Fedora)
17:04:35 <walters> (Of course for OKD...raises the interesting question of whether OKD would match OCP and mirror extras into a container)
17:05:10 <lorbus> I think that'd be preferable..
17:05:27 <jlebon> dustymabe: agreed, though i think implementation wise it's probably the easiest path
17:05:43 <lorbus> it's the rpm-ostree vs pivot way of updating the OS
17:05:46 <jlebon> (while still addressing the lifecycle problem)
17:05:51 <dustymabe> jlebon: we probably need to have more discussion about the implementation
17:06:23 <dustymabe> but just in general we think we'd like to hold off on usbguard because we'd like to implement some sort of "reliable extensions" framework to handle use cases like this
17:07:14 <jlebon> +1, IMO i think it's worth thinking more on this before we ship it
17:08:05 <ksinny> +1, like the new approach of shipping additional whitelisted packages in a separate repo
17:08:06 <walters> i think my bottom line take on usbguard is for now anyone who wants it on FCOS can pkg layer and that should just work; for OpenShift 4...we could bake it into RHCOS short term but hopefully the fact that it's on the host is an implementation detail and e.g. we could switch to having it be a daemonset mostly transparently or so if we later decide to do that?
17:09:45 <jlebon> walters: not to derail too much, but for RHCOS, hopefully this would be another install-config.yaml knob?  if so, then yeah we can easily swap alternatives down the road
17:09:50 <dustymabe> #proposed usbguard fits into a category of small OS utility/daemon that is not easy or desirable to containerize but is also not something we immediately want to include in the host because if we include every utility/daemon we end up with a kitchen sink OS. We'd like to develop a framework for "reliable extensions" that we can use to deliver usbguard and other utilities/daemons
17:10:02 <walters> the other related thing here is where Ubuntu is going with https://snapcraft.io/ - they've made it a nice experience to have "containerized" apps that work for CLI,GUI,Servers but are not aligned with the Docker/Kubernetes ecosystem and do require people to create both .deb and snaps (if relevant)
17:11:12 <dustymabe> how does my statement look ?
17:11:30 <jlebon> ack
17:11:49 <lorbus> ack
17:12:01 <ksinny> ack
17:12:31 <dustymabe> #agreed usbguard fits into a category of small OS utility/daemon that is not easy or desirable to containerize but is also not something we immediately want to include in the host because if we include every utility/daemon we end up with a kitchen sink OS. We'd like to develop a framework for "reliable extensions" that we can use to deliver usbguard and other utilities/daemons
17:12:58 <dustymabe> #action dustymabe to create issue to discuss possibilities for a "reliable extensions" framework
17:13:05 <walters> ack
17:13:23 <dustymabe> ok moving on to the next topic
17:13:47 <dustymabe> #topic Add factory reset capability
17:13:51 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/399
17:14:09 <bgilbert> I think this idea has been floated before
17:14:23 <bgilbert> but basically: we encourage users to reprovision from scratch whenever they have config changes
17:14:42 <bgilbert> that's straightforward on VMs and clouds, and also on bare metal when there's PXE install infrastructure in place
17:15:04 <bgilbert> but for a real single-node setup, like an air-gapped embedded appliance, it's not.
17:15:25 <bgilbert> so I wonder whether it'd 1) make sense and 2) be feasible to offer a factory reset capability.
17:15:52 <bgilbert> run a command, give it your new Ignition config, and it'll 1) put the Ignition config in /boot, 2) re-enable first-boot kargs, and 3) reboot.
17:16:04 <bgilbert> the initramfs would not only do all the first-boot stuff, but delete any customizations first.
17:16:18 <bgilbert> now, that probably doesn't work for **all** customizations.
17:16:28 <walters> if we committed the ignition config into the ostree repo by default this would be a bit easier
17:16:31 <bgilbert> e.g. if you've moved the root FS, we probably won't move it back
17:16:54 <walters> ah yeah, resetting the rootfs would be...interesting
17:16:57 <bgilbert> and so, conceivably, this feature could get bogged down in a pile of special cases
17:17:40 <dustymabe> coreos-installer did use to be in the initramfs - so technically every system could be re-installed by just changing kernel CLI args in grub
17:17:43 <bgilbert> but it seemed like it might be plausible.  the work to support moving the root FS might help, since maybe we could reuse some of that infra.
17:18:21 <bgilbert> dustymabe: yeah
17:18:27 <bgilbert> thoughts?
17:19:00 <dustymabe> seems good to me, though I'd be interested to know how much work we think it would be
17:19:22 <jlebon> dustymabe: that doesn't work if it's running from the disk it needs to install to though
17:19:59 <dustymabe> jlebon: hence why I mentioned it used to be in the initRAMfs
17:20:02 <walters> In general we can reboot into
17:20:14 <walters> The initrd and run from ram
17:20:17 <jlebon> but yeah, maybe a higher-level approach is figuring out a way to rerun coreos-installer while still targeting that disk
17:20:50 <jlebon> dustymabe: ahh sorry, your "could" there is past tense :)
17:21:15 <bgilbert> ...so actually, the stage2 discussion fits in here
17:21:22 <jlebon> could we "stage" the node somehow to reboot into live mode?
17:21:45 <bgilbert> if there's a way to use our existing kernel and initrd in /boot to pivot into a live system by fetching the root FS from the network
17:21:50 <bgilbert> and then run coreos-installer from that
17:22:07 <bgilbert> that fixes _some_ cases.  not the air-gapped single system, though.
17:22:48 <bgilbert> we could even fetch before the reboot and stash somewhere
17:23:12 <bgilbert> (uhh, kinda.  if the install fails, and wipes the install image from ROOT, you don't get a second chance)
17:23:14 <dustymabe> coreos-installer is a binary right? I wonder if all external deps are already in our initramfs
17:23:14 <walters> the initrd can mount the rootfs and copy out stuff like /usr/bin/coreos-installer into RAM, then blow it away
17:23:18 <walters> don't need to redownload
17:23:42 <bgilbert> I've been assuming we don't actually want to run c-i from the initramfs proper
17:23:50 <bgilbert> it's probably easier now than with the old shell script, true
17:23:54 <bgilbert> only has a couple external deps
17:23:58 <bgilbert> (but one of them is GPG)
17:25:11 <bgilbert> oh wow.  rube goldberg device:
17:25:33 * dustymabe notes time
17:25:35 <bgilbert> 1. before reboot, assemble a live squashfs from the ostree we already have
17:25:52 <bgilbert> 2. from the initramfs, mount the old root filesystem, copy the squashfs into ram
17:26:04 <walters> Yeah, that is the generalization
17:26:10 <bgilbert> 3. create a temporary partition at the end of the disk, copy the squashfs into it for safety
17:26:22 <bgilbert> 4. run coreos-installer.  if it fails, that's okay, we still have the safety partition
17:26:31 <bgilbert> 5. reboot into new system
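The five steps above can be sketched as a small simulation, mainly to show why the safety partition matters when the installer fails. This only models state; it does not call coreos-installer or touch any disk, and the `Machine` class, its fields, and both functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    root_installed: bool = True      # a working OS is on the root filesystem
    image_in_ram: bool = False       # live squashfs copied into RAM
    safety_partition: bool = False   # squashfs copy kept at the end of the disk
    log: list = field(default_factory=list)


def factory_reset(machine, run_installer):
    """Walk through the five steps; `run_installer` is a stand-in callable
    that returns True if the install succeeded."""
    machine.log.append("1: assemble live squashfs from existing ostree")
    machine.log.append("2: initramfs copies squashfs from old root into RAM")
    machine.image_in_ram = True
    machine.log.append("3: copy squashfs into temporary partition at end of disk")
    machine.safety_partition = True
    machine.log.append("4: run coreos-installer")
    machine.root_installed = False   # the installer overwrites the old root
    if run_installer():
        machine.root_installed = True
        machine.log.append("5: reboot into new system")
        return True
    return False


def can_retry(machine):
    """A failed install is recoverable only if the safety copy still exists."""
    return machine.safety_partition


failed = Machine()
ok = factory_reset(failed, lambda: False)
print(ok, failed.root_installed, can_retry(failed))
```

Without step 3, a failed install would leave the machine with no root and only the copy in RAM, which does not survive a reboot.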
17:27:00 <dustymabe> yeah.. seems like there are a lot of moving parts though
17:27:12 <bgilbert> I'm not sure it's a serious proposal :_)
17:27:14 <bgilbert> :-)
17:27:20 <dustymabe> i think we'd have to consider how important something like this is
17:27:30 <dustymabe> depending on the complexity of the proposed solution
17:27:35 <jlebon> yeah, agreed
17:27:37 <dustymabe> obviously more complex == harder to maintain
17:27:43 <bgilbert> to be clear, the original ticket wasn't contemplating rerunning the installer, just going through and deleting user customizations
17:27:56 <bgilbert> which is messier in its own way, since we're probably not getting 100% back to pristine state
17:28:17 <jlebon> i'm confused though, even for air-gapped systems, they were installed somehow, right?
17:28:33 <bgilbert> sure.  maybe before the hardware was emplaced.
17:29:04 <bgilbert> or maybe by hand
17:29:15 <jdoss> Folks, sometimes air-gapped systems just install themselves...
17:29:37 <dustymabe> spontaneous installation
17:29:54 <bgilbert> factory-installed FCOS
17:29:57 <jdoss> skynet is that you?
17:30:24 <jlebon> bgilbert: would it get us half-way there if we just fix the "keep state while rerunning coreos-installer" path?
17:30:24 <dustymabe> bgilbert: I don't see anyone against factory reset
17:30:36 <dustymabe> there are a few different ways to do it that we discussed
17:30:45 <bgilbert> dustymabe: yup, we got some discussion out of it, which was my goal here
17:30:51 <bgilbert> jlebon: sorry, which path?
17:30:56 <dustymabe> should we add a summary to the ticket and see if we get any more discussion?
17:31:01 <bgilbert> dustymabe: +1
17:31:35 <jlebon> bgilbert: being able to rerun coreos-installer, while keeping e.g. /var
17:31:48 <jlebon> on the same disk
17:31:57 <dustymabe> #info all seem to be in agreement that factory reset would be nice to have.. there are a few different options for how to go about doing that. bgilbert will add them to ticket #399
17:31:59 <bgilbert> jlebon: that should already be fixed as of the next c-i release.  but no, doesn't help here
17:32:21 <dustymabe> we're over time so I'm going to skip to open floor to see if there is anything
17:32:24 <dustymabe> #topic open floor
17:32:39 <bgilbert> jlebon: I think the primary point is having a way to reprovision without needing anything off-machine
17:32:59 <jlebon> ack
17:33:21 <dustymabe> #info we did a new FCOS release for testing and stable starting today
17:33:30 <dustymabe> thanks ksinny for running the release
17:33:52 <jlebon> thanks ksinny!
17:33:58 <ksinny> enjoyed working on the release :)
17:34:42 <dustymabe> #info we migrated our production fedora ostree repos into a netapp volume to be accessed via various openshift projects that our teams will use to do automated imports and prunes of OSTree commits for Fedora CoreOS
17:34:53 <dustymabe> anything else ?
17:35:00 <bgilbert> \o/
17:35:06 <jlebon> dustymabe: *awesome*
17:35:18 <ksinny> nice work
17:35:30 <dustymabe> jlebon: still working on one more issue with permissions https://pagure.io/releng/issue/8811#comment-628901
17:35:49 <dustymabe> but we're getting close
17:35:56 <dustymabe> I haven't unleashed the importer just yet
17:36:04 <jlebon> dustymabe: sorry i couldn't chat this morning, let's discuss after food! :)
17:36:09 <dustymabe> ok will end meeting in one minute
17:37:06 <dustymabe> #endmeeting