fedora_coreos_meeting
LOGS
16:29:32 <dustymabe> #startmeeting fedora_coreos_meeting
16:29:32 <zodbot> Meeting started Wed Apr 19 16:29:32 2023 UTC.
16:29:32 <zodbot> This meeting is logged and archived in a public location.
16:29:32 <zodbot> The chair is dustymabe. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
16:29:32 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:29:32 <zodbot> The meeting name has been set to 'fedora_coreos_meeting'
16:29:38 <dustymabe> #topic roll call
16:29:42 <dustymabe> .hi
16:29:43 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
16:30:30 <fifofonix> .hi
16:30:31 <zodbot> fifofonix: fifofonix 'Fifo Phonics' <fifofonix@gmail.com>
16:31:03 <ravanelli> .hi
16:31:04 <zodbot> ravanelli: ravanelli 'Renata Ravanelli' <renata.ravanelli@gmail.com>
16:31:22 <davdunc[m> .hello davdunc
16:31:22 <zodbot> davdunc[m: davdunc 'David Duncan' <davdunc@amazon.com>
16:31:36 <marmijo[m]> .hello marmijo
16:31:37 <zodbot> marmijo[m]: marmijo 'Michael Armijo' <marmijo@redhat.com>
16:32:37 <jlebon> .hello2
16:32:38 <zodbot> jlebon: jlebon 'None' <jonathan@jlebon.com>
16:33:41 <dustymabe> #chair fifofonix ravanelli davdunc[m marmijo[m] jlebon
16:33:41 <zodbot> Current chairs: davdunc[m dustymabe fifofonix jlebon marmijo[m] ravanelli
16:34:42 <dustymabe> thanks everyone for coming!
16:34:48 <dustymabe> let's get started
16:35:37 <dustymabe> #topic Action items from last meeting
16:35:42 <dustymabe> * dustymabe to open a new issue related to the "regular dbx updates" feature
16:35:44 <dustymabe> * jlebon to open a new issue related to the "regular bootloader updates" feature
16:35:50 <dustymabe> #info dustymabe opened issue for regular dbx updates: https://github.com/coreos/fedora-coreos-tracker/issues/1478
16:35:52 <dustymabe> #info jlebon opened issue for regular bootloader updates: https://github.com/coreos/fedora-coreos-tracker/issues/1468
16:36:39 <dustymabe> ok moving on to meeting topics
16:36:48 <dustymabe> #topic Rollback from F38 to F37 followed by another F38 upgrade can lead to loss of SSH access
16:37:17 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/1473
16:37:28 <dustymabe> so this one is... interesting
16:37:44 <dustymabe> on the 37->38 upgrade there is a migration script that runs that updates permissions on SSH keys
16:37:53 <dustymabe> the host keys live in /etc/
16:38:07 <dustymabe> the migration script drops down a stamp file in /var/ to tell it not to run the migration again
16:38:29 <dustymabe> upon rollback the keys get rolled back (in /etc/)
16:38:54 <dustymabe> but the stamp file in `/var/` lives on
16:39:11 <dustymabe> thus upon a later upgrade 37->38*
16:39:16 <dustymabe> the migration doesn't happen
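The sequence described above can be sketched as a toy model. This is only an illustration of the state split between `/etc` and `/var`; the class, the function names, and the stamp file name are hypothetical and are not taken from the actual migration script.

```python
from dataclasses import dataclass, field

STAMP = "ssh-key-migration.stamp"  # hypothetical stamp file name


@dataclass
class Host:
    """Toy host: /etc state follows the deployment, /var state is shared."""
    etc_keys_migrated: bool = False               # lives in /etc
    var_stamps: set = field(default_factory=set)  # lives in /var


def boot_f38(host: Host) -> bool:
    """Boot into F38; return True if the migration ran on this boot."""
    if STAMP in host.var_stamps:
        return False  # stamp present, so the migration is skipped
    host.etc_keys_migrated = True  # fix key permissions
    host.var_stamps.add(STAMP)     # remember that the migration ran
    return True


def rollback_to_f37(host: Host) -> None:
    """Rollback brings back the old /etc but leaves /var untouched."""
    host.etc_keys_migrated = False


host = Host()
boot_f38(host)              # first upgrade: migration runs, stamp is written
rollback_to_f37(host)       # keys revert, stamp survives
ran_again = boot_f38(host)  # second upgrade: stamp blocks the migration
print(ran_again, host.etc_keys_migrated)
```

After the second upgrade the keys are back in their pre-migration state while the stamp claims the work is done, which is the situation reported in the issue.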
16:39:40 <jdoss> .hi
16:39:41 <zodbot> jdoss: jdoss 'Joe Doss' <joe@solidadmin.com>
16:39:51 <bgilbert> .hi
16:39:52 <zodbot> bgilbert: bgilbert 'Benjamin Gilbert' <bgilbert@backtick.net>
16:39:56 <dustymabe> #chair jdoss bgilbert
16:39:56 <zodbot> Current chairs: bgilbert davdunc[m dustymabe fifofonix jdoss jlebon marmijo[m] ravanelli
16:40:23 <dustymabe> jdoss: bgilbert we're just discussing https://github.com/coreos/fedora-coreos-tracker/issues/1473 and if there's anything we can do about it
16:41:45 <jdoss> I agree with bgilbert on not supporting rollbacks, but losing SSH access is a major bummer.
16:41:45 <bgilbert> my current sense is: if we want to put in the effort, then we can, and SSH is indeed special.  but in general we can't commit to this rollback ever ever working
16:42:06 <jdoss> +1
16:43:03 <dustymabe> bgilbert: would another way to say it be that rollbacks are best effort?
16:43:10 <bgilbert> sure
16:43:53 <dustymabe> i think this is something we all knew deep down, but I don't know if we have any language anywhere about that to make it clear
16:44:07 <dustymabe> my experience so far has been that rolling back and re-upgrading has been pretty reliable
16:44:22 <dustymabe> but there are just too many factors involved to make any sort of guarantee on it
16:44:22 <fifofonix> fifofonix: ditto
16:44:43 <dustymabe> so maybe we can find a spot in our docs where it would be appropriate to update language
16:44:44 <bgilbert> sure, it'll work except to the extent it doesn't
16:44:49 <travier> .hello siosm
16:44:50 <zodbot> travier: siosm 'Timothée Ravier' <travier@redhat.com>
16:44:56 <bgilbert> +1 to docs
16:45:16 <dustymabe> indeed
16:45:28 <dustymabe> ok let's focus on IF we want to do something for this case and if so, what?
16:46:00 <dustymabe> obviously we can document the workaround (which is to remove the stamp file)
16:46:14 <dustymabe> but that does require the user to be able to get back in to their machine
16:47:18 <travier> They can remove it while on F37
16:47:21 <dustymabe> jlebon's suggestion in https://github.com/coreos/fedora-coreos-tracker/issues/1473#issuecomment-1513512301 is to essentially ship a helper unit that removes the stamp file on every boot (guaranteeing the migration code runs every boot)
16:47:23 <travier> but yeah that's not great
16:47:36 <dustymabe> travier: the user doesn't know it's a problem until they try to re-upgrade to f38*
16:47:54 <travier> we can ship a drop-in that removes the drop file check for now and then remove it in a later release
16:47:57 <dustymabe> so they then have to select the older entry in grub (or use a password to log in on the console)
16:48:00 <travier> after the 38 barrier
16:48:36 <dustymabe> travier: ahh, so that's a different version of what jlebon suggested.. yours uses a dropin instead of a separate unit
16:48:51 <jdoss> Since ostree can do rollbacks, and that is a highlight of FCOS, doing best effort when moving between major Fedora releases seems reasonable.
16:49:11 <dustymabe> jdoss: it also happens to be the time you'd need rollbacks the most
16:49:20 <jdoss> yep totally.
16:49:39 <jlebon> travier: nice, that's a tinier patch even
16:51:16 <dustymabe> so.. it's looking like we'll need to spin a new `testing` anyway for the proxy issue: https://github.com/coreos/fedora-coreos-tracker/issues/1477
16:51:21 <travier> I can not find how ConditionPathExists= logic is handled in systemd
16:51:31 <travier> if it's an AND or an OR
16:51:36 <dustymabe> #info we paused the F38 rollout because of an issue related to updates behind a proxy: https://github.com/coreos/fedora-coreos-tracker/issues/1477
16:51:41 <fifofonix> patch sounds good (notes to side that fedora docs more broadly than fcos are silent on this nuance - implying rollback supported - silverblue / iot)
16:51:44 <travier> ConditionPathExists=!foo
16:51:44 <travier> ConditionPathExists=foo
16:52:03 <dustymabe> travier: I think if you just add ConditionPathExists= in a dropin it will cancel previous ConditionPathExists=
16:52:15 <travier> hum, not sure, you can have several
16:52:21 <dustymabe> right
16:52:30 <dustymabe> a single empty entry will cancel all previous
16:52:43 <dustymabe> and then you get to start from scratch defining them again (in the dropin)
16:52:57 <jlebon> fifofonix: silverblue at least assumes you're more hands on and not accessing via ssh so it's easier to deal with possible rollback fallout
16:53:26 <dustymabe> jlebon: I think that might not be obvious to the users :)
16:53:49 <jlebon> yeah, docs there too wouldn't hurt. but i think it's a less critical issue there in the first place
16:54:09 <dustymabe> ok so do we want to try to ship a dropin? at least it will help when we have our 38->`stable` transition
16:54:22 <dustymabe> and then we'd remove it in a barrier?
16:54:26 <jlebon> given that it's a trivial fix, i think we should
16:54:40 <bgilbert> +1
16:54:46 <travier> +1, shipping a smal droppin should be easy to test
16:54:49 <travier> small*
16:55:36 <dustymabe> #proposed we will ship a systemd dropin to remove the stamp file ConditionPathExists= on the migration unit so the idempotent migration code will run on every boot until we remove it after a barrier release
16:55:49 <jdoss> +1
16:55:55 <fifofonix> +1
16:56:16 <jlebon> ack
16:56:18 <travier> +1
16:56:22 <bgilbert> +1
16:56:28 <copperi[m]> +1
16:56:35 <dustymabe> #agreed we will ship a systemd dropin to remove the stamp file ConditionPathExists= on the migration unit so the idempotent migration code will run on every boot until we remove it after a barrier release
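A minimal sketch of what such a drop-in could look like, written out by a small Python helper. The unit name and the drop-in file name are hypothetical, since the log does not name the migration unit. As far as the systemd.unit documentation describes it, assigning an empty string to a condition resets the condition list, so any other conditions the unit still needs would have to be restated in the drop-in; conditions that are listed together are ANDed unless they carry the `|` prefix.

```python
from pathlib import Path
import tempfile

# Hypothetical unit name; the log does not name the migration unit.
UNIT = "example-ssh-key-migration.service"

DROPIN = """\
[Unit]
# An empty assignment resets the conditions inherited from the main unit file,
# so the stamp-file check no longer gates the (idempotent) migration.
ConditionPathExists=
"""


def write_dropin(systemd_dir: Path, unit: str = UNIT) -> Path:
    """Write the drop-in under <systemd_dir>/<unit>.d/ and return its path."""
    dropin_dir = systemd_dir / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "10-ignore-stamp.conf"  # hypothetical file name
    path.write_text(DROPIN)
    return path


# Write into a scratch directory rather than a real systemd path.
with tempfile.TemporaryDirectory() as tmp:
    written = write_dropin(Path(tmp))
    print(written.relative_to(tmp))
    print(written.read_text(), end="")
```

Shipping the override as a drop-in keeps the original unit untouched, so removing it after the barrier release restores the stamp-file check without any other change.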
16:56:58 <dustymabe> I'm loving all the votes/input today 😍
16:57:17 <travier> 🍪
16:57:41 <dustymabe> #topic Upgrade LUKS key derivation function on (major?) updates
16:57:45 <dustymabe> #link https://github.com/coreos/fedora-coreos-tracker/issues/1474
16:58:36 <dustymabe> travier: want to intro this one?
16:59:16 <jdoss> travier: thanks for opening this. When I read mjg59's post I double checked all of my LUKS setups and thankfully they were new-ish installs.
17:00:17 <dustymabe> honestly I think this class of problem (along with the bootloader updates issue that we've recently discussed) is why I reprovision my systems ~ once a year
17:00:34 <jdoss> I agree with the post's call to have the distros handle this for users.
17:00:57 <davdunc[m> if not every 30 days. :)
17:01:27 <bgilbert> I can take the intro
17:01:50 <dustymabe> bgilbert: +1
17:01:56 <jdoss> davdunc[m: I like you
17:02:24 <bgilbert> LUKS disk encryption volumes are encrypted with a key which is unrelated to the password you type in at the console
17:02:32 <davdunc[m> jdoss: :)
17:02:33 <bgilbert> (or that Tang handles, etc.)
17:03:16 <bgilbert> that main key is essentially encrypted with your password (or Tang's key).  there can be multiple "key slots", which each encrypt the main key, so that e.g. multiple passwords can be used to unlock the volume
17:03:32 <bgilbert> but your password probably isn't very random
17:03:50 <jdoss> mine is hunter2
17:04:10 <bgilbert> so a "password-based key derivation function" (PBKDF) is used to convert your password to the key that's used to decrypt the main volume key
17:04:26 <bgilbert> password unlocks> key slot unlocks> volume key
17:04:49 <bgilbert> the job of the PBKDF is to make brute-forcing harder
17:05:33 <bgilbert> if you want to brute-force jdoss's password "hunter2", you should have to do a lot of work to generate the key that actually decrypts the key slot
17:05:48 <bgilbert> historically that meant "use a lot of CPU time", but GPUs exist and are good at parallelizing things
17:05:56 <bgilbert> so now it means "use a lot of CPU and a lot of memory"
17:06:28 <bgilbert> LUKS supports multiple PBKDFs, but volumes that were created a while ago use an older one that isn't as good at requiring a lot of memory
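To make the CPU-versus-memory distinction concrete, here is a small sketch using Python's standard library. PBKDF2 only costs CPU time per guess, while scrypt also forces each guess to hold a block of memory. scrypt is used here as a stand-in for Argon2, which is not in the standard library; the iteration count and scrypt parameters are arbitrary illustration values, not what cryptsetup uses.

```python
import hashlib
import os

password = b"hunter2"
salt = os.urandom(16)

# CPU-hard only: cost scales with the iteration count, memory use stays tiny,
# so many guesses can run in parallel on a GPU.
cpu_hard = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000, dklen=32)

# Memory-hard: with n=2**14 and r=8 each guess needs roughly
# 128 * n * r bytes = 16 MiB, which limits how many guesses fit in parallel.
mem_hard = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)

print(len(cpu_hard), len(mem_hard))
```

Both calls turn the same weak password into a 32-byte key; the difference is only in what an attacker has to spend for every candidate password.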
17:06:34 * dustymabe notes this is a beautifully crafted intro to this problem by bgilbert
17:06:41 <bgilbert> dustymabe: <3
17:06:45 <travier> thanks bgilbert!
17:07:23 <bgilbert> it's possible to upgrade to a new PBKDF, but there are some factors:
17:08:05 <bgilbert> 1. each key slot needs to be updated, and you have to update all key slots (not necessarily at the same time) to fully improve your security
17:08:23 <bgilbert> 2. you need the slot's password to update it (because you're re-encrypting with that password)
17:08:38 <bgilbert> 3. GRUB doesn't support the newest PBKDF (doesn't matter for us, since we don't encrypt /boot)
17:09:13 <bgilbert> 4. rewriting key slots feels scary: if you get it wrong somehow, you've lost access to your disk
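Points 1 and 2 can be sketched with a toy key-slot model. This is not the LUKS on-disk format: the XOR wrapping, the SHA-256 digest check, and all function names are simplifications made up for illustration, and scrypt again stands in for Argon2. The sketch only shows why each slot is upgraded separately and why the slot's password is needed to do it.

```python
import hashlib
import hmac
import os


def kdf_old(password: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=32)


def kdf_new(password: bytes, salt: bytes) -> bytes:
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)


KDFS = {"old": kdf_old, "new": kdf_new}


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_slot(volume_key: bytes, password: bytes, kdf_name: str) -> dict:
    """Wrap the volume key with a key derived from this slot's password."""
    salt = os.urandom(16)
    slot_key = KDFS[kdf_name](password, salt)
    return {"kdf": kdf_name, "salt": salt, "wrapped": xor(volume_key, slot_key)}


def unlock(slot: dict, password: bytes, digest: bytes) -> bytes:
    """password -> slot key -> volume key, verified against a stored digest."""
    candidate = xor(slot["wrapped"], KDFS[slot["kdf"]](password, slot["salt"]))
    if not hmac.compare_digest(hashlib.sha256(candidate).digest(), digest):
        raise ValueError("wrong password for this slot")
    return candidate


def upgrade_slot(slot: dict, password: bytes, digest: bytes) -> dict:
    """Re-wrap one slot with the newer KDF; this needs that slot's password."""
    volume_key = unlock(slot, password, digest)
    return make_slot(volume_key, password, "new")


volume_key = os.urandom(32)
digest = hashlib.sha256(volume_key).digest()
slots = [
    make_slot(volume_key, b"hunter2", "old"),
    make_slot(volume_key, b"correct horse", "old"),
]
slots[0] = upgrade_slot(slots[0], b"hunter2", digest)
print([s["kdf"] for s in slots])
```

After upgrading only the first slot, the second slot still protects the same volume key with the older function, so an attacker would simply target that one.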
17:10:12 <bgilbert> it'd be nice to handle this transition automatically.  ideally we'd just piggyback on existing tooling (e.g. upstream dracut glue would just handle this) but mjg's point is that distros aren't really doing this right now
17:10:25 <jlebon> hmm, one question here is what the default PBKDF was at the time FCOS stable started supporting LUKS
17:10:46 <jlebon> bgilbert: or did you already check that it changed since?
17:10:50 <bgilbert> we do have some of our own initrd glue though, so in principle we could pursue this ourselves, or work on getting some tooling upstream
17:11:03 <bgilbert> jlebon: I have not
17:11:24 <jdoss> Would an easy stopgap be to have a systemd unit check for old LUKS and barf out a message to the login stuff we do pointing to documentation on how they can do it manually?
17:11:36 <bgilbert> but my current dev machine I believe is post-FCOS-LUKS and uses an older PBKDF
17:11:39 <jlebon> on my FSB, which was provisioned last year, it's argon2id, which per the blog is fine. wonder when that change happened.
17:11:39 <dustymabe> I think I'm against pursuing this ourselves.. if anything we should work as part of a subgroup within Fedora to try to solve the problem for Fedora as a whole
17:12:11 <bgilbert> jdoss: that UX isn't great
17:12:26 <jdoss> fair
17:12:33 <jlebon> dustymabe: definitely, though it'll likely require some special integration for FCOS/FSB/IOT
17:12:45 <bgilbert> oh, I should also note the threat model here: someone walks off with a copy of your encrypted data and then spends a bunch of CPU time to crack it
17:12:55 <bgilbert> well, GPU time
17:12:56 <travier> One of my suggestions was similar to jdoss in that we add something to CLHM & write docs for now until we have something better
17:13:13 <bgilbert> argon2i isn't broken AFAIK, it's just less good
17:13:34 <dustymabe> jlebon: true.. this feels like a system-wide fedora change that we should be a part of (maybe not necessarily owning it, but at least giving input and making sure our use cases are handled)
17:14:04 <travier> We don't have a good story for encryption on FCOS right now given that only Tang setups are "secure"
17:14:22 <dustymabe> this is an example where *CoreOS can help drive changes into all of Fedora and make it better
17:14:48 <travier> tpm ones are ok regarding theft
17:14:53 <dustymabe> travier: you can't have an encrypted disk outside of tang?
17:15:40 <travier> no, that's not what I'm saying. The only encrypted setups in FCOS right now that resist theft are tpm & tang
17:15:48 <bgilbert> hmm
17:16:18 <bgilbert> FCOS _can_ read a password from the keyboard at boot, but all of our documented/encouraged use cases are noninteractive
17:16:27 <dustymabe> yeah ^^
17:16:29 <bgilbert> which means they use random keys
17:16:32 <dustymabe> i was thinking about that use case
17:16:41 <travier> ok, this one works as well but I don't really think a lot of folks are doing it
17:16:42 <dustymabe> kind of doesn't work well for automatic updates, though
17:16:51 <travier> but agree that this one also works
17:16:54 <dustymabe> +1
17:16:54 <bgilbert> the whole point of brute-forcing is that passwords aren't evenly distributed in the input space
17:17:08 <bgilbert> so aaaaactually this may be largely a non-issue for us
17:17:19 <dustymabe> bgilbert: would you be able to write up your nice intro in the GH issue?
17:17:29 <dustymabe> i found it very helpful in understanding the problem
17:17:42 <bgilbert> okay
17:17:57 <dustymabe> i think the open question here is: what do we want to do about it?
17:18:12 <bgilbert> I'm leaning "leave the bug open"
17:18:14 <dustymabe> there's the docs+CLHM helper suggestion (which could be just an intermediate thing)
17:18:33 <dustymabe> there's also the Fedora System Wide change and work with other teams option
17:18:35 <jdoss> what is CLHM again?
17:18:40 <bgilbert> console-login-helper-messages
17:18:43 <jdoss> ty
17:19:11 <dustymabe> there's also the option for us to implement something on our own (but I'm not a big fan of this option)
17:19:15 <bgilbert> I think our use cases are less likely to be affected than the general population
17:19:23 <travier> I'm not really concerned about this issue for FCOS but I opened it so that we track this
17:19:31 <dustymabe> travier: +1
17:19:36 <bgilbert> travier: +1
17:19:40 <dustymabe> 4th option: do nothing
17:20:06 <travier> we can do nothing now, but we'll have to do something "at some point"
17:20:30 <bgilbert> again: will we?
17:21:02 <bgilbert> if you're using passwords you're already off the beaten path
17:21:14 <travier> warning users that they are using a weak pbkdf is the minimum from my perspective
17:21:24 <jdoss> +1
17:21:33 <bgilbert> my point is that in almost all cases, _they're not really passwords_
17:21:35 <jlebon> bgilbert: you could be using LUKS on a non-root device
17:21:42 <bgilbert> jlebon: and hand-unlocking it?
17:21:58 <travier> luks on non-root is not theft resistant
17:22:05 <jlebon> yeah
17:22:07 <travier> unless root is on luks too
17:22:12 <dustymabe> travier: THIS ^^
17:22:25 <jlebon> travier: can you expand?
17:22:31 <jlebon> do you mean if using a keyfile?
17:22:42 <travier> with keyfiles yes
17:22:48 <dustymabe> so this problem only presents itself IF the user is using a password?
17:23:20 <bgilbert> dustymabe: the PBKDF only matters if it's increasing the effective entropy of the input, yeah
17:23:20 <jlebon> but nothing stops you from deleting the keyfile post ignition
17:23:37 <bgilbert> dustymabe: if the input is already 128 bits of random data, you still have to search the whole input space
17:23:41 <dustymabe> bgilbert: and with tang or tpm you don't have that issue?
17:24:16 <bgilbert> dustymabe: right, those both use random keys
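The entropy argument can be put into rough numbers. This is back-of-the-envelope arithmetic only: the guess rate is a made-up figure, and the password space assumes seven characters drawn uniformly from a-z and 0-9, which overstates how unpredictable a human-chosen password like "hunter2" really is.

```python
import math

GUESSES_PER_SECOND = 1e9  # hypothetical attacker speed, chosen only for illustration
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_exhaust(space_size: int) -> float:
    """Time to try every input in the space at the assumed guess rate."""
    return space_size / GUESSES_PER_SECOND / SECONDS_PER_YEAR


password_space = 36 ** 7     # 7 characters from a-z and 0-9
random_key_space = 2 ** 128  # 128 bits of random data, as with Tang or TPM keys

for name, space in [("password", password_space), ("random key", random_key_space)]:
    print(f"{name}: {math.log2(space):.1f} bits, "
          f"{years_to_exhaust(space):.3g} years to exhaust")
```

A slower PBKDF multiplies the cost of each guess, which matters a great deal for the small password space and makes no practical difference for the random key, since that space is already out of reach.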
17:24:21 <dustymabe> ok
17:24:54 <jlebon> anyway, overall not super concerned either for FCOS. just wanted to mention this since ignition does support it.
17:24:56 <dustymabe> I think my proposal is we implement nothing for now but try to seek out within Fedora/RHEL people who are looking at this problem (they have to exist, right?)
17:25:22 <travier> +1
17:25:34 <bgilbert> +1
17:25:39 <bgilbert> #action bgilbert to write up a comment in the bug
17:26:01 <jlebon> might be interesting to know what the default was when we started supporting LUKS root
17:26:32 <jlebon> but overall +1
17:27:12 <dustymabe> #proposed we think the use case of using a password to unlock encrypted disks (which is where this issue has the most effect) for Fedora CoreOS isn't common. For now we'll do nothing but will reach out to the Fedora community to see if anyone is thinking about or working on this problem already.
17:27:22 <travier> +1
17:27:23 <bgilbert> +1
17:27:46 <dustymabe> aside: anyone want to volunteer to do the reaching out? could be a fedora devel list email
17:28:23 <dustymabe> #agreed we think the use case of using a password to unlock encrypted disks (which is where this issue has the most effect) for Fedora CoreOS isn't common. For now we'll do nothing but will reach out to the Fedora community to see if anyone is thinking about or working on this problem already.
17:28:31 <dustymabe> I didn't see anyone voting against :)
17:29:11 <dustymabe> and.. I think we are out of time :(
17:29:18 <dustymabe> #topic open floor
17:29:38 <jdoss> I got some new gear for my datacenter and I am going to try installing FCOS on some CM4s soon
17:29:45 <jdoss> https://usercontent.irccloud-cdn.com/file/LQBQZpvI/PXL_20230419_165217208.jpg
17:29:45 <dustymabe> nice
17:30:02 <jdoss> I got 2x Turing Pi 2 boards.
17:30:20 <jdoss> dustymabe: I will be crying to you when I get stuck :)
17:30:49 <dustymabe> ha
17:31:06 <dustymabe> hopefully it all works flawlessly
17:31:28 * dustymabe closes out the meeting in a few minutes if no new discussion
17:31:29 <jdoss> One could hope. I think the RPi4 docs should work.
17:31:36 <fifofonix> just to mention i have pinned my gpu processing on f37 nodes for now.
17:31:47 <dustymabe> fifofonix: for the CIFS issue?
17:31:50 <fifofonix> can't get f38 working due to missing kernel-headers.
17:32:06 <fifofonix> known issue not having a good means for kernel headers.
17:32:14 <fifofonix> (we've discussed before)
17:32:48 <fifofonix> hoping that soon we'll get some kernel versions that have koji matching headers.
17:32:49 <dustymabe> ahh +1
17:34:00 <dustymabe> #endmeeting