fedora-coreos-meeting
LOGS
<@angelcr:matrix.org>
15:30:09
!startmeeting fedora_coreos_meeting
<@meetbot:fedora.im>
15:30:11
Meeting started at 2026-05-20 15:30:09 UTC
<@meetbot:fedora.im>
15:30:12
The Meeting name is 'fedora_coreos_meeting'
<@angelcr:matrix.org>
15:30:42
!topic roll call
<@marmijo:fedora.im>
15:30:57
!hi
<@zodbot:fedora.im>
15:31:08
marmijo: Michael Armijo (marmijo)
<@thilofm:matrix.org>
15:31:16
!hi
<@jbtrystram:matrix.org>
15:31:21
!hi
<@zodbot:fedora.im>
15:31:26
thilofm: Thilo Fromm (tfm)
<@zodbot:fedora.im>
15:31:33
jbtrystram: Jean-Baptiste Trystram (jbtrystram) - he / him / his
<@siosm:fedora.im>
15:32:19
!hi
<@zodbot:fedora.im>
15:32:26
Timothée Ravier: Timothée Ravier (siosm) - he / him / his
<@rapneset:matrix.org>
15:32:33
!hi
<@zodbot:fedora.im>
15:32:35
rapneset: Rolv Apneseth (rapneset)
<@peytonrobertson:matrix.org>
15:33:27
! hi
<@peytonrobertson:matrix.org>
15:33:46
!hi
<@zodbot:fedora.im>
15:33:49
No Fedora Accounts users have the @peytonrobertson:matrix.org Matrix Account defined
<@ydesouza:fedora.im>
15:34:08
!hi
<@zodbot:fedora.im>
15:34:20
Yasmin Valim de Souza: Yasmin Valim de Souza (ydesouza)
<@angelcr:matrix.org>
15:34:26
!topic Action items from last meeting
<@dustymabe:matrix.org>
15:34:56
!hi
<@zodbot:fedora.im>
15:35:00
dustymabe: Dusty Mabe (dustymabe) - he / him / his
<@angelcr:matrix.org>
15:35:49
I see that we have one action from the last meeting:
<@angelcr:matrix.org>
15:35:49
- `Update FCOS Meeting Checklist to the right time info`
<@ydesouza:fedora.im>
15:36:09
I updated it in the checklist :)
<@zodbot:fedora.im>
15:36:32
acervera has already given cookies to ydesouza during the F44 timeframe
<@angelcr:matrix.org>
15:37:06
!topic Review Fedora 45 Release Schedule
<@marmijo:fedora.im>
15:39:24
We're sitting in an idle period right now with the F45 schedule. We have just a little over a month before the first change checkpoint
<@angelcr:matrix.org>
15:39:25
The next checkpoint seems to be next month
<@spresti:fedora.im>
15:39:41
!hi
<@zodbot:fedora.im>
15:39:42
spresti: Steven Presti (spresti)
<@marmijo:fedora.im>
15:40:16
I know we've had a discussion about one change consideration we'll be submitting. But we should make sure to get anything else submitted before these deadlines
<@marmijo:fedora.im>
15:40:48
I'll also start pulling the list of change considerations for F45 weekly so we can discuss them in this meeting from now on (if there are not more pressing topics).
<@angelcr:matrix.org>
15:41:43
Great, if there is nothing else to add here - we can move on to the Fedora CoreOS tracker issues, of which we have two to discuss today
<@angelcr:matrix.org>
15:42:20
!topic Rework and evaluate kola-upgrade testing strategy
<@angelcr:matrix.org>
15:43:52
I think this one is from Yasmin Valim de Souza
<@dustymabe:matrix.org>
15:44:18
yep. she created it last week because we were having frustrations with our upgrade tests
<@dustymabe:matrix.org>
15:44:51
I ended up doing some RCA on what was going on and opened https://github.com/coreos/fedora-coreos-tracker/issues/2149
<@dustymabe:matrix.org>
15:45:34
I originally thought the problem was with the ostree repo being slow (we no longer use it, but when doing upgrade tests from old starting points it does get used)
<@dustymabe:matrix.org>
15:45:56
but it was actually a bug in the skopeo stack (we think) that was causing the pain for us.
<@jbtrystram:matrix.org>
15:46:04
Nice work
<@dustymabe:matrix.org>
15:46:22
that being said, it's still worth discussing if we want to change things up or not.
<@dustymabe:matrix.org>
15:46:41
right now we run the upgrade test from as far back as we can. i'm not sure it makes sense to keep doing so.
<@dustymabe:matrix.org>
15:47:30
we can also leverage cloud instances for upgrade tests (in cases where we haven't cleaned up the cloud images (i.e. AMIs)) to help parallelize them
<@jbtrystram:matrix.org>
15:48:57
While it's technically really cool to test this, I don't think it's bringing us enough value to go as far back
<@marmijo:fedora.im>
15:49:33
I'm assuming we don't have information about our users' starting points? We just know general information like how many nodes?
<@dustymabe:matrix.org>
15:49:59
yeah, we don't really have much info on that front :(
<@siosm:fedora.im>
15:50:27
Well, those tests are what makes sure that if you had installed a node 5 years ago it would still work today.
<@dustymabe:matrix.org>
15:50:33
The upgrade tests have certainly helped us catch some interesting bugs
<@dustymabe:matrix.org>
15:51:02
and even now CI coverage of "upgraded from a machine deployed two years ago" isn't very good
<@dustymabe:matrix.org>
15:51:26
the upgrade test covers "can it upgrade", but not really "does everything function after it upgraded"
<@marmijo:fedora.im>
15:52:59
Yes, there's definitely a lot of value in making sure these nodes can upgrade. I'm wondering what percentage of users would fall into this category. If it's low, we *could* evaluate dropping support for it.
<@marmijo:fedora.im>
15:53:29
or at least not supporting a starting point so far back.
<@marmijo:fedora.im>
15:53:43
I don't have a strong preference, I'm just looking at it from multiple sides :)
<@dustymabe:matrix.org>
15:53:45
Maybe we could implement a sliding window of upgrade tests
<@dustymabe:matrix.org>
15:54:07
i.e. 10 fedora releases - or 8 fedora releases (some number N)
<@dustymabe:matrix.org>
15:54:19
that way the upgrade tests don't just continue to increase over time
<@marmijo:fedora.im>
15:54:39
I like that idea!
<@dustymabe:matrix.org>
15:55:00
so 10 right now would take us back to F34
<@dustymabe:matrix.org>
15:55:13
currently we start at f32
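A minimal sketch of the sliding-window idea above, assuming the newest stable release is Fedora 44 (inferred from "10 right now would take us back to F34") and one starting point per Fedora major release. The function and parameter names are hypothetical and are not part of kola or the FCOS pipeline.

```python
def upgrade_start_points(newest_release: int, window: int,
                         oldest_available: int = 32) -> list[int]:
    """Return the Fedora major versions to use as upgrade-test starting points.

    newest_release:   current stable Fedora major version (assumed 44 here).
    window:           how many releases back to test (the "N" from the discussion).
    oldest_available: oldest release with FCOS images (f32 on x86_64).
    """
    # Never go further back than the oldest release that actually exists.
    first = max(newest_release - window, oldest_available)
    # Assumption: the newest release itself is the upgrade target,
    # so it is not listed as a starting point.
    return list(range(first, newest_release))


if __name__ == "__main__":
    print(upgrade_start_points(44, 10))
    print(upgrade_start_points(44, 8))
```

With a fixed window, the list of starting points stays the same length as new releases come out, instead of growing by one each cycle.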
<@jbtrystram:matrix.org>
15:56:12
given how easy it is to respin an FCOS node, 5 years sounds like a lot
<@jbtrystram:matrix.org>
15:56:31
but given how stable it is, 5 years can go fast :)
<@dustymabe:matrix.org>
15:56:43
jbtrystram: you'd be surprised at how much infra just stays in place if it 'just works'
<@marmijo:fedora.im>
15:56:44
I think we only start at f32 for x86_64. the other arches have later starting points IIRC
<@marmijo:fedora.im>
15:57:15
That's what I'm wondering. How many of our users fall into this.
<@dustymabe:matrix.org>
15:57:19
marmijo: right. because the other architectures were only added later
<@jbtrystram:matrix.org>
15:57:39
yeah yeah I know, I'm just seeing the two faces of the coin. Heck, Nemric respins every two weeks !
<@dustymabe:matrix.org>
15:57:40
my router's aleph right now is `"build": "35.20220131.3.0",`
<@jbtrystram:matrix.org>
15:58:15
dustymabe: your router ? I want to learn more
<@dustymabe:matrix.org>
15:58:29
so I either want to have a long window of testing here - or we make a statement that explicitly says "if your aleph is older than X, we won't consider your bug"
<@dustymabe:matrix.org>
15:58:47
jbtrystram: haha
<@jbtrystram:matrix.org>
15:59:12
dustymabe: i'd say we will consider the bug, but you have to report it
<@jbtrystram:matrix.org>
15:59:26
instead of us catching it early
<@dustymabe:matrix.org>
15:59:54
jbtrystram: yeah. I think we have to be ok with just saying reprovision your node
<@dustymabe:matrix.org>
16:00:14
which I'm OK with, but still want to state somewhere some guidance for how old a machine can be
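A rough sketch of how such guidance could be checked against a node, assuming the aleph JSON carries a `build` field in the form quoted above and that the version string follows the FCOS `major.date.stream.revision` layout. The helper names and the threshold are hypothetical; no such policy was agreed in the meeting.

```python
import json
from datetime import date


def aleph_build_info(aleph_json: str) -> tuple[int, date]:
    """Extract the Fedora major version and build date from an aleph document."""
    build = json.loads(aleph_json)["build"]  # e.g. "35.20220131.3.0"
    major, datestamp, _stream, _revision = build.split(".")
    build_date = date(int(datestamp[:4]), int(datestamp[4:6]), int(datestamp[6:8]))
    return int(major), build_date


def is_within_guidance(aleph_json: str, oldest_supported_major: int) -> bool:
    """True if the node was first installed from a release inside the window."""
    major, _ = aleph_build_info(aleph_json)
    return major >= oldest_supported_major


if __name__ == "__main__":
    sample = '{"build": "35.20220131.3.0"}'
    print(aleph_build_info(sample))
    print(is_within_guidance(sample, oldest_supported_major=34))
```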
<@angelcr:matrix.org>
16:01:33
if the initial issue has nothing to do with this test, what is the reason to not run the test / limit the versions? Is it because the tests can take very long in CI and cause issues as the time grows?
<@dustymabe:matrix.org>
16:02:04
Angel Cervera Roldan: yeah. it takes a long time to run them and each of them pulls a lot of bits across the network :)
<@dustymabe:matrix.org>
16:02:29
I've thought about periodically saving VMs that started at X,Y,Z and then starting them back up to let them update. but that is extra maintenance too
<@dustymabe:matrix.org>
16:02:44
not sure what the right balance is
<@siosm:fedora.im>
16:03:05
Can we maybe create an instance from the first boot image, update it to say the first barrier before we switch to container only and then take a snapshot and use that as a reference?
<@dustymabe:matrix.org>
16:03:44
Timothée Ravier: we can :)
<@siosm:fedora.im>
16:04:18
We don't support using old boot images, so what we really should test is old instances and keeping such a snapshot should do it
<@dustymabe:matrix.org>
16:04:56
correct. I've just always felt that was more maintenance work than just running the full upgrade
<@dustymabe:matrix.org>
16:05:58
I did some work last week (when I thought the ostree repo was the bottleneck) to remove the ostree repo from the actual upgrade (i.e. local tarball with ostree repo contents for barrier releases).
<@siosm:fedora.im>
16:06:21
We either make an AMI on AWS or create a one off QCOW2 image and upload it somewhere to use it
<@dustymabe:matrix.org>
16:06:25
now that I know what the actual bug was i'm not sure if we should use that or not, but I at least know how to do it
<@dustymabe:matrix.org>
16:07:08
"one off QCOW2" for each starting point
<@siosm:fedora.im>
16:07:45
hum, what do you mean each starting point?
<@dustymabe:matrix.org>
16:08:04
32 had a different bootloader than 33, than 34, etc
<@siosm:fedora.im>
16:09:08
we update those automatically now don't we?
<@siosm:fedora.im>
16:09:31
well on x86_64 & aarch64 at least
<@dustymabe:matrix.org>
16:09:41
now we do, yes.
<@dustymabe:matrix.org>
16:09:56
so maybe the value is minimized.. but I figure there's still some value
<@dustymabe:matrix.org>
16:10:09
i.e. xfs filesystem created from 5 years ago, etc.
<@dustymabe:matrix.org>
16:10:24
I think there's value in having tests from different starting points
<@dustymabe:matrix.org>
16:10:53
that's why the upgrade tests are heavyweight for us right now. we run a test for 32, 33, 34, 35,.... for each prod release
<@siosm:fedora.im>
16:10:57
Then let's make 10 QCOW2 / AMI and use those?
<@siosm:fedora.im>
16:11:13
trade storage for faster testing?
<@dustymabe:matrix.org>
16:11:20
Timothée Ravier: I'm happy with that if someone wants to take it on.
<@siosm:fedora.im>
16:11:27
(I know easy to say and someone has to do it)
<@dustymabe:matrix.org>
16:11:51
we'd have to figure out:
<@dustymabe:matrix.org>
16:11:51
1. how to do it every time a new release comes out
<@dustymabe:matrix.org>
16:11:51
2. how to have kola start with a VM that was already booted (ignition already ran)
<@angelcr:matrix.org>
16:12:55
We are running a bit tight on time and there is one more issue to discuss. Should we move on to the next topic and come back to this in the open floor if time allows?
<@dustymabe:matrix.org>
16:13:15
happy to move on to the next topic.. can we capture some of this discussion in the ticket?
<@angelcr:matrix.org>
16:13:37
Sure - I'll add the context to the github issue after the meeting.
<@angelcr:matrix.org>
16:14:26
!topic use chunkah to split container images and increase max layers
<@angelcr:matrix.org>
16:15:24
I think this one is from dustymabe
<@dustymabe:matrix.org>
16:16:28
yep. basically chunkah is a new tool for chunking up images smarter to try to encourage layer re-use.
<@dustymabe:matrix.org>
16:16:28
i'm interested to start using it to shake out any issues (maybe in the `next` stream first) and also increase our number of layers (to get even more layer re-use)
<@dustymabe:matrix.org>
16:16:36
anyone have any strong opinions on this?
<@dustymabe:matrix.org>
16:17:24
or even opinions that aren't strong :)
<@siosm:fedora.im>
16:17:53
+1
<@siosm:fedora.im>
16:18:22
I've been using chunkah for my toolbox container images and the sealed images without issues so far
<@dustymabe:matrix.org>
16:18:40
jbtrystram: if we revive the `next-devel` for chunkah it might be good to also roll out the bootc install to filesystem change there too (and maybe even konflux, cc jcapitao )
<@jbtrystram:matrix.org>
16:18:55
yeah !
<@jbtrystram:matrix.org>
16:19:05
I am all for testing out things
<@jbtrystram:matrix.org>
16:19:24
this isn't a user-impacting change either so we should roll it out IMHO
<@dustymabe:matrix.org>
16:19:32
anyone with opinions on the number of layers? I was going to make the number large (basically one layer per srpm), but interested to see if people want to challenge that
<@jbtrystram:matrix.org>
16:20:06
Can we reduce those later on ? I want to keep some space for when we support derivation
<@nemric:relativit.fr>
16:21:04
I'm not sure I understand everything, but I'd like to remove some unused GPU firmware from the image... would that help me in any way?
<@dustymabe:matrix.org>
16:21:28
jbtrystram: I haven't hit a limit yet because fcos doesn't have enough SRPMS to really bump up into it.
<@dustymabe:matrix.org>
16:21:55
jbtrystram: if you were to set max layers to like 600 and then try to chunk a silverblue image you'd probably hit whatever the limit is
<@jbtrystram:matrix.org>
16:22:22
Nemric: no :/ i think you want to do a full rebuild for that
<@dustymabe:matrix.org>
16:22:24
but yes, we'd want some number of layers still available for people to do derivations
<@jbtrystram:matrix.org>
16:22:47
I thought the limit was 256 because of the kernel overlayfs ?
<@dustymabe:matrix.org>
16:23:08
jbtrystram: IIUC it's more like 500 now
<@dustymabe:matrix.org>
16:23:22
I've done tests with a FCOS with 355 layers - so at least that works
<@siosm:fedora.im>
16:23:23
What number did you have in mind? We can start with 120 or so
<@siosm:fedora.im>
16:24:15
Desktop sealed images are currently 128 layers (+ config + UKI so 130): https://github.com/travier/fedora-atomic-desktops-sealed/blob/main/Containerfile#L49
<@dustymabe:matrix.org>
16:24:16
Timothée Ravier: :) - max-layers = 400 -> right now yields an FCOS with 355 layers (or maybe that was RHCOS and FCOS was less)
<@dustymabe:matrix.org>
16:24:43
but yeah. i want to test the limit here (we can start in rawhide)
<@siosm:fedora.im>
16:24:50
There are diminishing returns with more layers, not sure it's worth it.
<@dustymabe:matrix.org>
16:24:58
I think the number of srpms we have is small enough that it's OK for us
<@dustymabe:matrix.org>
16:25:42
Timothée Ravier: that's true, maybe we adjust it back over time, I just kind of want to start at the max and then pull it back from there
<@dustymabe:matrix.org>
16:26:21
if no one is opposed to that
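To illustrate the "one layer per srpm, up to a maximum" idea, here is a small sketch. It is not chunkah's actual algorithm or API: the function name, the input shape, and the overflow strategy (merging the smallest source RPMs into one final layer) are all assumptions made for illustration.

```python
from collections import defaultdict


def chunk_by_srpm(packages: dict[str, tuple[str, int]],
                  max_layers: int) -> list[list[str]]:
    """Group packages into layers, one per source RPM, capped at max_layers.

    packages maps a binary package name to (source_rpm, size_in_bytes).
    """
    if max_layers < 1:
        raise ValueError("max_layers must be at least 1")

    by_srpm: dict[str, list[str]] = defaultdict(list)
    sizes: dict[str, int] = defaultdict(int)
    for name, (srpm, size) in packages.items():
        by_srpm[srpm].append(name)
        sizes[srpm] += size

    # Largest source RPMs first, so they keep a dedicated layer.
    ordered = sorted(by_srpm, key=lambda s: (-sizes[s], s))
    if len(ordered) <= max_layers:
        # Under the cap: every source RPM gets its own layer.
        return [sorted(by_srpm[s]) for s in ordered]

    # Over the cap: keep max_layers - 1 dedicated layers, merge the rest into one.
    dedicated = ordered[: max_layers - 1]
    rest = ordered[max_layers - 1:]
    layers = [sorted(by_srpm[s]) for s in dedicated]
    layers.append(sorted(p for s in rest for p in by_srpm[s]))
    return layers
```

When the cap is higher than the number of source RPMs, as with max-layers = 400 yielding 355 layers, the cap never takes effect and the layer count simply equals the number of groups.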
<@siosm:fedora.im>
16:27:57
let's give it a try
<@dustymabe:matrix.org>
16:27:59
!proposed we will try to use chunkah for chunking up our container layers and experiment rolling out a relatively high limit for the number of layers. we'll start in rawhide and next to see if there are any issues.
<@siosm:fedora.im>
16:28:12
It's easier to try once we have images up
<@siosm:fedora.im>
16:29:05
Maybe we can do 5 different numbers for 5 consecutive daily builds and compare / do an update between those / between a base
<@dustymabe:matrix.org>
16:29:50
Timothée Ravier: are you interested in layer re-use for those 5?
<@siosm:fedora.im>
16:29:56
+1 to proposed
<@marmijo:fedora.im>
16:30:00
+1 to the proposed
<@angelcr:matrix.org>
16:30:14
+1 too
<@jbtrystram:matrix.org>
16:30:15
+1
<@siosm:fedora.im>
16:30:40
I would say mostly how long it takes rpm-ostree to go from current to those
<@angelcr:matrix.org>
16:30:47
I'll give it a few more seconds and agree to the proposal if no one votes against.
<@angelcr:matrix.org>
16:31:21
!agreed we will try to use chunkah for chunking up our container layers and experiment rolling out a relatively high limit for the number of layers. we'll start in rawhide and next to see if there are any issues
<@siosm:fedora.im>
16:31:22
as they will likely not differ much, but we can look at that as well
<@siosm:fedora.im>
16:31:38
or we rechunk existing images to try
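One way to compare builds chunked with different layer counts is to look at how many bytes an update would have to fetch. The sketch below assumes each image is described as a mapping of layer digest to compressed size (as could be read from an image manifest); the function names are hypothetical. It only estimates download size and does not measure how long rpm-ostree takes to apply the update.

```python
def bytes_to_download(old_layers: dict[str, int], new_layers: dict[str, int]) -> int:
    """Sum the sizes of layers in the new image that the old image lacks."""
    return sum(size for digest, size in new_layers.items()
               if digest not in old_layers)


def reuse_ratio(old_layers: dict[str, int], new_layers: dict[str, int]) -> float:
    """Fraction of the new image's bytes already present in the old image."""
    total = sum(new_layers.values())
    if total == 0:
        return 1.0  # nothing to fetch for an empty image
    return 1 - bytes_to_download(old_layers, new_layers) / total


if __name__ == "__main__":
    # Made-up digests and sizes, for illustration only.
    old = {"sha256:a": 100, "sha256:b": 50}
    new = {"sha256:a": 100, "sha256:c": 50, "sha256:d": 50}
    print(bytes_to_download(old, new), reuse_ratio(old, new))
```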
<@siosm:fedora.im>
16:32:11
(we are at time, sorry)
<@angelcr:matrix.org>
16:32:15
We are starting to go overtime, unfortunately we won't have time for the open floor this week.
<@angelcr:matrix.org>
16:32:43
Thank you very much everyone for attending.
<@angelcr:matrix.org>
16:32:45
!endmeeting