infrastructure
LOGS
<@nirik:matrix.scrye.com>
17:00:06
!startmeeting Infrastructure (2026-02-26)
<@meetbot:fedora.im>
17:00:07
Meeting started at 2026-02-26 17:00:06 UTC
<@meetbot:fedora.im>
17:00:07
The Meeting name is 'Infrastructure (2026-02-26)'
<@nirik:matrix.scrye.com>
17:00:12
!chair @nirik:matrix.scrye.com @zlopez:fedora.im @jnsamyak:matrix.org @james:fedora.im @gwmngilfen:fedora.im @patrikp:matrix.org
<@nirik:matrix.scrye.com>
17:00:12
!topic Hola y bienvenido
<@nirik:matrix.scrye.com>
17:00:12
!info Agenda is at: https://board.net/p/fedora-infra
<@nirik:matrix.scrye.com>
17:00:12
!meetingname infrastructure
<@nirik:matrix.scrye.com>
17:00:12
!info Fedora Infra documentation: https://docs.fedoraproject.org/en-US/infra
<@nirik:matrix.scrye.com>
17:00:12
!info About our team: https://docs.fedoraproject.org/en-US/cle/
<@meetbot:fedora.im>
17:00:12
The Meeting Name is now infrastructure
<@nirik:matrix.scrye.com>
17:00:24
!info Getting Started Guide: https://docs.fedoraproject.org/en-US/infra/gettingstarted/
<@nirik:matrix.scrye.com>
17:00:24
!topic New folks introductions
<@nirik:matrix.scrye.com>
17:00:24
!info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
<@nirik:matrix.scrye.com>
17:00:47
Will wait for folks to arrive... and if there's any new folks, please introduce yourself.
<@gwmngilfen:fedora.im>
17:00:53
!hi
<@zodbot:fedora.im>
17:00:54
Greg Sutcliffe (gwmngilfen) - he / him / his
<@patrikp:matrix.org>
17:01:34
!hi
<@zodbot:fedora.im>
17:01:38
None (patrikp)
<@nirik:matrix.scrye.com>
17:02:21
morning everyone.
<@zlopez:fedora.im>
17:02:33
!hi
<@zodbot:fedora.im>
17:02:34
Michal Konecny (zlopez)
<@nirik:matrix.scrye.com>
17:04:15
ok, I guess let's go ahead and dive in...
<@nirik:matrix.scrye.com>
17:04:17
!topic Next chair
<@nirik:matrix.scrye.com>
17:04:17
!info magic eight ball says:
<@nirik:matrix.scrye.com>
17:04:17
!info chair 2026-03-05 - patrikp
<@nirik:matrix.scrye.com>
17:04:17
!info chair 2026-03-12 - ?
<@nirik:matrix.scrye.com>
17:04:27
anyone like to take the chair on the 12th?
<@gwmngilfen:fedora.im>
17:05:10
i'd love to but as ever I'll have to go at :30 so probably a bad idea
<@zlopez:fedora.im>
17:05:20
I can
<@nirik:matrix.scrye.com>
17:05:27
cool. Sold.
<@nirik:matrix.scrye.com>
17:05:48
!topic announcements and information
<@nirik:matrix.scrye.com>
17:05:48
!info CLE Infra&Releng NA-hours team has a Monday through Thursday 30 minute meeting going through tickets at 1900 UTC in https://matrix.to/#/#meeting-3:fedoraproject.org
<@nirik:matrix.scrye.com>
17:05:54
Any other announcements this week?
<@nirik:matrix.scrye.com>
17:06:06
!info Fedora 44 beta rc1.1 is out, please test if you are able.
<@zlopez:fedora.im>
17:06:56
!info Ansible repository https://forge.fedoraproject.org/infra/ansible now has CI running
<@nirik:matrix.scrye.com>
17:07:48
Not really an announcement yet, but I have the new secure-boot signing working. Still need to test and write up docs, but great progress on it.
<@nirik:matrix.scrye.com>
17:08:31
it's the oldest open ticket we have I think. I know there's an email about it in my mailbox that's many years old. ;)
<@zlopez:fedora.im>
17:08:39
!info We have more documentation for zabbix https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/zabbix/
<@zlopez:fedora.im>
17:09:04
That is probably everything I can think of right now
<@nirik:matrix.scrye.com>
17:09:15
alright, moving on then...
<@nirik:matrix.scrye.com>
17:09:18
!topic Oncall
<@nirik:matrix.scrye.com>
17:09:18
!info https://docs.fedoraproject.org/en-US/infra/day_to_day_fedora/#_the_oncall_role_in_our_team
<@nirik:matrix.scrye.com>
17:09:18
!info on call from 2026-02-20 to 2026-02-26 - Vit Smolik
<@nirik:matrix.scrye.com>
17:09:18
!info on call from 2026-02-27 to 2026-03-05 - nirik
<@nirik:matrix.scrye.com>
17:09:18
!info on call from 2026-03-06 to 2026-03-12 - ?
<@nirik:matrix.scrye.com>
17:09:28
vit wasn't able to be here, but provided:
<@nirik:matrix.scrye.com>
17:09:43
one ping for a playbook run
<@nirik:matrix.scrye.com>
17:09:43
broken httpd on people01
<@nirik:matrix.scrye.com>
17:09:43
mailq complaints from zabbix
<@nirik:matrix.scrye.com>
17:10:17
!oncall
<@zodbot:fedora.im>
17:10:17
The following people are oncall:
<@zodbot:fedora.im>
17:10:17
● @nirik:matrix.scrye.com (kevin) Current Time for them: 09:10 (US/Pacific)
<@zodbot:fedora.im>
17:10:17
If they do not respond, please file a ticket (https://pagure.io/fedora-infrastructure/issues)
<@nirik:matrix.scrye.com>
17:10:26
Does someone want to take it next week?
<@zlopez:fedora.im>
17:10:48
I will be chairing, so I can take it as well
<@nirik:matrix.scrye.com>
17:10:53
ok.
<@nirik:matrix.scrye.com>
17:11:07
!topic Monitoring discussion [nirik]
<@nirik:matrix.scrye.com>
17:11:07
!info Go over existing items and fix them
<@nirik:matrix.scrye.com>
17:11:07
!info https://nagios.fedoraproject.org/nagios & https://zabbix.fedoraproject.org (top 100 triggers: https://zabbix.fedoraproject.org/zabbix.php?action=toptriggers.list)
<@gwmngilfen:fedora.im>
17:11:26
so, a few interesting things here
<@gwmngilfen:fedora.im>
17:11:43
zabbix correctly caught a bunch of issues with httpd on people01, which was nice
<@gwmngilfen:fedora.im>
17:12:22
ssl keeps failing on checking whatcanidoforfedora.org - I just opened a PR which should help, but I also plan to try and find out why it's failing for that domain
<@nirik:matrix.scrye.com>
17:12:25
I looked at the people01 issue. Might be solved. I think it was scrapers of cgit downloading archives of every commit... so I disabled those links in cgit and also increased some capacity in apache config there and it's been ok since then.
<@gwmngilfen:fedora.im>
17:13:20
i had a quick look at postfix on smtp-mm-ib01, seemed like it's just an accumulation of connection-refused junk
<@zlopez:fedora.im>
17:13:28
I would like to ask about the toddler-unretire-packages queue, that seems to be stale for some time, but it's acked on nagios
<@nirik:matrix.scrye.com>
17:13:30
I'll note that the asknot openshift app logs nothing at all. No way to tell if it's hitting maxclients or something...
<@gwmngilfen:fedora.im>
17:13:58
this is openssl s_client which I think terminates at the proxy, right?
<@nirik:matrix.scrye.com>
17:14:28
Anton Medvedev was working on getting that working, but I think in the meantime until it's being tested in prod we should delete that queue and its monitoring.
<@gwmngilfen:fedora.im>
17:14:48
monitoring will auto-remove if you delete the queue
<@nirik:matrix.scrye.com>
17:14:51
yeah, I am not sure where it's failing. ;) Might be the proxy. I did get it to hang once here...
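A minimal sketch of the kind of certificate check being discussed, using only the Python standard library. It is not the Zabbix check itself: the helper names `fetch_not_after` and `days_left` are made up for illustration, and the explicit timeout is an assumption prompted by the hang mentioned above.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter string of the certificate presented for `host`.

    The handshake ends wherever TLS terminates for that name (possibly a
    proxy), so this reports what a client actually sees.
    """
    context = ssl.create_default_context()
    # The timeout applies to the connect and to the TLS handshake, so a
    # stuck endpoint raises an exception instead of hanging forever.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (epoch seconds, default: current time) and expiry."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400.0
```

Something like `days_left(fetch_not_after("whatcanidoforfedora.org"))` could then be run from different hosts to see whether the failure follows the proxy or the client.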
<@nirik:matrix.scrye.com>
17:15:05
in zabbix yes, not in nagios sadly. ;(
<@zlopez:fedora.im>
17:15:12
So should be staging only then
<@gwmngilfen:fedora.im>
17:15:31
well i'm thinking it's time for removing the queue checks in nagios anyway, they are shown to be working in zabbix now
<@nirik:matrix.scrye.com>
17:15:53
yeah, I think so...
<@zlopez:fedora.im>
17:16:01
They are much better in zabbix
<@nirik:matrix.scrye.com>
17:16:22
although... this is not alerting in zabbix?
<@gwmngilfen:fedora.im>
17:16:34
it is. just not in #noc because it's Warning level
<@nirik:matrix.scrye.com>
17:16:44
oh, it is. right
<@gwmngilfen:fedora.im>
17:16:53
https://zabbix.fedoraproject.org/tr_events.php?triggerid=57645&eventid=1415295
<@nirik:matrix.scrye.com>
17:16:58
yep. Found it.
<@gwmngilfen:fedora.im>
17:17:13
it would alert in noc if it kept increasing
<@nirik:matrix.scrye.com>
17:18:10
huh, I am not sure what created that queue... it's not in ansible...
<@nirik:matrix.scrye.com>
17:18:19
perhaps it was at some point and was removed?
<@gwmngilfen:fedora.im>
17:18:43
oh, hmm, no. there's only a warn level trigger right now - and now i recall I made a ticket to fix that, and I think it's in this sprint too. will sort tomorrow 😉
<@nirik:matrix.scrye.com>
17:19:20
anyhow, removing it is just rabbitmqctl on rabbitmq01 to delete the queue, then rm the .cfg file on noc01 under /etc/nagios and restart nagios.
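The steps above can be sketched as a dry-run plan that only builds the commands, so they can be reviewed before anyone runs them. This is an illustration under assumptions: `removal_plan` is a hypothetical helper, the exact .cfg file name and the vhost are not given in the discussion and must be supplied by the caller, and `systemctl restart nagios` is a guess at how the restart is done on noc01.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, List[str]]  # (host to run on, argv)


def removal_plan(queue: str, cfg_path: str, vhost: Optional[str] = None) -> List[Step]:
    """Build (but do not execute) the commands for removing a stale queue."""
    delete = ["rabbitmqctl", "delete_queue"]
    if vhost is not None:
        # Without --vhost, rabbitmqctl acts on the default vhost "/".
        delete += ["--vhost", vhost]
    delete.append(queue)
    return [
        ("rabbitmq01", delete),
        # cfg_path is the monitoring file under /etc/nagios (name assumed).
        ("noc01", ["rm", "--", cfg_path]),
        # Assumption: nagios runs as a systemd unit called "nagios".
        ("noc01", ["systemctl", "restart", "nagios"]),
    ]


if __name__ == "__main__":
    # Placeholder arguments, not real names from the infrastructure.
    for host, argv in removal_plan("EXAMPLE-QUEUE", "/etc/nagios/EXAMPLE.cfg"):
        print(f"{host}: {' '.join(argv)}")
```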
<@gwmngilfen:fedora.im>
17:20:16
i can PR the nagios bit in ansible, np - but is it still technically broken in ansible because of the rdu3-iso hosts?
<@nirik:matrix.scrye.com>
17:21:08
there's nothing to change in ansible that I can see... it adds that .cfg monitoring file when it adds a queue, but I don't see that queue actually being added right now.
<@gwmngilfen:fedora.im>
17:21:18
oh, hmm
<@gwmngilfen:fedora.im>
17:21:20
ok
<@nirik:matrix.scrye.com>
17:21:29
I fixed nagios config before freeze, so ansible should correctly deploy it.
<@gwmngilfen:fedora.im>
17:21:43
i'm too used to all of nagios being in the monolithic role, clearly 🙂
<@nirik:matrix.scrye.com>
17:21:44
anyhow, I can do that or someone else can if they want.
<@gwmngilfen:fedora.im>
17:22:13
i can deal with it, sure
<@nirik:matrix.scrye.com>
17:22:19
any other monitoring stuff?
<@gwmngilfen:fedora.im>
17:22:23
moving on, I don't see a great deal of other interesting things in the top100 report
<@gwmngilfen:fedora.im>
17:22:43
lots of load-avg and disk as usual, the rest seems stuff we've discussed
<@nirik:matrix.scrye.com>
17:22:59
yeah, we should sort out the disk stuff, probably after freeze.
<@gwmngilfen:fedora.im>
17:23:08
wiki01/02 seems to be high on load list again this week
<@nirik:matrix.scrye.com>
17:23:21
yeah, scrapers love them. ;(
<@gwmngilfen:fedora.im>
17:23:25
indeed
<@gwmngilfen:fedora.im>
17:23:30
anyway, seems quiet. move on?
<@nirik:matrix.scrye.com>
17:24:11
So, shall we do backlog refinement, some learning thing TBD or end early?
<@zlopez:fedora.im>
17:24:45
+1 for backlog
<@nirik:matrix.scrye.com>
17:24:52
ok, let's do at least a few. ;)
<@nirik:matrix.scrye.com>
17:24:59
!topic Fedora Infra backlog refinement
<@nirik:matrix.scrye.com>
17:24:59
!info Refine oldest tickets on Fedora Infra tracker
<@nirik:matrix.scrye.com>
17:25:10
https://forge.fedoraproject.org/infra/tickets/issues/12727
<@nirik:matrix.scrye.com>
17:25:30
Vit isn't here, so not sure the status...
<@nirik:matrix.scrye.com>
17:25:35
Does anyone know?
<@zlopez:fedora.im>
17:25:59
No idea
<@nirik:matrix.scrye.com>
17:26:14
ok, I'll ask for status in ticket.
<@nirik:matrix.scrye.com>
17:26:16
Moving on...
<@nirik:matrix.scrye.com>
17:26:18
https://forge.fedoraproject.org/infra/tickets/issues/12750
<@gwmngilfen:fedora.im>
17:26:30
thats mine, well at least i logged it
<@gwmngilfen:fedora.im>
17:26:44
i can't fix it though, so unless we find someone to work on it ...
<@nirik:matrix.scrye.com>
17:27:25
The only one who might have bot experience I can think of here is ryan...
<@gwmngilfen:fedora.im>
17:27:28
tbh we probably should rewrite the whole bot in something not 4+ years old, but thats a wider discussion 😉
<@gwmngilfen:fedora.im>
17:27:47
i stood it up to see if we all liked the workflow, but we all know what happens to temporary things 🙂
<@nirik:matrix.scrye.com>
17:28:18
Is it worth trying our friend AI to see if it can come up with a fix?
<@gwmngilfen:fedora.im>
17:28:24
i did
<@zlopez:fedora.im>
17:28:30
The standupbot is really useful
<@gwmngilfen:fedora.im>
17:28:45
oh, 100%, but the code is .... not nice
<@gwmngilfen:fedora.im>
17:28:57
we could probably write a maubot plugin faster than fixing it
<@gwmngilfen:fedora.im>
17:29:27
but it's not urgent, what we have works - it just can't use a fedora.im account, that's all
<@zlopez:fedora.im>
17:29:38
We can add it to next sprint and get somebody working on it
<@nirik:matrix.scrye.com>
17:30:03
note that auth has changed too when we switched to MAS...
<@nirik:matrix.scrye.com>
17:30:46
you no longer get an access token from a client
<@nirik:matrix.scrye.com>
17:31:00
(see my flailings in the ticket on moderation bot)
<@gwmngilfen:fedora.im>
17:31:14
hmm, maybe I should retry it. the current issue is that it *cannot* use an access token, it has to use user/pw
<@nirik:matrix.scrye.com>
17:31:34
ah yeah, that would still need to change
<@gwmngilfen:fedora.im>
17:31:48
thought so. tbh, a maubot plugin is almost certainly the way to go
<@gwmngilfen:fedora.im>
17:31:56
since we already have that working
<@nirik:matrix.scrye.com>
17:32:22
yeah
<@gwmngilfen:fedora.im>
17:33:14
maybe mark it for next sprint, i think i can still write maubot code
<@nirik:matrix.scrye.com>
17:33:28
so, what should we do here... rescope this to be a re-write? but until we commit to doing it the ticket is kinda pointless.
<@zlopez:fedora.im>
17:33:45
I could try that as well 🙂
<@nirik:matrix.scrye.com>
17:34:19
ok. when we triage these we should also give them priority and points...
<@gwmngilfen:fedora.im>
17:34:38
yeah i logged it so as not to forget that we were still using an ansible.im account for the bot, and to put my findings somewhere, but some idea of where we want to go is a good idea
<@gwmngilfen:fedora.im>
17:34:44
ok, let me take a stab
<@nirik:matrix.scrye.com>
17:35:24
you want to mod the ticket?
<@gwmngilfen:fedora.im>
17:35:42
i've commented for now, I will probably write a new ticket. on my todo for tomorrow
<@nirik:matrix.scrye.com>
17:36:18
ok, shall I just leave that one alone for now? or how about we close it and you can open a new one?
<@gwmngilfen:fedora.im>
17:37:09
yeah, i'll close it tomorrow
<@nirik:matrix.scrye.com>
17:37:13
ok.
<@gwmngilfen:fedora.im>
17:37:15
and link the two tickets
<@nirik:matrix.scrye.com>
17:37:20
sounds good.
<@nirik:matrix.scrye.com>
17:37:47
ok, on to next ticket?
<@nirik:matrix.scrye.com>
17:38:03
and Gwmngilfen is probably leaving?
<@nirik:matrix.scrye.com>
17:38:09
https://forge.fedoraproject.org/infra/tickets/issues/12759
<@gwmngilfen:fedora.im>
17:38:12
i got time for one more 🙂
<@nirik:matrix.scrye.com>
17:38:28
so, I am not sure how to do this one in a clean way.
<@nirik:matrix.scrye.com>
17:38:45
I'd be inclined to just close it, but we could also update it and ask for any further ideas.
<@zlopez:fedora.im>
17:39:11
We can say, there is zabbix available now and just close it
<@zlopez:fedora.im>
17:39:22
Zabbix provides more details for those who want them
<@gwmngilfen:fedora.im>
17:39:32
and can send alerts to multiple rooms, if desired
<@nirik:matrix.scrye.com>
17:39:37
sure, but thats not something consumable for end users/contributors
<@gwmngilfen:fedora.im>
17:39:45
true
<@nirik:matrix.scrye.com>
17:39:52
They wanted a way for contributors to easily see status of those services.
<@zlopez:fedora.im>
17:40:00
Do you think the current status is missing something?
<@gwmngilfen:fedora.im>
17:40:14
zabbix api -> auto generated page? 😛
<@nirik:matrix.scrye.com>
17:40:16
I don't think so... others might
<@nirik:matrix.scrye.com>
17:40:24
-1
<@gwmngilfen:fedora.im>
17:40:35
i'm not serious, to be clear
<@zlopez:fedora.im>
17:40:40
We don't want users to be overwhelmed when visiting the page
<@nirik:matrix.scrye.com>
17:40:55
people keep asking for that tho... not realizing how bad an idea it is in the end.
<@nirik:matrix.scrye.com>
17:41:54
anyhow, how about I update the ticket and ask for any further ideas, and if nothing comes in, close it early next week?
<@gwmngilfen:fedora.im>
17:42:18
i do like what you said about a secondary page (or pages) with more detail
<@gwmngilfen:fedora.im>
17:42:51
i wonder if that *could* be generated. perhaps something like uptime-kuma could present that
<@gwmngilfen:fedora.im>
17:43:06
but it would not be the frontpage of status.fpo
<@zlopez:fedora.im>
17:43:22
+1 from me
<@nirik:matrix.scrye.com>
17:43:26
yeah, it would be something else...
<@gwmngilfen:fedora.im>
17:43:53
i'd be ok with closing too, tbh. I think we're all clear that status.fpo is fine, at the least
<@nirik:matrix.scrye.com>
17:44:00
I suppose for contributors (who are hopefully more interested in detail), we could have a zabbix dashboard of those things they might care about
<@gwmngilfen:fedora.im>
17:44:12
oh, thats a good plan
<@gwmngilfen:fedora.im>
17:44:29
i have "making more dashboards" on my todo anyway
<@nirik:matrix.scrye.com>
17:44:49
but still I worry about 'koji01 is down' and if you don't know that there is a koji02 and they are load balanced, you would think the service is down, etc.
<@nirik:matrix.scrye.com>
17:45:06
ie, it's hard to know the details of what things mean sometimes
<@gwmngilfen:fedora.im>
17:45:10
well, that plays into some of the service-level stuff i've been researching
<@gwmngilfen:fedora.im>
17:45:54
i.e. not getting 30 alerts from the proxies when one haproxy backend goes down
<@zlopez:fedora.im>
17:46:27
All of that sounds like quality of life improvements
<@gwmngilfen:fedora.im>
17:46:29
the service says something like "alert when more than N of these is alerting" and you could build a dashboard on that
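A small sketch of that "alert when more than N of these is alerting" rule, written in plain Python rather than as Zabbix configuration. The `ServiceRule` type and `service_alerting` function are hypothetical names, and the koji01/koji02 pair is borrowed from the example earlier in the discussion.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ServiceRule:
    name: str
    backends: tuple[str, ...]
    max_alerting: int  # the service alerts when MORE than this many backends alert


def service_alerting(rule: ServiceRule, alerting_hosts: Iterable[str]) -> bool:
    """True when the number of alerting backends exceeds the rule's threshold."""
    alerting = set(alerting_hosts)
    count = sum(1 for backend in rule.backends if backend in alerting)
    return count > rule.max_alerting


# Two load-balanced backends: one being down should not flag the service.
koji = ServiceRule("koji", ("koji01", "koji02"), max_alerting=1)
```

With this rule, `service_alerting(koji, {"koji01"})` is false while `service_alerting(koji, {"koji01", "koji02"})` is true, which is the distinction a contributor-facing dashboard would need to show.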
<@nirik:matrix.scrye.com>
17:46:30
sure, but I think this all depends on what the audience is for the data... ie, how it's presented.
<@gwmngilfen:fedora.im>
17:46:36
oh, yes
<@nirik:matrix.scrye.com>
17:47:13
I mean I care if one proxy shows something down, but depending on what it is it may not affect users of that thing at all.
<@gwmngilfen:fedora.im>
17:47:56
i think you're right to see if we can clarify/close the ticket, and investigate dashboards in the background
<@gwmngilfen:fedora.im>
17:48:03
because we need them anyway
<@nirik:matrix.scrye.com>
17:48:14
anyhow, how about I close for now and also perhaps redirect to discussion if we want to try and discuss further?
<@gwmngilfen:fedora.im>
17:48:19
if it looks like something we can do in the future... we can always revisit
<@gwmngilfen:fedora.im>
17:48:25
+1
<@gwmngilfen:fedora.im>
17:48:30
and i have to run 🙂
<@nirik:matrix.scrye.com>
17:49:36
thanks Gwmngilfen
<@nirik:matrix.scrye.com>
17:49:47
so, I guess let's go to open floor...
<@nirik:matrix.scrye.com>
17:49:52
!topic Open floor
<@zlopez:fedora.im>
17:49:53
We are almost at the end anyway
<@patrikp:matrix.org>
17:50:44
Spring is almost here! ☀️
<@nirik:matrix.scrye.com>
17:50:49
yep
<@zlopez:fedora.im>
17:51:04
So I had some luck with the registry.fp.o to quay.io flatpak migration, but it seems that I will need to discuss the changes with somebody from flatpak team
<@zlopez:fedora.im>
17:51:23
I already got a contact, so will try to reach out to them
<@nirik:matrix.scrye.com>
17:51:27
cool.
<@nirik:matrix.scrye.com>
17:51:40
Thanks for working on that
<@zlopez:fedora.im>
17:51:46
One flatpak is on quay.io https://quay.io/repository/fedora-flatpak-testing/0ad
<@zlopez:fedora.im>
17:52:13
But I found out that quay.io doesn't support nested organizations, so it needs to be outside the fedora organization 😕
<@zlopez:fedora.im>
17:52:38
That is what I want to discuss
<@nirik:matrix.scrye.com>
17:52:57
hum... can they not just be repos in an existing org?
<@nirik:matrix.scrye.com>
17:53:33
although I suppose we could split all the flatpaks out into a separate org
<@nirik:matrix.scrye.com>
17:55:11
so you can't have an org/repo/repo right? or can you?
<@zlopez:fedora.im>
17:55:30
They can, but that is another problem, a bot account can't have organisation-wide permissions
<@zlopez:fedora.im>
17:55:52
No, that was throwing 401 error for me 😕
<@nirik:matrix.scrye.com>
17:55:53
we typically only allow the bot upload perms...
<@nirik:matrix.scrye.com>
17:56:15
but then we need to create the repos
<@zlopez:fedora.im>
17:56:19
The quay.io bot has permission per repository
<@nirik:matrix.scrye.com>
17:56:40
right, so we would need a process to make the repo and grant the perms to the bot
<@zlopez:fedora.im>
17:56:40
Yes, this is why I need to discuss it with flatpak folks
<@nirik:matrix.scrye.com>
17:57:00
yeah, makes sense.
<@zlopez:fedora.im>
17:57:37
It's possible, but will need some automation on top of it, or some documented process for adding a new flatpak repository
<@nirik:matrix.scrye.com>
17:57:40
perhaps it could be in the new package toddler? if it's a flatpak, create repo, etc.
<@nirik:matrix.scrye.com>
17:57:55
flatpaks are added via the scm-request thing...
<@zlopez:fedora.im>
17:58:17
I wrote that toddler, so I know, it's possible to just listen for messages
<@nirik:matrix.scrye.com>
17:58:19
(of course we still would need a one off to create all the existing ones)
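A rough sketch of the idea floated here: a handler that reacts to a new-repository message and, for flatpaks only, creates the quay.io repository and grants the bot per-repository write access. Everything in it is an assumption for illustration: the message shape (`namespace`, `name`), the `flatpaks` namespace value, and the injected `create_repo` / `grant_write` callables, which stand in for whatever quay.io calls would actually be used. It is not the existing toddler code.

```python
from typing import Callable, Iterable, Mapping


def handle_new_repo(
    message: Mapping[str, str],
    create_repo: Callable[[str], None],
    grant_write: Callable[[str, str], None],
    bot_account: str,
) -> bool:
    """Create the repo and grant the bot access; return True if handled."""
    # Hypothetical message shape: only flatpak requests are acted on.
    if message.get("namespace") != "flatpaks":
        return False
    repo = message["name"]
    create_repo(repo)
    # Permissions are per repository, so the grant happens for each new repo.
    grant_write(repo, bot_account)
    return True


def backfill(
    names: Iterable[str],
    create_repo: Callable[[str], None],
    grant_write: Callable[[str, str], None],
    bot_account: str,
) -> int:
    """One-off pass over existing flatpaks; returns how many were processed."""
    handled = 0
    for name in names:
        message = {"namespace": "flatpaks", "name": name}
        if handle_new_repo(message, create_repo, grant_write, bot_account):
            handled += 1
    return handled
```

Injecting the two callables keeps the decision logic testable without touching quay.io, and lets the one-off backfill reuse the same path as the message handler.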
<@zlopez:fedora.im>
17:58:49
As I said, it would need some discussion, but the changes on bodhi should work
<@nirik:matrix.scrye.com>
17:59:04
excellent.
<@nirik:matrix.scrye.com>
17:59:13
anything else for the last minute? ;)
<@zlopez:fedora.im>
17:59:33
Have fun with the rest of your day 🙂
<@nirik:matrix.scrye.com>
17:59:48
always.
<@nirik:matrix.scrye.com>
17:59:57
Thanks for coming!
<@nirik:matrix.scrye.com>
17:59:59
!endmeeting