centos-hyperscale-sig
LOGS
<@davide:cavalca.name>
16:00:29
!startmeeting CentOS Hyperscale SIG
<@meetbot:fedora.im>
16:00:32
Meeting started at 2024-04-10 16:00:29 UTC
<@meetbot:fedora.im>
16:00:32
The Meeting name is 'CentOS Hyperscale SIG'
<@davide:cavalca.name>
16:00:46
!topic Roll call
<@davide:cavalca.name>
16:00:59
!hi
<@zodbot:fedora.im>
16:01:01
Davide Cavalca (dcavalca) - he / him / his
<@conan_kudo:matrix.org>
16:01:49
!hi
<@zodbot:fedora.im>
16:01:51
Neal Gompa (ngompa) - he / him / his
<@aekoroglu:matrix.org>
16:01:56
!hi
<@zodbot:fedora.im>
16:01:58
Ali Erdinc Koroglu (aekoroglu)
<@rcolebaugh:matrix.org>
16:02:01
!hi
<@zodbot:fedora.im>
16:02:03
Raymond Colebaugh (rcolebaugh) - he / him / his
<@salimma:fedora.im>
16:02:49
!hi
<@zodbot:fedora.im>
16:02:50
Michel Lind (salimma) - he / him / his
<@jonathanspw:fedora.im>
16:03:38
!hi
<@zodbot:fedora.im>
16:03:40
Jonathan Wright (jonathanspw)
<@davide:cavalca.name>
16:03:43
welcome everyone, let's get started
<@jonathanspw:fedora.im>
16:03:48
I'm on mobile but sort of here
<@davide:cavalca.name>
16:03:51
!topic Followups
<@davide:cavalca.name>
16:04:05
any followups to share from the last meeting?
<@davide:cavalca.name>
16:05:17
ahah same here
<@conan_kudo:matrix.org>
16:06:46
stuff and things :)
<@conan_kudo:matrix.org>
16:07:02
alas we don't have the meeting logs and minutes on the sig site still :(
<@anitazha:matrix.org>
16:07:25
!hi
<@zodbot:fedora.im>
16:07:27
Anita Zhang (anitazha) - she / her / hers
<@salimma:fedora.im>
16:08:24
yeah, let me try and fish out a link
<@salimma:fedora.im>
16:08:35
we should probably send the link with every followup topic
<@davide:cavalca.name>
16:08:35
https://meetbot.fedoraproject.org/meeting_matrix_fedoraproject-org/2024-03-27/centos-hyperscale-sig.2024-03-27-16.02.log.html
<@zodbot:fedora.im>
16:08:53
salimma has already given cookies to dcavalca during the F39 timeframe
<@davide:cavalca.name>
16:08:57
at least the fedora meetbot has search
<@salimma:fedora.im>
16:09:42
no action item... hmm
<@salimma:fedora.im>
16:10:08
the info is ... virt stack refresh, talks, and RPM CoW
<@salimma:fedora.im>
16:10:20
and there's a question about the 6.8 kernel
<@salimma:fedora.im>
16:10:37
sorry, there's info about that, and Jun Wang asked about Mellanox in the channel yesterday
<@conan_kudo:matrix.org>
16:10:47
I was waiting for the Fedora rebase to complete before shipping it here
<@salimma:fedora.im>
16:10:53
and someone from Intel is supposed to show up today? Adenilson Cavalcanti
<@conan_kudo:matrix.org>
16:10:55
that's done as of end of last week
<@conan_kudo:matrix.org>
16:11:05
so now I'll update Hyperscale probably today
<@conan_kudo:matrix.org>
16:11:12
I need to make new images anyway for TXLF
<@conan_kudo:matrix.org>
16:12:00
Carl George asked me about demoing CentOS Hyperscale at TXLF
<@aekoroglu:matrix.org>
16:14:45
he's here :)
<@aekoroglu:matrix.org>
16:15:18
Adenilson Cavalcanti: ?
<@conan_kudo:matrix.org>
16:16:05
he hasn't joined the room yet
<@davide:cavalca.name>
16:16:40
let's move on in the meantime
<@davide:cavalca.name>
16:16:44
!topic Announcements
<@davide:cavalca.name>
16:16:58
a bunch of us will be at TXLF later this week
<@davide:cavalca.name>
16:17:13
as Conan Kudo mentioned we might have a demo there as well
<@davide:cavalca.name>
16:17:52
I was checking if SCALE videos were up already but nope not yet
<@davide:cavalca.name>
16:19:16
the only other thing I had on my end is that we're looking at potentially backporting conda in Hyperscale
<@davide:cavalca.name>
16:20:01
that would only be for el9, and I'm hopeful we can actually get the bulk of it into EPEL proper
<@salimma:fedora.im>
16:20:38
I hope we don't have many cases where we need to have multiple versions of a Python module
<@salimma:fedora.im>
16:20:40
those are a pain
<@salimma:fedora.im>
16:21:17
oh, something similar but smaller: someone's asking for an ed update in c9s since apparently there are regressions in the version shipped affecting search/replace (yes, someone uses ed...)
<@salimma:fedora.im>
16:21:32
so I'll probably do a quick backport to HS and file a JIRA to get it upgraded
<@davide:cavalca.name>
16:21:35
yeah, for the record the only reason we're even considering conda is because recent versions reimplemented the solver and it's massively faster
<@conan_kudo:matrix.org>
16:21:41
...
<@conan_kudo:matrix.org>
16:22:08
so is conda now using the mamba solver or something else?
<@davide:cavalca.name>
16:23:13
yep it's using mamba which uses libsolv
<@conan_kudo:matrix.org>
16:24:08
awesome
<@conan_kudo:matrix.org>
16:24:16
do we have that in fedora already?
<@davide:cavalca.name>
16:25:16
yep it's already in Fedora
<@davide:cavalca.name>
16:27:01
anything else for announcements?
<@salimma:fedora.im>
16:27:39
it was fun too that conda recently switched to using YY.MM as their version number
<@salimma:fedora.im>
16:27:54
making it initially seem like EL9 is woefully out of date, but it's actually not
<@davide:cavalca.name>
16:30:06
next up
<@davide:cavalca.name>
16:30:12
!topic Tickets
<@davide:cavalca.name>
16:30:53
I don't think we have anything notable here this week?
<@davide:cavalca.name>
16:31:34
!topic Membership
<@davide:cavalca.name>
16:31:48
we have one membership request in https://pagure.io/centos-sig-hyperscale/sig/issue/163
<@davide:cavalca.name>
16:32:07
Adenilson Cavalcanti: would you like to introduce yourself?
<@conan_kudo:matrix.org>
16:32:35
still hasn't joined the room
<@davide:cavalca.name>
16:33:04
ah, would be nice if element flagged that when tagging them :)
<@junwang123:matrix.org>
16:35:20
Is there an office hour time to talk more about this one?
<@davide:cavalca.name>
16:36:22
if you mean talk over zoom, we have a hangout scheduled for next week
<@davide:cavalca.name>
16:36:35
we can also talk about it here if nobody else has stuff
<@davide:cavalca.name>
16:36:41
!topic Misc
<@conan_kudo:matrix.org>
16:37:24
I don't have anything :)
<@davide:cavalca.name>
16:38:08
Jun Wang: you have the floor :)
<@junwang123:matrix.org>
16:39:44
Hi everyone, thanks for all the help. In the past, we went with an LTS kernel and recompiled the nvidia kernel modules as out-of-tree modules for each kernel version we used.
<@junwang123:matrix.org>
16:40:42
I'm looking for input on how it would work with the Hyperscale/Fedora kernel. Looks like minor version updates are quite frequent.
<@conan_kudo:matrix.org>
16:41:43
what's keeping the module out of tree?
<@salimma:fedora.im>
16:41:53
nvidia GPU modules, or Mellanox?
<@junwang123:matrix.org>
16:42:05
both
<@salimma:fedora.im>
16:42:20
for mellanox Conan Kudo's question holds I think
<@conan_kudo:matrix.org>
16:42:36
yeah, I know what's up with the GPU drivers
<@junwang123:matrix.org>
16:42:54
for nvidia-kmod, we were using https://github.com/elrepo/packages/tree/master/nvidia-kmod/el7. thinking about using https://github.com/elrepo/packages/tree/master/nvidia-kmod/el9 now.
<@salimma:fedora.im>
16:43:11
why is it out of tree (if it's because you get development versions that are not upstreamed yet, FWIW Meta might have similar drivers and we are ... not as far as 6.8 yet internally)
<@davide:cavalca.name>
16:44:05
this is the proprietary nvidia driver, it can't go in-tree
<@salimma:fedora.im>
16:44:07
for the GPU drivers I think your best bet is participating in Fedora's kernel test days
<@salimma:fedora.im>
16:44:50
have Fedora installed on one of the machines with a GPU you need to work, and report if there's any issue with the driver (note that Fedora recommends the RPM Fusion driver at the moment, so if you can repro with that it will help)
<@salimma:fedora.im>
16:45:20
because then you can catch issues before it hits Fedora, and before we then rebase the HS kernel on it
<@junwang123:matrix.org>
16:45:26
there are different nvidia versions and different kernel versions, we need combinations. so we were building them.
<@salimma:fedora.im>
16:46:08
right. we built nvidia drivers in house too for our production kernel, and sometimes people need different versions
<@conan_kudo:matrix.org>
16:46:35
fwiw, test results for fedora nearly 100% apply to hyperscale
<@conan_kudo:matrix.org>
16:46:46
since the code is the same and the config is only slightly different
<@salimma:fedora.im>
16:46:50
but... I guess if you really need this to work, you need to control your own kernel release cadence and the HS kernel might not be suitable. I don't think we want to be blocked on making sure various Nvidia kernel versions work
<@salimma:fedora.im>
16:47:22
so yeah, test compiling your different versions during the Fedora kernel test and report any issue (I am not sure if they will consider it blocking, but you should try)
<@davide:cavalca.name>
16:47:47
we've talked about potentially doing a slower-moving kernel in HS as well, for similar reasons, but it's tricky in practice and I don't know if/where that will land
<@conan_kudo:matrix.org>
16:48:04
it also depends on how things shake out for cs10
<@conan_kudo:matrix.org>
16:48:41
I'm tracking the cs10 kernel development stuff now, and watching to see where things land
<@salimma:fedora.im>
16:48:56
yeah. it's chicken and egg... unless we can get some of us to actually dogfood it, who knows how well it will work
<@conan_kudo:matrix.org>
16:49:18
well the main problem is coexistence
<@conan_kudo:matrix.org>
16:49:41
we need to hackfest this to make it so parallel kernel tracks can be available in the repository at once
<@conan_kudo:matrix.org>
16:49:44
right now we don't have that
<@salimma:fedora.im>
16:50:32
something similar to Asahi where they used to have a differently-named kernel package would work, I guess?
<@salimma:fedora.im>
16:50:39
Debian and Ubuntu do it that way too
<@salimma:fedora.im>
16:50:52
but yeah there'll also be the question of who will maintain the other kernels :)
<@davide:cavalca.name>
16:51:01
we could use separate tags I suppose? but yeah this came up in another setting with sched_ext, so it'd be worth coming up with a good solution and documenting it
<@junwang123:matrix.org>
16:52:19
we're using a version number combination, such as kmod-nvidia-5.15.147-t3-515.65.01-1.el7.twitter.x86_64.rpm
<@conan_kudo:matrix.org>
16:53:34
we will definitely need separate tags if for nothing else so kmod rebuilds don't get confused
<@salimma:fedora.im>
16:53:43
yeah, if we need different kernel tracks it will likely embed the kernel MAJ.MIN somewhere
<@salimma:fedora.im>
16:54:24
but it can't just be MAJ.MIN - sometimes we'll need another tag e.g. sched_ext, or something else we don't anticipate right now
<@conan_kudo:matrix.org>
16:55:20
the problem is that the infrastructure around the kernel packaging makes it difficult to change the basename without potentially breaking something
<@junwang123:matrix.org>
16:56:20
For Mellanox OFED, we're going through the support page. There is a 6.7 kernel row in the table. Not sure what that means. Is it tied to the Fedora kernel, and therefore a moving target, or was the 6.7 kernel selected to gain more support? https://docs.nvidia.com/networking/display/mlnxofedv24010331/general+support
<@conan_kudo:matrix.org>
16:56:33
also, changing the basename means the mainline kernel package is no longer overshadowed too
<@salimma:fedora.im>
16:56:51
yeah but that's probably a feature not a bug
<@conan_kudo:matrix.org>
16:57:15
is it? our systems have btrfs
<@conan_kudo:matrix.org>
16:57:20
the mainline kernel does not
<@salimma:fedora.im>
16:57:25
so that question is still unanswered, is this driver in the process of being upstreamed?
<@conan_kudo:matrix.org>
16:57:30
("mainline" referring to what CentOS itself provides)
<@salimma:fedora.im>
16:57:50
good point. yeah but if you're tracking a different kernel series you should know to never boot the normal 'kernel'
<@salimma:fedora.im>
16:58:07
and we'll still have the normal, unrenamed HS kernel right? that one still shadows the mainline kernel
<@conan_kudo:matrix.org>
16:58:21
yeah
<@salimma:fedora.im>
16:58:22
if you need another series, remove every 'kernel' package. if you don't, use the untagged HS kernel
<@conan_kudo:matrix.org>
16:58:27
as long as we always have that, we should be good
<@davide:cavalca.name>
16:59:42
we're almost out of time
<@davide:cavalca.name>
17:00:13
once we get consensus on this we should document it somewhere so it doesn't get lost
<@davide:cavalca.name>
17:00:30
have a good one folks!
<@conan_kudo:matrix.org>
17:00:34
yes
<@conan_kudo:matrix.org>
17:00:45
I think this is something we're going to have to sort out in a hackfest
<@davide:cavalca.name>
17:00:59
!endmeeting