fedora_cloud_meeting
LOGS
15:01:03 <davdunc> #startmeeting fedora_cloud_meeting
15:01:03 <zodbot> Meeting started Thu Apr 14 15:01:03 2022 UTC.
15:01:03 <zodbot> This meeting is logged and archived in a public location.
15:01:03 <zodbot> The chair is davdunc. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
15:01:03 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:01:03 <zodbot> The meeting name has been set to 'fedora_cloud_meeting'
15:01:17 <davdunc> #topic roll call
15:01:52 <dustymabe> .hi
15:01:53 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dusty@dustymabe.com>
15:01:57 <mhayden> .hi
15:01:58 <zodbot> mhayden: mhayden 'Major Hayden' <mhayden@redhat.com>
15:02:00 <mhayden> 👋🏻
15:02:00 <davdunc> #chair dustymabe
15:02:00 <zodbot> Current chairs: davdunc dustymabe
15:02:06 <davdunc> #chair mhayden
15:02:06 <zodbot> Current chairs: davdunc dustymabe mhayden
15:04:02 <davdunc> #topic Action items from last meeting
15:04:07 <dustymabe> davdunc: i've got a topic but give me 10 min - brb
15:04:25 <davdunc> davdunc to package img-mash for use in the uploads
15:04:41 <davdunc> dustymabe: no worries. We'll keep the lights on.
15:05:28 <davdunc> So this img-mash (really mash) is ongoing. I should finish it up tomorrow.
15:05:52 <davdunc> I am going to reaction it for the next meeting, but really it will be ready for review tomorrow.
15:06:14 <davdunc> provided there is no escalation.
15:06:22 <davdunc> re-action
15:07:01 <davdunc> #action davdunc to complete the img-mash packaging by April 16, 2022
15:07:37 <davdunc> Action davdunc to publish beta images
15:08:53 <davdunc> this was started, but didn't complete the Azure images. i just didn't get enough time to get them into place. With the mash package, that won't be an issue any longer. I want that publication to stay topical though. So this is falling in with the mash completion.
15:09:47 <davdunc> we should be able to upload images using mash to the three major cloud providers before the next meeting.
15:10:19 <davdunc> okay.
15:10:57 <davdunc> #action davdunc to write or delegate a blog post for f36 release
15:11:19 <davdunc> this needs an issue in cloud-sig pagure. I'll add that.
15:11:47 <mhayden> that's something i am happy to help with / contribute to
15:12:29 <davdunc> Thanks mhayden !
15:12:37 <davdunc> #link https://pagure.io/cloud-sig/issue/375
15:12:54 <davdunc> we now have a ticket for tracking and completing the post.
15:13:29 * dustymabe back
15:13:43 <davdunc> cool.
15:14:03 <davdunc> dustymabe: you had something you wanted to cover. Let's hit that as our first topic
15:14:15 <dustymabe> ok
15:14:27 <dustymabe> #topic coordinating default/fallback hostname changes
15:15:01 <davdunc> Is that handled in cloud-init?
15:15:11 <dustymabe> A little background...
15:15:15 <davdunc> :)
15:15:18 <dustymabe> Back in Fedora 33 the default hostname was changed from localhost to fedora on instances that didn't get the hostname set in any other way (i.e. it's the fallback if it's not set anywhere else). As far as I know this change came in in systemd and was never proposed as a change in Fedora itself.
15:15:38 <davdunc> i see.
15:15:44 <dustymabe> Here's the original enablement upstream https://github.com/systemd/systemd/pull/5175 and the BZ requesting the change in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1392925
15:16:02 <dustymabe> Unfortunately, this initially caused us in FCOS some pain because setting the hostname via reverse DNS lookups (via NetworkManager) stopped working along with breaking 3rd party tools that set the hostname. The NM problem was subsequently fixed, but it still remains that a lot of third party software will check to see if an instance's hostname is "unset" by checking the current hostname
15:16:04 <dustymabe> against the string "localhost". Additionally it even seems that this change will never be picked up by CentOS/RHEL (see https://src.fedoraproject.org/rpms/systemd/c/13d1341b108a24d13f5922054307b5c2efc6836a?branch=rawhide).
15:16:19 <davdunc> eww.
15:16:35 <dustymabe> The open question is, what do we think the fallback hostname should be for server like instances? And.. Should we do anything to change it for a subset of Editions/Working Groups in Fedora?
15:17:42 <davdunc> so... just thinking here, and I am happy to be wrong, but what is wrong with "fedora" for all of them.
15:17:44 <davdunc> ?
15:18:17 <davdunc> if we are setting a default, I wouldn't think that we would want it to deviate much, if at all.
15:18:22 <mhayden> perhaps we could use "deprecate-legacy-bios" as the hostname? 🤭
15:18:28 <davdunc> LOL
15:18:31 <mhayden> but yes "fedora" would be a decent suggestion
15:18:39 <dustymabe> davdunc: the main problem with it is the assumptions that other software/tools make
15:18:57 <davdunc> i expected as much.
15:19:19 <davdunc> so localhost ends up being the typical expectation
15:19:21 <dustymabe> we hit this in openshift where tools were checking "if `localhost`; then set hostname to xyzfoo"
15:19:45 <dustymabe> the problem with `fedora` is it's hard to know if someone set that hostname intentionally or not
15:20:01 <davdunc> indeed.
15:20:23 <davdunc> though from the cloud perspective, it's our default username for every platform, etc.
15:20:28 <dustymabe> there is tooling around that (hostnamectl), but a lot of scripts/tools don't use it
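The difference between the two checks discussed above can be sketched as follows. This is a minimal illustration, not code from OpenShift or any other tool mentioned in the meeting: the function names are hypothetical, and it assumes the systemd convention that a configured static hostname lives in /etc/hostname (the value hostnamectl reports as the static hostname). A hostname assigned only transiently, for example via DHCP, would not show up in that file.

```python
import socket
from pathlib import Path


def naive_looks_unset(current_hostname: str) -> bool:
    """The string comparison many scripts use (hypothetical helper).

    It only works while the fallback hostname is "localhost"; with a
    fallback of "fedora" it reports an unset hostname as set.
    """
    return current_hostname == "localhost"


def static_hostname_is_set(hostname_file: Path = Path("/etc/hostname")) -> bool:
    """Return True if a static hostname has been configured (hypothetical helper).

    Assumption: the static hostname is stored in /etc/hostname. A missing
    file, or one holding only blank lines and '#' comments, means nothing
    was configured and the fallback hostname is in effect.
    """
    try:
        lines = hostname_file.read_text(encoding="utf-8").splitlines()
    except FileNotFoundError:
        return False
    return any(
        line.strip() and not line.lstrip().startswith("#") for line in lines
    )


if __name__ == "__main__":
    current = socket.gethostname()
    print(f"current hostname: {current}")
    print(f"naive check says unset: {naive_looks_unset(current)}")
    print(f"static hostname configured: {static_hostname_is_set()}")
```

The second check does not depend on what the fallback string is, which is why it keeps working whether the fallback is `localhost` or `fedora`.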
15:20:54 <davdunc> yea. clearly the OpenShift decision was to look for the expected.
15:21:19 <davdunc> And that code review had a lot of senior eyes on it, I am sure.
15:21:35 <dustymabe> which code review?
15:21:58 <davdunc> the one that let that "if `localhost` . . . " get through.
15:22:03 <dustymabe> :)
15:22:14 <dustymabe> well the thing is, that's the case for RHEL
15:22:18 <dustymabe> and RHCOS
15:22:31 <dustymabe> see https://src.fedoraproject.org/rpms/systemd/c/13d1341b108a24d13f5922054307b5c2efc6836a?branch=rawhide
15:23:30 <davdunc> yes and that proves that anything else but localhost is getting a kneejerk retooling.
15:24:10 <dustymabe> davdunc: right.. IOW you're saying it caused problems and downstream immediately changed it back?
15:24:25 <davdunc> okay... I am seeing how "fedora" is a great fit for workstation, but maybe not anywhere else...
15:24:30 <dustymabe> exactly
15:24:55 <davdunc> I feel like I am coming around to your way of thinking
15:25:27 <dustymabe> The current proposal is that we should work to get back to the default/fallback being `localhost` for Cloud WG. This matches what FCOS is currently doing (see https://github.com/coreos/fedora-coreos-tracker/issues/902)
15:25:52 <dustymabe> Depending on the level of buy in we can achieve that goal together
15:26:02 <davdunc> okay. Let's put an issue in and then I'll add it to the technical specification.
15:26:12 <dustymabe> ok I can open an issue
15:26:17 <davdunc> super.
15:26:31 <dustymabe> and we can vote/comment there on merits
15:26:38 <dustymabe> mhayden: any thoughts on $topic?
15:27:04 <davdunc> he gave us his legacy-bios naming preference!
15:27:12 <mhayden> i tend to agree with you, dustymabe -- `localhost` makes sense to me
15:27:14 <dustymabe> ha - done!
15:27:20 <davdunc> :facepalm:
15:27:30 <davdunc> ohh.
15:27:35 <mhayden> or `localhost.localdomain.local.dustymabe.aws.local`
15:27:41 <dustymabe> The gentleman from North Carolina yields his time back
15:27:47 <davdunc> lol.
15:28:14 <davdunc> dustymabe: that's an important topic and big change. I am glad you could make time to bring it up.
15:28:50 <davdunc> did you get a ticket created?
15:29:04 <dustymabe> davdunc: i'll create one now
15:29:16 <davdunc> thanks. we can link it in the notes.
15:29:28 <davdunc> we can always flesh it out later.
15:31:10 <dustymabe> https://pagure.io/cloud-sig/issue/376
15:31:16 <dustymabe> will add more notes and links in a bit
15:31:25 <davdunc> #link https://pagure.io/cloud-sig/issue/376
15:31:32 <davdunc> awesome! Thanks dustymabe
15:31:38 <davdunc> okay
15:31:58 <davdunc> #topic Publish AWS aarch64 AMI for Fedora 35
15:32:13 <davdunc> #link https://pagure.io/cloud-sig/issue/365
15:32:29 <davdunc> we are still not showing images in the alt downloads page.
15:32:47 <davdunc> this can't happen for F36 IMO.
15:33:33 <davdunc> Hopefully the img-mash upload can be done manually next week and then we can add this to koji before we meet again.
15:34:14 <dustymabe> note that (I think) the tooling that builds the website looks at the fedmsgs and populates AMI information from that
15:34:38 <dustymabe> so you'd need img-mash to do the upload and also have a process that publishes a fedmsg
15:34:44 <davdunc> dustymabe: that's plumbing I have not done yet.
15:34:53 <dustymabe> or (this first time) just manually update the AMI list in the website directly
15:35:17 <davdunc> I'll need to make sure the messages are delivered correctly and the tokens are all set up.
15:35:39 <davdunc> mhayden: I'll probably pull you in on that.
15:36:01 <davdunc> ~probably~
15:36:21 <davdunc> okay. moving on from here.
15:36:32 <davdunc> #topic F36 Test days
15:36:49 <davdunc> I didn't get as much preparation done for this as I had hoped.
15:37:09 <mhayden> i did a vexxhost, hetzner, and vultr test -- no issues found, but i need to write my data down in the test day form
15:37:32 <dustymabe> I can try it out on GCP
15:37:55 <davdunc> mhayden i did the aws testing on aarch64 and x86_64  - kernel testing too.
15:38:07 <davdunc> I also have to write down the tests.
15:38:25 <davdunc> I want to get azure done this weekend (last weekend was family only)
15:38:51 <davdunc> Azure has the new ARM instances!
15:39:02 <davdunc> We need to make sure that we bring those up right away
15:39:13 <davdunc> and add it to the testing.
15:40:05 <davdunc> Also, I told mhayden that we need to update the QA testing to include the T4g instances on Amazon instead of the a1 instances. The a1 instances are pretty much obsolete now.
15:40:39 <dustymabe> yeah, I didn't try to do it but it wasn't clear to me if there was a process in place to create our own images and boot those instances. i.e. the ARM images that you can use to boot instances might be limited for now
15:40:40 <davdunc> so let's make sure we roll out the later versions of the neoverse chips on every platform.
15:41:10 <dustymabe> davdunc: what's the latest "bare metal" ARM equivalent of a `a1.metal` ?
15:41:28 <davdunc> dustymabe: that would be the m5g.metal
15:41:48 <davdunc> we have some testing going on the c7g.metal in preview.
15:42:06 <davdunc> let's connect offline for the fcos testing there if it's not ongoing already.
15:42:55 <davdunc> that has some more advanced support that it would be good to know that we are supporting, like PAC in the kernel.
15:43:12 <davdunc> pointer authentication...
15:44:17 <dustymabe> https://instances.vantage.sh/ doesn't list an m5g.metal
15:44:28 <davdunc> hmm.
15:44:29 <dustymabe> I see m6g.metal
15:44:34 <davdunc> ah.
15:44:36 <davdunc> right!
15:44:44 <davdunc> got my generations mixed up.
15:44:59 <davdunc> there's no 5th generation graviton.
15:45:09 <davdunc> 6th gen and up.
15:45:30 <dustymabe> it's weird to me though, a1.metal is still the most cost efficient
15:45:46 <davdunc> but it's the wrong chip version.
15:46:03 <dustymabe> 32GiB 16cpu for .408 hourly
15:46:08 <davdunc> that G1 chip is just not at all what anyone will run on.
15:46:27 <dustymabe> yeah, i'm coming at this from a different perspective (we use a1.metal to build FCOS aarch64)
15:46:50 <dustymabe> so from a non-testing perspective a1.metal is still most efficient for our needs I think :(
15:46:56 <davdunc> if only there was a t4g.metal. :)
15:47:48 <davdunc> cool. So I think we should continue our testing discussion on the chat and ML
15:47:59 <dustymabe> right, the next available option up is c6g.metal with 128GiB and 64 vCPUs for 4x+ the cost
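The cost comparison above can be worked through with a small calculation. This is a rough sketch using only the figures quoted in the discussion: the a1.metal rate of 0.408 per hour and the vCPU counts come from the log, while the c6g.metal rate is not stated, so the code treats "4x+" as a lower bound of exactly 4x. That multiplier is an assumption, not a published price.

```python
# Figures quoted in the meeting log.
A1_METAL_HOURLY_USD = 0.408
A1_METAL_VCPUS = 16
C6G_METAL_VCPUS = 64

# Assumption: "4x+ the cost" is treated as a lower bound of exactly 4x.
ASSUMED_COST_MULTIPLIER = 4.0


def per_vcpu_hourly(hourly_usd: float, vcpus: int) -> float:
    """Hourly cost divided evenly across the instance's vCPUs."""
    return hourly_usd / vcpus


c6g_hourly_lower_bound = A1_METAL_HOURLY_USD * ASSUMED_COST_MULTIPLIER

if __name__ == "__main__":
    a1_per_vcpu = per_vcpu_hourly(A1_METAL_HOURLY_USD, A1_METAL_VCPUS)
    c6g_per_vcpu = per_vcpu_hourly(c6g_hourly_lower_bound, C6G_METAL_VCPUS)
    print(f"a1.metal:  {A1_METAL_HOURLY_USD:.3f}/h total, {a1_per_vcpu:.4f}/h per vCPU")
    print(f"c6g.metal: {c6g_hourly_lower_bound:.3f}/h total (lower bound), "
          f"{c6g_per_vcpu:.4f}/h per vCPU")
```

Under that assumption the per-vCPU rate is about the same on both; the jump is in the total hourly bill, which is what matters for a build job that needs one bare metal host rather than 64 vCPUs.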
15:48:16 <davdunc> and then make sure that we get some documentation in for adam.
15:48:16 <davdunc> hmmm.
15:48:24 <dustymabe> davdunc: or you could just get AWS to give us /dev/kvm inside the machines that aren't `metal`
15:48:27 <davdunc> that's a big cost different.
15:48:31 <davdunc> difference*
15:49:20 <davdunc> that would be awesome, but it's not happening yet. (unless you can build in the trusted execution environment.) ;)
15:49:49 <davdunc> we also aren't building for the TEE anywhere either.
15:50:03 <davdunc> we should add support for that where it's available.
15:50:25 <davdunc> And adding uefi boot validation needs to happen wherever it can now.
15:50:50 <davdunc> might need a chart to start providing feedback for the new legacy-bios SIG.
15:51:08 <davdunc> okay.
15:51:30 <davdunc> #topic Proposed: Build a "NeuroFedora Cloud Image"
15:51:47 <davdunc> #link https://pagure.io/cloud-sig/issue/374
15:52:16 <davdunc> discussed this at the most recent fedora-neuro event and it went over well.
15:52:50 <davdunc> this is the first solutions build that I have proposed and I really want to get this off the ground.
15:53:01 <dustymabe> hmm - if container is an option then why don't we just use the cloud base image and get the container experience working well (and offered from the fedora registry)?
15:53:27 <davdunc> dustymabe: I am happy to do that for sure.
15:53:27 <dustymabe> are there hardware enablement pieces that are needed?
15:53:51 <davdunc> dustymabe: there is GPU support, but that's an AMD nightmare.
15:54:28 <dustymabe> could we not just do those bits in the cloud base image itself?
15:54:31 <davdunc> we can approach that from the cloud base side and focus on the container first.
15:54:40 <davdunc> yea.
15:55:05 <dustymabe> i'm only here for advice.. but yeah. I would try as hard as possible to focus on what's best for us all to maintain
15:55:09 <davdunc> I like your approach. let's build it into something that we can consume across the board.
15:55:39 <davdunc> if there is something that needs to break out, then we'll make that the focus.
15:55:42 <dustymabe> at this point in time (when we're having trouble even getting images uploaded to clouds and on the website) I think it's important not to fragment into more "offerings" than what we have with the base
15:55:52 <davdunc> I need more practice on podman anyway.
15:56:28 <dustymabe> davdunc: indeed and with the container people can run that on their desktop or on cloud wg or on FCOS - and all those users contribute back to the same container base
15:56:42 <dustymabe> with the container, people <-- needed a comma in there
15:56:43 <davdunc> agreed dustymabe I don't have a timeline in mind for this.  I just want to establish a goal here since it's part of the next generation cloud image focus for me.
15:56:55 <dustymabe> davdunc++
15:57:06 <davdunc> cool.
15:57:09 <davdunc> okay
15:57:19 <davdunc> #topic Open Floor
15:57:37 <davdunc> Anything you want to bring up before we close it down?
15:58:14 <dustymabe> nope
15:58:18 <dustymabe> thanks for the work you do davdunc
15:58:33 <davdunc> dustymabe: ditto!
15:58:49 <davdunc> okay then. let's call it a wrap.
15:58:53 <davdunc> #endmeeting