15:00:05 <smooge> #startmeeting Infrastructure (2019-02-27)
15:00:05 <zodbot> Meeting started Thu Feb 28 15:00:05 2019 UTC.
15:00:05 <zodbot> This meeting is logged and archived in a public location.
15:00:05 <zodbot> The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:05 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:00:05 <zodbot> The meeting name has been set to 'infrastructure_(2019-02-27)'
15:00:05 <smooge> #meetingname infrastructure
15:00:05 <smooge> #topic aloha
15:00:05 <smooge> #chair nirik pingou puiterwijk relrod smooge tflink threebean cverna mkonecny mizdebsk
15:00:05 <zodbot> The meeting name has been set to 'infrastructure'
15:00:05 <zodbot> Current chairs: cverna mizdebsk mkonecny nirik pingou puiterwijk relrod smooge tflink threebean
15:00:15 <tflink> morning
15:00:27 <mizdebsk> .hello2
15:00:28 <zodbot> mizdebsk: mizdebsk 'Mikolaj Izdebski' <mizdebsk@redhat.com>
15:00:30 <pingou> .hello2
15:00:31 <zodbot> pingou: pingou 'Pierre-YvesChibon' <pingou@pingoured.fr>
15:00:32 <smooge> .hello smooge
15:00:34 <zodbot> smooge: smooge 'Stephen J Smoogen' <smooge@gmail.com>
15:00:38 <mkonecny> .hello zlopez
15:00:39 <zodbot> mkonecny: zlopez 'Michal Konečný' <michal.konecny@packetseekers.eu>
15:00:46 <cverna> hello
15:02:43 <smooge> #topic New folks introductions
15:02:43 <smooge> #info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
15:02:43 <smooge> #info Getting Started Guide: https://fedoraproject.org/wiki/Infrastructure/GettingStarted
15:02:47 <chris7871> o/
15:02:58 <smooge> Hello any new people this week? And what would you like to say?
15:04:36 <smooge> #topic announcements and information
15:04:36 <smooge> #info nirik will have sparse hours due to house move
15:04:36 <smooge> #info Mass update/reboots planned for 2019-02-26 -> 2019-03-01
15:04:36 <smooge> #info Beta Freeze Begins 2019-03-05
15:04:36 <smooge> #info Pagure 3.2.0/3.2.1 was deployed to stg/prod/pkgs.
15:04:37 <smooge> #info Staging Koji sync planned for 2019-03-08 (ticket 7600)
15:04:39 <smooge> #info Test gating temporarily disabled in Bodhi, awaiting a 3.13.3 release for https://github.com/fedora-infra/bodhi/issues/3044
15:05:13 <smooge> #info Build infrastructure updated/rebooted on Friday because of crunch before freeze
15:05:32 <smooge> any other announcements for this week?
15:05:33 <pingou> Fri ?
15:05:42 <mizdebsk> if anyone has any objections for staging koji sync, please comment in the ticket or let me know
15:05:44 <pingou> another round?
15:06:11 <smooge> yeah.. we freeze on Tuesday and could not update/reboot koji last night due to trying to get a working compose
15:06:13 <smooge> which failed
15:06:44 <pingou> oh build infra
15:06:50 <pingou> misread that, sorry
15:07:13 <smooge> ah ok. fudge
15:07:23 <smooge> the email to say we are doing this never got sent
15:08:16 <bowlofeggs> .hello2
15:08:17 <zodbot> bowlofeggs: bowlofeggs 'Randy Barlow' <rbarlow@redhat.com>
15:08:36 <smooge> ok sent that
15:08:52 <bowlofeggs> #info Test gating temporarily disabled in Bodhi, awaiting a 3.13.3 release for https://github.com/fedora-infra/bodhi/issues/3044
15:09:06 <bowlofeggs> smooge: i added a topic to gobby a few min ago, fyi
15:09:14 <smooge> yeah I put it in the above
15:09:16 <bowlofeggs> oh you did have that info
15:09:18 <bowlofeggs> hahahaha
15:09:20 <bowlofeggs> sorry
15:09:27 <smooge> I do this live people!
15:09:29 <bowlofeggs> i'd blame coffee, but i have coffee
15:09:41 <smooge> #topic Oncall
15:09:41 <smooge> #info smooge is on call from 2019-02-14 -> 2019-02-21
15:09:41 <smooge> #info puiterwijk is on call from 2019-02-21 -> 2019-02-28
15:09:41 <smooge> #info smooge is on call from 2019-02-28 -> 2019-03-07
15:09:41 <smooge> #info ?????? is on call from 2019-03-07 -> 2019-03-14
15:09:42 <smooge> #info ?????? is on call from 2019-03-14 -> 2019-03-21
15:09:44 <smooge> #info Summary of last week: (from smooge )
15:09:46 <smooge> and I missed that
15:10:04 <puiterwijk> You mean from Patrick? Not much special, just some pings.
15:10:16 <smooge> puiterwijk, did a lot of ping work this last week
15:10:27 <smooge> thank you puiterwijk
15:12:20 <pingou> +1
15:12:28 <smooge> zodbot: alias add oncall "echo smooge (Stephen Smoogen) is oncall. Please file a ticket if you don't hear from me ( https://pagure.io/fedora-infrastructure/issues ) My current hours are 1100 UTC to 1900 UTC Monday through Friday"
15:12:28 <zodbot> smooge: Kneel before zod!
15:12:46 <smooge> I am moving my hours 1 hour later.. I will sleep in again
15:12:59 <smooge> is anyone able to take oncall next week?
15:13:54 <pingou> the week of the 7th?
15:14:07 <smooge> 7th -> 14th
15:14:16 <mizdebsk> smooge, i can take it if needed
15:14:29 <smooge> thanks mizdebsk
15:14:39 <bowlofeggs> ah i can't do that weekend
15:14:44 <bowlofeggs> but i could do the business days
15:14:53 <bowlofeggs> ah mizdebsk has it
15:15:13 <bowlofeggs> i can do 14-21
15:15:24 <puiterwijk> bowlofeggs: note that also in smooge's alias, there's no expectation you are available over weekends. Other times people just file issues :)
15:15:34 <bowlofeggs> ah ok
15:15:45 <bowlofeggs> well i'd be like extra unavailable, like wouldn't even see tickets
15:15:49 <bowlofeggs> no interweb
15:15:56 <smooge> but bars
15:16:06 <smooge> bears
15:16:10 <bowlofeggs> haha yeah
15:16:12 <smooge> #topic Monitoring discussion
15:16:12 <smooge> #info https://nagios.fedoraproject.org/nagios
15:16:12 <smooge> #info Go over existing out items and fix
15:16:26 <bowlofeggs> smooge: am i on for the 14th-21st?
15:17:07 <smooge> bowlofeggs, you said you can do it so I added you there
15:17:12 <bowlofeggs> cool
15:17:12 <smooge> sorry for not confirming
15:17:34 <smooge> so on our monitoring.. we have a bunch of red services which I will try to clean up.
15:17:54 <smooge> The one I am worried about is notifs-backend01.phx2.fedoraproject.org
15:17:54 <smooge> Check fedmsg-hub consumers backlog
15:17:55 <smooge> This service has 1 comment associated with it. This service problem has been acknowledged.
15:17:57 <smooge> UNKNOWN | 02-28-2019 15:16:00 | 0d 18h 20m 21s | 3/3 | UNKNOWN: fedmsg consumer FMNConsumer not found
15:18:30 <smooge> this came up after the reboot but I don't know enough about fedmsg or notifs-backend to diagnose/fix
15:18:41 <puiterwijk> smooge: rerun playbook basically.
15:18:42 <smooge> can someone help me with this later?
15:18:47 <smooge> ok will do
15:18:54 <puiterwijk> Basically, the permissions on monitoring are... funny at times
15:20:00 <smooge> pkgs02 swap should clear up after the next reboot
15:20:13 <smooge> #topic Tickets discussion
15:20:14 <smooge> #info https://pagure.io/fedora-infrastructure/report/Meetings%20ticket
15:20:30 <smooge> mizdebsk, this is your ticket
15:21:21 <mizdebsk> .ticket 7588
15:21:22 <zodbot> mizdebsk: Issue #7588: OpenShift app monitoring with Nagios - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/7588
15:21:46 <mizdebsk> like we discussed on one of previous meetings, we need a monitoring for apps in openshift
15:22:01 <smooge> yeah.. it was quite clear last weekend
15:22:02 <chris7871> prometheus ?
15:22:03 <mizdebsk> or at least i need, not sure how others feel about production apps with no monitoring
15:22:05 <smooge> last week
15:22:14 <mkonecny> +1
15:22:28 <mizdebsk> for me a simple check for number of running pods is sufficient
15:23:02 <mizdebsk> chris7871, we are currently using nagios, it is integrated with notification system
15:23:18 <mizdebsk> i think prometheus would need much more work than adding a simple nagios plugin
15:23:32 <mkonecny> I would be glad to get notification about failure with log :-)
15:23:39 <mizdebsk> but i am wondering whether we need something more than just checking number of running pods
15:23:44 <puiterwijk> mizdebsk: we do have prometheus running in openshift though. Automatically configured and what not :)
15:24:06 <mizdebsk> if yes then it could make sense to package and deploy one of existing nagios plugins for monitoring openshift
15:24:06 <puiterwijk> Prometheus is checking things like how often pods get restarted, how many pods are dead, whether the control plane is okay, etc
15:24:38 <mizdebsk> smooge found one nagios plugin that can check various things, it would need a serviceaccount with enough read-only privileges
15:24:42 <chris7871> i didn't know there was a nagios plugin for openshift monitoring
15:25:25 <smooge> someone in ubuntu land made one
15:25:35 <mizdebsk> on the other hand, if there's not much interest in getting nagios to work with openshift then from my pov it would be much simpler to write a custom plugin myself and put it in ansible.git
15:26:03 <mizdebsk> so i would like to hear your opinions about nagios + openshift
15:26:22 <smooge> mizdebsk, there is interest. I think packaging up that git repo and testing it makes sense. I am hoping to do so next week
15:26:35 <mizdebsk> 1. make something quick until a different monitoring is implemented and deployed? or 2. invest more time in a proper nagios plugin
15:27:06 <mizdebsk> ok, thx smooge
15:27:07 <smooge> puiterwijk, can we make prometheus send out emails to our pages?
15:27:15 <mizdebsk> and irc
15:27:22 <puiterwijk> smooge: we should be able to set that up, yeah. Otherwise, we can have nagios monitor it...
15:28:35 <smooge> ok either way. mizdebsk if I don't have a PoC by next Thursday meeting.. you have complete permission to do what is needed :)
15:28:54 <mizdebsk> ok
15:29:01 <mizdebsk> nothing else from my side on this topic
15:29:20 <smooge> #topic Priorities for next week?
15:29:20 <smooge> #info please put tickets needing to be focused on here
15:29:44 <smooge> #info Beta Freeze on Tuesday, composers willing
15:30:43 <smooge> #info upgrade/fix of ci box on Monday
15:31:03 <smooge> #info reboot/updates of build systems on Friday (will need extra eyes)
15:31:19 <smooge> those are the items I know about. any other items needed focus on?
15:32:16 <cverna> we will be pushing the fedora-messaging effort next week, not sure how well it will go with the freeze
15:32:27 <cverna> but I guess we will find out :-)
15:32:52 <smooge> just ask for freeze exceptions and do extra coordination with mboddu/adamw
15:33:03 <adamw> robot says no
15:33:05 <adamw> sorry what?
15:33:10 <smooge> aka if it breaks them then no
15:33:25 <mizdebsk> freeze is a perfect time to do work in staging
15:33:34 <adamw> but yeah trying to push it during the freeze seems unfortunate timing
15:33:39 <bowlofeggs> oh man, freeze already? crazy
15:33:45 <bowlofeggs> gotta bust out my heavy cota
15:33:46 <adamw> everyone who wanted to migrate to messaging would need freeze exceptions to do it
15:33:47 <bowlofeggs> *coat
15:34:04 <pingou> cverna: we'll be playing in stg which isn't covered by freeze
15:34:11 <smooge> oh ok
15:34:16 <smooge> sorry I thought this was prod
15:34:38 <smooge> cverna, skip what I said.. if this is stg you have a good time
15:34:59 <cverna> I think the end goal is to put it in prod, and it would be nice not to have to wait for the end of the freeze
15:35:17 <cverna> but we will see how that goes, no need to worry too much :)
15:35:40 <cverna> how long is the freeze ? 2 or 3 weeks ?
15:35:49 <smooge> cverna, until a beta is out the door
15:35:58 <pingou> depends on whether it slips or not
15:36:16 <pingou> cverna: once we have all our ducks ready in stg, it's easier to ask for freeze break for prod
15:36:20 <cverna> ok even better :)
15:36:28 <pingou> starting with the outer layer and working inward
15:36:52 <pingou> we'll have patch that people can review and proof that things should work :)
15:37:07 <cverna> pingou: yes that's why I think there's no need to worry too much, but I think it is good to share that info :)
15:37:14 <smooge> ok anything else for priorities next week. I think the non-stage work is 1 week out
15:37:44 <cverna> oh I have one freeze related question but that can wait for Open floor :)
15:37:57 <smooge> ok next
15:38:04 <smooge> #topic Discuss: Is the Fedora pastebin still useful? - relrod
15:38:04 <smooge> #info how many users are using it? [3000 posts a day from 350 ips]
15:38:06 * mboddu is happy to help
15:38:18 <smooge> this is just to answer an item.. that was a question from last meeting
15:38:34 <smooge> we avg 3000 posts a day from around 350 different ips a day
15:38:54 <smooge> next one is a short one too
15:38:56 <smooge> #topic Lots of cron job noise.
15:38:56 <smooge> #info filing tickets on each one
15:38:56 <smooge> #info should we assign to 'owner' to fix?
15:39:46 <smooge> I am going through the daily 200+ emails sysadmin-* gets from various cron jobs. I will be filing tickets on ones which show up each day and will try to clean them up because I am tired of them
15:40:12 <cverna> smooge++
15:40:47 <pingou> oh wow, that's not a small undertaking, thanks for doing this smooge
15:41:05 <smooge> please look out for them and if you see one your service has.. fix it and close it. If you are a bystander/apprentice/have extra time, look and see what script in ansible is doing it and suggest a fix
15:41:17 <smooge> that is all
15:41:32 <smooge> #topic Discuss: Future of fedora-packages - cverna
15:41:32 <smooge> #info ongoing thread on the infra list https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org/thread/VT54S4LBEGTC6SAGHHPZ3VZA6K5GDOQ7/
15:41:40 <smooge> cverna, your turn
15:41:59 <cverna> not much on this just raising awareness of the thread on the mailing list
15:43:07 <cverna> Overall I propose looking at elasticsearch to replace fedora-packages
15:43:23 <cverna> s/looking/to look*
15:43:40 <bowlofeggs> we woooould need the one part of bodhi that uses it to use something to get that info
15:43:47 <bowlofeggs> bodhi uses packages to search package names in the web UI
15:43:50 <bowlofeggs> not a major feature
15:43:52 <bowlofeggs> but still
15:44:08 <bowlofeggs> my keyboard has been doing this thing lately where it sticks on keys for some reasons
15:44:12 <bowlofeggs> that's why those ooooooo's
15:44:14 <bowlofeggs> hahahah
15:44:27 <cverna> bowlofeggs: yes it would make more sense if bodhi queried elasticsearch directly
15:44:30 <smooge> you need to eat less honey baklava at the keyboard. it's what brings the bears
15:44:36 <bowlofeggs> haha
15:44:45 <cverna> src.fp.o would be a good candidate too
15:44:50 <misc> and ants
15:44:50 <bowlofeggs> yeah as long as i have some place to get it that works for me
15:45:30 <smooge> cverna, how much data does fedpkg get/have/keep these days? When I was looking at elasticsearch in the past we were looking to replace ALL the things because everyone wanted something
15:45:47 <mkonecny> I want to use packages to guess the name of the package in Anitya
15:46:45 <cverna> smooge just for fedora-packages currently we store ~24000 documents ( one per package )
15:46:49 <bowlofeggs> cverna: i also do use the web UI
15:46:53 <bowlofeggs> what would replace that?
15:47:08 <bowlofeggs> the main reason i use the web UI is to find out, in one place, what versions of a package are in which releases
15:47:27 <bowlofeggs> bodhi can kiiinda answer it, except rawhide (soon to be fixed) and its UI is not as nice for it
15:47:28 <cverna> bowlofeggs: that's another question that doesn't have a good answer :S
15:47:36 <bowlofeggs> but maybe i can just make bodhi's UI better at that
15:47:55 <smooge> ok that is a lot smaller than I was building out for
15:48:01 <bowlofeggs> the problem is that bodhi can only search and you have to use your brain to collate the answers
15:48:17 <bowlofeggs> packages does that collation for you, which is nice (but not critical i suppose)
15:48:27 <cverna> ryan pointed out that src.fp.o could be a good candidate too
15:48:29 <bowlofeggs> it's really just that it presents the data in an easier to consume format for my brain ☺
15:48:30 <smooge> ok I would like to get the next item in before we close out the meeting.
15:48:53 <smooge> I recommend everyone with an interest to read the thread and we come up with a way forward at next meeting?
15:49:07 <bowlofeggs> i'm not opposed to removing it
15:49:07 <cverna> yes thanks smooge
15:49:11 <bowlofeggs> just things to consider
15:49:15 <bowlofeggs> +1 smooge
15:49:19 <smooge> #topic bodhi-3.13.3 deployment date - bowlofeggs
15:49:19 <smooge> #info 3.13.3 to address https://github.com/fedora-infra/bodhi/issues/3044
15:49:19 <smooge> #info test gating is disabled until 3.13.3 is released
15:49:19 <smooge> #info do we want to deploy 3.13.3 this late in the week, or wait for Monday?
15:49:23 <cverna> anyway it will stop working when f29 is eol
15:49:24 <bowlofeggs> alright
15:49:34 <cverna> so we have 1 year or so
15:49:36 <smooge> OK I would really like to wait until Monday
15:49:45 <bowlofeggs> so right now test gating is disabled in bodhi, because greenwave is returning HTTP 500's on about 1/14 requests
15:49:55 <bowlofeggs> yeah i myself would also suggest monday
15:50:04 <bowlofeggs> i cannot be around on this weekend to react if something goes weird
15:50:04 <smooge> unless it would get pulled in when I do yum update tomorrow on the bodhi boxes
15:50:13 <bowlofeggs> it won't get pulled in
15:50:19 <bowlofeggs> i only tag to stg until i want to deploy
15:50:34 <bowlofeggs> but i wanted to ask the ops for their preference
15:50:44 <bowlofeggs> sounds like monday is preferred and that's also what i recommend
15:50:58 <smooge> ok then I would prefer Monday, as I think this weekend will be a lot of 'why did this brreaaaaakkk omg'
15:51:10 <smooge> #info decided to update on Monday
15:51:12 <bowlofeggs> i generally don't like to do deployments on thursday/friday or even wednesday unless it's critical
15:51:16 <bowlofeggs> imo, this is not critical
15:51:23 <bowlofeggs> few packages use gating today
15:51:26 <smooge> bowlofeggs, I can help with this after 15:00 UTC on Monday
15:51:35 <bowlofeggs> cool
15:51:50 <bowlofeggs> i don't expect problems, but then again, i never do and there are problems sometimes ☺
15:52:04 <smooge> ok last item for this week
15:52:14 <smooge> #topic Open Floor
15:52:30 <smooge> cverna, you said you had something?
15:53:01 <cverna> yes how do we get a system off the freeze list ? I think OSBS should not be impacted by the freeze
15:53:24 <mizdebsk> why not? it is used to build the release
15:53:34 <smooge> cverna, if something is producing something for the release it is mission critical
15:53:47 <cverna> mizdebsk smooge we release container every 2 weeks
15:54:07 <cverna> so we should freeze OSBS every week
15:54:13 <bowlofeggs> right, but doesn't QA work on the branch containers during the freeze?
15:54:24 <bowlofeggs> and QA doesn't QA the containers the rest of the cycle?
15:54:26 <cverna> and the base image is not built by OSBS
15:54:31 <bowlofeggs> ah
15:54:32 <cverna> at least currently
15:54:35 <mizdebsk> cverna, don't you want to use it for building base image?
15:54:38 <bowlofeggs> yeah i was thinking about the base container
15:54:48 <cverna> later but currently it does not make sense
15:55:08 <bowlofeggs> do we not consider our container applications to be part of the release then?
15:55:14 <bowlofeggs> seems like a releng question actually
15:55:15 <cverna> if OSBS is broken that does not impact the container release at all
15:55:49 <cverna> we don't mass rebuild layered image maybe we should
15:55:51 <bowlofeggs> should we ask releng if they consider the artifacts from OSBS to be part of the release to inform the question?
15:56:19 <mizdebsk> bowlofeggs, +1
15:56:35 <cverna> sounds good, anyway the question was is there a process to ask to remove a system from the freeze list
15:56:44 <smooge> cverna, so to get it off the list, I would make an email case that the artifacts it creates are not in the items that QA or Releng needs to be stable. Get those groups to say 'sure thing' and we can make it official
15:56:56 <mizdebsk> cverna, it should be fine to discuss it here, on the list or in a ticket
15:57:04 <mizdebsk> there is no other process for that
15:57:23 <smooge> I would prefer on the list or a ticket for something that is easy for people to go back to in 2 years
15:57:25 <cverna> ok thanks, that sounds like a good plan
15:57:45 <smooge> thank you for bringing it up and giving the reasons
15:57:46 <cverna> yes the list might be a better medium
15:58:03 <smooge> is there anything else on this?
15:58:12 <cverna> not from me
15:58:33 <smooge> any other items? I can close this out with 30 seconds to spare...
15:58:50 <smooge> thank you all for coming and helping each other
15:58:56 <smooge> #endmeeting