infrastructure
MINUTES
18:00:06 <nirik> #startmeeting Infrastructure (2015-03-12)
18:00:06 <zodbot> Meeting started Thu Mar 12 18:00:06 2015 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:06 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:06 <nirik> #meetingname infrastructure
18:00:06 <zodbot> The meeting name has been set to 'infrastructure'
18:00:06 <nirik> #topic aloha
18:00:06 <nirik> #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk
18:00:06 <zodbot> Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pingou puiterwijk relrod smooge threebean
18:00:06 <nirik> #topic New folks introductions / Apprentice feedback
18:00:12 <relrod> here
18:00:17 <andreasch> hi
18:00:17 * puiterwijk is here
18:00:22 * threebean is here
18:00:42 <Mohamed_Fawzy> hi
18:00:44 <ClockworkOmega> here
18:01:04 <relrod> Oh, no roll-call section anymore? sorry
18:01:05 <smooge> here
18:01:23 <janeznemanic> hi
18:01:35 <smooge> there is always roll call
18:01:43 <nirik> relrod: we can, perhaps we can add it to this same topic.
18:02:03 <nirik> seems like a waste to just have several minutes where we just say hi and then ask for feedback/new people.
18:03:27 <nirik> anyhow, any new folks like to introduce themselves? or apprentices with questions?
18:03:41 <ClockworkOmega> I'm new actually.
18:03:41 <kushalk124> Hey,
18:04:20 <kushalk124> So I started with some things, made my first package, which has been reviewed and I am looking for a sponsor,
18:04:34 <kushalk124> And next I would like to contribute to some apps, have been looking at datanommer
18:04:41 * oddshocks here
18:05:14 <nirik> ClockworkOmega: welcome. care to give us a one line intro? are you more interested in sysadmin or application devel stuff?
18:05:25 <nirik> kushalk124: cool. datanommer can always use some work...
18:05:48 <kushalk124> nirik, I would be happy to help out :)
18:06:35 <ClockworkOmega> Thanks. Yes I'm an spiring sysadmin looking to get professionally into Linux but in the mean time I want to volunteer with Fedora.
18:06:40 <ClockworkOmega> a*
18:07:44 <nirik> ClockworkOmega: welcome. We can give you some pointers on where to start after the meeting over in #fedora-admin. ;)
18:08:15 <nirik> ok, get ready for info dump...
18:08:16 <kushalk124> nirik, Is there something where we can put some machine learning /data analysis / visualization , I would love to do something of that sort as well :D
18:08:21 <ClockworkOmega> Thanks. Maybe my spelling will improve by then :P
18:08:58 <nirik> kushalk124: not sure what you mean, you're welcome (and encouraged) to mine data for interesting things...
18:09:36 <relrod> kushalk124: probably some opportunities there with fedmsg -- suggest talking to threebean
18:10:14 <kushalk124> nirik, ah thanks :) Yes I can think of interesting things to work on with data
18:10:33 <nirik> with fedmsg we have a lot of data... making sense of it would be great. ;)
18:10:42 <kushalk124> relrod, Thanks :) fedmsg would be helpful , I will also have a word with threebean
18:10:59 <threebean> kushalk124: cool :)
18:11:04 <kushalk124> nirik, Yes, today I was exploring how the badges are given , using the data from fedmsg :D
18:11:26 <nirik> excellent.
18:11:30 <nirik> #topic announcements and information
18:11:30 <nirik> #info Group effort cleaned up the pkgdb branch script on friday - kevin
18:11:30 <nirik> #info Good progress made on new cloud (vnc working, copr being tested) - kevin/msuchy
18:11:30 <nirik> #info Fedora 22 Alpha is out! Freeze is over! - kevin
18:11:31 <nirik> #info Mass reboots happened yesterday, please report any issues you find - kevin
18:11:32 <nirik> #info https://register.flocktofedora.org deployed to OpenShift for Flock 2015 Rochester. (Please wait for announcement to register). Need to figure out how to stand https://flocktofedora.org back up. -lmacken
18:11:36 <nirik> #info VACUUM ANALYZE on datanommer db made a difference.  we'll need to investigate why autovacuum isn't running regularly on our postgres dbs - ralph
18:11:39 <nirik> #info fedmsg+karma commands coming to zodbot soon https://github.com/fedora-infra/supybot-fedora/pull/22 - ralph
18:11:42 <nirik> #info we have tons of open pull requests this week.  any help reviewing is appreciated. http://ambre.pingoured.fr/fedora-infra/ - ralph
18:11:45 <nirik> so, theres a big dump of info. ;)
18:12:14 <nirik> on to discussion topics
18:12:44 <nirik> #topic monitoring: let us design something better - kevin
18:13:04 <nirik> so, we get a lot of alerts, and they kind of aren't all that useful much of the time.
18:13:24 <nirik> I'd like to look at some alternatives.
18:13:44 <nirik> One of them is that we should try out assimilation
18:13:48 <mhurron> complete alternatives to nagios or just a different way to configure it?
18:14:00 <nirik> http://linux-ha.org/source-doc/assimilation/html/index.html
18:14:04 <nirik> both. ;)
18:14:14 <nirik> I think we can try assimilation out in our cloud network
18:14:32 <nirik> and we could look at redesigning our nagios setup if that proves easy/possible to do shorter term
18:15:21 <nirik> I'll look at floating some ideas on the list.
18:15:34 <nirik> if anyone is interested in helping out, they could chime in there too. ;)
18:16:10 <threebean> I'm interested in seeing it happen.. but I don't know much about alternatives.
18:16:17 <nirik> Things I want this to fix:
18:16:38 <threebean> I was hoping to get into automating much of our existing nagios config (so it's derived from ansible host and group vars..).  but switching systems: I hadn't considered.
18:16:40 <oddshocks> Whatever the resolution may be, the alerts system could definitely be improved
18:16:44 <nirik> * alerts should start out just going to irc, then if still happening, email, then pager.
18:16:52 <oddshocks> +1
18:17:13 <oddshocks> too many emails
18:17:14 <nirik> * alerts for things that aren't user/customer impacting should never go to pagers.
18:18:32 <nirik> I'd really like to have alerts be a special event, not a 'oh no, there goes the pager again'
18:18:44 <nirik> anyhow, we can discuss more on list
18:19:03 * threebean nods
18:19:07 <puiterwijk> maybe also prioritizing services. while tagger is user-facing, I'm not sure it's as critical as distgit or koji being down
18:19:12 <nirik> the nice thing about Assimilation is that it just autodetects. You don't need to configure what it monitors.
18:19:21 <nirik> puiterwijk: also good idea.
18:19:30 <threebean> there's also a lot of app-specific errors that mostly go to developers, but not a broader monitoring thing.
18:19:47 <threebean> lmacken just noticed a bunch of internal errors from the badges backend.  it needed a restart, but nagios didn't know about it.
18:20:13 <nirik> yeah, and we often forget to add things to nagios when we make new ones
18:20:25 <nirik> and staging alerts should not be the same as production
18:20:49 <puiterwijk> I would say staging should only (maybe) get IRC alerts, never email or pager
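The escalation rules floated so far (IRC first, then email, then pager; no pages for things that are not user impacting; staging kept to IRC) could be sketched roughly as below. This is only an illustration of the idea, not a design anyone agreed on: the `Alert` fields, the `channels_for` function and both time thresholds are hypothetical, and per-service priority is not modeled.

```python
from dataclasses import dataclass

# Hypothetical escalation thresholds in minutes; the meeting did not pick numbers.
EMAIL_AFTER_MIN = 15
PAGER_AFTER_MIN = 30


@dataclass
class Alert:
    service: str
    environment: str      # "production" or "staging"
    user_impacting: bool
    minutes_failing: int


def channels_for(alert: Alert) -> list[str]:
    """Return the notification channels an alert should reach right now."""
    channels = ["irc"]  # every alert starts out on IRC only
    if alert.environment == "staging":
        return channels  # in this sketch, staging never escalates past IRC
    if alert.minutes_failing >= EMAIL_AFTER_MIN:
        channels.append("email")
    # Only user-impacting problems are ever allowed to reach a pager.
    if alert.user_impacting and alert.minutes_failing >= PAGER_AFTER_MIN:
        channels.append("pager")
    return channels


print(channels_for(Alert("koji", "production", True, 45)))
```

Whether staging should also get email after a long outage is still open in the discussion; that would be a one-line change to the staging branch.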
18:20:52 <ClockworkOmega> How could someone help in improving the system?
18:21:19 <nirik> puiterwijk: email still might be handy to see if something is down for a long time.
18:21:39 <nirik> ClockworkOmega: well, chime in on the mailing list post I am going to make I guess. ;) and/or look at our current setup in ansible git.
18:21:42 <lmacken> threebean: ah
18:21:42 <puiterwijk> nirik: not sure. for "long time", the monitoring should have a log of itself
18:21:59 <puiterwijk> (just my opinion)
18:22:01 <nirik> perhaps.
18:22:12 <nirik> also, currently nagios alerts go to 'sysadmin-members'
18:22:24 <nirik> I would suspect 99% of them just filter them into the trash. ;)
18:22:40 <nirik> well, perhaps 95%
18:23:16 <puiterwijk> right, but I guess that's caused by the fact that it sends so much email
18:23:35 <puiterwijk> so if we'd fix the signal/noise ratio, that percentage should hopefully go down
18:23:43 <threebean> yeah.  I rely exclusively on irc for nagios alerts.
18:23:43 <nirik> well, that and a number of sysadmin members aren't very active or have no idea how to fix something or have access to do so
18:24:13 <smooge> my phone is my alerter
18:24:20 <smooge> when its charged
18:24:27 <smooge> unlike right now
18:24:28 <nirik> a case I often hit: something goes down like a proxy or something, and so there's 20-30 alerts, then 20-30 emails or whatever.
18:24:37 <nirik> but I saw them on irc and fixed it.
18:24:51 <nirik> so I hit 'delete all' on my phone and 'catch up all' in my nagios folder
18:24:58 <nirik> all those pages/emails are... completely overhead
18:25:29 <nirik> anyhow, will post to the list and we can brainstorm a plan there. :)
18:25:50 <nirik> anything else on monitoring?
18:26:15 <puiterwijk> yeah, I think "no monitoring" is not a solution, even though it may solve the "too many alerts" problem :)
18:26:35 <nirik> agreed. we want to see problems before our users do.
18:26:40 <nirik> #topic where is our source code? - smooge
18:26:41 <nirik> Google code and gitorious are going away.. what projects there we might rely on?
18:26:43 <threebean> eh, if we're moving monitoring around, it might be nice to get a flashier collectd frontend (or replacement). there are nice, modern open source frontends out there
18:26:44 * oddshocks digs nirik's idea of IRC -> email -> pager, with those other exceptions/rules mentioned along with it
18:26:45 <nirik> smooge: you added this?
18:26:45 <ClockworkOmega> What about a different system for filtering?
18:27:03 <ClockworkOmega> Or rather a different methodology for it?
18:27:13 <nirik> oops. Didn't mean to cut off everyone there on monitoring. ;)
18:27:13 <puiterwijk> I think we only have code in fedorahosted and github in infra
18:27:16 <nirik> #undo
18:27:16 <zodbot> Removing item from minutes: <MeetBot.items.Topic object at 0xfa7b3d0>
18:27:42 <nirik> threebean: graphite was suggested... it's Django tho and bigger...
18:27:49 <nirik> ClockworkOmega: filtering where?
18:28:53 * relrod has played with graphite before. It's _extremely_ modular, but that also means that setting it up has a _lot_ of little components to maintain and set up and learn.
18:29:44 <ClockworkOmega> I wasn't speaking so much about direction.
18:29:57 <nirik> relrod: and it seems heavy to me, but perhaps it's worth it. ;)
18:31:05 <nirik> #topic where is our source code? - smooge
18:31:08 <smooge> puiterwijk, my question was about a somewhat bigger picture.. do we rely on tools which are hosted there and do we know where they will be after they close down.
18:31:18 <nirik> anyhow, I don't think we have any code there... anyone know of any?
18:31:23 <puiterwijk> smooge: yes, we will be.
18:31:32 <puiterwijk> but that's all Fedora packaged as far as I know
18:31:50 <puiterwijk> nirik: as said, we don't have any code there, but smooge is worried about stuff we depend on
18:32:10 <nirik> sure, it's good to ask everyone... in case we missed something.
18:32:13 <puiterwijk> and I personally think that's the problem of the EPEL package maintainers
18:32:14 * threebean doesn't know of any
18:32:16 <smooge> it is more of a 'something we need to be aware of if it all goes away'
18:32:39 <puiterwijk> smooge: yeah, makes sense. though I guess most upstreams that are still active will find another host, and then package maintainers should follow that
18:32:43 <puiterwijk> (just my 2 cents)
18:33:00 <puiterwijk> or rather, my opinion
18:33:02 <smooge> still active...
18:33:10 <smooge> that was where I start to get itchy
18:33:12 <nirik> it's just like when berlios went away. ;)
18:33:32 <smooge> our planet uses a forked venus which doesn't match what current venus is
18:33:45 <puiterwijk> smooge: well, non-active upstreams have always been a problem, regardless of the place where the code is
18:34:04 <puiterwijk> so yes, I see the problem there, but that's not especially related to gitorious/gcode shutting down
18:34:29 <nirik> well, it's just more at once.
18:34:33 <threebean> it might be worth trying to script something that goes through our packages searching for links to these soon-to-be-gone services.  Look at SourceN fields, look at the 'upstream url' in pkgdb..
18:34:36 <smooge> ok never mind.
18:34:44 <threebean> ... and generate a list to send to the devel list.
18:34:49 <nirik> threebean: could anitya do that?
18:35:05 <nirik> or something using its db I guess
18:35:11 <puiterwijk> nirik: well, it has a list of upstream URLs, yes.
18:35:14 <puiterwijk> but not for all source files
18:35:15 <threebean> yeah.  would take a little scripting.
18:35:37 <nirik> sure, it would never be 100%
18:35:43 <nirik> but could find the ones that are obvious
18:35:57 <puiterwijk> I can take a throw at that after the meeting
18:36:24 <nirik> cool. :)
18:36:47 <nirik> #action puiterwijk will see if we can generate a list of packages with upstreams being retired to notify the devel list of.
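A minimal sketch of the kind of scan discussed above, assuming the upstream and source URLs have already been collected into a dict keyed by package name. The function names and the sample data are made up for illustration, and this is not the script puiterwijk ran; it only shows matching URLs by hostname against the retiring services.

```python
from urllib.parse import urlparse

# Hosts belonging to the services being retired (Google Code and Gitorious).
RETIRING_HOSTS = ("code.google.com", "googlecode.com", "gitorious.org")


def on_retiring_host(url: str) -> bool:
    """True if the URL's hostname is a retiring host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in RETIRING_HOSTS)


def affected_packages(package_urls: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each package name to the subset of its URLs on retiring hosts."""
    hits = {}
    for name, urls in sorted(package_urls.items()):
        matched = [u for u in urls if on_retiring_host(u)]
        if matched:
            hits[name] = matched
    return hits


# Made-up sample data; real input would come from spec files or a package database.
sample = {
    "foo": ["https://foo.googlecode.com/files/foo-1.0.tar.gz"],
    "bar": ["https://example.org/bar-2.0.tar.gz"],
    "baz": ["https://gitorious.org/baz/baz"],
}
print(affected_packages(sample))
```

Matching on the parsed hostname rather than on a plain substring avoids false hits on unrelated hosts that merely contain one of the names.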
18:36:53 <nirik> #topic Mirrormanager2 [how is this coming along?]
18:37:05 <nirik> smooge: this was your question? or ?
18:37:11 <puiterwijk> For this, it's too bad that pingou is gone today.
18:37:19 <nirik> well, I can give some info. :)
18:37:26 <puiterwijk> ah, sure
18:37:30 <smooge> nirik, it was brought up about bapp02 ooms
18:37:43 <smooge> and I thought that was the area to put questions like that.
18:38:05 <puiterwijk> smooge: yeah, I think that's a right place indeed.
18:38:09 <nirik> we have 1 mirrorlist server thats on mm2. The mirrormanager2-mirrorlist rpm needs some work tho (it doesn't come up right on boot). I'll work with pingou to fix that next week.
18:38:24 <nirik> sure, absolutely right. ;)
18:38:43 <nirik> once mirrormanager2-mirrorlist is set, we should convert the rest of the mirrorlist servers to use it.
18:39:11 <nirik> oddshocks was looking into seeing if we could validate the data those use...
18:39:27 <nirik> so we avoid pushing out bad data to them.
18:39:46 * oddshocks nods
18:40:03 <nirik> On the other parts, we have staging versions of: backend, crawler, frontend. We need to finish some fedmsg work on them... then make production ones and switch.
18:40:27 <nirik> I don't know for sure if fedmsg integration is the last bit they need or if there was something more pingou was waiting on doing
18:40:55 <nirik> for bapp02 in the mean time the only thing we could possibly do is decrease the number of crawlers I guess.
18:40:59 <threebean> hm.  did we do that already?  can't recall.  will have a look.
18:41:20 <nirik> threebean: it still spews crons about fedmsg things missing... might just need adding in the playbook(s)
18:41:41 <threebean> ah, cool. I'll poke it after the meeting.
18:41:53 <oddshocks> On my end... I'm really clueless as to which parts of the pickle data are the critical data that determines if the pickle is good or bad. I have pingou's script to compare two pickles and have used that as a jumping-off point to write a validate_pickle.py script, but I'm still pretty clueless. So if anyone has any other info on the rather-large amount of content in these pickles, it'd be appreciated
18:41:59 <nirik> It would be cool if we could finish rolling this out before beta, but not sure if thats being too pushy
18:42:47 <nirik> oddshocks: yeah, the only thing I have is that traceback from the mm2-mirrorlist. The old mirrorlists don't show any error they just suck up all memory and fall over.
18:43:00 <oddshocks> At the least, I could probably use maybe 2 more good pickles as examples, so I can compare 3 good pickles and see what they have, that the bad pickle doesn't. threebean was kind enough to get me one good pickle to compare to the bad one, but I'm not sure where that was taken from. But I could probably use a couple more for comparative purposes
18:43:15 <oddshocks> Oh, yeah, I have that traceback you sent me, too, nirik  :)
18:43:37 <nirik> sure. I can get you some more good ones...
18:43:52 <nirik> it makes them hourly
18:44:02 <oddshocks> Feel free, anyone, to tell me that I'm going about this less-than-optimally. :P
18:44:07 <oddshocks> nirik: cool, thanks :)
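The approach oddshocks describes, comparing several known-good pickles against a bad one, might look something like the sketch below. It assumes each pickle is a dict with string keys at the top level, which is a guess about the mirrorlist data rather than something stated in the meeting. `load_pickle` and `suspicious_keys` are hypothetical helpers, not pingou's comparison script or validate_pickle.py.

```python
import pickle


def load_pickle(path):
    """Load one pickle file. Only use this on files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def suspicious_keys(good_pickles, bad_pickle):
    """Report top-level keys where the bad pickle differs from every good one.

    Assumes each pickle is a dict at the top level and that at least one
    good pickle is supplied.
    """
    # Keys that every known-good pickle has.
    common = set.intersection(*(set(g) for g in good_pickles))
    report = {}
    for key in sorted(common):
        if key not in bad_pickle:
            report[key] = "missing"
        elif all(type(g[key]) is not type(bad_pickle[key]) for g in good_pickles):
            report[key] = "type differs from all good pickles"
        elif not bad_pickle[key] and all(g[key] for g in good_pickles):
            report[key] = "empty in bad pickle only"
    return report
```

With three good pickles loaded through `load_pickle`, calling `suspicious_keys([good1, good2, good3], bad)` would list only the differences shared against all three, which narrows down where to look. It says nothing about which of those differences actually breaks the mirrorlist server.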
18:45:03 <nirik> smooge: did that answer the question? anything more on mm2 (without pingou around)
18:45:20 * oddshocks wasn't sure if other people knew more than he did about what causes the pickle to be bad
18:45:21 <smooge> well other than "we are moving to it next week"
18:45:38 <oddshocks> when does pingou get back again?
18:45:38 <puiterwijk> oddshocks: I have some notes about bad pickles. will look them up for you
18:45:39 <nirik> next week.
18:45:43 <oddshocks> puiterwijk: _awesome_, thanks
18:45:47 <oddshocks> nirik: cool
18:45:56 <nirik> I don't know if we can move to it before beta, but it would be nice.
18:45:58 <smooge> oddshocks, they are magic to me
18:46:12 <nirik> we could also retire bapp02, app01.stg from this, so that would be all good.
18:46:43 <smooge> yay!
18:46:46 <smooge> ok that is all I needed
18:47:00 <nirik> ok, I didn't have anyone signed up to tell us all about an app, so hey, I guess I will randomly pick one to talk about... how about collectd! :)
18:47:08 <nirik> #topic Learn about: collectd
18:47:18 <smooge> ah man.. I was about to leave too.
18:47:28 <nirik> ha ha ha.
18:47:31 <nirik> https://admin.fedoraproject.org/collectd/
18:47:37 <nirik> giving a dns error. neat. ;)
18:47:49 <puiterwijk> demo effect
18:47:55 <puiterwijk> but works for me, so it's proxy-local
18:48:19 <nirik> ok, fixed.
18:48:26 <nirik> its log01's vpn. ;)
18:48:51 <nirik> anyhow, we have this application called collectd. It runs an agent on various machines and reports back to a central version on log01.
18:49:12 <nirik> that version takes the data in as rrdtool files and then can display it on the above web page.
18:49:31 <nirik> it's kinda clunky, but it can give big picture graphs of things.
18:50:05 <nirik> anytime you see a host in the list that is NOT a fully qualified domain name, it's an old host still in puppet/rhel6
18:50:39 <nirik> you can zoom on graphs pretty close, it takes quite a lot of readings.
18:51:01 <nirik> there are plugins for various things, including some we have written ourselves.
18:51:23 <nirik> threebean has http://threebean.org/fedmsg-health.html which uses some of these.
18:52:09 <nirik> any questions or comments on collectd? ;)
18:52:36 <mhurron> it is pretty ugly isn't it :P
18:52:46 <nirik> yeah, it's not a winner on the interface. ;)
18:53:05 <puiterwijk> it's a well-designed admin tool - it shows what it needs to, in a concise interface :)_
18:53:23 <nirik> heh
18:53:29 <nirik> #topic Meeting process
18:53:43 <nirik> just real quickly, do we want to keep doing the gobby document meeting process?
18:53:45 <mhurron> well ... if it showed what it needed to wouldn't it show abnormalities on the front page?
18:54:03 <nirik> mhurron: well, it doesn't know what abnormal is. It only reports the news. ;)
18:55:07 <nirik> anyhow, I know gobby has issues, but I like the shared document thing... but I'm happy to go back to the old meeting format if people prefer.
18:55:07 <puiterwijk> nirik: I think this process works fine
18:55:33 <smooge> now that i have gobby working on my laptop I do too
18:56:48 <nirik> yeah, having to have a special client is a pain. As is, to me, having to reenter the password and reconnect every time I reboot or move networks, and its autosave doesn't work right across reboots of the server.
18:56:51 <nirik> otherwise it's fine. ;)
18:56:57 <threebean> yeah, I like the new process still.  I was late to update the gobby document this week, fwiw.. but hope to do it earlier in subsequent weeks.
18:57:18 <nirik> I might look into *pads again and see if we can find a web-based one we can actually package and deploy
18:57:40 <nirik> #topic Open Floor
18:57:45 <nirik> anyone have items for open floor?
18:57:47 <puiterwijk> I have a very quick thing
18:58:02 <puiterwijk> I just ran a first test run of the gcode/gitorious script, and found 90 projects so far
18:58:09 <puiterwijk> (of the 5000 or so in anitya)
18:58:44 <smooge> puiterwijk, did you do that in awk?
18:58:47 <smooge> :)
18:58:48 <nirik> quite a few
18:58:49 <puiterwijk> smooge: yup :)
18:59:08 <puiterwijk> nirik: yeah, but not a lot of "important" ones to us
18:59:13 * smooge goes off to corrupt more people so his mod_awk httpd module will be used
18:59:24 <puiterwijk> but I'll refine it a bit and make a more complete list
18:59:30 <puiterwijk> smooge: mod_awk? tell me more :-)
18:59:36 <nirik> ha.
18:59:43 <nirik> ok, if nothing else will close out in a minute or so
18:59:45 <smooge> puiterwijk, ok thanks. pastebin me the code
19:00:17 <puiterwijk> smooge: you mean for mod_awk (which I don't have... yet)? or the gcode/gitorious stuff?
19:00:29 <relrod> hah
19:00:29 <smooge> gcode/gitorious
19:01:04 <nirik> thanks for coming everyone!
19:01:06 <puiterwijk> smooge: yeah, will send it out when I get more stuff added (I want to also check the distgit repos)
19:01:12 <smooge> its like mod_perl but even creakier.. /* I started on this as a joke back in 1997.. please don't make me look anymore */
19:01:23 <nirik> #endmeeting