infrastructure
LOGS
19:00:00 <nirik> #startmeeting Infrastructure (2011-04-21)
19:00:00 <zodbot> Meeting started Thu Apr 21 19:00:00 2011 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:00 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:00 <nirik> #meetingname infrastructure
19:00:00 <nirik> #topic Robot Roll Call
19:00:00 <nirik> #chair goozbach smooge skvidal codeblock ricky nirik
19:00:00 <zodbot> The meeting name has been set to 'infrastructure'
19:00:00 <zodbot> Current chairs: codeblock goozbach nirik ricky skvidal smooge
19:00:11 <CodeBlock> wow right on :00 :D
19:00:12 <nirik> who all is around for a lovely infrastructure meeting?
19:00:16 * skvidal is here
19:00:17 * CodeBlock is here
19:00:28 <goozbach> hare
19:00:55 <StylusEater> here
19:01:06 * jbass29503 here am new
19:01:14 * skvidal is there
19:01:19 * skvidal is everywhere
19:01:28 * jsmith lurks
19:01:29 * skvidal hums
19:01:33 <nirik> #topic new folks introduction (don't be shy!)
19:01:54 <nirik> so, any new folks who are looking for things to do? or would like to say Hi?
19:02:08 <jsmith> "Hello!  My name is Jared, and I'm a Fedoraholic..."
19:02:13 <jbass29503> ok, I guess thats me
19:02:29 <Southern_Gentlem> hello
19:02:31 <jbass29503> hi fedora infrastructure team
19:02:33 <jsmith> Welcome jbass29503!
19:02:35 <smooge> hellow
19:02:53 * StylusEater isn't really part of the team...used to help with a few packages and lurks a bit
19:03:04 <nirik> welcome jbass29503 and StylusEater. :)
19:03:12 <nirik> any areas you guys are interested in?
19:03:30 <StylusEater> I think I mentioned a while back about helping with nagios.
19:03:37 <Southern_Gentlem> nirik,  in anything you need help in
19:03:55 <nirik> cool. There's always nagios tweaking that needs doing...
19:04:02 <StylusEater> any programming help needed ... I'd prefer to work on that...
19:04:21 <nirik> StylusEater: well, much of our stuff is turbogears/python type stuff.
19:04:31 <jbass29503> am not half bad at nagios, and see you have a nagios upgrade, would like to assist or at least be part of that to learn fop processes
19:04:41 <StylusEater> nirik: not familiar with the turbogears framework but I've written python code.
19:04:57 <skvidal> StylusEater: if you would like to become an expert in fas  - we could use additional eyes
19:05:11 <StylusEater> nirik: I've used web.py and built some of my own templating stuff.
19:05:23 <StylusEater> skvidal: account system?
19:05:28 <skvidal> yep
19:05:34 <StylusEater> skvidal: kk
19:06:04 <nirik> on the programming side, ricky / abadger2001 / lmacken would be the folks to talk with. You can look for tickets or look at code and see what you want to work on anytime of course.
19:06:10 <skvidal> grab a copy of the code from hosted - there are some todo list items that I know of that would be great to research some - like openid 2.0 providing
19:06:21 <nirik> for nagios CodeBlock is going to be working on the migration...
19:06:23 <StylusEater> skvidal: toshio is probably super overloaded...
19:06:37 <skvidal> StylusEater: which is exactly why more eyes is helpful
19:06:43 <nirik> StylusEater: he's off on vacation right now, but always willing to help get someone up to speed.
19:06:46 <StylusEater> skvidal: yup
19:07:24 <nirik> otherwise, if you guys want to lurk in #fedora-admin and #fedora-noc, things come up all the time... please ask questions or offer to assist with things you are interested in.
19:07:40 <skvidal> nirik: also - unless I'm smoking dope
19:07:52 <skvidal> fi-apprentice works
19:07:53 <skvidal> as a group
19:08:09 <skvidal> so deciding to join that should let people LOGIN to systems w/o giving them much in the way of access
19:08:14 <nirik> yeah, can we add a few more of us as sponsors/admins? I'm all for starting to use that.
19:08:41 <skvidal> sure
19:08:43 * skvidal does so
19:09:37 <nirik> how do we want to handle it? add people in as we like, and remove after some timeout if they are no longer active?
19:09:39 * CodeBlock is fine with sponsoring some people, now that my semester is coming to an end
19:09:47 <CodeBlock> I'll have some time to work with people and such
19:10:15 <nirik> excellent.
19:10:20 <StylusEater> nirik sponsored my old packages
19:10:21 <skvidal> nirik: done
19:10:38 <nirik> ok, so, welcome new folks... please hang around and ask questions or chime in. ;)
19:10:41 <nirik> skvidal: thanks.
19:10:45 <nirik> StylusEater: happy to.
19:10:50 <StylusEater> I saw the apprentice option but my free time is ... err ... unpredictable
19:10:59 <cyberbyte> hi, sorry i'm late
19:11:34 <nirik> cyberbyte: no worries. ;)
19:11:39 <nirik> #topic Upcoming outages and work items
19:11:54 <nirik> So, the next few weeks I have:
19:11:58 <nirik> 2011-04-21 at 20UTC: fas01 migration to new host.
19:11:58 <nirik> 2011-04-25 or so: puppet update on puppet1
19:11:58 <nirik> 2011-05-02 or so: fpca change (short fas outage)
19:11:58 <nirik> 2011-05-10 final freeze
19:12:15 <nirik> if anyone would like to schedule other items in there, let me know...
19:12:33 <nirik> we may have the pkgs01 branch changes from Oxf13 before final freeze sometime.
19:12:49 <nirik> Anyone have other items there? or comments?
19:13:08 * ricky shows up for a bit, sorry
19:13:11 <CodeBlock> mmm, zodbot move to value01 -- at some point soonish.
19:13:21 <Oxf13> I filed a ticket about that
19:13:52 <nirik> Oxf13: yep. Would be good to do before final freeze?
19:14:06 <nirik> CodeBlock: ok. I think most anytime with no meeting should work.
19:14:11 <Oxf13> when is that?  (and probably yes)
19:14:14 <skvidal> ricky!
19:14:32 <nirik> Oxf13: 2011-05-10
19:14:47 <CodeBlock> skvidal: man, I wish people would get that excited when I show up to stuff. :P
19:15:01 <skvidal> CodeBlock: we can't miss you if you don't go away! :)
19:15:01 <Oxf13> nirik: yeah, I'd hope to have it done by then
19:15:10 <StylusEater> skvidal: I see fi-apprentice is invite only?
19:15:21 <skvidal> StylusEater: yes
19:15:22 <nirik> Oxf13: ok, cool. We can see what fesco wants to do. It sounds like a pretty short outage.
19:15:24 <skvidal> on purpose - really
19:15:42 <nirik> anyhow, moving along...
19:15:45 <skvidal> StylusEater: so we don't get a lot of people who are not REALLY interested signing up and making it hard to figure out who is who
19:15:47 <StylusEater> skvidal: yes. I understand. "...and ask for assistance getting started"
19:15:51 <skvidal> StylusEater: right
19:15:58 <nirik> #topic Post release housecleaning tasks
19:16:22 <nirik> So, I'd like to look at having a set of tasks we do some weeks after every release.
19:16:42 <nirik> I think it makes sense to tie them to our release cycle instead of 90days or whatever since then we don't run into freezes, etc.
19:16:46 <nirik> https://fedoraproject.org/wiki/Infrastructure_post_release_housekeeping
19:16:53 <nirik> is the page I whipped up on it.
19:17:19 <nirik> Are there other tasks that would be good to add? Any comments on the ones there or more info on them?
19:18:24 <ricky> Wonder if that's a good time to do somewhat regular rebuilds of certain machines so that they happen at a less invasive time
19:18:42 <skvidal> ricky: I'd be happy to see that
19:18:51 <skvidal> app## and proxy## would be nice to make more regular and easier
19:19:26 <nirik> yeah, that might be good.
19:19:44 <nirik> oh, and ping publictest people... "are you still using this?"
19:20:23 <skvidal> well - Ideally
19:20:33 <skvidal> I'd like to just do away with publictest as is entirely
19:20:35 <skvidal> speaking of that
19:20:41 <skvidal> if anyone wants to take up that task
19:20:56 <skvidal> of a very small system (to be run on puppet1, even) to track the publictest## boxes
19:21:01 <skvidal> and dispose of them at will
19:21:07 <skvidal> that would be great, imo
19:21:36 <skvidal> sorry for getting away from the convo
19:21:38 <nirik> ok, we do have the wiki pages for them.
19:21:49 * nirik also purged some recently that were no longer being used.
19:21:51 <skvidal> nirik: right - but those grow stale and we could set real timeouts
19:21:58 <nirik> agreed.
19:22:02 <skvidal> we talked about this before - but it didn't go anywhere
19:22:05 <nirik> so a small tracking app...
19:22:24 <nirik> "foo has been using publictest-3.14 for more than 6 months"
19:22:51 <CodeBlock> 3.14, nice. ;D
19:23:13 * nirik updates the wiki page with a few more things.
19:23:23 <nirik> so, feel free to edit/fill out or discuss anything on there.
19:23:36 <nirik> I'd like to try it out say 3-4 weeks after f15 is released...
19:24:09 <skvidal> sounds reasonable
19:24:34 <nirik> #topic Meeting tagged tickets
19:24:42 <nirik> https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority
19:24:51 <nirik> anyone have a meeting tagged ticket they wish to discuss?
19:25:45 <tibbs> Hmm, I got a "database is locked" error trying to view that link.
19:25:52 <tibbs> Went away on reload.
19:26:13 <nirik> yeah, same here. I think too many people hit it at once.
19:26:20 <ricky> Interesting that it'd error out instead of waiting on it for a bit
19:27:20 <tibbs> Didn't mean to interrupt, but is kind of an infrastructure issue.
19:27:21 <nirik> anyhow, will move on if there's nothing else to call out from there.
19:27:32 <nirik> tibbs: yeah, our trac version is ancient too. Can't help
19:27:34 <jbass29503> so who would be the best person to reach out to for the nagios one?
19:27:59 <StylusEater> !ticket 2275
19:28:04 * StylusEater fails
19:28:06 <nirik> jbass29503: check with CodeBlock for the migration. I think we are doing that after f15 release...
19:28:12 <nirik> .ticket 2275
19:28:13 <zodbot> nirik: #2275 (Upgrade Nagios) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2275
19:28:23 <StylusEater> nirik: success!
19:28:25 <CodeBlock> correct, after f15 release
19:28:40 <CodeBlock> should go pretty smoothly
19:29:00 <nirik> jbass29503: if you just want to poke at it, there's several other nagios tickets I think. Or we could get you able to look at the config and suggest additions...
19:29:09 <CodeBlock> If we wanted to set up a proxy path, say admin.fpo/nagios-test to go to noc03-tmp for some quick testing meanwhile, we could do that
19:29:51 * ricky debates a little whether nagios should be behind a proxy or not
19:30:03 <nirik> yeah, although the noc03-tmp probably doesn't have perms to talk to all the things it needs to monitor?
19:30:05 <ricky> I remember hearing skvidal was looking at just poking holes in the firewall for it
19:30:32 <skvidal> so lemme make sure I grok this
19:30:34 <nirik> I think for host checks that makes sense...
19:30:35 <ricky> And might as well poke 80/443 or whatever as well and not have it depend on the proxies being up.  Then again, it is just the web interface, so it's not really a big deal.
19:30:38 <skvidal> we want to have a piece of infrastructure
19:30:41 <skvidal> that can and DOES break
19:30:47 <skvidal> in front of the thing we use to determine what IS broken?
19:31:07 * skvidal gets egg
19:31:10 * skvidal gets chicken
19:31:11 <nirik> well, just its web interface
19:31:27 <skvidal> nirik: how do you determine what's down when you get a notice at 2am?
19:31:43 <jbass29503> hm, well you could use an 'intermediate' host, instead of allowing the server running nagios, this may be what ricky is speaking about?
19:32:11 <nirik> skvidal: well, look at the notice and see what it says, then check/ping/login to it and see...
19:32:27 <nirik> if it's up I often ssh to noc01 and run the check command myself to see what it's checking.
19:32:44 <skvidal> nirik: right - what I have found is when things go sideways - it tends to be multiple things going sideways
19:32:53 <nirik> yeah.
19:33:00 <skvidal> so being able to see an overview....
19:33:20 <nirik> nagios is a horrible monitoring solution, it's just better than all the rest. ;)
19:33:52 <skvidal> okay - fine
19:33:52 <CodeBlock> I don't see any reason that we need to proxy it. No load balancing (it's one server), hardly any caching needed (sans stylesheets, which we could config apache to cache if it was a concern)
19:34:04 <skvidal> so - that's what I mean
19:34:16 <skvidal> right now our proxy keeps people from hitting an apache webserver and the cgi scripts
19:34:23 <skvidal> by hiding it behind a proxy + ssl + apache
19:34:25 <skvidal> so.... ummm
19:34:27 <ricky> So what I was suggesting above was to just allow 80/443 to noc01 and have it be outside-facing.  It shouldn't be too hard to do
19:34:29 <skvidal> aren't we masking apache from apache?
19:34:35 <skvidal> ricky: exactly
19:34:38 <skvidal> ricky: +1 to that
19:34:39 <ricky> But I think we're in agreement here
19:34:41 * nirik is fine with that.
19:34:46 <CodeBlock> yeah
19:34:50 <nirik> nagios.fedoraproject.org or something.
19:34:55 <CodeBlock> yeah
19:35:19 <nirik> ok, any other meeting tagged tickets?
19:35:19 <ricky> Or noc01 and noc02 even, to make things consistent
19:35:37 <nirik> yeah... although it's nagios and nagios-external in puppet.
19:35:42 <StylusEater> nirik ... we can't use monit i guess
19:36:16 <nirik> monit is pretty limited I thought...
19:36:20 <jbass29503> not sure I see any benefit in using a proxy for the nagios interface, unless you are using nagios in a disturbed fashion
19:36:24 <CodeBlock> I've seen a few people suggest monit, never looked into it though
19:36:42 <skvidal> jbass29503: I hope you meant distributed - but curiously enough 'disturbed' works too
19:36:43 <nirik> we did try zabbix.
19:36:50 <StylusEater> i use it at work
19:36:55 <skvidal> and that was a disaster
19:37:04 <skvidal> zabbix, that is
19:37:07 <jbass29503> skvidal: sorry yes
19:37:13 <StylusEater> nirik ... it can be ... but it's scriptable/customizable
19:37:15 <nirik> yeah, failure.
19:37:49 * skvidal wonders if monit is born from the codebase of 'mon'
19:37:59 <StylusEater> i believe it is
19:38:00 <skvidal> I used to love the txt-output status checks
19:38:02 <skvidal> ah
19:38:23 <nirik> well, I think for the most part nagios works for us... we need to try and get it so that things are fixed and alerts are rare/only when there are real issues however.
19:38:29 <StylusEater> mmonit.com/monit
19:38:39 <skvidal> StylusEater: nod
19:38:45 <nirik> It's better than it was, but still noisy.
19:38:45 <skvidal> nirik: agreed
19:39:33 <nirik> anyhow, patches or suggestions on improving our nagios setup welcome.
19:39:37 <nirik> #topic Open Floor
19:39:42 * StylusEater thinks there really aren't any good foss monitoring tools ... all are noisy
19:39:42 <nirik> anyone have anything for open floor?
19:39:46 <StylusEater> even paid ones are noisy
19:40:22 <ricky> Better of two evils - too noisy or too quiet :-)
19:40:30 <casep> Hi, sorry I was late for introductions, I was /trying/ to work with Toshio
19:40:40 <nirik> yeah. Ideally it's a balance between: fix things so they are not ever seeing problems vs making the monitoring too lax and you never are alerted about problems that exist.
19:40:57 <nirik> casep: welcome. ;) He's out on vacation right now...
19:40:58 <ricky> casep: Ah, yeah he's on vacation now
19:41:06 <casep> but I think python/real developing is not my best side
19:41:21 <casep> so I think I could help in other issues
19:41:23 <StylusEater> i guess it's a fundamental network design problem really ... so striking a balance is really like catching water
19:42:19 <smooge> zabbix was a replacement for something else that went crap up also in implementation I believe
19:42:29 <goozbach> I'm a fan of opennms but it's java
19:42:34 <goozbach> *shudder* :)
19:42:43 <skvidal> goozbach: and it wants to eat everything else you do
19:42:47 <skvidal> so for openfloor
19:42:59 <skvidal> if someone wants to go through puppet and look for any/everything using snmp
19:43:05 <skvidal> and figure out how we can put a bullet in it
19:43:07 <nirik> casep: sounds good. Do hang out in #fedora-admin and #fedora-noc, look thru tickets and ask questions/offer to look at things. ;)
19:43:09 <skvidal> that would be a good thing
19:43:22 <ricky> +1 to that :-)
19:43:25 <casep> nirik: done
19:43:35 <goozbach> nirik: question for the meetings...
19:43:42 <nirik> goozbach: sure, shoot
19:43:58 <goozbach> you want I still do announcement/notes/minutes
19:44:05 <goozbach> or would you rather handle them?
19:44:20 * goozbach has been flaky in that regard lately
19:44:28 <goozbach> w/ the transition of power :)
19:44:43 <nirik> either way. Happy to do whatever works. If you would be interested in sending minutes and announcement I could run the meetings?
19:45:23 <smooge> I liked having someone do that.
19:45:26 <smooge> it really helped
19:45:33 <smooge> thankyou goozbach
19:45:34 <nirik> yeah.
19:45:55 <goozbach> ok so I'll handle announcement, agenda reminder, and minutes
19:46:00 <goozbach> and let you run the meeting
19:46:05 <goozbach> best of both worlds
19:46:17 <goozbach> this brings up another question
19:46:23 <nirik> sounds good. ping me before sending and I can see if I need to add anything.
19:46:24 <StylusEater> I've been posting our meeting notes after cleaning up the wiki section so I'll keep doing that and start reading the code base for fas.
19:46:33 <nirik> StylusEater: sounds great.
19:46:33 <goozbach> is it possible to do out-of-band meeting notes with zodbot?
19:46:50 <nirik> goozbach: not sure what you mean...
19:46:52 <ricky> Not that I know of
19:46:57 <goozbach> instead of me taking notes by spamming the meeting channel
19:47:01 <ricky> I assume he means keeping notes in /msg instead of in channel
19:47:09 <nirik> ah, nope.
19:47:17 <nirik> but thats a good suggestion for upstream.
19:47:18 <goozbach> #action goozbach to do meeting announcement and minutes
19:47:29 * StylusEater is glad we switched to git
19:47:50 <nirik> ok, anything else? or shall we call it a meeting today?
19:48:04 * CodeBlock has nothing else; goes to see why app02 is evil.
19:48:40 * ricky goes off for a bit, see you later - will hopefully be around a little bit this weekend depending on how hectic things are :-/
19:48:54 <nirik> Thanks for coming everyone... continue discussion over in #fedora-admin.
19:48:56 <nirik> #endmeeting