fedora-meeting
20:02:43 <smooge> #startmeeting Fedora Infrastructure 2010-03-11
20:02:43 <zodbot> Meeting started Thu Mar 11 20:02:43 2010 UTC.  The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:02:45 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:06:51 <smooge> thank you zodbot
20:03:09 <smooge> mmcgrath is finishing up a meeting so I am just helping along
20:03:26 <smooge> #topic Roll Call
20:03:29 * ricky 
20:03:32 <smooge> smooge
20:03:36 * nirik is around.
20:03:36 * a-k is here
20:04:05 <smooge> mdomsch said he was here a second ago before I started the meeting
20:04:20 <smooge> mmcgrath will be here soon.
20:04:37 * ayoung is here
20:04:40 <smooge> ok let me pull up the other standard things we do in the meeting
20:05:37 <smooge> #topic Meeting Tickets
20:06:39 <smooge> According to trac.. there are no tickets for this meeting :)
20:06:52 <smooge> am i missing anything people know?
20:07:06 <smooge> ok moving on
20:07:09 <smooge> #topic Alpha Release - https://fedorahosted.org/fedora-infrastructure/report/9
20:07:20 <smooge> alpha release was on Tuesday
20:08:31 <smooge> ok tickets
20:08:43 <smooge> sorry for my slowness guys.. my greps aren't as fast as I hoped
20:08:49 <smooge> .ticket 1944
20:08:52 <zodbot> smooge: #1944 (Fedora 13 Alpha Partial Infrastructure Freeze 16/Feb - 3/March) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1944
20:08:58 <a-k> # topic?
20:09:12 <mmcgrath> yo
20:09:14 * mmcgrath here :)
20:09:22 <smooge> hi mmcgrath
20:09:27 <mmcgrath> herro
20:09:36 <mmcgrath> smooge: you want to keep going or you want me to take over?
20:09:43 <smooge> hopefully I am helping here ... but it's been real quiet.
20:09:46 <smooge> you can take over...
20:09:53 <mmcgrath> alrighty
20:09:54 <smooge> you get people to say things :)
20:10:06 <smooge> #chair mmcgrath
20:10:07 <zodbot> Current chairs: mmcgrath smooge
20:10:12 <mmcgrath> hehehe
20:10:18 <mmcgrath> so this alpha release went fine, but with oddities.
20:10:25 <mmcgrath> I'll go ahead and close
20:10:27 <mmcgrath> .ticket 1944
20:10:28 <mmcgrath> and
20:10:29 <zodbot> mmcgrath: #1944 (Fedora 13 Alpha Partial Infrastructure Freeze 16/Feb - 3/March) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1944
20:10:30 <mmcgrath> .ticket 1990
20:10:33 <zodbot> mmcgrath: #1990 (Release Day Ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1990
20:10:40 <mmcgrath> let's talk about .ticket 1992
20:10:43 <mmcgrath> .ticket 1992
20:10:44 <zodbot> mmcgrath: #1992 (Lessons Learned) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1992
20:11:04 <mmcgrath> So the first problem we actually ran into was that bapp1 had puppet disabled.
20:11:09 <mmcgrath> ricky: you around by chance?
20:11:11 <ricky> Yup
20:11:29 <mmcgrath> so having puppet disabled on bapp1 did what exactly?
20:12:05 <ricky> syncStatic is run by the apache user on bapp01 now, so that prevented it from getting the change to update the website
20:12:27 <mmcgrath> got'cha, so the new syncStatic didn't make it onto the server, so the new website didn't make it to the proxy servers.
20:12:44 <mmcgrath> this is a monitoring thing, so when we get to monitoring which servers have puppet disabled, that one will go away
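[Editor's note: a check like the one mmcgrath describes could be as simple as testing for puppet's disable lockfile. A minimal sketch, assuming the lockfile path older puppet releases used — treat the path as an assumption and verify for your version:]

```shell
# Hypothetical Nagios-style check: warn when puppet has been
# administratively disabled on a host. The default lockfile path is
# an assumption based on older puppet releases; adjust as needed.
check_puppet_enabled() {
    lockfile="${1:-/var/lib/puppet/state/puppetdlock}"
    if [ -e "$lockfile" ]; then
        echo "WARNING: puppet is disabled (found $lockfile)"
        return 1
    fi
    echo "OK: puppet is enabled"
    return 0
}

check_puppet_enabled "$@"
```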
20:12:57 <mmcgrath> The other thing that happened was with our i2 netapp
20:13:09 <mmcgrath> Smooge and I just got off a meeting with the storage team about this.
20:13:19 <mmcgrath> basically it took several hours to transfer 16G worth of blocks.
20:13:35 <mmcgrath> the temporary fix was to put our sync traffic at a higher QoS than other traffic
20:13:44 <mmcgrath> long term though there was actually something wrong with the link... again.
20:13:53 <mmcgrath> they know and they're working on it
20:14:02 <mmcgrath> but it's certainly something we're going to want to track ourselves.
20:14:25 <mmcgrath> But, really the alpha went well even with those things... for non i-2 users anyway.
20:14:34 <mmcgrath> we had 85%+ good mirror rate
20:15:00 <mmcgrath> And the last thing I'd say is boot.fedoraproject.org
20:15:01 <smooge> and once the I-2 got up I think they had little issues too
20:15:03 <mmcgrath> Oxf13: ping
20:15:08 <mmcgrath> smooge: indeed
20:15:57 <mmcgrath> well, right now I just did boot.fedoraproject.org
20:16:02 <mmcgrath> but it dawns on me that's sort of a releng task.
20:16:04 <mmcgrath> one that I'm happy to do
20:16:17 <mmcgrath> but I wanted to check with Oxf13 as to where he thinks the SOP should sit, in Infrastructure or RELENG.
20:16:30 <mmcgrath> it's a minor distinction in this case, but making one group responsible will ensure that it always gets done :)
20:16:52 <mmcgrath> So anyway
20:17:01 <mmcgrath> anyone have any other questions about the alpha release?
20:17:37 <mmcgrath> alrighty
20:17:38 <mmcgrath> well
20:17:41 <mmcgrath> #topic Next freeze
20:17:46 <mmcgrath> The next freeze is coming up pretty quickly
20:17:51 <mmcgrath> #link http://fedoraproject.org/wiki/Schedule
20:18:00 <mmcgrath> Infrastructure will be freezing on the 23rd.
20:18:03 <mmcgrath> that's less than 2 weeks.
20:18:14 <mmcgrath> so keep that in mind as you're deploying new things.
20:18:24 <smooge> so we should go slushy
20:18:29 <smooge> ?
20:18:37 <mmcgrath> slushy?
20:19:04 <ricky> Ice, slushy, freeze?  :-)
20:19:04 <mdomsch> I have some MM fixes to get out, but may not make it before the next freeze :-(
20:19:20 <mmcgrath> mdomsch: anything I can do to help?
20:19:40 * mdomsch needs to test/fix one part, but it's been a month, so I forget which part it was ;-(
20:20:07 <smooge> sorry, I meant: if you are going to be testing stuff, the next 2 weeks are not the time to do it.
20:20:18 <mmcgrath> yeah
20:20:35 <mmcgrath> I'll try to send a couple of reminders as the time gets closer.
20:20:35 <mdomsch> mmcgrath, nothing huge; I'll get to it, or not...
20:20:39 <mmcgrath> the freeze always sneaks up on us :)
20:21:03 <mmcgrath> anyone have anything else on that?
20:21:05 <smooge> we have updates to do this/next week
20:21:29 <mmcgrath> skvidal: ping
20:21:32 <skvidal> pong
20:21:33 <mmcgrath> smooge: let's talk about that
20:21:37 <mmcgrath> #topic Monthly Update
20:21:38 <smooge> okie dokie
20:21:49 <mmcgrath> skvidal: ok, so the list I've been working through with you is very nearly complete.
20:22:02 <mmcgrath> skvidal: think we'll be in any place to update soon?
20:22:35 <skvidal> yah - I think likely
20:22:40 <skvidal> though not sure TOMORROW is gonna happen
20:22:54 <smooge> that's fine. Tuesday sound ok?
20:23:07 <skvidal> sounds possible - depends what gets set on fire over the weekend
20:23:12 <mmcgrath> Tuesday of next week sounds good; the freeze starts one week after that.
20:23:20 <smooge> yeah which was why I didn't want monday :)
20:23:30 <skvidal> unless there are A LOT of yum bugs that show up in f13a over the weekend
20:23:34 <skvidal> then I should have the time
20:23:37 <smooge> what are the changes you are working on?
20:24:22 <mmcgrath> smooge: I've been removing old hosts from puppet
20:24:35 <mmcgrath> skvidal's been working on having func use those hosts and coming up with a solid update script
20:24:38 <mmcgrath> or func program
20:24:40 <skvidal> smooge: you talking to me or mmcgrath?
20:24:41 <mmcgrath> I'm not sure what they're called.
20:24:45 <skvidal> okay
20:24:49 <skvidal> so here's all it is
20:25:07 <skvidal> 1. make it so our func certs aren't constantly screwed up
20:25:08 <smooge> ah
20:25:18 <skvidal> 2. make it so our func minions match our puppet minions
20:25:42 <skvidal> 3. have a script to let us do searches/installs/updates/etc from puppet1 w/o having to schlep all over the place
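[Editor's note: step 2 above boils down to diffing the puppet host inventory against the func minion inventory. A minimal sketch of that comparison — the file format (one hostname per line) and the names in the test are illustrative assumptions, not the real inventory:]

```shell
# Hypothetical helper for step 2: print hosts that puppet manages but
# func doesn't know about. Each input file is assumed to hold one
# hostname per line.
missing_from_func() {
    # $1 = puppet host list, $2 = func minion list
    grep -F -x -v -f "$2" "$1" | sort -u
}
```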
20:26:15 <skvidal> the script I worked on a couple of weeks ago
20:26:26 <skvidal> and got something mostly functional - but with room for improvement
20:26:37 <mmcgrath> smooge: do you know if this update will require a reboot?
20:26:47 <skvidal> the first 2 are what I spent the week working on to solve the problem that our func minions were completely wrong
20:26:57 <skvidal> and mangled badly from the phx2 move + rename stuff
20:27:20 <smooge> well it depends on if a kernel update gets dropped over the weekend :). At the moment I don't think so
20:27:40 <mmcgrath> yeah, so I've spent most of yesterday afternoon and this morning renaming hosts.  I think I've got 3 hosts left and a little additional cleanup to do
20:28:26 <mmcgrath> OK
20:28:28 <smooge> no the worst will be an openssh restart
20:28:31 <mmcgrath> so anyone have anything else on this?
20:28:32 <mmcgrath> <nod>
20:28:37 <skvidal> umm
20:28:40 <skvidal> I have a couple more things
20:28:50 <mmcgrath> skvidal: have at it
20:29:22 <skvidal> so I'll see if I can get a new func pkg out for all the hosts and a mechanism to update their minion.conf files to point to the puppet certificates
20:29:32 <skvidal> that's going to be the REALLY fun part :)
20:29:52 <mmcgrath> If you need any puppeting help let me know
20:30:02 <mmcgrath> you talking about a new func epel package or in the infra repo?
20:30:03 <skvidal> is minion.conf on the boxes puppet controlled now?
20:30:07 <skvidal> infra repo
20:30:08 <skvidal> for now
20:30:12 <skvidal> it'll eventually make it over to epel
20:30:15 <skvidal> but this is new code
20:30:19 <ricky> I think so
20:30:47 <smooge> and if it isn't I will be happy to schlep what needs to be done
20:31:04 <skvidal> anyway - that's really all
20:31:23 <skvidal> the func-yum overlord script is fairly simple and could be added to by anyone as we go along
20:31:34 <skvidal> that's what's going to do the update/list updates/etc work
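[Editor's note: the func-yum script itself isn't shown in the log. As a rough illustration of the idea, here is a dry-run sketch that only prints the func overlord command it would execute; the name func_yum, the action names, and the interface are assumptions, not the real tool:]

```shell
# Dry-run sketch of a func-yum-style wrapper: map a high-level action
# onto the func overlord invocation that would run it across minions.
# It echoes the command instead of executing it; the real script differs.
func_yum() {
    target="$1"
    action="$2"
    case "$action" in
        list-updates) echo func "$target" call command run "yum -q check-update" ;;
        update)       echo func "$target" call command run "yum -y update" ;;
        install)      shift 2
                      echo func "$target" call command run "yum -y install $*" ;;
        *)            echo "usage: func_yum <target> <list-updates|update|install pkg...>" >&2
                      return 2 ;;
    esac
}
```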
20:31:58 <gholms|work> I just wanted to mention that that is the greatest script name ever.
20:32:05 <mmcgrath> skvidal: and just because I'm overly paranoid... the timeout bug?  all fixed in the new version?
20:32:34 <skvidal> mmcgrath: the timeout bug was fixed long long ago
20:32:45 <skvidal> we've been running an old version of func for quite some time
20:33:17 <dgilmore> sorry im late
20:33:28 <mmcgrath> k
20:33:29 <mmcgrath> dgilmore: no worries
20:33:32 <skvidal> dgilmore: BETRAYER!
20:33:34 <smooge> the only thing I wanted to ask was what would be preferred for making func 'groups/classes'
20:33:37 * skvidal giggles
20:33:43 <smooge> skvidal, hey hey hey.. that's me
20:33:53 <mmcgrath> Ok, anyone have anything else on this?  If not we'll move on.
20:33:54 <skvidal> dgilmore: sorry, I just thought it was hilarious!
20:34:03 <dgilmore> skvidal: i had to get food for dinner and screwed up the times
20:34:05 <smooge> nothing from me
20:34:15 <mmcgrath> ok
20:34:19 <mmcgrath> #topic Search Engine
20:34:24 <mmcgrath> a-k: ping, want to take this?
20:34:29 <a-k> Sure
20:34:46 <a-k> I'm looking again at one of the candidates I eliminated from consideration earlier
20:34:55 <a-k> It advertises the most complete support for Unicode char sets
20:35:05 <a-k> I had eliminated it because I couldn't get it to work with SQLite
20:35:14 <a-k> ... and its user forum had no answered questions about how to get SQLite to work
20:35:16 <mmcgrath> which one was this?
20:35:32 <a-k> mnoGoSearch
20:35:38 <Oxf13> pong
20:35:52 <a-k> So I'm not there yet, but is there a db server I can use in pub test or do I need to install my own?
20:36:00 <a-k> Either MySQL or PostgreSQL should work
20:36:11 <mmcgrath> a-k: we'll have one for when we move to staging and production, but on the pt servers, just yum and install one
20:36:41 <a-k> OK.  I'm still working on it locally, but I'll move to pt when I'm ready
20:36:46 <mmcgrath> cool
20:36:54 <a-k> I think that's it for now
20:37:09 <mmcgrath> a-k: thanks
20:37:18 <mmcgrath> Anyone have any questions for a-k about that?
20:37:21 <dgilmore> a-k: do we really care for sqlite support?
20:37:49 <a-k> I thought SQLite would be easier/preferable to MySQL or PostgreSQL
20:38:03 <mmcgrath> easier for a demo but probably not for production
20:38:10 <a-k> None of the other candidates needed an external db
20:38:27 <mmcgrath> interesting
20:38:34 <mmcgrath> they all had their own local store then?
20:38:38 <a-k> The other ones use their own local db
20:38:48 <mmcgrath> yeah
20:38:54 <mmcgrath> alright, anyone have anything else?
20:39:20 <mmcgrath> alrighty
20:39:23 <mmcgrath> #topic Monitoring
20:39:29 <mmcgrath> So we talked about this on infrastructure for a bit.
20:39:34 <mmcgrath> it leaves us in an awkward position
20:39:38 <mmcgrath> do we just dump zabbix now?
20:39:40 <smooge> mmcgrath, did you want to get Oxf13 while he was here.
20:39:42 <mmcgrath> go back to nagios for a bit?
20:39:43 <dgilmore> mmcgrath: yes
20:39:45 <mmcgrath> smooge: oh right
20:40:11 <mmcgrath> Oxf13: did you want me to put the boot.fedoraproject.org as a releng SOP or an infrastructure SOP?  it seems more releng, I'm happy to do it for as long as I'm part of Fedora but it should be documented :)
20:40:30 <Oxf13> that's a good question.
20:40:46 <Oxf13> it does sound relengy
20:41:16 <mmcgrath> it's pretty easy to maintain, I just want to make sure it actually gets done every release.
20:41:37 <mmcgrath> Oxf13: your call.  I'll write it up this week sometime, just let me know :)
* dgilmore thinks it's a releng thing
20:42:16 * mdomsch says releng ;-)
20:42:33 <mmcgrath> alrighty, well unless Oxf13 says otherwise I'll put it as a marketing sop :-P
20:42:45 <mmcgrath> naw, I'll just put it in releng for now and if we change our minds later we can move it.
20:42:55 <mmcgrath> ok, back to monitoring.
20:42:57 <mdomsch> mmcgrath, unless you write a script to run when releng pushes a tree, to update bko automatically.  then it's an infra ticket that gets opened and closed automagically
20:42:58 <ricky> I think zabbix has been taking away effort on improving our nagios monitoring, so I'm up for dumping
20:43:09 <Oxf13> releng works for us, and if you come with content that's even better (:
20:43:34 <mmcgrath> ricky: so the problem then is trending.
20:43:38 <mmcgrath> our cacti install
20:43:50 <mmcgrath> doesn't even seem to exist anymore :)
20:44:15 <ricky> Apart from request times for our websites, what do you want to be able to monitor that cacti can't?
20:44:18 <smooge> can I work on that with someone?
20:44:20 <ricky> Well, easily can't
20:44:29 <mmcgrath> ricky: well, the problem is that we have to enter data in two locations
20:44:33 <mmcgrath> and that has inherent problems.
20:44:56 <mmcgrath> and doing custom trending in cacti can be tricky at times.
20:45:25 <ricky> What kind of problems?  You still get your notification when something goes down and the general idea of movements, right?
20:45:26 <mmcgrath> anyway, let's take a look at our current zabbix install
20:45:28 <mmcgrath> err nagios install
20:45:30 <mmcgrath> and see how it goes.
20:45:44 <mmcgrath> ricky: ehh, I use trending a lot, to see when things started.
20:45:54 <mmcgrath> the alert fires when a threshold is crossed, but it's usually only a small part of the story.
20:45:56 <ricky> I've also not been crazy about running the zabbix agent public facing either.
20:46:11 <mmcgrath> for example when MM had the bloated pickle issues.
20:46:21 <mmcgrath> we didn't get the alert until days after MM was upgraded.
20:46:30 <mmcgrath> without trending we wouldn't have noticed when the problems started
20:46:38 <ricky> Yeah, but cacti would still have given you the big picture you needed, right?
20:46:44 <dgilmore> ricky: i rember you disabled catci
20:46:44 <mmcgrath> if it were in there.
20:46:55 <mmcgrath> trying to keep cacti and nagios in sync is going to be a pretty big pain.
20:47:11 <mmcgrath> especially when we start wanting to monitor arbitrary bits of info with cacti
20:47:16 <smooge> ricky, that is my major problem with it also
20:47:19 <mmcgrath> it gets complex pretty quick
20:47:31 <smooge> I think we should have one agent per server.. and thats func
20:47:38 <ricky> dgilmore: Probably, I don't like running public-facing stuff we're not using :-)
20:47:58 <dgilmore> ricky: from memory there was a security bug in cacti
20:48:01 <mmcgrath> ricky: FWIW, I still use zabbix, it's sitting on my desktop now and beeps at me :)
20:48:05 <dgilmore> and rather than fix it you disabled it
20:48:12 <dgilmore> but i could be remembering wrong
20:48:30 <mmcgrath> that could be
20:48:40 <ricky> Sounds like me alright
20:48:53 <mmcgrath> well no rush to make a decision today.  nagios alerts are still going out and paged alerts are going out
20:49:04 <mmcgrath> and at the moment, zabbix does have trending of some of the important things we need
20:49:07 <mmcgrath> like /mnt/koji usage
20:49:12 <ricky> The question is whether we should be spending time on improving nagios monitoring
20:49:25 <ricky> Like adding checks and keeping it in sync with hosts.  That stuff has mostly stagnated since we looked at zabbix
20:49:33 <mmcgrath> yeah
20:49:37 <mmcgrath> I'm not sure how out of sync they are.
20:49:42 <mmcgrath> and zabbix has a lot of stuff nagios doesn't monitor.
20:49:47 <mmcgrath> but that nagios doesn't really need to
20:49:47 <ricky> Is that a yeah to "we should be spending time on it" ?  :-)
20:50:04 <mmcgrath> just yeah, that stuff has stagnated
20:50:16 <mmcgrath> I'll take a look at things and try to get a better estimate of the work that needs to be done.
20:50:58 <dgilmore> ricky: i know we have hosts not in nagios
20:51:46 <mmcgrath> yeah
20:51:56 <mmcgrath> but most of our external services are still properly monitored by it
20:52:02 <mmcgrath> properly(ish)
20:52:06 <mmcgrath> ok, so more work to do, I'll get on that.
20:52:13 <mmcgrath> anyone have anything else?  If not I'll open the floor
20:52:32 <smooge> all I want in a monitoring solution is something where adding a host to be monitored can be done via puppet + files
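[Editor's note: the puppet-plus-files model smooge describes usually means one generated config fragment per host dropped into nagios's conf.d directory. A hypothetical fragment — the template names, hostname, and address are all made up for illustration:]

```
# Hypothetical per-host Nagios fragment, managed by puppet: adding a
# monitored host is then just adding one file like this to conf.d/.
define host {
    use        generic-host              ; assumed template name
    host_name  app01.fedora.phx.example
    address    10.0.0.11
}

define service {
    use                  generic-service  ; assumed template name
    host_name            app01.fedora.phx.example
    service_description  SSH
    check_command        check_ssh
}
```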
20:53:13 <mmcgrath> yeah
20:53:13 <mmcgrath> ok
20:53:16 <ricky> Same here - the question seems to be what we want for a trending solution :-/
20:53:18 <smooge> I hate clicking on things or depending on having to click on things. But open floor
20:53:18 <mmcgrath> #topic open floor
20:53:47 <mmcgrath> ricky: I would love it if something like sar had the ability for arbitrary input :)
20:54:02 * gholms|work raises hand
20:54:06 <mmcgrath> gholms|work: yo
20:54:07 <gholms|work> Do you folks still plan on moving to Zenoss once 3.0 is released?
20:54:21 <mmcgrath> gholms|work: I don't think we ever planned on moving to Zenoss
20:54:28 <mmcgrath> so not to still, and no to zenoss 3.0 :)
20:54:37 <gholms|work> Heh, ok.
20:54:42 * gholms|work wonders where that idea came from...
20:54:58 <mmcgrath> I thought zenoss wasn't totally free?
20:55:02 <mmcgrath> I know it's not in Fedora yet, and that's a requisite
20:55:27 <gholms|work> It has a FOSS version.  The big hangup was that it relied on bundled Python 2.4.
20:55:30 <mmcgrath> Ok, anyone have anything else to discuss?
20:56:22 <ricky> Just want to get this some visibility
20:56:39 <ricky> Jason Walsh has been working on poseidon, an OpenID provider written in Pylons to replace the broken stuff in FAS
20:56:59 <ricky> So I'll be trying to get it set up with FAS auth on a publictest soon
20:57:16 <mmcgrath> ricky: excellent
20:57:22 <mmcgrath> happy to hear that one's been making progress.
20:57:31 <ricky> Which will hopefully be a good workout for the auth middleware in python-fedora :-)
20:57:31 <mmcgrath> ricky: are we the only users / potential users?
20:57:54 <ricky> Sorry, not sure what you mean
20:58:02 <mmcgrath> does anyone else use poseidon?
20:58:22 <ricky> Oh, no - it was started and written for this
20:58:26 <mmcgrath> k
20:58:39 <dgilmore> ricky: do you think we could have a way to use openid, present the cla and allow wiki edits etc
20:58:56 <dgilmore> and things like bodhi we could use openid sans cla
20:59:03 <dgilmore> for feedback
20:59:33 <ricky> Hm, that could get messy, as openids not linked to an existing FAS account could create all sorts of corner cases
20:59:35 <dgilmore> or am i all sorts of crazy
20:59:38 <mmcgrath> dgilmore: I think this is more of a provider
20:59:48 <mmcgrath> to accept openid from places would require a different sort of work
20:59:58 <dgilmore> mmcgrath: ok
20:59:59 <ricky> This is more of a "packagers can login to upstream bugzillas, etc. with their OpenID" sort of thing.
21:00:13 <ricky> (If those exist - or even just commenting on blogs)
21:00:21 <dgilmore> cool
21:00:23 * abadger1999 shows up in time for open mic^Wfloor
21:00:32 <dgilmore> just throwing my crazy ideas out there
21:00:40 <mmcgrath> yup yup
21:00:47 <mmcgrath> time to close the meeting, if no one has any objections we'll close in 30
21:01:01 * dgilmore wants to hear abadger1999
21:01:08 <ricky> ... sing :-)
21:01:16 <abadger1999> dgilmore: Interpretive dance :-)
21:01:24 <gholms|work> Over IRC?!?
21:01:26 <dgilmore> abadger1999: seen it already
21:01:26 <mmcgrath> we can talk about this next time or in #fedora-admin if we want, let's not hold everyone here :)
21:01:30 <dgilmore> mmcgrath: close it up
21:01:31 <mmcgrath> #endmeeting