20:00:28 <mmcgrath> #startmeeting Infrastructure
20:00:29 <zodbot> Meeting started Thu Apr 22 20:00:28 2010 UTC.  The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:31 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:35 <mmcgrath> zodbot: do as I mean not as I say
20:01:01 <gholms|work> Hehe
20:01:15 * nirik is hanging around in the cheap seats.
20:01:19 <mmcgrath> #topic who's here?
20:01:22 * ricky 
20:01:24 * a-k is
20:01:39 <smooge> here
20:01:41 * Infern4us 
20:01:48 <smooge> needs coffee
20:02:11 * mdomsch 
20:02:14 <mmcgrath> Ok, lets get started
20:02:17 <mmcgrath> #topic Final Release
20:02:25 <mmcgrath> The final F13 release is on the way here pretty quick.
20:02:34 <mmcgrath> Our final freeze goes into place on the 4th IIRC.
20:03:23 <mmcgrath> Anyone have any questions or concerns about that?
20:03:30 <mmcgrath> any major projects to get deployed before then?
20:03:33 <mmcgrath> I have only 2 major changes
20:04:24 <mmcgrath> Alrighty, well we can move on.
20:04:38 <ricky> What are the changes?
20:04:39 <smooge> hmm isn't that the same day as U-10.04?
20:04:52 <mmcgrath> ricky: going into those right now
20:04:56 <mmcgrath> #topic Insight
20:04:58 <mmcgrath> stickster: ping
20:05:24 <mmcgrath> I'm wondering if there's anything we can get in place today now so there's less to do later.
20:05:24 <smooge> mmcgrath, I have no projects for that time. I was going to deploy rsyslog next week
20:05:31 <stickster> mmcgrath: pong
20:05:48 <mmcgrath> stickster: hey, so is there any insight bits that can be done now?
20:05:49 <mdomsch> not much time before the freeze then
20:05:59 <mmcgrath> anything that, even though the whole project isn't ready, parts could be deployed now?
20:06:49 <mmcgrath> stickster: I'm thinking even if the base stuff is in place and not advertised it'd help increase the chances of success.
20:06:50 <stickster> mmcgrath: There are still both styling and technical bits that have critical or blocker bugs attached
20:07:15 <mmcgrath> what is the nature of the changes that are still to be made?  packaging?  upstream stuff?
20:07:15 * hydh is here too
20:07:42 <stickster> mmcgrath: There are problems with the authentication that still need to be solved, then upstreamed to the fedora-zikula module and released
20:07:57 <stickster> The styling bugs are not as pernicious but will take some time to resolve
20:07:59 <mmcgrath> so in your estimation, we still on track for deployment later in the month?
20:08:15 <stickster> mmcgrath: http://lists.fedoraproject.org/pipermail/logistics/2010-April/000510.html
20:08:17 <mmcgrath> stickster: also, how much of this code is stuff we'll have to maintain?
20:08:29 <stickster> No, we agreed to push off to post-GA
20:08:35 <mmcgrath> ah, k.
20:08:37 <stickster> There's not much code we have to maintain
20:08:39 <mmcgrath> I missed that, sorry.
20:08:41 <stickster> AuthFAS module is about it.
20:08:46 <mmcgrath> excellent.
20:08:48 <stickster> And that's fairly understandable
20:08:58 <mmcgrath> stickster: ok, thanks for the latest.  Anything else?
20:09:10 <stickster> It's the other issues we still have to solve that weren't ready for our go/no-go that caused us to wave off.
20:09:28 <stickster> logistics@ list is where discussion is taking place about what we're going to do next.
20:09:32 <stickster> eof
20:09:44 <mmcgrath> stickster: thanks
20:09:48 <mmcgrath> ok, next topic
20:09:52 <mmcgrath> #topic netapp migration
20:10:02 <mmcgrath> This is something I wanted to have done before the beta but failed to do so
20:10:10 <smooge> ok what is this?
20:10:26 <mmcgrath> basically I need to move alt and whatever is left on the secondary1 drives, to the netapp.
20:10:53 <mmcgrath> smooge: so they'll show up on download.fedora.redhat.com
20:11:41 <mmcgrath> any questions or concerns about that?
20:11:53 <mmcgrath> For me the big one is trying to figure out exactly how to let everyone continue to upload their content.
20:11:53 <mdomsch> nah, they're small
20:11:55 <smooge> not really
20:12:04 <mmcgrath> AFAIK it'll all be the same way.
20:12:12 <mmcgrath> Ok, moving on :)
20:12:14 <smooge> oh.. there is that
20:12:20 <mdomsch> log into a server that has it mounted r/w
20:12:27 <mdomsch> right now that's secondary1 for alt
20:12:36 <smooge> who is allowed to do this?
20:12:41 <mmcgrath> mdomsch: well I'm thinking they'd still be allowed to do that
20:12:48 <mmcgrath> but then I'm not sure what to do with secondary1's actual drives :)
20:12:56 <mdomsch> altvideo group can for /pub/alt/video/
20:12:58 <mmcgrath> maybe just have them sync from the netapp and continue to expose.
20:13:02 <mmcgrath> smooge: there's an SOP
20:13:04 * mmcgrath gets it
20:13:05 <mdomsch> yeah
20:13:16 <mmcgrath> smooge: http://fedoraproject.org/wiki/Content_Hosting_Infrastructure_SOP
20:13:25 <mmcgrath> giving users direct access to the netapp concerns me a bit
20:13:37 <mmcgrath> but really it's a completely different share than the /pub/fedora and /pub/epel stuff
20:13:49 <mmcgrath> and the only thing they could do is fill the disk up which A) we monitor and B) is easy to fix
20:13:58 <tremble> Which netapp modules are you using?
20:14:13 <smooge> ok so it will need to be a separate partition/logical volume on the netapp
20:14:18 <mmcgrath> tremble: I always forget.
20:14:26 <mmcgrath> smooge: <nod> it already is.
20:14:40 <mmcgrath> smooge: oh wait, not a separate 'partition' in that way
20:14:53 <mmcgrath> since we don't really know what future expansion will be
20:15:05 <mmcgrath> this will allow either side of the house to grow without us having to guess.
20:15:11 <mdomsch> so alt.fp.o becomes a new VM too?
20:15:16 <skvidal> oh crap, the meeting
20:15:20 <tremble> FWIW $POE uses a netapp.
20:15:20 <skvidal> sorry about being late
20:15:22 <smooge> ah well I was wondering about setting up a netapp quota and not having to worry about filling
20:15:37 <mmcgrath> mdomsch: I haven't figured that part out yet, I might just see if download.fedora.redhat.com will start accepting alt.fedoraproject.org
20:15:41 <mmcgrath> smooge: that could work too.
20:16:02 <smooge> it allows for us to also separate out differing snapshot schedules and such
20:16:43 <mmcgrath> <nod>
20:16:49 <mmcgrath> so anyone have any other questions or comments on that?
20:17:48 <smooge> no we can talk offline
20:17:56 <mmcgrath> k
20:17:59 <mmcgrath> next topic!
20:18:06 <tremble> At $POE we've found that having 1 or 2 aggregates and multiple thin provisioned volumes works well as long as you monitor the aggregates
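tremble's point about thin provisioning can be illustrated with a little arithmetic: the volumes may promise more space than the aggregate holds, so the number worth alerting on is how much of the aggregate's real capacity is consumed. A minimal sketch follows. The volume names, sizes, and the 85% threshold are hypothetical and are not Fedora's actual netapp layout.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    size_gb: float  # size promised to clients (thin-provisioned)
    used_gb: float  # blocks actually consumed in the aggregate


def aggregate_report(capacity_gb, volumes, alert_at=0.85):
    """Summarize one thin-provisioned aggregate.

    Overcommit (promised > capacity) is expected with thin provisioning;
    the alert fires on the fraction of real aggregate capacity in use.
    """
    promised = sum(v.size_gb for v in volumes)
    used = sum(v.used_gb for v in volumes)
    used_frac = used / capacity_gb
    return {
        "overcommit_ratio": promised / capacity_gb,
        "used_fraction": used_frac,
        "alert": used_frac >= alert_at,
    }


# Hypothetical layout: two shares carved from one 10 TB aggregate.
vols = [Volume("pub_fedora", 8000, 6000), Volume("pub_alt", 6000, 1000)]
report = aggregate_report(10000, vols)
```

Here the volumes promise 1.4x the aggregate, yet only 70% of the real space is used, so no alert fires. A per-volume quota, as smooge suggested, would cap how much of that shared space any one share can take.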
20:18:11 <mmcgrath> #topic collectd
20:18:19 <mmcgrath> So I've added some more collectd modules
20:18:25 <mmcgrath> Of particular interest are these 3
20:18:49 <mmcgrath> ping test:
20:18:51 <mmcgrath> .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=log01&plugin=ping&timespan=3600&action=show_selection&ok_button=OK
20:18:53 <zodbot> mmcgrath: http://tinyurl.com/zddwih
20:19:16 <mmcgrath> postgres connections:
20:19:20 <mmcgrath> .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=db02&plugin=pg_conns&timespan=3600&action=show_selection&ok_button=OK
20:19:23 <zodbot> mmcgrath: http://tinyurl.com/zddv5d
20:19:36 <mmcgrath> mdomsch: you might be interested in what happened to mirrormanager there in the last hour
20:19:44 * gholms|work hopes that doesn't cause extra URLs to show up in the minutes
20:19:56 <mmcgrath> .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=proxy3&plugin=haproxy&timespan=3600&action=show_selection&ok_button=OK
20:19:58 <zodbot> mmcgrath: http://tinyurl.com/zddtjd
20:20:02 <mmcgrath> and that's the last one, haproxy by site
20:20:21 <mdomsch> looking
20:20:24 <ricky> Is that response time I see?  Veeery nice
20:20:55 <mmcgrath> ricky: which one?  the haproxy one?
20:21:04 <mdomsch> ricky, what unites?
20:21:04 <mmcgrath> nope that's actually...
20:21:05 <mdomsch> units?
20:21:12 <mmcgrath> stot: requests/s
20:21:19 <mmcgrath> econ: errors/s
20:21:28 <mmcgrath> eresp: err responses/s
20:21:29 <mdomsch> so, every 10 minutes on the dot, we spike in mirrorlist requests
20:21:31 <ricky> Ah, OK
20:21:38 <mmcgrath> econ: is error connections /s
20:21:46 <mmcgrath> ricky: there's LOTS we can get out of haproxy if you want to add something
20:21:49 <mdomsch> for about a minute then it drops back down
20:21:50 <mmcgrath> response time is on my list.
20:21:58 <mmcgrath> mdomsch: what did you think about MM db connections there?
20:22:29 <smooge> hmm the tiny urls don't seem to work
20:22:43 <mmcgrath> smooge: pooh, interesting
20:22:49 <mmcgrath> use the longer ones then :)
20:23:06 <mdomsch> mmcgrath, blow that out over a larger time scale...
20:23:14 <mmcgrath> yeah it's pretty common
20:23:17 * mdomsch bets that's the crawler with 80 threads
20:23:25 <mmcgrath> mdomsch: that could very well be.
20:23:26 <mdomsch> tailing off at the end of the run
20:23:30 <mmcgrath> yeah
20:23:44 <mdomsch> it tries to keep 80 threads running at once, starting a new one as one completes
20:24:04 <mdomsch> so it'll flatline around 80, then tail off, then jump back to 80 for a while
20:24:04 <smooge> be back in a sec
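The crawler pattern mdomsch describes (keep 80 checks in flight and start the next one as soon as one finishes) is what a bounded worker pool does. A minimal Python sketch follows. `crawl` is a hypothetical stand-in, not MirrorManager's crawler. `ThreadPoolExecutor` also reuses its 80 worker threads rather than starting a fresh thread per mirror, but the concurrency curve has the same shape: flat at the ceiling while work is queued, tailing off as the queue drains.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 80  # concurrency ceiling described for the crawler

_lock = threading.Lock()
_active = 0  # checks currently running
peak = 0     # highest concurrency observed


def crawl(mirror):
    """Hypothetical stand-in for checking one mirror."""
    global _active, peak
    with _lock:
        _active += 1
        peak = max(peak, _active)
    time.sleep(0.01)  # pretend to do network I/O
    with _lock:
        _active -= 1
    return mirror


def crawl_all(mirrors):
    # Up to MAX_WORKERS checks run at once; when one finishes, the next
    # queued mirror starts. Once the queue is empty, in-flight checks
    # tail off toward zero, which matches the dip seen in the DB connection graph.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(crawl, mirrors))


results = crawl_all([f"mirror{i}" for i in range(200)])
```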
20:24:09 <mmcgrath> <nod>
20:24:18 <mmcgrath> but yeah, we now have more visibility into our applications than ever before.
20:24:31 <mmcgrath> we've learned a great deal about our environments just in the last couple of weeks from collectd.
20:24:32 <mdomsch> yep, that's what it's doing.  Nice graphs. :-)
20:24:35 <mmcgrath> in particular it's the 10s resolution.
20:24:43 <mmcgrath> it is just so much detail that we were missing before.
20:24:45 <smooge> ok dog thrown outside
20:25:27 <mmcgrath> anyone have any questions / requests?
20:25:44 <smooge> no thanks for this
20:25:59 <smooge> oh one question
20:26:06 <smooge> what does the ping test against?
20:26:28 <mmcgrath> We can have it run from everywhere but right now I've got it running on log1 (which is the central ping server)
20:26:31 <mmcgrath> maybe I should have used noc1.
20:26:32 <mmcgrath> anywho.
20:26:37 <mmcgrath> it then pings out to the hosts from there
20:26:39 <mmcgrath> just an ICMP ping
20:26:47 <mmcgrath> then tracks latency, std dev, and drop rate.
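The three numbers the ping test tracks can be computed from one round of probes, with lost replies counted toward the drop rate. A minimal sketch follows, assuming round-trip times in milliseconds with None for a probe that got no reply. It uses the population standard deviation. collectd's ping plugin may compute its figures differently, so treat this as an illustration of the quantities, not of the plugin.

```python
import statistics


def ping_summary(rtts_ms):
    """Summarize one round of ICMP probes to a host.

    rtts_ms holds one entry per probe sent: the round-trip time in
    milliseconds, or None if no reply came back.
    """
    replies = [r for r in rtts_ms if r is not None]
    drop_rate = 1 - len(replies) / len(rtts_ms)
    if not replies:
        return {"latency_ms": None, "stddev_ms": None, "drop_rate": drop_rate}
    return {
        "latency_ms": statistics.mean(replies),
        # population std dev over the replies actually received
        "stddev_ms": statistics.pstdev(replies),
        "drop_rate": drop_rate,
    }


summary = ping_summary([20.0, 22.0, None, 24.0])
```

For the sample above, one probe in four is lost (drop rate 0.25) and the three replies average 22 ms.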
20:27:02 <mmcgrath> How do you add more hosts?
20:27:06 <mmcgrath> I'm glad you asked mmcgrath  :)
20:27:10 <mmcgrath> collectd::ping { 'ping':
20:27:10 <mmcgrath> hosts => ['tummy1.fedoraproject.org', 'telia1.fedoraproject.org', 'serverbeach4.fedoraproject.org', 'serverbeach1.fedoraproject.org', 'osuosl1.fedoraproject.org']
20:27:11 <ricky> Heheh
20:27:13 <mmcgrath> }
20:27:18 <mmcgrath> add that to the node or server group you want
20:27:19 <smooge> ah cool
20:27:22 <mmcgrath> and collectd will do the rest.
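The Puppet template behind `collectd::ping` is not shown in the meeting. Assuming it writes a standard collectd ping-plugin block with one `Host` line per entry in `hosts`, the rendering step would look roughly like this hypothetical helper:

```python
def render_ping_conf(hosts):
    """Render a collectd ping-plugin block for the given hosts.

    Hypothetical: assumes collectd::ping produces something like this;
    the real Puppet template may add other options (Interval, TTL, ...).
    """
    lines = ["LoadPlugin ping", "<Plugin ping>"]
    lines += [f'  Host "{h}"' for h in hosts]
    lines.append("</Plugin>")
    return "\n".join(lines) + "\n"


conf = render_ping_conf(["tummy1.fedoraproject.org", "telia1.fedoraproject.org"])
```

Adding a host to the `hosts` array in the node or server group then simply adds another `Host` line the next time Puppet runs.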
20:27:36 <smooge> I think we want noc01/noc02 as the ping testers.
20:27:41 <mdomsch> mmcgrath, how does haproxy determine if mirror-lists is down ?
20:27:42 <smooge> but log01 works too
20:28:24 <mmcgrath> mdomsch:  it hits /mirrorlist every 5 seconds, 3 failures in a row takes that node out.
20:28:37 <ricky> It should go by timeouts we have set or http status codes
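Checks every 5 seconds with three failures in a row means three failed checks spanning about 10 s, so a backend that stalls for roughly 10 to 15 s (plus the check timeout) can be pulled. That is why mdomsch calls it a short window. The sketch below is a simplified model of that rule. It also includes a `rise` count for bringing the server back, which haproxy's `rise`/`fall` server options document as consecutive successful/failed checks. The default of 2 is an assumption here, since the meeting does not say what Fedora configures.

```python
def simulate_checks(results, fall=3, rise=2):
    """Replay a sequence of health-check results (True = check passed).

    The server starts UP, goes DOWN after `fall` consecutive failures
    and comes back UP after `rise` consecutive successes. Returns the
    state after each check.
    """
    up = True
    fails = oks = 0
    states = []
    for ok in results:
        if ok:
            oks, fails = oks + 1, 0
            if not up and oks >= rise:
                up = True
        else:
            fails, oks = fails + 1, 0
            if up and fails >= fall:
                up = False
        states.append("UP" if up else "DOWN")
    return states


# One check every 5 s: a ~15 s stall fails three checks in a row.
states = simulate_checks([True, False, False, False, True, True, True])
```

A two-check blip, such as a short refresh pause, leaves the node UP. A third consecutive failure takes it out until two good checks bring it back.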
20:28:38 <mmcgrath> smooge: actually that is a good transition into the next topic I wanted to bring up (also monitoring oriented)
20:28:44 <mdomsch> ah.  then I bet that's the hourly cache refresh non-responsiveness doing it
20:28:53 <mmcgrath> anyone have any questions or comments on this?
20:28:56 <mdomsch> that's kind of a short timeout...
20:29:39 <tremble> Suppose it depends what you consider acceptable down time.
20:29:56 <ricky> Oh, that actually makes sense.  I wonder how long one cache refresh takes an app server out for
20:30:12 <mmcgrath> mdomsch: does the refresh stagger at all?
20:30:36 <mmcgrath> mdomsch: actually that doesn't match up
20:30:39 <mmcgrath> it's an hourly refresh.
20:30:45 <mmcgrath> but we see stuff going down more often than that
20:30:48 <mmcgrath> mdomsch: take a look at proxy3 -
20:30:50 <mmcgrath> grep mirror /var/log/messages
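One way to test whether the hourly cache refresh explains the outages is to count mirror backend DOWN events per hour. More than one per hour would point to another cause. The sketch below assumes haproxy's "Server <backend>/<server> is DOWN" state-change messages land in /var/log/messages with a standard syslog timestamp. The sample lines are made up for illustration.

```python
import re
from collections import Counter

# Syslog timestamp (month, day, hour captured), then an haproxy
# "Server <backend>/<server> is DOWN" message for a mirror backend.
DOWN_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}):\d{2}:\d{2} .*Server \S*mirror\S* is DOWN"
)


def down_events_per_hour(lines):
    """Count mirror backend DOWN events per 'Mon DD HH' bucket."""
    counts = Counter()
    for line in lines:
        m = DOWN_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = [
    "Apr 22 19:05:10 proxy3 haproxy[811]: Server mirror-lists/app1 is DOWN",
    "Apr 22 19:15:12 proxy3 haproxy[811]: Server mirror-lists/app2 is DOWN",
    "Apr 22 19:15:40 proxy3 haproxy[811]: Server mirror-lists/app2 is UP",
    "Apr 22 20:05:09 proxy3 haproxy[811]: Server mirror-lists/app1 is DOWN",
]
counts = down_events_per_hour(sample)
```

In this made-up sample, hour 19 shows two DOWN events, which would not fit a once-an-hour refresh on its own.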
20:30:56 <mmcgrath> anywho, we can discuss that more in a bit.
20:30:58 <mdomsch> ok, we don't have to solve it here
20:31:01 <mmcgrath> <nod>
20:31:03 <mmcgrath> #topic Nagios
20:31:04 <mmcgrath> so
20:31:09 <mmcgrath> right now we have noc1 and noc2.
20:31:20 <mmcgrath> if we move to nagios3, it becomes easier to merge the two.
20:31:43 <mmcgrath> but I'm still wary about having monitoring only in PHX2
20:32:12 <smooge> how does it merge them?
20:32:19 * ricky wouldn't mind moving noc2 out of germany though :-/  As nice as it is to get a perspective from there, it often gives alerts on network issues we can't do anything about
20:32:27 <mmcgrath> smooge: well, nagios3 has a better ability to realize multiple IPs for a given host.
20:32:40 <mmcgrath> ricky: agreed.
20:32:46 <smooge> ah but we would still have hairpin problems
20:32:50 <mmcgrath> smooge: so we can do the whole 'internal' and 'external' test without problems.
20:32:55 <mmcgrath> yeah
20:33:05 <mmcgrath> yeah that's it then, there's a blocker there.
20:33:15 <mmcgrath> because external to PHX2 we can't monitor everything in phx2.
20:33:21 <mmcgrath> inside phx2 we can't monitor everything in phx2 :)
20:33:29 <mmcgrath> so we'll probably have to keep that dynamic at least somewhere.
20:33:41 <mmcgrath> lets think on it a bit
20:33:53 <smooge> I would make noc03 in ibiblio and go from there
20:34:19 <mmcgrath> I'm hoping to work with pvangundy on that, he's a volunteer that's been gone for a while.
20:34:22 <mmcgrath> He's back but has been busy
20:34:25 * mmcgrath hopes he gets less busy
20:34:30 <mmcgrath> anywho, anything else on that for the meeting?
20:34:48 <mmcgrath> alrighty
20:34:53 <mmcgrath> #topic search engine
20:34:57 <mmcgrath> a-k: whats the latest?
20:35:07 <a-k> I've got DataparkSearch on publictest3
20:35:15 <mmcgrath> url?
20:35:17 <a-k> #link http://publictest3.fedoraproject.org/cgi-bin/dpsearch
20:35:28 <a-k> DataparkSearch forked from mnoGoSearch in 2003
20:35:35 <a-k> Mostly so far it seems like a broken version of mnoGoSearch
20:35:54 <a-k> I've indexed only a tiny number of documents from the wiki
20:36:03 <mmcgrath> :( that's no fun
20:36:06 <a-k> I'll poke it a little more to see how bad it is
20:36:17 <a-k> More docs, etc
20:36:29 <ricky> Search is hard :-/
20:36:29 <hydh> hehe
20:36:41 <mmcgrath> a-k: thanks, anything else for now?
20:36:49 <a-k> I don't think so
20:36:54 <mmcgrath> alrighty
20:37:01 <mmcgrath> Well with that I'll open the floor
20:37:04 <mmcgrath> #topic Open Floor
20:37:11 <mmcgrath> anyone have anything they'd like to discuss?
20:37:50 <mmcgrath> if not we'll close in 30
20:39:08 <mmcgrath> sweet, silence is golden
20:39:11 <mmcgrath> ok
20:39:12 <mmcgrath> #endmeeting