infrastructure
LOGS
20:00:24 <mmcgrath> #startmeeting infrastructure
20:00:24 <zodbot> Meeting started Thu Sep 16 20:00:24 2010 UTC.  The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:24 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:29 <mmcgrath> #meetingname infrastructure
20:00:29 <zodbot> The meeting name has been set to 'infrastructure'
20:00:33 <mmcgrath> #topic who's here
20:00:35 * lmacken 
20:00:35 <mmcgrath> who's here?
20:00:38 <abadger1999> hey
20:01:30 * mmcgrath waits a bit
20:02:10 <abadger1999> mmcgrath: Wanna have open floor first? :-)
20:02:26 <mmcgrath> abadger1999: actually that would be great, I'm needing to get tickets in place for the beta release.
20:02:33 <mmcgrath> We can end with it too just in case
20:02:36 * mdomsch 
20:02:38 <mmcgrath> #topic Open Floor
20:02:41 <mmcgrath> abadger1999: you have something?
20:02:45 <abadger1999> yeah
20:02:46 <abadger1999> https://fedoraproject.org/wiki/LATAM_Infrastructure
20:03:01 <abadger1999> Talked to the latam infra people a few weeks ago and been forgetting to bring it up.
20:03:14 <mmcgrath> Yeah I had a conversation or two with them as well
20:03:14 <abadger1999> I had them list some of the things that they need.
20:03:25 <mmcgrath> basically got just far enough to tell them not to auth against us except for using ssh keys.
20:03:36 <abadger1999> I don't know how we can satisfy them but I figure knowing what the issues are is the first step.
20:03:38 <mmcgrath> which is not particularly helpful for what they're trying to do unfortunately :(
20:03:55 <smooge> here
20:04:06 <mmcgrath> abadger1999: did they want us to host their DNS?
20:04:26 <abadger1999> mmcgrath: They want to get it so that it's not just Rodrigo being the contact.
20:04:44 <abadger1999> mmcgrath: I think that they're for us hosting it... figured that would be pretty easy to do.
20:04:56 <mmcgrath> yeah it's a transfer, and something we've done several times before.
20:05:02 <mmcgrath> it is, however, time consuming for some reason.
20:05:05 <mmcgrath> it just takes a while.
20:05:14 <smooge> a looooong while
20:05:23 <smooge> 1 year if you are in Malaysia
20:05:27 <mmcgrath> abadger1999: can you give a roundup of what all you talked about and what they're wanting to do?
20:05:49 <abadger1999> Easy stuff: get away from single points of person failure
20:06:04 <abadger1999> Like transferring DNS to fedora project so that one person can't take away the domain.
20:06:25 <abadger1999> Social stuff - integrate better into fedora.
20:06:29 <mmcgrath> do they have a team of sysadmins?
20:06:39 <mmcgrath> or something similar at least?
20:06:52 <abadger1999> ie: right now latam infra and community is pretty isolated from the North Americans/Europeans.
20:06:56 <abadger1999> Yes.
20:07:08 <abadger1999> All volunteers so they don't have as much time as we do.
20:07:12 <abadger1999> Nor the hardware we do.
20:07:21 * dgilmore turns up
20:07:29 <abadger1999> But gomix nushio dbruno are all on the sysadmin team.
20:07:30 <mmcgrath> are they wanting to make websites for non-latam people?
20:07:51 <abadger1999> Not sure -- They want to make web apps for non-latam people.
20:07:54 <mmcgrath> or are they just focusing on it, but would like better access to the rest of the community for... idea sharing?  I'm not sure what word I want to use there.
20:08:02 <mmcgrath> knowledge pool is probably better.
20:08:12 <abadger1999> timpus -- events platform for all of the ambassadors everywhere.
20:08:23 <abadger1999> for instance.
20:08:41 <abadger1999> So they're more than just websites/documents.
20:09:01 <mmcgrath> and they're looking to host that for the larger ambassador community?
20:09:07 <abadger1999> Right.
20:09:24 <mmcgrath> I'm generally for that, I know this is probably a tough pill for some to swallow and might look weird.
20:09:39 <mmcgrath> but if we can properly empower teams like that to host their stuff, it lowers the barriers for them to create those apps
20:09:49 <abadger1999> Yep.  I agree.
20:09:50 <mmcgrath> while allowing us to keep the high quality architecture we currently have.
20:10:04 <abadger1999> i'm just not sure of how to make it all smooth.
20:10:22 <abadger1999> Like how to make the events platform auth against fas in a way that doesn't compromise security.
20:10:25 <mmcgrath> so we don't end up committing to a bunch of... side apps?  I'm not sure how to say that without seeming negative because I'm really for teams being able to provide for themselves where they are able.
20:10:39 <mmcgrath> abadger1999: yeah, that's the big 'gotcha' right now
20:11:18 <abadger1999> gomix and nushio will be at fudcon tempe so it might be good to have some plans around figuring out what we can do there.
20:11:27 <mmcgrath> yeah
20:11:27 <abadger1999> But also figuring out options right now would be good.
20:12:02 <mmcgrath> abadger1999: one thing I wanted to think about is if there's any sort of auth mechanism where the password itself never leaves the browser.
20:12:05 <abadger1999> Like SSL auth for their sites Or something.
20:12:07 <mmcgrath> but the encrypted form would?
20:12:20 <mmcgrath> I'm not sure how sensitive encrypted passwords should be considered.
20:12:28 <ninjazjb> Hello everyone, this is Jason Brown
20:12:31 <mmcgrath> just something else I thought was worth investigating.
20:12:38 <mmcgrath> ninjazjb: hello Jason, glad you could make it
20:12:49 <abadger1999> mmcgrath: There is -- but you still have to be careful about replay attack or simply, MITM causing something different than you expect to happen.
20:12:59 <ninjazjb> Thanks
20:13:00 <dgilmore> mmcgrath: id feel more comfortable with using ssl auth
20:13:41 <mmcgrath> abadger1999: yeah, I guess a replay could cause other non-official sites to get jacked at that point
20:13:45 <mmcgrath> anyway, a conversation for another time.
20:13:49 <mmcgrath> abadger1999: what else you got?
20:14:11 <abadger1999> That's it from me for now -- just wanted to get us thinking about it before fudcon.
20:14:14 <dgilmore> not that i think they would do it but there would be the potential to harvest passwords, which would take constant code audits to make sure it doesn't happen
20:14:18 <abadger1999> And point out the wiki page with the brainstorming
20:14:48 <mmcgrath> abadger1999: thanks
20:14:58 <mmcgrath> Ok, if no one has anything else on that, we'll get down to the F14beta business.
20:15:40 <mmcgrath> ok, lets do it
20:15:47 <mmcgrath> #topic Fedora 14 Beta.
20:16:11 <mmcgrath> https://fedorahosted.org/fedora-infrastructure/report/9
20:16:18 <mmcgrath> .ticket 2392
20:16:19 <zodbot> mmcgrath: #2392 (New website) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2392
20:16:30 * mmcgrath tries to summon sijis
20:16:57 <mmcgrath> we can skip that one for now
20:17:00 <mmcgrath> .ticket 2393
20:17:01 <zodbot> mmcgrath: #2393 (Verify Mirror Space) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2393
20:17:03 <mmcgrath> I'll get this one
20:17:26 <mmcgrath> .ticket 2394
20:17:30 <zodbot> mmcgrath: #2394 (Release day ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2394
20:17:46 <mmcgrath> I'll nab that
20:17:51 <mmcgrath> actually
20:17:59 <mmcgrath> smooge: do you want to do the release day coordination this time?
20:18:47 <mmcgrath> we'll come back to that
20:18:53 <mmcgrath> .ticket 2392
20:18:54 <zodbot> mmcgrath: #2392 (New website) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2392
20:19:00 <mmcgrath> sijis: will we have a fancy new beta site?
20:19:16 <sijis> yep. i think we definitely will
20:19:17 <mmcgrath> or the old beta site?  what's the plan there?
20:19:27 <sijis> well. for GA == new site
20:19:35 <sijis> for Beta = existing site
20:19:42 <smooge> mmcgrath, is that putting in new tickets or doing overall tickets
20:19:43 <mmcgrath> you guys sure you don't want to release that a bit earlier than the actual release day?
20:19:49 <mmcgrath> smooge: one sec
20:19:57 <smooge> np slow typing
20:20:46 <mmcgrath> sijis: ok, well I do look forward to the new site.  Are you going to be point person for this release?
20:20:50 <sijis> mmcgrath: i don't think we'll have the site completely finished for beta
20:20:56 <sijis> yup
20:21:11 <mdomsch> I'd be happy to see the new website live a few days ahead of the release...
20:21:17 <mdomsch> even if it's not done by beta
20:21:30 <mdomsch> build momentum for the actual release day
20:21:31 <mmcgrath> sijis: can you accept that ticket?
20:21:32 <smooge> a week before release :)?
20:21:34 <mmcgrath> mdomsch: yeah
20:21:36 <mmcgrath> ok
20:21:39 <mdomsch> and not risk blowing things up on release day
20:21:45 <mmcgrath> mdomsch: that's my main concern.
20:21:52 <sijis> mmcgrath: will do
20:21:57 <mmcgrath> sijis: thanks
20:22:08 <sijis> you mean ticket 2392 or another one?
20:22:20 <mmcgrath> sijis: 2392
20:22:25 <mmcgrath> we'll move on to the next ticket :)
20:22:28 <mmcgrath> .ticket 2394
20:22:29 <zodbot> mmcgrath: #2394 (Release day ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2394
20:22:36 <mmcgrath> smooge: would you like to do this?
20:22:53 <mmcgrath> It's basically just making sure everything gets done prior to us sending the announcement out.
20:22:53 <smooge> taking
20:23:06 <mmcgrath> for example, the website should be up and ready, all the links should work
20:23:09 <mmcgrath> that sort of thing.
20:23:18 <mmcgrath> smooge: sweet
20:23:20 <mmcgrath> ok
20:23:25 * sijis will make sure links work this time :)
20:23:52 <mmcgrath> smooge: the only downside for you is I think you'll have to get up early because release time is 8:00 am your time.
20:24:01 <smooge> I am of the opinion that for some of our audience a web page with a long list of href's is all we ever need :/
20:24:06 <mmcgrath> the website should generally get started around 7:30 your time because it takes a while to sync, that sort of thing.
20:24:20 <mmcgrath> ok, well it'll be good to have someone else go through that process for a change anyway
20:24:24 <smooge> ah ok so that day I need to be up at 0400
20:24:24 <mmcgrath> smooge: any questions?
20:24:33 <mmcgrath> :)
20:24:33 <smooge> to get coffee into system
20:24:43 <smooge> when is it currently planned?
20:24:53 <smooge> October?
20:25:02 <smooge> Or are we talking beta
20:25:06 <mmcgrath> September 28th
20:25:10 <smooge> crap
20:25:12 <mmcgrath> this one's the beta
20:25:13 <smooge> I can't do that
20:25:20 <smooge> I am in RDU that day for class.
20:25:22 <mmcgrath> that's ok, that's why we discuss these things :)
20:25:25 <mmcgrath> I'll grab that one
20:25:32 <smooge> I will be up though :)
20:25:51 <mmcgrath> ok, next ticket
20:25:57 <mmcgrath> .ticket 2395
20:25:58 <zodbot> mmcgrath: #2395 (Verify releng permissions) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2395
20:26:04 <mmcgrath> smooge: you want to get that one?
20:26:12 <smooge> taking
20:26:34 <mmcgrath> .ticket 2396
20:26:35 <zodbot> mmcgrath: #2396 (Add MirrorManager repository redirects) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2396
20:26:38 <mmcgrath> mdomsch: got it?
20:26:47 <mdomsch> yup
20:27:14 <mmcgrath> excellent
20:27:19 <mmcgrath> .ticket 2397
20:27:20 <zodbot> mmcgrath: #2397 (Infrastructure Change Freeze.) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2397
20:27:26 <mmcgrath> I'll accept this one, we're already in the freeze.
20:27:32 <mmcgrath> enjoy all the new infrastructure-list traffic :)
20:27:42 <mmcgrath> and last
20:27:44 <mmcgrath> .ticket 2398
20:27:45 <zodbot> mmcgrath: #2398 (Lessons Learned) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2398
20:27:48 <mmcgrath> that's for after the release.
20:28:21 <smooge> ok got it
20:28:36 <mmcgrath> and that's that
20:28:40 <mmcgrath> Oxf13: ping
20:28:47 <Oxf13> mmcgrath: hi
20:28:49 <smooge> is he on a plane
20:28:54 <smooge> no he is on the ground
20:29:03 <mmcgrath> Oxf13: time for your favorite question.  You got any odds for chance of beta slip?
20:29:39 <Oxf13> I have no idea, I'm out of the loop
20:30:03 <mmcgrath> yeah I know you've been busy.
20:30:13 <mmcgrath> Oxf13: who might know better?
20:30:47 <Oxf13> jlaska/adamw of QA, dlehman of Anaconda, dgilmore/notting of releng
20:30:53 <mmcgrath> adamw: ping
20:30:54 <mmcgrath> jlaska: ping
20:30:57 <jsmith> mmcgrath: https://bugzilla.redhat.com/showdependencytree.cgi?id=611991&hide_resolved=1
20:31:05 <jsmith> mmcgrath: That might be a pretty good indicator :-/
20:31:13 <jlaska> mmcgrath: we have 2 bugs left
20:31:17 * jlaska just sent mail to devel list
20:31:27 <dgilmore> the last bugs are supposed to be fixed today
20:31:27 <mmcgrath> jlaska: you don't have to commit to it but you think probably not going to slip?  like 10% chance?
20:31:30 <jlaska> we need someone from kernel to provide guidance on bug#629719
20:31:45 <dgilmore> once we compose that and start testing we will know better
20:31:50 <jlaska> mmcgrath: if we can't compose an RC on time, chances aren't good
20:32:09 <mmcgrath> k, I'll follow up again next week.
20:32:27 <jlaska> there are only 2 bugs remaining ... the installer issue dlehman has a handle on ... but we need kernel guidance for the remaining dmraid issue
20:32:28 <mmcgrath> sounds like there's still some unknowns.
20:32:34 <mmcgrath> <nod>
20:32:43 <jlaska> much better shape than yesterday, but still not 0
20:32:52 <mmcgrath> jlaska: thanks
20:33:01 <mmcgrath> ok, anyone have any questions, comments or concerns wrt the beta release?
20:33:17 <jlaska> hmmm ...
20:33:38 <mmcgrath> alrighty :)
20:33:47 <mmcgrath> #topic Strange pkgdb / bodhi outages on app5/6
20:34:18 <mmcgrath> so I was working with abadger1999 and lmacken just before the freeze to try to figure out what on earth was going on with apps on app5 and 6.
20:34:36 <mmcgrath> for those of you that don't know, basically app5 and 6 are considered backups.  they don't get live traffic because they're offsite.
20:34:47 <mmcgrath> but, if for some reason all the production app servers go down, they pick up the slack.
20:35:08 <mmcgrath> well, even with no traffic, sometimes bodhi or pkgdb would hang, sometimes for hours.
20:35:14 <mmcgrath> and then they'd recover on their own.
20:35:17 <mmcgrath> it was incredibly strange.
20:35:25 <mmcgrath> the hosts were low load, db access was fine.
20:35:40 <mmcgrath> and both being tg apps it was extra strange, since the odds of both of them getting in that state at the same time on the same server were low
20:36:02 <mmcgrath> I'm still not sure of a root cause, but I believe some of the wsgi processes were hanging, which was causing apache to block new requests from getting in.
20:36:21 <mmcgrath> So to bandaid that, we increased the number of processes available to each.
20:36:24 <mmcgrath> and so far.  good luck.
20:36:29 <mmcgrath> I haven't seen any outage
20:36:39 <mmcgrath> at least not related to that
20:36:50 <mmcgrath> we have had some from the database filling up
20:36:57 <mmcgrath> anyway, any questions or comments on that?
20:37:40 <mmcgrath> alrighty
20:37:44 <mmcgrath> #topic pkgdb caching
20:37:59 <mmcgrath> abadger1999: any issues seen since we started caching image content?
20:38:17 <abadger1999> It's been smooth.
20:38:20 <mmcgrath> .headers https://admin.fedoraproject.org/pkgdb/appicon/show/Terminator
20:38:21 <zodbot> mmcgrath: apptime: D=215947, content-length: 3412, x-varnish: 2111766604, age: 0, expires: Tue, 21 Sep 2010 20:38:20 GMT, connection: close, server: Apache/2.2.3 (Red Hat), appserver: app03.phx2.fedoraproject.org, proxyserver: proxy01.phx2.fedoraproject.org, via: 1.1 varnish, cache-control: max-age=432000, date: Thu, 16 Sep 2010 20:38:20 GMT, content-type: image/png, proxytime: D=218092
20:38:29 <mmcgrath> .headers https://admin.fedoraproject.org/pkgdb/appicon/show/Terminator
20:38:29 <zodbot> mmcgrath: apptime: D=215947, content-length: 3412, x-varnish: 2111766630 2111766604, age: 9, expires: Tue, 21 Sep 2010 20:38:29 GMT, connection: close, server: Apache/2.2.3 (Red Hat), appserver: app03.phx2.fedoraproject.org, proxyserver: proxy01.phx2.fedoraproject.org, via: 1.1 varnish, cache-control: max-age=432000, date: Thu, 16 Sep 2010 20:38:29 GMT, content-type: image/png, proxytime: D=664
20:38:35 <abadger1999> mmcgrath: I don't know how much it helped -- need to ask mbacovsk or someone on the other end of a slow pipe from the servers.
20:38:36 <mmcgrath> hey hey, age.  that's what I like to see.
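The header values zodbot printed above can be cross-checked with a few lines of Python. This is only a sketch: the `headers` dict is copied by hand from the second response, and the helper names (`max_age_seconds`, `expires_window_seconds`, `remaining_freshness`) are made up for illustration. It checks that the `expires` timestamp and the `cache-control: max-age` value describe the same lifetime, and subtracts `age` to see how long the cached copy stays fresh.

```python
from email.utils import parsedate_to_datetime

# Values copied by hand from the second zodbot response in the log.
headers = {
    "date": "Thu, 16 Sep 2010 20:38:29 GMT",
    "expires": "Tue, 21 Sep 2010 20:38:29 GMT",
    "cache-control": "max-age=432000",
    "age": "9",
}


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return int(value)
    return None


def expires_window_seconds(hdrs):
    """Seconds between the Date and Expires headers."""
    delta = parsedate_to_datetime(hdrs["expires"]) - parsedate_to_datetime(hdrs["date"])
    return int(delta.total_seconds())


def remaining_freshness(hdrs):
    """Seconds the cached copy stays fresh: max-age minus the reported Age."""
    return max_age_seconds(hdrs["cache-control"]) - int(hdrs["age"])


if __name__ == "__main__":
    print("max-age:", max_age_seconds(headers["cache-control"]))
    print("expires - date:", expires_window_seconds(headers))
    print("remaining:", remaining_freshness(headers))
```

Both lifetimes come out to five days (432000 seconds). The drop in `proxytime` from D=218092 to D=664 between the two requests fits the second one being served from the cache.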
20:38:59 <mmcgrath> for me I got the time generally cut in half.
20:39:05 <mmcgrath> but it's still several seconds for a large page list.
20:39:09 <abadger1999> <nod>
20:39:12 <mmcgrath> expires headers does seem to be working properly
20:39:50 <smooge> how goes varnish with this?
20:40:28 <mmcgrath> abadger1999: one thing I've noticed... expires doesn't seem to be working
20:40:31 <mmcgrath> and i'm not sure why
20:40:38 <mmcgrath> my browser has these images, it shouldn't be re-requesting them.
20:40:45 <mmcgrath> it could be related to the auth / cookie.  I need to research it.
20:41:05 <mmcgrath> smooge: well, basically we have set aside a part of the pkgdb namespace (/pkgdb/appicon/show)
20:41:10 <mmcgrath> and we're doing two things with it
20:41:27 <mmcgrath> when a cookie gets sent, varnish unsets it to request the data, when it does get the data, it unsets the cookie and sends it back.
20:41:39 <mmcgrath> because cherrypy wants to set a cookie with every request.
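The cookie handling described above can be modelled in plain Python. This is not the Varnish configuration used on the proxies (the log does not show it); it is a sketch of the logic as described, and the function names are hypothetical. It also assumes that "unsets the cookie" on the way back refers to the `Set-Cookie` response header. Only the `/pkgdb/appicon/show` prefix comes from the discussion.

```python
# Namespace named in the meeting; everything else here is illustrative.
CACHED_PREFIX = "/pkgdb/appicon/show"


def strip_request_cookie(path, request_headers):
    """Drop the Cookie header before the request is passed to the backend,
    so every client maps to the same cached object."""
    if not path.startswith(CACHED_PREFIX):
        return dict(request_headers)
    return {k: v for k, v in request_headers.items() if k.lower() != "cookie"}


def strip_response_cookie(path, response_headers):
    """Drop Set-Cookie from the backend response so the object can be
    cached and served to other clients."""
    if not path.startswith(CACHED_PREFIX):
        return dict(response_headers)
    return {k: v for k, v in response_headers.items() if k.lower() != "set-cookie"}


if __name__ == "__main__":
    req = {"Host": "admin.fedoraproject.org", "Cookie": "session=abc"}
    resp = {"Content-Type": "image/png", "Set-Cookie": "session=xyz"}
    print(strip_request_cookie("/pkgdb/appicon/show/Terminator", req))
    print(strip_response_cookie("/pkgdb/appicon/show/Terminator", resp))
```

Requests outside the icon namespace pass through unchanged, so authenticated pkgdb pages keep their session cookies.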
20:42:08 <smooge> ah ok
20:42:15 <mmcgrath> anyone have any questions or comments?
20:42:25 <mmcgrath> or ideas as to why firefox is ignoring the expires header :)
20:42:37 <mmcgrath> abadger1999: etagging would be helpful here. FWIW.
20:42:55 <mmcgrath> ok, that's all I've got
20:42:59 <mmcgrath> #topic Open Floor
20:43:05 <mmcgrath> anyone have anything else they'd like to discuss?
20:43:07 <mmcgrath> anything at all?
20:43:12 <smooge> fas
20:43:25 <mmcgrath> smooge: hit it
20:43:28 <smooge> we are having issues with the fas servers at the moment
20:43:40 <mmcgrath> oh right right
20:43:44 <mmcgrath> that was on my list and I forgot :)
20:43:54 <smooge> we have an open bugzilla on it and I am trying to get the data to developers as soon as possible
20:44:14 <smooge> it looks like something with swap space just not working under certain loads
20:44:39 <smooge> and when swap space quits working.. OOM gets hungry
20:45:07 <smooge> so we have some interesting OOPS but not much else.
20:45:28 <smooge> It seems to occur on the servers rather regularly at 03:30-03:50
20:45:30 <mmcgrath> interesting
20:45:33 <smooge> but not sure why
20:45:35 <mmcgrath> I'm surprised we're using swap on there at all
20:45:37 <mmcgrath> https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=fas02&plugin=swap&timespan=604800&action=show_selection&ok_button=OK
20:45:41 <smooge> http grows
20:45:45 <mmcgrath> even still, it's not a lot.
20:46:00 <smooge> no it isn't.. and when the problem occurs it is not like it's heavy in swap
20:46:02 <mmcgrath> smooge: are they still all rebooting at least every 24 hours?
20:46:08 <smooge> just all of a sudden no more swap for you
20:46:23 <smooge> well the new kernel has slowed that down a bit
20:46:39 <smooge> but not sure why. I am expecting tonight to be a hit
20:46:47 <mmcgrath> k
20:46:53 <mmcgrath> smooge: thanks for following up and tracking that issue
20:46:57 <smooge> 2 nights ago we had all 3 reboot and looking at the db02 data
20:47:08 <smooge> we had a TON of fas connections beyond normal at that time
20:47:12 <smooge> not sure why yet
20:47:54 <mmcgrath> yeah
20:48:48 <smooge> EOF
20:48:53 <mmcgrath> alllrighty
20:48:55 <mmcgrath> thanks :)
20:49:01 <smooge> np
20:49:03 <mmcgrath> if no one has anything else, we'll close in 30
20:49:13 <smooge> allergies killing me softly with sneezes
20:49:20 <mmcgrath> bummer :(
20:49:24 <rbergeron> ....
20:49:26 <mmcgrath> that's never fun
20:49:30 <gholms> As usual, the Cloud SIG meeting starts at the top of the hour for those of you who are interested.  ;)
20:49:39 <mmcgrath> and that's it!
20:49:40 <mmcgrath> #endmeeting