19:00:02 <nirik> #startmeeting Infrastructure (2011-08-04)
19:00:02 <zodbot> Meeting started Thu Aug  4 19:00:02 2011 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:02 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:02 <nirik> #meetingname infrastructure
19:00:02 <zodbot> The meeting name has been set to 'infrastructure'
19:00:02 <nirik> #topic Robot Roll Call
19:00:03 <nirik> #chair smooge skvidal codeblock ricky nirik abadger1999
19:00:03 <zodbot> Current chairs: abadger1999 codeblock nirik ricky skvidal smooge
19:00:11 <smooge> here
19:00:17 <Klainn> giggity
19:00:18 * abadger1999 here
19:00:18 <nirik> morning smooge
19:00:39 * nirik waves to all
19:01:26 * nirik will start the meeting at :03
19:01:45 * CodeBlock waves
19:02:44 <smooge> oh I thought I was late again
19:03:02 <nirik> #topic New folks introductions and apprentice tasks/feedback
19:03:06 <nirik> smooge: not at all. ;)
19:03:28 <nirik> so, any new folks like to introduce themselves? any apprentice folks like to talk about specific items or questions?
19:04:16 <nirik> I added another apprentice / easyfix ticket yesterday...
19:04:24 <nirik> move/convert SOP's over from wiki to git.
19:04:53 <nirik> I've also gotten several replies to my aug fi-apprentice ping email. A number of people had busy summers but hope to dig back in soon.
19:05:18 <nirik> I'll be doing the group cleanup next week.
19:05:46 <nirik> #topic F16 Alpha Freeze reminder and tickets
19:05:57 <nirik> Reminder that we are in a pre-release freeze right now.
19:06:22 <nirik> https://fedorahosted.org/fedora-infrastructure/browser/architecture/Environments.png
19:06:28 <nirik> lists whats included and whats not.
19:06:40 <nirik> Anything thats included, you MUST post to the list and get 2 +1's on.
19:06:56 <smooge> It looks like we will slip 1 or more weeks if I read the email correctly
19:07:06 <nirik> yeah, seeming likely. ;(
19:07:08 <jsmith> It's entirely possible
19:07:18 <nirik> we also have f16 alpha tickets all filed:
19:07:20 <jsmith> Not for sure yet, but somewhat likely, given the late TC
19:07:27 <nirik> .ticket 2894
19:07:28 <zodbot> nirik: #2894 (F16Alpha: websites) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2894
19:07:33 <nirik> .ticket 2895
19:07:36 <zodbot> nirik: #2895 (F16Alpha: Verify mirror space) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2895
19:07:40 <nirik> .ticket 2896
19:07:41 <zodbot> nirik: #2896 (F16Alpha: Release day ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2896
19:07:42 <nirik> .ticket 2897
19:07:45 <zodbot> nirik: #2897 (F16Alpha: Verify mirror permissions) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2897
19:07:47 <nirik> .ticket 2898
19:07:50 <zodbot> nirik: #2898 (F16Alpha: Verify mirrormanager redirects) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2898
19:08:08 <nirik> so, we should make sure we have these under control before Alpha.
19:08:43 <skvidal> sorry, sorry
19:08:45 * skvidal is here
19:08:50 <nirik> hey skvidal. No worries.
19:09:12 <nirik> so, does anyone wish to take on any of those alpha tickets for their very own? ;)
19:09:47 <nirik> in any case we will make sure they get done before alpha.
19:10:19 <nirik> Anything more on alpha tickets?
19:10:37 <nirik> #topic Upcoming Tasks/Items
19:10:50 <nirik> Anyone have upcoming items they wish to plan/schedule or discuss?
19:11:09 <nirik> We can't affect any of the machines in the freeze, but we can work on other machines and also plan/document things. ;)
19:11:52 <nirik> I'm planning on sending out a straw man plan for upgrading hosted for people to poke holes in.
19:12:37 <abadger1999> I'm working on a little web app for ambassadors to be able to run a raffle.
19:12:52 <abadger1999> Plan to deploy it to production after alpha freeze.
19:13:04 <abadger1999> Not sure if it'll become a permanent fixture or will be a one-shot.
19:13:07 <nirik> abadger1999: cool. ;)
19:13:30 <nirik> is that likely to need to follow the dev-> stg-> prod chain? or so simple it can just test in stg?
19:14:07 <abadger1999> nirik: I can test in stg since it's not deployed yet.
19:14:17 <abadger1999> nirik: But I can start in dev/w a dev instance if you'd rather.
19:14:22 <abadger1999> up to you :-)
19:14:23 <smooge> I will take mirror space and permissions
19:14:47 <smooge> I am planning one or two things that I will need +1 for
19:14:54 <nirik> abadger1999: don't care too much on a simple app I don't think. If it can be safely tested in stg thats fine. Especially if it doesn't use a different framework, etc.
19:15:04 <abadger1999> <nod>
19:15:05 <nirik> smooge: thanks on the tickets.
19:15:34 <abadger1999> It'll be TG2.  I'll plan on testing stg; I'll holler if I need something else b/c it's not safe to test there.
19:15:50 <nirik> ok
19:16:21 <nirik> #topic List items / random info
19:16:33 <nirik> So, I thought I would bring up a few things I posted on list for discussion...
19:16:43 <nirik> but of course replies to the list are fine too.
19:17:12 <nirik> First one was: sysadmin group requirement for sysadmin-qa. I was thinking we might drop that requirement for them since they don't care about sysadmin emails.
19:17:24 <nirik> I don't know if there's some other reason sysadmin-foo groups require sysadmin.
19:18:01 <nirik> Second one was access to log02 for apprentices. ;)
19:18:09 <abadger1999> nirik: one thing about that was that they needed to go through bastion to get to their boxes I think... we could add sysadmin-qa to the list of groups that can shell into bastion, though.
19:18:33 <nirik> abadger1999: I made them a bastion-comm01... so they should be able to use that for access.
19:18:40 <abadger1999> Okay
19:18:43 <abadger1999> That works too :-)
19:18:48 <nirik> does sysadmin get shell on bastion?
19:19:06 <abadger1999> I think that's the way we set it up.
19:19:08 * abadger1999 checks
19:19:31 <nirik> doesn't seem to.
19:19:38 <skvidal> you have to be in sysadmin-noc or above
19:19:40 * nirik thinks thats just the emails
19:19:41 <skvidal> to get into  bastion
19:20:15 <abadger1999> ah, looks like we explicitly list all the sysadmin-* groups.  Misrecollection on my part.
19:20:40 <nirik> for groups other than sysadmin-qa I think it makes sense... if you are sysadmin-resource you should still be in the loop on commits and outages so you can know changes that affect your resource.
19:21:48 <nirik> anyhow, can see if there's a historical reason and just change it if there's not.
19:23:14 <nirik> so, do chime in on list. ;)
19:23:33 <nirik> some info items:
19:23:48 <nirik> #info there was a short unplanned outage yesterday. Sent details to list.
19:24:04 <nirik> #info infra-docs is live and ready for SOP's to be converted to it.
19:25:18 <nirik> #info DNS glue records are now fixed.
19:25:23 <smooge> 1) I need to update our wildcard certificate. 2) I am going to remove ns1/ns2 from the dns for fedoraproject.org and other zones that have been fixed
19:25:34 <nirik> #info backup03 sees its tape drive, so we can set it up now.
19:25:40 <smooge> actually the files aren't fixed. I realized that I needed +1 to do so
19:25:52 <nirik> #info new wildcard cert is ready to go (28 days to spare)
19:26:48 * nirik thinks of other things pending.
19:27:04 <nirik> #info new ibiblio02 machine should be ready soon.
19:27:50 <nirik> #topic Meeting tagged tickets:
19:27:50 <nirik> https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority
19:28:00 <nirik> any meeting tickets folks would like to note or talk about?
19:28:06 <nirik> or any other tickets for that matter?
19:28:18 <smooge> not me at the moment
19:28:28 <skvidal> nothing leaps to mind
19:28:32 <CodeBlock> nope
19:28:45 <nirik> cool.
19:28:51 <nirik> #topic Open Floor
19:28:55 <nirik> anything for open floor?
19:29:17 <smooge> just waiting for the hardware to be finished racking
19:29:25 <nirik> smooge: any news on that?
19:29:42 <smooge> nothing beyond that it was what caused our outage yesterday :)
19:30:03 <nirik> yeah, I figured. ;(
19:30:28 <nirik> Once those are in place, I'd like to build up the bvirthostwhatever and put a new releng03 on it.
19:30:53 <smooge> ok
19:30:54 <nirik> smooge: oh, can you talk about that IMM/RSA reset thing a bit?
19:31:15 <smooge> ok so for some reason a bunch of our IMM boxes went "dead" to the world after we left phx2
19:31:17 * skvidal stabs imm/rsa in the face
19:31:26 <skvidal> oh, sorry, bitter
19:31:51 <smooge> the only fix I have found is to install an IBM tool which talks to the hidden controller between the IMM and the box
19:32:00 <nirik> there are 4 machines where the management interface is not working currently.
19:32:08 <smooge> and tell it to give an ip address and reset
19:32:21 <nirik> s/not working/not working at all. no ping, no ssh, no nothing/
19:33:06 <nirik> unfortunately, those machines contain 'important' guests.
19:33:09 <smooge> the issue is.. all the systems which are down are critical
19:33:19 <smooge> so it can't happen until after the freeze
19:33:46 * nirik nods.
19:34:02 <nirik> Also, many of our machines have older versions of the IMM firmware. Updating that might be a good thing too.
19:34:13 <nirik> not that the new one is too much better. ;)
19:34:34 <abadger1999> So, the boxes are up and the guests are running but the management interface is down?
19:34:48 <nirik> #info need to update IMM/RSA on machines, as well as reset it on 4 of them.
19:34:48 <smooge> correct. if something happens to the box.. we are sol
19:34:50 <nirik> abadger1999: yep
19:34:53 <abadger1999> Okay.
19:35:44 * nirik tries to think of anything else to discuss...
19:35:53 <nirik> any other topics? Or shall we call it a short meeting?
19:36:20 <skvidal> one minor thing
19:36:23 <skvidal> the infra-hosts git repo
19:36:30 <skvidal> if anyone wants to start adding notes to servers
19:36:32 <skvidal> please do so
19:36:44 * nirik nods. Good plan.
19:36:46 <skvidal> hell, anytime you remember something 'odd' that's quasi-specific to that server, do it
19:36:51 <skvidal> it can be anything
19:37:07 <skvidal> look at log02 for an example
19:37:41 * nirik has an idea. Not sure it will be useful or work tho.
19:38:04 <skvidal> nirik: ?
19:38:05 <nirik> could we put something in that repo to mark what hosts are in which update group? A B C ?
19:38:12 <skvidal> absolutely
19:38:29 <nirik> then, somehow generate func lists or whatever from that...
19:38:31 <skvidal> put it in the 'notes' file
19:38:35 <skvidal> hmmm...
19:38:44 <skvidal> sure
19:38:45 <nirik> or perhaps thats best as separate groups in func
19:38:46 <skvidal> we could do that
19:38:47 <skvidal> no
19:38:55 <skvidal> I think we could do that
19:39:01 <skvidal> I can write a script to mine that data out
19:39:05 <skvidal> don't put it in 'notes' then
19:39:14 <skvidal> maybe make a 'servertype' item or something like that
19:39:28 <nirik> I'd like a 'func-yum --hosts-from-list=group-a check update' or whatever.
19:39:34 <nirik> yeah, or 'updategroup' or something.
19:39:39 <skvidal> it would probably be
19:39:46 <skvidal> func-yum --hosts=@group-a update
19:39:54 * nirik nods, thats fine.
19:39:56 <skvidal> since func-yum should handle that group syntax now
19:40:14 <nirik> anyhow, can figure that out out of band...
19:40:25 <skvidal> yep
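The idea above (a per-host update-group marker in the infra-hosts repo, mined by a script into func group lists) could look roughly like the sketch below. It is only an illustration under stated assumptions: the one-directory-per-host layout, the `updategroup` file name, and the INI-style `host = a; b` output are all hypothetical, and the exact groups format func expects should be checked before relying on it.

```python
import os
from collections import defaultdict


def collect_update_groups(repo_root, marker="updategroup"):
    """Map each update group name to the sorted list of hosts declaring it.

    Assumes (hypothetically) that repo_root holds one directory per host
    and that a host opts in by carrying a one-line file named `marker`.
    """
    groups = defaultdict(list)
    for host in sorted(os.listdir(repo_root)):
        marker_path = os.path.join(repo_root, host, marker)
        if not os.path.isfile(marker_path):
            continue  # host has no update group recorded yet
        with open(marker_path) as handle:
            group = handle.read().strip()
        if group:
            groups[group].append(host)
    return dict(groups)


def render_groups(groups):
    """Render the mapping as INI-style text (assumed format, verify for func)."""
    lines = []
    for name in sorted(groups):
        lines.append("[%s]" % name)
        lines.append("host = %s" % "; ".join(groups[name]))
        lines.append("")
    return "\n".join(lines)
```

With output like this installed as the groups definition, the `func-yum --hosts=@group-a update` form mentioned above would select every host whose marker file says `group-a`.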
19:40:55 <abadger1999> app => rhel6; lmacken thinks that fedoracommunity should be pretty easy to fix once he gets the last packages built for EPEL6.
19:41:08 <abadger1999> So that just leaves mediawiki slowness.
19:41:24 <nirik> cool. I keep meaning to look at that, but never get to it. ;)
19:41:29 <abadger1999> Do we want to put out a cattle call to find a new fi-apprentice to look at that?
19:41:31 <lmacken> yeah, I'm working on the moksha EL6 thing... dealing with odd issues with the TG2 stack atm.
19:41:33 <nirik> might see if ricky or ianweller can look at some point.
19:41:41 <nirik> abadger1999: that would be cool too.
19:42:35 <abadger1999> nirik: Do we have a ticket about the slowness issue?
19:42:40 <nirik> abadger1999: once we have a rhel6 app server working, would bapp01 be hard to do? or it's mostly distro independent?
19:42:54 <nirik> nope. I can file one tho...
19:43:36 <abadger1999> nirik: I'll write a call for volunteers; if you get a ticket open with some numbers/testing it'll be a good place for me to send people to get started.
19:43:47 <abadger1999> nirik: I'd say do bapp01 last.
19:44:08 <abadger1999> nirik: bapp01 has a bunch of stuff running that's not on the other app servers.
19:44:14 <nirik> yeah.
19:44:15 <abadger1999> cron jobs and such.
19:44:42 <nirik> ok.
19:44:45 <abadger1999> things that interface with rh bugzilla, koji... not everything on there is easy to test in stg for those reasons :-(
19:45:00 <nirik> ok.
19:45:32 <nirik> I can file a ticket on the mediawiki thing.
19:45:35 <abadger1999> Probably we need to update the other app servers, then look through puppet for what's running on bapp01.
19:45:44 <abadger1999> (and not on the other app servers)
19:45:55 <nirik> does bapp01 need to be in phx2? (for bugzilla access, etc?)
19:46:00 <abadger1999> and the people responsible for those (mdomsch, I, maybe lmacken)
19:46:01 <smooge> yes
19:46:14 <smooge> it needs bugzilla, mounting of the netapps
19:46:16 <abadger1999> sit down and make sure all of those work...
19:46:23 <smooge> and various other things
19:46:27 <abadger1999> maybe in production since they might be hard to test.
19:46:42 <abadger1999> (without having side effects on bugzilla/koji/etc)
19:46:49 <smooge> nirik, it is probably the most critical box that needs to be in phx :/
19:47:20 <nirik> ok.
19:48:20 <abadger1999> If we think that multiple small, targeted servers are more scalable than one beefier server, bapp01 might be a good candidate.
19:48:46 <abadger1999> It doesn't truly need to be an app server and it doesn't need to be load balanced.
19:49:08 <nirik> well, the reason I asked if it needs to be in phx2, was thinking that it would be nice if it could be 'floating'... ie, have app server setup in puppet and a bapp thing and we could move bapp to whatever app server we wanted to run those things.
19:49:21 <nirik> but it sounds like thats not possible.
19:49:37 <skvidal> nirik: the mount points make it tricky, I suspect
19:49:43 <nirik> abadger1999: https://fedorahosted.org/fedora-infrastructure/ticket/2908
19:49:54 <skvidal> though I've often wondered about that... is it actually MOVING or accessing files on those mount points?
19:49:59 <skvidal> or is it mostly acquiring directory indexes?
19:50:42 <smooge> Some of everything I believe
19:51:35 <nirik> not sure.
19:52:10 <abadger1999> I can't think of anything off hand that would be writing to the mount points  at least, but bapp01 is very... eclectic so I don't know everything that's running on it.
19:52:40 <skvidal> I guess I was wondering
19:52:44 <abadger1999> nirik: thanks.  I'll send a message about that.
19:52:49 <skvidal> could we dump the nfs mounts
19:53:00 <skvidal> and use file-indexes of the rpms generated on the boxes
19:53:11 <nirik> skvidal: yeah, I think that might be for bodhi to complete package names...
19:53:11 <skvidal> or even repometadata
19:53:20 <nirik> on the other apps at least
19:53:34 <skvidal> nirik: that's what I was thinking - I'm sure bodhi can read a list from a file faster than a dir glob.glob()
19:53:56 <skvidal> I'll see about looking at the code for bodhi to see if I can make that work
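A minimal sketch of the file-index idea discussed above: a box that already has the mount writes a flat list of package file names, and the app server reads that list instead of globbing an NFS directory. This is not bodhi's actual code; the function names, the `*.rpm` pattern, and the one-name-per-line index format are assumptions made purely for illustration.

```python
import glob
import os


def write_index(package_dir, index_path):
    """Write the sorted *.rpm file names found in package_dir to index_path.

    Meant to run on a box that has the mount. The list is written to a
    temporary file and renamed into place so readers never see a partial index.
    """
    pattern = os.path.join(package_dir, "*.rpm")
    names = sorted(os.path.basename(path) for path in glob.glob(pattern))
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
    os.replace(tmp_path, index_path)
    return names


def read_index(index_path):
    """Read package file names back from the index, skipping blank lines."""
    with open(index_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def complete(names, prefix):
    """Return the names starting with prefix, for simple auto-completion."""
    return [name for name in names if name.startswith(prefix)]
```

The app server would then only need a copy of the index file, not the mount itself, which is the "less special" property being asked for here.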
19:53:57 <nirik> https://fedorahosted.org/fedora-infrastructure/ticket/2836
19:54:09 <nirik> lmacken: ^ is that for package name completion?
19:54:12 <nirik> skvidal: cool.
19:54:23 <nirik> It would be nice to not have to have mounts on the app servers.
19:55:00 <skvidal> indeed
19:55:10 <skvidal> and it would make those boxes less 'special'
19:55:41 <nirik> also, currently we have app05 and app06 that are not in phx2, but they are not in the base load (only backups) I think due to this reason.
19:56:20 <nirik> (well, and possibly db latency)
19:56:40 <smooge> I think the writing is from mirrormanager
19:56:59 <smooge> nirik, a lot of db latency
19:57:32 <skvidal> smooge: mirrormanager is writing to nfs? or do you mean writing to the db?
19:58:01 <smooge> skvidal, I thought there was something in mirrormanager that writes to the disks.. but I could be wrong
19:58:25 <skvidal> smooge: I know it writes out its mirror metalinks files and what-not
19:58:29 <skvidal> but that's not big
19:58:55 <smooge> oh I was thinking you were wondering about ro access versus rw. I misread something
19:59:02 <abadger1999> nirik: db latency was why they were backups originally.
19:59:03 <skvidal> np
19:59:05 <lmacken> nirik, skvidal: it used to be for the build auto-completion, but I think we may not need /mnt/koji on the app servers anymore.  I'll look into it and follow up in the ticket.
19:59:13 <skvidal> lmacken: thank you
19:59:17 <nirik> abadger1999: yeah. ;(
19:59:21 <nirik> lmacken: cool. Thanks.
19:59:56 <nirik> in any case I think we all agree on bapp01: a) identify and document the 'specialness' it has and b) try and reduce that so it's less complex/SPOF. ;)
20:00:42 <smooge> +1
20:00:49 <nirik> ok, any last items from anyone? if not will close out soon here...
20:01:24 <skvidal> lmacken: just did some searches through the code
20:01:43 <skvidal> lmacken: looks like it is _fetch_candidate_builds() which does the autocompletion and that looks like direct koji calls to get those lists
20:01:56 <skvidal> lmacken: so - I suspect you are correct about /mnt/koji being a legacy mount
20:02:56 <nirik> cool.
20:03:55 <nirik> ok, thanks for coming everyone!
20:03:57 <nirik> #endmeeting