infrastructure
LOGS
20:00:39 <smooge> #startmeeting infrastructure
20:00:39 <zodbot> Meeting started Thu Jan  6 20:00:39 2011 UTC.  The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:39 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:46 <smooge> #meetingname infrastructure
20:00:46 <zodbot> The meeting name has been set to 'infrastructure'
20:00:47 * abadger1999 reminded that he stuck water in the microwave for tea... four hours ago
20:01:06 <smooge> I hope you didn't run it for 4 hours
20:02:26 <skvidal> smooge: he likes a few extra rem
20:02:32 <goozbach> abadger1999: we have a guy here who will do that for 24-48 hours
20:02:49 <smooge> #chair skvidal
20:02:49 <zodbot> Current chairs: skvidal smooge
20:02:50 <goozbach> "SAM! HERE'S YOUR now-cold WATER!"
20:02:54 * dgilmore is here
20:02:59 * nirik is lurking around if needed.
20:03:01 * ricky 
20:03:02 <smooge> #topic Roll Call
20:03:07 <smooge> hi guys
20:03:13 <abadger1999> hey
20:03:13 <CodeBlock> hi
20:03:13 * goozbach here, not really paying attention though
20:03:44 * skvidal is here
20:04:15 <smooge> #topic End of Year Break
20:04:51 <smooge> Ok our slushy freeze for December is over
20:05:22 <smooge> we had only one outage it looks like.
20:05:40 <smooge> I think it recovered by itself so that was nice
20:06:14 <smooge> I have just remembered to remove the "Slushy Freeze" notice.
20:06:14 <CodeBlock> yeah it was a quickie
20:06:23 <smooge> Anything else come up CodeBlock ?
20:06:35 <CodeBlock> smooge: nothing worth noting really
20:06:51 <CodeBlock> Everything seems to have stuck together.... and hopefully everyone had a nice vacation
20:06:54 <smooge> thanks to you, ricky and dgilmore for covering things
20:07:18 <CodeBlock> :)
20:07:30 <smooge> #topic Upcoming Outages
20:07:38 <skvidal> all things considered - I think I would have rather been here :)
20:07:42 <skvidal> CodeBlock: but thanks
20:08:03 <smooge> oh sorry to hear that skvidal, I had hoped you would have a nice relaxing break.
20:08:13 <smooge> Ok we have a couple of outages coming up
20:08:32 <smooge> 1) We have a rolling outage for reboots of servers to get them all running on updated kernels and glibc
20:08:58 <smooge> 2) We have a major mondo oh crap outage in PHX2 when we move to the new Netapps
20:09:21 <smooge> #1 I think we can do tonight/tomorrow after a notice is emailed out.
20:09:30 <smooge> #2 I do not have a firm date on.
20:10:16 <smooge> But basically we are getting a new netapp and all our netapp storage (iscsi, sata, etc) will have to be frozen while copied over to it I think
20:11:03 <smooge> This could mean a 24-48 hour outage by my back-of-the-envelope estimates, but hopefully I am overestimating.
20:11:32 <goozbach> is this stuff netapp snapclonable?
20:11:47 <ricky> Hm.
20:11:48 <goozbach> i.e. separate volumes, compatible netapps?
20:11:56 <smooge> some might be.. but some won't be as it is moving from SATA->FC
20:12:00 <ricky> mmcgrath mentioned something about moving db02 storage off of the netapp during that last outage
20:12:03 <smooge> if I understand the new info.
20:12:25 <ricky> Ooh, shiny FC.
20:13:48 <smooge> I am not sure exactly when/where this will be. I will need to let rbergeron know as it will definitely affect her schedules
20:14:37 <ricky> So what will go down for this?
20:15:34 <smooge> well some of this will hopefully be snapcloned (the ISCSI shares).
20:15:55 <ricky> So that means that it'll still be read/write during the copy, and the switchover can be quick?
20:16:25 <smooge> The big problem will be the moving of nfs if they really move us from SATA->FC. As that will be a long rsync and then some sort of freeze
20:16:55 <smooge> I am asking for some estimates because I may have misunderstood some steps.
20:16:58 <ricky> OK, so for that, only releng/mirrormanager stuff will be halted?
20:17:12 <skvidal> smooge: no
20:17:13 <skvidal> you can just do
20:17:22 <skvidal> pre-rsync days in advance - if the new unit is up in advance
20:17:30 <skvidal> and then we schedule the outage
20:17:34 <skvidal> freeze everything
20:17:39 <skvidal> do a final re-rsync
20:17:39 <smooge> They may have meant that FC stuff stays FC and SATA gets moved over.. but I am not sure.
20:17:42 <skvidal> and move over
20:18:07 <smooge> skvidal, I think I meant what you said :)
20:18:10 <ricky> Probably a good idea to get a list of iscsi machines and find out how those will be affected.
20:18:14 <skvidal> nod
20:18:23 <smooge> working on it
20:18:29 <ricky> Hopefully we can move those over one by one while both old and new netapps are up
20:18:32 <smooge> it's part of a ticket that skvidal asked about
20:18:44 <skvidal> nod
20:19:04 <smooge> since I don't have much more I think we can move along to other items.
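skvidal's suggestion for the NFS move (pre-rsync days ahead while the new filer is already up, then freeze everything, do one final rsync and switch over) can be sketched as below. This is a minimal illustration under stated assumptions, not the team's actual runbook: the paths, the number of pre-sync passes, and the helper names `migration_plan` and `run` are hypothetical. Only the rsync options `-a` (archive), `-H` (preserve hard links) and `--delete` are standard.

```python
import subprocess

# Standard rsync options: -a (archive), -H (preserve hard links), and --delete
# (remove files on the destination that were deleted on the source since the
# last pass). Source/destination paths passed in below are hypothetical.
RSYNC = ["rsync", "-aH", "--delete"]


def migration_plan(src, dst, presync_passes=2):
    """Return (phase, argv-or-None) steps for a pre-sync / freeze / final-sync move."""
    steps = []
    for n in range(presync_passes):
        # Bulk copies while the old share is still live; each pass shrinks the delta.
        steps.append((f"pre-sync {n + 1} (live)", RSYNC + [src, dst]))
    # Manual step: stop all writers so the data can no longer change.
    steps.append(("freeze writers (outage starts)", None))
    # Only changes made since the last pre-sync remain, so this pass is short.
    steps.append(("final sync (frozen)", RSYNC + [src, dst]))
    steps.append(("repoint mounts to the new filer, unfreeze", None))
    return steps


def run(steps, dry_run=True):
    """Print each step; execute the rsync steps only when dry_run is False."""
    for phase, argv in steps:
        print(f"== {phase}")
        if argv is None:
            continue  # done by an admin, not by this script
        if dry_run:
            print("   " + " ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

For example, `run(migration_plan("/mnt/old-filer/vol/", "/mnt/new-filer/vol/"))` only prints the commands; `dry_run=False` would execute the rsync passes. The trailing slash on the source makes rsync copy the directory's contents rather than the directory itself. The point of the design is that the outage window only has to cover the final, incremental pass, not the bulk copy.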
20:19:18 <smooge> #topic Tickets
20:20:06 <smooge> ok I need to add a ticket for rebooting systems for this month but we discussed that already.
20:20:19 <smooge> .ticket 2519
20:20:20 <zodbot> smooge: #2519 (kill cvs with fire) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2519
20:20:34 <nirik> sadly, we are up to 292 tickets again. ;)
20:20:48 <skvidal> nirik: I added a bunch on monday
20:20:50 <smooge> nirik, I hope to close a couple this week/next week.
20:20:57 <nirik> yeah, such is life I fear.
20:21:25 <smooge> I think I will also look at purging a bunch next week.
20:21:35 <smooge> the equivalent of a bugzilla EOL.
20:21:54 <smooge> try to give the new guy/gal a clean slate to reopen things with
20:21:57 <smooge> ok so back to cvs
20:22:15 <smooge> we have a couple of projects still using it, and they need to move off of it.
20:22:46 <skvidal> I wanted to ask about this
20:22:50 <skvidal> how would we feel
20:22:58 <skvidal> about installing 'cvs' on people01
20:23:06 <skvidal> and letting the folks rest their project there?
20:23:16 <smooge> I was wondering about hosted instead
20:23:27 <dgilmore> skvidal: rest their project there in what sense?
20:23:27 <skvidal> I really don't want to run pserver anywhere
20:23:32 <smooge> I trust its broken hardware a lot more than people's broken hardware
20:23:36 <hicham> isn't cvs dead ?
20:23:53 * ricky also prefers just killing it if possible
20:24:02 <ricky> It'd be doing those projects a favor :-)
20:24:10 <abadger1999> fedorahosted would mean "more supported" I guess :-(
20:24:29 <skvidal> ricky: so there is just one or two projects that won't be migrated
20:24:36 <skvidal> I looked into the others and updated that ticket
20:24:45 * nirik would be fine with: we are shutting this machine on XXXX-XX-XX. If you don't want to move to fedorahosted, we will be happy to give you a tar.gz of your cvs project. Good luck.
20:25:08 <skvidal> nirik: so - if they want to use hosted they have to change scms
20:25:09 <skvidal> OR
20:25:12 <skvidal> we have to support cvs
20:25:19 <skvidal> which I, at least, don't want to do
20:25:24 <nirik> right, and I don't think we should support cvs.
20:25:34 <skvidal> does anyone disagree with that?
20:25:37 <nirik> if they can't use svn, then they should find their own cvs server elsewhere.
20:25:40 <skvidal> does anyone here want to support cvs on hosted?
20:26:15 <gholms> [You notice a deafening silence]
20:26:16 <ricky> OK, looks like we all agree then :-)
20:26:36 <skvidal> gholms: [a grue eats you] :)
20:26:38 <smooge> or at least put in an RFR for us to build a pukecvs box
20:26:42 <nirik> really svn should be close enough so as not to matter... except it sucks less.
20:26:43 <abadger1999> skvidal: Ah, without pserver fedorapeople seems okay.
20:26:57 <gholms> skvidal: D:
20:26:59 <skvidal> abadger1999: right - that's what I was saying - just let them use cvs + ssh
20:27:03 <abadger1999> <nod>
20:27:19 <skvidal> if we're cool w/it then let's float an EOL on cvs01 to the list(s)
20:27:21 <smooge> skvidal, with that in mind I think people01/02 would be ok
20:27:24 <skvidal> and stick a fork in it
20:27:40 <tibbs> I wonder if folks who can't handle moving from CVS can handle changing to cvs+ssh.
20:27:51 <ricky> Eh, not having anonymous access is kind of bad :-(
20:27:56 <smooge> I declare 2011-03-03 to be pulled pork day for CVS
20:28:00 <skvidal> ricky: not for these trees
20:28:09 <ricky> Those projects would probably be better off having a "real" CVS provider somewhere else
20:28:11 <skvidal> smooge: we have to wait that long? :)
20:28:35 <smooge> Ok... 2011-02-03
20:28:36 <skvidal> I'll see your 2011-03-03 and raise you 2011-02-14
20:28:44 <skvidal> we'll all be coming back from fudcon
20:28:50 <nirik> happy v-day cvs! :)
20:28:55 <goozbach> +1 for valentines
20:29:01 <CodeBlock> hah
20:29:03 <goozbach> tis memorable
20:29:16 <smooge> and we will see what else we can clean up on massacre day too.
20:29:21 <smooge> ok 2011-02-14 it is
20:29:37 <skvidal> .info CVS01 to be EOL'd on 2011-02-14
20:29:38 <zodbot> skvidal: (info <url|feed>) -- Returns information from the given RSS feed, namely the title, URL, description, and last update date, if available.
20:29:53 * skvidal doesn't know how to use this damn thing
20:29:56 <smooge> #agreed 2011-02-14 will be end of cvs system. Projects will move to people or other services
20:29:59 <ricky> #info CVS01 to be EOL'd on 2011-02-14
20:30:09 <smooge> #info CVS01 to be EOL'd on 2011-02-14
20:30:13 <ricky> (it's inconsistent for meetbot, yeah)
20:30:14 <smooge> ok next
20:30:23 * skvidal grumbles about newfangled irc technology
20:30:26 <skvidal> YOU KIDS GET OFF MY LAWN
20:30:30 <CodeBlock> XD
20:31:12 <smooge> .ticket 2275
20:31:13 <zodbot> smooge: #2275 (Upgrade Nagios) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2275
20:31:39 * CodeBlock just throws this out there: https://gist.github.com/754833
20:31:40 <smooge> ok moving noc01 to el6 and using newer nagios
20:32:18 <smooge> I guess that was your winter vacation CodeBlock
20:32:24 <ricky> Yay :-)
20:32:27 <CodeBlock> that link deals with (hopefully) moving nagios to a proper puppet module, which I'd like to try to do at the same time as upgrading nagios :)
20:32:30 <ricky> Also, configfile is kind of deprecated at this point :-/
20:32:44 <CodeBlock> mmh
20:32:45 <CodeBlock> bah
20:32:49 <CodeBlock> ricky: that diff is reversed, oops
20:33:07 <ricky> Ah, OK - I was wondering why you were removing quotes :-)
20:33:09 <smooge> CodeBlock, ok so first we need to do that with noc01.stg
20:33:25 <smooge> so I would say start checking in and breaking staging
20:33:53 <CodeBlock> smooge: will do that this weekend
20:34:17 <smooge> okie dokie. send out a notification if it will cause a pager storm somehow
20:34:55 <CodeBlock> smooge: last I head stg can't send mails out
20:35:02 <CodeBlock> so.. in theory that shouldn't be possible
20:35:08 <CodeBlock> least I heard*
20:35:27 <CodeBlock> ricky: bah, now I want to reverse that diff, but can't because the files are at home. :(
20:36:23 <ricky> There's always patch -R to apply reversed diffs :-)
20:36:25 <smooge> s/^+/X/; s/^-/+/; s/^X/-/; [or something like that]
20:36:41 <smooge> thanks for the progress on that.
20:36:48 <CodeBlock> no problem :)
20:37:07 <CodeBlock> smooge: and somehow that only took one night to do, btw :P
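ricky's `patch -R` is the dependable way to apply a reversed diff. smooge's sed one-liner flips the `+`/`-` markers on changed lines, but it also turns the `---`/`+++` file headers into `+--`/`-++` and leaves the old/new ranges in the `@@` hunk headers unswapped. A minimal Python sketch of a full reversal, assuming a well-formed unified diff; `reverse_unified_diff` is a hypothetical helper, not part of patch or any other tool mentioned here:

```python
import re

# Hunk header such as "@@ -12,5 +12,6 @@ class foo"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$")


def _span(start, count):
    """Rebuild the 'start[,count]' part of a hunk header."""
    return start if count is None else f"{start},{count}"


def reverse_unified_diff(diff_text):
    """Return a unified diff that undoes diff_text (what `patch -R` would apply)."""
    lines = diff_text.splitlines()
    out = []
    old_left = new_left = 0  # body lines still expected in the current hunk
    i = 0
    while i < len(lines):
        line = lines[i]
        in_hunk = old_left > 0 or new_left > 0
        changed = (line.startswith("-") and old_left > 0) or (
            line.startswith("+") and new_left > 0)
        if in_hunk and changed:
            # Flip a whole run of changed lines, emitting the new '-' lines
            # before the new '+' lines, as diff itself does.
            removed, added = [], []
            while i < len(lines):
                cur = lines[i]
                if cur.startswith("+") and new_left > 0:
                    removed.append("-" + cur[1:])
                    new_left -= 1
                elif cur.startswith("-") and old_left > 0:
                    added.append("+" + cur[1:])
                    old_left -= 1
                else:
                    break
                i += 1
            out.extend(removed + added)
            continue
        if in_hunk and not line.startswith("\\"):
            # Context line (leading space, or empty if trailing spaces were stripped).
            old_left -= 1
            new_left -= 1
        elif line.startswith("--- ") and i + 1 < len(lines) and lines[i + 1].startswith("+++ "):
            # File headers: swap both the markers and their order.
            out.append("--- " + lines[i + 1][4:])
            out.append("+++ " + line[4:])
            i += 2
            continue
        else:
            m = HUNK_RE.match(line)
            if m:
                o_start, o_cnt, n_start, n_cnt, rest = m.groups()
                old_left = 1 if o_cnt is None else int(o_cnt)
                new_left = 1 if n_cnt is None else int(n_cnt)
                line = f"@@ -{_span(n_start, n_cnt)} +{_span(o_start, o_cnt)} @@{rest}"
        out.append(line)
        i += 1
    return "\n".join(out) + "\n"
```

Reversing a diff twice should give back the original text, which makes a quick sanity check before pushing a hand-reversed diff anywhere.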
20:38:07 <smooge> abadger1999, ricky ping
20:38:13 <abadger1999> smooge: here
20:38:16 <smooge> .ticket 2481
20:38:17 <zodbot> smooge: #2481 (Fedora switching from the CLA to FPCA) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2481
20:38:42 * ricky has not looked at this at all, sorry :-(
20:38:57 <ricky> The changes in FAS should be pretty simple though.
20:39:01 * abadger1999 hasn't looked since he outlined the steps to make it happen.
20:40:15 <smooge> ok I think we will look at this again after FUDCon unless a sprint is needed there.
20:40:25 <smooge> .ticket 2542
20:40:26 <zodbot> smooge: #2542 (reinstall fas01 on rhel6 kvm host) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2542
20:40:54 <skvidal> fas01 is on rhel6
20:40:59 <skvidal> running on a xen server
20:41:10 <smooge> Ok the steps here require us really to make one more kvm host. This means moving some stuff off of a current Xen box and rebuilding it to el6
20:42:00 <ricky> Does the kickstart SOP apply exactly the same for KVM?
20:42:08 <skvidal> ricky: almost, yes
20:42:11 <skvidal> ricky: I made a couple of edits
20:42:13 <ricky> Cool
20:42:15 <skvidal> to the kickstart sop
20:42:19 <skvidal> for el6
20:42:26 <skvidal> but the virt-install are the same ultimately
20:42:29 <smooge> I think Xen15 is our 'best' candidate
20:42:31 <skvidal> there is one thing I've not fixed yet
20:42:36 <skvidal> smooge: quick thought
20:42:41 <skvidal> smooge: lemme see if this seems wrong
20:43:00 <skvidal> xb-01
20:43:25 <skvidal> is on bxen01
20:43:33 <skvidal> and dgilmore said xb-01 is doing nothing at all
20:43:33 <dgilmore> skvidal: we would need to get eth0 on bxen01 moved to the public network
20:43:39 <skvidal> damn
20:43:41 <skvidal> okay
20:43:43 <skvidal> dgilmore: thank you
20:43:48 <skvidal> smooge: carry on
20:43:53 <skvidal> sorry for the useless input
20:43:55 <dgilmore> skvidal: they can probably do that in the switch
20:44:06 <skvidal> ah - it's not a physical move?
20:44:16 <dgilmore> move the vlan the port is on
20:44:20 <skvidal> smooge: i guess I was thinking - use bxen01 as our bubble sort location
20:44:40 <skvidal> smooge: move hosts from xen15 over there - reinstall xen15 to rhel6 + kvm
20:44:51 <skvidal> and then migrate hosts back by reinstalling them
20:45:26 <smooge> ah ok
20:45:27 <skvidal> ricky: the kvm installs do a couple of wonky-ish things - you'll want to have vnc running to connect to the installing guest.
20:45:40 <smooge> we will need to get RHIT to move bxen01 over to the 126 network
20:45:40 <dgilmore> skvidal: wonky in what way?
20:45:41 <skvidal> ricky: and then I need to figure out how to tell kvm + rhel6 to open the console port properly
20:45:52 <skvidal> dgilmore: just those 2
20:45:54 <dgilmore> smooge: thats just a ticket
20:46:02 <ricky> Ah yeah, I remember running into the virsh console issue
20:46:14 <skvidal> ricky: I've found a way to fix the virsh console thing
20:46:17 <dgilmore> skvidal: virsh start guest --console
20:46:30 <skvidal> dgilmore: doesn't work for a running one
20:46:31 <ricky> Ah, nice.
20:46:41 <smooge> dgilmore, I have to bribe mgalgoci every time.. I am down 1 kidney, half a liver and a pancreas. I need to get an intern this summer instead
20:46:45 <skvidal> dgilmore: if init is not listening on it
20:46:53 <skvidal> dgilmore: s/init/getty/
20:46:57 <smooge> dgilmore, but yeah its just a ticket
20:46:58 <dgilmore> skvidal: if we pass console=ttyS0 to the install it should just work
20:47:09 <skvidal> dgilmore: when we install, we can't do that, I believe
20:47:10 <dgilmore> skvidal: which i think is different to rhel5 and xen
20:47:20 <dgilmore> skvidal: we can
20:47:22 <skvidal> dgilmore: b/c txt consoles are no longer there for the installer
20:47:25 <dgilmore> we can try it anyway
20:47:30 <skvidal> dgilmore: it NEEDs to use vnc
20:47:46 <dgilmore> skvidal: text install is there its just not interactive at all
20:47:47 <smooge> db02 might be pretty easy.. it's just a Xen guest on iSCSI. It can be moved to a LOT of boxes pretty quickly
20:47:47 <skvidal> dgilmore: definitely try it - but I tried a bunch of things when I was doing fas## and friends
20:48:07 <skvidal> dgilmore: we occasionally need the interactivity
20:48:11 <skvidal> for disk partitioning
20:48:18 <dgilmore> skvidal: yeah that has to be done via vnc
20:48:20 <skvidal> therefore - we need vnc
20:48:27 <skvidal> rather than have 2 different instructions
20:48:33 <skvidal> It makes sense to do it all via vnc
20:48:37 <smooge> especially because kickstart+RAID in EL6 is funky
20:48:46 <dgilmore> smooge: no its not
20:48:50 <skvidal> I was just trying to make the vnc ALSO open the console for virsh console
20:49:07 <smooge> dgilmore, we have had problems with every box when making more than one RAID array.
20:49:26 <smooge> it makes the first one, and then sometimes makes the second one, and sometimes makes up a new one.
20:49:40 <smooge> skvidal can go over that one :)
20:49:41 <skvidal> dgilmore: and when you migrate from an older raid array - it has had some issues - especially when spares are involved
20:49:56 <skvidal> dgilmore: I filed a bug on it - and it's being worked on
20:50:07 <skvidal> dgilmore: apparently we're the only people using rhel with spares :)
20:50:11 * skvidal is kidding
20:51:03 <skvidal> dgilmore: it's not horribly broken - but I had to nuke the drives on the hw to get anaconda to let me install the box
20:51:20 <skvidal> but we're WAY off in the weeds
20:51:37 <dgilmore> smooge: I've had no issues with raid arrays in el6.  all the builders have arrays, as do quite a few other boxes I've built using kickstart
20:51:38 <skvidal> ricky: if you want to do rhel6 installs on kvm - yell at me if you run into anything 'odd' and I'll make sure I update any docs
20:51:44 <ricky> Sure thing, thanks
20:51:45 <dgilmore> skvidal: :( ok
20:52:40 <skvidal> so - fas01 migration
20:53:05 <skvidal> smooge: just out of curiosity do we have a xen host holding 2 less critical pieces of infrastructure?
20:53:29 <skvidal> alternatively
20:53:32 <smooge> dgilmore, we can test on bvirthost01; it's been rebuilt 5 times with different results each time. If I have a problem with the kickstart I want to fix it.
20:53:35 <skvidal> dgilmore: do you need bxen01 at all?
20:53:37 <smooge> but back to the problem at hand
20:53:59 <dgilmore> skvidal: nope
20:54:02 <skvidal> okay
20:54:05 <skvidal> is it under warranty?
20:54:10 <dgilmore> skvidal: it was there just for xb-01
20:54:19 <dgilmore> skvidal: probably not
20:54:19 <smooge> skvidal, db02 can move and bastion01 is a backup box
20:54:35 <skvidal> dgilmore: ah :( sad face
20:54:36 <skvidal> okay
20:54:47 <dgilmore> skvidal: it could be still
20:54:53 <dgilmore> mmcgrath: would know
20:54:56 <smooge> is bxen01 a dell?
20:55:12 <skvidal> smooge: yes
20:55:50 <smooge> it is not under warranty anymore
20:55:55 <skvidal> womp womp
20:55:57 <skvidal> then never mind
20:56:03 <smooge> and was to be replaced with bvirthost01
20:56:08 <skvidal> okay -
20:56:12 <skvidal> then just ignore me
20:56:22 <skvidal> bastion01 goes off - and we move db02
20:56:23 <skvidal> reinstall xen15
20:56:34 <smooge> and use it for bubble sort
20:56:55 <skvidal> and put fas01 on xen15
20:57:09 <skvidal> then bastion01 and db02?
20:57:36 <smooge> I would go with bastion01 going back
20:57:44 <smooge> actually that would break the bubble sort
20:57:53 <smooge> bastion01 can go on virthost01
20:58:21 <smooge> if I can get iscsi exported to virthost02 we can put db02 there
20:59:12 <smooge> then we work on say xen11 or xen12
20:59:35 * rbergeron pokes in just ahead of the cloud sig mtg and waves hi to her favorite infrastructure folks
20:59:49 <skvidal> we've got a lot of tickets left
20:59:58 <skvidal> can we continue - or do we need to move to -2?
21:00:09 <dgilmore> lets move
21:00:25 <ricky> Or #fedora-admin where people already are :-)
21:00:40 <skvidal> ricky: too much random noise
21:00:41 <skvidal> imo
21:00:42 * rbergeron feels bad about disrupting every week
21:00:51 <skvidal> rbergeron: doesn't seem to STOP you
21:01:04 <rbergeron> no, no, it surely doesn't. :)
21:01:07 <smooge> ok #fedora-meeting-2?
21:01:22 <smooge> rbergeron, did you see the poke earlier about possible effects on your schedule?
21:01:27 <abadger1999> smooge: Sure.  Don't forget to stop meeting here.
21:01:30 <smooge> #endmeeting