20:00:39 #startmeeting infrastructure
20:00:39 Meeting started Thu Jan 6 20:00:39 2011 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:39 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:46 #meetingname infrastructure
20:00:46 The meeting name has been set to 'infrastructure'
20:00:47 * abadger1999 remembered that he stuck water in the microwave for tea... four hours ago
20:01:06 I hope you didn't run it for 4 hours
20:02:26 smooge: he likes a few extra rem
20:02:32 abadger1999: we have a guy here who will do that for 24-48 hours
20:02:49 #chair skvidal
20:02:49 Current chairs: skvidal smooge
20:02:50 "SAM! HERE'S YOUR now-cold WATER!"
20:02:54 * dgilmore is here
20:02:59 * nirik is lurking around if needed.
20:03:01 * ricky
20:03:02 #topic Roll Call
20:03:07 hi guys
20:03:13 hey
20:03:13 hi
20:03:13 * goozbach here, not really paying attention though
20:03:44 * skvidal is here
20:04:15 #topic End of Year Break
20:04:51 Ok our slushy freeze for December is over
20:05:22 we had only one outage it looks like.
20:05:40 I think it recovered by itself so that was nice
20:06:14 I have just remembered to remove the "Slushy Freeze" notice.
20:06:14 yeah it was a quickie
20:06:23 Anything else come up CodeBlock ?
20:06:35 smooge: nothing worth noting really
20:06:51 Everything seems to have stuck together.... and hopefully everyone had a nice vacation
20:06:54 thanks to you, ricky and dgilmore for covering things
20:07:18 :)
20:07:30 #topic Upcoming Outages
20:07:38 all things considered - I think I would have rather been here :)
20:07:42 CodeBlock: but thanks
20:08:03 oh sorry to hear that skvidal I had hoped you would have a nice relaxing break.
20:08:13 Ok we have a couple of outages coming up
20:08:32 1) We have a rolling outage for reboots of servers to get them all running on updated kernels and glibc
20:08:58 2) We have a major mondo oh crap outage in PHX2 when we move to the new Netapps
20:09:21 #1 I think we can do tonight/tomorrow after a notice is emailed out.
20:09:30 #2 I do not have a firm date on.
20:10:16 But basically we are getting a new netapp and all our netapp storage (iscsi, sata, etc) will have to be frozen while copied over to it I think
20:11:03 This could mean a 24-48 hour outage by my back-of-the-envelope estimates but I am hopefully overestimating.
20:11:32 is this stuff netapp snapclonable?
20:11:47 Hm.
20:11:48 ie separate volumes, compatible netapps?
20:11:56 some might be.. but some won't be as it is moving from SATA->FC
20:12:00 mmcgrath mentioned something about moving db02 storage off of the netapp during that last outage
20:12:03 if I understand the new info.
20:12:25 Ooh, shiny FC.
20:13:48 I am not sure exactly when/where this will be. I will need to let rbergeron know as it will definitely affect her schedules
20:14:37 So what will go down for this?
20:15:34 well some of this will hopefully be snapcloned (the ISCSI shares).
20:15:55 So that means that it'll still be read/write during the copy, and the switchover can be quick?
20:16:25 The big problem will be the moving of nfs if they really move us from SATA->FC. As that will be a long rsync and then some sort of freeze
20:16:55 I am asking for some estimates because I may have misunderstood some steps.
20:16:58 OK, so for that, only releng/mirrormanager stuff will be halted?
20:17:12 smooge: no
20:17:13 you can just do
20:17:22 pre-rsync days in advance - if the new unit is up in advance
20:17:30 and then we schedule the outage
20:17:34 freeze everything
20:17:39 do a final re-rsync
20:17:39 They may have meant that FC stuff stays FC and SATA gets moved over.. but I am not sure.
20:17:42 and move over
20:18:07 skvidal, I think I meant what you said :)
20:18:10 Probably a good idea to get a list of iscsi machines and find out how those will be affected.
20:18:14 nod
20:18:23 working on it
20:18:29 Hopefully we can move those over one by one while both old and new netapps are up
20:18:32 its part of a ticket that skvidal asked about
20:18:44 nod
20:19:04 since I don't have much more I think we can move along to other items.
20:19:18 #topic Tickets
20:20:06 ok I need to add a ticket for rebooting systems for this month but we discussed that already.
20:20:19 .ticket 2519
20:20:20 smooge: #2519 (kill cvs with fire) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2519
20:20:34 sadly, we are up to 292 tickets again. ;)
20:20:48 nirik: I added a bunch on monday
20:20:50 nirik, I hope to close a couple this week/next week.
20:20:57 yeah, such is life I fear.
20:21:25 I think I will also look at purging a bunch next week.
20:21:35 the equivalent of a bugzilla EOL.
20:21:54 try to give the new guy/gal a clean slate to reopen things with
20:21:57 ok so back to cvs
20:22:15 we have a couple of projects still using it, and they need to move off of it.
20:22:46 I wanted to ask about this
20:22:50 how would we feel
20:22:58 about installing 'cvs' on people01
20:23:06 and letting the folks rest their project there?
20:23:16 I was wondering about hosted instead
20:23:27 skvidal: rest their project there in what sense?
20:23:27 I really don't want to run pserver anywhere
20:23:32 I trust its broken hardware a lot more than people's broken hardware
20:23:36 isn't cvs dead ?
20:23:53 * ricky also prefers just killing it if possible
20:24:02 It'd be doing those projects a favor :-)
20:24:10 fedorahosted would mean "more supported" I guess :-(
20:24:29 ricky: so there is just one or two projects that won't be migrated
20:24:36 I looked into the others and updated that ticket
20:24:45 * nirik would be fine with: we are shutting this machine down on XXXX-XX-XX. If you don't want to move to fedorahosted, we will be happy to give you a tar.gz of your cvs project. Good luck.
20:25:08 nirik: so - if they want to use hosted they have to change scms
20:25:09 OR
20:25:12 we have to support cvs
20:25:19 which I, at least, don't want to do
20:25:24 right, and I don't think we should support cvs.
20:25:34 does anyone disagree with that?
20:25:37 if they can't use svn, then they should find their own cvs server elsewhere.
20:25:40 does anyone here want to support cvs on hosted?
20:26:15 [You notice a deafening silence]
20:26:16 OK, looks like we all agree then :-)
20:26:36 gholms: [a grue eats you] :)
20:26:38 or at least put in an RFR for us to build a pukecvs box
20:26:42 really svn should be close enough so as not to matter... except it sucks less.
20:26:43 skvidal: Ah without pserver fedorapeople seems okay .
20:26:57 skvidal: D:
20:26:59 abadger1999: right - that's what I was saying - just let them use cvs + ssh
20:27:19 if we're cool w/it then let's float an EOL on cvs01 to the list(s)
20:27:21 skvidal, with that in mind I think people01/02 would be ok
20:27:24 and stick a fork in it
20:27:40 I wonder if folks who can't handle moving from CVS can handle changing to cvs+ssh.
20:27:51 Eh, not having anonymous access is kind of bad :-(
20:27:56 I declare 2011-03-03 to be pulled pork day for CVS
20:28:00 ricky: not for these trees
20:28:09 Those projects would probably be better off having a "real" CVS provider somewhere else
20:28:11 smooge: we have to wait that long? :)
20:28:35 Ok... 2011-02-03
20:28:36 I'll see your 2011-03-03 and raise you 2011-02-14
20:28:44 we'll all be coming back from fudcon
20:28:50 happy v-day cvs! :)
20:28:55 +1 for valentines
20:29:01 hah
20:29:03 tis memorable
20:29:16 and we will see what else we can clean up on massacre day too.
20:29:21 ok 2011-02-14 it is
20:29:37 .info CVS01 to be EOL'd on 2011-02-14
20:29:38 skvidal: (info ) -- Returns information from the given RSS feed, namely the title, URL, description, and last update date, if available.
20:29:53 * skvidal doesn't know how to use this damn thing
20:29:56 #agreed 2011-02-14 will be end of cvs system. Projects will move to people or other services
20:29:59 #info CVS01 to be EOL'd on 2011-02-14
20:30:09 #info CVS01 to be EOL'd on 2011-02-14
20:30:13 (it's inconsistent for meetbot, yeah)
20:30:14 ok next
20:30:23 * skvidal grumbles about newfangled irc technology
20:30:26 YOU KIDS GET OFF MY LAWN
20:30:30 XD
20:31:12 .ticket 2275
20:31:13 smooge: #2275 (Upgrade Nagios) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2275
20:31:39 * CodeBlock just throws this out there: https://gist.github.com/754833
20:31:40 ok moving noc01 to el6 and using newer nagios
20:32:18 I guess that was your winter vacation CodeBlock
20:32:24 Yay :-)
20:32:27 that link deals with (hopefully) moving nagios to a proper puppet module, which I'd like to try to do at the same time as upgrading nagios :)
20:32:30 Also, configfile is kind of deprecated at this point :-/
20:32:44 mmh
20:32:45 bah
20:32:49 ricky: that diff is reversed, oops
20:33:07 Ah, OK - I was wondering why you were removing quotes :-)
20:33:09 CodeBlock, ok so first we need to do that with noc01.stg
20:33:25 so I would say start checking in and breaking staging
20:33:53 smooge: will do that this weekend
20:34:17 okie dokie. send out a notification if it will cause a pager storm somehow
20:34:55 smooge: last I head stg can't send mails out
20:35:02 so.. in theory that shouldn't be possible
20:35:08 least I heard*
20:35:27 ricky: bah, now I want to reverse that diff, but can't because the files are at home. :(
20:36:23 There's always patch -R to apply reversed diffs :-)
20:36:25 s/^+/X/; s/^-/+/; s/^X/-/; [or something like that]
20:36:41 thanks for the progress on that.
20:36:48 no problem :)
20:37:07 smooge: and somehow that only took one night to do, btw :P
20:38:07 abadger1999, ricky ping
20:38:13 smooge: here
20:38:16 .ticket 2481
20:38:17 smooge: #2481 (Fedora switching from the CLA to FPCA) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2481
20:38:42 * ricky has not looked at this at all, sorry :-(
20:38:57 The changes in FAS should be pretty simple though.
20:39:01 * abadger1999 hasn't looked since he outlined the steps to make it happen.
20:40:15 ok I think we will look at this again after FUDCon unless a sprint is needed there.
20:40:25 .ticket 2542
20:40:26 smooge: #2542 (reinstall fas01 on rhel6 kvm host) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2542
20:40:54 fas01 is on rhel6
20:40:59 running on a xen server
20:41:10 Ok the steps here require us really to make one more kvm host. This means moving some stuff off of a current Xen box and rebuilding it to el6
20:42:00 Does the kickstart SOP apply exactly the same for KVM?
20:42:08 ricky: almost, yes
20:42:11 ricky: I made a couple of edits
20:42:13 Cool
20:42:15 to the kickstart sop
20:42:19 for el6
20:42:26 but the virt-install steps are the same ultimately
20:42:29 I think Xen15 is our 'best' candidate
20:42:31 there is one thing I've not fixed yet
20:42:36 smooge: quick thought
20:42:41 smooge: lemme see if this seems wrong
20:43:00 xb-01
20:43:25 is on bxen01
20:43:33 and dgilmore said xb-01 is doing nothing at all
20:43:33 skvidal: we would need to get eth0 on bxen01 moved to the public network
20:43:39 damn
20:43:41 okay
20:43:43 dgilmore: thank you
20:43:48 smooge: carry on
20:43:53 sorry for the useless input
20:43:55 skvidal: they can probably do that in the switch
20:44:06 ah - it's not a physical move?
20:44:16 move the vlan the port is on
20:44:20 smooge: i guess I was thinking - use bxen01 as our bubble sort location
20:44:40 smooge: move hosts from xen15 over there - reinstall xen15 to rhel6 + kvm
20:44:51 and then migrate hosts back by reinstalling them
20:45:26 ah ok
20:45:27 ricky: the kvm installs do a couple of wonky-ish things - you'll want to have vnc running to connect to the installing guest.
20:45:40 we will need to get RHIT to move bxen01 over to the 126 network
20:45:40 skvidal: wonky in what way?
20:45:41 ricky: and then I need to figure out how to tell kvm + rhel6 to open the console port properly
20:45:52 dgilmore: just those 2
20:45:54 smooge: that's just a ticket
20:46:02 Ah yeah, I remember running into the virsh console issue
20:46:14 ricky: I've found a way to fix the virsh console thing
20:46:17 skvidal: virsh start guest --console
20:46:30 dgilmore: doesn't work for a running one
20:46:31 Ah, nice.
20:46:41 dgilmore, I have to bribe mgalgoci every time.. I am down 1 kidney, half a liver and a pancreas. I need to get an intern this summer instead
20:46:45 dgilmore: if init is not listening on it
20:46:53 dgilmore: s/init/getty/
20:46:57 dgilmore, but yeah it's just a ticket
20:46:58 skvidal: if we pass console=ttyS0 to the install it should just work
20:47:09 dgilmore: when we install, we can't do that, I believe
20:47:10 skvidal: which i think is different to rhel5 and xen
20:47:20 skvidal: we can
20:47:22 dgilmore: b/c txt consoles are no longer there for the installer
20:47:25 we can try it anyway
20:47:30 dgilmore: it NEEDs to use vnc
20:47:46 skvidal: text install is there it's just not interactive at all
20:47:47 db02 might be pretty easy.. it's just a xenGuest on iscsi. It can be moved to a LOT of boxes pretty quickly
20:47:47 dgilmore: definitely try it - but I tried a bunch of things when I was doing fas## and friends
20:48:07 dgilmore: we occasionally need the interactivity
20:48:11 for disk partitioning
20:48:18 skvidal: yeah that has to be done via vnc
20:48:20 therefore - we need vnc
20:48:27 rather than have 2 different instructions
20:48:33 It makes sense to do it all via vnc
20:48:37 especially because kickstart+RAID in EL6 is funky
20:48:46 smooge: no its not
20:48:50 I was just trying to make the vnc ALSO open the console for virsh console
20:49:07 dgilmore, we have had problems with every box when making more than one RAID array.
20:49:26 it makes the first one, and then sometimes makes the second one, and sometimes makes up a new one.
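For context on the multi-array problem being described, a minimal anaconda kickstart fragment of the kind under discussion (two software-RAID1 arrays across two disks) might look like the following. The disk names, sizes, and mount points are illustrative assumptions, not taken from the meeting:

```
# Hypothetical EL6-era kickstart RAID layout (disks sda/sdb assumed):
part raid.01 --size=500 --ondisk=sda
part raid.02 --size=500 --ondisk=sdb
raid /boot --fstype=ext4 --level=1 --device=md0 raid.01 raid.02

part raid.11 --size=1 --grow --ondisk=sda
part raid.12 --size=1 --grow --ondisk=sdb
raid / --fstype=ext4 --level=1 --device=md1 raid.11 raid.12
```

The complaint in the log is that with more than one `raid` stanza like this, anaconda sometimes assembled only the first array correctly, especially when migrating over existing arrays with spares, hence the fallback to interactive partitioning over VNC.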
20:49:40 skvidal can go over that one :)
20:49:41 dgilmore: and when you migrate from an older raid array - it has had some issues - especially when spares are involved
20:49:56 dgilmore: I filed a bug on it - and it's being worked on
20:50:07 dgilmore: apparently we're the only people using rhel with spares :)
20:50:11 * skvidal is kidding
20:51:03 dgilmore: it's not horribly broken - but I had to nuke the drives on the hw to get anaconda to let me install the box
20:51:20 but we're WAY off in the weeds
20:51:37 smooge: I've had no issues with raid arrays in el6. all the builders have arrays, as do quite a few other boxes I've built using kickstart
20:51:38 ricky: if you want to do rhel6 installs on kvm - yell at me if you run into anything 'odd' and I'll make sure I update any docs
20:51:44 Sure thing, thanks
20:51:45 skvidal: :( ok
20:52:40 so - fas01 migration
20:53:05 smooge: just out of curiosity do we have a xen host holding 2 less critical pieces of infrastructure?
20:53:29 alternatively
20:53:32 dgilmore, we can test on bvirthost01 it's been rebuilt 5 times with different results each time. If I have a problem with the kickstart I want to fix it.
20:53:35 dgilmore: do you need bxen01 at all?
20:53:37 but back to the problem at hand
20:53:59 skvidal: nope
20:54:02 okay
20:54:05 is it under warranty?
20:54:10 skvidal: it was there just for xb-01
20:54:19 skvidal: probably not
20:54:19 skvidal, db02 can move and bastion01 is a backup box
20:54:35 dgilmore: ah :( sad face
20:54:36 okay
20:54:47 skvidal: it could be still
20:54:53 mmcgrath: would know
20:54:56 is bxen01 a dell?
20:55:12 smooge: yes
20:55:50 it is not under warranty anymore
20:55:55 womp womp
20:55:57 then never mind
20:56:03 and was to be replaced with bvirthost01
20:56:08 okay -
20:56:12 then just ignore me
20:56:22 bastion01 goes off - and we move db02
20:56:23 reinstall xen15
20:56:34 and use it for bubble sort
20:56:55 and put fas01 on xen15
20:57:09 then bastion01 and db02?
20:57:36 I would go with bastion01 going back
20:57:44 actually that would break the bubble sort
20:57:53 bastion01 can go on virthost01
20:58:21 if I can get iscsi exported to virthost02 we can put db02 there
20:59:12 then we work on say xen11 or xen12
20:59:35 * rbergeron pokes in just ahead of the cloud sig mtg and waves hi to her favorite infrastructure folks
20:59:49 we've got a lot of tickets left
20:59:58 can we continue - or do we need to move to -2?
21:00:09 lets move
21:00:25 Or #fedora-admin where people already are :-)
21:00:40 ricky: too much random noise
21:00:41 imo
21:00:42 * rbergeron feels bad about disrupting every week
21:00:51 rbergeron: doesn't seem to STOP you
21:01:04 no, no, it surely doesn't. :)
21:01:07 ok #fedora-meeting-2?
21:01:22 rbergeron, did you see the poke earlier about possible effects on your schedule?
21:01:27 smooge: Sure. Don't forget to stop meeting here.
21:01:30 #endmeeting