infrastructure
LOGS
19:00:01 <nirik> #startmeeting Infrastructure (2011-07-21)
19:00:01 <zodbot> Meeting started Thu Jul 21 19:00:01 2011 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:01 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:01 <nirik> #meetingname infrastructure
19:00:01 <zodbot> The meeting name has been set to 'infrastructure'
19:00:01 <nirik> #topic Robot Roll Call
19:00:01 <nirik> #chair smooge skvidal codeblock ricky nirik abadger1999
19:00:01 <zodbot> Current chairs: abadger1999 codeblock nirik ricky skvidal smooge
19:00:05 * skvidal is here
19:00:07 * CodeBlock waves
19:00:14 <smooge> Here
19:00:44 * athmane here
19:00:49 * nirik waits a minute more for folks.
19:01:50 <nirik> #topic New folks introductions and apprentice tasks/feedback
19:02:06 <nirik> Any new folks like to say hi? or apprentices have questions/comments/tasks to talk about?
19:03:04 * nirik listens to the sound of silence. ;)
19:03:16 <CodeBlock> good song. :P
19:03:24 <nirik> ok, do feel free to chime in on list or in our regular channels anytime...
19:03:30 <nirik> #topic Phoenix on-site work recap/summary
19:03:51 <nirik> So, smooge and I were out at phx the other week. Just thought I would summarize what all we did...
19:04:05 <abadger1999> hola
19:04:20 <nirik> We set up a new backup box and tape drive. We will be transitioning to this one from our existing one in the coming weeks/months.
19:04:46 <nirik> We pulled old machines out and sent some to the great place in the sky for old hardware.
19:05:37 <skvidal> we sent the machines to colorado?
19:05:47 <nirik> We took 5 of the newest/most useful looking boxes and stuck them in a new rack as 'junk01' to 'junk05'. We were thinking we might be able to use these as a testbed for things that we want to see if might someday work for us.
19:05:52 <nirik> ha. ;)
19:06:11 <nirik> We moved all the qa machines to a qa rack and network
19:06:32 <nirik> we took inventory/tried to make sure all management/serial/power was setup and known
19:06:55 <nirik> we were going to rack new machines, but they didn't arrive in time... so they will be added as we can get someone there to rack them for us.
19:07:20 <nirik> smooge: can you think of anything I left out? I probably did miss some things...
19:07:34 <skvidal> you broke the backups?
19:07:35 <smooge> I got sick
19:07:42 <skvidal> oh - you meant things you did intentionally...
19:08:12 <nirik> yeah, I have no idea how backups broke. ;( We must have nudged the tape drive some how... it was in a weird state.
19:08:22 <smooge> yeah..
19:08:25 <nirik> but that should be fixed now.
19:08:34 <smooge> we learned that various boxes have only one power supply
19:08:40 <smooge> even if they have multiple plugs
19:08:51 <nirik> oh, I have some pics, need to see if they came out at all.
19:09:04 <skvidal> nirik: hmm - can we put them into the infra-hosts repo? :)
19:09:15 <smooge> my 'love' of ppc found new depths
19:09:38 <nirik> sure, if they are usable. ;) It wasn't easy getting far enough away to get anything... so they might be out of focus, etc.
19:09:43 <smooge> I wish I had another week out there so I could get the rest of the hardware
19:10:18 <nirik> I think it was a very productive trip... we got a lot done/cleaned up/etc
19:11:09 <nirik> which brings us to the next topic...
19:11:17 <nirik> #topic QA network setup brainstorming
19:12:12 <nirik> so, we have 2 racks in a qa network... this includes some qa test boxes, a virtual host that has autoqa01 and autoqa01.stg and bastion-comm01 on it along with the junk boxes and the secondary arch stuff
19:12:36 <nirik> qa folks have expressed interest in monitoring and puppet or puppet like setup.
19:12:57 <nirik> how separate do we want the qa setup from our main setup?
19:13:17 <athmane> so we separate monitoring too ?
19:13:45 * skvidal thinks separate monitoring is overkill. having said that we have a lot of legacy in our existing nagios layout
19:13:47 <nirik> athmane: that's a question, yeah. It seems a pain to have more nagios to me... but it would allow them to monitor their own stuff without ours
19:14:25 <nirik> I think we could get them to possibly use bcfg2 there, as a testbed. They have many fewer machines than we do.
19:14:38 <nirik> I'm not sure how useful bastion-comm01 is.
19:15:12 <nirik> I guess we wanted separate from our bastion for access there.
19:16:42 <CodeBlock> hmm
19:16:45 <nirik> anyone have thoughts or ideas? I guess I am leaning toward no separate monitoring, stick bcfg2 on bastion-comm01 and make it the config host there... then move bastion-comm01 and virthost-comm01 out of our puppet.
19:17:03 <dgilmore> i think we do one of 2 things
19:17:21 <dgilmore> fully integrate it which means move it back to a fedora network
19:17:25 <dgilmore> or fully separate
19:17:34 <athmane> yes, I don't see the need to separate monitoring, so I agree with skvidal
19:18:04 * nirik nods. we could easily just add their hosts to our puppet, but then qa folks would need access to our puppet. ;) (which I don't really know if it's a big issue or not)
19:18:18 <dgilmore> its not until it is
19:18:30 <athmane> separate config is good for security imho (qa net is more like a lab)
19:19:40 <abadger1999> Did we ever update to the new version of nagios?
19:19:44 <smooge> We went the separate fact since the boxes there can run stuff
19:19:45 <nirik> abadger1999: yep.
19:19:48 <abadger1999> k
19:20:06 <nirik> abadger1999: we are on nagios3 now
19:21:02 <nirik> ok, I'll gather more info and talk to qa folks and set it up one way or the other. If anyone has strong ideas on it, let me know soon...
19:21:19 <abadger1999> Proposal sounds good... the only question I have is how long we'll run both bcfg2 and puppet
19:21:46 <nirik> abadger1999: well, it would depend on how well it works out there... and then if it did we would need some kind of transition plan.
19:21:53 <abadger1999> Yeah.
19:21:54 <skvidal> abadger1999: and if we like it at all
19:22:26 <nirik> I think this is a nice small group to test with...
19:22:29 <abadger1999> Would we want a transition plan either way?  If we do like it migrate fi-main to bcfg2, if we don't like it migrate fi-qa to puppet?
19:22:41 <nirik> 8 qa machines, 2 autoqa instances, a bastion and a virthost.
19:23:16 <nirik> abadger1999: yeah. either migrate back to puppet there, or fold them into our puppet.
19:23:27 <nirik> but if it's already separate, probably just migrate them back to their own.
19:23:35 <abadger1999> Sounds like a plan.
19:24:09 <nirik> #action nirik to continue talks with qa and move stuff
19:24:21 <nirik> anything more on this?
19:24:44 <nirik> #topic Outstanding RFR (Request for Resources)
19:24:53 <nirik> I noticed we have a number of RFR's open.
19:24:58 <smooge> oi
19:25:01 <nirik> I added a list to the agenda email
19:25:20 <nirik> many of them are old or in an unknown to me state. ;)
19:25:35 <smooge> close
19:26:05 <pingou> #1591 is two years old
19:26:08 <nirik> yeah, if anyone wants to update them, please do. Otherwise I will look at closing...
19:26:09 <pingou> and fpaste.org is running
19:26:23 <nirik> pingou: yeah, but that one it turns out is active. ;)
19:26:30 <nirik> herlo is going to be updating it.
19:26:50 <pingou> I saw something about it but didn't get the issue
19:26:56 <nirik> the fpaste.org folks are tired of running it, and want us to.
19:27:10 <pingou> do we ?
19:27:18 <nirik> it's been finally packages up.
19:27:20 * StylusEater is late
19:27:27 <nirik> packaged.
19:27:30 * nirik can't type
19:27:48 <smooge> Have we taken over fpaste?
19:27:54 <Southern_Gentlem> nope
19:28:10 <nirik> not yet... it's unclear to me the status of the domain...
19:28:45 * pingou wonders what make them tired (eg that wouldn't make us tired)
19:28:48 <nirik> askbot is also active recently. Others I am not too clear on.
19:28:59 <ciphernaut> maximum 24hour lifecycle.  Is that enough ?
19:29:33 <nirik> pingou: spam, dealing with upkeep, paying for the instance that runs it, etc
19:29:41 <nirik> ciphernaut: for what?
19:30:17 <ciphernaut> nirik, for anything/everything.
19:30:42 <Southern_Gentlem> !
19:30:59 <nirik> Related to RFR's: I am going to try and whip up a SOP page on process around them...
19:31:04 <Southern_Gentlem> ciphernaut,  if you are referring to fpaste we have found that works very well
19:31:23 <Southern_Gentlem> it's a pastebin not permanent hosting
19:31:44 <ciphernaut> most pastebins I've dealt with have 1 month or forever.. though if that's the majority of required usage cool
19:31:58 <nirik> well, that's all details we can tune later right?
19:32:08 <Southern_Gentlem> yep
19:32:13 <ciphernaut> true
19:33:14 <smooge> Of the items what should be at PHX2 and what outside (and where)
19:33:19 <nirik> for ask and paste I would like to try a new process: applicationname01.dev -> applicationname01.stg -> application01 (production). Create the group from the start that will work on it, etc.
19:33:57 <nirik> smooge: yeah. That should be determined at least at the stg point in the process... should it be load balanced/cached or not.
19:35:00 <nirik> I think both ask and paste are good to be nice and separate as we can easily make them... ie, their own instance/db. I don't know how well clustering/replication will work for them, that's something we also need to find out.
19:36:30 <nirik> anyhow, will try and update the RFR page and make a SOP and send to list for more comment.
19:36:51 <nirik> so we can try and have a process for these.
19:37:27 <nirik> anything else on RFR's? any others folks want to save/comment on?
19:37:31 <smooge> yeah I like that process
19:37:40 <smooge> nitrate sounds like another one for that
19:37:58 <nirik> yeah, it's stalled in review... so not sure what's going to happen there.
19:38:04 <athmane> nitrate is not yet packaged
19:38:42 <nirik> perhaps step 0 of the rfr process should be: "get it packaged, then come back here" to avoid filing RFR's too far in advance.
19:39:11 <smooge> hehehe that would make a lot of stuff easier on us.
19:39:17 <athmane> we (qa team) still use the wiki to record test results but i heard about a pilot project to use nitrate
19:39:34 <nirik> nitrate looks cool from a quick glance...
19:39:49 * athmane forgets if for f15 or f16
19:40:02 <nirik> anyhow, if nothing else on this, moving on...
19:40:17 <nirik> #topic Hotfixes
19:40:28 <abadger1999> I think dmalcolm and dgilmore were working on the python buildbot one.
19:40:32 <nirik> So, we also have a pile of hotfixes built up over the last while.
19:40:58 <nirik> abadger1999: ok, will ping them for status. ;)
19:41:17 <nirik> https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=component&summary=~hotfix&order=priority
19:41:38 * nirik waits while hosted dies because we all clicked.
19:41:49 <smooge> dead dead dead
19:41:56 <athmane> nirik: :)
19:42:00 <smooge> we need our own hosted just for us
19:42:12 <CodeBlock> :S
19:42:17 <dgilmore> abadger1999: i need to work on that
19:42:29 <dgilmore> it was part of why we moved some builders to be virtual hosts
19:42:56 <CodeBlock> Yeah, I did some testing/pingdom loading on fedorahosted's gitweb index, and it took over 16 seconds to load on average. That's just bad :(
19:43:13 <nirik> anyhow, what are the chances we could get a pkgdb release update, supybot-koji, pinglists, mediawiki-116? ;)
19:44:00 <nirik> dgilmore: huh... which ones are virtual?
19:44:44 <smooge> I will be working on mediawiki-116/117
19:45:04 <smooge> xb01 I thought was the virtual builder
19:46:09 <nirik> smooge: not sure if the mediawiki bug was filed upstream yet... I guess ping ricky on it.
19:46:23 <smooge> ok will do so
19:46:38 <smooge> oooh long trac traceback
19:47:21 <abadger1999> nirik: pkgdb update is something I want to do in the next month.  It's probably third on my list of "not-a-fire" tasks, though.
19:47:35 <nirik> abadger1999: ok, cool. Would close a number of hotfixes. ;)
19:47:45 <abadger1999> Seems that I always run into one freeze or another before getting them out :-)
19:48:09 <nirik> yeah.
19:48:19 <abadger1999> nirik: hehe.  The only thing is a lot of hotfixes go in just after a release due to finding new bugs in the code :-)
19:48:29 <nirik> yep. it's a never ending cycle. ;)
19:48:48 <nirik> which brings us to the next topic:
19:48:54 <nirik> #topic Upcoming Tasks/Items
19:49:34 <nirik> basically the only things on my list are the freezes for right now:
19:49:37 <nirik> 2011-08-01 mail fi-apprentice folks.
19:49:37 <nirik> 2011-08-02 - 16: Alpha change freeze
19:49:37 <nirik> 2011-08-09 Remove inactive fi-apprentice people.
19:49:37 <nirik> 2011-08-16: Fedora 16 alpha
19:49:37 <nirik> 2011-09-06 - 20: Beta change freeze
19:49:38 <nirik> 2011-09-20: Fedora 16 Beta
19:49:40 <nirik> 2011-10-11 - 25: Final change freeze
19:49:43 <nirik> 2011-10-25: Fedora 16 release.
19:49:53 <nirik> so, we have until the 2nd before our first freeze.
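
Editor's sketch: the freeze schedule nirik lists above could be encoded so a script can tell whether a change would land inside a freeze. This is a minimal illustration and not something discussed in the meeting; the names `FREEZES`, `active_freeze` and `days_until_next_freeze` are hypothetical, and treating each window as inclusive of both its start and end date is an assumption.

```python
from datetime import date

# Change-freeze windows as listed in the meeting log.
# Assumption: both the start and the end date count as frozen.
FREEZES = [
    ("Alpha change freeze", date(2011, 8, 2), date(2011, 8, 16)),
    ("Beta change freeze", date(2011, 9, 6), date(2011, 9, 20)),
    ("Final change freeze", date(2011, 10, 11), date(2011, 10, 25)),
]


def active_freeze(day):
    """Return the name of the freeze covering `day`, or None if unfrozen."""
    for name, start, end in FREEZES:
        if start <= day <= end:
            return name
    return None


def days_until_next_freeze(day):
    """Return days until the next freeze starts.

    Returns 0 if `day` is already inside a freeze and None if every
    listed freeze has already ended.
    """
    if active_freeze(day) is not None:
        return 0
    upcoming = [start for _, start, _ in FREEZES if start > day]
    return (min(upcoming) - day).days if upcoming else None
```

For example, `days_until_next_freeze(date(2011, 7, 21))` evaluates to 12 for the day of this meeting, matching nirik's remark that there is time until the 2nd before the first freeze.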
19:50:10 <nirik> If anyone wants to work on/schedule things, please let me know.
19:50:33 <nirik> more moving things to rhel6.
19:50:35 <abadger1999> We should get the change freezes into the infra calendar
19:50:49 <nirik> yeah, keep meaning to, then getting distracted. ;(
19:50:59 <skvidal> nirik: app servers, proxies, hosted... what else is on the migrate to rhel6 thing?
19:51:02 <nirik> anyone interested in updating the calendars? :)
19:51:39 <nirik> skvidal: last I looked we were just over 50% rhel6, so all the rest.
19:51:44 <skvidal> nirik: :)
19:51:47 <skvidal> smartass
19:51:58 <nirik> I think ibiblio01 we can move over once we have a ibiblio02 we can migrate things to
19:52:06 <smooge> infra calendar?
19:52:16 <nirik> tummy01 might be a good one to remote re-install.
19:52:30 <nirik> value's might not be hard to migrate over.
19:52:48 <skvidal> looks like 64 hosts on 5server
19:53:11 <nirik> once we have new machines racked up in phx2, we can move more things there.
19:53:13 <skvidal> http://fpaste.org/HTSO/
19:53:43 <abadger1999> smooge: http://kevin.fedorapeople.org/infrastructure-*.ics
19:53:58 <nirik> smooge: they are in the git infra repo too.
19:54:07 <skvidal> serverbeach1 should be doable and would be an interesting case to find out if the serverbeach boxes will be able to survive el6
19:54:35 <nirik> skvidal: yeah. I was meaning to talk to them about a hardware refresh at the same time, but didn't get to that either. ;)
19:55:24 <nirik> for many of the rhel5 instances, we need to move their host to rhel6/kvm before moving them.
19:55:28 <skvidal> nod
19:55:38 * skvidal grimaces at torrent
19:55:44 <skvidal> hmm
19:55:50 <skvidal> cnode01...
19:55:56 <skvidal> and dhcp02.c
19:55:59 <skvidal> not our problem soon
19:56:23 <nirik> tummy01 and bodhost might be good ones to re-install as they don't have any critical stuff on them I don't think. we could even leave the guests lvm alone and bring them back up after the re-install
19:56:36 <skvidal> nod
19:57:07 <smooge> also what are we using serverbeach1 for?
19:57:24 <nirik> bxen03 only has releng01 on it... once we get a new machine racked in phx2 in the build rack I can move that over and we can reinstall bxen03
19:57:28 <skvidal> smooge: a mirror istr
19:57:34 <nirik> smooge: another download mirror I think is all.
19:57:40 <nirik> that's not phx2.
19:57:56 <skvidal> sb1 has had a host of issues trying to make it be a virthost
19:58:24 <nirik> another possibly good reason asking about a hw refresh. ;)
19:58:33 <skvidal> nod
19:58:45 <nirik> #topic Open Floor
19:58:59 <nirik> running low on time, any other plans/ideas/dreams?
19:59:24 <skvidal> dreams
19:59:25 <skvidal> yes
19:59:36 <skvidal> anyone here looked at salt? http://saltstack.org/
19:59:48 <skvidal> I've been playing with it a bit and looking over the features in it
19:59:51 <nirik> I glanced at it the other day... first I had heard of it.
20:00:00 <skvidal> it's more or less func + zeromq for the communication layer
20:00:07 <skvidal> fairly fascinating, actually.
20:00:08 <skvidal> all in python
20:00:23 <skvidal> and the devs definitely have a use case like we have in mind
20:01:06 <nirik> cool. So it means clients listen to a bus for actions?
20:01:16 <skvidal> more or less, yes.
20:01:29 <skvidal> it means the clients don't need a port open
20:01:31 <smooge> heheh I have that calendar in my system already. I will update the calendars this week
20:01:32 <skvidal> like we have right now with func
20:01:41 <skvidal> so it means one more port closed off
20:01:45 <smooge> what is zeromq?
20:01:45 <nirik> cool.
20:01:49 <skvidal> which is good
20:01:58 <skvidal> smooge: google is your friend :)
20:02:03 * nirik remembers lots of talk about message bussing a few years ago, but it never seems to have taken off.
20:02:20 <skvidal> there are a couple of things here that are interesting to me.
20:02:35 <smooge> amcq or something :)
20:02:57 <skvidal> 1. whether or not this adequately covers the functionality of what func has been providing for us?
20:03:18 <skvidal> 2. I'm looking at if I can port functionality like func-yum to it and have it all work the same (which would be amusing)
20:03:45 <nirik> cool.
20:03:46 <skvidal> 3. one of the things we wanted out of qpid/amqp is notifications/events as well. - the question is if we can get there from here with zeromq
20:03:57 * nirik nods. That was my next question.
20:04:09 <abadger1999> smooge: It's a library that implements easy to program buffered network sockets.. depending on who you ask, it's a lightweight message bus or nearly everything you need to make a message bus.
20:04:48 <skvidal> nirik: much to wonder and play with...
20:04:59 <skvidal> anyway - just wanted to ask if anyone here already had experience
20:05:01 <nirik> yeah, keeps things fun/interesting. ;)
20:05:02 <smooge> want to use a couple of cloud instances to do so?
20:05:18 <smooge> I read about it yesterday.. interesting to see if its packaged etc?
20:05:22 <skvidal> smooge: right now I'm just dinking with it on guests on my laptop
20:05:36 <skvidal> smooge: there are pkgs - not in fedora - b/c of our zeromq ver
20:05:50 <skvidal> the authors of salt appear to be rpm-friendly people, though
20:06:20 <lmacken> salt looks interesting... do you see that potentially obsoleting func?
20:06:46 * nirik notes we are over time, but I don't think anyone else is scheduled to meet, so we should be able to just keep going. ;)
20:06:55 <skvidal> lmacken: it has a lot of the same functionality
20:07:04 <skvidal> lmacken: and it offers a very similar plugin infrastructure
20:07:21 <skvidal> lmacken: I talked to the lead dev - the reason it is similar is b/c he had investigated func before working on salt
20:07:29 <skvidal> lmacken: it's not accidental.
20:08:07 <skvidal> lmacken: he doesn't have the same modules but a goodly number of func's modules are bound up with some xmlrpc-isms.
20:08:46 <skvidal> and the connect-out-only is useful for us.
20:08:54 <skvidal> which is the main thing pulling me at it
20:09:07 <skvidal> also that it doesn't require us to tie up qpidd on a systems-mgmt tool is nice
20:09:16 <lmacken> yeah, true
20:09:16 <skvidal> so qpidd can be used for other apps that need it w/o any conflict
20:09:44 <lmacken> speaking of our message bus vision, hopefully we'll pick up some momentum on that in the near future
20:10:13 <nirik> lmacken: cool.
20:10:41 <nirik> ok, anything else, or shall we call it a meeting?
20:11:28 <skvidal> that's all I have
20:12:07 <nirik> cool. Thanks everyone. Let's get back to #fedora-admin and #fedora-noc. ;) Thanks for coming everyone...
20:12:10 <nirik> #endmeeting