infrastructure
LOGS
20:01:24 <smooge> #startmeeting infrastructure
20:01:24 <zodbot> Meeting started Thu Jan 13 20:01:24 2011 UTC.  The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:24 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:01:33 <smooge> #meetingname infrastructure
20:01:33 <zodbot> The meeting name has been set to 'infrastructure'
20:01:42 <smooge> #chairs ricky skvidal
20:01:51 <smooge> #chair ricky skvidal
20:01:51 <zodbot> Current chairs: ricky skvidal smooge
20:02:00 <smooge> #topic Robot Roll Call
20:02:19 * nirik is lurking around.
20:02:29 <ianweller> tom servo!
20:02:40 <smooge> crowbot
20:03:14 * dgilmore is kinda here
20:03:19 <smooge> skvidal is kinda here
20:03:27 * skvidal is here
20:03:28 <skvidal> sorry
20:03:32 <skvidal> I was not in the channel
20:03:39 <dgilmore> 6am is before proper wakeup time
20:03:46 <smooge> dgilmore, ugh sorry.
20:03:47 * abadger1999 waves
20:03:56 <dgilmore> smooge: ehh
20:04:05 * gomix waves hi... probably lurk...
20:04:09 <smooge> #topic this week's business
20:04:17 * fchiulli sits in the rafters
20:04:41 <smooge> Ok we (skvidal) did updates to all the servers and we spent Tuesday rebooting systems
20:04:53 <smooge> skvidal anything you wanted to add?
20:04:57 <skvidal> yay for reboots
20:05:04 <skvidal> and oh look - we get to start again
20:05:15 * nirik wonders about 5.6 plans. ;)
20:05:25 <dgilmore> skvidal: new kernel update?
20:05:36 <skvidal> dgilmore: rhel 5.6 , yah
20:05:49 <skvidal> but the kernel update doesn't have much in the way of anything compelling or pressing afaict
20:06:22 <dgilmore> ok
20:06:50 <smooge> unless someone has put /dev/ecryptfs as 755 or something
20:06:58 <dgilmore> do we have a list of rhel5 machines we want to move to rhel6?
20:07:31 <smooge> not yet. I had been putting that off til we got a replacement mmcgrath
20:07:43 <skvidal> dgilmore: umm, all of them?
20:07:48 <skvidal> dgilmore: :)
20:07:55 <dgilmore> skvidal: well yeah.
20:07:56 <smooge> but since it may be a while... we can go to next stage of ordering them
20:08:17 <smooge> #topic EL5 -> EL6
20:09:20 <smooge> ok first item so I can say I read f-a-b and lwn.net. We have no plans to go from EL5 -> FX for infrastructure. I don't think it is feasible for our kind of development
20:09:30 <skvidal> +1
20:09:37 <skvidal> I don't think that's to be contested
20:09:47 <skvidal> and anyone who wants to put up a fight about it is going to find themselves in trouble
20:09:52 <skvidal> b/c no one with root access is going to do that
20:10:00 <skvidal> and that's really that, afaict
20:10:19 <smooge> wow you said it in a much nicer and cleaner way than I was going to.
20:10:27 <abadger1999> <nod>
20:10:28 * nirik finds this a non starter, lets move on. ;)
20:11:03 <skvidal> +1
20:11:06 <skvidal> move on
20:11:12 <smooge> ok next item. server list for moving.
20:11:31 * skvidal generates an el5 list
20:11:35 <smooge> Do we want to look at doing it bottom up or top down
20:11:59 <smooge> and do we want to take up the meeting with this? [I just put it in topic as it was what was started.]
20:12:10 <smooge> Oh quick change while skvidal does his thing
20:12:33 <smooge> last week we did an out of band meeting in #fedora-meeting-3
20:12:44 <smooge> and triaged all the tickets that were listed as meeting.
20:12:59 <smooge> I think we ended up with a good list of what we can do quickly and what not.
20:13:12 <smooge> I just need to find those notes and post them for communal memory sake.
20:13:29 <smooge> #action smooge will post logs from last week's meeting and this week's.
20:13:29 <nirik> did anyone update the indicated tickets?
20:13:33 <nirik> that might be good to do also.
20:13:38 <skvidal> el5 boxes
20:13:39 <skvidal> http://fpaste.org/yGR7/
20:13:42 <skvidal> according to func
20:14:13 <smooge> nirik, once I find the notes I will do so.  I got caught up in something else and forgot.
20:14:19 <smooge> #topic EL5 -> EL6
20:14:45 <smooge> I would say we would want to do the following:
20:15:22 <smooge> 1) Rebuild staging to EL6 say db/proxy servers first, and then work our way to the middle. app servers and such.
20:15:47 <smooge> The reason is that app servers are probably going to take the longest
20:16:02 <abadger1999> question -- do all our backup solutions work with el6?
20:16:28 <abadger1999> as I don't know that we test that aspect in stg
20:16:30 <smooge> mmcgrath, created an old bacula rpm for us to use on the servers
20:16:36 <abadger1999> k
20:16:45 <smooge> the drbackup will need to be checked
20:16:56 <smooge> I need to fully grok it anyway.
20:18:00 <abadger1999> question -- do we need to consider the hosts that are running the virtual machines when figuring out order?
20:18:02 <mmcgrath> smooge: FWIW, drbackup is pretty simple.  just rsync and some filesystem acl's
20:18:06 <smooge> my second idea was that we have a bunch of xen boxes going EOL early next year. I was figuring we would just go with "shutdown and replace with kvm" versus worrying too much
20:18:38 <smooge> mmcgrath, I figured it was.. but there is that and then there is the "oh thats how it really works.. wow I should have realized that 2 months ago"
20:18:43 <skvidal> smooge: shutdown and replace seems like a reasonable move to me
20:18:57 * skvidal makes a note to bring up something in #open floor
20:19:04 <smooge> I have one more box to get a quote and order next week
20:19:16 <dgilmore> id like to move bacula to EL-6s versions
20:19:19 <smooge> that should allow us to bubble sort stuff
20:19:38 <smooge> dgilmore, I would too. However.. I am not sure how to do it without losing old backups.
20:19:54 <dgilmore> smooge: we can work that out
20:19:55 <smooge> dgilmore I wanted to talk to you about it when you got back or at Fudcon
20:20:04 <skvidal> so the new bacula is completely backward incompat with the old bacula
20:20:09 <dgilmore> we should talk about it at fudcon
20:20:10 <skvidal> it can't even READ the old backups?
20:20:13 <smooge> dgilmore, we are about ready to order another tape server.
20:20:25 <dgilmore> skvidal: afaik it should be able to read the backups just fine
20:20:45 <dgilmore> skvidal: the issue is the on-the-wire protocols changed
20:20:46 <smooge> skvidal, I am not sure. We are multiple major versions behind what is in EL6
20:20:56 <dgilmore> so the new client cant talk to the old server
20:21:03 <smooge> and the database layout changed also
20:21:08 <skvidal> okay
20:22:17 <smooge> what I had read so far was we could move from 2->3..->5 but not tested for 2->5 (I think those are the versions looked at).
20:22:31 <smooge> anyway.. fudcon
20:23:05 <smooge> so anything else people want to discuss on 5->6?
20:23:17 * ricky is here
20:23:29 <smooge> hey ricky
20:25:00 <smooge> #topic new netapp
20:25:15 <smooge> ok we are going to have a new netapp for PHX2 systems coming up RSN
20:25:19 <ricky> I did a bacula 2->3 conversion at home without troubles for what it's worth - restores were fine
20:25:46 <smooge> the hardware is in place and RHIT is doing a lot of testing to make sure its working
20:26:01 <smooge> new system will be all Fibre Channel..
20:26:29 <smooge> we will be looking at a series of outages either before Fudcon or after fudcon depending on various testing.
20:26:58 <smooge> most outages should be on the order of 4 hours as data from netappA -> netappB
20:27:13 <smooge> is done one final time.
20:27:31 <smooge> however times/dates will be finalized early next week.
20:27:42 <smooge> question I need answered: any preference from us on when/where?
20:28:44 <smooge> skvidal, did I forget anything?
20:28:51 <skvidal> nope
20:29:06 <smooge> ok in that case.. call for topics before #open?
20:29:38 * skvidal has 3 items
20:29:40 <skvidal> for open floor
20:29:44 <smooge> #topic Open Floor
20:29:49 <smooge> floor goes to skvidal
20:30:00 <skvidal> okay
20:30:02 <skvidal> item 1
20:30:12 <skvidal> disabling fsck on-mount-time - anyone opposed to this?
20:30:31 <ricky> What are the pros/cons?  Any effect safety-wise?
20:30:37 <skvidal> this is just disabling the "it has been 180 days" crap
20:30:47 <skvidal> ricky: unless disks are damaged it won't turn up anything
20:30:55 <skvidal> and if disks are damaged it won't do much good
20:31:03 <skvidal> it's been disabled in fedora for a while now
20:31:18 <skvidal> if the disk is not umounted cleanly it will always run
20:31:34 <skvidal> we're only turning off the 'it has been X days since an fsck, check forced' part of things
20:31:39 <ricky> Ah, OK.  I'm pretty satisfied if it was decided to be OK for Fedora :-)
20:32:01 <skvidal> anyone else have another opinion?
20:32:07 <smooge> I say go ahead
20:32:26 <smooge> func-command go
20:32:33 <nirik> there's the time and the number of mounts...
20:32:35 <skvidal> hah - I dunno if I'll do it with func or not
20:32:41 <nirik> both are fine to be 0 IMHO
20:32:47 <skvidal> nirik: I doubt we'll ever hit the number of mounts :)
20:32:52 <skvidal> nirik: but I agree -in either case
20:32:59 <nirik> it's happened to me before. ;)
20:33:05 <nirik> just need an instance that crashes a lot
20:33:17 <skvidal> releng!
20:33:31 <nirik> or fas01
20:33:50 <skvidal> :)
20:33:58 <skvidal> sounds like no objections
20:33:59 <skvidal> next item
20:34:14 <skvidal> who is going to be at fudcon in tempe,az?
20:34:41 * ricky 
20:34:41 <skvidal> don't everyone answer at once
20:34:46 <skvidal> :)
20:34:55 <gomix> me
20:34:57 <dgilmore> skvidal: im ok with disabling the auto every 180 days fsck
20:34:59 * ianweller 
20:35:08 <dgilmore> skvidal: ill be at fudcon
20:35:25 <skvidal> okay
20:35:25 <smooge> I will be
20:35:31 <skvidal> so we'll have a considerable number of folks
20:35:33 <skvidal> that's good
20:35:37 <smooge> I will be split between it and the colo
20:35:38 <abadger1999> i will
20:35:45 <ricky> smooge: Really?  Aw :-(
20:36:02 <skvidal> it might be worth discussing this next item at fudcon
20:36:26 <abadger1999> smooge: and at fudcon you'll be split between infra and the board...
20:36:28 <skvidal> but I'd like to get a decision on whether or not using machines in rackspace or amazon's cloud is acceptable for us
20:37:13 <skvidal> neither of the services are 'open source' - but then neither is serverbeach nor telia nor internetx
20:37:49 * nirik should be there.
20:37:50 * smooge would mention that we should only be building on pure opensource MIPS systems but someone might take him serious
20:37:51 <abadger1999> rbergeron: By any chance, do you know if anyone from rackspace will be at fudcon?
20:37:52 <ricky> I guess it depends on how we use the service - like will we end up writing a lot of code that ties into just their API?
20:38:21 <ricky> And is that something we want to put time into doing if that talks to a closed source service
20:38:32 <smooge> my main issue is that we need to budget for it
20:38:34 <skvidal> ricky: a closed-source service?
20:38:45 <rbergeron> abadger1999: rackerhacker, who is the guy who puts up the fedora images into their various hosting stuff, is coming.
20:38:53 <rbergeron> He's a fedora lovah.
20:39:02 <skvidal> rbergeron: glad he's coming - it'll be nice to meet him
20:39:13 <dgilmore> smooge: there is a fedora-mips we could run on it
20:39:16 <ricky> As in the thing that the cloud APIs talk to (mostly guessing here, no real experience with any of this)
20:39:29 <skvidal> ricky: the modules they use are open source
20:39:35 <skvidal> ricky: and many are in fedora now - python-boto
20:39:55 <rbergeron> abadger1999: also, my understanding had been that someone from the openstack side of things will also be coming, but the person who it would have been (rick clark, aka dendrobates) can't make it, but he was trying to find someone appropriate to come in his place.
20:39:59 <skvidal> ricky: the eucatools are another set of open source client ends
20:39:59 <ricky> No objections from me if those are open source/not locked into any particular vendor
20:40:00 <rbergeron> abadger1999: why?
20:40:22 <dgilmore> rbergeron: to talk to them :)
20:40:30 <rbergeron> ohhhhh. I see.
20:40:48 <dgilmore> ricky: they're not open source, and are locked to the vendor
20:40:55 <rbergeron> I know rackerhacker is coming. :)
20:40:56 <ricky> Or to IRC to them while sitting across from them :-)
20:40:57 <skvidal> ricky: the client tools are open source
20:40:59 <skvidal> ricky: the backends are not
20:41:08 <dgilmore> though its supposed to be easy to move your data to a different provider
20:41:14 <abadger1999> rbergeron: Very cool.  Let us know if we can influence someone coming by telling them what we want to discuss :-)
20:41:19 <skvidal> dgilmore: well in our case it would be 'reinstall elsewhere'
20:41:25 <skvidal> I'm mostly not worried about the client tools, actually
20:41:40 <skvidal> I'm more thinking that the hosting costs could be considerably cheaper
20:41:46 <smooge> this is for the build shit fast and far?
20:41:48 <skvidal> and we don't have to play the "well do we have space in the rack" game
20:42:04 <skvidal> smooge: build shit fast, far and also to deploy publictest/dev boxes quickly and w/o all the bullshit
20:42:23 <dgilmore> skvidal: right
20:42:42 <skvidal> so - let's say we treat both rackspace and ec2 just like we do serverbeach
20:42:57 <skvidal> but instead of having to call and talk a new site out of them
20:42:59 <skvidal> we just spin one up
20:43:00 <skvidal> boom
20:43:02 <dgilmore> im personally against having the builders on other providers systems
20:43:13 <skvidal> dgilmore: okay, why?
20:43:19 <skvidal> but let's not just think about builders
20:43:24 <skvidal> let's also think about publictest boxes
20:43:39 <skvidal> and even additional on-demand app and proxy## servers
20:44:51 <dgilmore> skvidal: a few reasons, but the biggest being that we cant be sure that a build has not been tampered with. some vulnerability or someone in the providers hosting could affect a build. inject something we dont want
20:45:15 <skvidal> dgilmore: so two things to speak to that
20:45:21 <skvidal> 1. we don't have to be talking about official builds
20:45:24 <skvidal> 2. we can't be sure of the above now
20:45:31 <dgilmore> skvidal: but i do have some ideas for how we could better utilise spare capacity on the builders
20:45:46 <tibbs> Oh, so do I.
20:46:23 <skvidal> okay - so the answer to my question seems to be that if we use the cloud providers as we use the other hosting providers - that no one objects to using them
20:46:27 <skvidal> is that roughly true?
20:46:39 <dgilmore> skvidal: thats roughly true
20:46:39 <abadger1999> ehh...
20:46:55 <skvidal> abadger1999: ?
20:47:16 <abadger1999> If there's a difference in the open-sourceness of the two I'd rather go with the one that's more open source.
20:47:40 <skvidal> abadger1999: well both are just as open source as serverbeach
20:47:42 <abadger1999> But the cost compared to doing all of our own work is very compelling.
20:47:42 <skvidal> or internetx
20:47:51 <skvidal> when it comes to their infrastructure
20:48:09 <skvidal> hell and considering we use cyclades and friends inside phx - they're just as open as we are.
20:49:17 * gholms notes that eucalyptus is compatible with ec2  :)
20:49:38 <skvidal> gholms: and if you can find a provider using only eucalyptus I'm all ears
20:49:53 <smooge> just wants to know how to pay the bill without using his credit card. after that I don't care. [I am so much a candidate for management now :)]
20:50:03 <gholms> Ah, the aim is to run it on someone else's hardware.  Got it.
20:50:07 <skvidal> smooge: you pay using max's creditcard :)
20:50:17 <skvidal> smooge: fedora already has an account
20:50:19 <abadger1999> <nod>
20:50:23 <skvidal> we just need to get it sent to the right internal cost center
20:50:46 <skvidal> hw maintenance is expensive and time consuming
20:50:56 <skvidal> and from a 'benefit to fedora' standpoint it doesn't buy us mich
20:50:57 <skvidal> err much
20:51:00 <abadger1999> I guess -- if we could use our choosing of a provider to help encourage more open sourceness then that seems like a good thing.
20:51:21 <gholms> I can ask some people at eucalyptus if they know of a hosting provider that uses it.
20:51:40 <skvidal> gholms: I suspect their answer will be "well, not ALL of it"
20:51:42 <skvidal> or "yes, but"
20:52:00 <skvidal> one thing the reboot-cycle taught me this week is this
20:52:04 <skvidal> our shit is fragile in lots of places
20:52:07 <gholms> Probably.  I'll just walk across the hall and ask...
20:52:16 <skvidal> but the most noticeable place is the db's
20:52:42 <skvidal> we need to spread that out and even if performance suffers - make sure things continue working w/o the db's in place
20:53:33 <rbergeron> gholms: have i mentioned that i love your new job? ;)
20:53:33 <skvidal> and I think at this point we don't need to be worrying about hw or disks falling out - we need to be focusing on services that help fedora
20:53:49 <skvidal> b/c I can assure you that knowing about rsaII mgmt on ibm boxes is NOT useful or helpful to fedora
20:53:54 <ricky> I think that's solvable with replication - file storage is another tough question though :-/
20:54:18 <skvidal> ricky: gluster might be a nice option - and I was actually considering testing it out - I just needed a few boxes to do that with
20:54:37 <skvidal> ricky: so my first thought was - deploy 5 boxes at ec2 - and set them up in the same region and try out gluster
20:54:53 <skvidal> ricky: I played with a lot of stuff last month on my own dime and it cost me a grand total of $2.57
20:54:55 <skvidal> I'm okay w/that
20:55:05 <ricky> :-)
20:55:07 <skvidal> I think if we want to encourage people to play and work on projects
20:55:11 <skvidal> that we offer that to them
20:55:13 <skvidal> but on fedora's dime
20:55:50 <skvidal> okay - that's all the thoughts I had
20:55:57 <smooge> ok next one?
20:56:00 <ricky> Out of curiosity, are they donating, or is Fedora just paying?
20:56:06 <skvidal> ricky: fedora's just paying
20:56:07 <smooge> Fedora would be paying
20:56:09 <ricky> Or is that to be seen
20:56:09 <ricky> OK.
20:56:14 * mmcgrath has never known amazon to donate anything on ec2
20:56:16 <skvidal> ricky: we pay (a lot) for hw right now
20:56:27 <ricky> True
20:56:36 <smooge> well actually RHIT pays for the hw, fedora's cost accounts dont
20:56:38 <skvidal> ricky: did I mention it's a lot
20:56:39 <skvidal> b/c it's a lot
20:56:52 <mmcgrath> smooge: while you're working on next years budget, you might want to see if we can get some EC2 in there.
20:56:52 <smooge> so we need to work out that side of things :).
20:56:53 <skvidal> smooge: there is a lot of money spent on hw/hosting
20:57:11 <skvidal> mmcgrath: I did the math - for all the time we built things in the last year
20:57:13 <skvidal> for example
20:57:21 <gholms> I can ask around at eucalyptus to see if they know of any hosting providers that use our stuff if you would like.
20:57:24 <skvidal> the cost for getting ec2 systems to do those builds comes to 12K
20:57:32 <gholms> s/they know/anyone knows/
20:57:36 <smooge> skvidal, correct. most of that money gets spent by RHIT whether we put boxes in there or not.. thats where finding funds gets fun :)
20:57:38 <skvidal> we can buy, maybe, 2 or 3 systems for that
20:58:11 <gholms> (Right now most of the people I would ask are at lunch)
20:58:12 <smooge> I am not against it.. and will put it in there
20:58:14 <smooge> anyway.
20:58:24 <skvidal> I'm mostly thinking that if someone comes to fi
20:58:26 <skvidal> with an rfr
20:58:29 <smooge> 2 minutes til rbergogre comes and kicks us off her bridge
20:58:33 <skvidal> it'd be nice to spin up a host instantaneously
20:58:53 <skvidal> and not have to think about 'do I have the hw for this'
20:58:56 <Jeff_S> mmcgrath: AIUI, amazon has donated some ec2 cycles to drupal; however I don't know the exact arrangement
20:59:17 <rbergeron> wow, this conversation is going to segue very well into the cloud sig meeting momentarily ;)
20:59:23 <skvidal> Jeff_S: I'm sure there's a bloodrite of some kind :)
20:59:26 <mmcgrath> Jeff_S: no kidding?  interesting they turned us down cold last time I went probing about it :)
20:59:27 <Jeff_S> mmcgrath: I can put you in touch w/ drupal people if you'd want to discuss the details further (off topic here)
20:59:36 <smooge> Jeff_S, cycles are cheap for them.. storage is where we would sock them..
20:59:40 <Jeff_S> likely
20:59:51 <skvidal> smooge: and again - that's only if we're thinking of doing building, etc
20:59:53 <smooge> plus drupal is cool
20:59:55 <skvidal> but think about proxy or app servers
21:00:00 <skvidal> those are space cheap
21:00:03 <skvidal> and mostly network costly
21:00:19 <skvidal> but on release days, for example, they might help us
21:00:27 <smooge> we will have a mirror there..
21:00:29 <skvidal> to be able to have 20 proxy servers and N app servers
21:00:32 <smooge> which leads us to
21:00:34 <skvidal> and then shut them down
21:00:35 <skvidal> okay
21:00:38 * skvidal stops talking
21:00:42 <smooge> ending this meeting for the cloud
21:00:47 <smooge> #endmeeting