cloud-sig
LOGS
21:00:32 <gregdek> #startmeeting
21:00:32 <zodbot> Meeting started Thu Apr 29 21:00:32 2010 UTC.  The chair is gregdek. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:34 <gholms|work> Have at it.
21:00:34 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
21:00:40 * jforbes is here
21:00:43 * gholms|work here
21:00:44 <gregdek> #topic roll call
21:00:55 <gregdek> Who else?  :)
21:01:05 <gholms> huff: ping
21:01:08 <gregdek> I saw dmalcolm lurking also.
21:01:19 <gregdek> Just say hi to get your name in the minutes, woo hoo!  :)
21:01:47 <gregdek> Enough of that.
21:01:48 <rackerhacker> hi ;)
21:01:51 <gregdek> So.  :)
21:01:55 * gholms recommends #meetingname cloud-sig
21:02:03 <gregdek> #meetingname cloud-sig
21:02:04 <zodbot> The meeting name has been set to 'cloud-sig'
21:02:08 <gregdek> :)
21:02:22 <gholms> (That way it's easy to find in the directory tree)  ;)
21:02:23 <gregdek> #topic current image status
21:02:29 * huff ni
21:02:30 <huff> in
21:02:36 <gregdek> huff, good timing.
21:03:12 <huff> So from my message to the list
21:03:23 <gregdek> So, do we have F12 images that are booting on plain ol' RHEL 5.5, per jforbes' note?
21:03:25 <smooge> here
21:03:36 <gregdek> Hi smooge.  (I haven't forgotten your access, btw.)
21:03:56 <huff> gregdek: i think so but amazon's infrastructure is 5.0 and 5.2
21:04:06 * Oxf13 watches
21:04:10 <gregdek> Yes, that's tricky, isn't it?
21:04:13 <jforbes> huff: the last images I built with your update would not boot on 5.5
21:04:24 <gregdek> jforbes, same selinux issues?
21:04:28 <jforbes> it wasn't a xen problem, it was an image problem
21:04:30 <gregdek> Or other issues?
21:04:41 <huff> jforbes: gregdek: yea but thats minor details i think we have a bigger problem
21:04:46 <jforbes> selinux was the last issue I saw, didn't see a tool update since then
21:05:00 <jforbes> huff: non booting images are not really minor details
21:05:15 <jforbes> bugger problems sound scarey
21:05:29 <gregdek> huff, it would be nice to be able to say definitively that we have an image that boots on the known Xen we have in 5.5, right?
21:05:35 <jforbes> s/bugger/bigger
21:05:36 <huff> amazon said something about ubuntu having to patch their latest kernels to boot, i'm not sure if that's why we were waiting on the newest kernels or not
21:05:48 <jforbes> huff: it was
21:05:50 <gregdek> I don't dispute the notion that Amazon may be running some Frankenstein kernels.  :)
21:06:30 <huff> jforbes: gregdek: I also made some progress on the pvgrub stuff
21:06:42 <jforbes> excellent
21:06:47 <gregdek> Oo.  Such as?
21:06:50 <huff> but not sure amazons status on if we can use them or not
21:07:13 <huff> i have a pvgrub kernel that boots and tries to load the kernel from the ami
21:07:23 <gregdek> iirc, they want to standardize on their own pvgrub kernels...
21:07:23 <huff> the one I am fooling with is rhel5.5
21:07:36 <jforbes> huff: yeah, we shouldn't be doing that, they are doing it for us
21:07:41 <huff> but once we can get a f12 image to work I'm pretty sure I can get it working with fedora
21:07:47 <huff> and probably use the same kernel image
21:07:59 <smooge> does anyone know what Amazon EBS means?
21:08:06 <gregdek> Elastic Block Storage.
21:08:11 <huff> Elastic block storage
21:08:22 <smooge> thanks reading on how Ubuntu works on EC2
21:08:46 * gregdek hrms.
21:09:05 <gregdek> So... does it matter if we get our own pvgrub stuff working if Amazon is going to be using their own pvgrub?
21:09:14 <huff> gregdek: i say not really
21:09:21 <jforbes> gregdek: no, we shouldn't be doing our own pvgrub at all
21:09:25 <huff> we should worry about getting a working F12 image up fist
21:09:28 <huff> first
21:09:29 <gregdek> So.  Well done, huff, but irrelevant.  ;)
21:09:40 <gregdek> How do we proceed?
21:10:00 <gregdek> jforbes, what failures are you seeing booting F12 on 5.5/Xen?  Still selinux?
21:10:08 <gregdek> Or other failures?
21:10:16 <jforbes> first step is getting an image that works on a local 5.x, no point even bothering to upload to EC2 until then
21:10:22 <jforbes> gregdek: yeah, still selinux
21:10:27 <gregdek> OK.
21:10:55 <huff> sigh
21:11:01 <gregdek> It would *really* help if we can find someone with some cycles to just crank through this.  If it's really just a matter of "throw in all the selinux policy modules until boot"...
21:11:09 <gregdek> ...can we find someone besides huff to go through that?
21:11:23 <smooge> jforbes, do you have a link on how it dies? does it work if selinux=0 is set or something? sorry trying to catch up
21:11:32 <gregdek> Because this is just trial-and-error ks rebuilding, yes?
21:11:37 <jforbes> smooge: it works with selinux=0 yes
21:11:56 <jforbes> gregdek: I might have cycles after the F13 freeze on Tues, but unlikely before then
21:12:03 <smooge> jforbes, does it work with it being in permissive?
21:12:11 <jforbes> smooge: yup
21:12:14 <jforbes> smooge: err, no
21:12:28 <jforbes> smooge: dracut halts because selinux cannot load policy at all
21:13:00 <smooge> ok interesting...
21:13:22 <gregdek> pjones seemed to think it was just missing selinux-policy modules.
21:13:49 <gregdek> I know we covered this in a previous meeting, specifics should be in the minutes...
21:13:59 <smooge> gregdek, ok will read old notes
21:14:02 <smooge> don't want to derail
21:14:07 <gregdek> Still, the issue is clear:
21:14:16 <huff> I have some cycles I will set up RHEL5 box and see what I can do
21:14:22 <gregdek> 1. We need to get a known good F12 image booting on RHEL5 Xen.
21:14:28 <gregdek> 2. That takes cycles.
21:14:39 <gregdek> 3. huff has other things going on, like everyone else.  :)
21:15:00 <gregdek> So we move at the speed that huff is able to move, unless and until someone can take some heat off of him.
21:15:05 <jforbes> huff: you can even boot with the kernel/initrd that I uploaded, I can send them to you.  Boot with them as external kernels instead of using pygrub
21:15:12 <gregdek> I will now hold my breath for volunteers until I turn blue.  :)
21:15:20 <jforbes> gregdek: like I said, after Tues I can pitch in a bit more
21:15:29 <gregdek> jforbes, you don't count.  ;)
21:15:31 * gholms prepares to call the paramedics
21:15:32 <smooge> I can see about trying on the cloud boxes
21:15:35 <huff> jforbes: have you had luck booting your image with all the selinux modules in 3c2?
21:15:39 <gregdek> smooge!
21:15:47 <gregdek> Yes, sir!
21:15:53 <huff> ec2
21:15:58 <jforbes> huff: I haven't uploaded a complete image, seems a bit of waste
21:16:06 <gregdek> When you say "cloud boxes"... do you mean the boxes hanging around the Fedora rack that were earmarked for cloud usage?
21:16:08 <smooge> I would do it at home.. but none of my play hardware is VT able
21:16:38 <huff> jforbes: well in my experience even if the image is bad the kernels will still "post" in ec2 and give some console output
21:16:40 <smooge> gregdek, yes. that is what I meant. I would need to get permission etc but that was the free box I could think of
21:16:51 <huff> and I am *NOT* seeing that with F12 kernels
21:16:52 <gregdek> Seems like a perfect usage.  They're idle otherwise.
21:17:03 <jforbes> huff: not when it fails that early.  the SELinux failure is early
21:17:11 <huff> jforbes: k
21:17:38 <huff> smooge: amazon infrastructure is all PV so doesn't need full virt
21:17:38 <gregdek> #action smooge will help build/test F12 images on Fedora's set-aside "cloud" systems
21:18:47 <gregdek> So we *really* need to make sure that we're coordinating these image builds on-list.
21:18:48 <rackerhacker> i'd be glad to assist with f12 testing on RHEL 5.5's Xen
21:18:58 <gregdek> rackerhacker, w00t!
21:19:20 <gregdek> #action rackerhacker will help with f12 testing on rhel5.5 xen
21:19:21 <smooge> I know that the F13 kernels do not work in RHEL-5.4 xen. The 2.6.33 fail early
21:20:04 <huff> smooge: can you elaborate
21:20:07 <rackerhacker> gregdek: do you have documentation/requirements of what you're looking for in the testing?
21:20:13 <jforbes> smooge: they did on Test day, I haven't retested since then... That would be a release blocker based on our criterea
21:20:21 <jforbes> criteria even...
21:20:40 <smooge> well the various infrastructure hardware for spins and such use F13 stuff to boot since they are building f13 (or something like that)
21:20:47 <gregdek> rackerhacker: we've got images that are currently failing to load because we don't yet have selinux-policy loading properly in dracut.
21:21:07 <smooge> Using the 2.6.33-x kernels we get crashes.. so we have been booting them for a while on the last 2.6.32 kernel that was in rawhide
21:21:07 <gregdek> We are trying to iterate over ks files until we get one that works.
21:21:42 <jforbes> smooge: Okay, retesting that now since I just need to yum update my domUs
21:21:52 <gregdek> Which means that we need to do a good job coordinating on "latest tried ks file".  On the mailing list, I would think.  dhuff, does that make sense?
21:22:04 <jforbes> smooge: is there a bug?
21:22:09 <huff> gregdek: yea that makes sense
21:22:10 <smooge> it could be that they are rawhide versus F13 so I might have misspoken
21:22:24 <jforbes> smooge: might be, I haven't tested rawhide in a while
21:22:26 <smooge> jforbes, I thought there was.. but I have no idea myself
21:22:42 <huff> gregdek: i'm still not convinced that if we get it working in 5.5 it will work in ec2
21:22:50 <gregdek> huff, you're probably right.  :)
21:22:53 <smooge> I try to leave development systems to development admins so I don't spoil their broth
21:23:05 <gregdek> But then we can put the onus more squarely on the Amazon folks.
21:23:07 <jforbes> huff: I can almost guarantee that if it wont work in 5.5, it wont work in ec2 though
21:23:27 <huff> That's what I've been saying
21:23:41 <huff> nm
21:23:46 <jforbes> actually I still have 5.2 on a partition here, I can test that too
21:23:56 <smooge> I am running on 3 hours sleep from last night.. PV means physical virtualization eg VT extensions in the cpu.. versus Bochs style emulation
21:24:07 <huff> jforbes: can you try the image that is working for you on the 5.2 install
21:24:11 <jforbes> smooge: pv means no reqiredment for VT
21:24:18 <jforbes> err no requirement
21:24:22 <rackerhacker> gregdek: i noticed that upstream kernels have selinux policy version 19 as a max
21:24:30 <rackerhacker> i think f12 requires version 24 last i checked
21:24:35 <rackerhacker> not sure if that's related
21:24:40 <huff> smooge: i was refering to para-virt may be using wrong ab
21:24:50 <smooge> jforbes, I can't remember which one was faster.. the Optiplex350 I have is reaaaaaal slow running anything xen
21:25:20 <gregdek> OK, so to recap the plan:
21:25:48 <gregdek> * huff will post the latest ks and tools he's been using to get latest image, to remind everyone of where we're at.
21:26:02 <gregdek> * others feel free to hack at the ks and build/test new images, sending feedback to the list.
21:26:21 <gregdek> * eventually we'll get something that boots on rhel 5.5 xen.
21:26:33 <gregdek> * then we try to upload Teh Winnar to EC2.
21:26:51 <gregdek> * It will probably fail, at which point we'll throw it over the wall to Amazon -- or maybe it'll work, in which case, WIN!
21:26:54 <gregdek> Is that about right?
21:26:59 <smooge> yes.
21:27:21 <huff> sounds good to me
21:27:52 <gregdek> All right.  Anything else on the topic of "getting F12 to boot in rhel 5.5 xen"?
21:28:28 * smooge will see if he can find a cheap used computer to run xen instances on at home... I can download CentOS-5.2 or something and test it with no updates
21:28:33 <smooge> no
21:28:43 <gregdek> ok.
21:28:59 <gregdek> Personally, since that's the topic that has been blocking us, and we've been blocking heavily on it...
21:29:06 <gregdek> ...I don't even care about any other topic.
21:29:12 <gregdek> But the floor is open nevertheless.  :)
21:29:32 * gholms raises hand
21:29:41 <huff> gregdek: is there any update on a mirro in ec2?
21:29:45 <huff> mirror
21:29:48 <gholms> ^ do that one first
21:30:02 <huff> gholms: sorry did not see you
21:30:08 <gregdek> None.  I have dropped that ball utterly.
21:30:14 <gregdek> But I will pick it back up.
21:30:15 <gholms> Dang
21:30:34 <gholms> Well, there went my question.
21:30:59 <huff> gregdek: ping me later on that, I had an idea
21:31:01 <gregdek> I need to set up mdomsch with access.
21:31:08 <gregdek> huff, what's your idea?
21:31:44 <gregdek> iirc, we were just gonna set up internal mirrors at ec2 that would serve each availability zone.  Very low cost.
21:31:53 <smooge> I have an issue after this
21:32:03 <huff> smooge: go ahead
21:32:32 <smooge> we have been seeing a problem with some version of Amazon EC2 images
21:32:38 <smooge> and infrastructure
21:33:04 <gregdek> Go on...
21:33:14 <smooge> some largely used image looks for EPEL packages but has a broken configuration line that says $basea$ versus $basearch
21:33:37 <smooge> I have no idea if it is an RHEL image, a CentOS image, or Scientific Linux?
21:34:10 <smooge> just that a lot of boxes hit the infrastructure and were causing some problems with mirror manager at one point
21:34:31 <smooge> at this point I only have ip addresses and nothing else.
21:35:01 <smooge> Anyone know who to contact at Amazon to help deal with this? as when they build a lot of them.. we were getting sort of dos'd for some reason
21:35:12 <gregdek> Hm.
21:35:16 <smooge> sorry for the slow typing
21:35:20 <gregdek> There's a *lot* of images out there.
21:35:24 <gregdek> Mostly homegrown.
21:35:27 <huff> smooge: i started a box running after our discussion last week and have not seen these messages
21:35:39 <gregdek> It's quite possible...
21:35:53 <gregdek> ...that someone broke this, and a bunch of people made their own versions of the broken image.
21:36:10 <gregdek> A lot of EC2 users, aiui, basically clone instances of appliances they like.
21:36:36 <smooge> well I would say 450 as that is the number of IP addresses I have seen.. but some of those ips are making 100's of requests in a short time period which made me think it might be NAT'd somehow
21:36:45 <smooge> ah
21:37:00 <gholms> A NAT in EC2, eh?
21:37:04 <gregdek> Tough problem to fight, since I don't think there's any way to identify a system's base image.
21:37:16 <gregdek> But we can certainly let Amazon folks know about it.
21:37:34 <smooge> they seem RHEL-5 based versus 4 or something.
21:37:38 <gholms> Can we ask Amazon if most of the instances in a list of IPs are running the same ami?
21:37:46 <gregdek> We could, sure.
21:37:46 <smooge> I have IP addresses and times.. maybe they can do something from that
21:37:50 <gregdek> Worst they can do is say no.  :)
21:37:58 <gregdek> smooge, if you can get me that, I'll send it along to Amazon folk.
21:38:02 <smooge> ok thanks
21:38:12 <smooge> that was all on my part. I will email you that in 30
21:38:34 <gregdek> #action smooge will send list of IPs from systems with potentially broken EPEL configs, gregdek will follow up with Amazon
21:39:04 <gregdek> Any other opens?
21:39:10 <smooge> just huff's
21:39:23 <huff> mine is no biggie
21:39:25 <jforbes> smooge: last night's F-13 kernel boots, so it has to be devel
21:40:10 <gregdek> huff, shout it out.
21:40:19 <gregdek> Unless you want to save it for the list.
21:40:28 <huff> if gdk says that internal mirrors at ec2 that would be Very low cost, my idea mute
21:40:31 <smooge> jforbes, thanks.. I will see about getting those over to the boxes we are having boot problems with
21:40:31 <huff> moot
21:40:56 <gregdek> ok.  huff, if internal ec2 mirrors turn out not to be viable, we'll revisit.
21:41:00 <gregdek> Any others?
21:41:06 <gholms> I asked for a el-6 branch for euca2ools, so some time in the future that ought to be available.
21:41:26 <gregdek> Yay!
21:41:45 <gholms> I'm going to be finishing my thesis over the next few weeks, so my apologies to any bug reporters I end up ignoring.
21:42:18 <gregdek> gholms, best of luck with that.  Go crush it.  :)
21:42:22 <gholms> ;)
21:42:50 <gregdek> All right, I'll leave the floor open for another minute and then close.
21:46:55 <gregdek> #endmeeting
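
Editor's note on the EPEL discussion (21:32-21:38): smooge describes requests arriving with an unexpanded `$basea$` where `$basearch` was intended, and wants a list of IPs to hand to Amazon. A minimal sketch of that tally follows. The log layout (client IP first, request URL last), the sample URLs, and the function name are all assumptions for illustration; the actual mirror manager log format is not given in the meeting.

```python
from collections import Counter

# Token smooge reports seeing in place of $basearch.
BROKEN_TOKEN = "$basea$"


def count_broken_requests(lines):
    """Count requests per client IP whose URL contains the unexpanded token.

    Assumes a hypothetical whitespace-separated layout:
    "<client-ip> <timestamp> <request-url>".
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        ip, url = parts[0], parts[-1]
        if BROKEN_TOKEN in url:
            counts[ip] += 1
    return counts


# Made-up sample lines using documentation-range addresses.
sample = [
    "192.0.2.1 2010-04-29T21:00:00 /mirrorlist?repo=epel-5&arch=$basea$",
    "192.0.2.1 2010-04-29T21:00:01 /mirrorlist?repo=epel-5&arch=$basea$",
    "192.0.2.2 2010-04-29T21:00:02 /mirrorlist?repo=epel-5&arch=x86_64",
    "192.0.2.3 2010-04-29T21:00:03 /mirrorlist?repo=epel-5&arch=$basea$",
]

result = count_broken_requests(sample)
for ip, hits in result.most_common():
    print(ip, hits)
```

Sorting by hit count puts the busiest addresses first, which are the ones smooge suspected of sitting behind a NAT.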