21:00:28 <gholms> #startmeeting Cloud SIG (7 Oct 2010)
21:00:28 <zodbot> Meeting started Thu Oct 7 21:00:28 2010 UTC. The chair is gholms. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:28 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
21:00:38 <gholms> #meetingname cloud
21:00:38 <zodbot> The meeting name has been set to 'cloud'
21:00:47 <gholms> #chair rbergeron jforbes_
21:00:47 <zodbot> Current chairs: gholms jforbes_ rbergeron
21:00:52 <gholms> #topic init process
21:00:55 <gholms> Who's here today?
21:01:23 * jforbes
21:01:53 * gholms throws a trout at brianlamere
21:02:03 <brianlamere> ah!
21:02:25 <brianlamere> tofu next time, I'm a veggie ;)
21:03:16 <gholms> It's so hard to tell these days, what with some people claiming to be so and still eating fish. :-\
21:03:28 <gholms> #topic EC2 images
21:03:34 <gholms> jforbes: Take it away!
21:04:05 * jforbes is running into a boot issue with the F14 images, and I am in the process of trying to track that down
21:04:17 * mdomsch
21:04:39 <mdomsch> gholms, skvidal, and I are discussing the yum setup in -devel right now
21:05:02 * jgreguske here
21:05:15 <brianlamere> well bring it over here, sounds pretty appropriate
21:05:23 <brianlamere> did we decide which way it would be done?
21:05:26 <gholms> brianlamere: That's the next topic. :)
21:05:28 <jforbes> by in the process, I mean as we speak
21:05:53 <jforbes> gholms: go on to the next topic and I hope I have more valuable news when done :)
21:06:06 <gholms> Hehe, all right.
21:06:31 <gholms> #topic EC2 mirror proposal
21:06:36 <gholms> #link https://fedoraproject.org/wiki/User:Gholms/EC2_Mirror_Proposal
21:06:40 <gholms> .rel 4149
21:06:41 <zodbot> gholms: #4149 (Need a way to point EC2 instances to specific mirrors) - Fedora Release Engineering - Trac - https://fedorahosted.org/rel-eng/ticket/4149
21:07:04 <mdomsch> brianlamere: I was hoping we'd be done before the meeting. What was I thinking? :-)
21:07:58 <mdomsch> skvidal, gholms, ok, here now
21:08:08 <skvidal> mdomsch: I'm sorry if I sound like the bad guy here but I've done more than my share of random yum plugin maintenance and don't feel like adding more
21:08:20 <gholms> So what's wrong with just adding &zone=$somevariablename to the mirrorlist URIs?
21:08:38 <mdomsch> I think adding &zone=$zonevar is prettier and cleaner than &yum_extension=ec2zone:my_ec2zone
21:09:00 <skvidal> <shrug> I just expect that the next thing someone wants will get in under this rule, too
21:09:15 <skvidal> (which is the exact reason why I didn't want to write an ec2-specific plugin)
21:09:44 <brianlamere> yeah, I thought we were looking at a more generic thing that could be used elsewhere too
21:10:03 <gholms> &zone=blah *can* be used elsewhere.
21:10:24 <mdomsch> well, &zone= is whatever I want it to mean, it's handled entirely by MM. it's not even named 1:1 for EC2
21:10:36 <skvidal> umm.....
21:10:41 <skvidal> <shrug>
21:10:42 <skvidal> okay
21:10:44 <mdomsch> it's a simple "zone has hosts", "hosts are in zones" many-to-many mapping in MM
21:11:00 <skvidal> that feels like a rationalization to me
21:11:06 <skvidal> but this part really doesn't matter to me
21:11:21 <gholms> It's sort of a name-based replacement for &ip= when IP lookups aren't going to work for some reason.
21:11:48 <mdomsch> exactly
21:12:17 <gholms> Backstory: skvidal proposed using something akin to "&ext=ec2zone:us-east-1" instead
21:12:44 <gholms> Then people who need to pass even more args to mirrormanager can use the same file.
21:13:24 <skvidal> is 'zone' generic enough?
21:13:28 <skvidal> if so - then have a blast
21:13:37 <skvidal> but when I hear 'zone' I think 'dns'
21:13:40 <skvidal> not anything else
21:13:43 <gholms> region?
21:13:46 <gholms> location?
21:14:07 <mdomsch> region tends to denote large geographical boundaries, like continents, in my mind. I'm not wedded to 'zone' though.
21:14:35 <mdomsch> location would be fine
21:14:52 <gholms> Anyone else have any thoughts on the mirrormanager variable name?
21:15:23 <brianlamere> it could be a random string of characters as far as I'm concerned :)
21:16:02 <gholms> location works for me.
21:16:33 <mdomsch> locality?
21:16:41 <mdomsch> or location
21:16:48 <mdomsch> ok, location. Done.
21:16:56 <mdomsch> &location=$location
21:17:00 <mdomsch> var file named 'location'
21:17:15 <mdomsch> append to every mirrorlist URL in the config files
21:17:18 <skvidal> in /etc/yum/vars/
21:17:40 <mdomsch> so say we all
21:17:56 <gholms> skvidal also proposed using one file for all such appended variables. Good/bad?
21:17:58 <skvidal> someone has been watching too much BSG
21:18:14 <skvidal> gholms: I did what?
21:18:34 <gholms> $yum_extension
21:18:55 <gholms> Eh, I probably misunderstood you.
21:19:34 <skvidal> that's not important to this discussion
21:19:52 <gholms> #agreed Yum will append a "location" variable to all mirrormanager URIs
21:20:28 <gholms> mdomsch: How much of this is implemented in MM so far?
21:21:27 <mdomsch> gholms: I'll have to s/zone/location/, but that's fine
21:21:44 <gholms> #action gholms to open blocker bugs against fedora-release and generic-release to add these variable placeholders to yum repo config files
21:21:52 <mdomsch> and right now I'll have to manually create locations and put hosts into locations when it matters
21:22:06 <mdomsch> I don't have the admin web UI plumbed for it
21:22:16 <mdomsch> and of course, none of this is in production yet, just in git
21:23:07 <gholms> Let's see what else is in the proposal...
21:23:22 <gholms> Ah, the S3 mirror creation process.
21:23:38 <mdomsch> yeah, I'm unclear on that part :-)
21:24:07 <gholms> brianlamere: Was it you who could write a script that will sync a S3 bucket with a mirror?
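The #agreed item above (a var file named 'location' in /etc/yum/vars/, appended to every mirrorlist URL as &location=$location) can be sketched roughly as follows. This is only an illustration of the substitution idea, not yum's actual code: the helper names `load_yum_vars` and `expand` are hypothetical, a temporary directory stands in for /etc/yum/vars/, and the mirrorlist URL and the `us-east-1` value are examples.

```python
import os
import re
import tempfile


def load_yum_vars(vars_dir):
    """Read one variable per file: the file name is the variable name,
    the first line of the file is its value (assumed layout)."""
    values = {}
    for name in sorted(os.listdir(vars_dir)):
        path = os.path.join(vars_dir, name)
        if os.path.isfile(path):
            with open(path) as handle:
                values[name] = handle.readline().strip()
    return values


def expand(url, values):
    """Replace $name tokens that have a known value; leave unknown ones as-is."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r"\$(\w+)", substitute, url)


# Temporary directory standing in for /etc/yum/vars/ (illustrative only).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "location"), "w") as handle:
    handle.write("us-east-1\n")

# Example mirrorlist URL with the agreed &location=$location suffix.
MIRRORLIST = ("https://mirrors.fedoraproject.org/mirrorlist"
              "?repo=fedora-$releasever&arch=$basearch&location=$location")

expanded = expand(MIRRORLIST, load_yum_vars(demo_dir))
print(expanded)
```

Only `$location` is filled in here; `$releasever` and `$basearch` are left untouched because the demo directory defines no files for them.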
21:25:51 * gholms hopes he's still around
21:26:37 <brianlamere> sorry, yes
21:27:36 <brianlamere> yeah, that's easy enough - I write things to S3 all the time. There isn't a true "rsync" avail, there's just guessing (because it's REST...). But there are easy ways to accomplish it, yes
21:29:25 <gholms> One possibility that popped into my mind was a script that pulls down repodata from a mirror, compares it against the repodata it got last time, then pushes the changes.
21:29:45 <gholms> Or would a generic "sync a bucket with a HTTP directory tree" script be better?
21:30:09 <mdomsch> we would need to exclude some content, like iso/
21:30:24 <brianlamere> would just need to know how we're wanting to do it in the end; use the info in filelist.xml/etc to know what to copy up? or do a check each time to verify that the file is there? you can't do a "rsync" really...because you won't be able to do a real checksum without copying it locally first
21:30:37 <brianlamere> option 1
21:30:38 <mdomsch> and I suppose for now we don't need to include any non-repo content
21:31:08 <gholms> mdomsch: Thankfully, we only need to mirror Everything and updates, but not Fedora.
21:31:26 <mdomsch> brianlamere: if you HTTP HEAD a file on S3, do you get a hash of any sort back in the headers?
21:31:41 <brianlamere> pull down, keep a local hash of what you have. Then next time, check the meta data for the files that have changed, and scroll through doing a .lookup(key) on everything and drop the new files in place along the way
21:32:35 <brianlamere> you can get a hash *if you put one there*. You can tag the keys. They can have metadata (which is where the md5 could go), the location, and the content itself
21:33:08 <brianlamere> the problem is, if anyone updates the key content without respecting that process (ie, updates it without updating the meta tag info) then you're all goofed up
21:33:20 <mdomsch> that would be valuable, as we do update content on the mirrors (re-sign files for example)
21:33:35 <gholms> Why not just compare the file timestamps that appear in the repodata?
21:34:03 <brianlamere> well as long as you can assure that all updates will respect that process, and that no one tries to do it a shortcut way, then that's probably the easiest thing
21:34:50 <mdomsch> so brianlamere, are you volunteering to write such a tool for us then? :-)
21:35:16 <brianlamere> sure, why not :) I'll get something avail that is at least a PoC by next week
21:35:46 <brianlamere> next week's meeting, that is
21:36:11 <gholms> Can you use two YumBase objects to compare different sets of repodata?
21:36:16 <gholms> Eh, whatever.
21:36:39 <mdomsch> #action broanlemere to write a proof-of-concept tool to sync content into S3
21:36:48 <mdomsch> brianlamere that is...
21:36:50 <gholms> Nice.
21:37:02 <gholms> #undo
21:37:02 <zodbot> Removing item from minutes: <MeetBot.items.Action object at 0x2b7d2f8fc0d0>
21:37:09 <gholms> #action brianlamere to write a proof-of-concept tool to sync content into S3
21:37:10 <mdomsch> #action brianlamere to write a proof-of-concept tool to sync content into S3
21:37:16 <gholms> lol
21:37:17 <mdomsch> jinx
21:37:22 <gholms> D:
21:37:25 <mdomsch> #undo
21:37:29 <gholms> #undo
21:37:29 <zodbot> Removing item from minutes: <MeetBot.items.Action object at 0x2b7d2edd6110>
21:37:42 * mdomsch is hands-off
21:37:56 <gholms> We originally proposed starting up one VM in each region to do the work. Could the process run on the existing infrastructure instead?
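The "compare this run's repodata against last run's, then push the changes" idea discussed above could look something like the sketch below. It is not the proof-of-concept tool from the action item: `plan_sync` is a hypothetical helper, the snapshots are made-up dictionaries rather than parsed repodata, and the actual S3 upload step (for instance via boto) is deliberately left out.

```python
def plan_sync(previous, current):
    """Compare two {path: (checksum, timestamp)} snapshots.

    Returns (upload, delete): paths that are new or changed since the
    previous run, and paths that no longer exist in the current one.
    """
    upload = sorted(path for path, meta in current.items()
                    if previous.get(path) != meta)
    delete = sorted(path for path in previous if path not in current)
    return upload, delete


# Made-up snapshots standing in for data extracted from repodata.
last_run = {
    "Packages/foo-1.0-1.fc14.noarch.rpm": ("aaa111", 1286400000),
    "Packages/bar-2.0-1.fc14.noarch.rpm": ("bbb222", 1286400000),
}
this_run = {
    # Same file name, new checksum: e.g. a package that was re-signed.
    "Packages/foo-1.0-1.fc14.noarch.rpm": ("ccc333", 1286486400),
    "Packages/baz-0.1-1.fc14.noarch.rpm": ("ddd444", 1286486400),
}

upload, delete = plan_sync(last_run, this_run)
print(upload)  # new or changed paths to push
print(delete)  # paths to remove from the bucket
```

Comparing the whole (checksum, timestamp) pair rather than the name alone is what catches the re-signed-package case mdomsch raises; a timestamp-only comparison, as gholms suggests, would be the same function with a narrower tuple.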
21:38:09 <brianlamere> remember that worst-case, we keep a local copy of the repo and just recopy it to S3 each sync
21:38:25 <brianlamere> gholms: no idea, dunno what the existing structure is ;)
21:38:26 <gholms> You don't have to be inside EC2 to push to S3, so why bother with instances at all?
21:38:32 <gholms> mmcgrath: ping
21:38:40 <brianlamere> gholms: for the speed of doing it
21:39:09 <brianlamere> s3 inside ec2 is considerably faster than s3 out in the wild wild west (www)
21:39:10 <gholms> What speed? You either have to pull from a mirror to an instance or push from a machine to S3.
21:39:15 <mdomsch> brianlamere: well, we'd have to sync into Amazon, and then push to S3
21:39:20 <gholms> Either way you have to transfer data in.
21:39:23 <mmcgrath> gholms: pong
21:39:25 <mdomsch> yeah
21:39:38 <mdomsch> I think it would be best to run this on a Fedora Infrastructure system, such as bapp01
21:39:55 <mmcgrath> the issue is the syncing part?
21:39:59 <gholms> mmcgrath: What would you think of a cron job that pushes repo changes to S3 buckets?
21:40:01 <gholms> Yeah.
21:40:07 <mdomsch> that way we don't have to copy the content, it's just locally visible on that server, and it pushes to S3 directly
21:40:07 <brianlamere> ww -> ec2 is fast. www->s3 is mediocre (it's designed to scale, not be fast). ec2->s3 is fast; I think they must treat that traffic differently
21:40:20 <mmcgrath> what's the problem of amazon having a job that pulls?
21:40:32 <gholms> mmcgrath: Then you guys have to manage the instances that do it.
21:40:32 <mdomsch> mmcgrath: the need for an AMi inside amazon to do it
21:40:34 <brianlamere> S3 can't pull :)
21:40:42 <mdomsch> and to have a local copy inside amazon then
21:41:12 <mmcgrath> random question: who's driving / requesting this?
21:41:23 <gholms> This?
21:41:23 <mdomsch> tell you what. Let's _try_ with running a cronjob inside FI. If the speed isn't acceptable, and we test and show that running instances inside EC2 really helps, then we do it.
21:41:29 <mmcgrath> yeah
21:41:51 <mmcgrath> are we as Fedora trying to do this to make sure amazon users have a good experience with Fedora in amazon? or is amazon doing it to save money and give their users a better experience.
21:42:07 <mdomsch> the former
21:42:07 <mmcgrath> I'm all for the better experience, but this is work, ongoing monitoring, etc. I just want to make sure it's shared if it can be.
21:42:31 <brianlamere> I think it's a bit of both, which is why Amazon has expressed interest in assisting
21:42:31 <mmcgrath> they can't give us an AMI to do it or can't provide themselves one?
21:42:37 <mdomsch> though we're getting some free hosting from amazon
21:42:58 <gholms> They can give us an AMI. Someone still has to manage it.
21:43:11 <brianlamere> mmcgrath: they have mentioned it is an option, but we need to really create a proposal with and without one to offer them and see what they will assist with
21:43:12 <mmcgrath> but not them?
21:43:27 <mmcgrath> I guess I'm just confused why we wouldn't treat them just as we do any other mirror.
21:43:31 <gholms> Not from the sound of things.
21:43:48 <gholms> S3 is only accessible via its rest api.
21:44:38 <gholms> Once stuff is inside S3 we can treat it like a regular mirror, though not one that is accessible outside EC2.
21:44:40 <brianlamere> mmcgrath: oh...now I see what you're saying...just ask them to do the mirror themselves? I dunno that anyone has approached that option
21:45:05 <mmcgrath> it just seems that, while yes this is an uncommon request for us, it seems like amazon more then has the resources to handle it.
21:45:22 <mmcgrath> though I understand we're kind of pushing it there already being that we're not the newest most popular on the block.
21:45:41 <mmcgrath> do the scripts to copy to S3 already exist?
21:45:55 <gholms> brianlamere plans to write one this week.
21:46:36 <mmcgrath> I guess my take on it is this: It seems odd they'd come to us and ask for help on this. That doesn't scale at all. If the University of Chicago asked us to do this I'd ask "Why do you think we should do this?"
21:46:49 <mmcgrath> mdomsch: am I off base there?
21:46:55 <gholms> AFAIK this is a push from our side, not theirs.
21:46:56 <brianlamere> we went the other direction - we went to them and asked for help
21:47:05 <mmcgrath> ah, k.
21:47:06 <brianlamere> so far as I know, at least
21:47:34 <mmcgrath> in that case I don't really care one way or the other as long as the scripts don't depend on some proprietary drivers.
21:47:35 <gholms> The hope is to (1) make updates faster, (2) make updates cheaper for users, and (3) reduce the additional load on public mirrors generated by thousands of instances
21:47:53 <brianlamere> the last one is key
21:48:23 <mmcgrath> has 3 been an issue?
21:48:27 <brianlamere> if I spin up 200 instances that immediate shoot off requests to fastestmirror for updates, that can hurt that poor little instance
21:48:29 <mdomsch> 3) isn't a big deal right now
21:48:32 <gholms> Yet.
21:48:45 <brianlamere> no, but fedora doesn't have an AMI newer than 8 there
21:48:47 <mdomsch> fastestmirror isn't always the best choice
21:49:29 <brianlamere> once fedora becomes viable in the cloud, has an option on the front screen that isn't eons old, etc - then that repo traffic will increase - quite a bit, I'd think
21:49:30 <mdomsch> 1 and 2 are user experience benefits, so once they're using Fedora as an AMI, they want to continue to do so.
21:50:04 <mmcgrath> <nod>
21:50:08 <mdomsch> ok, crazy question: how do the Ubuntu AMIs get updated?
21:50:14 <mmcgrath> well lets get a better look at this script first, tentatively I'd say we can host it though.
21:50:16 <mdomsch> what about the new Amazon Linux AMis?
21:50:31 <mdomsch> are we inventing, or needless re-inventing?
21:50:31 <gholms> Amazon Linux repos are on S3. No idea how they sync them.
21:50:53 <mdomsch> given those are RHEL-like, right, we should be able to do likewise w/o re-inventing
21:51:01 <gholms> Canonical runs mirror instances in all zones and eats the cost of doing so.
21:51:51 <gholms> Red Hat hosts a pseudo-proxying system for RHN content.
21:51:58 <mdomsch> can someone take the action to check with nathan et al about the Amazon Linux syncing then?
21:52:47 <brianlamere> mdomsch: (about the AMIs) Amazon recently updated the Fedora AMIs that were published, and wanted ours to put right there in the front line options; they removed fed8 and we didn't have a new one yet to give them.
21:53:37 <gholms> brianlamere: You have been communicating with Ben, right? Any chance you could ask how they sync their yum mirrors?
21:54:50 <brianlamere> yeah, can do. It may be that they back-end load it directly with an internal version of their import service http://aws.amazon.com/importexport/
21:55:06 <gholms> That would be amusing. And disappointing.
21:55:23 <gholms> mmcgrath: To answer your earlier question the yum python module can parse repodata, while the boto module can push stuff to s3. Both are available on Fedora and RHEL.
21:55:36 <mmcgrath> k
21:56:15 <gholms> #action brianlamere to inquire about Amazon Linux's repo syncing process
21:56:30 <brianlamere> well we hammered that topic a bit. back to jforbes on the AMI yet?
21:56:42 <gholms> jforbes: You ready? ^
21:57:13 <brianlamere> oh, and yeah - IAM works great, two thumbs up. Solves the problem well
21:57:19 <gholms> Sweet!
21:58:07 <brianlamere> you can give me a code to be able to change a single file on s3 (or the whole bucket...) or...any other resource, down to reasonable levels of granularity
21:58:36 <brianlamere> and boto already has support for it :)
21:58:59 <gholms> Oh, someone is going to need to manage the "official" Fedora AWS account credentials that are used to create S3 buckets and create the access keys used by the mirror syncing script.
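The IAM point above (access keys limited to one bucket, separate from the account-level credentials) might be expressed with a policy document along these lines. This is a hedged sketch, not something decided in the meeting: `mirror_sync_policy` is a hypothetical helper, the bucket name is made up, and the `"2012-10-17"` policy-language version postdates this log and is used here only because it is the commonly documented value.

```python
import json


def mirror_sync_policy(bucket):
    """Build an IAM policy document allowing a sync script to list one
    bucket and to add or remove objects inside it, and nothing else."""
    bucket_arn = "arn:aws:s3:::" + bucket
    return {
        "Version": "2012-10-17",  # assumption: current policy-language version
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Writes and deletes apply to the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:DeleteObject"],
                "Resource": bucket_arn + "/*",
            },
        ],
    }


# Hypothetical bucket name for one regional mirror.
policy = mirror_sync_policy("example-fedora-mirror-us-east-1")
print(json.dumps(policy, indent=2))
```

Keys issued under a policy like this would be the day-to-day credentials for the sync script, while the account-level credentials would only be needed when creating a new bucket or a new set of keys.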
21:59:02 <brianlamere> I say "me" but mean it genericly
21:59:41 * brianlamere nominates someone with an @ rh or fedora address ;)
21:59:47 <gholms> Such credentials shouldn't be necessary for day-to-day use, but rather only when setting up something new.
22:00:00 <brianlamere> aye
22:00:10 <gholms> mmcgrath: Do you folks have any suggestions? ^
22:00:46 <mmcgrath> gholms: for managing the official aws account?
22:00:48 <mmcgrath> we can do that if you want.
22:00:57 * jgreguske bails
22:01:05 <gholms> Awesome
22:01:07 <jforbes> gholms: not yet :(
22:01:13 <gholms> jforbes: :(
22:01:17 <jforbes> mmcgrath: There is already something in place for that
22:01:24 <gholms> There is?
22:01:27 <jforbes> mmcgrath: that is the aws group you created for me
22:02:03 <mmcgrath> as long as someone is manageing it, doesn't matter to me who ;)
22:02:50 <jforbes> mmcgrath: at the moment I am, but it was created as a group so it can be turned over and I dont have to
22:02:53 <gholms> jforbes: Of what are you speaking? Please forgive my ignorance.
22:03:13 <jforbes> gholms: official aws account credentials and management
22:03:26 <gholms> So there is already an official account?
22:03:31 <jforbes> yup
22:03:35 <gholms> Awesome.
22:03:41 <jforbes> and it already has free S3 for official images
22:04:19 <mdomsch> I'm out of here, I'll read the zodbot minutes from this point forward
22:04:29 <gholms> Who wants to use that to create some S3 buckets and IAM keys for regional mirrors?
22:04:32 * jforbes has to run soon too, kids have karate soon
22:04:50 <gholms> brianlamere: I seem to recall you reserving some useful-looking bucket names.
22:05:08 * mmcgrath has to run
22:05:09 <brianlamere> jforbes: ok, pair that up with IAM ( http://aws.amazon.com/iam/ ) and we're set
22:05:13 <jforbes> gholms: at this point, it will have to be me, I am not turning it over until we get a RH credit card backing that account.
22:05:20 <brianlamere> I tried to grab anything I could think of that looked useful, yes
22:05:25 * gholms hopes to squeeze one last action item out of this meeting
22:05:48 <jforbes> S3 is comped, but with the expectation that it is for images, we need to discuss the whole mirror setup with Amazon
22:05:59 <gholms> Ah, that's right.
22:06:14 <jforbes> and I certainly dont want a bill for multiple mirrors hitting my card
22:06:50 <gholms> Sounds like we should wait for that script to find out more, then.
22:07:11 <jforbes> On that note, I need to run. I will get back to debugging this image issue when I get the kids in bed
22:07:20 <gholms> jforbes: Good luck. Thanks!
22:07:40 <gholms> #topic Open floor
22:07:52 <gholms> Anyone have anything else?
22:08:51 <gholms> Thanks for coming, everyone!
22:08:54 <gholms> #endmeeting