21:00:28 <gholms> #startmeeting Cloud SIG (7 Oct 2010)
21:00:28 <zodbot> Meeting started Thu Oct  7 21:00:28 2010 UTC.  The chair is gholms. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:28 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
21:00:38 <gholms> #meetingname cloud
21:00:38 <zodbot> The meeting name has been set to 'cloud'
21:00:47 <gholms> #chair rbergeron jforbes_
21:00:47 <zodbot> Current chairs: gholms jforbes_ rbergeron
21:00:52 <gholms> #topic init process
21:00:55 <gholms> Who's here today?
21:01:23 * jforbes 
21:01:53 * gholms throws a trout at brianlamere
21:02:03 <brianlamere> ah!
21:02:25 <brianlamere> tofu next time, I'm a veggie ;)
21:03:16 <gholms> It's so hard to tell these days, what with some people claiming to be so and still eating fish.  :-\
21:03:28 <gholms> #topic EC2 images
21:03:34 <gholms> jforbes: Take it away!
21:04:05 * jforbes is running into a boot issue with the F14 images, and I am in the process of trying to track that down
21:04:17 * mdomsch 
21:04:39 <mdomsch> gholms, skvidal, and I are discussing the yum setup in -devel right now
21:05:02 * jgreguske here
21:05:15 <brianlamere> well bring it over here, sounds pretty appropriate
21:05:23 <brianlamere> did we decide which way it would be done?
21:05:26 <gholms> brianlamere: That's the next topic.  :)
21:05:28 <jforbes> by in the process, I mean as we speak
21:05:53 <jforbes> gholms: go on to the next topic and I hope I have more valuable news when done :)
21:06:06 <gholms> Hehe, all right.
21:06:31 <gholms> #topic EC2 mirror proposal
21:06:36 <gholms> #link https://fedoraproject.org/wiki/User:Gholms/EC2_Mirror_Proposal
21:06:40 <gholms> .rel 4149
21:06:41 <zodbot> gholms: #4149 (Need a way to point EC2 instances to specific mirrors) - Fedora Release Engineering - Trac - https://fedorahosted.org/rel-eng/ticket/4149
21:07:04 <mdomsch> brianlamere: I was hoping we'd be done before the meeting.  What was I thinking? :-)
21:07:58 <mdomsch> skvidal, gholms, ok, here now
21:08:08 <skvidal> mdomsch: I'm sorry if I sound like the bad guy here but I've done more than my share of random yum plugin maintenance and don't feel like adding more
21:08:20 <gholms> So what's wrong with just adding &zone=$somevariablename to the mirrorlist URIs?
21:08:38 <mdomsch> I think adding &zone=$zonevar  is prettier and cleaner than &yum_extension=ec2zone:my_ec2zone
21:09:00 <skvidal> <shrug> I just expect that the next thing someone wants will get in under this rule, too
21:09:15 <skvidal> (which is the exact reason why I didn't want to write an ec2-specific plugin)
21:09:44 <brianlamere> yeah, I thought we were looking at a more generic thing that could be used elsewhere too
21:10:03 <gholms> &zone=blah *can* be used elsewhere.
21:10:24 <mdomsch> well, &zone= is whatever I want it to mean, it's handled entirely by MM.  it's not even named 1:1 for EC2
21:10:36 <skvidal> umm.....
21:10:41 <skvidal> <shrug>
21:10:42 <skvidal> okay
21:10:44 <mdomsch> it's a simple "zone has hosts", "hosts are in zones" many-to-many mapping in MM
21:11:00 <skvidal> that feels like a rationalization to me
21:11:06 <skvidal> but this part really doesn't matter to me
21:11:21 <gholms> It's sort of a name-based replacement for &ip= when IP lookups aren't going to work for some reason.
21:11:48 <mdomsch> exactly
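The "zone has hosts", "hosts are in zones" mapping mdomsch describes could be modelled roughly as below. This is only a sketch for illustration: the class name, method names, and host names are hypothetical, and nothing here reflects how MirrorManager actually stores the relation.

```python
from collections import defaultdict


class ZoneMap:
    """Hypothetical in-memory many-to-many mapping between zones and hosts."""

    def __init__(self):
        self._hosts_by_zone = defaultdict(set)
        self._zones_by_host = defaultdict(set)

    def add(self, zone, host):
        # Record the link in both directions so either side can be queried.
        self._hosts_by_zone[zone].add(host)
        self._zones_by_host[host].add(zone)

    def hosts_in(self, zone):
        # .get() avoids creating an empty entry for an unknown zone.
        return sorted(self._hosts_by_zone.get(zone, ()))

    def zones_of(self, host):
        return sorted(self._zones_by_host.get(host, ()))
```

A mirrorlist request carrying a zone name would then reduce to a `hosts_in(name)` lookup, falling back to the usual selection when the list is empty.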
21:12:17 <gholms> Backstory:  skvidal proposed using something akin to "&ext=ec2zone:us-east-1" instead
21:12:44 <gholms> Then people who need to pass even more args to mirrormanager can use the same file.
21:13:24 <skvidal> is 'zone' generic enough?
21:13:28 <skvidal> if so - then have a blast
21:13:37 <skvidal> but when I hear 'zone' I think 'dns'
21:13:40 <skvidal> not anything else
21:13:43 <gholms> region?
21:13:46 <gholms> location?
21:14:07 <mdomsch> region tends to denote large geographical boundaries, like continents, in my mind.  I'm not wedded to 'zone' though.
21:14:35 <mdomsch> location would be fine
21:14:52 <gholms> Anyone else have any thoughts on the mirrormanager variable name?
21:15:23 <brianlamere> it could be a random string of characters as far as I'm concerned :)
21:16:02 <gholms> location works for me.
21:16:33 <mdomsch> locality?
21:16:41 <mdomsch> or location
21:16:48 <mdomsch> ok, location.  Done.
21:16:56 <mdomsch> &location=$location
21:17:00 <mdomsch> var file named 'location'
21:17:15 <mdomsch> append to every mirrorlist URL in the config files
21:17:18 <skvidal> in /etc/yum/vars/
21:17:40 <mdomsch> so say we all
21:17:56 <gholms> skvidal also proposed using one file for all such appended variables.  Good/bad?
21:17:58 <skvidal> someone has been watching too much BSG
21:18:14 <skvidal> gholms: I did what?
21:18:34 <gholms> $yum_extension
21:18:55 <gholms> Eh, I probably misunderstood you.
21:19:34 <skvidal> that's not important to this discussion
21:19:52 <gholms> #agreed Yum will append a "location" variable to all mirrormanager URIs
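The agreed scheme (a file named 'location' under /etc/yum/vars/, with &location=$location appended to each mirrorlist URL) can be sketched as follows. This is not yum's own code; it only imitates the idea that each file name is a variable name and the file's first line is its value. The function names and the example URL are assumptions made for illustration.

```python
import os
import re


def load_vars(vars_dir):
    """Read one variable per file: file name is the name, first line is the value."""
    values = {}
    for name in sorted(os.listdir(vars_dir)):
        path = os.path.join(vars_dir, name)
        if os.path.isfile(path):
            with open(path) as handle:
                values[name] = handle.readline().strip()
    return values


def expand(url, values):
    """Replace each $name with its value; unknown names are left untouched."""
    return re.sub(r"\$(\w+)", lambda m: values.get(m.group(1), m.group(0)), url)
```

With a 'location' file containing us-east-1, a URL ending in &location=$location would expand to one ending in &location=us-east-1. What the real client sends when no such file exists is not covered by this sketch.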
21:20:28 <gholms> mdomsch: How much of this is implemented in MM so far?
21:21:27 <mdomsch> gholms: I'll have to s/zone/location/, but that's fine
21:21:44 <gholms> #action gholms to open blocker bugs against fedora-release and generic-release to add these variable placeholders to yum repo config files
21:21:52 <mdomsch> and right now I'll have to manually create locations and put hosts into locations when it matters
21:22:06 <mdomsch> I don't have the admin web UI plumbed for it
21:22:16 <mdomsch> and of course, none of this is in production yet, just in git
21:23:07 <gholms> Let's see what else is in the proposal...
21:23:22 <gholms> Ah, the S3 mirror creation process.
21:23:38 <mdomsch> yeah, I'm unclear on that part :-)
21:24:07 <gholms> brianlamere: Was it you who could write a script that will sync a S3 bucket with a mirror?
21:25:51 * gholms hopes he's still around
21:26:37 <brianlamere> sorry, yes
21:27:36 <brianlamere> yeah, that's easy enough - I write things to S3 all the time.  There isn't a true "rsync" avail, there's just guessing (because it's REST...).  But there are easy ways to accomplish it, yes
21:29:25 <gholms> One possibility that popped into my mind was a script that pulls down repodata from a mirror, compares it against the repodata it got last time, then pushes the changes.
21:29:45 <gholms> Or would a generic "sync a bucket with a HTTP directory tree" script be better?
21:30:09 <mdomsch> we would need to exclude some content, like iso/
21:30:24 <brianlamere> would just need to know how we're wanting to do it in the end; use the info in filelist.xml/etc to know what to copy up?  or do a check each time to verify that the file is there?  you can't do a "rsync" really...because you won't be able to do a real checksum without copying it locally first
21:30:37 <brianlamere> option 1
21:30:38 <mdomsch> and I suppose for now we don't need to include any non-repo content
21:31:08 <gholms> mdomsch: Thankfully, we only need to mirror Everything and updates, but not Fedora.
21:31:26 <mdomsch> brianlamere: if you HTTP HEAD a file on S3, do you get a hash of any sort back in the headers?
21:31:41 <brianlamere> pull down, keep a local hash of what you have.  Then next time, check the meta data for the files that have changed, and scroll through doing a .lookup(key) on everything and drop the new files in place along the way
21:32:35 <brianlamere> you can get a hash *if you put one there*.  You can tag the keys.  They can have metadata (which is where the md5 could go), the location, and the content itself
21:33:08 <brianlamere> the problem is, if anyone updates the key content without respecting that process (ie, updates it without updating the meta tag info) then you're all goofed up
21:33:20 <mdomsch> that would be valuable, as we do update content on the mirrors (re-sign files for example)
21:33:35 <gholms> Why not just compare the file timestamps that appear in the repodata?
21:34:03 <brianlamere> well as long as you can assure that all updates will respect that process, and that no one tries to do it a shortcut way, then that's probably the easiest thing
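The "compare against what we pushed last time" approach discussed above can be sketched as a diff of two snapshots. The sketch assumes each snapshot is a dict mapping a relative path to whatever the repodata records for it (for example a timestamp and checksum pair); the function name, the snapshot format, and the default exclusion of iso/ are illustrative assumptions, not the tool that was eventually written.

```python
def plan_sync(previous, current, exclude_prefixes=("iso/",)):
    """Compare two {relative_path: metadata} snapshots.

    Returns (to_upload, to_delete) as sorted lists of paths. A path is
    uploaded when it is new or its metadata changed (for example a
    re-signed file), and deleted when it disappeared from the source.
    """
    prefixes = tuple(exclude_prefixes)

    def wanted(path):
        return not path.startswith(prefixes)

    to_upload = sorted(
        path for path, meta in current.items()
        if wanted(path) and previous.get(path) != meta
    )
    to_delete = sorted(
        path for path in previous
        if wanted(path) and path not in current
    )
    return to_upload, to_delete
```

Because only the two snapshots are compared, nothing has to be read back from S3, which sidesteps the checksum problem raised above. The trade-off is the one brianlamere notes: it is only correct if every change to the bucket goes through this process.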
21:34:50 <mdomsch> so brianlamere, are you volunteering to write such a tool for us then? :-)
21:35:16 <brianlamere> sure, why not :)  I'll get something avail that is at least a PoC by next week
21:35:46 <brianlamere> next week's meeting, that is
21:36:11 <gholms> Can you use two YumBase objects to compare different sets of repodata?
21:36:16 <gholms> Eh, whatever.
21:36:39 <mdomsch> #action broanlemere to write a proof-of-concept tool to sync content into S3
21:36:48 <mdomsch> brianlamere that is...
21:36:50 <gholms> Nice.
21:37:02 <gholms> #undo
21:37:02 <zodbot> Removing item from minutes: <MeetBot.items.Action object at 0x2b7d2f8fc0d0>
21:37:09 <gholms> #action brianlamere to write a proof-of-concept tool to sync content into S3
21:37:10 <mdomsch> #action brianlamere to write a proof-of-concept tool to sync content into S3
21:37:16 <gholms> lol
21:37:17 <mdomsch> jinx
21:37:22 <gholms> D:
21:37:25 <mdomsch> #undo
21:37:29 <gholms> #undo
21:37:29 <zodbot> Removing item from minutes: <MeetBot.items.Action object at 0x2b7d2edd6110>
21:37:42 * mdomsch is hands-off
21:37:56 <gholms> We originally proposed starting up one VM in each region to do the work.  Could the process run on the existing infrastructure instead?
21:38:09 <brianlamere> remember that worst-case, we keep a local copy of the repo and just recopy it to S3 each sync
21:38:25 <brianlamere> gholms:  no idea, dunno what the existing structure is ;)
21:38:26 <gholms> You don't have to be inside EC2 to push to S3, so why bother with instances at all?
21:38:32 <gholms> mmcgrath: ping
21:38:40 <brianlamere> gholms:  for the speed of doing it
21:39:09 <brianlamere> s3 inside ec2 is considerably faster than s3 out in the wild wild west (www)
21:39:10 <gholms> What speed?  You either have to pull from a mirror to an instance or push from a machine to S3.
21:39:15 <mdomsch> brianlamere: well, we'd have to sync into Amazon, and then push to S3
21:39:20 <gholms> Either way you have to transfer data in.
21:39:23 <mmcgrath> gholms: pong
21:39:25 <mdomsch> yeah
21:39:38 <mdomsch> I think it would be best to run this on a Fedora Infrastructure system, such as bapp01
21:39:55 <mmcgrath> the issue is the syncing part?
21:39:59 <gholms> mmcgrath: What would you think of a cron job that pushes repo changes to S3 buckets?
21:40:01 <gholms> Yeah.
21:40:07 <mdomsch> that way we don't have to copy the content, it's just locally visible on that server, and it pushes to S3 directly
21:40:07 <brianlamere> www -> ec2 is fast.  www->s3 is mediocre (it's designed to scale, not be fast).  ec2->s3 is fast; I think they must treat that traffic differently
21:40:20 <mmcgrath> what's the problem of amazon having a job that pulls?
21:40:32 <gholms> mmcgrath: Then you guys have to manage the instances that do it.
21:40:32 <mdomsch> mmcgrath: the need for an AMI inside amazon to do it
21:40:34 <brianlamere> S3 can't pull :)
21:40:42 <mdomsch> and to have a local copy inside amazon then
21:41:12 <mmcgrath> random question: who's driving / requesting this?
21:41:23 <gholms> This?
21:41:23 <mdomsch> tell you what.  Let's _try_ with running a cronjob inside FI. If the speed isn't acceptable, and we test and show that running instances inside EC2 really helps, then we do it.
21:41:29 <mmcgrath> yeah
21:41:51 <mmcgrath> are we as Fedora trying to do this to make sure amazon users have a good experience with Fedora in amazon?  or is amazon doing it to save money and give their users a better experience.
21:42:07 <mdomsch> the former
21:42:07 <mmcgrath> I'm all for the better experience, but this is work, ongoing monitoring, etc.  I just want to make sure it's shared if it can be.
21:42:31 <brianlamere> I think it's a bit of both, which is why Amazon has expressed interest in assisting
21:42:31 <mmcgrath> they can't give us an AMI to do it or can't provide themselves one?
21:42:37 <mdomsch> though we're getting some free hosting from amazon
21:42:58 <gholms> They can give us an AMI.  Someone still has to manage it.
21:43:11 <brianlamere> mmcgrath: they have mentioned it is an option, but we need to really create a proposal with and without one to offer them and see what they will assist with
21:43:12 <mmcgrath> but not them?
21:43:27 <mmcgrath> I guess I'm just confused why we wouldn't treat them just as we do any other mirror.
21:43:31 <gholms> Not from the sound of things.
21:43:48 <gholms> S3 is only accessible via its REST API.
21:44:38 <gholms> Once stuff is inside S3 we can treat it like a regular mirror, though not one that is accessible outside EC2.
21:44:40 <brianlamere> mmcgrath:  oh...now I see what you're saying...just ask them to do the mirror themselves?  I dunno that anyone has approached that option
21:45:05 <mmcgrath> it just seems that, while yes this is an uncommon request for us, it seems like amazon more than has the resources to handle it.
21:45:22 <mmcgrath> though I understand we're kind of pushing it there already being that we're not the newest most popular on the block.
21:45:41 <mmcgrath> do the scripts to copy to S3 already exist?
21:45:55 <gholms> brianlamere plans to write one this week.
21:46:36 <mmcgrath> I guess my take on it is this:  It seems odd they'd come to us and ask for help on this.  That doesn't scale at all.  If the University of Chicago asked us to do this I'd ask "Why do you think we should do this?"
21:46:49 <mmcgrath> mdomsch: am I off base there?
21:46:55 <gholms> AFAIK this is a push from our side, not theirs.
21:46:56 <brianlamere> we went the other direction - we went to them and asked for help
21:47:05 <mmcgrath> ah, k.
21:47:06 <brianlamere> so far as I know, at least
21:47:34 <mmcgrath> in that case I don't really care one way or the other as long as the scripts don't depend on some proprietary drivers.
21:47:35 <gholms> The hope is to (1) make updates faster, (2) make updates cheaper for users, and (3) reduce the additional load on public mirrors generated by thousands of instances
21:47:53 <brianlamere> the last one is key
21:48:23 <mmcgrath> has 3 been an issue?
21:48:27 <brianlamere> if I spin up 200 instances that immediately shoot off requests to fastestmirror for updates, that can hurt that poor little instance
21:48:29 <mdomsch> 3) isn't a big deal right now
21:48:32 <gholms> Yet.
21:48:45 <brianlamere> no, but fedora doesn't have an AMI newer than 8 there
21:48:47 <mdomsch> fastestmirror isn't always the best choice
21:49:29 <brianlamere> once fedora becomes viable in the cloud, has an option on the front screen that isn't eons old, etc - then that repo traffic will increase - quite a bit, I'd think
21:49:30 <mdomsch> 1 and 2 are user experience benefits, so once they're using Fedora as an AMI, they want to continue to do so.
21:50:04 <mmcgrath> <nod>
21:50:08 <mdomsch> ok, crazy question: how do the Ubuntu AMIs get updated?
21:50:14 <mmcgrath> well let's get a better look at this script first, tentatively I'd say we can host it though.
21:50:16 <mdomsch> what about the new Amazon Linux AMIs?
21:50:31 <mdomsch> are we inventing, or needlessly re-inventing?
21:50:31 <gholms> Amazon Linux repos are on S3.  No idea how they sync them.
21:50:53 <mdomsch> given those are RHEL-like, right, we should be able to do likewise w/o re-inventing
21:51:01 <gholms> Canonical runs mirror instances in all zones and eats the cost of doing so.
21:51:51 <gholms> Red Hat hosts a pseudo-proxying system for RHN content.
21:51:58 <mdomsch> can someone take the action to check with nathan et al about the Amazon Linux syncing then?
21:52:47 <brianlamere> mdomsch: (about the AMIs) Amazon recently updated the Fedora AMIs that were published, and wanted ours to put right there in the front line options; they removed fed8 and we didn't have a new one yet to give them.
21:53:37 <gholms> brianlamere: You have been communicating with Ben, right?  Any chance you could ask how they sync their yum mirrors?
21:54:50 <brianlamere> yeah, can do.  It may be that they back-end load it directly with an internal version of their import service  http://aws.amazon.com/importexport/
21:55:06 <gholms> That would be amusing.  And disappointing.
21:55:23 <gholms> mmcgrath: To answer your earlier question the yum python module can parse repodata, while the boto module can push stuff to s3.  Both are available on Fedora and RHEL.
21:55:36 <mmcgrath> k
21:56:15 <gholms> #action brianlamere to inquire about Amazon Linux's repo syncing process
21:56:30 <brianlamere> well we hammered that topic a bit.  back to jforbes on the AMI yet?
21:56:42 <gholms> jforbes: You ready?  ^
21:57:13 <brianlamere> oh, and yeah - IAM works great, two thumbs up.  Solves the problem well
21:57:19 <gholms> Sweet!
21:58:07 <brianlamere> you can give me a code to be able to change a single file on s3 (or the whole bucket...) or...any other resource, down to reasonable levels of granularity
21:58:36 <brianlamere> and boto already has support for it :)
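The per-bucket granularity brianlamere describes is expressed in IAM as a JSON policy document. The sketch below builds one as a plain Python dict; the function name and bucket name are hypothetical, the set of actions is a guess at what a sync script would need, and the policy actually used by the SIG is not recorded in this log.

```python
import json


def mirror_sync_policy(bucket):
    """Build an IAM policy limited to listing and writing one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": "arn:aws:s3:::%s" % bucket,
            },
            {
                # Reads, writes, and deletes apply to the objects inside it.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": "arn:aws:s3:::%s/*" % bucket,
            },
        ],
    }


if __name__ == "__main__":
    # "example-fedora-mirror" is a placeholder bucket name.
    print(json.dumps(mirror_sync_policy("example-fedora-mirror"), indent=2))
```

Keys issued under a policy like this could be handed to the syncing script without exposing the account-level credentials.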
21:58:59 <gholms> Oh, someone is going to need to manage the "official" Fedora AWS account credentials that are used to create S3 buckets and create the access keys used by the mirror syncing script.
21:59:02 <brianlamere> I say "me" but mean it generically
21:59:41 * brianlamere nominates someone with an @ rh or fedora address ;)
21:59:47 <gholms> Such credentials shouldn't be necessary for day-to-day use, but rather only when setting up something new.
22:00:00 <brianlamere> aye
22:00:10 <gholms> mmcgrath: Do you folks have any suggestions?  ^
22:00:46 <mmcgrath> gholms: for managing the official aws account?
22:00:48 <mmcgrath> we can do that if you want.
22:00:57 * jgreguske bails
22:01:05 <gholms> Awesome
22:01:07 <jforbes> gholms: not yet :(
22:01:13 <gholms> jforbes: :(
22:01:17 <jforbes> mmcgrath: There is already something in place for that
22:01:24 <gholms> There is?
22:01:27 <jforbes> mmcgrath: that is the aws group you created for me
22:02:03 <mmcgrath> as long as someone is managing it, doesn't matter to me who ;)
22:02:50 <jforbes> mmcgrath: at the moment I am, but it was created as a group so it can be turned over and I don't have to
22:02:53 <gholms> jforbes: Of what are you speaking?  Please forgive my ignorance.
22:03:13 <jforbes> gholms: official aws account credentials and management
22:03:26 <gholms> So there is already an official account?
22:03:31 <jforbes> yup
22:03:35 <gholms> Awesome.
22:03:41 <jforbes> and it already has free S3 for official images
22:04:19 <mdomsch> I'm out of here, I'll read the zodbot minutes from this point forward
22:04:29 <gholms> Who wants to use that to create some S3 buckets and IAM keys for regional mirrors?
22:04:32 * jforbes has to run soon too, kids have karate soon
22:04:50 <gholms> brianlamere: I seem to recall you reserving some useful-looking bucket names.
22:05:08 * mmcgrath has to run
22:05:09 <brianlamere> jforbes: ok, pair that up with IAM ( http://aws.amazon.com/iam/ )  and we're set
22:05:13 <jforbes> gholms: at this point, it will have to be me, I am not turning it over until we get a RH credit card backing that account.
22:05:20 <brianlamere> I tried to grab anything I could think of that looked useful, yes
22:05:25 * gholms hopes to squeeze one last action item out of this meeting
22:05:48 <jforbes> S3 is comped, but with the expectation that it is for images, we need to discuss the whole mirror setup with Amazon
22:05:59 <gholms> Ah, that's right.
22:06:14 <jforbes> and I certainly don't want a bill for multiple mirrors hitting my card
22:06:50 <gholms> Sounds like we should wait for that script to find out more, then.
22:07:11 <jforbes> On that note, I need to run. I will get back to debugging this image issue when I get the kids in bed
22:07:20 <gholms> jforbes: Good luck.  Thanks!
22:07:40 <gholms> #topic Open floor
22:07:52 <gholms> Anyone have anything else?
22:08:51 <gholms> Thanks for coming, everyone!
22:08:54 <gholms> #endmeeting