fedora-meeting
LOGS
19:00:36 <CodeBlock> #startmeeting infrastructure
19:00:36 <zodbot> Meeting started Thu May 12 19:00:36 2011 UTC.  The chair is CodeBlock. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:36 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:46 <CodeBlock> #chair goozbach skvidal
19:00:46 <zodbot> Current chairs: CodeBlock goozbach skvidal
19:00:55 <CodeBlock> nirik is dealing with another issue atm
19:01:10 <CodeBlock> and I am pulling up the agenda.
19:01:13 <CodeBlock> ok.
19:01:28 * skvidal is here
19:01:35 <CodeBlock> #topic roll call
19:01:42 * StylusEater_work is here
19:01:42 * CodeBlock 
19:01:51 <cyberbyte> is here
19:02:08 <ranjibd> is here
19:02:17 <CodeBlock> alright, let's get started before I stab ibm cognos because it keeps erroring out </$dayjob gripes>
19:02:42 <CodeBlock> Going to leave the first two agenda items in hopes nirik stops by before </meeting>
19:02:52 * skvidal is here, again
19:03:05 <CodeBlock> #topic Future -- what FI should look like 6m - 1yr from now
19:03:12 <CodeBlock> aaannnd over to skvidal
19:03:20 <skvidal> what? oh me?
19:03:26 <CodeBlock> that was your topic
19:03:30 <skvidal> yes
19:03:33 <CodeBlock> have at it
19:03:37 <goozbach> me here
19:03:53 <skvidal> after a meeting on tuesday I started to get a bit frustrated at my inability to articulate what bugged me about our current layout of services and hosts
19:04:00 <skvidal> so I started on some diagrams to explain it better
19:04:20 <goozbach> skvidal: if you have links, please prepend #url to them
19:04:28 <skvidal> this is a rough description of where we are now
19:04:30 <skvidal> #url http://skvidal.fedorapeople.org/hidden/fedoraservices.png
19:05:01 <goozbach> wow, complex much?
19:05:09 <skvidal> this is a rough depiction of my thoughts (not to be taken as definitive or as representative of FI or RH) as to where we want to be:
19:05:11 <ranjibd> skvidal: WOW
19:05:12 <skvidal> #url http://skvidal.fedorapeople.org/hidden/fedorasvcsfuture.png
19:05:31 <skvidal> and this is a REALLY high level view of how I feel about it:
19:05:33 <skvidal> #url http://skvidal.fedorapeople.org/hidden/fedorasvcsfuture.png
19:05:35 * nirik notes meetbot picks up links automatically. no need to url them. ;)
19:05:46 <skvidal> nirik: I was just doing what goozbach told me
19:05:51 <goozbach> ohh
19:05:53 <goozbach> I didn't know that
19:05:57 <CodeBlock> nirik: you around now? :)
19:06:07 <nirik> sorta. Still trying to fix this.
19:06:09 * goozbach feels sheepish
19:06:11 <CodeBlock> #chair nirik
19:06:11 <zodbot> Current chairs: CodeBlock goozbach nirik skvidal
19:06:23 <skvidal> the thing I'm trying to figure out is this
19:06:30 <skvidal> what does FI look like before F16?
19:06:32 <skvidal> before F18?
19:06:49 <StylusEater_work> skvidal: so timelines for how to phase these changes in?
19:07:10 <skvidal> I think timelines are less critical than a plan of migration that does NOT disrupt users too much
19:07:33 <skvidal> timelines tend to evolve based on the invasiveness of the changes
19:07:43 <skvidal> also - none of these are plans
19:07:46 <skvidal> this is just ideas
19:08:07 <skvidal> and I would like to hear what kind of problems others have
19:08:16 <skvidal> I started the first diagram to explain my issues
19:08:21 <skvidal> and I think that diagram does explain it
19:08:24 <skvidal> as an example
19:08:32 <goozbach> #action (everyone) should chime in about FI issues
19:08:34 <smooge> late but here
19:08:37 <skvidal> the fedora wiki requires that both db02 and db05 be online and not in an outage
19:08:53 <skvidal> the wiki - which is pretty important and pretty critical imo
19:09:17 <skvidal> seems to require A LOT of other things be working for it to function
19:09:26 <skvidal> that creates two problems to me:
19:09:30 <CodeBlock> mediawiki can run on postgres, can't it?
19:09:37 <skvidal> 1. it means almost any thing we change/do will screw up the wiki
19:09:51 <skvidal> 2. it means when something breaks in the wiki chasing down which part is actually broken is REALLY hard
19:10:10 <skvidal> (is it a proxy? is it in memcached? maybe it's the db itself?)
19:10:36 <skvidal> I think everyone here who has troubleshot the FI layout of things has had that moment or more of hesitation trying to figure out WHERE to look
19:10:42 <skvidal> just to find out what the problem is
19:10:48 <skvidal> or is that just me?
19:11:22 <CodeBlock> yes +1
19:11:33 <skvidal> for example - if I asked you to find out why a file on an FI repo was not accessible to a builder - where would you start?
19:11:35 * nirik nods. Simplifying things is good. Less parts to fail at once, better idea of what to fix, etc.
19:11:48 <goozbach> simpler is very much better
19:11:56 <skvidal> the thing is this
19:12:04 <skvidal> simpler is not necessarily going to scale for us
19:12:10 <skvidal> we do NEED things like the loadbalancing
19:12:21 <CodeBlock> skvidal: or in your case, which subdomain to use of dl or download >.> but that's another topic ...that kind of goes along with simplifying things though
19:12:22 <skvidal> but I think we should try to isolate services a bit more
19:12:49 <skvidal> so that our user facing and our dev-facing services don't end up overlapping as much
19:12:52 <skvidal> and most importantly to my mind
19:13:03 <skvidal> so that our authn/authz service is pretty much shared-nothing
19:13:30 <skvidal> I know shared-nothing is not-entirely-possible - but it can be MORE so than it is now
19:13:39 <skvidal> so if you look at the second diagram
19:13:43 <goozbach> simpler != non-scalable
19:13:52 <goozbach> but yeah I see your point
19:13:58 <skvidal> I have a number of services moved away from the core services
19:14:29 <skvidal> those services: smolt, zarafa, wordpress-mu, etc
19:14:50 <skvidal> they are GREAT places for people who want to really help us to shine
19:15:04 <skvidal> smolt in particular is a great opportunity for help
19:15:09 <skvidal> it's a non-core service
19:15:13 <skvidal> it's relatively detached anyway
19:15:15 <skvidal> it is important
19:15:30 <skvidal> it has a fairly well-known set of requirements
19:16:07 <skvidal> now - it is possible that smolt has enough traffic that will need more than a single instance - possible - but unlikely, I think.
19:16:12 <StylusEater_work> skvidal: do we have a suggested approach for these updates?
19:16:31 <skvidal> not yet :)
19:16:36 <skvidal> StylusEater_work: got an idea?
19:17:23 <nirik> isolating things also makes it easier to move them around... to another site, the cloud, etc...
19:17:29 <skvidal> nirik: +10
19:17:33 <ranjibd> skvidal: do we have any load balancing around smolt on current setup?
19:17:34 <skvidal> so - here's what I would love to see
19:17:41 <skvidal> ranjibd: it's in our existing lb setup
19:17:47 <skvidal> with, apparently, the whole damn world
19:17:49 <nirik> ranjibd: yes, it's using our proxy setup currently.
19:18:26 <skvidal> so take a look at the chart there
19:18:44 <skvidal> and figure out if there is something you want to take on
19:18:48 <nirik> I'd like to get some stats on some of these things... how many hits does smolt get?
19:18:56 <skvidal> nirik: a good question :)
19:19:19 <skvidal> I think we can use the infra mailing list for this discussion
19:19:26 <nirik> yeah.
19:19:30 <ranjibd> if we use puppet's facts, we can decouple those configs from the individual boxes
19:19:35 <nirik> I'd like to add a bunch of things to awstats post freeze.
19:19:43 <skvidal> ranjibd: we already have that
19:20:01 <skvidal> ranjibd: it's not a cfg mgmt issue - it is an service overlap issue - at least imo
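[Editor's note: nirik's question above — "how many hits does smolt get?" — is the kind of thing a short script can answer from proxy access logs. A minimal sketch follows; the log paths, the vhost-first log format, and the sample lines are assumptions for illustration, not FI's actual configuration:]

```python
# Rough sketch of answering "how many hits does smolt get?" from
# access logs. Assumes a %v-prefixed combined log format where the
# target vhost is the first whitespace-separated field (hypothetical).
from collections import Counter

def hits_per_host(lines):
    """Count requests per target vhost from log lines."""
    counts = Counter()
    for line in lines:
        # First field is the virtual host under the assumed format.
        host = line.split(None, 1)[0]
        counts[host] += 1
    return counts

# Made-up sample lines in the assumed format:
sample = [
    'smolts.org 10.0.0.1 - - [12/May/2011:19:00:00 +0000] "GET / HTTP/1.1" 200 512',
    'smolts.org 10.0.0.2 - - [12/May/2011:19:00:01 +0000] "POST /client/add HTTP/1.1" 200 64',
    'fedoraproject.org 10.0.0.3 - - [12/May/2011:19:00:02 +0000] "GET /wiki HTTP/1.1" 200 1024',
]
print(hits_per_host(sample))
```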
19:20:01 <StylusEater_work> skvidal: I'd probably try to take the services you've identified (and their dependencies) and try to create cohesive/isolated sets and determine the outage/downtime thresholds for those sets ... then I'd look at the uptime/reliability requirements for each set to help guide us on a redesign along the lines of how many load balancers do we need to ensure we can failover for servicing the sets, how many master/slave db setups do we need an
19:21:14 <StylusEater_work> skvidal: we might run into issues where some dependencies overlap and that might spark app redesigns or migrations
19:21:20 <skvidal> agreed
19:21:27 <skvidal> which is why I'm in favor of doing the easy items first
19:21:36 <skvidal> I really do think smolt is going to be an easier item
19:21:42 <skvidal> another thought
19:21:44 <StylusEater_work> skvidal: seems logical
19:21:55 <skvidal> right now we're still waiting on approvals for funding for cloud-instances
19:22:02 <skvidal> but we do have some free slots at osuosl for a server
19:22:32 <skvidal> nirik, smooge: how would y'all feel about moving external services to osuosl in the eventual hope of moving publictest## to ec2 or rax?
19:23:16 <nirik> we have free slots there? meaning we can put another physical box there?
19:23:45 <skvidal> no - I mean a free cpu + ram
19:23:47 <nirik> I think smolt would be a great pilot project for seperating.
19:23:48 <skvidal> on osuosl01
19:23:51 <nirik> skvidal: yeah, but no disk
19:24:05 <skvidal> ah, damn
19:24:18 <StylusEater_work> what kind of disks would we need?
19:24:34 <skvidal> StylusEater_work: ones with disk space - raided :)
19:24:39 <rbergeron> i like big disks and i cannot lie
19:24:57 <smooge> to use osuosl we would need to turn off more pt boxes.
19:24:59 <skvidal> thank you sir-mix-a-lot
19:24:59 <StylusEater_work> skvidal: ha ha ... sorry that I wasn't clear ... model #? sas? sata?
19:25:10 <skvidal> smooge: that sounds like a win-win to me
19:25:11 <rbergeron> :D
19:25:27 <smooge> StylusEater_work, ones that will work in a system we have and not void our warranty with IBM
19:25:44 <StylusEater_work> rbergeron: http://www.youtube.com/watch?v=2ImZTwYwCug
19:25:52 <nirik> I think we can discuss where out of band, but I think we should look at doing a pilot project and see where that takes us. ;)
19:26:27 <skvidal> sure
19:26:39 <skvidal> so right now we're frozen
19:26:48 <skvidal> so now is a fantastic time to talk about plans for when we unfreeze
19:26:55 <skvidal> so if you have ideas - bring them to the list
19:26:56 <CodeBlock> indeed
19:27:01 <skvidal> and let's figure out the timeline for them
19:27:14 <nirik> sounds great.
19:27:20 <CodeBlock> alright
19:27:21 <nirik> thanks for working on this skvidal
19:27:27 <skvidal> np
19:27:57 <CodeBlock> yes thank you - I think it's really going to make life nicer in... well yeah, in 6m to 1yr from now
19:28:26 <CodeBlock> skvidal: is there a mailing list thread for this yet?
19:29:03 <skvidal> nope
19:29:13 <skvidal> just start a new one
19:29:21 <CodeBlock> alright
19:29:21 <goozbach> #idea move publictest## to osuosl with future plans to ec2 or rax
19:29:22 <skvidal> infrastructure is a quiet-ish list
19:29:34 <skvidal> goozbach: umm - no
19:29:42 <skvidal> publictest## are at osuosl
19:29:50 <goozbach> I misread that
19:29:54 <skvidal> the idea is to figure out where to move svcs which are more external
19:30:12 <skvidal> possibly osuosl if we can eventually move publictest "to the cloud!"
19:30:18 <goozbach> move external services to osuosl?
19:30:45 <goozbach> #action figure out where to move svcs, with goal of moving publictest to the cloud
19:30:49 <goozbach> that better?
19:31:14 <skvidal> yes
19:31:16 <skvidal> loverly
19:31:56 <CodeBlock> ok
19:32:02 <skvidal> next?
19:32:22 * skvidal looks at codeblock or nirik or someone
19:32:23 <CodeBlock> #topic fedora-infrastructure group ... not sure what that means
19:32:36 <CodeBlock> "* fedora-infrastructure group? How does it work?"
19:32:40 <nirik> no, this was supposed to be 'fi-apprentice'
19:32:42 <nirik> :)
19:32:44 <skvidal> nod
19:32:49 <nirik> so, when do we want to add folks to there?
19:32:49 <CodeBlock> ah
19:32:53 <nirik> what do we expect of them?
19:32:57 <CodeBlock> #topic fi-apprentice - how does it work
19:32:59 <nirik> how long should they remain in that group?
19:33:21 <nirik> we could add them as soon as they introduce themselves at a meeting.
19:33:39 <nirik> or we could add them when they express interest in working on some specific item.
19:33:55 <goozbach> doh' full of fail today
19:33:59 <goozbach> ere yesterday
19:34:04 <skvidal> I like adding them when they express interest
19:34:10 <ranjibd> I'm still trying to figure out how to start contributing :-(, apart from attending the meetings
19:34:27 <goozbach> #info when should we add them?
19:34:34 <goozbach> #info what do we expect of them?
19:34:52 <goozbach> #info how long should they remain in the group?
19:35:17 <StylusEater_work> nirik: I was/am in favor of adding them once they've identified a task to help work on but then again I'm still a "newb" 'round here
19:35:48 <nirik> ranjibd: well, thats the hard gap to get people past. ;) I guess watch in our other channels, when you see something that interests you speak up...
19:35:56 <nirik> and/or pick a ticket and ask for more info on it
19:36:43 <nirik> How about we add them when they start wanting to work on or investigate something, and we purge the entire group in the per cycle housecleaning.
19:37:03 <StylusEater_work> nirik: part of the reason why I favored that approach is the dormant user issue we discussed a few weeks ago ... people are added then they "disappear"
19:37:07 <nirik> the idea being that if they ramped into other groups, they don't need to be in fi-apprentice anymore?
19:37:17 <StylusEater_work> nirik: that's a great idea +1
19:37:55 <CodeBlock> nirik: good idea but we will need a cut-off still. If they get added 4 days before cleanup day .....
19:38:02 <nirik> yeah. People do run low on time, or have other reasons to disappear, but if they have a specific thing they are working on, they are more likely to need the access.
19:38:03 <goozbach> #idea add them when they start wanting to work, purge entire group in per-cycle housekeeping
19:38:19 <nirik> CodeBlock: well, we can always re-add. ;)
19:38:28 <nirik> or exclude people who we would just re-add
19:38:34 <StylusEater_work> CodeBlock: like 30 days?
19:38:37 <goozbach> do we have a "joined group on date" attribute?
19:38:38 <fenrus02> CodeBlock: 60d idle => purge?
19:39:22 * CodeBlock thinks 30. Too many people just up and disappear, and each of those accounts is just another point of access that doesn't need to be there
19:39:50 <nirik> The problem with 60day/90day is that they hit things like us being busy, or someone being in the middle of doing something important for a release, etc.
19:39:56 <fenrus02> CodeBlock, which ever works - but would that actually be trackable?
19:40:03 <nirik> it's not really.
19:40:08 <nirik> we can track who logs in via ssh.
19:40:16 <fenrus02> nirik, the idea with 60d is that it was long enough to pass over release issues
19:40:25 <abadger1999> The join date should be recorded in fas.  Activity date is harder to find.
19:41:06 <CodeBlock> ah
19:41:38 <nirik> I suppose release stuff has less impact on fi-apprentice as they have only ro access.
19:41:58 <nirik> but for other groups we could mess up and remove someone who is actually active and mess up release items. ;(
19:41:59 <CodeBlock> yeah
19:42:32 <CodeBlock> nirik: I thought we were only talking about apprentice - the other groups..yeah those are more difficult
19:42:47 <nirik> they could be different/separate I guess.
19:43:21 <StylusEater_work> I thought we were talking fi-apprentice too.
19:43:31 <nirik> depending on how much we want to do here, how about:
19:43:57 <smooge> It took me 2 weeks to figure out who to purge last time and we ended up with a lot of "hey wait I was working on this here"
19:44:30 <nirik> on the 1st on each month, we mail fi-apprentice-members and ask them to reply back with a 'hey, I am still alive and interested, just busy, or the like'. Anyone who doesn't reply by the 7th is removed and can be re-added later.
19:44:35 <fenrus02> smooge, what logs did you look at to find idleness?
19:44:51 <nirik> also that might help us get folks interested, remind them, and find them something to work on.
19:44:54 <smooge> the bonus of how our authentication is set up is that we can get onto systems without having ldap/kerb keeping track of when it occurred. The minus is that getting info like this is hard
19:45:41 <smooge> wtmp, /var/log/secure and some stuff out of fas
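[Editor's note: pulling "last accepted login per user" out of /var/log/secure, as smooge describes, is scriptable. A rough sketch under stated assumptions — the sshd log line format is the common one, but the exact paths, year handling, and sample data are made up:]

```python
# Sketch: find each user's most recent accepted ssh login from
# /var/log/secure-style lines. Format and sample data are assumptions.
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ sshd\[\d+\]: "
    r"Accepted \S+ for (?P<user>\S+) from"
)

def last_logins(lines, year=2011):
    """Return {user: datetime of that user's most recent accepted login}."""
    seen = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # syslog timestamps lack a year, so one must be supplied.
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        user = m.group("user")
        if user not in seen or ts > seen[user]:
            seen[user] = ts
    return seen

# Hypothetical log excerpt:
sample = [
    "May 10 09:12:01 app01 sshd[123]: Accepted publickey for alice from 10.0.0.5 port 22 ssh2",
    "May 11 14:30:44 app01 sshd[456]: Accepted password for bob from 10.0.0.6 port 22 ssh2",
    "May 12 08:00:00 app01 sshd[789]: Accepted publickey for alice from 10.0.0.5 port 22 ssh2",
]
print(last_logins(sample))
```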
19:45:56 <CodeBlock> nirik: works for me. Would be nicer if we could automate that somehow
19:46:14 <fenrus02> smooge, the folks you missed - where was their access logged at?
19:46:22 <goozbach> gotta check out, sorry
19:46:26 <goozbach> bit of a busy dayjob just now
19:46:27 <CodeBlock> at least the email/check for reply part. Then we just magically have a list of people who didn't reply
19:46:35 <StylusEater_work> nirik and CodeBlock: that does sound like a good idea ... automation would be cool too
19:46:39 <nirik> CodeBlock: yeah, although it's nice to have the ability to reply back... 'oh, I can help you find something to work on, lets meet up and talk'
19:46:53 <nirik> how about we try it for next month and see how it goes?
19:47:20 <ranjibd> do we have process accounting enabled ? then it will be easy
19:47:58 <CodeBlock> nirik: true that's nice, but I just have a feeling it's going to become a hassle going through the list, matching emails to fas accounts, etc, etc
19:48:17 <smooge> fenrus02, either in systems that I did not check the first time, or were logged in parts of the system I didn't check.
19:48:24 <CodeBlock> getting the list of people to email, and their email addresses, etc. I think it might get old quick :/
19:48:46 <fenrus02> smooge, erg.  so no centralized audit exists?  prelude or whatnot?
19:48:52 <CodeBlock> nirik: we can try it for a month but don't be surprised if I hack up a way to automate it, and see what you/others think of it. :P
19:48:53 <nirik> well, hopefully fi-apprentice will just contain the group of people seeking to join other groups and start doing things... once they do or once they don't have time and wander off they wouldn't be in there any more. So, it should be a small pool most of the time I would think.
19:49:14 <nirik> CodeBlock: sure. I will try and see.
19:49:31 <nirik> fenrus02: we tried prelude a while back, but it was not ready for prime time. ;)
19:49:32 <smooge> fenrus02, no. prelude/wikka never got integrated into our systems
19:49:58 <CodeBlock> nirik: The list will be small, but it's going to be a manual process every month...and what if we forget, etc
19:49:59 <nirik> anyhow, anything else on this? or shall we move on?
19:50:18 <nirik> CodeBlock: yeah. Automation is good, just not sure how it would work.
19:50:19 <fenrus02> nirik, oic.  hm.  sadly i do not know of anything else like it
19:50:28 <CodeBlock> nirik: I'll think on it this week
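[Editor's note: the "mail on the 1st, remove non-repliers on the 7th" process nirik proposed is easy to sketch. FAS lookups and actual mail delivery are stubbed out here; the group membership, addresses, and sender are all hypothetical:]

```python
# Sketch of the monthly fi-apprentice check-in CodeBlock wants to
# automate. FAS integration and mail sending are not implemented;
# all names and addresses are made up.
from email.message import EmailMessage

def ping_message(user, email):
    """Build the 1st-of-the-month 'still interested?' mail for one member."""
    msg = EmailMessage()
    msg["To"] = email
    msg["From"] = "admin@example.org"  # hypothetical sender
    msg["Subject"] = "fi-apprentice check-in: still interested?"
    msg.set_content(
        f"Hi {user},\n\nReply by the 7th if you are still active in "
        "fi-apprentice; non-repliers will be removed (and can re-join later)."
    )
    return msg

def members_to_remove(members, repliers, recently_added=()):
    """On the 7th: everyone who didn't reply, minus very recent joins
    (addressing CodeBlock's 'added 4 days before cleanup' concern)."""
    return sorted(set(members) - set(repliers) - set(recently_added))

members = ["alice", "bob", "carol", "dave"]
repliers = ["alice", "dave"]
recently_added = ["carol"]  # joined too recently to purge this round
print(members_to_remove(members, repliers, recently_added))  # ['bob']
```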
19:50:47 * nirik looks at the agenda. We are running low on time.
19:51:02 <nirik> #topic Upcoming Tasks/Items
19:51:19 <nirik> Currently we are in final freeze until 2011-05-25.
19:51:36 <nirik> So, we should be using this time to plan and think and test stuff. ;)
19:51:46 <nirik> after freeze is up we have:
19:51:50 <nirik> noc01 upgrade
19:52:07 <nirik> xen14 move (move vms on it off and shut it down)
19:52:20 <nirik> The secondary01->02 move
19:52:46 <CodeBlock> Do we have any dates for these projects yet? At least tentative?
19:52:49 <smooge> nameservers going to 4.
19:53:03 <smooge> building recursive dns servers for our boxes to use
19:53:13 <nirik> CodeBlock: when would you like noc01? 27th?
19:53:31 <nirik> we also will have https://fedoraproject.org/wiki/Infrastructure_post_release_housekeeping after a few weeks.
19:53:36 <CodeBlock> nirik: I was thinking the 30th, give everyone the weekend off to get caught up from release
19:53:41 <nirik> CodeBlock: sounds good.
19:53:54 <nirik> In related news, I am going to try and come up with some ical feeds for scheduled items.
19:53:58 <nirik> more news on that on list.
19:54:03 <CodeBlock> +1
19:54:10 <fenrus02> ical feed would be absolutely awesome.
19:54:29 <nirik> one outage one and one more general one.
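[Editor's note: an outage feed like the one nirik describes can be emitted with nothing but the standard library. A minimal sketch — the calendar name, UIDs, and event data are made-up examples, not FI's real schedule format:]

```python
# Minimal iCalendar (RFC 5545) feed generator, stdlib only.
# Calendar name, UID domain, and events are hypothetical.
from datetime import datetime

def ical_feed(name, events):
    """events: list of (summary, start, end) tuples with UTC datetimes."""
    fmt = "%Y%m%dT%H%M%SZ"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//infra-schedule//EN",
        f"X-WR-CALNAME:{name}",
    ]
    for i, (summary, start, end) in enumerate(events):
        lines += [
            "BEGIN:VEVENT",
            f"UID:{i}@example.org",
            f"DTSTART:{start.strftime(fmt)}",
            f"DTEND:{end.strftime(fmt)}",
            f"SUMMARY:{summary}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # RFC 5545 mandates CRLF line endings.
    return "\r\n".join(lines) + "\r\n"

# Example using the noc01 date discussed above (time of day is a guess):
feed = ical_feed("Infrastructure outages", [
    ("noc01 upgrade", datetime(2011, 5, 30, 18), datetime(2011, 5, 30, 22)),
])
print(feed)
```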
19:54:46 <CodeBlock> yeah. And I found a way to import it into emacs org-mode, so I'm happy :D
19:54:56 <nirik> Any other upcoming items people want to mention?
19:55:00 <nirik> after freeze
19:55:29 <CodeBlock> I have tons of... questions/thoughts/ideas/things to talk about fedorahosted stuff, but will bring that up next week when we have more time
19:55:43 <nirik> ok.
19:55:50 <nirik> #topic new folks
19:56:07 <nirik> So, any new folks want to say hi? (Now that we have bored you to death with the entire meeting)
19:56:12 <Gomex> Hi
19:56:23 <Gomex> I am new here, it is my first meeting
19:56:42 <fenrus02> hi, not really new, but have bacon to hand out.
19:56:49 * CodeBlock thinks this meeting was actually a good one
19:57:01 <Gomex> I was accepted into sysadmin-noc a little while ago
19:57:21 <Gomex> I missed the last meeting because of my job
19:57:22 <nirik> welcome again Gomex. Thanks for your work on the old monitoring tickets.
19:57:41 <Gomex> nirik, I am waiting for the freeze to be over to continue my work
19:57:41 <nirik> welcome fenrus02. :)
19:58:07 <Gomex> nirik, I hope to help another team too
19:58:15 <nirik> excellent.
19:58:32 <Gomex> nirik, if you have some old/new ticket that you want help with, please let me know
19:58:46 <Gomex> nirik, some ticket I can take a deep look at and find the problem
19:58:52 <nirik> as always, folks should hang out in #fedora-admin and/or #fedora-noc and speak up when they see things of interest to them, ask questions, look over tickets... let us know when you want to work on things.
19:58:56 * StylusEater_work is willing to help folks learn python
19:59:08 <nirik> Gomex: will look. ;)
19:59:27 <Gomex> StylusEater_work, I know something about python
19:59:30 <CodeBlock> #topic quick open floor?
19:59:33 <CodeBlock> 30 seconds GO!
19:59:56 <CodeBlock> 5
20:00:01 <StylusEater_work> I made a bunch of changes to the Wiki. Please let me know if there is something crazy you don't like.
20:00:01 <CodeBlock> #endmeeting