infrastructure
LOGS
19:00:01 <nirik> #startmeeting Infrastructure (2013-06-20)
19:00:01 <zodbot> Meeting started Thu Jun 20 19:00:01 2013 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:01 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:01 <nirik> #meetingname infrastructure
19:00:01 <zodbot> The meeting name has been set to 'infrastructure'
19:00:02 <nirik> #topic welcome y'all
19:00:02 <nirik> #chair smooge skvidal CodeBlock ricky nirik abadger1999 lmacken dgilmore mdomsch threebean
19:00:02 <zodbot> Current chairs: CodeBlock abadger1999 dgilmore lmacken mdomsch nirik ricky skvidal smooge threebean
19:00:24 <nirik> who all is around for a nice meeting?
19:00:25 * relrod here
19:00:26 * threebean is here
19:00:31 * LoKoMurdoK here
19:00:35 <oddshocks> Holla!
19:00:50 * skvidal is here
19:01:07 * lmacken 
19:01:31 <smooge> ere
19:02:12 <nirik> ok, lets go ahead and dive in...
19:02:15 <nirik> #topic New folks introductions and Apprentice tasks
19:02:25 * mdomsch 
19:02:25 <nirik> any new folks want to say hi, or apprentices with questions or comments?
19:02:36 * abadger1999 here
19:03:24 * nirik waits a minute.
19:03:43 <nirik> #topic Applications status / discussion
19:03:49 <nirik> ok, any applications news?
19:04:00 <mdomsch> MM upgraded!
19:04:04 <nirik> #info mirrormanager 1.4 is humming along in production now. :)
19:04:10 <nirik> thanks for working so much on that mdomsch
19:04:17 <oddshocks> Tahrir, badges, etc all humming along. Threebean and I are slaying some code
19:04:20 <threebean> mdomsch: nice blog post on it too :)
19:04:43 <threebean> yeah, and the badges repo can be found here if anyone wants to poke through it http://infrastructure.fedoraproject.org/infra/badges/rules/
19:04:44 <mdomsch> glad to have finally had the time.  My apologies for the crappy overnight pages and debugging sessions all weekend long!
19:04:44 <nirik> oddshocks: excellent. :)
19:04:58 <nirik> #info badges work is moving rapidly ahead
19:05:08 * pingou late
19:05:18 <nirik> oddshocks / threebean: any milestones coming up or things others could help with?
19:05:26 <pingou> oh I got something app related (after badges)
19:05:27 <nirik> mdomsch: no worries. It was worth it to get it all sorted out. :)
19:05:36 <threebean> oh, a question on badge-land.  in the next few weeks, can we set up the production badge machines even while we're still in freeze?
19:06:04 <smooge> mdomsch, we need to do that every now and then to remind us why we don't like to have them
19:06:14 <nirik> we could, just would need a freeze break explaining exactly what is being done.
19:06:30 <skvidal> what does the badges production setup look like?
19:06:35 <threebean> cool. we're not there yet, but when we do get there I'll write it up
19:06:44 <skvidal> are we doing a separate db server for it or relying on db01,etc?
19:07:05 <threebean> skvidal: two "webapp" nodes and one "backend" node.  db usage should be pretty light.. I was anticipating adding it to db01.
19:07:23 * nirik really hopes to work on our db 'story' at flock some
19:07:38 <skvidal> nirik: +1
19:07:48 <threebean> yeah, that story right now is "how many eggs do you think we can fit in this basket?"
19:07:54 <nirik> yeah. :(
19:08:14 * relrod has an idea for a web app that I ran by threebean and puiterwijk a while back, but haven't asked anyone else yet because I haven't had time to work on it - but could maybe get some thoughts:
19:08:16 <nirik> we have some mitigation of a db server dying, but it could be/should be much better.
19:08:55 * nirik waits for relrod.
19:09:01 <oddshocks> As far as badges milestones, we're going to be seeing major front-end changes now that we've got the backend all groovy, and there are plenty of issues open peeps can help with on the github repo, just ping me or threebean about em if ya want :)
19:09:26 <relrod> Basically it would be neat/handy for some of our apps (e.g. Fedora Packages and Fedora Mobile) if there was a way for package maintainers to upload screenshots of their packages, and we could show them on package info pages. This could also play into if we ever move to an app-store style system in Fedora, it'd be nice if you could see screenshots before you install an app.
19:09:51 <pingou> relrod: you should talk with hugshie about that
19:10:10 <nirik> #info badges has a number of open issues, see oddshocks or threebean if you want to help out implementing.
19:10:29 <nirik> yeah, we probably need to discuss app store stuff at flock?
19:10:37 <pingou> http://ambre.pingoured.fr/thisweekinfedora/
19:10:46 <nirik> I could see screenshots and/or screencasts being nice to have for that
19:10:47 <relrod> pingou: will do. It's pretty low priority on my list, but just a thought I had. I don't have time to do much with it yet, but if it's something that not everybody hates then I'll make a repo and at least hack on it once in a while ;)
19:11:38 <threebean> relrod: same story with me and fedmsg-notifications :)
19:11:54 <nirik> The app store story is still unclear to me. I know hugshie was working on it again, but we need to determine what of that falls on us to implement and if it makes sense to do things that way.
19:12:04 * threebean nods
19:12:24 <pingou> nirik: well afaiu fedora-tagger already lifts a good part of the burden
19:12:27 * skvidal recuses himself from this discussion
19:12:32 <pingou> only icons and screenshots are missing
19:12:43 <nirik> pingou: ok.
19:12:49 <threebean> he unfortunately never responded to my pings after we launched the rewrite of fedora-tagger.  that was supposed to enable app store stuff integration but it's not being taken advantage of as far as I can tell.
19:12:50 <nirik> any other application type news?
19:13:12 <pingou> http://ambre.pingoured.fr/thisweekinfedora/ <- results of tuesday work
19:13:43 <nirik> pingou: very nice!
19:13:48 <pingou> tatica said she was interested in theming it and I hope to plug it onto the planet one of these days
19:14:20 <nirik> sounds great. ;)
19:14:28 <nirik> thats all fedmsg based right?
19:14:43 <pingou> datagrepper actually
19:15:30 <nirik> yeah, ok.
19:15:45 <nirik> #info http://ambre.pingoured.fr/thisweekinfedora/ hopefully themed and added to planet soon
19:16:09 <nirik> any other app news? or shall we move on?
19:16:53 <pingou> I have to put some pressure on abadger1999 to release python-fedora with the flask openid plugin by the end of freeze :)
19:17:02 <abadger1999> <nod>
19:17:05 <abadger1999> It will be done :)
19:17:17 <abadger1999> and hopefully not buggy as hell ;-)
19:17:18 <nirik> is that needed for fedocal openid?
19:17:19 <nirik> or ?
19:17:35 <abadger1999> yeah -- fedocal openid should work with the beta package + hotfix in infra
19:17:38 <pingou> nirik: yes, that's the last change I have on the todo before releasing 0.2.0
19:17:51 <abadger1999> but it won't work with just the packages in the fedora/epel repos
19:18:12 <nirik> ok.
19:18:16 <skvidal> question on flask and openid in general
19:18:32 <skvidal> is there a status on the groups/teams work
19:18:46 <pingou> the group extension?
19:18:52 <skvidal> so we can do group-limited openid logins to websites via flask or mod_auth_openid
19:18:53 <skvidal> pingou: yes
19:19:10 <nirik> yeah, I know puiterwijk was waiting for some package for that, but I don't know current status
19:19:11 <skvidal> so I can say 'allow openid from fedora who are in this group'
19:19:21 <nirik> we also want it for the trac openid on hosted.
19:19:22 <skvidal> thx - in the long term I think it will really help us
19:19:27 <skvidal> nirik: yes we do
19:19:40 <pingou> skvidal: afaik for the moment the app is responsible for that
19:19:42 <nirik> I'd love to kick mod_auth_pgsql to the curb. ;)
19:19:54 <skvidal> pingou: right - but I need some way of requesting which groups a user is in
19:20:00 <skvidal> nirik: it's just nagios, right?
19:20:00 <nirik> python-openid-teams I think is the package?
19:20:16 <nirik> nagios and logs I think? or were we going to leave logs alone?
19:20:17 <pingou> nirik: I think so yes
19:20:22 <skvidal> nirik: I'd like to move logs
19:20:28 <skvidal> but logs is not mod_auth_pgsql
19:20:30 <skvidal> afaik
19:20:33 <skvidal> it's that local htpasswd thing
19:20:37 <nirik> it would mean that if openid is down we couldn't get to logs, but meh
19:20:41 <nirik> yeah
19:20:43 <skvidal> nirik: yah we could
19:20:51 <skvidal> nirik: we could either do a fallback or just ssh into the frelling box
19:21:00 <nirik> well, we couldn't get to html versions of the epylogs via a remote browser. ;)
19:21:07 <nirik> so, no big deal.
19:21:14 <nirik> since we would be likely fixing openid anyhow then. ;)
19:21:17 <skvidal> nirik: scp log02:/path/to/file.html . ; firefox file.html :)
19:21:25 <nirik> yep. ;)
19:21:27 <nirik> anyhow...
19:21:28 <skvidal> I'm okay with that
19:21:31 <nirik> #topic Sysadmin status / discussion
19:21:38 <nirik> lets see... sysadmin items...
19:21:43 <abadger1999> skvidal: that's in the extension.
19:21:47 <skvidal> abadger1999: excellent
19:21:51 <skvidal> I went to ansiblefest last week
19:21:54 <nirik> we have a number of things that are going to be pending after freeze.
19:22:03 <skvidal> and mainly heard about how other people are doing things in their infrastructures
19:22:07 <nirik> skvidal: you should do a trip report thingie/blog post/etc. ;)
19:22:10 <skvidal> ah
19:22:16 * skvidal goes back to lurking
19:22:17 <skvidal> :)
19:22:42 <nirik> so it sounded like we aren't doing anything _too_ crazy from others, right?
19:22:51 <skvidal> nirik: not too bad
19:23:03 <skvidal> I will note that a lot of folks are using it to merge between sets of tools
19:23:13 <skvidal> lots of 'use ansible to orchestrate other systems'
19:23:21 <skvidal> 'use ansible to manage merging multiple inventory sources'
19:23:33 <skvidal> 'use ansible to make our devs shut up and break their own systems'
19:23:42 <skvidal> and then by far
19:23:47 <skvidal> some really great new modules
19:23:50 <skvidal> that folks have written
19:23:58 <skvidal> mysql_replication was by far the most exciting imo
19:24:09 <skvidal> they're able to flap their master back and forth
19:24:14 <skvidal> orchestrated in a playbook
19:24:21 <nirik> very nice.
19:24:24 <skvidal> and it is kinda seriously neat
19:24:29 * nirik wishes for a similar postgresql one. ;)
19:24:29 <skvidal> yah - I'd like to be there
19:24:34 <skvidal> on both of our db servers
19:24:48 <skvidal> okay I'll be quiet now :)
19:25:06 <pingou> and start writing the email ;)
19:25:21 <nirik> #info ansible migration continues: new machines are in ansible, and we are slowly porting things over to it.
19:25:27 <skvidal> one thing I'll bring up
19:25:34 <nirik> skvidal: so what are our next steps on ansible migration?
19:25:47 <skvidal> so
19:25:54 <skvidal> in order to allocate access to folks
19:26:02 <skvidal> 2 or 3 folks at the conference who spoke are using jenkins
19:26:10 <skvidal> to allocate access to non-root users to run ansible
19:26:19 <skvidal> so they setup jenkins to access the right keys
19:26:28 <skvidal> and then give the users in jenkins access to run certain scripts
19:26:41 <skvidal> then the user sees the whole output and jenkins logs it, etc
19:26:48 <skvidal> so if they break it - they can fix it, too :)
19:27:03 <pingou> neat
19:27:09 <skvidal> I have to admit I like that - but running jenkins as part of the admin process feels...... weird to me
19:27:15 <nirik> it's a nice idea (although I think jenkins is not a solution for us, but perhaps one of the other things like it would be)
19:27:26 <skvidal> right
19:27:31 <skvidal> so my wonder is this
19:27:48 <skvidal> if we did RBAC via a script + your user/groups + sudo on lockbox
19:27:56 <skvidal> that let you see the output AND it captured the output
19:28:00 <skvidal> 1. is that ridiculous?
19:28:13 <skvidal> 2. is there something stupid about this I'm not thinking about?
19:28:35 <skvidal> http://paste.fedoraproject.org/19948/71747790/ <-- that's the basics
19:29:03 <nirik> I think that would work... but in addition I think we still would need a trigger based thing... so even if you couldn't run the playbook then, you could commit and let it trigger off.
19:29:10 <skvidal> nirik: triggers STILL have to happen
19:29:11 <skvidal> yes
19:29:14 <skvidal> nirik: no dispute
19:29:23 <nirik> yep. In agreement here then. :)
19:29:35 <skvidal> about triggers then
19:29:39 <skvidal> I was looking at our commits
19:30:05 <skvidal> and so many of them would either actually be 'apply-global' or they are the kind that make it almost impossible to know what systems would definitely be impacted - which means 'apply-global'
19:30:09 <skvidal> so in the apply-global case
19:30:27 <skvidal> how often do we check for that
19:30:40 <skvidal> since clearly it cannot be 'apply-global immediately following any check in that could impact it'
19:30:48 <skvidal> that would just be a recipe for people making bad commit decisions
19:30:55 <nirik> so, that would run all playbooks on all affected hosts in those playbooks?
19:31:08 <nirik> yeah
19:31:24 <skvidal> apply-global means 'we cannot tell which hosts are affected so we have to assume all'
19:31:54 <nirik> yeah.
19:32:03 <skvidal> I guess I was thinking
19:32:14 * nirik needs to ponder on this a bit more I think... I think we can come up with something...
19:32:16 <skvidal> in the absence of any commits we should not run
19:32:29 <skvidal> that seems fairly obvious, right?
19:32:33 <nirik> yeah, that sounds reasonable.
19:32:42 <nirik> or if we did, it would be a nightly job or something.
19:32:56 <nirik> and it should SCREAM that something changed.
19:33:18 <skvidal> nirik: sfromm's ansible-report can produce email reports - which are kinda nice
19:33:46 <skvidal> I talked to him a bit last week and earlier this week - there's some.... tricks with it - but I think they can be worked around
19:33:52 <skvidal> okay so
19:33:57 <skvidal> let's say that a commit comes in
19:34:00 <nirik> we could put some of the burden on the committer... to indicate what host(s) or groups to run on... but thats not ideal I guess.
19:34:09 <skvidal> we look up which hosts it applies to
19:34:15 <skvidal> we write that out to a file
19:34:22 <skvidal> which a cron job checks for.... how often?
19:34:33 <nirik> 30min?
19:34:33 <skvidal> every hour? every half-hour? every 15 minutes?
19:34:54 <skvidal> so every 30 m the job will open up the file, look for which playbooks/hosts to run against and run those
19:34:58 <skvidal> if it exists
19:35:27 * nirik notes we can adjust down the road if it seems like the time isn't right...
19:35:40 <nirik> IMHO if it's urgent you want to run the playbook yourself or find someone to.
19:36:06 <nirik> if it's just a cleanup/whatever you want it to run and do those things before you forget about it, but with enough time to realize if you made a mistake/typo (or for other people to)
19:36:23 <skvidal> okay
19:36:52 <skvidal> so even more than before - this means if your playbook is NOT idempotent - you need to make it be so
19:36:53 <skvidal> :)
19:37:05 <nirik> yes, I think we should strive to make them all so.
19:37:34 <nirik> that would help for a nightly run too... then if something changed, we know someone futzed with things.
19:38:16 <skvidal> okay
19:38:28 <nirik> #info working out how to let folks run specific playbook runs themselves.
19:38:39 <skvidal> so one more thing on the triggered runs
19:38:46 <nirik> #info working out how triggering playbook runs should work.
19:39:00 <skvidal> here's what i've been basing from
19:39:05 <skvidal> for determining which hosts
19:39:18 <skvidal> if the playbook is in playbooks/[hosts,groups]
19:39:21 <skvidal> then it is obvious and simple
19:39:30 <nirik> yeah, --list-hosts will show it.
19:39:41 <nirik> (for that easy case with a playbook)
19:39:48 <skvidal> if it is a task - then look up which playbooks that task shows up and run those playbooks
19:39:59 <skvidal> if the file is in $files then apply-global
19:40:15 <skvidal> I can probably make the files thing a BIT more specific - but it's kind of a crapshoot :)
19:40:26 <skvidal> if the modification is in groupvars/hostvars - obvious rules apply
19:40:45 <nirik> roles would be like tasks? (ie, look up which it's included in)?
19:40:51 <skvidal> yah - that's the idea
19:41:25 <skvidal> so it's any mod/add/rm
19:41:33 <skvidal> does that make no sense to anyone?
19:41:54 <nirik> makes sense to me. I think there may be corner cases, but we can deal with them.
19:42:06 <skvidal> the paths i've not looked at are the plugins and library, handlers, etc
19:42:29 <nirik> we change those so rarely, just apply-global as a first cut
19:42:33 <nirik> IMHO
19:42:34 <skvidal> and finally people making 'inventory' changes makes things.. tricky
19:42:56 <skvidal> since it is impossible to know if the inventory change is substantive :)
19:43:00 <nirik> yeah.
19:43:10 <skvidal> okay that's all I have
19:43:12 <smooge> I am working a couple of tickets. Will be installing another PPC once it is racked and stacked and I get a kernel fix so I can run KVM sessions again
19:43:21 <nirik> and cases like if you remove a playbook, do nothing, etc.
19:43:21 <skvidal> if anyone is interested in any of this stuff - please let me know
19:43:39 <nirik> #info ansible help welcome. See skvidal. :)
19:43:48 <nirik> smooge: cool.
19:44:01 <smooge> KVM on PPC is not possible yet
19:44:01 <skvidal> nirik: removing a playbook - yes - would be doing nothing - but.... that's a little different ;)
19:44:06 <smooge> so it will be LPAR
19:44:12 <nirik> smooge: :( bummer.
19:44:29 <nirik> so other pending sysadmin items:
19:44:42 <nirik> #info new bladecenter is all arrived, trying to get network to it.
19:45:00 <nirik> #info will be adding mem and replacing 2 bvirthosts sometime this quarter
19:45:16 <nirik> #info phx2 on-site visit time probably in mid/late july.
19:46:04 <nirik> #info working on new rdiff-backup setup for backups, probably after freeze
19:46:25 <nirik> Anything else on the sysadmin side of things?
19:46:46 * skvidal has nothing else
19:46:50 <nirik> #topic Private Cloud status update / discussion
19:46:57 <nirik> not much cloud news recently.
19:47:05 <skvidal> well - sorta
19:47:10 <skvidal> openstack renamed more random crap
19:47:12 <nirik> I have some f19-tc5 images loaded in, need to update them to tc6. ;)
19:47:16 <nirik> yeah, boo.
19:47:30 <skvidal> nirik: hopefully we can get 03 moved to an external port
19:47:35 <skvidal> nirik: and we can do multihost grizzly setup
19:47:49 <nirik> #info moving networking on a spare node so we can do a multihost grizzly setup
19:48:03 <nirik> #info f19 test compose images available for testing/tweaking.
19:48:21 <nirik> #topic Upcoming Tasks/Items
19:48:21 <nirik> https://apps.fedoraproject.org/calendar/list/infrastructure/
19:48:33 <nirik> any upcoming tasks/items folks would like to schedule or note?
19:49:29 * nirik is crossing fingers for an on time release, but you know how it is... ;)
19:49:34 <nirik> #topic Open Floor
19:49:52 <nirik> anyone have items for open floor?
19:49:57 <nirik> questions? comments? suggestions?
19:50:59 <kc4zvw_> nirik: I have access issues to work after meeting
19:51:00 <skvidal> where is everyone today?
19:51:00 * nirik will close out the meeting in a minute if not
19:51:17 <smooge> ok
19:51:22 <nirik> kc4zvw_: ok. Can try and assist over in #fedora-admin
19:51:32 <nirik> skvidal: not sure.
19:52:42 <nirik> Thanks for coming everyone!
19:52:44 <nirik> #endmeeting
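
Editor's note: the commit-to-hosts rules skvidal outlined in the sysadmin topic (playbooks under playbooks/hosts or playbooks/groups run themselves, tasks and roles run whichever playbooks include them, per-host and per-group variables limit the run to that host or group, and everything else falls back to apply-global) could be sketched roughly as below. This is only an illustration and not the script that was discussed: the directory names, the `includes` lookup table, the "limit:" prefix and both function names are hypothetical, and treating inventory changes as apply-global is an assumption, since the meeting left that case open.

```python
from pathlib import PurePosixPath

APPLY_GLOBAL = "apply-global"  # "cannot tell which hosts are affected, assume all"


def classify_change(path, includes, removed=False):
    """Return the set of run targets for one changed repository path.

    `includes` is a hypothetical lookup (task file or role directory ->
    playbooks that include it) that the caller builds beforehand.
    """
    parts = PurePosixPath(path).parts
    if not parts:
        return set()
    top = parts[0]

    # Playbooks under playbooks/hosts or playbooks/groups: the obvious case.
    if top == "playbooks" and len(parts) > 2 and parts[1] in ("hosts", "groups"):
        # A removed playbook cannot be run, so it triggers nothing.
        return set() if removed else {path}

    # Tasks and roles: run every playbook known to include them.
    if top in ("tasks", "roles"):
        key = path if top == "tasks" else "/".join(parts[:2])
        found = includes.get(key)
        # Assumption: an unknown task or role falls back to apply-global.
        return set(found) if found else {APPLY_GLOBAL}

    # Per-host or per-group variables: limit the run to that host or group.
    if top in ("host_vars", "group_vars") and len(parts) > 1:
        return {"limit:" + parts[1]}

    # files, plugins, library, handlers, inventory: first cut is apply-global.
    return {APPLY_GLOBAL}


def plan_run(changes, includes):
    """Merge targets for (path, removed) pairs; apply-global swallows the rest."""
    targets = set()
    for path, removed in changes:
        targets |= classify_change(path, includes, removed)
    return {APPLY_GLOBAL} if APPLY_GLOBAL in targets else targets
```

A periodic job like the 30-minute cron check mentioned above would then only need to read the merged plan and run whatever it lists, doing nothing when no commits have come in.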