infrastructure
LOGS
18:00:00 <nirik> #startmeeting Infrastructure (2015-03-19)
18:00:00 <zodbot> Meeting started Thu Mar 19 18:00:00 2015 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:00 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:01 <nirik> #meetingname infrastructure
18:00:01 <nirik> #topic aloha
18:00:01 <nirik> #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk
18:00:01 <nirik> #topic New folks introductions / Apprentice feedback
18:00:01 <zodbot> The meeting name has been set to 'infrastructure'
18:00:01 <zodbot> Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pingou puiterwijk relrod smooge threebean
18:00:26 * pingou here
18:00:30 * puiterwijk here
18:00:37 * oddshocks arrives
18:00:38 <dgilmore> hola
18:00:41 <andreasch> here
18:01:11 <tflink> here
18:01:12 <nirik> any new folks like to give a one-line introduction? or apprentices with questions or comments?
18:01:32 <nirik> also any last minute changes to the Gobby document would be good to do now. ;)
18:02:09 * threebean here
18:02:09 <dcmorton> Hey all, I'm Derek.. Linux Engineer at Rackspace, looking to help out with things around here..
18:02:22 <smooge> hello
18:02:27 <smooge> hello dcmorton
18:02:37 <nirik> welcome dcmorton. :) you interested in sysadmin type things? or app development? or both?
18:03:11 * roshi is here
18:03:21 * mirek-hm is here
18:03:25 * dgilmore installs gobby
18:03:29 <dcmorton> Mostly sysadmin stuff.. do a little coding here and there when needed
18:03:53 <nirik> dcmorton: cool. see me after the meeting over in #fedora-admin and I can get you started. ;)
18:04:08 <dcmorton> nirik: thanks, sounds great
18:04:24 <nirik> #topic announcements and information
18:04:39 <nirik> I'm going to dump this weeks info out of the gobby doc... stand by
18:04:52 <nirik> #info datanommer performance enhancements complete in production, resulting in significant speed gains. - ralph
18:04:52 <nirik> #info new zodbot feature deployed:  karma+fedmsg plugin - ralph
18:04:52 <nirik> #info new zodbot feature deployed:  more granular meetbot fedmsg messages - ralph
18:04:52 <nirik> #info converted another mirrorlist server to mirrormanager2 - kevin
18:04:52 <nirik> #info new OS instance is ready, we are now migrating persitent_cloud.yml and then we will deploy dev instance of Copr and test on real world data, then we do final reprovision - msuchy
18:04:53 * oddshocks plugs ears
18:04:56 <nirik> #info new mirrormanager2 release - pingou
18:04:58 <nirik> #info new pkgdb2 release (1.24), pushed to staging - testers welcome - pingou
18:05:00 <nirik> #info progit nicely progressing: http://209.132.184.222/progit/issues?tags=0.1 - pingou
18:05:02 <nirik> input welcome for the new name: http://209.132.184.222/progit/issue/47
18:05:04 <nirik> #info fixed packaging and other small issues for new rbac-playbook (ansible_utils) - tflink
18:05:06 <nirik> #info puppet changes have been pushed, configuration has been migrated. new script will be usable shortly - tflink
18:05:09 <nirik> #info starting to redeploy taskotron on f21 (dev complete), planning to revisit el7 migration later - tflink
18:05:12 <nirik> #info waiting for hardware to arrive in PHX2 - smooge
18:05:14 <nirik> #info Ipsilon put in staging (again), please test - puiterwijk
18:05:16 <nirik> lets all take a min to skim and see if there's anything we want more details on.
18:06:47 <nirik> ok, moving on then
18:06:57 <nirik> #topic Update Fedora Infrastructure Apprentice wiki with Ansible ticket workflow - mhurron
18:06:57 <nirik> Specifically there is a http://infrastructure.fedoraproject.org/infra/docs/puppet.txt but not a complementary ansible.txt
18:07:00 <nirik> mhurron: you around?
18:07:03 <mhurron> I am
18:07:27 <nirik> so, we do have a README in ansible git root... but I don't think it's really the same here.
18:08:12 <mhurron> iirc it does cover some things, but the puppet document is more structured for new users to get working on things
18:08:31 <nirik> should this be in infra-docs? or just on the apprentice page do you think?
18:08:47 <mhurron> basically thought that it needed to be updated to include the same type of workflow for ansible
18:09:33 <nirik> FWIW, I wish we could improve our new contributor workflow... but it's hard. We don't have many cycles for one on one mentoring... and we are bad at filing easyfix tickets to allow people to work on things...
18:09:41 * nirik checks that puppet doc
18:09:45 <mhurron> I don't see why it couldn't be written in a clear but general manner to be useful for anyone to refer back to, so a link on the apprentice page to infra-docs might be a good way to handle it
18:10:20 <nirik> yeaah, I see... that has a lot of intro info in it...
18:10:33 <nirik> some of that could be just in a generic git one tho
18:10:56 <mhurron> ya, it's not like it's useless
18:11:00 <nirik> setting up git, your name, etc...
18:11:48 <nirik> ok, so how best can we move forward here? I can try and update it, but we know how well that ends up going. ;)
18:12:00 <nirik> or I can try and give you info to correct it?
18:12:31 <mhurron> I am happy to do it, we can start with a quick dump of information and I can work with that to begin with
18:12:35 <nirik> lots of the puppet one is the same for ansible, but some sections are puppet specific for sure.
18:12:45 <mhurron> and from there I can come back with specific questions
18:13:02 <threebean> would be much appreciated
18:13:20 <nirik> sure. want me to just dump things here? or meet up after the meeting?
18:13:39 <mhurron> we can do it after
18:13:52 <nirik> mhurron: ok. I should be around. ;)
18:13:53 <mhurron> I'll be around, or you can email me if that's easier
18:14:00 <nirik> and thanks for pushing forward on this stuff. ;)
18:14:20 <nirik> #action nirik to work with mhurron to create an ansible.txt for new folks
18:14:50 <nirik> ok, anything else on this? or move on?
18:15:07 <mhurron> i'm good to move on
18:15:08 <dotEast2015> hello everyone, did I miss the intro part
18:15:12 <nirik> like I said, I would like to try and improve the new contributor flow, but I think thats a higher level discussion for the list perhaps...
18:15:31 <nirik> dotEast2015: yeah. can you hang around for the open floor at the end to add yours?
18:15:42 <dotEast2015> sure
18:15:50 <nirik> #topic Shut off pkgdb2's emails? - pingou
18:15:54 <nirik> pingou: you added this?
18:15:57 <pingou> yes
18:16:09 <pingou> just asking if we should update PKGDB2_EMAIL_NOTIFICATION = True to False in prod
18:16:29 <nirik> pingou: one thing we might need before we can do that...
18:16:31 <puiterwijk> +1, we have FMN now
18:16:45 <nirik> the scm-commits user to post to scm-commits? I think those all go there too don't they?
18:17:05 <pingou> they do yes
18:17:09 <nirik> https://lists.fedoraproject.org/pipermail/scm-commits/Week-of-Mon-20150316/1541526.html
18:17:19 <pingou> threebean: iirc I read you were looking at this no?
18:17:30 <nirik> if we have that in place, +1
18:17:32 <threebean> yeah, I stepped back to fix problems with the fedmsg git hook first
18:17:37 <pingou> ok
18:18:00 <pingou> so once we've done it we can adjust the config in ansible (and restart apache)
18:18:01 <nirik> threebean: but for pkgdb it should be ok to make that user now?
18:18:05 * threebean nods
18:18:07 <nirik> or you want to wait for pkgs?
18:18:12 <pingou> maybe we can couple this with the move of 1.24 to prod?
18:18:13 <threebean> no, that can move ahead.  sure
18:18:46 <nirik> sounds good, we can get everything lined up out of meeting.
18:18:56 <pingou> cool
18:19:09 <nirik> #action pingou to make sure scm-commits user can post pkgdb changes to scm-commits list, then disable native emails.
18:19:27 <nirik> anything else on this?
18:19:32 <pingou> not for me
18:19:36 <nirik> #topic db performance/tuning - kevin
18:19:50 <nirik> I just added this... wanted to talk about that postgres change that threebean did for datanommer...
18:19:56 <threebean> oo
18:19:59 <nirik> do we want to do all our other dbs?
18:20:03 <smooge> YES
18:20:06 <pingou> threebean++ on that btw!
18:20:06 <nirik> and why is it not auto-doing them?
18:20:13 <smooge> We want threebean to do every database
18:20:17 <nirik> yeah! it was a big big win.
18:20:19 <pingou> lol
18:20:35 <threebean> heh, there's two halves to it.  one is really app-specific
18:20:53 <threebean> it involves going through the schema and looking for columns that would benefit from adding an index to it
18:21:06 <smooge> you removed all my SELECT * FROM * WITH * | system 'grep ...'?
18:21:06 <nirik> yeah, thats gonna have to be longer term I think...
18:21:08 <threebean> note that adding an index doesn't always make things better.  sometimes it makes things worse depending on how the column is used.
18:21:44 <threebean> the other more general half, we can look at doing everywhere... this is that our autovacuum-launcher daemon is never actually vacuuming our tables.
18:21:58 <threebean> I don't know why yet.. but there's a way to turn up the logging for it so we can see.
18:22:07 <nirik> ok, perhaps test on db01.stg ?
18:22:19 <nirik> and does this need an outage? or can we do it live?
18:22:28 <threebean> can do it live (the vacuuming stuff, no prob)
18:22:41 <threebean> adding indexes usually means downtime since it's an ALTER TABLE kind of thing
18:23:00 <nirik> yeah.
18:23:28 <nirik> ok, I can try and poke at it. would be great to make things faster before freeze.
18:23:34 <puiterwijk> threebean: doing a full vacuum will require downtime
18:23:40 <puiterwijk> since that gets a full lock on the tables
18:23:51 <threebean> puiterwijk: correct.  but i don't see why we'd need to do a full vacuum.
18:24:00 <puiterwijk> okay, sure. just wanted to point it out
18:24:05 * threebean nods
18:24:11 <puiterwijk> wasn't sure which you used
18:24:21 <threebean> just 'VACUUM ANALYZE;'
18:24:29 <puiterwijk> ah, okay. that should be no problem indeed.
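The index half of threebean's point can be tried on a small scale. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `messages` table, its columns, and the index name are hypothetical and are not datanommer's real schema. It prints the query plan for the same lookup before and after an index is added, then refreshes planner statistics with SQLite's `ANALYZE` (PostgreSQL's `VACUUM ANALYZE;` additionally reclaims dead rows, which this sketch does not model).

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Hypothetical table, loosely shaped like a message store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, topic TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages (topic, body) VALUES (?, ?)",
    [("topic-%d" % (i % 500), "payload") for i in range(50_000)],
)
conn.commit()

sql = "SELECT * FROM messages WHERE topic = ?"

# Without an index the only option is to scan every row.
plan_before = query_plan(conn, sql, ("topic-42",))

# Add an index on the filtered column and refresh planner statistics.
conn.execute("CREATE INDEX idx_messages_topic ON messages (topic)")
conn.execute("ANALYZE")
plan_after = query_plan(conn, sql, ("topic-42",))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing plans (and timings) before and after a change like this is one way to take the kind of measurement that makes the benefit, or the lack of one, visible per column.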
18:25:07 <threebean> if people didn't catch it, robyduck had a script that made tons of queries against datagrepper to try and determine which ambassadors are active and which are not
18:25:08 <nirik> excellent. ;)
18:25:23 <threebean> he reports that it used to take 2 days and 18 hours to run.  it only takes 13 minutes now.
18:25:34 <puiterwijk> wow. that's quite a bit better :)
18:26:04 <nirik> yeah, crazy.
18:26:10 * mirek-hm claps hands
18:26:31 <oddshocks> slightly improved, at least
18:26:48 <nirik> ok, anything else on this one? I just wanted to discuss it a bit and see if we can do it more. ;)
18:27:19 <nirik> #topic Learn about sigul (our rpm signing server) - kevin
18:27:30 <nirik> I thought I would talk today about our rpm signing server a bit.
18:27:48 <threebean> (real quick on the last point)  getting the auto-vacuuming working would be great.  if someone gets to it before I do, can you take some kind of measurement of the existing databases before and after so we can see what kind of difference it makes
18:28:09 <threebean> (eom)
18:28:12 <nirik> First a disclaimer: I haven't delved into the code, so I might be incorrect or mistaken about my understanding of things, so take everything here with a grain of salt... ;)
18:28:39 <nirik> Fedora uses an all-software rpm package signing server called sigul. It's at fedorahosted.org for the code:
18:28:57 <nirik> https://fedorahosted.org/sigul/
18:29:19 <nirik> it's not gotten a lot of maint over the last few years, but there's some proposals around Google Summer of Code to improve aspects of it.
18:29:29 <nirik> Thats thru the releng folks mentoring.
18:29:42 <nirik> Basically there's 3 machines involved.
18:29:59 <nirik> 1. An end user machine or one of the releng machines that runs a client.
18:30:15 <nirik> 2. A "bridge". This is sign-bridge01 in our infrastructure.
18:30:33 <nirik> 3. A "vault". This is sign-vault03 and sign-vault04 in our infrastructure.
18:30:39 <nirik> secondary arches have their own versions of these.
18:31:04 <nirik> The client talks to the bridge. The server talks to the bridge. The client and server don't talk directly, they always go thru the bridge.
18:31:48 <nirik> There's permissions for each key. Every user has their own passphrases for each key.
18:32:21 <mirek-hm> and the protocol is AFAIK proprietary and very very paranoid
18:32:22 <nirik> The server has its own passphrase (entered at startup). It needs this + the user's passphrase to unlock the key to sign something. Without both it can't.
18:32:41 <nirik> mirek-hm: well, it's 100% open source... ;) you can look at it, but it's very dense python...
18:32:55 <nirik> it's also pretty tied to koji.
18:33:10 <mirek-hm> I mean our own implementation, not normal TLS
18:33:27 <threebean> whoah.  really?
18:33:34 <nirik> A client asks to sign a package, the bridge checks that they have access to the key, that their passphrase is right, the vault signs it, and the bridge writes the signed thing to koji usually.
18:33:38 <nirik> no, it's all open source.
18:34:07 <nirik> https://git.fedorahosted.org/cgit/sigul.git/tree/
18:34:17 <nirik> we aren't running anything other than that. ;)
18:34:36 <threebean> https://git.fedorahosted.org/cgit/sigul.git/tree/src/double_tls.py
18:34:57 <nirik> It's got a number of issues... it doesn't rotate logs right, its error handling is poor (the vault can start with the wrong key, it won't work and it won't tell you the key was wrong)
18:35:04 <nirik> it needs gpg1.
18:35:24 <nirik> but it does work. ;)
18:35:45 <nirik> The vaults are physical servers that don't run sshd. So, they must be accessed via management console.
18:35:56 <nirik> The bridge is a vm.
18:36:40 * nirik tries to think of what else to add.
18:36:53 <nirik> Any questions on the setup or anything?
18:37:09 <nirik> oh, the bridge used to also need a passphrase at startup.
18:37:17 <mirek-hm> there is one disadvantage of sigul: you have to copy the whole RPM (or image) to the signing machine and then copy the data back, which is tons of data. In comparison, Copr uses obs-sign, where you create a digest from the rpm (the same way gpg does); this short digest is sent to the signer, and only the signature is sent back, which is then inserted into the rpm
18:37:45 <nirik> turns out that this didn't add any security really, so we removed it. Anyone in sysadmin-main should be able to restart it now (but there is a process so ask for that first)
18:38:05 <nirik> mirek-hm: yeah, all the data slinging is pretty slow sometimes.
18:38:20 <threebean> nirik: any chance of moving that bridge restart process into a playbook?
18:38:31 <nirik> especially when you get a package like texlive src.rpm or 0ad data or webkitgtk4-debuginfo
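The digest-based approach mirek-hm describes can be sketched as follows. This is only an illustration under stated assumptions: an HMAC from the Python standard library stands in for a real GPG signature, and `SIGNER_KEY`, `signer_sign_digest`, `client_sign`, and `verify` are hypothetical names, not part of obs-sign or sigul. The point it shows is that only a fixed-size digest crosses the wire, however large the package is.

```python
import hashlib
import hmac

# Hypothetical secret held only by the signer; stands in for a private GPG key.
SIGNER_KEY = b"hypothetical-signing-key"


def signer_sign_digest(digest: bytes) -> bytes:
    """Signer side: sees only the short digest, never the payload."""
    return hmac.new(SIGNER_KEY, digest, "sha256").digest()


def client_sign(payload: bytes):
    """Client side: hash locally, send the digest, get a signature back."""
    digest = hashlib.sha256(payload).digest()
    signature = signer_sign_digest(digest)  # the only data sent to the signer
    return digest, signature


def verify(payload: bytes, signature: bytes) -> bool:
    """Recompute the digest and check the signature against it."""
    digest = hashlib.sha256(payload).digest()
    return hmac.compare_digest(signature, signer_sign_digest(digest))


if __name__ == "__main__":
    package = b"\x00" * 5_000_000  # stand-in for a large rpm
    digest, signature = client_sign(package)
    print("payload bytes:", len(package))
    print("bytes sent to signer:", len(digest))
    print("bytes sent back:", len(signature))
```

With the copy-everything model, the payload itself would make the round trip in both directions; here the traffic is a few dozen bytes each way.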
18:38:40 <nirik> threebean: we could yeah...
18:38:55 <pingou> +1 for the playbook
18:38:59 <nirik> basically has to kill existing processes, then run the server with some options, then hit return at the passphrase prompt
18:39:14 <nirik> you still get a prompt, even tho the passphrase is 'return'
18:39:18 <pingou> (even if it is only in the private repo)
18:39:43 <nirik> I can see about making a playbook, sounds like a good idea.
18:40:08 <nirik> Oh, thats another problem it has... it has a 'batch mode', but it causes the bridge to lock up, needing restart. ;)
18:40:55 <mirek-hm> the private keys are located on vault machines and there is AFAIK some offline backup in some bank safe, is this correct?
18:41:26 <nirik> the keys are on the 2 vault machines and yes, there are further encrypted backups.
18:42:03 <nirik> but note those keys are not just there. You need both the server passphrase and a valid user passphrase that has access to that key to decrypt it so you can sign something with it.
18:42:15 <nirik> the vault by itself cannot.
18:42:19 <nirik> nor the user.
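The "needs both passphrases" property nirik describes can be illustrated with a toy model. This is explicitly not sigul's actual scheme (nirik notes he has not delved into the code, and neither does this sketch): it assumes a key-wrapping design in which the unlock key is derived from the server passphrase and the user passphrase together, using only the Python standard library. The `lock` and `unlock` helpers are hypothetical, and the XOR keystream is for demonstration only, not production cryptography.

```python
import hashlib
import hmac
import os


def _derive(server_pw: str, user_pw: str, salt: bytes, length: int) -> bytes:
    """Chain two PBKDF2 passes so the output depends on both passphrases."""
    inner = hashlib.pbkdf2_hmac("sha256", server_pw.encode(), salt, 10_000)
    return hashlib.pbkdf2_hmac(
        "sha256", user_pw.encode(), inner + salt, 10_000, dklen=length
    )


def lock(secret: bytes, server_pw: str, user_pw: str):
    """Wrap `secret` so that neither passphrase alone can recover it."""
    salt = os.urandom(16)
    stream = _derive(server_pw, user_pw, salt, len(secret) + 32)
    body = bytes(a ^ b for a, b in zip(secret, stream))
    # The last 32 derived bytes authenticate the wrapped blob.
    tag = hmac.new(stream[len(secret):], body, "sha256").digest()
    return salt, body, tag


def unlock(blob, server_pw: str, user_pw: str):
    """Return the secret, or None if either passphrase is wrong."""
    salt, body, tag = blob
    stream = _derive(server_pw, user_pw, salt, len(body) + 32)
    expected = hmac.new(stream[len(body):], body, "sha256").digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(a ^ b for a, b in zip(body, stream))


if __name__ == "__main__":
    blob = lock(b"signing key material", "server-pass", "user-pass")
    print(unlock(blob, "server-pass", "user-pass"))
    print(unlock(blob, "server-pass", "wrong"))
    print(unlock(blob, "wrong", "user-pass"))
```

In this model a stolen vault disk without the startup passphrase, or a user passphrase without the vault, yields nothing usable, which matches the behaviour described in the discussion.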
18:42:44 <threebean> cool :)
18:42:50 <nirik> Anyhow, I hope we see improvements in things from GSoC, but we will see.
18:43:01 <pingou> when you say the server passphrase, which server is it?
18:43:14 <nirik> pingou: the one needed at startup on the vault.
18:43:15 <puiterwijk> pingou: vault
18:43:20 <pingou> ok
18:43:46 <nirik> #topic Open Floor
18:43:52 <pingou> so when you start the process on the vault (from the client, via the bridge), you enter your own pw to unlock the key
18:43:55 <nirik> dotEast2015: you still around? want to introduce yourself?
18:44:03 <nirik> pingou: yep.
18:44:07 <dotEast2015> sure
18:44:11 <pingou> and the passphrase of the server/application signing in the vault
18:44:18 <dotEast2015> Hello everyone, my name is ali elkhalidi and I am working as a linux systems engineer and I would like to help and contribute to the Fedora infrastructure
18:44:23 <pingou> thanks nirik helps understanding :)
18:44:39 <nirik> pingou: and each key has its own pw. or can. The server passphrase was entered when it was started up last.
18:44:57 <nirik> dotEast2015: welcome. ;) You more interested in sysadmin or application devel type things?
18:45:03 <dotEast2015> nirik: thanks
18:45:15 <dotEast2015> nirik: sysadmin
18:45:50 <nirik> cool. see me after the meeting over in #fedora-admin and I can help you get started. ;)
18:46:00 <nirik> Does anyone have any other items for open floor?
18:46:16 <dotEast2015> nirik: will do. thanks
18:47:11 <nirik> ok then, lets all get back 15min of our day and end the meeting now? ;)
18:47:19 <threebean> +1
18:47:51 <nirik> #endmeeting