15:32:56 <dgilmore> #startmeeting RELENG (2015-06-15)
15:32:56 <zodbot> Meeting started Mon Jun 15 15:32:56 2015 UTC.  The chair is dgilmore. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:32:56 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:33:07 <dgilmore> #meetingname releng
15:33:07 <zodbot> The meeting name has been set to 'releng'
15:33:07 <dgilmore> #chair dgilmore nirik tyll sharkcz bochecha masta pbrobinson pingou maxamillion
15:33:07 <zodbot> Current chairs: bochecha dgilmore masta maxamillion nirik pbrobinson pingou sharkcz tyll
15:33:10 <dgilmore> #topic init process
15:33:14 <dgilmore> meeting time all
15:33:20 <nirik> morning
15:33:26 <maxamillion> morning, sorry I'm late
15:33:28 <tyll> Hi there
15:33:47 * masta waves
15:33:49 <masta> howdy all
15:34:24 <dgilmore> #topic #6158 Request to discuss Rel-Eng Project Planning Proposal
15:34:37 <dgilmore> https://fedorahosted.org/rel-eng/ticket/6158
15:34:45 <dgilmore> maxamillion: where are we here?
15:35:15 <maxamillion> dgilmore: Fedora Infra team voted to table the Taiga work until Flock
15:35:40 * pbrobinson is here
15:36:10 <nirik> well, depends on what you mean by table... we should still see if it will otherwise meet needs.
15:36:27 <nirik> but we didn't want to work on packaging it up yet until we know that we are going to use it?
15:36:42 <nirik> or figuring how we can deploy it, etc.
15:37:17 <dgilmore> maxamillion: is the dev instance back up again?
15:37:18 <nirik> at least that was my understanding.
15:37:41 <nirik> I did have a question on the dev instance tho... for maxamillion and threebean
15:37:50 <maxamillion> dgilmore: it is, it has a DNS name also http://taiga.cloud.fedoraproject.org
15:38:10 <cydrobolt> Hello
15:38:14 * threebean is here
15:38:14 <dgilmore> maxamillion: awesome, last I knew it was down
15:38:18 * cydrobolt waves
15:38:28 <nirik> so, that instance is one maxamillion made I think... ad-hoc?
15:38:37 <maxamillion> dgilmore: yeah it was, threebean did magic to get things back in order
15:38:45 <maxamillion> nirik: it is
15:38:48 <nirik> and threebean also made another one that was a persistent one via ansible.
15:39:01 <nirik> can we sync that over to the persistent one?
15:39:43 <nirik> that would make sure we have it in ansible and everyone has access to it (right now just maxamillion does) and we would bring it up after reboots, etc.
15:39:59 <threebean> sure.
15:40:09 <threebean> to be clear - you're not talking about ansibilizing the taiga configs, right?
15:40:32 <nirik> correct. Just the base instance... so we have other people's keys on it and our usual setup.
15:40:36 <threebean> just keeping 1) persistent disk in the cloud for it and 2) keeping the definition of the node around in ansible.
15:40:42 <threebean> cool, sounds good.
15:41:03 <threebean> I'll do it.  (I think I volunteered for this before already, just haven't gotten around to it.)
15:41:05 <maxamillion> nirik: +1
15:41:24 <threebean> use 'taiga.cloud.fedoraproject.org' to access it.  it'll get a new ip, but just follow the dns entry around now.
15:41:31 <nirik> cool. that other instance is shut off, but I can power it on and then you change dns to it, run ansible and then manually rsync the stuff and it should be good. ;)
15:41:45 <dgilmore> nirik: threebean: maxamillion: cheers for it all
15:42:46 <dgilmore> anyone else have anything here
15:43:43 <dgilmore> #topic #6164 bodhi2 status update requested
15:43:52 <dgilmore> https://fedorahosted.org/rel-eng/ticket/6164
15:44:05 <dgilmore> lmacken: where does bodhi2 stand?
15:46:30 <nirik> https://admin.stg.fedoraproject.org/bodhi2/ is there now, but no data yet
15:46:33 <maxamillion> dgilmore: last I heard bodhi2 is up and running in stage (though I'll be honest, I don't entirely know the implications of that statement)
15:47:06 <nirik> related to this I have another question. :)
15:47:07 <masta> yeah I thought bodhi2 was in stage and half working
15:47:52 <nirik> I was working on ansiblizing releng04/relepel01 (our production bodhi1 masher hosts for updates).
15:48:19 <nirik> However, if we are going to try and push for bodhi2 in production soon, should I just hold off on that and work on pushing bodhi2 masher stuff?
15:49:07 <dgilmore> nirik: I think we need to have it in use before f23, so that we can more easily deliver all the different things needed as part of the changed updates process
15:49:44 <nirik> ok, so let me not worry about bodhi1 anymore and try and press for bodhi2. ;)
15:49:55 <dgilmore> nirik: :) yep
15:49:55 <nirik> IMHO if we want to land it we need it in production a week before branching.
15:50:20 <dgilmore> nirik: well a week or two before change freeze when we enable it
15:50:27 <masta> sounds reasonable
15:51:09 <nirik> sure. ok.
15:51:18 <nirik> will try and work with lmacken to solve any blockers.
15:51:21 <nirik> lets move on?
15:51:46 <dgilmore> nirik: yep
15:52:04 <lmacken> sorry, was taking care of the dog
15:52:06 <tflink> not sure this is the right place but please keep us (qa) in the loop on planned changes to bodhi
15:52:10 <dgilmore> I need to file a bunch of tickets from the fad to track the last pieces
15:52:21 <dgilmore> tflink: we will
15:52:30 <lmacken> bodhi2 ansible playbook written, it's deployed to stg.. need to work on syncing up the prod db and getting the masher running next
15:52:35 <tflink> dgilmore: thanks
15:53:05 <dgilmore> lmacken: login seems horribly busted right now
15:53:09 <nirik> lmacken: please let me know if I can help with ansible or standing up hosts, etc.
15:53:16 <lmacken> dgilmore: yeah, there's an issue with the proxy urls and stuff at the moment
15:53:23 <dgilmore> okay
15:53:38 <dgilmore> lets move on, and follow up again next week
15:53:42 <dgilmore> #topic Secondary Architectures updates
15:53:43 <dgilmore> #topic Secondary Architectures update - ppc
15:53:50 <dgilmore> pbrobinson: how is ppc?
15:53:50 <nirik> lmacken: perhaps a mail to releng/infra lists with whats going on? ie, any blockers or plan...
15:53:50 <pbrobinson> we're looking reasonable here
15:54:08 <pbrobinson> I've been working to get the P8s into production
15:54:16 <dgilmore> awesome
15:54:17 <pbrobinson> and then review the rest of the Power infra
15:54:31 <pbrobinson> build wise we're moving forward
15:54:39 <dgilmore> if we are to support ppc64le in epel we will need some power8 vms
15:54:43 <pbrobinson> trying to get close to mainline in prep for mass rebuild
15:54:59 <nirik> note that copr added ppc64le support.
15:55:03 <pbrobinson> dgilmore: already got the capacity, need to work with nirik to work out connectivity etc
15:55:16 <pbrobinson> nirik: yep, I know
15:55:53 <nirik> pbrobinson: I should talk to you (doesn't have to be in meeting, but either way) about space on secondary arch nfs... we can probably grow those now and give you a bit more room.
15:56:22 <pbrobinson> nirik: yes, funny on my todo list to ask about esp for ppc mass rebuild
15:56:51 <nirik> cool. It will require me figuring out how to grow volumes, but I am sure I can.
15:57:10 <pbrobinson> nirik: quite straight forward, single command
15:57:35 <dgilmore> nirik: I wonder if we can change the secondary arch storage to get benefits of dedupe etc across the secondaries
15:57:36 <nirik> yeah, all of them seem to be single commands, just -with -lots -of -options -to -them
15:58:02 <nirik> dgilmore: put it in one big volume? sounds like a lot of work...
15:58:04 <pbrobinson> dgilmore: would need to merge them all into a single export/volume
15:58:11 <dgilmore> nirik: perhaps
15:58:24 <pbrobinson> sadly netapp only dedupe on a single volume
15:58:44 <dgilmore> might help us some
15:58:51 <dgilmore> not 100% sure how much though
15:59:10 <pbrobinson> I think it would help quite a bit
15:59:26 <dgilmore> there are tons of noarch builds
15:59:33 <pbrobinson> but there's implications and other possible issues too
15:59:49 <dgilmore> yeah
16:00:06 <dgilmore> we could look at a common location for staging of secondary arches
16:00:22 <dgilmore> there are many possible wins, with some risks and cons
16:00:40 <pbrobinson> you could put all koji instances on a single volume within subdirectories / different exports and get dedupe benefits, but you have other issues etc you need to take into account
16:00:57 <nirik> currently all secondary arch volumes save about 11-12% via dedupe
16:01:01 <dgilmore> pbrobinson: right, that is kinda what I am thinking
16:01:16 <nirik> primary saves 33%
16:01:22 <dgilmore> nirik: but all noarch rpms are common across all secondaries
16:01:26 <nirik> right
16:01:41 <dgilmore> that is potentially a lot of savings
16:01:57 <pbrobinson> internally across 6 arches I think we get around 40%, it's not just noarch but src.rpm and a bunch of other things like text/graphics in binary rpms too
16:02:16 <pbrobinson> the netapps dedupe in 4K blocks
16:02:33 <nirik> something to think on, I don't think we want to do anything now/before f23
16:02:42 <pbrobinson> nirik: agreed
16:02:57 <pbrobinson> it would likely also save a bunch on cloud/live/images too
16:03:00 <dgilmore> nirik: right, I think it would be post f23
16:03:29 <dgilmore> but something to look at, might be able to get better value out of our disk
16:03:50 <dgilmore> #topic Secondary Architectures update - s390
16:03:57 <dgilmore> dan is not here
16:04:00 <nirik> sure, not opposed, just want us to figure out tradeoffs
16:04:01 <pbrobinson> here's a whitepaper on the de-dupe for those interested https://www.netapp.com/us/system/pdf-reader.aspx?m=tr-3966.pdf
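(A back-of-envelope illustration of the dedupe discussion above — the per-arch volume sizes are hypothetical placeholders, not measured Fedora numbers; only the ~40% shared-content figure comes from the log, and treating it as the fraction duplicated across arches is a modeling assumption:)

    # Estimate of block-level dedupe savings if the secondary arch koji
    # volumes shared one NetApp volume (NetApp dedupes in 4K blocks).
    # Per-arch sizes in GiB below are hypothetical placeholders.
    arch_sizes = {"ppc": 3000, "s390": 1500, "arm": 4000}
    shared_fraction = 0.40  # ~40% seen internally across 6 arches, per pbrobinson

    total = sum(arch_sizes.values())
    shared = total * shared_fraction
    # Shared blocks get stored once instead of once per arch:
    deduped = (total - shared) + shared / len(arch_sizes)
    print(f"raw: {total} GiB, deduped: {deduped:.0f} GiB, "
          f"saved: {100 * (1 - deduped / total):.0f}%")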
16:04:24 <dgilmore> #topic Secondary Architectures update - arm
16:04:29 <dgilmore> pbrobinson: how is aarch64?
16:04:39 <pbrobinson> we're looking pretty good, some cleanups from the perl merge
16:05:07 <pbrobinson> now we've got the gold linker working it's fixed a few things, notably the ghc mess from F-22 :-D
16:05:31 <dgilmore> nice
16:06:02 <pbrobinson> working with nirik smooge etc about moving some more builders from boston to PHX until we can get decent enterprise kit
16:06:13 <dgilmore> cool
16:06:22 * nirik nods
16:06:34 <pbrobinson> we now also have a process for ARMv7 on aarch64
16:06:39 <pbrobinson> but it's rough as hell
16:07:00 <dgilmore> I would like to get at least one aarch64 host in primary koji so that we can use it to make docker base images for 32 bit arm in primary koji for f23
16:07:20 <pbrobinson> so I'm working to get that closer to a standard KVM libvirt VM so we can do some testing for things like kernel and docker builds
16:07:52 <pbrobinson> dgilmore: yes, that's my plan, but it's butt ugly ATM, but I'm hoping to get it better soon
16:08:34 <dgilmore> cool
16:08:44 <dgilmore> anything else on arm?
16:08:48 <pbrobinson> nope
16:09:05 <dgilmore> #topic FAD followup
16:09:23 <dgilmore> I need to file tickets for outstanding deliverables
16:09:40 <dgilmore> as well as do a writeup on what we did and achieved
16:09:46 <maxamillion> I posted my writeup to my blog/planet.fp.o -> http://pseudogen.blogspot.com/2015/06/fedora-activity-day-release-engineering.html
16:10:00 <pbrobinson> I need to finish my writeups and post them
16:10:01 <maxamillion> but mine was a little more from my perspective
16:10:32 <maxamillion> there's a lot of work that happened that I wasn't involved in so I tried to focus on things that I was familiar with and provide links elsewhere for more information
16:11:38 <pbrobinson> I personally think it was valuable on a number of paths
16:11:57 <masta> Yeah, it was a good FAD.
16:12:07 <dgilmore> maxamillion: right, there was lots of breakout discussions
16:12:38 <masta> do we each need to make a writeup of our FAD contributions?
16:13:09 <pbrobinson> masta: that's the idea
16:13:11 <maxamillion> masta: it would have been a good FAD if we walked away with a working pungi that could be easily iterated on
16:13:19 <maxamillion> pungi4*
16:13:32 <dgilmore> masta: ideally everyone would do some writeups yes
16:13:59 <pbrobinson> maxamillion: 2 hours right? :-P
16:14:05 * pbrobinson hides
16:14:13 <maxamillion> pbrobinson: yeah, what a horrible failure that was
16:14:45 <dgilmore> i believed it to be in better shape than it is
16:14:49 <maxamillion> pbrobinson: I did not know how deep that rabbit hole went ... I was under the silly understanding that the code worked before it was thrown over the wall
16:14:59 <maxamillion> .... little did I know ....
16:15:09 <pbrobinson> maxamillion: it's seriously not your fault, I expected you were being a little ambitious from experience..... but you'll never live it down anyway ;-)
16:15:44 <maxamillion> pbrobinson: that's fine, I just want to make shit work
16:15:55 <dgilmore> I had the same opinion as maxamillion
16:15:58 <pbrobinson> maxamillion: me too!
16:17:48 <dgilmore> anything else people want to mention FAD wise?
16:18:12 <maxamillion> other than me bitching aimlessly about pungi4? ... no not really
16:18:20 <dgilmore> #topic Open Floor
16:18:25 <dgilmore> okay lets move on
16:18:39 <nirik> so I do now have a s390-koji01 and db...
16:18:47 <dgilmore> nirik: cool :)
16:18:48 <nirik> (in ansible/rhel7)
16:19:02 <nirik> it needs some more work I think... there's a ticket on it.
16:19:04 <dgilmore> how far from moving to it are we?
16:19:12 <pbrobinson> nirik: groovy, was going to ask about that, also can we configure it with new secondary admin etc groups?
16:19:16 <nirik> sharkcz wanted to implement some shared shadow koji thing
16:19:20 <nirik> which I have no idea about. :)
16:19:25 <dgilmore> ideally we will move all three secondary arch hubs to it
16:19:28 <nirik> pbrobinson: already done. ;)
16:19:31 <pbrobinson> nirik: happy to work with you on that for both access and shadow
16:19:39 * nirik digs up the ticket
16:19:56 <nirik> https://fedorahosted.org/fedora-infrastructure/ticket/4783
16:19:58 <pbrobinson> nirik: cool, can you give me details of those so I can review and add other users we'll need for arm/ppc
16:20:19 <pbrobinson> nirik: cool, will review
16:20:26 <nirik> pbrobinson: sure thing. They are s390-koji01.qa.fedoraproject.org and db-s390-koji01.qa.fedoraproject.org
16:20:36 <nirik> it has a db dump from a week or two ago in it.
16:20:53 <nirik> we need to make sure everything is good, then schedule a migration.
16:20:58 <pbrobinson> nirik: brilliant, thanks
16:21:03 <nirik> once we have this one all done the others should be really easy
16:21:11 <nirik> just different names in templates and such
16:21:39 <dgilmore> will be a big positive change
16:21:47 <nirik> yeah. :)
16:21:54 <pbrobinson> nirik: YAY!!!
16:22:11 <nirik> pbrobinson: I can spin up arm ones anytime too, but we might make sure we have everything set for s390 first...
16:22:18 <nirik> but if arm is easier to do first thats fine too
16:22:42 <dgilmore> nirik: I guess the hardest bit for ppc will be that the vms are on a ppc box
16:22:48 <nirik> yeah.
16:22:50 <pbrobinson> nirik: yes, lets get s390 live and then look at arm etc
16:22:58 <nirik> ppc may need some tweaking due to that...
16:23:23 <pbrobinson> dgilmore: nirik: yes, agreed but I have ideas/plans to assist with that I'm working on for PPC in general
16:23:47 <nirik> if we can get ppc so ansible can talk to the hypervisor and it uses libvirt, then we may be ok. ;)
16:24:09 <dgilmore> nirik: that should be doable
16:24:31 <dgilmore> nirik: the newer boxes do use libvirt afaik
16:24:38 <pbrobinson> nirik: yes, that's my plan as I go through the P8 bits, we should be able to get standard KVM/libvirt configs across all arches
16:24:42 <nirik> yeah
16:24:48 <nirik> cool
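(A minimal sketch of the libvirt access nirik describes, using the libvirt Python bindings — the hostname is a made-up placeholder, not a real Fedora infrastructure host:)

    # Query a remote KVM hypervisor over libvirt, the kind of access
    # ansible-driven management of the ppc builders would rely on.
    import libvirt

    conn = libvirt.open("qemu+ssh://root@ppc-hyp01.example.org/system")
    try:
        for dom in conn.listAllDomains():
            state = "running" if dom.isActive() else "shut off"
            print(f"{dom.name()}: {state}")
    finally:
        conn.close()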
16:25:03 <tyll> I have also some topics
16:25:30 <dgilmore> nirik: pbrobinson: anything else here? or can we move on to tyll's topics?
16:25:38 <jkurik> any progress with mass-rebuild on F23 ? https://fedorahosted.org/rel-eng/ticket/6162
16:25:44 <nirik> nothing else from me, go ahead
16:25:51 <dgilmore> jkurik: please wait
16:25:51 <pbrobinson> nope, not from me
16:26:00 <dgilmore> tyll: what do you have?
16:26:02 * jkurik is in queue
16:26:02 <tyll> Is https://fedorahosted.org/rel-eng/ticket/6111 still scheduled to happen before F23?
16:26:21 <dgilmore> tyll: I do not think so
16:26:39 <dgilmore> we have not yet worked out a new solution for the CA situation
16:28:01 <dgilmore> there is some headway made on moving lookaside away from md5
16:28:21 <tyll> I see, is it then maybe possible to do a flag day for the other three changes and do the client CA change later?
16:28:32 <dgilmore> I would rather not
16:28:51 <dgilmore> as that would need two flag days
16:29:20 <tyll> but if it is like one flag day per release it is not like there are a lot of changes that often
16:29:30 <bochecha> that being said, if the CA thing won't be done before F23, maybe it's fine to have one flagday for F23 and another one for F24?
16:29:44 <dgilmore> open to talking about it
16:30:11 <dgilmore> the last flag day we had for this type of thing was after the incident in 2008
16:30:27 <dgilmore> it is not a common thing, and I prefer to keep it that way
16:30:55 <nirik> what are the changes?
16:31:14 <nirik> ah, I see.
16:32:43 <dgilmore> tyll: do you have anything else?
16:32:44 <tyll> iirc it will not be as intrusive - for example people using default configs will not have to do much except request a new client certificate, as they have to do every 6 months
16:33:22 <dgilmore> tyll: the client configs will quite possibly have to change
16:33:35 <dgilmore> it really depends on how exactly it is implemented
16:34:34 <dgilmore> if we end up using a whole new CA, then it is much more intrusive
16:35:45 <tyll> even with a new CA we can use two CAs for a migration phase
16:35:48 <pbrobinson> it doesn't seem to make sense to do it if the final implementation isn't even decided
16:36:10 <dgilmore> tyll: maybe.
16:36:42 <dgilmore> I quite strongly want to have a single set of changes requiring client side adjustments
16:37:07 <nirik> I don't know that the first three things need any client side changes.
16:37:33 <nirik> so cant we just do them and defer the one thats not yet implemented/decided?
16:37:47 <dgilmore> if we can work out a way to seamlessly convert people as their certificates expire, then we can roll out the other changes at some flag day event
16:38:24 <dgilmore> nirik: any CA change in the webserver side does require client changes
16:38:44 <dgilmore> nirik: as does the change to sha512
16:38:46 <tyll> if we do not invalidate the old certificates, we can accept both the old and the new CA on the server side
16:39:07 <tyll> then people only need to get a new certificate after their old certificate expired
16:39:18 <dgilmore> I have to go for 10 minutes to get my rubbish out
16:40:03 <nirik> perhaps I am not clear what exact steps are to be done for the first 2 changes.
16:40:05 <tyll> for the well-known certificate for pkgs and koji we need to push new client configs that accept the new well-known CA which might not be the case currently
16:40:16 <nirik> The sha512 change I thought was all in fedpkg/upload.cgi.
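(Sketch of the checksum side of the md5-to-sha512 lookaside move — the helper below is illustrative, not pyrpkg's actual API; the point is that fedpkg and upload.cgi have to agree on the algorithm because the hash is part of the lookaside path:)

    import hashlib

    def source_checksum(path, algo="sha512", chunk=1024 * 1024):
        # Stream the tarball so large sources aren't loaded into memory.
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        return h.hexdigest()

    # Historically files are keyed as <pkg>/<tarball>/<md5>/<tarball>;
    # switching the hash component to sha512 breaks unaware clients.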
16:40:42 <tyll> but afaik fedpkg does not check the server certificate in most cases
16:41:58 <nirik> tyll: could you perhaps add to the ticket more verbose description of what happens for each change and the effect on clients?
16:44:01 <tyll> nirik: yes, but there are actually several choices, depending on whether we want to require maintainers to do a "flag day" event or have a migration period, I believe I outlined a lot of options in the meeting notes: http://meetbot.fedoraproject.org/fedora-meeting-1/2015-02-23/releng.2015-02-23-16.34.log.html#l-219
16:44:39 <tyll> so the other topic is the status of fedorahosted trac and pagure for rel-eng
16:45:06 <dgilmore> back
16:45:18 <dgilmore> tyll: send all changes as pull requests in pagure
16:45:42 <tyll> I was wondering if tickets will be migrated from trac to pagure as well if they require code changes and if task items like "unblock pkg foo" will be tracked in trac
16:45:47 <dgilmore> we are still using trac for tickets etc and there is no plans to change that yet
16:46:21 <dgilmore> the only changes for now is to use pagure for code and git
16:47:17 <tyll> I see, but I guess issues related to code would be better in pagure then as well to be able to easily reference them (given that pagure supports references to them like github does)
16:47:58 <dgilmore> sure
16:48:57 <dgilmore> anything else?
16:49:03 <tyll> and what is the rule/workflow for merging pull requests? Can you only merge them?
16:49:04 <jkurik> any news about mass rebuild on F23 ?
16:49:31 <tyll> or is it like in infra that one needs to give a +1 to a pull request and then anyone might merge it?
16:49:47 <bochecha> tyll: fedpkg checks the server cert when uploading and checking if a file exists
16:49:49 <dgilmore> tyll: at the moment pingou and I can. we need to set up a full review process and open it up a bit more
16:49:53 <bochecha> tyll: it downloads over http, though
16:50:30 <tyll> dgilmore: ok, thx
16:50:40 <dgilmore> jkurik: apparently it is supposed to happen tomorrow
16:51:00 <nirik> thats what we had on the schedule yeah
16:51:11 <jkurik> dgilmore: ok, thanks, just wanted to be sure releng knows of it :-)
16:51:12 <dgilmore> jkurik: we have always started them on Fridays in the past, but when it got added to the schedule it was put on Tuesday
16:51:14 <bochecha> tyll: also, the way the code currently works, it will use the .fedora-server-ca.cert to validate the server cert
16:51:39 <bochecha> tyll: and if the server cert is not signed by that CA (because it is signed by a well-known CA, for example), then the validation will fail
16:51:56 <dgilmore> bochecha: yeah. probably needs code change
16:52:18 <dgilmore> or we put .fedora-server-ca.cert in a well known place
16:52:28 <dgilmore> and ship the ca cert for what we use
16:52:30 <bochecha> dgilmore: moving to a well-known CA for the server would only require not to use a .fedora-server-ca.cert file any more, which is quite trivial :)
16:52:51 <dgilmore> bochecha: well that depends
16:53:06 <bochecha> it's a client-side code change nevertheless, which means after that change older clients won't work, only the updated ones will
16:53:07 <dgilmore> if we want to accept any well known CA or just the one we are using
16:53:09 <tyll> bochecha: the last time I checked it ran curl with -k iirc and maybe disabled some other cert check as well
16:53:14 <bochecha> dgilmore: ah, right
16:53:18 <bochecha> tyll: not anymore
16:53:22 <dgilmore> tyll: that was fixed afaik
16:53:30 <bochecha> tyll: I redid the whole lookasidecache handling in pyrpkg :)
16:53:45 <bochecha> tyll: now it all uses pycurl nicely
16:53:54 <tyll> bochecha: ah, ok
16:54:08 <bochecha> tyll: it's in pyrpkg-1.35
16:54:21 <bochecha> (which is in stable for everything except EPEL7, and is on its way there)
16:55:51 <tyll> bochecha: wonderful :-)
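(A rough sketch of the pycurl server-certificate check bochecha describes — URL and CA path are illustrative, not pyrpkg's exact code. For a migration period, the CAINFO file could be a bundle holding both the old and the new CA, since curl accepts multi-certificate bundles:)

    import pycurl
    from io import BytesIO

    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, "https://pkgs.example.org/repo/pkgs/upload.cgi")
    c.setopt(pycurl.CAINFO, "/home/user/.fedora-server-ca.cert")
    c.setopt(pycurl.SSL_VERIFYPEER, 1)  # reject certs not signed by that CA
    c.setopt(pycurl.SSL_VERIFYHOST, 2)  # require the hostname to match
    c.setopt(pycurl.WRITEDATA, buf)
    c.perform()
    c.close()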
16:56:24 <dgilmore> does anyone have anything else?
16:56:37 <dgilmore> we are 30 minutes over
16:56:59 * tyll has nothing left
16:57:17 <dgilmore> #endmeeting