fedora-meeting

18:15:57 <Oxf13> #startmeeting Fedora Release Engineering Meeting
18:15:57 <zodbot> Meeting started Mon Aug 31 18:15:57 2009 UTC.  The chair is Oxf13. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:15:57 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:16:05 <Oxf13> #topic roll call
18:16:16 <nilsonbs> ok
18:16:36 <Oxf13> ping: notting jwb warren wwoods rdieter lmacken poelcat dgilmore spot
18:17:01 <warren> hello
18:17:20 <warren> Have you folks seen the ongoing nss drama?  It isn't fully fixed yet, but at least most packages build against it now.
18:17:39 <Oxf13> yes, been watching it
18:18:16 * dgilmore is here
18:18:30 * spot is mostly here
18:19:05 <jwb> i'm here in spirit, though on the phone
18:20:23 <Oxf13> alright lets get started.
18:21:11 <Oxf13> Our last meeting was the 17th, and there weren't any action items for the meeting
18:21:21 <Oxf13> #topic Fedora 12 Alpha recap
18:21:50 <Oxf13> So 12 Alpha went out, albeit a bit late
18:21:52 <Oxf13> woo and all that
18:22:10 <Oxf13> pretty uneventful release process, no big surprises
18:22:21 <warren> Oxf13: I see there is a dist-f12-maven, are they actively working on it now?
18:22:41 <Oxf13> warren: we'll get to that later, thanks
18:22:42 <warren> Oxf13: are the extra targets like that the reason for newRepo taking so long, or other reasons?
18:22:46 <warren> ok
18:23:52 <Oxf13> We also enabled early branching for F-12
18:24:39 <Oxf13> so far, only fedora-release has been built for fedora 13
18:24:52 <Oxf13> but I suspect we'll see more as time progresses
18:25:07 <tibbs> I've branched a few packages.
18:25:34 <Oxf13> cool
18:26:13 <jwb> still going into rawhide for now, yes
18:26:14 <jwb> ?
18:26:27 <Oxf13> F-12 branched packages go into rawhide
18:26:36 <Oxf13> and devel/ builds for unbranched packages go to rawhide as well
18:26:48 <jwb> k.  need access to the f12 key via sigul before they start going to updates-candidates
18:26:51 <Oxf13> builds from devel/ where the package already has an F-12 branch go into dist-f13, which does nothing yet.
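The routing Oxf13 describes for the pre-branch period can be written as a small decision function. This is a hypothetical illustration of the rule as stated in the log, not code from Koji or the Fedora build system; the function name and the string return values are assumptions.

```python
# Hypothetical helper modelling the pre-branch routing described above.
# Destinations ("rawhide", "dist-f13") come from the log; the function is illustrative.

def destination(branch: str, has_f12_branch: bool) -> str:
    """Return where a build from `branch` lands while F-12 is pre-branched."""
    if branch == "F-12":
        # Builds from the new F-12 branch feed rawhide.
        return "rawhide"
    if branch == "devel":
        # devel/ keeps feeding rawhide until the package gets an F-12 branch;
        # after that, devel/ builds go to dist-f13, which does nothing yet.
        return "dist-f13" if has_f12_branch else "rawhide"
    raise ValueError(f"branch not covered by this sketch: {branch}")


if __name__ == "__main__":
    for branch, branched in [("F-12", True), ("devel", False), ("devel", True)]:
        print(f"{branch} (has F-12 branch: {branched}) -> {destination(branch, branched)}")
```

The point the sketch makes explicit is that the same devel/ commit lands in different places depending on whether an F-12 branch exists.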
18:26:59 <Oxf13> jwb: did I give that to you yet?
18:27:05 <jwb> don't think so
18:27:10 <Oxf13> k
18:27:14 <Oxf13> easy enough to fix
18:27:26 <Oxf13> pre-branching was the last ticket for the Alpha milestone, and I can go close that now
18:27:45 <Oxf13> I have a lot of tickets to create for the Beta milestone
18:27:59 <dgilmore> warren: a bunch of targets have been removed
18:28:07 <Oxf13> anything else regarding F12 Alpha?
18:28:16 <jwb> Oxf13, did we ever decide if bit flip could be automated?
18:28:25 <jwb> mostly so you don't have to be awake at ass-early AM
18:28:30 <Oxf13> it could be yes
18:28:45 <jwb> doable for beta?
18:29:46 <Oxf13> sure, it's just an at job
18:30:06 <jwb> k.  add another ticket to open for beta i guess :)
18:30:15 <jwb> 'schedule bit flip at job'
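Since the bit flip is "just an at job", scheduling it ahead of time could look roughly like the sketch below. The flip script path is hypothetical. `-f` (read the job from a file) and `-t` (a `[[CC]YY]MMDDhhmm` time) are options defined for POSIX at(1). The sketch assumes the host's at interprets `-t` in the local zone given by `TZ`, so it forces `TZ=UTC`.

```python
import os
import subprocess
from datetime import datetime, timezone


def at_command(script: str, when_utc: datetime) -> list:
    """Build an `at` argv that runs `script` at the given UTC time."""
    if when_utc.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # Normalise to UTC and render in at(1)'s -t format: YYYYMMDDhhmm.
    stamp = when_utc.astimezone(timezone.utc).strftime("%Y%m%d%H%M")
    return ["at", "-f", script, "-t", stamp]


def schedule(script: str, when_utc: datetime) -> None:
    # -t is read in local time, so run at with TZ=UTC (assumption: the
    # host's at honours TZ like other libc time functions).
    env = dict(os.environ, TZ="UTC")
    subprocess.run(at_command(script, when_utc), check=True, env=env)
```

For example, `schedule("/path/to/bitflip.sh", datetime(2009, 9, 15, 14, 0, tzinfo=timezone.utc))` would queue the (hypothetical) script for 14:00 UTC. Nobody needs to be awake for it; only `atq` needs checking afterwards.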
18:30:23 <warren> I have a question about the targets when it is appropriate to ask.
18:30:33 <Oxf13> warren: noted.
18:31:29 <Oxf13> #topic Snapshot 1
18:32:21 <Oxf13> Snapshot 1 is scheduled to be released this Friday
18:32:28 <Oxf13> snapshots are typically just Live images
18:34:05 <Oxf13> Live images haven't been composing lately due to ssl and nss fallout
18:34:21 <Oxf13> multiple people are working on that issue today, hopefully it'll be cleared up by the time we're ready to compose the snapshot
18:34:54 <jwb> is anything composing right now?  anaconda has broken deps on ssl
18:35:40 <Oxf13> yeah, not much
18:35:44 <Oxf13> other than a repo of packages
18:36:41 <warren> is nscd still broken with ssl?
18:36:50 <Oxf13> #info Need to track ssl/nss efforts leading up to snapshot release.
18:37:12 <Oxf13> warren: no clue.  We don't really test the bits, we just make them available for others.
18:37:28 <warren> it was a broken dep
18:37:32 <warren> looks fixed
18:37:42 <warren> we were having hell rebuilding glibc last week due to nss
18:37:43 <Oxf13> anything else on snapshot 1?
18:38:11 <rdieter> here (late)
18:39:35 <Oxf13> #topic Slow newRepo tasks
18:39:47 <Oxf13> newRepo tasks seem to have gotten even slower lately.
18:39:59 <Oxf13> We knew they got slow after the koji upgrade, but recently it's gotten worse
18:40:07 <Oxf13> multiple hour delays trying to do chain-build
18:40:08 <warren> oh, the maven target was removed?
18:40:20 <Oxf13> I looked into this a bit and noticed a few things.
18:40:39 <Oxf13> 1) we had a lot of extra build targets that cause newRepo tasks anytime anything in the fedora stack changed.
18:40:47 <Oxf13> of note, dist-f12-openssl and dist-f12-maven
18:40:58 <warren> both gone now right?
18:41:01 <Oxf13> the openssl target was being actively used until last week, and I retired it.
18:41:19 <Oxf13> dist-f12-maven was created a while ago, back toward the beginning of the f12 cycle.  It was heavily used for a short period of time
18:41:24 <warren> Was maven really needed?  It seems maven is self-contained and doesn't need its own buildroot?
18:41:38 <warren> I mean, maven isn't any less broken with its own buildroot.
18:41:49 <Oxf13> the maven work has been taken over by other people, who either didn't know about or didn't care about the buildroot and private branches we created, and did things on devel/ into the main repos
18:41:56 <Oxf13> so I've removed the dist-f12-maven target too.
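Why extra targets hurt can be shown with a simplified model (not kojira's actual code). It assumes a target's repo must be regenerated whenever any tag in its build tag's inheritance changes. The target names dist-f12-openssl and dist-f12-maven come from the log; the build-tag names below are hypothetical.

```python
# Simplified model: a repo is regenerated when any tag it inherits from changes.
# Build-tag names for the side targets are hypothetical.

PARENTS = {
    "dist-f12": [],
    "dist-f12-build": ["dist-f12"],
    "dist-f12-openssl-build": ["dist-f12-build"],  # hypothetical name
    "dist-f12-maven-build": ["dist-f12-build"],    # hypothetical name
}


def ancestors(tag, parents):
    """All tags visible from `tag`, including itself."""
    seen, stack = set(), [tag]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def repos_to_regenerate(changed_tag, build_tags, parents):
    """Build tags whose repos go stale when `changed_tag` changes."""
    return sorted(bt for bt in build_tags if changed_tag in ancestors(bt, parents))
```

With all three build tags active, one change to dist-f12 queues three newRepo tasks; with only dist-f12-build left, it queues one.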
18:42:31 <Oxf13> 2) I also made it a point to cut short the inheritance inspection for each newRepo task
18:42:43 <warren> cut short means?
18:42:50 <Oxf13> our current tags, dist-f12 and dist-f12-build had inheritance going all the way back to dist-fc6
18:42:58 <warren> oh
18:43:11 <Oxf13> however nothing is changing in anything from dist-f9 or further back
18:43:29 <Oxf13> so I've created f9-cutoff and f9-build-cutoff
18:43:42 <Oxf13> I'm cloning dist-f9-updates and dist-f9-build into these tags
18:43:55 <Oxf13> and will make dist-f10(-build) inherit from them accordingly
18:44:08 <dgilmore> Oxf13: i thought we decided awhile ago to use dist-f9-eol
18:44:14 <dgilmore> and then bump it each release
18:44:20 <Oxf13> This will save some time when getting the latest build information, for the sake of newRepo and other such tasks
18:44:39 <Oxf13> dgilmore: I don't recall any decision.  I had asked about it one night and by the time anybody gave me feedback I had already created the cutoff tags
18:44:58 <dgilmore> i'm looking to upgrade /mnt/koji, hopefully that will help some also
18:44:59 <Oxf13> the name really doesn't matter, it will only be seen by people digging through inheritance listings
18:45:08 <dgilmore> Oxf13: i gave feedback when i saw your comment
18:45:21 <Oxf13> dgilmore: but you didn't see it until after I made the tags (:
18:45:33 <dgilmore> regardless
18:45:47 <Oxf13> this is a small but important speedup for our process.
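The effect of the cutoff tags can be illustrated with a toy model. This is not Koji's implementation. It assumes single-parent inheritance in which the nearest tag holding a package wins. Here "dist-f9" stands in for dist-f9-updates, and the package data is made up. Flattening everything visible from dist-f9 into one parentless snapshot leaves the computed build list unchanged, but the walk inspects fewer tags.

```python
# Toy model of tag inheritance; tag data below is invented for illustration.

def chain(tag, parent):
    """Tags from `tag` up to the root of a single-parent inheritance chain."""
    out = []
    while tag is not None:
        out.append(tag)
        tag = parent.get(tag)
    return out


def latest_builds(tag, parent, builds):
    """Return (package -> build, number of tags inspected)."""
    result = {}
    tags = chain(tag, parent)
    for t in tags:
        for pkg, build in builds.get(t, {}).items():
            result.setdefault(pkg, build)  # nearer tags take priority
    return result, len(tags)


def flatten(tag, parent, builds):
    """Clone everything visible from `tag` into one parentless snapshot."""
    return latest_builds(tag, parent, builds)[0]


PARENT = {"dist-f10": "dist-f9", "dist-f9": "dist-f8",
          "dist-f8": "dist-f7", "dist-f7": "dist-fc6"}
BUILDS = {"dist-f10": {"foo": "foo-3"},
          "dist-f9": {"foo": "foo-2", "bar": "bar-2"},
          "dist-f7": {"baz": "baz-1"},
          "dist-fc6": {"bar": "bar-1", "qux": "qux-1"}}

if __name__ == "__main__":
    full, n_full = latest_builds("dist-f10", PARENT, BUILDS)
    cut_parent = {"dist-f10": "f9-cutoff"}
    cut_builds = {"dist-f10": BUILDS["dist-f10"],
                  "f9-cutoff": flatten("dist-f9", PARENT, BUILDS)}
    short, n_short = latest_builds("dist-f10", cut_parent, cut_builds)
    print(full == short, n_full, n_short)
```

The snapshot is only safe because, as Oxf13 notes, nothing in dist-f9 or earlier changes any more. A frozen copy cannot go stale.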
18:45:51 <dgilmore> we can probably try limiting newRepo tasks to just 3
18:45:55 <dgilmore> since it's at 4 now
18:46:07 <Oxf13> 3) I realized that we recently added epel to our koji setup.
18:46:28 <dgilmore> Oxf13: which added 3 targets
18:46:31 <Oxf13> this adds at least 3 targets, dist-4e-epel, dist-5e-epel, and dist-5e-epel-infra
18:46:47 <dgilmore> Oxf13: we should be able to disable the olpc targets
18:47:01 <Oxf13> and with updates and overrides going out frequently this adds even more newRepo tasks to our queue
18:47:13 <Oxf13> and as dgilmore mentioned, we have a hard cap on how many concurrent newRepo tasks we run
18:47:23 <dgilmore> it's 4 right now
18:47:33 <dgilmore> it's 3 internally
18:47:42 <dgilmore> and there are many more targets internally
18:47:48 <Oxf13> dgilmore: do we have good data as to if 3 at a time will result in more finished in an hour's time than 4 at a time?
18:48:17 <dgilmore> Oxf13: no.  i only know that internally limited to 3 to reduce db thrashing
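Oxf13's question can be framed with a simple throughput model. It assumes that when $c$ newRepo tasks run concurrently, each takes the same wall-clock time $d(c)$. That time grows with $c$ if the database thrashes.

```latex
\text{throughput}(c) = \frac{c}{d(c)}
\qquad\Longrightarrow\qquad
\frac{3}{d(3)} > \frac{4}{d(4)} \iff \frac{d(4)}{d(3)} > \frac{4}{3}
```

Lowering the cap to 3 only finishes more tasks per hour if running a fourth task stretches every task by more than a third. As an illustration, if tasks take about 60 minutes at a cap of 4, a cap of 3 wins only if it brings each task under 45 minutes. Measured values of $d(3)$ and $d(4)$ would settle it.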
18:48:20 <Oxf13> dgilmore: are you certain that nothing is using the olpc targets?
18:48:47 <dgilmore> Oxf13: cjb would know for sure.  but their last releases have come from fedora proper
18:49:52 <Oxf13> ok.  so as I see it, we have a number of avenues to pursue in order to speed up our repo creations
18:49:56 <warren> ol
18:50:05 <Oxf13> A) reduce number of targets.
18:50:17 <Oxf13> B) Shorten inheritance chains
18:50:27 <Oxf13> C) Fine tune concurrent repo run limits
18:50:53 <Oxf13> D) profile newRepo task to discover delays and improve
18:50:59 <dgilmore> E) speed up disk to reduce time to run createrepo
18:51:26 <Oxf13> dgilmore: when I last looked at it, the createrepo time was a small small fraction of the total task time
18:51:59 <warren> which box does it run on?
18:52:06 <dgilmore> Oxf13: it spends ~ half the time in init  and half on the builder doing the createrepo and uploading the metadata
18:52:11 <dgilmore> warren: the builderws
18:52:14 <dgilmore> builders
18:52:16 <warren> ooh
18:52:17 <Oxf13> er not even half.
18:52:27 <Oxf13> http://koji.fedoraproject.org/koji/taskinfo?taskID=1646486 for example
18:52:35 <Oxf13> Mon, 31 Aug 2009 16:57:46 UTC
18:52:50 <Oxf13> the i386 task didn't even get started until Mon, 31 Aug 2009 17:44:32 UTC
18:53:01 <Oxf13> and it was done by  Mon, 31 Aug 2009 17:51:08 UTC
18:53:02 <warren> Do the createrepo runs on ppc builders take longer?
18:53:14 <Oxf13> that's only 7 minutes for createrepo and import
18:53:19 <dgilmore> Oxf13: ok, last i looked it was about half/half
18:53:28 <warren> err, newRepo tasks
18:53:29 <Oxf13> by far the most time we're spending is on init
18:53:29 <dgilmore> Oxf13: so the init is taking forever
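The arithmetic behind Oxf13's example can be checked with the standard library, using the three timestamps quoted from task 1646486. The log does not say what the first timestamp marks. Treating it as the moment the parent task was created or started is an assumption.

```python
from datetime import datetime

# Format of the timestamps as quoted in the log.
FMT = "%a, %d %b %Y %H:%M:%S UTC"

created = datetime.strptime("Mon, 31 Aug 2009 16:57:46 UTC", FMT)
i386_started = datetime.strptime("Mon, 31 Aug 2009 17:44:32 UTC", FMT)
i386_done = datetime.strptime("Mon, 31 Aug 2009 17:51:08 UTC", FMT)

before_subtask = i386_started - created  # time spent before createrepo began
subtask = i386_done - i386_started       # createrepo + import on the builder

if __name__ == "__main__":
    print("before i386 subtask:", before_subtask)
    print("createrepo + import:", subtask)
```

That works out to about 47 minutes before the builder starts and about 6.5 minutes for createrepo plus import. This matches the "only 7 minutes" and "most time is on init" observations.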
18:53:46 <dgilmore> Oxf13: which really is all on the db/hub
18:54:05 <dgilmore> Oxf13: i have an idea to try something
18:54:18 <dgilmore> Oxf13: kojira runs on koji2
18:54:34 <dgilmore> ill make sure that the builders and the public hit koji1
18:54:37 <Oxf13> warren: ppc builders seem to do the createrepo / import task in about 7 minutes, same as the other arches.
18:54:40 <dgilmore> and see if that helps at all
18:54:58 <Oxf13> ok.
18:55:21 <Oxf13> #action dgilmore to move builders + public to koji1, giving kojira more of koji2's resources, in an attempt to speed up newRepo init time
18:55:55 <Oxf13> #info typically newRepo init time is close to an hour, whereas the actual createrepo + import time is ~7 minutes
18:56:10 <warren> database queries are the slow part?
18:56:36 <Oxf13> warren: something in the init process.  We need proper profiling to know which part is the "slow" part.
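The standard library's `cProfile` and `pstats` modules are one way to get that profiling. The functions below are hypothetical stand-ins for the phases of init. The sleeps only simulate work, and none of these names are Koji's. Sorting by cumulative time shows which call the wall-clock time is actually spent under.

```python
import cProfile
import io
import pstats
import time


def query_tag_inheritance():  # hypothetical stand-in for a slow hub/db call
    time.sleep(0.05)


def compute_latest_builds():  # hypothetical stand-in for the rest of init
    time.sleep(0.01)


def repo_init():  # hypothetical: not Koji's real function name
    query_tag_inheritance()
    compute_latest_builds()


def profile(func):
    """Run `func` under cProfile and return the top entries as text."""
    prof = cProfile.Profile()
    prof.runcall(func)
    out = io.StringIO()
    pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile(repo_init))
```

In the real case the interesting split is between time spent waiting on the hub/database and time spent in Python on the caller's side. A cumulative sort of the caller's profile separates those two.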
18:56:41 <Oxf13> dgilmore: when was the koji upgrade?
18:56:55 <dgilmore> Oxf13: before the f11 mass rebuild
18:57:00 <warren> Oxf13: this code is in koji git?
18:57:16 <Oxf13> warren: yes
18:57:20 <dgilmore> Oxf13: we needed to support stronger hashes
18:57:51 <Oxf13> dgilmore: the upgrade also allowed adding epel buildroots right?
18:58:01 <dgilmore> warren: Oxf13 it allowed external repos
18:58:01 <Oxf13> so any epel newRepo task was after the upgrade?
18:58:23 <dgilmore> Oxf13: i added epel buildroots shortly after the upgrade
18:58:32 <Oxf13> ok
18:58:36 <dgilmore> so people could scratch build epel builds
18:58:45 <warren> dgilmore: which has been incredibly helpful
18:58:59 <Oxf13> http://koji.fedoraproject.org/koji/taskinfo?taskID=1267794
18:59:12 <Oxf13> Ok, prior to the upgrade, newRepo tasks were going as quick as 10 minutes
18:59:18 <Oxf13> that's init, createrepo, and import
18:59:33 <Oxf13> init was taking ~4 minutes
19:00:27 <Oxf13> #info prior to koji upgrade, newRepo init duration was ~4 minutes.  Now it's ~60 minutes
19:01:28 <warren> interesting
19:01:30 <mbonnet> Oxf13: that's also "prior to mass rebuild"
19:01:48 <warren> mbonnet: what other variable does that throw in?
19:01:55 <Oxf13> mbonnet: to that particular mass rebuild.  We've had mass rebuilds before that too
19:02:10 <mbonnet> warren: many more packages, more data to deal with, more loop iterations
19:02:52 <mbonnet> Oxf13: sure, just saying it may not be tied directly to the upgrade.  The newRepo code hasn't changed significantly.
19:03:02 <Oxf13> mbonnet: erm, it was the same amount of packages, just the top level tag had more builds
19:03:12 <warren> what's the best way to instrument python apps?
19:03:19 <mbonnet> Oxf13: that's what I meant, more builds
19:03:59 <mbonnet> Oxf13: I've seen a lot of variability in newRepo task duration too
19:04:16 <mbonnet> I don't think they *all* take 60 minutes.
19:04:44 <warren> do any now take less than 40 minutes?
19:04:53 <warren> I've seen between 40 - 120 minutes lately
19:04:59 <Oxf13> mbonnet: as of late, anything dist-f* seems to be taking 60+ minutes
19:05:45 <Oxf13> does anybody remember the date of the upgrade?
19:06:16 <mbonnet> do we have numbers on the database load?
19:06:25 <warren> http://koji.fedoraproject.org/koji/tasks?state=closed&view=tree&method=newRepo&order=-completion_time
19:06:38 <warren> might be easy to add a "Duration" column
19:06:41 <warren> just subject two numbers
19:06:47 <warren> subtract
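warren's "Duration" column is one subtraction per row. The sketch below works on a list of task records. The field names `start_time` and `completion_time` and the timestamp format are assumptions for illustration, not Koji's schema.

```python
from datetime import datetime

# Assumed timestamp format for the hypothetical task records.
FMT = "%Y-%m-%d %H:%M:%S"


def with_duration(tasks):
    """Return copies of the task dicts with a computed 'duration' field."""
    rows = []
    for task in tasks:
        start = datetime.strptime(task["start_time"], FMT)
        done = datetime.strptime(task["completion_time"], FMT)
        rows.append({**task, "duration": done - start})
    return rows
```

Sorting the result by `duration` would surface the slowest newRepo tasks directly, instead of reading two timestamps per task by eye.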
19:07:56 <mbonnet> https://koji.fedoraproject.org/koji/taskinfo?taskID=1644875
19:07:59 <mbonnet> < 30 minutes
19:08:07 <mmcgrath> what were you guys wondering about on db load?
19:08:38 <mbonnet> and actually, that init took about 15minutes
19:08:44 <mmcgrath> db load (db3 where koji is) has stayed pretty flat.  Around 1
19:08:55 <warren> mmcgrath: any different from March?
19:08:59 <mbonnet> mmcgrath: speculating on the cause of the slow newRepo tasks
19:09:40 <Oxf13> anyway, this needs more investigation
19:09:44 <mbonnet> mmcgrath: how about load on the kojihub machines?
19:09:54 <mmcgrath> mbonnet: I could help poke if someone could open a ticket telling me what commands get run and where
19:10:03 <Oxf13> #action 0xf13 to plot newRepo task duration over time for the past year or so
19:10:15 <Oxf13> #action Oxf13 to plot newRepo task duration over time for the past year or so
19:10:19 <Oxf13> I mistype my own freaking nick
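For the "duration over time" action item, the data behind the plot could be a per-month median. The sketch assumes each task has already been reduced to a (completion datetime, duration in seconds) pair, for example from a listing like the one warren linked. A median rather than a mean keeps single outliers from dominating a month. Durations in the discussion range from under 30 minutes to about 120.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def monthly_median_minutes(tasks):
    """tasks: iterable of (completion datetime, duration in seconds)."""
    buckets = defaultdict(list)
    for completed, seconds in tasks:
        buckets[completed.strftime("%Y-%m")].append(seconds / 60)
    # Months in chronological order, ready to plot as a time series.
    return {month: median(vals) for month, vals in sorted(buckets.items())}
```

A jump between two adjacent months would then point at when the slowdown began. That answers "when was the koji upgrade?" from the data side.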
19:10:24 <mmcgrath> koji2's busy, load around 4 but not horribly busy
19:10:37 <mmcgrath> not swapping or anything
19:10:40 <mbonnet> mmcgrath: is that the one the builders hit?
19:10:44 <dgilmore> mmcgrath: everything is on koji2 right now
19:10:55 <mmcgrath> mbonnet: yeah, ping me after the meeting if you want to look closer.
19:10:56 <mmcgrath> dgilmore: correct.
19:11:01 <dgilmore> mbonnet: builders, public, kojira
19:11:08 <mbonnet> dgilmore: gotcha
19:12:00 <Oxf13> anything else regarding newRepo tasks for today's meeting?
19:12:39 <Oxf13> alright.
19:12:40 <warren> Oxf13: are logs of each newRepo kept anywhere?
19:12:57 <dgilmore> warren: what kind of logs?
19:12:59 <Oxf13> there aren't really any logs
19:13:10 <Oxf13> there might be some minimal output from createrepo, but that's not the slow part
19:13:25 <warren> someplace instrumentation could be printed to
19:13:45 <Oxf13> warren: I suggest taking that to #koji after the meeting
19:13:51 <warren> ok
19:13:52 <Oxf13> if you're interested in adding profiling code
19:14:01 <Oxf13> probably much easier to just log to syslog from the kojid
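A lightweight alternative to a full profiler is to time each phase and log the result, then route the log to syslog. The context manager below is a sketch. The logger name and phase names are hypothetical, and nothing here is existing kojid code.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("kojid.newrepo")  # hypothetical logger name


@contextmanager
def timed(phase, timings):
    """Record and log how long the enclosed block takes."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        timings[phase] = elapsed
        log.info("newRepo phase %s took %.2fs", phase, elapsed)
```

Wrapping the init, createrepo and upload steps in `with timed("init", timings):` and so on gives per-phase numbers for every task. Attaching `logging.handlers.SysLogHandler(address="/dev/log")` to the logger (the usual socket path on Linux) sends them to syslog.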
19:14:41 <Oxf13> alright looks like we're done with newRepo, we've got some action items out of it.
19:14:46 <Oxf13> warren: did you have any further topics?
19:14:50 <warren> no
19:15:01 <Oxf13> #topic Open Floor
19:15:10 <Oxf13> anything else for the gallery?
19:16:09 <tibbs> Any reason for us not to make F-12 branches of new packages?
19:16:15 <tibbs> us == CVS admins?
19:16:32 <tibbs> Folks are asking for them; I'm not sure why, but I assume they understand where to find their builds.
19:17:36 <dgilmore> tibbs: you could
19:17:52 <dgilmore> tibbs: i'm sure people are asking because they don't know that they don't need to
19:18:24 <tibbs> I guess in that case they'll be confused about why they don't see their devel builds anywhere.
19:19:15 <tibbs> As long as there's no technical reason why I shouldn't do it, I'll go ahead and make F-12 builds when requested.
19:21:29 <Oxf13> tibbs: yeah, probably just a communication issue
19:22:45 <Oxf13> alright, thanks all for coming!
19:22:51 <Oxf13> #endmeeting