fedora-kernel
LOGS
18:00:31 <davej> #startmeeting
18:00:31 <zodbot> Meeting started Fri Mar 16 18:00:31 2012 UTC.  The chair is davej. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:31 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:44 <davej> #meetingname Fedora-kernel
18:00:44 <zodbot> The meeting name has been set to 'fedora-kernel'
18:01:07 <davej> #chair davej
18:01:07 <zodbot> Current chairs: davej
18:01:15 <davej> whee.
18:01:55 <davej> jwb, jforbes: ready ?
18:02:01 <jforbes> davej: indeed
18:02:44 <jwb> BORN READY
18:02:54 <jwb> ok, not really.  but sure
18:03:19 <davej> heh. alrighty.
18:03:40 <davej> so, let's start by recapping last meeting's discussion about the common bugs we've been seeing
18:04:14 <davej> I think since then, we've been able to attribute a lot more of the "weird shit happened" bugs to the i915 hibernate corruption problem
18:05:13 <davej> there's still no progress towards a fix, but it seems that at least keithp is thinking about some theories now.
18:05:24 <jwb> so on that
18:05:38 <jwb> we disabled the threaded compression code because we had no idea what was broken
18:05:54 <jwb> do we want to leave that off, or turn it back on given that it computes a CRC32 over the image?
18:05:57 <jforbes> Ahh, think we should back that out?
18:06:09 <davej> yeah, I think we've ruled that as a potential problem now
18:06:17 <jforbes> Okay, I will pull it
18:06:21 <jwb> i don't think disabling it has changed much, and Bojan is paying attention in the main bug now
18:06:25 <jwb> seems safe to drop
18:06:26 <davej> yeah
18:06:31 <jwb> it'll make Bojan happy :)
18:07:00 <davej> #action revert the 'disable threaded compression code' patch
18:08:33 <jwb> anything else on hibernate?
18:08:41 <davej> so I think we're blocking on keith/intel on this one, so we're just going to have to keep poking them periodically.
18:08:56 <jwb> it seems there was a brief discussion of IOMMU, but it sounded like that wasn't really at play either
18:09:22 <davej> yeah, I think keith's current theory on the GTT needing to be torn down is probably the way forward.
18:09:33 <davej> I looked at it myself, but was a bit out of my depth
18:10:16 <davej> ajax: you just missed the bit where we talked about how much i915 sucks.
18:11:09 <ajax> i wouldn't say i've been missing it, bob.
18:11:29 <davej> heh.
18:11:38 * EvilBob looks around
18:11:46 <ajax> i hear you made some progress on that though?
18:12:03 <davej> ajax: from what we can tell, the memory corruption is caused by stale GTT entries
18:12:18 <davej> so maybe that needs to be torn down before the hibernate happens
18:12:30 <davej> but keithp mentioned that there might be dragons there
18:12:52 <ajax> i'm sure there are, but it surely needs doing.  existing bz i should be assigned to, or should i make my own?
18:13:06 <davej> the existing bz's are a mess tbh
18:13:13 <davej> kernel_hibernate has become a pile-on
18:13:29 <jwb> GTT and GART are basically the same thing, right?
18:13:35 <davej> we're using that bug as a master tracker, rather than anything useful in the comments
18:14:29 <davej> jwb: yeah, just a fancy mmu of sorts
18:14:32 <ajax> k.  added to the queue.
18:15:12 <davej> don't think there's much else that needs discussing on this ? move on ?
18:15:24 <jwb> yeah
18:15:33 <davej> #topic irqpoll
18:15:36 <davej> Josh's favorite patch
18:15:41 <jwb> ok
18:16:00 <jwb> so we added a patch submitted upstream that made the kernel fall back to polling IRQs if it found a "stuck" one
18:16:10 <jwb> it did that, but it was really really verbose
18:16:26 <jwb> it also was much less tolerant of what it considered a "stuck" irq
18:16:47 <jwb> so instead of looking to see if it was unhandled 999,000 times, it decided to poll after 9
18:16:51 <jwb> (yes, 9)
18:16:59 <davej> heh
18:17:20 <jwb> that seems to cause some machines that are either a bit slow or busy to falsely trigger the fallback code
18:17:48 <jwb> however, really the original patch was only supposed to be a workaround for a specific broken PCI bridge
18:17:58 <jwb> ASM1083/ASM1085
18:18:19 <jwb> we've reworked it now to use a PCI quirk to only do this kind of behavior if that bridge is detected
18:18:45 <jwb> it took a couple of iterations, but it seems to be working well (in other words be completely benign) on non-ASM boxes
18:19:13 <jwb> for the machines that _do_ have ASM108x bridges, it does fall back to the polling behavior, but it makes the box somewhat laggy
18:19:14 <jforbes> Have we gotten any feedback from an ASM user yet?
18:19:19 <jwb> yeah, one
18:19:36 <jforbes> laggy and running is better than dead
18:19:55 <jwb> it's kind of expected that things are going to get laggy when you're polling, particularly if your graphics card happens to share an interrupt with the one that toggles the behavior
18:20:20 <jwb> i might be able to lessen the poll frequency a bit more and make it not quite so bad, but i need a willing user to test it out
18:20:27 <jwb> i'm sure i'll find one in not too long
18:20:29 <davej> so something that needs doing once the dust settles on this, is to go through the remaining irqpoll bugs that aren't asm108 and see if there's any commonality there.
18:20:35 <jwb> yes
18:21:09 <jwb> i think i already collapsed all the asm108 reports into a single bug
18:21:20 <jwb> so anything other than the one is a candidate for review
18:21:25 <davej> I'm wondering why we saw such an uptick in this warning over the last release or so. It may even be a kernel bug for all we know right now
18:21:58 <jwb> yeah, i'm thinking it might be.  again, finding someone impacted that doesn't say it's a one-off and is willing to test/bisect is the key i think
18:22:22 <davej> maybe we'll get lucky when we move to 3.3 ;)
18:22:32 <jwb> could be
18:22:42 <davej> speaking of, move onto that topic ?
18:22:50 <jwb> as for the patch itself, i think it needs more eyes and thoughts before it gets upstream
18:23:00 <jwb> it's fairly hacky at the moment
18:23:08 <jwb> anyway, yeah let's move on
18:23:11 <davej> ok
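The irqpoll rework described above can be sketched as a small Python model. This is not kernel code: the class, its fields, and the `storm` helper are hypothetical names made up for illustration, and the threshold is just a parameter (9 is the figure quoted in the discussion). The only point it illustrates is that the fallback to polling is gated on the bridge being one of the ASM108x parts named above, so other machines never switch over.

```python
# Hypothetical model of the behaviour discussed above; not the actual patch.
QUIRKY_BRIDGES = {"ASM1083", "ASM1085"}  # bridge names taken from the discussion


class IrqLine:
    """One interrupt line, with a counter of unhandled interrupts."""

    def __init__(self, bridge, threshold):
        self.bridge = bridge        # PCI bridge this line sits behind
        self.threshold = threshold  # unhandled interrupts tolerated
        self.unhandled = 0
        self.polling = False

    def note_interrupt(self, handled):
        """Record one interrupt; fall back to polling only on quirky bridges."""
        if handled or self.polling:
            return
        self.unhandled += 1
        if self.unhandled > self.threshold and self.bridge in QUIRKY_BRIDGES:
            self.polling = True


def storm(line, count):
    """Feed `count` unhandled interrupts into `line` and return it."""
    for _ in range(count):
        line.note_interrupt(handled=False)
    return line
```

In this simplified model a handled interrupt does not reset the counter, and a line behind any other bridge keeps counting without ever switching to polling, which matches the "completely benign on non-ASM boxes" goal stated above.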
18:23:21 <davej> #topic upcoming f15/f16 3.3 rebase
18:23:35 <jforbes> I think davej brought up a good question there, if there is another bug that has made the irqpoll problem so much more prominent, it might be that the patch is of much more limited use
18:24:08 <jwb> jforbes, yeah.  the only reason we're sticking to the patch at the moment is that upstream already did quite a bit of analysis on that piece of hardware
18:24:22 <davej> it'll still be of use for asm108 I suspect.
18:24:49 <davej> anyway, let's see how it works out.
18:25:18 <davej> so, 3.3 will probably be final sometime next week.
18:25:53 <davej> hopefully in time for the beta.
18:26:18 <davej> jwb mentioned earlier that it might be worth jumping on it as soon as it's released for f15/f16 instead of waiting for .1
18:26:53 <jwb> it's a thought i had anyway.  we've been carrying the wireless stack from 3.3 for a while in f16 now already
18:26:59 <jforbes> I don't see a problem with that.  We follow upstream closely, and the stable queue.  We can easily grab patches before .1 comes out if needed
18:27:33 <jwb> yeah.  and i'm still leery of .1 releases in general anyway
18:27:42 <davej> my only concern here is the (small) window where a security bug might come in, and 3.3 regresses booting for someone, so they have to go back to 3.2 without the fix.
18:28:06 <jwb> is that going to be much different than with 3.3.1?
18:28:09 <davej> but we face that problem every time anyway, even if we wait
18:28:14 <jwb> right
18:28:29 <jwb> i'd be willing to build some f16 3.3 kernels and put them on my people page
18:28:41 <jwb> blog about them, get some informal feedback
18:29:02 <jforbes> There's also the question of security bug severity.  We cover all CVEs to make sure we are not exposed, but in reality, a majority of CVEs are corner cases most people will never be exposed to
18:29:19 <davej> yeah, it's rare that we see something really severe
18:30:19 <davej> so I think we're all in the same mindset here that moving forward is the best option.
18:30:37 <jwb> sounds good to me
18:30:48 <davej> #action rebase f15/f16 to 3.3 when released.
18:31:13 <jwb> i figured we wait a week or so after 3.3 hits f16 stable before we rebase f15?
18:31:26 <jwb> or maybe that's not worthwhile anymore.  seems we don't have a ton of f15 users
18:31:47 <davej> I've noticed f15 updates tend to sit waiting for karma a lot longer
18:31:54 <jforbes> I think so, I think your people page idea might be good for a quick 3.2 + patch if we find a really severe issue
18:32:06 <jwb> true
18:32:35 <jforbes> Yeah, there seem to be very few F15 users, qemu updates were the same way.  The people who all jumped on F15 have moved to F16, and a lot skipped it
18:34:26 <jwb> anything else on rebasing?
18:34:37 <davej> not from me.
18:34:50 <davej> oh
18:34:54 <davej> one idea I had.
18:35:15 <davej> once we get 3.3 landed in 16, shall we do a mass "please retest" on the open 16 bugs ?
18:35:29 <davej> we've kinda done them by hand up until now
18:35:32 <jwb> yeah, probably
18:35:49 <jwb> going through old bugs asking that by hand at the start of a month gets old
18:35:59 <davej> there will obviously be some that we know aren't going to be fixed by the rebase, but they should be the minority
18:36:05 <jforbes> That's a good idea
18:36:33 <davej> and then maybe 2-3 weeks later, if they're still needinfo, close them as insufficient data (using judgment, rather than automated)
18:36:56 <jforbes> that works (there's an automated way?)
18:37:09 <davej> yeah there's a "change multiple bugs" link at bottom of a bug list
18:37:12 <jwb> yeah.  i've been waiting 2-3 months before i close out a bug like that, but it seems much too long
18:37:54 <davej> so looking at things right now, open bugs: f15:349 f16:509 f17:31 rawhide:144
18:37:58 <jforbes> 2-3 months is way too long.  Most people who will bother responding will do so in the first week, 1 month is more than enough for automated.  They can reopen if they ever get to it
18:38:14 <jwb> grr... i had f16 under 500 last week
18:38:30 <jforbes> jwb: i was under 500 yesterday, just a timing thing
18:38:33 <davej> yeah, me too. then a couple hours later, it went back over.
18:38:52 <davej> we've closed 53 f16 bugs this last week alone.
18:39:31 <jwb> ok, so i think we settled on asking about the rebase, and closing if no response in 2-3 weeks
18:40:25 <davej> #action after 3.3 rebase, mass update open bugs asking to retest, and close if no response after 2-3 weeks, if appropriate.
18:40:44 <davej> the 'if appropriate' part there is obviously where we know something hasn't been fixed
18:42:05 <davej> ok, think that's it for the rebase.
18:42:17 <davej> and that's all I have on the agenda I think.
18:42:29 <davej> #topic open floor
18:42:41 <jwb> what about DEBUG_VM?
18:43:03 <davej> I think leaving that on, at least for a while might be beneficial.
18:43:14 <brunowolff> Will f18 go to 3.4 right after the merge window closes?
18:43:14 <davej> that it breaks fglrx seems to be the only real fallout so far
18:43:36 <jwb> brunowolff, probably during the merge window in fedora git, but it might not get built until -rc1
18:43:40 <jwb> i think that'll be up to jforbes
18:44:30 <jforbes> Yes, depending on the quality of various points in the merge window it will move before rc1, but it will certainly move at rc1
18:44:50 <brunowolff> Thanks.
18:45:04 <davej> linux-next has at least lowered the number of compile failures we get pre -rc1
18:45:07 <jforbes> I am not going to spend too much time debugging build issues before rc1, but if it builds and boots locally I will push it
18:48:38 <jwb> any other questions from anyone?
18:49:00 <drago01> can we shut up alsa? ;)
18:49:13 <jwb> oh, you asked me about that last time, right?
18:49:17 <drago01> yeah
18:49:31 <jwb> ok, let me look at the logs for the last meeting and i'll email upstream about it
18:49:41 <drago01> ok thanks
18:49:54 <jwb> #action jwb to look at why alsa is so chatty
18:49:57 <davej> on the subject of alsa, we have a lot of sound related bugs that basically get no attention from us at all.
18:50:12 <jwb> davej, i'm not a meeting chair.  the above didn't work
18:50:17 <davej> I think we need to be more aggressive about pointing the alsa people at them
18:50:23 <davej> #action jwb to look at why alsa is so chatty
18:50:23 <jwb> yes, agreed
18:50:43 <davej> I've sort of been doing that more for the networking related bugs this last month
18:50:57 <davej> the netdev guys have been pretty responsive, and easy to deal with
18:51:11 <jwb> when we do point them at things, a lot of the time we get "load with model=<something>" and it works
18:51:18 <jwb> which leads to the question of why it can't just figure that out
18:51:25 <jwb> and if we should be creating udev quirks
18:51:28 <davej> yeah, that is annoying.
18:51:36 <jwb> anyway, more stuff to ask them
18:52:20 <davej> perhaps going through and tagging all the sound bugs so we can present them as a list might be useful.
18:52:36 <davej> we talked about doing something like this generally before, but never really did it.
18:52:43 <davej> (using whiteboard)
18:53:09 <jwb> does mucking with a whiteboard that has an abrt hash in it break abrt?
18:53:21 <davej> ugh, I hope not
18:53:56 <jwb> if it does, we could use "alsa:" as the start of the subject
18:54:07 <davej> or use the keywords field
18:54:10 <jwb> s/subject/title
18:54:13 <jwb> keywords are pre-defined
18:54:17 <davej> ah, crap
18:54:22 <jwb> i don't think it lets you put arbitrary ones in there
18:54:29 <davej> yeah looks like you're right
18:54:41 <jwb> there's 'Devel Whiteboard'
18:54:44 <jwb> i have no idea what that is
18:54:46 <davej> would be nice if bugzilla had a subcomponent field
18:54:57 <jwb> yes.  it would
18:57:00 <jforbes> We could use subjects to make bugsearch more effective
18:57:22 <jwb> like the "alsa:" proposal in the title?
18:57:28 <jforbes> alsa: netdev: mm: etc
18:57:33 <jforbes> jwb: yeah
18:57:43 <jwb> it'd match how upstream does patches too.  ;)
18:58:05 <davej> yeah. also, I've been trimming some of them so that they line up better in the lists. (ie, removing 'kernel:' '[abrt]' etc so they all have a uniform pattern)
18:58:19 <davej> it's made it easier to see dupes in some cases
18:58:26 <jwb> i'm good with using a subsystem subject
18:58:45 <davej> ok, let's give that a try.
18:58:58 <jwb> maybe some day we'll even have community triagers that can triage bugs and put the appropriate subject there
18:59:18 <davej> we live in hope
18:59:57 <davej> ok, let's call this done.
19:00:04 <jforbes> if we came up with a list of subject prepends we want, I can throw up a quick "kernel bug triage page" and throw it to the test list
19:00:26 <davej> jforbes: there's one already (linked off the main kernel page)
19:00:30 <davej> so maybe update that
19:00:34 <jforbes> Can do that
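The title cleanup agreed on above (strip noise such as 'kernel:' and '[abrt]', then lead with a subsystem tag like "alsa:" or "netdev:") could look roughly like the following. This is a sketch only: `normalize_title` is a hypothetical helper, the list of noise prefixes is an assumption based on the two examples mentioned in the discussion, and nothing here talks to Bugzilla.

```python
import re

# Assumed noise prefixes, based only on the examples mentioned in the meeting.
NOISE = re.compile(r"^\s*(?:\[abrt\]|kernel:)\s*", re.IGNORECASE)


def normalize_title(title, subsystem):
    """Strip leading '[abrt]' / 'kernel:' noise and prepend 'subsystem: '."""
    previous = None
    # Repeat until nothing changes, so stacked prefixes are all removed.
    while previous != title:
        previous = title
        title = NOISE.sub("", title, count=1)
    tag = subsystem + ": "
    if title.lower().startswith(tag.lower()):
        return title  # already tagged, leave it alone
    return tag + title
```

For example, `normalize_title("[abrt] kernel: hda_codec: no sound after resume", "alsa")` gives `"alsa: hda_codec: no sound after resume"`. Picking the right subsystem still takes a human triager; the helper only makes the titles uniform once that choice is made.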
19:00:35 <davej> #endmeeting