cockpit_weekly_meeting_2016-12-05
LOGS
14:03:41 <andreasn> #startmeeting Cockpit weekly meeting 2016-12-05
14:03:41 <zodbot> Meeting started Mon Dec  5 14:03:41 2016 UTC.  The chair is andreasn. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:41 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
14:03:41 <zodbot> The meeting name has been set to 'cockpit_weekly_meeting_2016-12-05'
14:03:44 <andreasn> .hello andreasn
14:03:45 <zodbot> andreasn: andreasn 'Andreas Nilsson' <anilsson@redhat.com>
14:04:10 <dperpeet> .hello dperpeet
14:04:11 <zodbot> dperpeet: dperpeet 'None' <dperpeet@redhat.com>
14:05:04 <andreasn> #topic Agenda
14:05:43 <mvollmer> * Network checkpoints status
14:06:10 <andreasn> * NFS Server
14:07:13 <andreasn> maybe that's it. Ok, let's run with that
14:07:21 <andreasn> #topic Network checkpoints status
14:07:33 <mvollmer> okay
14:07:45 <mvollmer> so we had checkpoints for some time now
14:07:58 <mvollmer> and people are running into the "edge cases"
14:08:08 <andreasn> what kind of edge cases?
14:08:09 <mvollmer> which are of course some people's main case
14:08:31 <mvollmer> such as making changes that take longer than we allow
14:08:45 <andreasn> when does that happen?
14:08:46 <mvollmer> cockpit gives up after 15 seconds
14:08:59 <mvollmer> the one real case I have seen is where DHCP takes 40 seconds
14:09:06 <andreasn> ah, I see
14:09:07 <dperpeet> in retrospect, that does seem like a tight window for some cases
14:09:30 <mvollmer> yes, we should look at tuning the timeouts
14:09:48 <dperpeet> I would like to discuss a usability aspect of this
14:09:53 <mvollmer> what I have been looking at first is to have a path through the UI that works for any change, no matter how slow
14:09:55 <andreasn> so at 15 seconds it times out and rolls back, just because the DHCP is slow?
14:09:57 <dperpeet> but let me know when you have talked about what you wanted to say first
14:10:16 <dperpeet> andreasn, correct
14:10:25 <mvollmer> it's a tradeoff
14:10:51 <mvollmer> but since the UI is pretty clear about what is going on during a checkpoint
14:11:01 <mvollmer> ("Testing connection")
14:11:18 <mvollmer> we can make that time longer without confusing people too much
14:11:39 <andreasn> is one minute a reasonable time?
14:11:41 <mvollmer> and if people give up at that point and reload or just go away
14:11:46 <mvollmer> that's harmless
14:11:58 <mvollmer> since the rollback happens in any case
14:12:13 <mvollmer> andreasn, I have no idea
14:12:26 <dperpeet> well, that leads into my thought: can we trigger a rollback early?
14:12:26 <mvollmer> people are talking about DHCP taking several minutes
14:12:47 <dperpeet> i.e. more like an "undo"
14:13:01 <mvollmer> dperpeet, no, we are disconnected at that point
14:13:19 <dperpeet> how about we allow overriding the rollback
14:13:25 <dperpeet> but only make that work on purpose
14:13:27 <mvollmer> let me describe one more thing
14:13:30 <dperpeet> ok
14:13:31 <dperpeet> :)
14:13:52 <andreasn> but is the option between showing "testing connection" for everyone for 5 minutes, and having it fail for those with slow DHCP servers?
14:14:06 <mvollmer> so, one idea is to let the rollback happen, and then the user gets an opportunity to make the change anyway
14:14:18 <mvollmer> that was in the original design
14:14:39 <mvollmer> andreasn, yes
14:14:41 <dperpeet> mvollmer, that sounds good
14:14:48 <dperpeet> combined with a reasonable timeout on the first attempt
14:14:54 <dperpeet> I don't think anything >30 seconds makes sense there
14:15:16 <dperpeet> I seriously consider that my connection has died after 10-15 seconds
14:15:24 <dperpeet> I don't think I would wait for a minute
14:15:41 <mvollmer> except if you know that your DHCP is broken and is slow
14:15:49 <dperpeet> right
14:15:56 <dperpeet> but then I could have it fail
14:16:01 <dperpeet> double check my dhcp
14:16:10 <dperpeet> and tell cockpit to go without a rollback
14:16:13 <dperpeet> or specify a custom timeout
14:16:16 <dperpeet> on that second try
14:16:26 <dperpeet> the "try anyway" could have an input for a timeout
14:16:26 <mvollmer> uhh
14:17:12 <mvollmer> dperpeet, good idea, but that's approaching a spaceship cockpit, no?
14:17:21 <dperpeet> hm
14:17:31 <mvollmer> one more thing:
14:17:58 <mvollmer> if a rollback is really slow, cockpit used to time out, and you wouldn't get a second try
14:18:00 <andreasn> setting the timeout time feels very fiddly, because I assume there is no good way to measure the speed of the DHCP-server
14:18:17 <dperpeet> I believe that valid workflows should have precedence
14:18:26 <dperpeet> so if you do the right thing, even if it's slow, it should work
14:18:47 <dperpeet> on the other hand, you could say that preserving the connection is also pretty important
14:18:57 <mvollmer> we could just increase the rollback timeout with every try
14:19:00 <dperpeet> because otherwise you might not be able to "get back in"
14:19:04 <mvollmer> so first try, 15 seconds
14:19:13 <mvollmer> if the user presses "Do it anyway"
14:19:14 <dperpeet> but that runs into cockpit disconnect, right?
14:19:20 <mvollmer> we use a timeout of 90 seconds
14:19:39 <andreasn> could you have the system just increase the time between several tests by itself?
14:19:55 <dperpeet> andreasn, I don't think that's a good idea
14:20:01 <andreasn> it starts by testing for 15 seconds. Realizes things don't work, starts a 30 sec test etc.
14:20:05 <dperpeet> usually it's probably actually a wrong setting
14:20:10 <dperpeet> you'd just be disconnected for ages
14:20:13 <andreasn> right
14:20:15 <mvollmer> andreasn, it's not the time between testing the connection, but before rolling back the change
14:20:22 <andreasn> ah, I see
14:20:27 <andreasn> sorry for the confusion
14:20:40 <dperpeet> mvollmer, I like increasing on the second try
14:20:47 <dperpeet> and just give up if 90 seconds don't work
14:20:58 <dperpeet> or keep increasing, but display that time
14:21:01 <dperpeet> so the user knows what to expect
14:21:13 <mvollmer> we could have three tries: one with 15 seconds
14:21:29 <mvollmer> fails -> "This didn't work, would you like to try again and wait a bit longer?"
14:21:48 <mvollmer> also fails -> "This didn't work, would you like to do it without any timeout"
14:22:23 <andreasn> "Test again"
14:22:49 <andreasn> so that would make it a 3rd button?
14:24:27 <mvollmer> no, a second dialog
14:24:35 <mvollmer> with a slightly different wording
14:24:57 <mvollmer> the one we have says "This will disconnect you"
14:25:15 <mvollmer> the new one would say "This looks like it might disconnect you"
14:25:28 <mvollmer> not sure if this is worth it
14:25:37 <andreasn> it's worth a shot
14:25:55 <dperpeet> the difference should be obvious
14:25:59 <andreasn> but how does the system know what dialog to trigger?
14:26:01 <mvollmer> let's recap why we don't just increase the timeout to 5 minutes
14:26:27 <mvollmer> andreasn, they would always come in order, first the weak one, then the hard one
14:26:41 <andreasn> but how do you trigger the hard one?
14:26:41 <dperpeet> if something is obviously broken, we don't want to wait for a long time to have the system roll back
14:26:51 <mvollmer> we want people to know that there is a good chance that the connection comes back
14:27:03 <dperpeet> mvollmer, my comment was for the recap
14:27:22 <mvollmer> dperpeet, can you repeat?
14:27:32 <dperpeet> if something is obviously broken, we don't want to wait for a long time to have the system roll back
14:27:46 <mvollmer> and why not? :-)
14:27:51 <dperpeet> therefore we shouldn't have a very long timeout to begin with
14:28:06 <mvollmer> because people would panic and start driving to the datacenter
14:28:13 <dperpeet> why should I have to wait minutes if I hit a wrong button
14:28:25 <mvollmer> as punishment?
14:28:38 <dperpeet> ...
14:28:44 <dperpeet> I'll let andreas answer that one
14:28:49 <dperpeet> why we don't want to punish users
14:28:53 <mvollmer> this should be rare, and if you really switch off the wrong interface, it doesn't matter much to wait a few minutes
14:29:00 <dperpeet> I think it does
14:29:06 <mvollmer> if you know what is going on
14:29:10 <dperpeet> I expect a web ui to be responsive
14:29:15 <andreasn> so five minutes is a long time to wait just because you hit the wrong button
14:29:28 <andreasn> in every single case
14:29:38 <mvollmer> what about 90 seconds?
14:30:04 <andreasn> the whole interaction would feel long, unresponsive, it would give the feeling that the server is annoying and is working against you
14:30:09 * mvollmer is devil's advocate
14:30:23 <dperpeet> I don't think anything >30 seconds is good for the first try
14:30:39 <mvollmer> so we would lose the "this is awesome" feeling
14:30:45 <andreasn> yes
14:30:56 <mvollmer> okay, I am happy to hear that
14:31:05 <mvollmer> i agree, of course
14:31:23 <andreasn> is there an issue open about this?
14:31:47 <mvollmer> yes and no
14:32:03 <mvollmer> people have trouble with the 15 second timeout
14:32:18 <mvollmer> and they are asking for a 90 second timeout
14:32:39 <andreasn> that's a minute and a half, right?
14:32:43 <andreasn> hm
14:32:44 <dperpeet> mvollmer, mod_proxy proxy workers have a default disconnect timeout of 30 seconds, fyi
14:33:19 <dperpeet> mvollmer, and the apache server I think defaults to 60 seconds
14:33:19 <mvollmer> I am offering this: https://github.com/cockpit-project/cockpit/pull/5472
14:33:35 <mvollmer> NetworkManager times out DHCP after 45 seconds
14:34:20 <mvollmer> so, this is confusing (or I make it confusing)
14:34:28 <mvollmer> thanks for the feedback on the timeout tuning
14:34:44 <mvollmer> let's take the rest off-line if there are more questions
14:34:46 <mvollmer> okay?
14:34:48 <dperpeet> agreed
14:34:51 <andreasn> sounds good
14:34:59 <andreasn> tricky questions, but good to discuss
14:35:00 <mvollmer> one more thing
14:35:30 <mvollmer> checkpoints have some bugs that make it look as if Cockpit simply misconfigures everything
14:35:40 <mvollmer> so the rollback is not perfect
14:36:05 <mvollmer> so I'll switch checkpoints off for complicated things like creating bonds
14:36:16 <dperpeet> yes, I think that's good for the time being
14:36:45 <dperpeet> although we need to consider how we make users aware which action is rollback protected and which ones aren't
14:36:59 <dperpeet> or at least make sure we don't make it sound like everything will be rolled back
14:36:59 <mvollmer> do we?
14:37:39 <mvollmer> there is no indication of checkpoints/rollback in the UI until they actually hit
14:38:26 <mvollmer> or do you think that people will get so used to having their asses saved that they get careless?
14:38:27 <dperpeet> it's ok right now
14:38:39 <dperpeet> we just have to watch release note wording
14:38:48 <mvollmer> alright, yes.
14:39:00 <dperpeet> what we wrote so far works
14:39:07 <dperpeet> I'm just saying to keep it in mind
14:39:09 <andreasn> it's pretty cool
14:39:15 <dperpeet> yup
14:40:22 <mvollmer> topic timeout?
14:40:37 <andreasn> hahaha
14:40:39 <andreasn> yes
14:40:51 <andreasn> #topic NFS Server
14:41:25 <github> [cockpit] mvollmer opened pull request #5554: test: Fix race in check-storage-luks (master...storaged-fix-luks-password-race) https://git.io/v18Xd
14:41:39 <andreasn> so based on the work done by dperpeet, sgallagh and others in the Fedora Server Group, I started working on the Cockpit part of it
14:41:41 <andreasn> https://docs.google.com/document/d/1jLyKsECdHdlKltmHGgf_-iOKj-hj4Qjbh5Zgm7a-eMc/edit
14:41:56 <andreasn> https://github.com/cockpit-project/cockpit/wiki/Feature:-NFS-Server
14:42:08 <andreasn> mostly collecting prior art right now
14:43:09 <andreasn> if anyone knows of any good NFS UIs, feel free to add them to that page
14:43:24 <andreasn> going to distill the requirements into stories next
14:43:33 <andreasn> and I think that was it on that
14:43:53 <larsu> haha, "good nfs uis"
14:44:13 <dperpeet> :)
14:44:18 <andreasn> file sharing UIs is a better description maybe
14:44:31 <andreasn> there are some NAS ones that are pretty all right
14:44:40 <larsu> right
14:45:08 <larsu> just making a little stab towards the complexity of nfs :)
14:45:12 <dperpeet> heh
14:45:26 <dperpeet> I hope we can make it work without using the wizard pattern
14:45:35 <larsu> haha
14:45:36 <andreasn> me too
14:45:44 <andreasn> oh yes, nfs specs, ugh :)
14:46:46 <dperpeet> andreasn, do you think we can make this work in iterations?
14:46:56 <andreasn> probably
14:47:00 <dperpeet> or do you want to have a pretty good overall picture early on
14:47:15 <andreasn> overall is good to have from a design perspective
14:47:26 <andreasn> but implementation can happen in steps
14:47:29 <dperpeet> yeah, I agree - we shouldn't miss any big stuff
14:47:51 <dperpeet> you can probably ping the server list pretty early with a first iteration
14:47:58 <dperpeet> so we can run it by everyone
14:48:04 <dperpeet> and see if we missed anything obvious
14:48:11 <andreasn> sounds good
14:48:51 <andreasn> all right, I think that's it for that
14:48:55 <andreasn> #topic Open floor
14:49:10 <mvollmer> tomorrow is holiday in Finland
14:49:21 <mvollmer> 99th birthday of the country
14:50:34 <andreasn> nice
14:50:40 <andreasn> happy birthday, Finland
14:51:36 <andreasn> all right, I guess that's all
14:51:49 <andreasn> thanks everyone!
14:51:53 <dperpeet> thanks!
14:52:52 <andreasn> #endmeeting