infrastructure
LOGS
18:00:04 <nirik> #startmeeting Infrastructure (2014-08-21)
18:00:04 <zodbot> Meeting started Thu Aug 21 18:00:04 2014 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:04 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:05 <nirik> #meetingname infrastructure
18:00:05 <nirik> #topic aloha
18:00:05 <nirik> #chair smooge relrod nirik abadger1999 lmacken dgilmore mdomsch threebean pingou puiterwijk
18:00:05 <zodbot> The meeting name has been set to 'infrastructure'
18:00:05 <zodbot> Current chairs: abadger1999 dgilmore lmacken mdomsch nirik pingou puiterwijk relrod smooge threebean
18:00:10 * threebean is here
18:00:14 * puiterwijk is around
18:00:20 * relrod here
18:00:25 <mpduty> is here
18:00:29 * banas is here as well
18:00:44 * bochecha is here
18:01:04 * lmacken 
18:01:19 * adimania is here
18:01:51 * lanica is here for the infra meeting.
18:02:02 <nirik> hello everyone. ;) let's go ahead and get started...
18:02:05 <nirik> #topic New folks introductions and Apprentice tasks
18:02:14 <nirik> any new folks? or apprentices with questions/comments/ideas?
18:03:08 <mpduty> can there be an IRC class on ansible?
18:03:15 <banas> I can't remember if this is the slot I'm supposed to update - but work on GG is going good, as per schedule :)
18:03:17 <sart> is here in an un-italicized fashion
18:03:43 <nirik> mpduty: we had one a while back, we could do another if there's interest, sure. ;)
18:04:49 <banas> we had to skip two gg meetings because I was at Flock, and now the whole team is sick, so we just decided to do it next week.
18:04:52 <mpduty> nirik, well I don't know but some newcomers would be able to learn things faster I feel
18:05:08 <mpduty> including myself
18:05:20 * mirek-hm is late, but here
18:05:26 <nirik> mpduty: sure, we could ask on the list if folks are interested. I'm happy to do some intro one. We also had a nice intro at flock that was recorded... ;)
18:05:29 <sart> <-- long-time newcomer
18:05:35 * adimania thinks that it might be helpful to everyone
18:06:04 <nirik> https://www.youtube.com/watch?v=sCXCgsmQuSY&feature=youtu.be
18:06:18 * lbazan here
18:06:25 <mpduty> thanks I shall go through that
18:06:33 <nirik> welcome sart. ;)
18:07:06 <sart> thanks and ty for the intro vid
18:08:11 <nirik> ok, any other new folks or questions?
18:08:31 * pjones waves hello
18:08:48 <nirik> hey pjones. :)
18:09:04 <nirik> #topic Applications status / discussion
18:09:18 <nirik> pingou wasn't able to make it but had a few things for me to pass on:
18:09:28 <nirik> #info new pkgdb2 release in production this week.
18:09:50 <nirik> #info critpath lists are now working again.
18:10:22 <nirik> #info infrastructure jenkins is updated to the latest plugins, etc.
18:10:29 <nirik> also, tflink wanted to share:
18:10:46 <nirik> #info taskotron in stg/dev is going along fine, probably won't be in production until after alpha now.
18:11:06 <nirik> any other applications news?
18:11:32 <threebean> #info we're almost at 10 million fedmsg messages all time
18:11:36 <threebean> https://apps.fedoraproject.org/datagrepper/
18:11:38 <threebean> :P
18:11:56 <banas> #info GlitterGallery release process planned to be started within this week
18:12:09 <nirik> threebean: nice. :)
18:12:13 <threebean> also, we started a scratch pad for discussing promoting FMN from an opt-in service to an opt-out service (for packagers)
18:12:15 <threebean> http://piratepad.net/FMN-opt-out
18:12:20 <lmacken> threebean: woot!
18:12:47 <lmacken> threebean: 10mil & 1TiB+ of traffic? :)
18:12:50 <nirik> do we need bodhi2 in place?
18:13:07 <threebean> optimizations pushed out last friday seem to have FMN keeping its workload under control.
18:13:11 <threebean> .tiny https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=notifs-backend01.phx2.fedoraproject.org;plugin=fedmsg;plugin_instance=hub;type=queue_length;type_instance=FMNConsumer_backlog;begin=-706400
18:13:12 <zodbot> threebean: http://tinyurl.com/le9xywt
18:13:35 <pjones> threebean: and *very* few of those are me trying to game badges
18:13:44 <threebean> pjones: almost none
18:13:58 <threebean> ;)
18:14:35 <lmacken> threebean: didn't the optimizations DoS FAS though?
18:15:22 <threebean> they did :/  (although only once at startup)
18:15:28 <nirik> we have been seeing fas servers complain the last few days...
18:15:36 <nirik> but I haven't had a chance to look at what might be causing it.
18:15:57 <threebean> many threads all trying to cache fas at the same time - fixed in git http://da.gd/BW60H
18:16:21 <threebean> nirik: I think only some of those are related to fmn, which should only be doing this at startup.
18:16:39 <threebean> I'd think the fas problems would have gone down due to the fedmsg-fasclient stuff pingou pushed out..  :/
18:16:54 <nirik> yeah. there's a lot less runs of that for sure.
18:17:06 <nirik> but something is still causing them to start swapping.
18:17:14 <nirik> more investigation needed
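The startup problem described above (many threads all trying to cache FAS at the same time) is a common cache-stampede pattern. The following is a minimal sketch of one usual way to avoid it, not the actual fix referenced at http://da.gd/BW60H. The names `SharedCache`, `fake_fetch` and `demo` are hypothetical, and `fake_fetch` stands in for whatever expensive call fills the cache.

```python
import threading


class SharedCache:
    """Fill a cache once, even when many threads request it at the same time."""

    def __init__(self, loader):
        # loader: hypothetical callable that performs the expensive fetch
        self._loader = loader
        self._lock = threading.Lock()
        self._data = None
        self._loaded = False

    def get(self):
        # Fast path: the cache is already filled, so no locking is needed.
        if self._loaded:
            return self._data
        with self._lock:
            # Re-check inside the lock: another thread may have filled
            # the cache while this one was waiting.
            if not self._loaded:
                self._data = self._loader()
                self._loaded = True
        return self._data


def demo(num_threads=8):
    """Start num_threads readers and return how many fetches actually ran."""
    calls = []

    def fake_fetch():
        # Stand-in for the expensive remote call (assumption for illustration).
        calls.append(1)
        return {"alice": "alice@example.org"}

    cache = SharedCache(fake_fetch)
    threads = [threading.Thread(target=cache.get) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls)
```

With this shape, only the first thread through the lock performs the fetch; the rest wait briefly and then reuse the result, so the backing service sees one request at startup instead of one per thread.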
18:18:39 <nirik> we have also seen recently openvpn hitting cpu limits.
18:18:45 <nirik> might be related, not sure.
18:19:02 <lmacken> nirik: were you thinking that was related to outbound email?
18:19:18 <nirik> thought so at first, but I was looking at the wrong thing. ;)
18:19:20 <lmacken> i've been seeing a lot more fedmsg error spam, along with the usual batch of fedora-packages 200k mails :P
18:19:23 <lmacken> oh, okay.
18:19:36 <nirik> If you sniff the traffic on the openvpn tun device you only see traffic going to the node you are on.
18:19:50 <nirik> if you sniff the eth device you see all the vpn streams, etc.
18:20:05 <nirik> it looked last night like proxy02 was pushing a lot more than the others, but that might have just been at that time
18:20:33 <nirik> I'll keep trying to isolate it.
18:21:16 <nirik> Any other application news? :)
18:21:21 <lmacken> when do we freeze?
18:21:31 * threebean forgot about freeze
18:22:14 <nirik> when we have a viable test compose. ;)
18:22:16 <pjones> some time after dgilmore winds up having working trees for all the stuff.
18:22:23 <pjones> which is... taking a while.
18:22:24 <nirik> yeah... hopefully soon.
18:22:32 <nirik> so many bugs.
18:22:56 <pjones> nirik: looks like createImage is still busting on cloud-atomic?
18:23:03 <pjones> http://koji.fedoraproject.org/koji/taskinfo?taskID=7435960 <-- the screenshot is horrifying
18:23:11 <nirik> pjones: yeah, boggling. ;(
18:23:20 <pjones> but now I'm just interrupting
18:23:46 <nirik> we all like a good horror sideshow. ;)
18:24:00 <nirik> #topic Sysadmin status / discussion
18:24:06 <nirik> so, on the sysadmin side of things...
18:24:22 <nirik> #info retrace servers have been handed off to retrace folks. They are setting things up on them now.
18:24:32 <nirik> thanks to smooge for getting those all installed.
18:25:14 <nirik> There's the openvpn and fas hiccups we have been seeing, but we already mentioned those.
18:25:25 <lmacken> pjones: if only that used chained exceptions in py3, that screenshot might be a bit more useful :\
18:25:34 <nirik> We still have qa09 and virthost-comm03 to set up as new machines.
18:25:50 <pjones> lmacken: Eh, I think it's just saying the image the python loaded from is corrupt.  I doubt the exception is meaningful at all.
18:26:03 <lmacken> pjones: yeah, looks like an exception in the exception handler :\
18:27:37 <nirik> there's also a bit of discussion ongoing about netapp space (since we are getting very dangerously full on our koji storage)
18:27:47 <nirik> hopefully there will be some good news there sometime soon.
18:27:58 <threebean> hm, turns out that our staging and production environments weren't firewalled off from one another anymore after the switch to ansible.  that's fixed up now.
18:28:23 <nirik> oh yeah, thanks much for fixing that.
18:28:39 <threebean> np.  things might behave funny while it shakes out.
18:28:50 <nirik> I've so far not seen any fallout from that...
18:28:56 <nirik> but that doesn't mean we won't hit some
18:30:11 <nirik> #info memcached is now set up to restart on exit and also is monitored in nagios
18:30:35 <nirik> #info staging and prod are now once again blocked from talking to each other via firewall rules.
18:31:28 <nirik> #topic nagios/alerts recap
18:31:33 <nirik> this should be fun this week...
18:31:37 * nirik digs up link
18:32:11 <nirik> .tiny https://admin.fedoraproject.org/nagios/cgi-bin//summary.cgi?alerttypes=3&displaytype=3&eday=15&ehour=24&emin=0&emon=5&esec=0&eyear=2014&host=all&hostgroup=all&hoststates=3&limit=25
18:32:11 <zodbot> nirik: http://tinyurl.com/ms5u5qm
18:32:17 <nirik> I think that's right.
18:32:59 <nirik> no, not right.
18:33:30 <nirik> .tiny https://admin.fedoraproject.org/nagios/cgi-bin//summary.cgi?report=1&displaytype=3&timeperiod=last7days&smon=8&sday=1&syear=2014&shour=0&smin=0&ssec=0&emon=8&eday=21&eyear=2014&ehour=24&emin=0&esec=0&hostgroup=all&servicegroup=all&host=all&alerttypes=3&statetypes=3&hoststates=7&servicestates=120&limit=25
18:33:31 <zodbot> nirik: http://tinyurl.com/koaopnc
18:34:03 <nirik> almost all of those are due to the vpn issue.
18:34:15 <nirik> the server hits 100% cpu and starts dropping packets.
18:35:00 <nirik> #topic Upcoming Tasks/Items
18:35:00 <nirik> https://apps.fedoraproject.org/calendar/list/infrastructure/
18:35:12 <nirik> anything upcoming folks would like to schedule or note?
18:35:18 <nirik> the freeze is hopefully coming up soon.
18:36:19 <nirik> Also, we are in the early planning stages for another FAD in december...
18:36:45 <nirik> https://fedoraproject.org/wiki/FAD_MirrorManager2_FAS3_2014
18:36:54 <adimania> would remote participation be possible?
18:36:59 <nirik> absolutely.
18:37:41 <adimania> cool. Flying is very expensive in my part of the world :(
18:37:42 <nirik> it's very subject to change right now, just sounding out if it will be possible and useful.
18:37:56 <threebean> just to note:  i'm on some on-again, off-again vacation for the next two weeks. (I'll be around on tuesdays and thursdays)
18:38:10 <lmacken> yeah, there's a good chance that I won't be in Denver during that time
18:38:18 <nirik> threebean: cool.
18:38:24 <sart> I'm definitely interested if I'm not at work (or possibly if so but work is completely dead, which it may be that time of year... )
18:38:29 <lmacken> so a backup location is definitely needed
18:38:29 <nirik> lmacken: fun. :) So, we might want to pick another place or time
18:39:10 <nirik> yeah, we will come up with something. ;)
18:39:44 <nirik> Oh, I should mention that I am likely going to be traveling the latter part of september...
18:40:23 <nirik> #topic Open Floor
18:40:38 <nirik> anyone have anything for open floor? comments, suggestions, ideas, favorite cookies?
18:41:47 <lanica> mmm…cookies.
18:42:10 * adimania eats
18:42:16 <threebean> encrypted browser cookies?
18:42:16 * adimania nom nom nom
18:42:28 <nirik> chocolate chip. ;) Or molasses sugar. nom.
18:42:39 <sart> oatmeal butterscotch
18:43:46 <nirik> alrighty. If nothing else will close out in a minute... (to go find cookies)
18:44:11 <lanica> Good meeting…later all!
18:44:39 <nirik> Thanks for coming everyone. Let's all continue in #fedora-admin, #fedora-apps and #fedora-noc.
18:44:41 <nirik> #endmeeting