infrastructure-nextgen-tools
LOGS
20:00:23 <nirik> #startmeeting Infrastructure Next Generation tools Brainstorm (2015-05-06)
20:00:23 <zodbot> Meeting started Wed May  6 20:00:23 2015 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:23 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:24 <nirik> #meetingname infrastructure-nextgen-tools
20:00:24 <nirik> #topic intro and background
20:00:24 <zodbot> The meeting name has been set to 'infrastructure-nextgen-tools'
20:00:37 <nirik> who all is around for a next gen infrastructure brainstorming session? ;)
20:00:59 * lsm5 is here
20:01:25 <nirik> hey lsm5. :)
20:01:31 <lsm5> nirik: hi :)
20:01:36 * nirik waits to see if some more folks that said they would attend can make it.
20:01:50 <smooge> here
20:01:50 * bparees is
20:01:54 * adimania is here
20:02:00 <nirik> in the mean time, I wrote up: https://fedoraproject.org/wiki/Infrastructure/Nextgen_Deployment
20:02:52 * maxamillion is here
20:04:38 * nirik will wait another min and then get going
20:05:35 <nirik> ok, lets go ahead then. :)
20:06:12 <nirik> so, basically this is a bit of a brainstorming session. We (fedora infrastructure) want to see how we might be able to use the new cool tools out there to deliver applications to our users.
20:06:27 <nirik> note that this is not about what's in fedora, what's made in fedora or how it's made.
20:06:33 <nirik> This is just about fedora infrastructure.
20:06:44 <nirik> The wiki page above has some background.
20:07:04 <nirik> and I started filling in things that we require and things that might be nice to have
20:07:22 <nirik> also some ideas for pilots
20:08:05 <lsm5> nirik: are we looking at setting up CI or something (fedora-wide or package-based or something)
20:08:06 <bparees> nirik: given that you've got docker/containers under next gen tools i'd suggest openshift should be on the list.
20:08:07 <maxamillion> running Docker + Atomic + Kubernetes is kind of interesting in that space, docker is pretty easy to get up and running with, kubernetes has a really steep learning curve but there's a lot of momentum there so probably worth looking into at least
20:08:09 <nirik> There's Containers(docker, others?), atomic/ostree and even our cloud we aren't really using to any kind of full potential
20:08:46 <bparees> maxamillion: hopefully openshift flattens that curve a bit.
20:08:47 <nirik> lsm5: I don't think so, not for a first prototype anyhow.
20:08:50 <maxamillion> something else to note is that every conference I've ever been to, anyone running docker in any serious capacity runs their own docker registry instead of using the public hub for internal apps
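Running your own registry, as maxamillion describes, is a one-container job; a minimal sketch (the `registry:2` image is the stock upstream one, but the port and the `infra/fedora` repository name are illustrative defaults, not anything decided in this meeting):

```shell
# Start a private Docker registry on the local host
docker run -d -p 5000:5000 --name registry registry:2

# Tag an internal image against the private registry and push it
# there instead of the public hub
docker tag fedora:latest localhost:5000/infra/fedora:latest
docker push localhost:5000/infra/fedora:latest
```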
20:08:57 <nirik> bparees: you mean running our own openshift?
20:09:01 <nirik> or using the existing rh one?
20:09:23 <bparees> nirik: well i'm referring to openshift v3.  not aware of the plans for the existing rh one.
20:09:26 <bparees> (which is v2)
20:09:31 <smooge> I would think that it would be our own
20:09:34 <maxamillion> bparees: that'd be awesome if it can be done
20:09:44 <nirik> thats possible, but might be more work than we have people to do.
20:09:51 <maxamillion> OpenShift V3 is built on top of kubernetes
20:09:52 <gmm_> Late, but here.
20:10:01 <bparees> maxamillion: RCM is running it for OSBS, so there's some precedent.
20:10:16 <nirik> I was hoping we could identify some small pilot applications and people interested in working on them. :)
20:10:45 <bparees> nirik: ok, didn't mean to sidetrack.  but openshift v3 should definitely be listed under next gen tools along with docker/containers.
20:10:50 <smooge> well here is the issue.. with any of these things... we need to know how much hardware is needed for it
20:10:52 <lsm5> nirik: also, another suggestion was including docker in the default mock environment
20:11:10 <adimania> I have some experience around Docker + Atomic + Kubernetes and can help in a project around that.
20:11:10 <smooge> bparees, we are usually looking at what we can setup/run on 1-3 machines
20:11:16 <maxamillion> bparees: yeah, I've filed a number of bugs against OSBS ... like 3 out of 5 things work
20:11:20 <nirik> There is a dizzying array of tools out there now... without a practical thing we are going to implement I don't care at all about the tools.
20:11:26 <nirik> lsm5: this is not about changing anything in fedora.
20:11:33 <lsm5> nirik: ah ok
20:11:34 <lsm5> my bad
20:11:37 <nirik> this is only fedora infrastructure. How we handle and deploy apps
20:11:38 <maxamillion> bparees: well, "a number" isn't many because I can't get all that far with it yet .... but we can take all that offline
20:11:53 <bparees> maxamillion: sure :)  i don't know how much of that is openshift issues vs osbs itself.
20:11:58 <nirik> bparees: feel free to add it. ;)
20:12:10 <maxamillion> bparees: your guess is likely better than mine
20:12:40 <nirik> I really think working back from a prototype could help us...
20:12:47 <adimania> probably we can start off with a container running trac. I _think_ that we run a lot of trac instances. Maybe docker can help there.
20:12:54 <maxamillion> bparees: my biggest fear about openshift hiding too much of kubernetes is that infra team members as admins will need to know what to do if k8s blows up
20:12:59 <bparees> nirik: I don't have a FAS account.  signing up now.
20:13:04 <nirik> adimania: ah, thats a great suggestion. ;) please add to wiki?
20:13:13 <adimania> sure.
20:13:30 <docent> Remember guys that besides Docker there is also pure LXC or even systemd / nspawn w/machinectl (actually developed by Lennart Poettering currently). The latter one is so integrated into systemd that I wouldn't be surprised if it overthrew Docker in 1-2 years
20:13:31 <nirik> bparees: for shame. ;) but thanks.
20:13:53 <maxamillion> nirik: when you talk about prototype, do you have any scoping on what you'd like that to be? functionality or some kind of metric by which suggested solutions should/could be measured/graded?
20:14:33 <maxamillion> docent: there's also rkt which is basically just a container spec implementation using nspawn
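The machinectl/nspawn workflow docent and maxamillion mention can be sketched in two commands (`pull-raw` needs systemd >= 219; the image URL and machine name are made up for illustration):

```shell
# Fetch a raw disk image into /var/lib/machines under the name 'fedora-cloud'
machinectl pull-raw --verify=no \
    https://example.org/images/Fedora-Cloud.raw.xz fedora-cloud

# Boot the pulled image as a container
systemd-nspawn -M fedora-cloud -b
```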
20:14:37 <nirik> maxamillion: good question. I did lay out some must haves on the wiki page and some nice to haves.. but I guess otherwise it would be the normal stuff.
20:14:45 <nirik> does it work? can it be maintained, etc.
20:15:01 <maxamillion> nirik: as a newbie to the group I legitimately don't know what you mean by "the normal stuff"
20:15:36 <smooge> ok so here is normal
20:16:10 <nirik> maxamillion: right now we have a long process to bring an app from "you should make this app" to 'in production'
20:16:12 <smooge> 1) The hardware is a 1 G network in a central location or a bunch of lowend loaner systems at some colo
20:17:02 <smooge> 2) we run RHEL as the base operating system for most production items. If a container technology needs to run on something not supported by RHEL it is a harder problem for us to deploy
20:17:31 <nirik> some of the process is social, and I don't think that will change with any tool changes. But some of it like strict requirement for packaging in epel or the like might changable.
20:17:48 <nirik> smooge: yeah, we should add rhel7 to requirements.
20:17:54 <smooge> 3) .... what nirik just said
20:17:55 <nirik> I'm assuming we want to run things on our existing hw.
20:18:00 <docent> So LXC is out as support for this won't be maintained
20:18:14 <smooge> correct
20:18:26 <nirik> the requirements are also different between a dev instance, a staging instance and a production instance.
20:19:17 <smooge> the main reason on this is that we will be running this for multiple years and volunteers working on things usually only do so for 3-6 months before other life things come up :)
20:19:20 <maxamillion> and I'm not sure the systemd version in RHEL7 is new enough to have the 'machinectl pull-dkr' hotness in it
20:19:31 <maxamillion> so that might be out as well
20:19:33 <docent> but regarding rhe / fedora we can safely assume that docker or systemd-nspawn are ok
20:19:39 <docent> maxamillion: +1 ;)
20:21:34 * nirik is having network weirdness here. sorry.
20:22:59 <nirik> ok, back.
20:23:01 <smooge> so for these container and other items.. what kind of hardware and storage requirements usually are needed?
20:23:15 <nirik> smooge: I'm sure it depends vastly on what you are containing.
20:23:47 <nirik> I like the idea of contained trac's... might be worth doing.
20:23:53 <smooge> I am asking because one of the container orchestration technologies was requiring iSCSI at 10+G
20:24:33 <nirik> that sounds odd.
20:24:34 <smooge> but you didn't know that until you had dived in deep because the group just assumed everyone would have 10G networks these days
20:24:56 * adimania has added the trac idea to the wiki
20:24:58 <smooge> it was so that front ends could move containers between systems to keep load steady in the fabric
20:25:25 <maxamillion> smooge: whoa ... what requires all that?
20:25:37 <nirik> so, what else would be good container fodder?
20:26:15 <docent> smooge: we use lxc & docker on 1G network for whole CI stack; it really depends on environment and your needs; I'd rather assume that any backend actions (container migrations - but for what purpose??, pulling and pushing images) could generate some heavy traffic on the network
20:26:22 <maxamillion> the Atomic+Kubernetes+Docker stuff can be run in a VM on your laptop and storage is pretty standard LVM or NFS stuff for the most part (there's some gluster and ceph stuff being worked on iirc, no idea how far along)
20:26:37 <nirik> and I guess the other 2 things to think about: what would be good to use ostree/atomic with? just as a docker container host? or is there something more we could do there?
20:27:12 <nirik> and 2: we now have a somewhat modern cloud going into production. What nice things can we do with it?
20:27:35 <maxamillion> kubernetes :)
20:27:56 <docent> also for HW reqs - using FS like overlay can save a lot of iops, disk space and memory (overlay uses page cache sharing)
20:28:14 <nirik> docent: but thats not available in rhel7 is it?
20:28:16 <nirik> or is it?
20:28:22 <maxamillion> (not trying to sound like a broken record, but kubernetes just seems to show up in my news feeds and mailing lists way too often to be something that doesn't get evaluated for consideration... also RHEL7 Atomic officially supports it, so there's that)
20:28:46 <docent> overlay will be in 7.2
20:28:58 <docent> it's in 3.18 already and now only waits for selinux policies
20:29:22 <maxamillion> isn't there a bug with overlay that will hose your rpmdb?
20:29:36 <nirik> maxamillion: so, thats a management layer on top right? /me has not played with it.
20:29:43 <docent> maxamillion: haven't heard of it and didn't hit it
20:29:54 <adimania> random idea: how about using a container with compose set in it to churn out rpm-ostree, atomic images. kinda like koji to build your own custom atomic image. too insane?
20:30:10 <maxamillion> nirik: yeah, it basically handles the orchestration and scale-out of applications composed of one or many docker containers (I think they also support rkt)
20:30:12 <docent> maxamillion: it's basically a union filesystem; you can union filesystems or even directories together
20:30:23 <maxamillion> docent: right, I'm familiar with it
20:30:24 <smooge> it looks to be a management layer when you have multiple of the same instance. so I could see it useful for builders.. not sure about our 1 offs
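The orchestration maxamillion describes is driven by declarative manifests; a hypothetical ReplicationController keeping two copies of a containerized app running might look like this (all names and the image are illustrative, not an actual infra config):

```yaml
# Hypothetical manifest: kubernetes keeps 2 replicas of a 'trac' container alive
apiVersion: v1
kind: ReplicationController
metadata:
  name: trac-rc
spec:
  replicas: 2
  selector:
    app: trac
  template:
    metadata:
      labels:
        app: trac
    spec:
      containers:
      - name: trac
        image: localhost:5000/infra/trac:latest
        ports:
        - containerPort: 8080
```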
20:30:34 <nirik> adimania: probably for a first cut, but might not be down the road.
20:30:44 <docent> maxamillion: this is a nice reading about it: http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
20:30:52 <maxamillion> docent: but it had something to do with the way the file lock from the lower was held vs the lock on the upper ... I'll have to see if I can find the BZ
20:30:53 <nirik> adimania: kinda like the web revisor of old (give it a kickstart and it makes you a live cd)
20:31:15 <adimania> yes! revisor. wow! really forgot about that.
20:31:19 <docent> maxamillion: sure, plz pm me if found that - it can bite me
20:31:40 <maxamillion> docent: https://github.com/docker/docker/issues/10180
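The overlay mechanics docent describes boil down to a single mount; a minimal sketch with the kernel 3.18+ syntax (paths are illustrative and the mount needs root):

```shell
# Files resolve from /upper first, falling through to the read-only /lower;
# /work is internal scratch space required by overlayfs
mkdir -p /lower /upper /work /merged
mount -t overlay overlay \
    -o lowerdir=/lower,upperdir=/upper,workdir=/work /merged
```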
20:32:29 <nirik> so, do any of the suggested prototypes appeal to people more than others?
20:32:40 <threebean> are there any apps we have for which packaging into rpms is a pain?  if so, it seems like those would be good fits for containers
20:32:56 <threebean> so we can shift dep chains ahead or behind the host stack.
20:33:00 <nirik> threebean: yeah, perhaps. None that we have deployed yet... (because we require packaging). ;)
20:33:06 <threebean> iirc, askbot might fit that category?
20:33:22 <nirik> well, not currently. it might sometime yeah...
20:33:26 * threebean nods
20:33:45 * adimania waves to threebean.
20:33:49 <threebean> hi hi :)
20:33:56 <adimania> hi :)
20:34:24 <tflink> buildbot on el6 did but that's less relevant now
20:34:45 <nirik> I think one other nice application might be transitory things...
20:35:03 <tflink> it's possible that the same issue may manifest with newer versions of buildbot on el7 but I've not been following upstream closely enough to know what their plans are for twisted version requirements
20:35:32 <nirik> ie, "hey we are going to work on this, we need an etherpad. Give me an etherpad" <etherpad container created> "ok, done" <saves all content to git and publishes it and destroys itself>
20:35:45 <threebean> cool idea.
20:35:47 <adimania> +1
20:36:19 <nirik> or even tied into a bunch of stuff... 'make me a FAD pack'
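nirik's throwaway-etherpad lifecycle could be sketched as a handful of commands (the image name, pad URL, and archive repo path are all hypothetical, not existing infra pieces):

```shell
# Create: spin up a disposable etherpad container
docker run -d --name fad-etherpad -p 9001:9001 etherpad/etherpad

# ... team works in the pad at http://host:9001/p/fad-notes ...

# Archive: export the pad text and commit it to a git repo
curl -o /srv/pad-archive/fad-notes.txt \
    http://localhost:9001/p/fad-notes/export/txt
git -C /srv/pad-archive add fad-notes.txt
git -C /srv/pad-archive commit -m 'archive FAD pad'

# Destroy: the container goes away once the content is published
docker rm -f fad-etherpad
```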
20:36:59 <threebean> is there any big Big Win to converting the way we deploy all our apps?
20:37:51 * nirik is having trouble seeing any big win to existing apps or apps that are easily packaged.
20:38:14 <threebean> I'm interested in seeing a continuous deployment setup evolve for our infra (to cut down on developer time).  if containers can make that easier, that would be cool.  but it seems like we can just do that with our existing tool chain or rpms and vms.
20:38:16 <nirik> but happy to be proven wrong. ;)
20:38:19 <threebean> s/or/of/
20:38:26 <adimania> I think one of the wins would be that we may be able to pack a lot of apps on less hardware and move them around when they grow bigger.
20:38:59 <nirik> adimania: well, perhaps, but we used to have all our apps on less hardware and it was horrible
20:39:02 <smooge> adimania, most of our apps are already in the grow bigger state :)
20:39:09 <nirik> and we arent really hurting for hw
20:40:07 <threebean> to clarify on it being horrible -> when there were issues, it became difficult to discern which app was causing the problem on one hand and which apps were suffering because of the other app's problem.
20:40:13 <pingou> nirik: atomic/os-tree for the koji builders?
20:40:18 <threebean> (when they were all stuff on the same app* boxes)
20:40:29 <adimania> nirik, I am not sure why it was bad but if the problem was the way we packed the apps then it might be solvable with the containers.
20:40:49 <nirik> pingou: out of scope I think currently. Koji 2.0 might make that possible I would hope.
20:41:10 <adimania> pingou, are you suggesting using container instead of mock? I'd like that.
20:41:17 <pingou> adimania: nope
20:41:26 <pingou> nirik: I was thinking, instead of the plain Fedora we're using now
20:41:33 <pingou> we could use the atomic/os-tree images
20:41:48 <pingou> and cycle the update/reboot as we do now
20:42:06 <nirik> pingou: so each builder is it's own atomic/ostree fedora?
20:42:19 <nirik> that might indeed be possible now...
20:43:08 <smooge> adimania, the problems are usually the apps need a lot of CPU, memory, disk usage at the same time as the other apps needed CPU, memory and disk usage.
20:43:09 <nirik> adimania: it would surely be better with containers, but when we can just have them be in separate vm's that seems like a stronger separation and works fine...
20:43:21 <pingou> nirik: ostree is another way of distributing packages no? so the builders could just be running ostree which might also increase security a little more as someone escaping the chroot would end up in a almost completely r/o environment
20:43:36 <nirik> pingou: true. ;) can you add it to the wiki page?
20:43:41 <pingou> nirik: will do
20:43:58 <docent> threebean: it's not about packaging; w/containers you don't have the problem of putting a couple of apps inside one OS (because each container is a separate OS, let's say) - that's about logical isolation. Also having apps inside containers gives you an easy possibility to tune their performance (e.g. cgroups, and for quick analysis sysdig w/ container chisels). Lastly - running containers over overlay will save a lot of space and memory. ...
20:44:05 <docent> ... Those are big wins imo which can't be achieved with "only" packaging
20:45:05 <nirik> so most of our applications right now are on vm's... and there's usually 2 vm's for production and 1 for stg.
20:45:39 <nirik> so, while I agree containers could save space and such, we really aren't facebook or google with 10,000 instances. ;)
20:46:09 <nirik> but yeah, we might get there someday. I just want to do some pilot type things to figure out what works.
20:46:57 <nirik> so we have talked containers and atomic a bit... any other cool ideas for how to use our cloud?
20:47:08 <nirik> we have copr and openshift was suggested...
20:47:51 <nirik> (or is that a conversation killer :)
20:47:57 <maxamillion> ha
20:48:33 <nirik> clearly we could run containers and atomic stuff in the cloud as well of course.
20:49:13 <nirik> how hard would openshift v3 be to deploy?
20:49:29 <smooge> .... still waiting for us to deploy a cloud past folsom....
20:49:34 <maxamillion> I don't know that it's a conversation killer but I think containers are definitely the hot topic right now, maybe mesos if there's any realistic use case for that inside Fedora Infra space? .... Atomic is just kind of a nice fit for containers so it more or less "goes with"
20:49:53 <nirik> smooge: we are pretty much go on the new cloud. Will fix that network issue with the next one. ;)
20:50:23 <maxamillion> nirik: it's pretty simple, but it's heavily under development so there's no real upgrade right now, you basically just destroy and re-deploy --> https://github.com/openshift/openshift-ansible
20:50:43 <nirik> oh, it's in ansible?
20:50:48 <maxamillion> nirik: in the OpenShift Online environment we could destroy and re-deploy in about 15 minutes
20:50:51 <maxamillion> nirik: yeah :)
20:51:12 <maxamillion> nirik: there's a wrapper util for ease of use, but it's all ansible
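A sketch of what such a deploy might look like with the repo maxamillion links above (the inventory path and playbook location are assumptions about that era of openshift-ansible, not verified steps):

```shell
# Fetch the upstream deployment playbooks
git clone https://github.com/openshift/openshift-ansible
cd openshift-ansible

# The inventory lists which VMs become masters and which become nodes;
# the playbook then configures the whole cluster over ssh
ansible-playbook -i /etc/ansible/hosts playbooks/byo/config.yml
```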
20:51:19 <nirik> so, that deploys to the hw in the cloud? or on vms in the cloud?
20:51:28 <maxamillion> nirik: VMs
20:51:55 <nirik> very nice. might have to give that a go... also could handle some cases I was thinking of... like wordpress instances, etc.
20:52:11 <maxamillion> nirik: absolutely
20:52:56 <adimania> nirik, do we use wordpress anywhere in fedora infra?
20:53:14 <pingou> fedora magazine does
20:53:18 <pingou> but isn't hosted by us
20:53:18 <nirik> adimania: nope. we used to have one a long time ago...
20:53:38 <nirik> There are several in openshift related to fedora...
20:53:48 <nirik> flocktofedora, the board, fedora magazine.
20:54:35 <nirik> I'm trying to think of any other places (like trac) where we have a number of them deployed.
20:54:51 <pingou> mirrorlist?
20:55:01 * pingou runs
20:55:16 <nirik> mirrorlists could be containerized perhaps.
20:55:20 <adimania> okay. so do you want a wordpress container or a VM or may be an ansible module? I think all of these are already done but I don't mid doing it or helping someone do it.
20:55:23 <nirik> it's pretty simple app
20:55:38 <adimania> s/mid/mind/
20:55:59 <nirik> adimania: not sure at this point. :) openshift might be a good option if we can deploy that...
20:56:07 <nirik> So, we are running low on time. ;(
20:56:16 <mizdebsk> re cloud, it would be nice to be able to dynamically allocate resources to services (like copr, jenkins, beaker, or even koji) as needed - if the service is running short of resources, dynamically allocate more VMs from cloud
20:56:23 <nirik> Is there any of the prototypes we want to try and work on? or ponder some more on list and see...
20:56:51 <nirik> mizdebsk: well, koji has 0 concept of doing that sadly. :( I sure hope koji 2.0 does...
20:57:01 <nirik> copr we can indeed give more to easily.
20:57:12 <nirik> we just didn't in the old cloud because it was so...
20:57:17 <nirik> broken
20:57:25 <smooge> nicely put
20:57:37 <tflink> not sure how well that would work with beaker - haven't gotten into that part but I think they support openstack
20:57:53 <nirik> I'd love to have a better way to get cloud instances to fedora contributors... hopefully we will have a better way to do that before too long.
20:57:57 * tflink suspects that there may be problems with firewalls if we continue setting stuff up the same way
20:58:13 <tflink> last comment was about beaker specifically, don't know about other stuff
20:58:19 <nirik> tflink: that could be an option. I think beaker lets you 'check out' resources?
20:58:41 <tflink> yeah, we've talked about using beaker to get secarch machines/vms to folks
20:59:02 <tflink> you can provision a vm with a given distro and check it out for X hours
20:59:21 <tflink> s/secarch/non-x86
20:59:29 <nirik> that's nice. I'd like to make sure and reap instances that aren't being used anymore...
20:59:37 <nirik> the time limit is good.
21:00:02 <tflink> I suspect that openstack instances are destroyed when released but I've not gotten into that part of the featureset
21:00:23 <nirik> ok, I guess let's close out here and I will post to the list a summary and the wiki page and we can continue discussing what prototypes we want to pursue there?
21:00:28 <mizdebsk> with beaker you get time slice, but can extend it at any time, or return machine back to beaker
21:00:30 <nirik> I'd hope so
21:00:39 <nirik> mizdebsk: yeah, sounds reasonable.
21:00:42 <tflink> fwiw, we don't have FAS integration for beaker yet - it's possible but just haven't spent the time to get it done yet
21:00:58 <tflink> er, it should be possible
21:01:06 <nirik> tflink: I think puiterwijk actually whipped up a patch, but I don't know if it's complete yet. Or was that phabricator?
21:01:32 <tflink> that was buildbot, I think
21:01:37 <nirik> ah.
21:01:49 <tflink> phabricator uses the persona provider
21:01:59 <nirik> ok, anyhow. Thank you all for coming. Some good ideas bounced around and hopefully we can move forward soon. :)
21:02:03 <nirik> #endmeeting