15:02:52 <tstclair> #startmeeting
15:02:52 <zodbot> Meeting started Thu Dec 5 15:02:52 2013 UTC. The chair is tstclair. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:52 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:03:04 <tstclair> #meetingname BIG DATA SIG
15:03:04 <zodbot> The meeting name has been set to 'big_data_sig'
15:03:12 <mattf> howdy
15:03:37 <tstclair> #meetingtopic SIG_PACKAGING_STATUS
15:04:02 <tstclair> #topic updated status
15:05:01 <tstclair> Morning folks, figured we could do a quick run through on current status, blockers, etc. Then maybe get a small core-dump from willb on anything cool from spark-summit
15:05:23 <tstclair> Or noteworthy items we should be watching.
15:05:34 <mattf> sounds good. i hear there are some xmvn macro issues too, but pete isn't around atm
15:06:18 <tstclair> rsquared, ^ do you have insight? My xmvn issue had been fixeds.
15:06:31 <tstclair> er fixed.
15:06:54 <mizdebsk> for any xmvn issues please ping me, i usually fix them quickly
15:07:04 <rsquared> There is a supposed fixed for the mvn_file macro issue I was having but I haven't verified it. I'm not sure it's in a build yet.
15:07:30 <mizdebsk> rsquared: it's not yet in rawhide, upstream only for now
15:07:38 <tstclair> #topic issues
15:07:57 <mattf> mizdebsk, is there a communication channel we can hook in to hear about changes before they hit? or possibly have a test rebuild of hadoop before pushing xmvn changes?
15:08:10 <rsquared> mizdebsk: That's what I thought.
15:08:18 <tstclair> +1 mattf
15:08:43 <tstclair> I've been having issues with mock on f20, but there is a workaround for now.
15:09:09 <mattf> tstclair, what's the workaround?
15:09:11 <mizdebsk> mattf: latest xmvn rpms are always available at http://jenkins.cloud.fedoraproject.org/job/xmvn/ws/RPM/latest/
15:09:32 <mizdebsk> you can subscribe to java-sig-commits for all git commits from xmvn
15:09:43 <mizdebsk> or you can watch it on github
15:09:52 <tstclair> mattf, downgrade mock
15:09:57 <tstclair> https://bugzilla.redhat.com/show_bug.cgi?id=1028352
15:09:58 <mattf> tstclair, oof, ok
15:10:31 <mattf> mizdebsk, is there a low traffic/summary way to get a heads up on what's about to land in xmvn?
15:11:20 <mizdebsk> mattf: besides TODO file, not really
15:11:29 * tstclair wonders if his filters bundle java-sig traffic w/fedora-devel
15:11:52 <mizdebsk> but after every release i send release notes to java-devel which is a low-traffic list
15:11:59 <mattf> https://github.com/mizdebsk/xmvn is the source of truth?
15:12:24 <mizdebsk> mattf: yes, that's a mirror i push to regularly
15:12:34 <mattf> is that release a build for jenkins.cloud.fp.o or when the xmvn gets pushed to fedora-updates-testing ?
15:13:08 <mattf> (btw, https://github.com/mizdebsk/xmvn/blob/master/TODO updated 2 months ago)
15:13:49 <mizdebsk> mattf: jenkins builds track current HEAD, every release is a tag
15:14:07 <mizdebsk> whatever is in fedora is a different story, but usually rawhide has the latest released (tagged) version
15:14:28 <tstclair> pmackinn, want to give details on your xmvn issues ^
15:14:42 <mattf> ok, so we can watch for your release email, but at that point we have all the new goodness in rawhide already?
15:15:16 <tstclair> In general I think we should probably setup some continuous integration env around some of our packaging as the whole stack is a shifting sea.
15:15:18 <mizdebsk> mattf: not yet, but they will be there very soon
15:15:29 <mattf> mizdebsk, ok
15:15:57 <pmackinn> well, 1.3 introduced restrictions around the use of system scope
15:16:05 <mattf> tstclair, good idea, maybe similar to the jenkins.cloud.fp.o?
15:16:07 <mizdebsk> (the order of events is: release, mail with release notes, build in rawhide)
15:16:25 <mattf> mizdebsk, that's good, means we have an option for a heads-up
15:17:21 <tstclair> mattf, .cloud.fp.o? ref..?
15:17:32 <mattf> http://jenkins.cloud.fedoraproject.org/job/xmvn/ws/RPM/latest/
15:18:19 <mattf> if we have similar ci that builds against tip xmvn (and others), we'd get a warning if something breaks us
15:18:21 <mizdebsk> what we do in maven world is: every night we do automated koji scratch builds of 4 selected packages which are complicated enough and have nice coverage of whole repo
15:18:25 <tstclair> perfect...
15:18:36 <mizdebsk> besides that we do scratch builds of all packages every now and then
15:18:44 <mattf> oooooor, can we add hadoop to that set of 4 packages?
15:18:58 <mizdebsk> mattf: sure
15:19:03 <tstclair> wink wink nudge nudge
15:19:08 <mattf> say no more
15:19:37 <mattf> http://gadandelion.files.wordpress.com/2010/01/nudge-nudge.jpg
15:20:18 <mattf> pmackinn, would adding hadoop to the builds that the maven world periodically builds work for you, or is there a better candidate?
15:20:20 <mizdebsk> so, do you want an email to be sent to bigdata mailing list when hadoop scratch build fails?
15:20:36 <tstclair> I don't really have other issues atm, other then I need to setup continuous integ. Last night I decided to pull latest tip of tachyon and found issues with recent commits.
15:20:46 <mizdebsk> or how to inform you of the status?
15:21:10 <tstclair> mizdebsk, +1 email list.
15:21:11 <mattf> mizdebsk, sending to the list is a good idea. if it becomes a problem we can change it.
15:21:26 <pmackinn> mattf, couldn't hurt
15:21:44 <tstclair> any other issues or should we move on?
15:21:51 <willb> as long as we're talking build infrastructure, mizdebsk, can we make an action to chat about Ivy support soon?
15:21:53 <mattf> none from me
15:22:11 <mizdebsk> willb: sure
15:22:24 <mizdebsk> i have time after meeting
15:22:35 <willb> I have an ad-hoc solution for resolving Ivy artifacts but it would be better if it were integrated more with fedora metadata :-)
15:22:46 <willb> great, let's talk on -java
15:22:48 <tstclair> #topic status
15:23:12 <tstclair> Anything new to report to the SIG?
15:23:55 <mattf> i have an action to respond to joe brockmeier to make sure the hadoop work gets good billing in the f20 release pr
15:24:28 <tstclair> I had timed out on bjorn for review (he's been busy as of late) and willb took over mesos review. I'm planning on working on mesos today and circling back to get some patches in. Their process is... different.
15:24:55 <tstclair> mattf, that would be nice PR.
15:25:36 <pmackinn> status - javolution, jdo-api 3 on their way to f20 stable; refactored candidate hive spec to eschew mvn_build/mvn_install mojo restrictions in favor of manual steps due to new system scope limitations
15:26:24 <tstclair> are there any reviews pending that need looking into, I've finished off one the other day but can take more if folks are blocked.
15:26:48 <rsquared> status - I'm bearing down on hbase. I'm working my way through their myriad of startup scripts and methods to figure out how to create systemd scripts. BRs are figured out. Rs still need to be determined and testing of the package needs to be done.
15:26:49 <mattf> did gil have some?
15:26:54 <willb> status: hoping to get sbt patched for Ivy 2.3 support by the end of the week; that's my major remaining blocker to getting stuff reviewable
15:27:34 <tstclair> mattf, yup, but he has not been online in a while to determine priority.. I'll email the list this am and find out.
15:27:55 <rsquared> tstclair: Hbase needs high-scale-lib and metrics-core reviewed. The later has a few deps that need reviews as well
15:28:39 <tstclair> rsquared, could you email the list with priority order.
15:28:52 <pmackinn> hive needs hbase :-) freeway jam!
15:29:03 <rsquared> tstclair: sure
15:29:43 <rsquared> pmackinn: Just disable hbase support and re-enable when hbase is ready. :)
15:30:31 <pmackinn> no sir
15:31:35 <tstclair> rsquared, are you just on system integ now?
15:32:14 <pmackinn> rsquared, anyway need my dn deps cleared up first
15:32:14 <tstclair> or are there other bitz still missing?
15:33:17 <rsquared> tstclair: No, still need to do functional testing. I started tackling the systemd integration thinking it would be easy with my hadoop experience. HBase is a different beast though.
15:33:36 <rsquared> pmackinn: I was kidding. :) Hive w/o HBase isn't much of anything.
15:34:24 <pmackinn> chips ahoy...without the chips
15:34:51 <tstclair> I was just wondering if there was a preliminary rpm that could get pmackinn the jars he needs, despite being a WIP.
15:35:07 <tstclair> ahoy?
15:35:16 <pmackinn> tstclair, ^^ other hive deps need to land anyway
15:35:32 <rsquared> Oh, I do have that. I have an rpm that packages the jars and everything. pmackinn if that'll unblock you I can make that available.
15:36:00 <pmackinn> rsquared, sure i'll run with that for the time being
15:36:43 <rsquared> pmackinn: Contents subject to change at any time without notice. :)
15:36:44 <tstclair> rsquared, I almost always create a package-rpm repo on github for * then dump into fedora repos once accepted.
15:36:55 <pmackinn> rsquared, WHATTT?
15:38:34 <tstclair> pmackinn, are you blocked on other deps which are non-existent in fedora?
15:39:11 <tstclair> I might have bandwidth to take on some small deps.
15:39:43 <pmackinn> for hive? https://fedoraproject.org/wiki/SIGs/bigdata/packaging/Hive?rd=Hive_packaging#Dependencies tells the tale
15:40:28 <tstclair> ack. PIG seems troublesome.
15:41:30 <pmackinn> meh, not so much https://fedoraproject.org/wiki/SIGs/bigdata/packaging/Pig#Dependencies
15:41:44 <mattf> except it looks circilar
15:41:46 <pmackinn> just the cute curly end of the chain
15:41:46 <mattf> and circular
15:42:01 <tstclair> k.. new topic.. where are the other folks who wanted to lend a hand..
15:42:09 <tstclair> #topic growing_SIG
15:42:20 <mattf> any of the ibm folks around?
15:42:34 <tstclair> mattf, I have not heard from them in some time.
15:42:50 <mattf> let's reach out, maybe personal invite to the meeting
15:43:25 <tstclair> mattf, didn't you converse with them during strata conf?
15:43:51 <pmackinn> should we extend invites to the respective upstream PMC? or has that been done?
15:43:52 <tstclair> Also, is there any PR mechanism for a SIG?
15:43:59 <mattf> i didn't
15:44:17 <mattf> that was for another effort and different company
15:44:38 <tstclair> pmackinn, good idea.
15:44:42 <mattf> tstclair, right now our only pr mechanism is via fedora pr
15:45:13 <tstclair> Mentioning the SIG in the fedora release somehow might be good? If it's possible?
15:45:29 <mattf> i have an action to respond to joe brockmeier to make sure the hadoop work gets good billing in the f20 release pr
15:47:12 <tstclair> Would be kind of nice if we had some type of social media outlet for SIG pr.
15:47:23 <mattf> i agree
15:47:47 <tstclair> does fedora even have such channels, might be a worthy topic for FESCo meeting.
15:48:06 <willb> most of us have blogs and occasionally talk to people :-)
15:48:22 <mattf> speaking of, i'd like to hear about spark summit
15:48:23 <tstclair> yeah, but official channels are nice dissemination.
15:48:30 <tstclair> mattf, +1
15:48:40 <tstclair> #topic spark_summit
15:48:42 <willb> I gave a talk about our work to package Spark at Spark Summit (I also called out the work the SIG is doing in getting other parts of the BDAS stack and the Hadoop ecosystem available). I got a lot of positive feedback; I think videos of the talks will be online soon and will send out a link when they are available.
15:48:49 <Viking-Ice> it would be news to me if SIG suddenly get mentioned in the release notes ( or anyother sub-community that got created since it has nothing to do with the release itself )
15:48:50 <willb> Probably of general interest are the guided Spark exercises; these are a great way to get some familiarity with Spark: http://spark-summit.org/2013/exercises/
15:49:03 <willb> (See also my blog post on getting started with Spark MLLib: http://chapeau.freevariable.com/2013/12/a-simple-machine-learning-app-with-spark.html )
15:49:18 <willb> Some dependency notes from conversations and talks: It looks like Spark 0.9 will be the first release with official Scala 2.10 support, so I will continue working against the upstream 2.10 feature branch. Current releases of Shark are still based on Hive 0.9.
15:49:33 <willb> Some projects to watch (there were talks on all of these): SIMR (Spark-as-MapReduce jobs), Sparrow (fine-grained scheduling), the Ooyala job server (not open-source, but IIRC that's planned). Ryan Weald from ShareThrough presented on an algebra-based approach to Spark Streaming; I didn't get to see this talk because I was in another session but have read up on it and am looking to learn more.
15:49:43 <willb> Lots of people are interested in deploying Spark to elastic clouds, and many of them have projects to make this easier. This is a space to watch.
15:51:05 <tstclair> willb, my brain SEGV's on SIMR
15:51:18 <tstclair> why? is the question.
15:53:01 <willb> My takeaway is that it's interesting to look at the various ways we can manage these jobs. That might not be the way you'd choose to run Spark jobs in an unrestricted environment, but it's cool that it's possible. (It reminded me of lots of similar projects around bending Condor to handle unconventional use cases.)
15:54:28 <tstclair> I'll lookup sparrow and add it to the watch on the wiki.
15:55:09 <tstclair> anything else to discuss?
15:55:32 <mattf> nothing from me
15:56:13 <pmackinn> #topic jetty8
15:56:49 <tstclair> #topic jetty8
15:56:58 <tstclair> pmackinn, what's the issue?
15:57:09 <pmackinn> so that's what real power looks like...
15:57:11 <pmackinn> https://lists.fedoraproject.org/pipermail/bigdata/2013-July/000048.html
15:57:40 <pmackinn> in a feverish dream last night, it occurred that we may want to revisit this
15:57:51 <pmackinn> i worry about rsquared sanity
15:58:13 <pmackinn> basically re-introduce jetty8 as a compat
15:58:46 <rsquared> In addition to my sanity, there's a chance that upstream would accept jetty8 since it doesn't force a change to java 7.
15:59:01 <pmackinn> what the man said ^^
15:59:26 <pmackinn> and the single largest delta for the hadoop package would become "locked" for a peroid of time
15:59:39 <tstclair> rsquared, really? really? Last word I had from upstream was a not-a-chance.
15:59:48 <pmackinn> the concern is the possible dep drift around jetty8 itself
15:59:58 <mizdebsk> do you want to package jetty8 as library only (only JARs), or fully featured server with systemd service et al?
16:00:10 <rsquared> tstclair: Well, jetty 9 is not a chance. jetty 8 is maybe but people who are experienced with jetty need to look at it. They haven't stepped up.
16:00:16 <pmackinn> jetty8 has a better chance than jetty9
16:00:17 <rsquared> atm, it amounts to the same thing
16:00:43 <pmackinn> jetty8 is jdk6 compatible, jetty9 is not
16:00:53 <rsquared> I think just the jars would be fine. pmackinn agree?
16:01:36 <pmackinn> hmm a devel/lib pkg? maybe
16:01:50 <mizdebsk> i don't see much problem with reintroducing jetty8 then (from packaging pov, idk what about security updates on 8.x line etc.)
16:02:50 <pmackinn> mizdebsk, comfort level around jar only approach?
16:02:56 <rsquared> I don't have a good feel for how far jetty deps may have drifted
16:03:37 <mizdebsk> pmackinn: i already tested this approach and it forks in most cases
16:03:45 <mizdebsk> you don't even need to package full jetty, only subset of modules you need
16:03:48 <pmackinn> btw, we are not just talking about hadoop bu the rest of the ecosystem also
16:03:51 <pmackinn> but
16:04:10 <mizdebsk> s/forks/works/
16:04:13 <pmackinn> mizdebsk, forks? can you elaborate?
16:04:17 <pmackinn> nvmd
16:04:20 <pmackinn> :-)
16:04:38 <tstclair> The approach seems sound, and consistent. So create a jetty8 package?
16:05:13 <pmackinn> CAVEAT: we need to back up on our hadoopy pkgs then
16:06:50 <tstclair> any other items?
16:07:10 <rsquared> pmackinn: Say again?
16:07:17 <rsquared> back up what packages?
16:07:55 <pmackinn> hadoop for starters...new patches from old gh.c commits
16:08:14 <rsquared> Ah, right.
16:08:52 <pmackinn> rsquared, indeed new patches from scratch since we never were at the point of jetty8+2.2.0
16:09:08 <pmackinn> we had moved on by then
16:09:09 <rsquared> We should probably keep the jetty9 stuff around even though it'll bit rot for reference. I have similar changes for hbase I hadn't submitted upstream yet but we should hang on to
16:09:11 <rsquared> Yep
16:09:40 <rsquared> We'll need to re-create the integration and test branches again as well as roll back the jetty branch to jetty8
16:09:56 <pmackinn> #action jetty8 exploratory compat pkg
16:10:03 <pmackinn> first things first
16:10:06 <tstclair> Kyle, folks I'm going to end the record and we can take it back to our channel, in part b/c there is another meeting soon and I'm sure folks will be filtering in.
16:10:21 <tstclair> k not Kyle.. tab complete.
16:10:31 <tstclair> #endmeeting