big_data_sig
LOGS
15:02:52 <tstclair> #startmeeting
15:02:52 <zodbot> Meeting started Thu Dec  5 15:02:52 2013 UTC.  The chair is tstclair. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:52 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:03:04 <tstclair> #meetingname BIG DATA SIG
15:03:04 <zodbot> The meeting name has been set to 'big_data_sig'
15:03:12 <mattf> howdy
15:03:37 <tstclair> #meetingtopic SIG_PACKAGING_STATUS
15:04:02 <tstclair> #topic updated status
15:05:01 <tstclair> Morning folks, figured we could do a quick run through on current status, blockers, etc.  Then maybe get a small core-dump from willb on anything cool from spark-summit
15:05:23 <tstclair> Or noteworthy items we should be watching.
15:05:34 <mattf> sounds good. i hear there are some xmvn macro issues too, but pete isn't around atm
15:06:18 <tstclair> rsquared, ^ do you have insight?  My xmvn issue had been fixeds.
15:06:31 <tstclair> er fixed.
15:06:54 <mizdebsk> for any xmvn issues please ping me, i usually fix them quickly
15:07:04 <rsquared> There is a supposed fix for the mvn_file macro issue I was having but I haven't verified it.  I'm not sure it's in a build yet.
15:07:30 <mizdebsk> rsquared: it's not yet in rawhide, upstream only for now
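[Editor's note: for readers who haven't met the macro under discussion, %mvn_file is the javapackages/XMvn macro that controls which file names a built artifact is installed under. A minimal, hypothetical spec fragment follows; the coordinates and file names are illustrative only, not the actual case rsquared hit.]

    %prep
    %setup -q
    # Install the artifact under a subdirectory and add an extra symlink name
    # (artifact coordinates and file names here are examples only).
    %mvn_file :hadoop-common %{name}/hadoop-common hadoop-common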
15:07:38 <tstclair> #topic issues
15:07:57 <mattf> mizdebsk, is there a communication channel we can hook in to hear about changes before they hit? or possibly have a test rebuild of hadoop before pushing xmvn changes?
15:08:10 <rsquared> mizdebsk: That's what I thought.
15:08:18 <tstclair> +1 mattf
15:08:43 <tstclair> I've been having issues with mock on f20, but there is a workaround for now.
15:09:09 <mattf> tstclair, what's the workaround?
15:09:11 <mizdebsk> mattf: latest xmvn rpms are always available at http://jenkins.cloud.fedoraproject.org/job/xmvn/ws/RPM/latest/
15:09:32 <mizdebsk> you can subscribe to java-sig-commits for all git commits from xmvn
15:09:43 <mizdebsk> or you can watch it on github
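[Editor's note: a rough sketch of how one might pull those Jenkins-built XMvn RPMs onto a local test box; the exact file names change from build to build, so the wildcard is an assumption.]

    # Fetch the latest xmvn RPMs from the Jenkins workspace and install them
    # locally for testing (the individual package names vary per build).
    wget -r -np -nd -A '*.rpm' \
        http://jenkins.cloud.fedoraproject.org/job/xmvn/ws/RPM/latest/
    sudo yum localinstall --nogpgcheck xmvn*.rpm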
15:09:52 <tstclair> mattf, downgrade mock
15:09:57 <tstclair> https://bugzilla.redhat.com/show_bug.cgi?id=1028352
15:09:58 <mattf> tstclair, oof, ok
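[Editor's note: the workaround amounts to stepping mock back to the previous build until the fix for bz#1028352 lands; a sketch, with the older NVR left unspecified because it depends on what your repos still carry.]

    # Drop back to the previous mock build on F20.
    sudo yum downgrade mock
    # If the older build has already been dropped from the repos, it can be
    # fetched from koji instead (replace <older-nvr> with a real build):
    # koji download-build --arch=noarch mock-<older-nvr>
    # sudo yum localinstall mock-*.noarch.rpm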
15:10:31 <mattf> mizdebsk, is there a low traffic/summary way to get a heads up on what's about to land in xmvn?
15:11:20 <mizdebsk> mattf: besides TODO file, not really
15:11:29 * tstclair wonders if his filters bundle java-sig traffic w/fedora-devel
15:11:52 <mizdebsk> but after every release i send release notes to java-devel which is a low-traffic list
15:11:59 <mattf> https://github.com/mizdebsk/xmvn is the source of truth?
15:12:24 <mizdebsk> mattf: yes, that's a mirror i push to regularly
15:12:34 <mattf> is that release a build for jenkins.cloud.fp.o or when the xmvn gets pushed to fedora-updates-testing ?
15:13:08 <mattf> (btw, https://github.com/mizdebsk/xmvn/blob/master/TODO updated 2 months ago)
15:13:49 <mizdebsk> mattf: jenkins builds track current HEAD, every release is a tag
15:14:07 <mizdebsk> whatever is in fedora is a different story, but usually rawhide has the latest released (tagged) version
15:14:28 <tstclair> pmackinn, want to give details on your xmvn issues ^
15:14:42 <mattf> ok, so we can watch for your release email, but at that point we have all the new goodness in rawhide already?
15:15:16 <tstclair> In general I think we should probably set up some continuous integration env around some of our packaging, as the whole stack is a shifting sea.
15:15:18 <mizdebsk> mattf: not yet, but they will be there very soon
15:15:29 <mattf> mizdebsk, ok
15:15:57 <pmackinn> well, 1.3 introduced restrictions around the use of system scope
15:16:05 <mattf> tstclair, good idea, maybe similar to the jenkins.cloud.fp.o?
15:16:07 <mizdebsk> (the order of events is: release, mail with release notes, build in rawhide)
15:16:25 <mattf> mizdebsk, that's good, means we have an option for a heads-up
15:17:21 <tstclair> mattf, .cloud.fp.o? ref..?
15:17:32 <mattf> http://jenkins.cloud.fedoraproject.org/job/xmvn/ws/RPM/latest/
15:18:19 <mattf> if we have similar ci that builds against tip xmvn (and others), we'd get a warning if something breaks us
15:18:21 <mizdebsk> what we do in maven world is: every night we do automated koji scratch builds of 4 selected packages which are complicated enough and have nice coverage of whole repo
15:18:25 <tstclair> perfect...
15:18:36 <mizdebsk> besides that we do scratch builds of all packages every now and then
15:18:44 <mattf> oooooor, can we add hadoop to that set of 4 packages?
15:18:58 <mizdebsk> mattf: sure
15:19:03 <tstclair> wink wink nudge nudge
15:19:08 <mattf> say no more
15:19:37 <mattf> http://gadandelion.files.wordpress.com/2010/01/nudge-nudge.jpg
15:20:18 <mattf> pmackinn, would adding hadoop to the builds that the maven world periodically builds work for you, or is there a better candidate?
15:20:20 <mizdebsk> so, do you want an email to be sent to bigdata mailing list when hadoop scratch build fails?
15:20:36 <tstclair> I don't really have other issues atm, other than I need to set up continuous integ.  Last night I decided to pull the latest tip of tachyon and found issues with recent commits.
15:20:46 <mizdebsk> or how to inform you of the status?
15:21:10 <tstclair> mizdebsk, +1 email list.
15:21:11 <mattf> mizdebsk, sending to the list is a good idea. if it becomes a problem we can change it.
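[Editor's note: a minimal sketch of the kind of nightly job being discussed: an anonymous clone, a rawhide scratch build, and mail to the list only on failure. The branch, paths, and notification address below are assumptions for illustration.]

    #!/bin/bash
    # Nightly hadoop scratch build against rawhide; mail the SIG list on failure.
    set -e
    workdir=$(mktemp -d)
    cd "$workdir"
    fedpkg clone -a hadoop && cd hadoop
    fedpkg srpm
    if ! koji build --scratch --wait rawhide *.src.rpm; then
        echo "Nightly hadoop scratch build against rawhide failed." \
            | mail -s "hadoop scratch build FAILED" bigdata@lists.fedoraproject.org
    fi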
15:21:26 <pmackinn> mattf, couldn't hurt
15:21:44 <tstclair> any other issues or should we move on?
15:21:51 <willb> as long as we're talking build infrastructure, mizdebsk, can we make an action to chat about Ivy support soon?
15:21:53 <mattf> none from me
15:22:11 <mizdebsk> willb: sure
15:22:24 <mizdebsk> i have time after meeting
15:22:35 <willb> I have an ad-hoc solution for resolving Ivy artifacts but it would be better if it were integrated more with fedora metadata :-)
15:22:46 <willb> great, let's talk on -java
15:22:48 <tstclair> #topic status
15:23:12 <tstclair> Anything new to report to the SIG?
15:23:55 <mattf> i have an action to respond to joe brockmeier to make sure the hadoop work gets good billing in the f20 release pr
15:24:28 <tstclair> I had timed out on bjorn for review (he's been busy as of late) and willb took over mesos review.  I'm planning on working on mesos today and circling back to get some patches in.  Their process is... different.
15:24:55 <tstclair> mattf, that would be nice PR.
15:25:36 <pmackinn> status - javolution, jdo-api 3 on their way to f20 stable; refactored candidate hive spec to eschew mvn_build/mvn_install mojo restrictions in favor of manual steps due to new system scope limitations
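[Editor's note: the sort of manual fallback pmackinn describes, sketched very loosely. Everything below (artifact names, the use of mvn-rpmbuild, the depmap call) is an assumption about the general technique, not the actual candidate hive spec.]

    # Build with plain Maven instead of %mvn_build, then install the jar and
    # pom by hand and register the depmap (names are placeholders).
    %build
    mvn-rpmbuild -DskipTests package

    %install
    install -d -m 755 %{buildroot}%{_javadir}/%{name}
    install -p -m 644 target/%{name}.jar %{buildroot}%{_javadir}/%{name}/
    install -d -m 755 %{buildroot}%{_mavenpomdir}
    install -p -m 644 pom.xml %{buildroot}%{_mavenpomdir}/JPP.%{name}-%{name}.pom
    %add_maven_depmap JPP.%{name}-%{name}.pom %{name}/%{name}.jar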
15:26:24 <tstclair> are there any reviews pending that need looking into?  I finished one off the other day but can take more if folks are blocked.
15:26:48 <rsquared> status - I'm bearing down on hbase.  I'm working my way through their myriad of startup scripts and methods to figure out how to create systemd scripts.  BRs are figured out.  Rs still need to be determined and testing of the package needs to be done.
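[Editor's note: a minimal sketch of the sort of unit rsquared is working toward, assuming the package ships an hbase launcher in /usr/bin and creates an hbase user; both of those, and the exit-status detail, are assumptions here rather than the actual packaging.]

    # /usr/lib/systemd/system/hbase-master.service (illustrative only)
    [Unit]
    Description=Apache HBase Master
    After=network.target

    [Service]
    User=hbase
    Group=hbase
    ExecStart=/usr/bin/hbase master start
    # HBase daemons commonly exit 143 on SIGTERM; treat that as clean shutdown.
    SuccessExitStatus=143

    [Install]
    WantedBy=multi-user.target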
15:26:49 <mattf> did gil have some?
15:26:54 <willb> status:  hoping to get sbt patched for Ivy 2.3 support by the end of the week; that's my major remaining blocker to getting stuff reviewable
15:27:34 <tstclair> mattf, yup, but he has not been online in a while to determine priority.. I'll email the list this am and find out.
15:27:55 <rsquared> tstclair: Hbase needs high-scale-lib and metrics-core reviewed.  The latter has a few deps that need reviews as well
15:28:39 <tstclair> rsquared, could you email the list with priority order.
15:28:52 <pmackinn> hive needs hbase :-) freeway jam!
15:29:03 <rsquared> tstclair: sure
15:29:43 <rsquared> pmackinn: Just disable hbase support and re-enable when hbase is ready. :)
15:30:31 <pmackinn> no sir
15:31:35 <tstclair> rsquared, are you just on system integ now?
15:32:14 <pmackinn> rsquared, anyway need my dn deps cleared up first
15:32:14 <tstclair> or are there other bitz still missing?
15:33:17 <rsquared> tstclair: No, still need to do functional testing.  I started tackling the systemd integration thinking it would be easy with my hadoop experience.  HBase is a different beast though.
15:33:36 <rsquared> pmackinn: I was kidding. :)  Hive w/o HBase isn't much of anything.
15:34:24 <pmackinn> chips ahoy...without the chips
15:34:51 <tstclair> I was just wondering if there was a preliminary rpm that could get pmackinn the jars he needs, despite being a WIP.
15:35:07 <tstclair> ahoy?
15:35:16 <pmackinn> tstclair, ^^ other hive deps need to land anyway
15:35:32 <rsquared> Oh, I do have that.  I have an rpm that packages the jars and everything.  pmackinn if that'll unblock you I can make that available.
15:36:00 <pmackinn> rsquared, sure i'll run with that for the time being
15:36:43 <rsquared> pmackinn: Contents subject to change at any time without notice. :)
15:36:44 <tstclair> rsquared, I almost always create a package-rpm repo on github for * then dump into fedora repos once accepted.
15:36:55 <pmackinn> rsquared, WHATTT?
15:38:34 <tstclair> pmackinn, are you blocked on other deps which are non-existent in fedora?
15:39:11 <tstclair> I might have bandwidth to take on some small deps.
15:39:43 <pmackinn> for hive? https://fedoraproject.org/wiki/SIGs/bigdata/packaging/Hive?rd=Hive_packaging#Dependencies tells the tale
15:40:28 <tstclair> ack.  PIG seems troublesome.
15:41:30 <pmackinn> meh, not so much https://fedoraproject.org/wiki/SIGs/bigdata/packaging/Pig#Dependencies
15:41:44 <mattf> except it looks circular
15:41:46 <pmackinn> just the cute curly end of the chain
15:41:46 <mattf> and circular
15:42:01 <tstclair> k.. new topic.. where are the other folks who wanted to lend a hand..
15:42:09 <tstclair> #topic growing_SIG
15:42:20 <mattf> any of the ibm folks around?
15:42:34 <tstclair> mattf, I have not heard from them in some time.
15:42:50 <mattf> let's reach out, maybe personal invite to the meeting
15:43:25 <tstclair> mattf, didn't you converse with them during strata conf?
15:43:51 <pmackinn> should we extend invites to the respective upstream PMC? or has that been done?
15:43:52 <tstclair> Also, is there any PR mechanism for a SIG?
15:43:59 <mattf> i didn't
15:44:17 <mattf> that was for another effort and different company
15:44:38 <tstclair> pmackinn, good idea.
15:44:42 <mattf> tstclair, right now our only pr mechanism is via fedora pr
15:45:13 <tstclair> Mentioning the SIG in the fedora release somehow might be good?  If it's possible?
15:45:29 <mattf> i have an action to respond to joe brockmeier to make sure the hadoop work gets good billing in the f20 release pr
15:47:12 <tstclair> Would be kind of nice if  we had some type of social media outlet for SIG pr.
15:47:23 <mattf> i agree
15:47:47 <tstclair> does fedora even have such channels?  might be a worthy topic for a FESCo meeting.
15:48:06 <willb> most of us have blogs and occasionally talk to people :-)
15:48:22 <mattf> speaking of, i'd like to hear about spark summit
15:48:23 <tstclair> yeah, but official channels are nice dissemination.
15:48:30 <tstclair> mattf, +1
15:48:40 <tstclair> #topic spark_summit
15:48:42 <willb> I gave a talk about our work to package Spark at Spark Summit (I also called out the work the SIG is doing in getting other parts of the BDAS stack and the Hadoop ecosystem available).  I got a lot of positive feedback; I think videos of the talks will be online soon and will send out a link when they are available.
15:48:49 <Viking-Ice> it would be news to me if the SIG suddenly got mentioned in the release notes ( or any other sub-community that got created, since it has nothing to do with the release itself )
15:48:50 <willb> Probably of general interest are the guided Spark exercises; these are a great way to get some familiarity with Spark:  http://spark-summit.org/2013/exercises/
15:49:03 <willb> (See also my blog post on getting started with Spark MLLib:  http://chapeau.freevariable.com/2013/12/a-simple-machine-learning-app-with-spark.html )
15:49:18 <willb> Some dependency notes from conversations and talks:  It looks like Spark 0.9 will be the first release with official Scala 2.10 support, so I will continue working against the upstream 2.10 feature branch.  Current releases of Shark are still based on Hive 0.9.
15:49:33 <willb> Some projects to watch (there were talks on all of these):  SIMR (Spark-as-MapReduce jobs), Sparrow (fine-grained scheduling), the Ooyala job server (not open-source, but IIRC that's planned).  Ryan Weald from ShareThrough presented on an algebra-based approach to Spark Streaming; I didn't get to see this talk because I was in another session but have read up on it and am looking to learn more.
15:49:43 <willb> Lots of people are interested in deploying Spark to elastic clouds, and many of them have projects to make this easier.  This is a space to watch.
15:51:05 <tstclair> willb, my brain SEGV's on SIMR
15:51:18 <tstclair> why? is the question.
15:53:01 <willb> My takeaway is that it's interesting to look at the various ways we can manage these jobs.  That might not be the way you'd choose to run Spark jobs in an unrestricted environment, but it's cool that it's possible.  (It reminded me of lots of similar projects around bending Condor to handle unconventional use cases.)
15:54:28 <tstclair> I'll look up sparrow and add it to the watch list on the wiki.
15:55:09 <tstclair> anything else to discuss?
15:55:32 <mattf> nothing from me
15:56:13 <pmackinn> #topic jetty8
15:56:49 <tstclair> #topic jetty8
15:56:58 <tstclair> pmackinn, what's the issue?
15:57:09 <pmackinn> so that's what real power looks like...
15:57:11 <pmackinn> https://lists.fedoraproject.org/pipermail/bigdata/2013-July/000048.html
15:57:40 <pmackinn> in a feverish dream last night, it occurred that we may want to revisit this
15:57:51 <pmackinn> i worry about rsquared sanity
15:58:13 <pmackinn> basically re-introduce jetty8 as a compat
15:58:46 <rsquared> In addition to my sanity, there's a chance that upstream would accept jetty8 since it doesn't force a change to java 7.
15:59:01 <pmackinn> what the man said ^^
15:59:26 <pmackinn> and the single largest delta for the hadoop package would become "locked" for a period of time
15:59:39 <tstclair> rsquared, really?  really?  Last word I had from upstream was a not-a-chance.
15:59:48 <pmackinn> the concern is the possible dep drift around jetty8 itself
15:59:58 <mizdebsk> do you want to package jetty8 as library only (only JARs), or fully featured server with systemd service et al?
16:00:10 <rsquared> tstclair: Well, jetty 9 is not a chance.  jetty 8 is maybe but people who are experienced with jetty need to look at it.  They haven't stepped up.
16:00:16 <pmackinn> jetty8 has a better chance than jetty9
16:00:17 <rsquared> atm, it amounts to the same thing
16:00:43 <pmackinn> jetty8 is jdk6 compatible, jetty9 is not
16:00:53 <rsquared> I think just the jars would be fine.  pmackinn agree?
16:01:36 <pmackinn> hmm a devel/lib pkg? maybe
16:01:50 <mizdebsk> i don't see much problem with reintroducing jetty8 then (from packaging pov, idk what about security updates on 8.x line etc.)
16:02:50 <pmackinn> mizdebsk, comfort level around jar only approach?
16:02:56 <rsquared> I don't have a good feel for how far jetty deps may have drifted
16:03:37 <mizdebsk> pmackinn: i already tested this approach and it forks in most cases
16:03:45 <mizdebsk> you don't even need to package full jetty, only subset of modules you need
16:03:48 <pmackinn> btw, we are not just talking about hadoop bu the rest of the ecosystem also
16:03:51 <pmackinn> but
16:04:10 <mizdebsk> s/forks/works/
16:04:13 <pmackinn> mizdebsk, forks? can you elaborate?
16:04:17 <pmackinn> nvmd
16:04:20 <pmackinn> :-)
16:04:38 <tstclair> The approach seems sound, and consistent.  So create a jetty8 package?
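[Editor's note: a rough outline of what the jar-only compat package might look like; the version, license tag, and module list are guesses, and a real spec would need the usual build machinery on top of this.]

    # Hypothetical skeleton: parallel-installable Jetty 8 JARs, no standalone
    # server, so it can coexist with the jetty (9.x) package in Fedora.
    Name:           jetty8
    Version:        8.1.14
    Release:        1%{?dist}
    Summary:        Jetty 8 compatibility JARs for the Hadoop ecosystem
    License:        ASL 2.0 or EPL
    %description
    Library-only build of Jetty 8.x (just the modules the Hadoop stack needs,
    e.g. jetty-server, jetty-servlet, jetty-util) kept around until the
    ecosystem can move to Jetty 9.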
16:05:13 <pmackinn> CAVEAT: we need to back up on our hadoopy pkgs then
16:06:50 <tstclair> any other items?
16:07:10 <rsquared> pmackinn: Say again?
16:07:17 <rsquared> back up what packages?
16:07:55 <pmackinn> hadoop for starters...new patches from old gh.c commits
16:08:14 <rsquared> Ah, right.
16:08:52 <pmackinn> rsquared, indeed new patches from scratch since we never were at the point of jetty8+2.2.0
16:09:08 <pmackinn> we had moved on by then
16:09:09 <rsquared> We should probably keep the jetty9 stuff around for reference even though it'll bit rot.  I have similar changes for hbase that I hadn't submitted upstream yet but that we should hang on to
16:09:11 <rsquared> Yep
16:09:40 <rsquared> We'll need to re-create the integration and test branches again as well as roll back the jetty branch to jetty8
16:09:56 <pmackinn> #action jetty8 exploratory compat pkg
16:10:03 <pmackinn> first things first
16:10:06 <tstclair> Kyle, folks I'm going to end the record and we can take it back to our channel, in part b/c there is another meeting soon and I'm sure folks will be filtering in.
16:10:21 <tstclair> k not Kyle.. tab complete.
16:10:31 <tstclair> #endmeeting