fedora-flock-ectr112
LOGS
15:32:29 <flock-ectr112> #startmeeting Measuring the Fedora community with Census
15:32:29 <zodbot> Meeting started Sun Aug 11 15:32:29 2013 UTC.  The chair is flock-ectr112. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:32:29 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:33:06 <flock-ectr112> #chair mizmo
15:33:06 <zodbot> Current chairs: flock-ectr112 mizmo
15:34:20 <flock-ectr112> good
15:34:23 <flock-ectr112> - large DB
15:34:29 <flock-ectr112> - adopted outside fedora
15:34:31 <flock-ectr112> Bad
15:34:36 <flock-ectr112> - opt-in
15:34:44 <flock-ectr112> - flaws in the design
15:34:48 <flock-ectr112> -- scalability
15:34:54 <flock-ectr112> -- complicated collections of plugins
15:35:12 <flock-ectr112> -- custom / one-off queries are difficult
15:35:23 <flock-ectr112> -- custom UI code required
15:35:44 <flock-ectr112> need to find who is maintaining it
15:35:54 <flock-ectr112> need to figure out the db design
15:36:02 <flock-ectr112> need to make the query
15:36:15 <flock-ectr112> by the time you have all this, you have given up
15:36:40 <flock-ectr112> less useful than hoped
15:36:54 <flock-ectr112> less useful -> less maintained -> retired
15:37:00 <flock-ectr112> new idea: Census
15:37:23 <flock-ectr112> - Opt-out for basic anonymous data
15:37:42 <flock-ectr112> (on by default, anonymous data, hardware info, crashes info, packages info...)
15:37:47 <flock-ectr112> - scalability
15:37:51 <flock-ectr112> Better design
15:37:58 <flock-ectr112> - flexible collector framework
15:38:05 <flock-ectr112> the client that collects the data should be very simple
15:38:09 <flock-ectr112> - reusable query API
15:38:23 <flock-ectr112> allows running queries w/o having to talk to the devs themselves
15:38:32 <flock-ectr112> embed that in your application
15:38:41 <flock-ectr112> - simple query prototyping tool
15:39:00 <flock-ectr112> sandbox to build your query to integrate in your own app
15:39:09 <flock-ectr112> we provide a service and an API
15:39:21 <flock-ectr112> you integrate that in your app/stats
15:39:40 <flock-ectr112> you can submit data about anything
15:40:13 <flock-ectr112> (# of downloads, # of updates, not just simply when people install)
15:40:25 <flock-ectr112> Prototype in openshift
15:40:55 <flock-ectr112> receiver: a collection of plugins; receives the submitted data and puts it in the database
15:41:27 <flock-ectr112> you'll need a receiver plugin for each type of data you want to store
15:41:40 <flock-ectr112> The query API is read-only and returns JSON output
15:41:53 <flock-ectr112> in the same way that JSON is used to upload data
15:42:14 <flock-ectr112> the query prototyper is just submitting the info to the query API
15:42:42 <flock-ectr112> scalable at the DB level or the query API level, but the API itself will remain consistent
15:43:37 <flock-ectr112> plugin should be very easy to write
15:43:51 <flock-ectr112> plugins just print JSON to stdout
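
As a rough illustration of the "print JSON to stdout" contract, here is a minimal sketch of a collector written for Node.js. The shape of the output (the `collector` and `data` keys) is an assumption made for this example and is not the actual Census format; only the collector name `software.os` comes from the talk.

```javascript
// Hypothetical collector plugin: gathers a few facts and prints a single
// JSON object to stdout. The key names below are illustrative assumptions.
function collect() {
  return {
    collector: "software.os",       // one of the collector names listed in the talk
    data: {
      platform: process.platform,   // e.g. "linux"
      arch: process.arch            // e.g. "x64"
    }
  };
}

// The whole contract, as described: serialize and write to stdout.
const output = JSON.stringify(collect());
console.log(output);
```

Because the only interface is stdout, a plugin like this could be written in any language that can emit JSON.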
15:44:07 <flock-ectr112> receiver has two tasks
15:44:09 <flock-ectr112> - indexing
15:44:14 <flock-ectr112> - insert the data
15:44:35 <flock-ectr112> more might come in the future according to needs
15:45:06 <flock-ectr112> every plugin has access to the whole dataset submitted
15:45:32 <flock-ectr112> inter-plugin compatibility -> data passing / ordering
15:46:41 <flock-ectr112> the index method of the plugin is only run once
15:46:52 <flock-ectr112> and defines the structure required
15:47:14 <flock-ectr112> the process method processes the submitted input and returns the JSON blob to insert in the database
15:47:37 <flock-ectr112> actually the process method directly inserts in the db
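
The index/process split described above can be sketched as follows. Everything here is hypothetical: `FakeCollection`, `usbPlugin`, the method signatures and the submission layout are stand-ins invented for illustration, since the talk does not show the real receiver plugin interface.

```javascript
// In-memory stand-in for a database collection (an assumption for this
// sketch; the real receiver talks to an actual database).
class FakeCollection {
  constructor() {
    this.docs = [];
    this.indexes = [];
  }
  ensureIndex(field) {
    if (!this.indexes.includes(field)) this.indexes.push(field);
  }
  insert(doc) {
    this.docs.push(doc);
  }
}

// Hypothetical receiver plugin for USB data.
const usbPlugin = {
  // index(): run once, declares the structure the plugin needs.
  index(db) {
    db.usb.ensureIndex("vendor");
  },
  // process(): sees the whole submitted dataset and inserts its own part
  // directly into the database.
  process(db, submission) {
    for (const dev of submission["hardware.usb"] || []) {
      db.usb.insert({ vendor: dev.vendor, product: dev.product });
    }
  }
};

const db = { usb: new FakeCollection() };
usbPlugin.index(db);
usbPlugin.process(db, {
  uuid: "example-uuid",
  "hardware.usb": [
    { vendor: 3599, product: 1 },
    { vendor: 1133, product: 2 }
  ]
});
console.log(db.usb.docs);
```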
15:47:45 <flock-ectr112> the query API
15:47:54 <flock-ectr112> - HTTP POST with 2 parameters
15:48:06 <flock-ectr112> -- a JavaScript function (func)
15:48:17 <flock-ectr112> -- the argument to pass to the function (args)
15:48:36 <flock-ectr112> the JS is run in a read-only JavaScript sandbox
15:48:44 <flock-ectr112> the returned value is JSON-encoded
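
A client-side sketch of such a request might look like the following. Only the two parameter names, `func` and `args`, come from the talk; the form encoding, the JSON encoding of `args`, the helper name `buildQueryBody` and the `QUERY_URL` placeholder are assumptions.

```javascript
// Build the POST body for a query. "func" carries JavaScript source text and
// "args" carries its argument. Encoding choices here are assumptions.
function buildQueryBody(funcSource, args) {
  const params = new URLSearchParams();
  params.set("func", funcSource);
  params.set("args", JSON.stringify(args));
  return params.toString();
}

const queryBody = buildQueryBody(
  "return db.hardware.usb.find({vendor:3599}).toArray();",
  []
);
console.log(queryBody);

// Sending it could look roughly like this (QUERY_URL is a placeholder,
// not a real endpoint):
// fetch(QUERY_URL, { method: "POST", body: new URLSearchParams(queryBody) })
//   .then((res) => res.json())
//   .then((rows) => console.log(rows));
```

How the sandbox exposes `args` to the function body is not covered in the talk, so the sketch passes an empty list and does not reference it inside `func`.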
15:48:52 <flock-ectr112> Query prototyper
15:48:59 <flock-ectr112> - static HTML page w/ javascript
15:49:18 <flock-ectr112> helps to build the query and submit it to the Query API
15:49:41 <flock-ectr112> useful for one-off queries
15:49:54 <flock-ectr112> data returned dynamically displayed (using js)
15:50:18 <flock-ectr112> using these tools one can directly browse the db schema live
15:50:29 <flock-ectr112> demo is at: <missed the link>
15:51:13 <flock-ectr112> first example: return "Hello world!";
15:51:35 <flock-ectr112> second example: return {"Title" : "Hello world!"}
15:51:50 <flock-ectr112> second example: return {"Subject" : "Hello world!", "foo" : "bar"}
15:52:02 <flock-ectr112> third example querying the db itself
15:52:08 <flock-ectr112> return db.getCollectionNames();
15:52:31 <flock-ectr112> return ["collections"].concat(db.getCollectionNames());
15:52:38 <flock-ectr112> Names the table ^
15:52:52 <flock-ectr112> return db.hardware.pci.findOne();
15:53:02 <flock-ectr112> returns the information for one PCI device
15:53:11 <flock-ectr112> return db.hardware.usb.findOne();
15:53:16 <flock-ectr112> same query for a usb device
15:53:37 <flock-ectr112> return db.hardware.usb.find(); -> returns a cursor rather than a valid JSON object
15:53:46 <flock-ectr112> return db.hardware.usb.find().toArray();
15:53:56 <flock-ectr112> which returns the whole collection as JSON
15:54:08 <flock-ectr112> return db.hardware.usb.find({vendor:3599}).toArray();
15:54:20 <flock-ectr112> returns USB info for a specific vendor
15:54:44 <flock-ectr112> /!\ Watch out: DB schema subject to change!
15:54:56 <flock-ectr112> return db.hardware.profile.findOne();
15:55:03 <flock-ectr112> returns a profile of hardware
15:55:13 <flock-ectr112> state of the current hardware on the device
15:55:32 <flock-ectr112> return db.checkin.find().toArray();
15:55:37 <flock-ectr112> list of all the checkins
15:55:51 <flock-ectr112> this will get bigger as there is a checkin for each insert
15:56:09 <flock-ectr112> lots of possibilities
15:56:20 <flock-ectr112> can be integrated into more applications
15:57:07 <flock-ectr112> ids are unique
15:57:16 <flock-ectr112> profiles will not be stored redundantly
15:58:04 <flock-ectr112> one object for each hardware device in the hardware.pci document
15:58:35 <flock-ectr112> return db.hardware.pci.find().toArray();
15:59:17 <flock-ectr112> can be used for anything, not just kernel information
15:59:32 <flock-ectr112> hits on urls can be stored
16:00:34 <flock-ectr112> checkin is used to quantify the number of profiles submitted
16:00:50 <flock-ectr112> so to get the top 10 video cards you will go from checkin to profile to hardware
16:01:09 <flock-ectr112> profiles will give the number of times a specific hardware exists
16:01:24 <flock-ectr112> checkin will provide the number of times each profile has been submitted
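
The checkin -> profile -> hardware path can be illustrated with a small in-memory sketch. The sample data and every field name (`profile`, `devices`, `name`) are made up for this example; the real schema was still in flux at the time of the talk.

```javascript
// Made-up sample data standing in for the three collections.
const checkins = [
  { profile: "p1" }, { profile: "p1" }, { profile: "p1" }, { profile: "p2" }
];
const profiles = {
  p1: { devices: ["gpu-a"] },
  p2: { devices: ["gpu-a", "gpu-b"] }
};
const hardware = {
  "gpu-a": { name: "Video card A" },
  "gpu-b": { name: "Video card B" }
};

function topDevices(n) {
  // Step 1: count how many times each profile was checked in.
  const perProfile = new Map();
  for (const c of checkins) {
    perProfile.set(c.profile, (perProfile.get(c.profile) || 0) + 1);
  }
  // Step 2: credit every device in a profile with that profile's count.
  const perDevice = new Map();
  for (const [profileId, count] of perProfile) {
    for (const deviceId of profiles[profileId].devices) {
      perDevice.set(deviceId, (perDevice.get(deviceId) || 0) + count);
    }
  }
  // Step 3: sort by count, keep the top n, resolve ids to hardware records.
  return [...perDevice.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([id, count]) => ({ name: hardware[id].name, count }));
}

const top = topDevices(10);
console.log(top);
```

With this sample data, "Video card A" is credited with 4 check-ins (3 via p1, 1 via p2) and "Video card B" with 1. Whether check-ins or distinct machines are the right thing to count depends on the question being asked.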
16:01:33 <flock-ectr112> future considerations
16:01:36 <flock-ectr112> - replication
16:01:38 <flock-ectr112> -- sharding
16:01:45 <flock-ectr112> -- master/slave construction
16:01:54 <flock-ectr112> would allow scaling up while preserving the API
16:02:06 <flock-ectr112> - Opt-in/Opt-out data policy
16:02:27 <flock-ectr112> opt-out by default for anonymous data, some data might require an opt-in
16:02:41 <flock-ectr112> - Expand the collection framework
16:02:51 <flock-ectr112> Based on smolt but can be expanded
16:03:33 <flock-ectr112> last stop in the path to get data from its origin to the user
16:03:38 <flock-ectr112> queries should be fast
16:03:59 <flock-ectr112> might require a level of translation b/w the data and census, maybe in some cases
16:04:10 <flock-ectr112> client/server is ready, will be uploaded to fedorahosted
16:04:39 <flock-ectr112> current collectors: uuid, hardware.pci, hardware.usb, software.os and software.rpm
16:05:15 <flock-ectr112> linking the uuid from census to the retrace server (and darkserver?)
16:05:43 <flock-ectr112> using the gnu_build_id ?
16:06:10 <flock-ectr112> pci slot might be nice to store as well
16:06:20 <flock-ectr112> software.os -> cpe info
16:06:29 <flock-ectr112> software.rpm -> output of rpm -qa
16:06:36 <flock-ectr112> TODO:
16:06:43 <flock-ectr112> - define collection requirements
16:06:47 <flock-ectr112> - Nail down schema
16:06:54 <flock-ectr112> - Opt-in/Opt-out policy
16:07:02 <flock-ectr112> - Anaconda / Firstboot integration?
16:07:12 <flock-ectr112> (checkbox to say don't submit my data)
16:07:18 <flock-ectr112> or more complex
16:07:35 <flock-ectr112> plugins can be enabled/disabled from the command line
16:07:39 <flock-ectr112> - release
16:07:42 <flock-ectr112> - package into Fedora
16:07:52 <flock-ectr112> Hackfest at 2:00pm today!
16:08:05 <flock-ectr112> more info on http://fedorahosted.org/census
16:08:26 <flock-ectr112> live-demo: http://census-npmccallumfedora.rhcloud.com
16:08:49 <flock-ectr112> service easy to deploy
16:09:03 <flock-ectr112> Fedora 21 to get it more integrated into Fedora
16:09:18 <flock-ectr112> questions ?
16:09:28 <flock-ectr112> thanks
16:09:30 <flock-ectr112> #endmeeting