15:32:29 <flock-ectr112> #startmeeting Measuring the Fedora community with Census
15:32:29 <zodbot> Meeting started Sun Aug 11 15:32:29 2013 UTC. The chair is flock-ectr112. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:32:29 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:33:06 <flock-ectr112> #chair mizmo
15:33:06 <zodbot> Current chairs: flock-ectr112 mizmo
15:34:20 <flock-ectr112> Good
15:34:23 <flock-ectr112> - large DB
15:34:29 <flock-ectr112> - adopted outside fedora
15:34:31 <flock-ectr112> Bad
15:34:36 <flock-ectr112> - opt-in
15:34:44 <flock-ectr112> - flaws in the design
15:34:48 <flock-ectr112> -- scalability
15:34:54 <flock-ectr112> -- complicated collections of plugins
15:35:12 <flock-ectr112> -- custom / one-off queries difficult
15:35:23 <flock-ectr112> -- custom UI code required
15:35:44 <flock-ectr112> need to find who is maintaining it
15:35:54 <flock-ectr112> need to figure out the db design
15:36:02 <flock-ectr112> need to make the query
15:36:15 <flock-ectr112> by the time you have all this, you have given up
15:36:40 <flock-ectr112> less useful than hoped
15:36:54 <flock-ectr112> less useful -> less maintained -> retired
15:37:00 <flock-ectr112> new idea: Census
15:37:23 <flock-ectr112> - Opt-out for basic anonymous data
15:37:42 <flock-ectr112> (on by default, anonymous data, hardware info, crash info, package info...)
15:37:47 <flock-ectr112> - scalability
15:37:51 <flock-ectr112> Better design
15:37:58 <flock-ectr112> - flexible collector framework
15:38:05 <flock-ectr112> client should be very simple to collect the data
15:38:09 <flock-ectr112> - reusable query API
15:38:23 <flock-ectr112> allows running queries w/o having to talk to the devs themselves
15:38:32 <flock-ectr112> embed that in your application
15:38:41 <flock-ectr112> - simple query prototyping tool
15:39:00 <flock-ectr112> sandbox to build your query to integrate in your own app
15:39:09 <flock-ectr112> we provide a service and an API
15:39:21 <flock-ectr112> you integrate that in your app/stats
15:39:40 <flock-ectr112> you can submit data about anything
15:40:13 <flock-ectr112> (# of downloads, # of updates, not just when people install)
15:40:25 <flock-ectr112> Prototype on OpenShift
15:40:55 <flock-ectr112> receiver: a collection of plugins; receives the submitted data and puts it in the database
15:41:27 <flock-ectr112> you'll need a receiver plugin for each type of data you want to store
15:41:40 <flock-ectr112> The query API is read-only and returns JSON output
15:41:53 <flock-ectr112> in the same way that JSON is used to upload data
15:42:14 <flock-ectr112> the query prototyper just submits the info to the query API
15:42:42 <flock-ectr112> scalable on the DB level or query API level but the API itself will remain consisten
15:42:43 <flock-ectr112> +t
15:43:37 <flock-ectr112> plugins should be very easy to write
15:43:51 <flock-ectr112> plugins just print JSON to stdout
15:44:07 <flock-ectr112> receiver has two tasks
15:44:09 <flock-ectr112> - indexing
15:44:14 <flock-ectr112> - inserting the data
15:44:35 <flock-ectr112> more might come in the future according to needs
15:45:06 <flock-ectr112> every plugin has access to the whole dataset submitted
15:45:32 <flock-ectr112> inter-plugin compatibility -> data passing / ordering
15:46:41 <flock-ectr112> the index method of the plugin is only run once
15:46:52 <flock-ectr112> and defines the required structure
15:47:14 <flock-ectr112> the process method processes the input submitted and returns the JSON blob to insert in the database
15:47:37 <flock-ectr112> actually the process method directly inserts in the db
15:47:45 <flock-ectr112> the query API
15:47:54 <flock-ectr112> - HTTP POST with 2 parameters
15:48:06 <flock-ectr112> -- a JavaScript function (func)
15:48:17 <flock-ectr112> -- the arguments to pass to the function (args)
15:48:36 <flock-ectr112> the JS is run in a read-only JavaScript sandbox
15:48:44 <flock-ectr112> the returned value is JSON-encoded
15:48:52 <flock-ectr112> Query prototyper
15:48:59 <flock-ectr112> - static HTML page w/ JavaScript
15:49:18 <flock-ectr112> helps to build the query and submit it to the query API
15:49:41 <flock-ectr112> useful for one-off queries
15:49:54 <flock-ectr112> returned data is displayed dynamically (using JS)
15:50:18 <flock-ectr112> using these tools one can directly browse the db schema live
15:50:29 <flock-ectr112> demo is at: <missed the link>
15:51:13 <flock-ectr112> first example: return "Hello world!";
15:51:35 <flock-ectr112> second example: return {"Title": "Hello world!"}
15:51:50 <flock-ectr112> second example: return {"Subject": "Hello world!", "foo": "bar"}
15:52:02 <flock-ectr112> third example querying the db itself
15:52:08 <flock-ectr112> return db.getCollectionNames();
15:52:31 <flock-ectr112> return ["collections"].concat(db.getCollectionNames());
15:52:38 <flock-ectr112> Names the table ^
15:52:52 <flock-ectr112> return db.hardware.pci.findOne();
15:53:02 <flock-ectr112> returns information for one PCI device
15:53:08 <flock-ectr112> return db.hardware.ub.findOne();
15:53:11 <flock-ectr112> return db.hardware.usb.findOne();*
15:53:16 <flock-ectr112> same query for a USB device
15:53:37 <flock-ectr112> return db.hardware.usb.find(); -> returns a cursor rather than a valid JSON object
15:53:46 <flock-ectr112> return db.hardware.usb.find().toArray();
15:53:56 <flock-ectr112> which returns the whole collection as JSON
15:54:08 <flock-ectr112> return db.hardware.usb.find({vendor:3599}).toArray();
15:54:20 <flock-ectr112> returns USB info for a specific vendor
15:54:44 <flock-ectr112> /!\ Watch out: DB schema subject to change!
15:54:56 <flock-ectr112> return db.hardware.profile.findOne();
15:55:03 <flock-ectr112> returns a hardware profile
15:55:13 <flock-ectr112> state of the current hardware on the device
15:55:32 <flock-ectr112> return db.checkin.find().toArray();
15:55:37 <flock-ectr112> list of all the checkins
15:55:51 <flock-ectr112> this will get bigger as there is a checkin for each insert
15:56:09 <flock-ectr112> lots of possibilities
15:56:20 <flock-ectr112> can be integrated into more applications
15:57:07 <flock-ectr112> ids are unique
15:57:16 <flock-ectr112> profiles will not be stored redundantly
15:58:04 <flock-ectr112> one object for each hardware device in the hardware.pci document
15:58:35 <flock-ectr112> return db.hardware.pci.find().toArray();
15:59:17 <flock-ectr112> can be used for anything, not just kernel information
15:59:32 <flock-ectr112> hits on URLs can be stored
16:00:34 <flock-ectr112> checkin is used to quantify the number of profiles submitted
16:00:50 <flock-ectr112> so to get the top 10 video cards you will go from checkin to profile to hardware
16:01:09 <flock-ectr112> profiles will give the number of times specific hardware exists
16:01:24 <flock-ectr112> checkin will provide the number of times each profile has been submitted
16:01:33 <flock-ectr112> future considerations
16:01:36 <flock-ectr112> - replicates
16:01:38 <flock-ectr112> -- sharing
16:01:45 <flock-ectr112> -- master/slave construction
16:01:54 <flock-ectr112> would allow scaling up while preserving the API
16:02:06 <flock-ectr112> - Opt-in/Opt-out data policy
16:02:27 <flock-ectr112> opted in by default for anonymous data; some data might require an explicit opt-in
16:02:41 <flock-ectr112> - Expand the collection framework
16:02:51 <flock-ectr112> Based on smolt but can be expanded
16:03:33 <flock-ectr112> last stop in the path to get data from its origin to the user
16:03:38 <flock-ectr112> queries should be fast
16:03:59 <flock-ectr112> might require a level of translation b/w the data and census in some cases
16:04:10 <flock-ectr112> client/server is ready, will be uploaded to fedorahosted
16:04:39 <flock-ectr112> current collectors: uuid, hardware.pci, hardware.usb, software.os and software.rpm
16:05:15 <flock-ectr112> linking the uuid from census to the retrace server (and darkserver?)
16:05:43 <flock-ectr112> using the gnu_build_id ?
16:06:10 <flock-ectr112> pci slot might be nice to store as well
16:06:20 <flock-ectr112> software.os -> CPE info
16:06:29 <flock-ectr112> software.rpm -> output of rpm -qa
16:06:36 <flock-ectr112> TODO:
16:06:43 <flock-ectr112> - define collection requirements
16:06:47 <flock-ectr112> - Nail down schema
16:06:54 <flock-ectr112> - Opt-in/Opt-out policy
16:07:02 <flock-ectr112> - Anaconda / Firstboot integration?
16:07:12 <flock-ectr112> (checkbox to say don't submit my data)
16:07:18 <flock-ectr112> or more complex
16:07:35 <flock-ectr112> plugins can be enabled/disabled from the command line
16:07:39 <flock-ectr112> - release
16:07:42 <flock-ectr112> - package into Fedora
16:07:52 <flock-ectr112> Hackfest at 2:00pm today!
16:08:05 <flock-ectr112> more info on http://fedorahosted.org/census
16:08:26 <flock-ectr112> live-demo: http://census-npmccallumfedora.rhcloud.com
16:08:49 <flock-ectr112> service easy to deploy
16:09:03 <flock-ectr112> Fedora 21 to get it more integrated into Fedora
16:09:18 <flock-ectr112> questions ?
16:09:28 <flock-ectr112> thanks
16:09:30 <flock-ectr112> #endmeeting
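
Editor's note (not part of the meeting log): the notes describe the query API as an HTTP POST carrying two parameters, a JavaScript function body (func) and its arguments (args). The sketch below shows one way a client might assemble such a request. Only the parameter names func and args and the sample query come from the log. The helper name buildCensusQuery, the form-encoded body, and the JSON encoding of args are assumptions made for illustration, so the real service may expect something different.

```javascript
// Sketch only: builds the options for an HTTP POST to the Census query API.
// From the log: the POST has two parameters, `func` and `args`.
// Assumed (not confirmed by the log): form-encoded body, JSON-encoded `args`.
function buildCensusQuery(func, args) {
  const body = new URLSearchParams();
  body.set("func", func);                 // JavaScript function body, run server-side
  body.set("args", JSON.stringify(args)); // arguments passed to that function
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  };
}

// Query taken verbatim from the session: USB devices for one vendor id.
const request = buildCensusQuery(
  "return db.hardware.usb.find({vendor:3599}).toArray();",
  [] // this query takes no arguments
);

console.log(request.body);
```

The resulting object could be passed as the second argument to fetch() together with the service's query URL, which the log does not give and which is therefore left out here.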