i18n
LOGS
06:03:12 <tagoh_> #startmeeting i18n
06:03:12 <zodbot> Meeting started Thu Jun 27 06:03:12 2013 UTC.  The chair is tagoh_. Information about MeetBot at http://wiki.debian.org/MeetBot.
06:03:12 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
06:03:13 <tagoh_> #meetingname i18n
06:03:13 <zodbot> The meeting name has been set to 'i18n'
06:03:14 <tagoh_> #topic agenda and roll call
06:03:14 <tagoh_> #link https://fedoraproject.org/wiki/I18N/Meetings/2013-06-27
06:03:24 <mfabian> Hi!
06:03:27 <dueno> hi
06:03:40 <tagoh_> hi, long time no meeting. shall we have one today
06:04:17 <anish_> Hi!
06:05:12 <juhp> hi
06:05:12 <fujiwarat> hi
06:06:06 <pravins> hi
06:06:15 <epico_> hi
06:06:37 <paragan> hi
06:08:55 <tagoh_> okay, let's get started.
06:09:21 <tagoh_> #topic Upcoming schedule
06:09:22 <tagoh_> #info 2013-07-02        Fedora 19 Final Release
06:09:48 <tagoh_> RC2/2.2 is out and on testing now
06:11:23 <tagoh_> go/no-go meeting will be this night in our timezone...
06:11:52 <juhp> (I think 2.2 is just a respin for kde live)
06:11:56 <tagoh_> right
06:12:01 <mfabian> No delta isos from rc2 to rc2.2?
06:12:12 <juhp> no deltas for live
06:12:13 <mfabian> juhp: Ah, so no big difference to rc2.
06:12:58 <juhp> I am not quite sure what is in 2.2 - there was some proposed blocker about kdepim iirc
06:13:33 * juhp is re-downloading desktop live...
06:15:07 <tagoh_> okay, move on then
06:15:15 <tagoh_> #topic New topics
06:15:16 <tagoh_> #info #20: Fedora 20 Planning (tagoh)
06:15:16 <tagoh_> #link https://fedorahosted.org/i18n/ticket/20
06:15:18 <juhp> I have been seeing some strange behaviour with the gnome search field not getting focus, trying harder to reproduce
06:15:40 <juhp> in overview, but anyway
06:16:26 <tagoh_> so it's time to think about f20 planning. just fyi the planning process has been changed since f20. please read http://fedoraproject.org/wiki/Changes/Policy if you are planning to change something in f20.
06:17:38 <pravins> sure
06:18:40 <paragan> mfabian, rc2.2 is just kde respin
06:18:50 <tagoh_> is there any plans so far? :)
06:22:01 <tagoh_> okay, it's still early so just keep it in mind.
06:22:16 <tagoh_> #info #21: Compare different Desktop Environments in Fedora 19 (pnemade)
06:22:16 <tagoh_> #link https://fedorahosted.org/i18n/ticket/21
06:22:37 <paragan> just thinking to have this post GA
06:24:06 <juhp> aha
06:24:35 <tagoh_> is there any new DE in f19?
06:26:28 <tagoh_> I did test MATE and Cinnamon for imsettings support in the f18 time frame btw.
06:26:44 <paragan> I don't think anything is new. all DE are also in F18
06:26:57 <mfabian> No enlightenment 17 yet ...
06:27:26 <juhp> but versions updated etc
06:28:53 <pravins> paragan: yes, idea is worth. Having comparison chart will help users when migrating from one DE to other
06:29:24 <pravins> but what things we should include?
06:29:31 <pravins> 1. How to change IME?
06:30:24 <paragan> yes
06:31:26 <tagoh_> change?
06:31:28 <paragan> if these DE differs in ime available, ime setup, fonts default, app names
06:31:49 <mfabian> Shouldn’t the default fonts be the same?
06:31:51 <pravins> tagoh_: means add, new IME
06:32:36 <pravins> mfabian: i think it should be same based on fontconfig priorities.
06:33:36 <tagoh_> btw who's the target for this information?
06:34:57 <paragan> people who want to find what changes are there between any DE
06:37:41 <tagoh_> any other comments?
06:37:43 <juhp> how about making some mini-draft to get a better idea?
06:38:05 <juhp> maybe help page for IM etc might be useful dunno
06:38:15 <paragan> ok
06:40:25 <tagoh_> anything else?
06:41:24 <mfabian> Has anybody tried ibus-typing-booster 1.1.0?
06:41:50 <mfabian> It can learn from a text file now.
06:42:05 <tagoh_> #topic Open Floor
06:42:47 <tagoh_> mfabian: not yet - aha
06:43:13 <pravins> mfabian: i am using ibus-typing-booster for Marathi language.
06:43:32 <pravins> nope, dont have 1.1.0 yet
06:43:51 <pravins> mfabian: that is an excellent feature.
06:43:57 <mfabian> By reading  files, it is easy to get a huge database which makes typing slow. I need to think how to prune statistically irrelevant entries from the database ...
06:44:32 <juhp> mfabian, still python right?
06:44:32 <mfabian> Learning from a text file is done via the setup tool, there is a button to read a text file.
06:44:33 <pravins> mfabian: so words from text files will get added to Hunspell dictionaries? or i-t-b specific database?
06:44:52 <mfabian> It is still python, yes, but the speed is limited by the SELECT statements.
06:44:58 <juhp> hmm
06:45:17 <juhp> so maybe a better db format is needed perhaps?
06:45:18 <mfabian> The words from the text file get added to the user database of ibus-typing-booster.
06:45:23 <mfabian> Actually trigrams.
06:45:33 <mfabian> Yes, I am thinking whether a different database format is needed.
06:45:42 <juhp> cool
06:45:56 <epico_> how about bi-gram, which can also improve performance?
06:46:16 <mfabian> it stores trigram, bigram, and unigram in the database.
06:46:49 <mfabian> Not separately, the bigrams are just two words of a trigram, the unigram is just the first word of a trigram.
06:47:29 <mfabian> I.e. they are all read from the same table, only the select statements differ.
06:47:30 <pravins> looks fair enough to have them in local database rather than pushing to hunspell
06:47:57 <mfabian> Hunspell doesn’t do the trigram stuff anyway.
06:48:12 <mfabian> pravins: did you notice that it learns better now when using it for Marathi?
06:48:16 <juhp> mfabian, so stored in reverse word order?
06:48:43 <pravins> yes. It shows my added word nicely
06:48:47 <mfabian> Something like this is a database row:
06:48:49 <anish_> juhp, what is reverse word order?
06:48:50 <mfabian> 14284418|s|suit|space|the|5
06:49:04 <juhp> I see
06:49:05 <mfabian> The text is "the space suit"
06:49:28 <mfabian> The last word was inserted when the user typed  only "s" and selected "suit" from the lookup table.
06:49:41 <mfabian> I.e. the 2nd column is the last user input.
06:50:06 <mfabian> 5 is the number of times this particular combination of trigram and last user input happened.
06:50:25 <juhp> aha
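[Editor's sketch, not part of the meeting log] The row layout mfabian describes, and the idea that trigram, bigram and unigram lookups all read from one table with different SELECT statements, could look roughly like the following. The table name `phrases`, the column names, and the way the three lookups are mapped to WHERE clauses are assumptions made for illustration; the real ibus-typing-booster schema and queries may differ.

```python
import sqlite3


def make_db():
    """Create an in-memory table holding the example row from the log."""
    db = sqlite3.connect(":memory:")
    # Hypothetical column names: typed input, chosen word, previous word,
    # word before that, and how often this combination was seen.
    db.execute(
        "CREATE TABLE phrases ("
        "id INTEGER PRIMARY KEY, input_phrase TEXT, phrase TEXT, "
        "p_phrase TEXT, pp_phrase TEXT, user_freq INTEGER)"
    )
    # 14284418|s|suit|space|the|5  ->  text "the space suit", user typed "s"
    db.execute(
        "INSERT INTO phrases VALUES (?, ?, ?, ?, ?, ?)",
        (14284418, "s", "suit", "space", "the", 5),
    )
    db.commit()
    return db


def candidates(db, typed, p_phrase=None, pp_phrase=None):
    """Look up candidates with as much context as the caller supplies.

    No context     -> unigram-style lookup
    p_phrase only  -> bigram-style lookup
    both           -> trigram-style lookup
    """
    sql = "SELECT phrase, SUM(user_freq) FROM phrases WHERE input_phrase = ?"
    args = [typed]
    if p_phrase is not None:
        sql += " AND p_phrase = ?"
        args.append(p_phrase)
        if pp_phrase is not None:
            sql += " AND pp_phrase = ?"
            args.append(pp_phrase)
    sql += " GROUP BY phrase ORDER BY SUM(user_freq) DESC"
    return db.execute(sql, args).fetchall()
```

Calling `candidates(db, "s", "space", "the")` uses the full trigram context, while `candidates(db, "s")` ignores the preceding words entirely; only the WHERE clause changes between the three cases.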
06:51:29 <mfabian> Even if the select statements can be made faster, some limit of the database size is probably a good thing.
06:51:35 <juhp> and the first number?
06:51:43 <mfabian> The first number is just a rowid.
06:51:48 <juhp> ok
06:51:54 <mfabian> Maybe not needed. Was always there so far.
06:52:21 <juhp> is it used?
06:52:30 <mfabian> No, not used at all.
06:52:34 <juhp> ok
06:52:41 <pravins> we took initial database from ibus-table, so it might be used in ibus-table
06:53:08 <mfabian> I changed the database a lot from ibus-table, but so far I left the rowid alone.
06:53:34 <mfabian> I believe ibus-table doesn’t use the rowid either.
06:53:44 <anish_> maybe it's used to set primary key
06:54:23 <mfabian> sqlite implicitly creates rowids anyway, one would probably not save anything by not having it explicitly in a table row.
06:55:01 <mfabian> http://www.sqlite.org/autoinc.html
06:55:24 <mfabian> “If a table contains a column of type INTEGER PRIMARY KEY, then that column becomes an alias for the ROWID”
06:55:37 <mfabian> So in some form, that column would always be there.
06:55:39 <juhp> i see
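[Editor's sketch, not part of the meeting log] The rowid behaviour quoted from the SQLite documentation can be checked with Python's standard `sqlite3` module. The table names `phrases` and `plain` are made up for this illustration.

```python
import sqlite3


def rowid_alias_demo():
    """Show that INTEGER PRIMARY KEY aliases rowid, and that a table
    without such a column still gets an implicit rowid."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE phrases (id INTEGER PRIMARY KEY, phrase TEXT)")
    db.execute("CREATE TABLE plain (phrase TEXT)")
    db.execute("INSERT INTO phrases (phrase) VALUES ('suit')")
    db.execute("INSERT INTO plain (phrase) VALUES ('suit')")
    # In 'phrases', rowid and id are the same value.
    with_alias = db.execute("SELECT rowid, id FROM phrases").fetchone()
    # In 'plain', no id column was declared but rowid exists anyway.
    implicit = db.execute("SELECT rowid FROM plain").fetchone()
    db.close()
    return with_alias, implicit
```

Both tables carry a rowid either way, which matches the point that dropping the explicit id column would probably not save anything.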
06:56:28 <mfabian> To limit the database size, I think about introducing another column with a timestamp when this row was last used.
06:56:50 <mfabian> And then delete rows from the database which have not been used for a long time *and* have a low count.
06:57:33 <juhp> sounds reasonable
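[Editor's sketch, not part of the meeting log] The pruning idea (delete rows that have not been used for a long time *and* have a low count) might be expressed as a single DELETE. The table layout, the `last_used` column, and the threshold parameters are all hypothetical; nothing here is taken from ibus-typing-booster itself.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory table with a hypothetical last_used timestamp."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE phrases ("
        "id INTEGER PRIMARY KEY, phrase TEXT, "
        "user_freq INTEGER, last_used REAL)"
    )
    return db


def prune(db, max_age_seconds, min_count, now=None):
    """Delete rows that are both stale and rarely used.

    A row is removed only if it was last used more than max_age_seconds
    ago AND its count is below min_count. Returns the number of rows
    deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    cur = db.execute(
        "DELETE FROM phrases WHERE last_used < ? AND user_freq < ?",
        (cutoff, min_count),
    )
    db.commit()
    return cur.rowcount
```

Requiring both conditions keeps frequently used but old entries, as well as new entries that have not had time to accumulate a count.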
06:57:51 <mfabian> Timestamps might also be nice to display something like: “You have saved 82% of the keystrokes today and 79% yesterday” like swiftkey does on Android.
06:58:55 <tagoh_> if you want to scale out, better stop use of sqlite I guess...
06:59:11 <mfabian> tagoh_: can you recommend anything faster?
06:59:54 <mfabian> I also thought of writing python dictionaries directly to disk, even that would probably be faster than sqlite already.
07:00:01 <tagoh_> hmm, do you want SQL-like ?
07:00:25 <mfabian> The select statements are rather simple, a simple key-value store is probably enough.
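[Editor's sketch, not part of the meeting log] One way to picture "python dictionaries written directly to disk" as a simple key-value store is a dictionary keyed by typed input plus context, saved with `pickle`. The key layout and function names are assumptions for illustration only. No speed claim is made here; note also that this approach loads the whole store into memory.

```python
import os
import pickle
import tempfile
from collections import Counter, defaultdict


def learn(store, typed, phrase, p_phrase="", pp_phrase=""):
    """Count one selection of `phrase` for the given input and context."""
    store[(typed, p_phrase, pp_phrase)][phrase] += 1


def best(store, typed, p_phrase="", pp_phrase=""):
    """Return the most frequent phrase for this key, or None if unseen."""
    counts = store.get((typed, p_phrase, pp_phrase))
    return counts.most_common(1)[0][0] if counts else None


def save(store, path):
    """Write the whole store to disk in one go."""
    with open(path, "wb") as f:
        pickle.dump(dict(store), f)


def load(path):
    """Read the store back, restoring the defaultdict behaviour."""
    with open(path, "rb") as f:
        return defaultdict(Counter, pickle.load(f))


if __name__ == "__main__":
    store = defaultdict(Counter)
    learn(store, "s", "suit", "space", "the")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "user.pickle")
        save(store, path)
        print(best(load(path), "s", "space", "the"))
```

A lookup is then a single dictionary access instead of a SELECT, at the cost of rewriting the whole file on every save.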
07:01:57 <mfabian> sqlite also has a strange problem that the -wal file gets huge when a huge number of inserts is done.
07:02:25 <anish_> hmm
07:03:28 <epico_> mfabian: yes, simple key-value store is much faster. :)
07:03:34 <mfabian> When learning from a file with 3 million words, the -wal file grows to 1.7 GB.
07:04:13 <mfabian> It is then merged into the main database during a checkpoint, but temporarily it gets huge.
07:05:22 <juhp> mfabian, lol
07:05:40 <juhp> probably sql is too heavy
07:06:02 <mfabian> The main database is only 38MB then, so 1.7GB for the WAL (Write-Ahead-Log) is surprisingly huge.
07:06:10 <juhp> nod
07:06:32 <epico_> mfabian: does more commit transactions help?
07:06:39 * epico_ just a guess.
07:06:56 <mfabian> epico_: You mean for the huge WAL?
07:07:00 <epico_> yes
07:07:09 <mfabian> Apparently not.
07:07:30 <epico_> np, just guess it.
07:07:31 <mfabian> Fastest is inserting the whole data from a text file with a single commit.
07:07:50 <mfabian> self.db.executemany(sqlstr, sqlargs)
07:07:50 <mfabian> self.db.commit()
07:07:50 <mfabian> self.db.execute('PRAGMA wal_checkpoint;')
07:08:05 <mfabian> The executemany inserts everything, then one commit.
07:08:17 <epico_> mfabian, ask sqlite upstream?
07:08:51 <mfabian> Doing the insert in groups of say 10000 records and then a commit and a checkpoint does not stop the WAL file from growing to 1.7GB, it just makes it a lot slower.
07:09:11 <epico_> mfabian, I see.
07:09:16 <mfabian> epico_: Yes, maybe I should ask upstream
07:09:24 <epico_> :)
07:09:47 <mfabian> Also strange is that the WAL file does not seem to grow above 1.7GB even if I use much larger texts.
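[Editor's sketch, not part of the meeting log] To compare the two strategies discussed above (one `executemany` with a single commit, versus batches that each commit and checkpoint), a small harness like the following could record the peak size of the `-wal` file. The table layout, the function name, and the use of `wal_checkpoint(TRUNCATE)` are assumptions; the log itself uses the plain `PRAGMA wal_checkpoint;`. Whether this changes the behaviour reported in the meeting has not been verified.

```python
import os
import sqlite3
import tempfile


def bulk_insert(path, rows, batch_size=None):
    """Insert rows in WAL mode and report (row_count, peak_wal_bytes).

    rows must be a list of (phrase, user_freq) tuples.
    batch_size=None inserts everything with a single commit; otherwise
    each batch is committed and checkpointed separately.
    """
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode=WAL").fetchone()
    db.execute(
        "CREATE TABLE IF NOT EXISTS phrases ("
        "id INTEGER PRIMARY KEY, phrase TEXT, user_freq INTEGER)"
    )
    sql = "INSERT INTO phrases (phrase, user_freq) VALUES (?, ?)"
    step = batch_size or len(rows) or 1
    wal_path = path + "-wal"
    peak = 0
    for start in range(0, len(rows), step):
        db.executemany(sql, rows[start:start + step])
        db.commit()
        # Measure the WAL right after the commit, before checkpointing.
        if os.path.exists(wal_path):
            peak = max(peak, os.path.getsize(wal_path))
        db.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    count = db.execute("SELECT COUNT(*) FROM phrases").fetchone()[0]
    db.close()
    return count, peak


if __name__ == "__main__":
    data = [("word%d" % i, 1) for i in range(100000)]
    with tempfile.TemporaryDirectory() as tmp:
        print(bulk_insert(os.path.join(tmp, "single.db"), data))
        print(bulk_insert(os.path.join(tmp, "batched.db"), data, 10000))
```

Running both variants on the same input gives numbers that could be attached to a question for sqlite upstream.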
07:14:08 <tagoh_> okay, anything else we want to discuss in the meeting?
07:17:14 <tagoh_> let's stop here then.
07:17:23 <tagoh_> thanks everyone for the meeting!
07:17:27 <tagoh_> #endmeeting