Re: [Monotone-devel] Re: user-friendly hash formats, redux


From: Oren Ben-Kiki
Subject: Re: [Monotone-devel] Re: user-friendly hash formats, redux
Date: Sat, 4 Dec 2004 20:56:17 +0200
User-agent: KMail/1.7.1

On Saturday 04 December 2004 15:52, Nathaniel Smith wrote:
> > phonetically distinct syllables: { B D F H J K L M R S T V Y } x {
> > a e i o u } - things like "YeRuDa".
>
> Ah, but you still have problems.  There are both within-language
> phonetic processes: in English, "ada" and "ata" are pronounced
> identically

But "Ta" and "Da" aren't. Each syllable is separate. From the BB readme 
I see things like "obipe". An English speaker will pronounce this 
o-bye-p, I suppose, but that would come as a shock to a Finnish-only 
speaker (they actually spell things the way they sound. Go figure). I 
wouldn't dare to guess how it would be pronounced in French :-)

> -- and cross-language issues: Japanese doesn't have a 
> distinct "L" and "R", to take one famous example.

Yes, that's a bummer. That's the only one I know of though - the b/p of 
Arabic is covered, as is the American tendency to pronounce r/w the 
same way. Of course, north European languages tend to pronounce 'j' as 
'y', but at least they'll not mix up their sounds. Sigh. The problem is 
you need 13 consonants to get to 64.
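
To make the arithmetic concrete: 13 consonants x 5 vowels gives 65
syllables, one more than the 64 needed for 6 bits per syllable. Below is
a minimal sketch of such a codec. The index-to-syllable mapping
(index = consonant * 5 + vowel) and the names encode_syllables and
decode_syllables are assumptions made for illustration, not anything
agreed on in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Alphabet from the message: 13 consonants x 5 vowels = 65 syllables.
// Only the first 64 are used, so each syllable carries 6 bits.
const std::string kConsonants = "BDFHJKLMRSTVY";
const std::string kVowels = "aeiou";

// Encode the low (syllables * 6) bits of value, most significant first.
// The mapping index = consonant * 5 + vowel is an assumed convention.
std::string encode_syllables(std::uint64_t value, int syllables) {
    assert(syllables >= 1 && syllables <= 10);
    std::string out;
    for (int i = syllables - 1; i >= 0; --i) {
        unsigned index = (value >> (6 * i)) & 0x3F;  // 0..63
        out += kConsonants[index / 5];
        out += kVowels[index % 5];
    }
    return out;
}

// Parse a syllable string back into bits. Returns false on malformed input.
bool decode_syllables(const std::string& text, std::uint64_t& value) {
    if (text.empty() || text.size() % 2 != 0 || text.size() > 20) return false;
    value = 0;
    for (std::size_t i = 0; i < text.size(); i += 2) {
        std::size_t c = kConsonants.find(text[i]);
        std::size_t v = kVowels.find(text[i + 1]);
        if (c == std::string::npos || v == std::string::npos) return false;
        std::size_t index = c * 5 + v;
        if (index > 63) return false;  // "Yu" is the unused 65th syllable
        value = (value << 6) | index;
    }
    return true;
}
```

Under this assumed mapping, "YeRuDa" is simply one particular 18-bit
value; the id itself stays a bit string and only its presentation changes.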

> Also, I misremembered; BB actually gets 16 bits/5-letter "word".  I

Ah. That makes much more sense.

> Not exactly.  Transmission over an audio medium is one problem, sure.
> But it's actually not a very difficult one --- you can use something
> like the "phonetic alphabet" (Alpha/Bravo/Charlie/Delta/...)

Quick, what's G? Gnome? :-) At any rate, if this isn't a goal, then I 
must say Nathan's approach seems better. At least you have a chance to 
remember the words. And as for being English - if you are not an 
English speaker, you are not worse off than when using BibbleBabble. If 
you are one, your chances of memorizing the id will be that much 
higher. Besides, let's face it; most people will know at least _some_ 
English, and three letter words are about as "some" as you can get.

> Keeping 4 different distinct ids in working memory is different --

Why keep a distinct id in memory? Whether you are using bb/syl/tlw, or 
something else, it is just a presentation/parsing issue. Internally it 
is just bit strings, same as today.

> > How does such a program results tell you about how people use ids...
>
> ... stick
> people in an eye-tracker, and stick them down at a machine that's
> recording their mouse movements and keypresses and doing screen
> captures, and analyze the data in terms of various models...

Ah, that sort of program. We did this sort of testing for usability 
studies in a company I worked for. "Easy" is the last word I'd use to 
describe it.

> The poor man's version I have in mind, though, just tests things like
> recall span, recognition span, typing speed, etc. -- cognitive
> processes that we can be pretty sure are important.

Assuming you can get people to sit for it... These things are usually 
boring as hell. I suppose you could turn it into an engaging game :-).

> ... your
> calculations suggest that 2 words (10 characters) should be enough
> for just about anything :-).

Well, that was just a back-of-the-envelope average value; there's the
distribution to consider. (Hacking a C++ program to test this... running
it for 2,000,000 ids x 10 times...) OK, here are the results. syl is
my 2-char syllables, bb is your 5-char BibbleBabble, and tlw is
Nathan's 3-letter words. I'm only showing the maximal and minimal
number of ids that were successfully identified in the 10 trials,
testing up to 2,000,000 ids:

syl/chr bb/chr tlw/chr bits : Were enough for (max - min ids)
  1/2    1/5     1/3     0  : 1 - 1
  1/2    1/5     1/3     1  : 2 - 2
  1/2    1/5     1/3     2  : 3 - 2
  1/2    1/5     1/3     3  : 4 - 3
  1/2    1/5     1/3     4  : 6 - 4
  1/2    1/5     1/3     5  : 9 - 4
  1/2    1/5     1/3     6  : 11 - 7
  2/4    1/5     1/3     7  : 29 - 4
  2/4    1/5     1/3     8  : 53 - 31
  2/4    1/5     1/3     9  : 63 - 22
  2/4    1/5     1/3     10 : 68 - 38
  2/4    1/5     2/6     11 : 108 - 62
  2/4    1/5     2/6     12 : 153 - 79
  3/6    1/5     2/6     13 : 219 - 96
  3/6    1/5     2/6     14 : 210 - 138
  3/6    1/5     2/6     15 : 317 - 213
  3/6    1/5     2/6     16 : 401 - 201
  3/6    2/10    2/6     17 : 787 - 367
  3/6    2/10    2/6     18 : 1,252 - 451
  4/8    2/10    2/6     19 : 1,848 - 553
  4/8    2/10    2/6     20 : 1,882 - 843
  4/8    2/10    3/9     21 : 2,353 - 1,566
  4/8    2/10    3/9     22 : 5,568 - 2,939
  4/8    2/10    3/9     23 : 4,980 - 2,419
  4/8    2/10    3/9     24 : 9,650 - 3,558
  5/10   2/10    3/9     25 : 14,522 - 6,140
  5/10   2/10    3/9     26 : 25,823 - 10,320
  5/10   2/10    3/9     27 : 32,988 - 16,566
  5/10   2/10    3/9     28 : 48,659 - 10,435
  5/10   2/10    3/9     29 : 62,809 - 24,949
  5/10   2/10    3/9     30 : 88,333 - 35,201
  6/12   2/10    4/12    31 : 97,539 - 42,992
  6/12   3/15    4/12    32 : 148,945 - 125,579
  6/12   3/15    4/12    33 : 157,246 - 118,748
  6/12   3/15    4/12    34 : 289,030 - 137,511
  6/12   3/15    4/12    35 : 514,452 - 99,846
  6/12   3/15    4/12    36 : 421,613
  7/14   3/15    4/12    37 : 1,106,307 - 363,971
  7/14   3/15    4/12    38 : 1,561,518 - 692,049
  7/14   3/15    4/12    39 : >2,000,000 - 987,024
  7/14   3/15    4/12    40 : 1,789,355 - 1,297,565
  7/14   3/15    5/15    41 : >2,000,000 - 1,013,324
  7/14   3/15    5/15    42 : >2,000,000 - 1,359,519
  8/16   3/15    5/15    43 : >2,000,000

Well, this gives you an idea. I'm attaching the program if you are 
interested.
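
For readers without the attachment, here is a rough sketch of one way
such an experiment could be run. It is not the attached unique-ids.cc,
and it rests on an assumption about what "enough" means: random ids are
drawn one at a time, and a prefix of the given number of bits stops
being enough as soon as two ids share it. The names
ids_before_collision, run_trials and Range are made up for this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <unordered_set>

// Draw random ids and keep only their first `bits` bits (0 <= bits <= 64).
// Returns how many ids were added before two of them shared a prefix,
// or max_ids if no clash happened within the limit.
std::uint64_t ids_before_collision(int bits, std::uint64_t max_ids,
                                   std::mt19937_64& rng) {
    std::unordered_set<std::uint64_t> seen;
    for (std::uint64_t count = 0; count < max_ids; ++count) {
        // A 0-bit prefix is always the same value; avoid shifting by 64.
        std::uint64_t prefix = bits == 0 ? 0 : rng() >> (64 - bits);
        if (!seen.insert(prefix).second) return count;
    }
    return max_ids;
}

// Smallest and largest result seen over several independent trials.
struct Range {
    std::uint64_t min;
    std::uint64_t max;
};

// Repeat the experiment `trials` times (trials >= 1) with a seeded generator.
Range run_trials(int bits, int trials, std::uint64_t max_ids,
                 std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    Range r{max_ids, 0};
    for (int t = 0; t < trials; ++t) {
        std::uint64_t n = ids_before_collision(bits, max_ids, rng);
        r.min = std::min(r.min, n);
        r.max = std::max(r.max, n);
    }
    return r;
}
```

Calling run_trials(20, 10, 2000000, 1) would play the role of the
20-bit row. Since the stopping rule is a guess, the numbers need not
match the table, especially in the rows for very few bits. The spread
between trials is expected either way: this is a birthday-style
experiment, where the first clash tends to show up after a number of ids
on the order of 2^(bits/2).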

It is interesting to compare the methods; although bb is in theory the
most dense, it ends up being the most wasteful because of the large
quanta. Of course, if you don't bother with computing the minimal
prefix and just use some "safe" constant, that's not a disadvantage.
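
The "large quanta" effect can be sketched in a few lines. The 6 bits per
syllable and 16 bits per bb word come from this thread; the 10 bits per
three-letter word is an assumption read off the tlw column of the table
(it would imply a 1024-word list). The sketch also assumes every unit
carries the same number of bits, and the names Encoding, units_needed,
chars_needed and wasted_bits are hypothetical.

```cpp
// One presentation format: how many bits and characters each unit carries.
struct Encoding {
    const char* name;
    int bits_per_unit;
    int chars_per_unit;
};

const Encoding kSyl{"syl", 6, 2};   // 64 two-letter syllables
const Encoding kBb{"bb", 16, 5};    // 16 bits per 5-letter word, per the thread
const Encoding kTlw{"tlw", 10, 3};  // assumed: 1024 three-letter words

// Whole units needed to carry `bits` bits; at least one unit is shown.
int units_needed(const Encoding& e, int bits) {
    if (bits <= e.bits_per_unit) return 1;
    return (bits + e.bits_per_unit - 1) / e.bits_per_unit;  // ceiling division
}

// Characters the user has to read or type for that many bits.
int chars_needed(const Encoding& e, int bits) {
    return units_needed(e, bits) * e.chars_per_unit;
}

// Bits of capacity paid for but not required by the minimal prefix.
int wasted_bits(const Encoding& e, int bits) {
    return units_needed(e, bits) * e.bits_per_unit - bits;
}
```

For a 17-bit minimal prefix this gives 6 characters for syl and tlw but
10 for bb, with 15 of bb's 32 bits unused. Note that this uniform
formula does not reproduce every row of the table: the bb column there
already moves to 3 words at 32 bits.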

I must say I'm growing to like Nathan's tlw idea - given a careful 
choice of words to minimize the bug/bag issue. It would be the best of 
both worlds if you could get rid of the more blatant confusions like 
bug/bag and buy/bye by introducing a "few" "nice" non-words. E.g., 
'taz' isn't a word, but it works great (besides, it's the name of a 
cartoon character :-).

Have fun,

 Oren Ben-Kiki

Attachment: unique-ids.cc
Description: Text Data

