gnu-arch-users


From: chth
Subject: Re: [Gnu-arch-users] Is tagline an attractive nuisance for international users?
Date: Tue, 12 Oct 2004 08:03:50 +0200

> The tagline tag-snarfing algorithm is
> 
> 1. Restrict the file to 1st 1K plus last 1K.
> 2. Find
> "^[[:blank:][:punct:]]*arch-tag:[[:blank:]]*(.*?)[^[:graph:]]*$"
>    where *? is the shy repetition operator (subject to the 1024 byte
>    boundary)
> 3. Grab the group, and smash any octet in it outside of [33,126] to
>    '_'.
> 4. Return the result of 3.
> 
> Anybody using a "human-readable" algorithm for tag construction in a
> language other than English is liable for lots of collisions.  For
> example, in EUC or UTF-8 Japanese, the tag disappears (it gets scarfed
> by the non-captured trailing [^[:graph:]]* in the regexp) unless
> there's some stray non-blank non-Japanese in the tag part.
> 
> I think nowadays everybody is using uuidgen or the like, but this
> probably should be documented.
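
For illustration, the quoted steps can be sketched in Python roughly as
follows. This is not tla's code: the names snarf_tag and smash are made
up, the POSIX classes are approximated with their C-locale byte ranges,
and handling the head and tail windows line by line is an assumption
about what "subject to the 1024 byte boundary" means.

```python
import re
import string

# Assumption: C-locale classes on raw bytes.
# [:blank:] = space/tab, [:punct:] = ASCII punctuation, [:graph:] = 33..126.
PUNCT = re.escape(string.punctuation.encode("ascii"))
TAG_RE = re.compile(
    rb"^[ \t" + PUNCT + rb"]*arch-tag:[ \t]*(.*?)[^\x21-\x7e]*$"
)


def smash(raw: bytes) -> str:
    """Step 3: replace every octet outside [33, 126] with '_'."""
    return "".join(chr(b) if 33 <= b <= 126 else "_" for b in raw)


def snarf_tag(data: bytes):
    """Return the smashed arch-tag from the first or last 1K, or None."""
    # Step 1: only the first and the last 1024 bytes are searched.
    for window in (data[:1024], data[-1024:]):
        for line in window.split(b"\n"):
            match = TAG_RE.match(line)  # step 2
            if match:
                return smash(match.group(1))  # steps 3 and 4
    return None


ascii_tag = snarf_tag(b"# arch-tag: foo bar\n")
japanese_tag = snarf_tag("# arch-tag: 日本語\n".encode("utf-8"))
mixed_tag = snarf_tag("# arch-tag: 日本語 v2\n".encode("utf-8"))
```

With a purely Japanese UTF-8 tag the shy group matches nothing, because
every octet of the tag is consumed by the trailing non-graph run, so the
tag comes back empty. Only when some graph character follows (the "v2"
above) does the group survive, and then as a run of underscores.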

I have already suggested escaping the non-graph characters instead of
smashing them. The problem is not limited to taglines: all ids are
handled in much the same way, and the smash function is very ambiguous.
A file named "foo_bar" gets the same id as "foo bar", and so on...
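
To make the difference concrete, here is a small Python sketch comparing
smashing with one possible escaping scheme. The %XX encoding is only an
assumption chosen for illustration (no particular scheme has been
agreed on), and the function names are hypothetical.

```python
def smash(name: bytes) -> str:
    """Current behaviour as described: non-graph octets become '_'."""
    return "".join(chr(b) if 33 <= b <= 126 else "_" for b in name)


def escape(name: bytes) -> str:
    """Hypothetical alternative: %XX-escape non-graph octets and '%'."""
    out = []
    for b in name:
        if 33 <= b <= 126 and b != ord("%"):
            out.append(chr(b))
        else:
            out.append(f"%{b:02X}")
    return "".join(out)


def unescape(text: str) -> bytes:
    """Inverse of escape(): decode every %XX back to a single octet."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


names = [b"foo bar", b"foo_bar", b"foo\tbar", b"100%"]
smashed = [smash(n) for n in names]
escaped = [escape(n) for n in names]
```

Smashing maps the first three names to the same id, while the escaped
forms stay distinct and can be decoded back to the original octets.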

Unfortunately, such a change would require bumping the archive format
version and would not be compatible with older tla versions. I hope that
Tom has this scheduled for the next release.



Someone on IRC had concerns about taglines being affected by
search-and-replace operations. How about adding checksums to taglines,
which tla would then validate?
(Something like: "tla-uuid: 2395f100-a41b-4af3-a310-c11ae7515d10
ea0bc4fb85a0e892bd646ab07c6647ad"; hmm, crc32 or something even
simpler should suffice.)
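
A rough Python sketch of how such a check might look, assuming a CRC-32
of the tag written as 8 hex digits after it. Nothing like this exists
in tla; the "tla-uuid:" keyword is taken from the example above, and
make_tagline and verify_tagline are hypothetical names.

```python
import re
import zlib

# Assumed layout: "tla-uuid: <tag> <crc32 of tag as 8 hex digits>"
TAGLINE_RE = re.compile(r"tla-uuid:[ \t]*(\S+)[ \t]+([0-9a-f]{8})\s*$")


def checksum(tag: str) -> str:
    """CRC-32 of the tag, formatted as 8 lowercase hex digits."""
    return f"{zlib.crc32(tag.encode('utf-8')) & 0xFFFFFFFF:08x}"


def make_tagline(tag: str) -> str:
    """Build a tagline that carries its own checksum."""
    return f"tla-uuid: {tag} {checksum(tag)}"


def verify_tagline(line: str) -> bool:
    """True only if the line parses and the checksum matches the tag."""
    match = TAGLINE_RE.search(line)
    return bool(match) and checksum(match.group(1)) == match.group(2)


uuid = "2395f100-a41b-4af3-a310-c11ae7515d10"
good = "# " + make_tagline(uuid)

# Simulate a search-and-replace that changed the tag but not the checksum.
damaged = "# tla-uuid: " + uuid.replace("a310", "b310") + " " + checksum(uuid)
```

Here verify_tagline(good) holds while verify_tagline(damaged) does not,
so a tag altered by accident would be rejected rather than silently
treated as a new id.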


        Christian



