gnu-arch-users


From: John Meinel
Subject: Re: [Gnu-arch-users] Is tagline an attractive nuisance for international users?
Date: Mon, 11 Oct 2004 21:02:38 -0500
User-agent: Mozilla Thunderbird 0.8 (Windows/20040913)

Stephen J. Turnbull wrote:
The tagline tag-snarfing algorithm is

1. Restrict the file to 1st 1K plus last 1K.
2. Find "^[[:blank:][:punct:]]*arch-tag:[[:blank:]]*(.*?)[^[:graph:]]*$"
   where *? is the shy repetition operator (subject to the 1024 byte
   boundary)
3. Grab the group, and smash any octet in it outside of [33,126] to '_'.
4. Return the result of 3.

Anybody using a "human-readable" algorithm for tag construction in a
language other than English is liable for lots of collisions.  For
example, in EUC or UTF-8 Japanese, the tag disappears (it gets scarfed
by the non-captured trailing [^[:graph:]]* in the regexp) unless
there's some stray non-blank non-Japanese in the tag part.

I think nowadays everybody is using uuidgen or the like, but this
probably should be documented.
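
The four steps quoted above can be sketched in Python to make the collision problem concrete. This is only an illustration, not the tla implementation: the POSIX classes are translated by hand assuming the C locale (so [:graph:] is octets 33..126), the way the two 1K windows are searched is a guess, and the name extract_tag is hypothetical.

```python
import re
import string
from typing import Optional

# [[:blank:][:punct:]] translated to ASCII space/tab plus ASCII punctuation.
_PUNCT = re.escape(string.punctuation).encode("ascii")

# [^[:graph:]] is taken to mean "any octet outside 0x21..0x7e" (C locale assumption).
TAGLINE_RE = re.compile(
    rb"^[ \t" + _PUNCT + rb"]*arch-tag:[ \t]*(.*?)[^\x21-\x7e]*$",
    re.MULTILINE,
)


def extract_tag(data: bytes) -> Optional[bytes]:
    """Return the sanitized tag, or None if no tagline is found."""
    # Step 1: only the first 1K and the last 1K are considered.
    # Assumption: each window is searched on its own, head first.
    windows = (data[:1024], data[-1024:]) if len(data) > 2048 else (data,)
    for window in windows:
        match = TAGLINE_RE.search(window)  # step 2
        if match:
            # Step 3: smash every octet outside [33,126] to '_'.
            return bytes(b if 33 <= b <= 126 else ord("_") for b in match.group(1))
    return None
```

Under these assumptions, extract_tag(b"# arch-tag: foo bar\n") gives b"foo_bar", while a tag written entirely in UTF-8 Japanese gives the empty tag b"", because every octet of it is swallowed by the trailing non-graph match. Every such file would end up with the same (empty) tag.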


Since this is being brought up: I would like to see the
[[:blank:][:punct:]] prefix relaxed a little bit, if possible.

The specific problem is that batch files use "rem" as the comment
marker. "rem" is made of letters, which are neither blank nor
punctuation, so the prefix pattern cannot match; at least in my testing,
I wasn't able to add a tagline. I don't use a lot of .bat files, but
I've been working with a system that does. I can always use explicit
tags, but as always, I prefer taglines.
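
To illustrate the prefix problem, here is a small Python sketch. It only models the part of the pattern before "arch-tag:", with the POSIX classes translated by hand to ASCII. The RELAXED pattern is purely hypothetical: one possible shape a relaxation could take, not something tla does.

```python
import re
import string

# [[:blank:][:punct:]] translated to ASCII space/tab plus ASCII punctuation.
PREFIX = r"[ \t" + re.escape(string.punctuation) + r"]"

# The prefix as described in the algorithm: blanks and punctuation only.
STRICT = re.compile(r"^" + PREFIX + r"*arch-tag:")

# Hypothetical relaxation: additionally allow one "rem" word (any case),
# which must be followed by at least one blank or punctuation character.
RELAXED = re.compile(
    r"^" + PREFIX + r"*(?:[Rr][Ee][Mm]" + PREFIX + r"+)?arch-tag:"
)


def accepts(pattern: "re.Pattern[str]", line: str) -> bool:
    """True if the line starts with an acceptable tagline prefix."""
    return pattern.match(line) is not None
```

With this translation, accepts(STRICT, "rem arch-tag: x") is False and accepts(RELAXED, "rem arch-tag: x") is True. A punctuation-only prefix such as "::", which is often used as a comment in batch files, already passes the strict form.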

Beyond that, I agree: human-readable taglines will certainly cause
all sorts of problems in non-ASCII text. Would it be possible to also
require the tag to be longer than some minimum? That would help catch
bad taglines, and possible collisions, from the beginning, before a
commit.
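
A minimal sketch of such a check, in Python. The threshold of 8 and the names MIN_TAG_LENGTH and is_acceptable_tag are made up for illustration; nothing like this exists in tla as far as this message is concerned.

```python
MIN_TAG_LENGTH = 8  # hypothetical threshold, chosen only for this example


def is_acceptable_tag(tag: bytes, minimum: int = MIN_TAG_LENGTH) -> bool:
    """Accept a sanitized tag only if it has at least `minimum` octets."""
    return len(tag) >= minimum
```

A tag that was emptied or cut down to a short ASCII fragment by the sanitizing step would fail this check before the commit, while a uuidgen-style tag passes easily.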

Just some thoughts,
John
=:->



