Re: [Gnu-arch-users] Re: How does arch/tla handle encodings?

gnu-arch-users

From:	Marcus Sundman
Subject:	Re: [Gnu-arch-users] Re: How does arch/tla handle encodings?
Date:	Sun, 29 Aug 2004 14:57:37 +0300
User-agent:	KMail/1.7

On Sunday 29 August 2004 03:02, Michael Poole wrote:
> Marcus Sundman writes:
> > On Saturday 28 August 2004 20:02, Michael Poole wrote:
> > > 1) You have not defined any specific problems that you want to solve,
> > >    then you assume that we are too stupid to solve your problem.  So
> > >    far you have made complaints analogous to "arch should solve the
> > >    code attribution problem."
> >
> > First of all I originally only tried to get answers to a few questions,
> > and almost immediately people started bashing wildly.
> >
> > That said, the problem here was very specific. No link in the chain
> > should lose the essential piece of metadata referred to as "encoding
> > info". If it is lost then there is no way to get it back. How is this
> > not specific?
>
> Where is this metadata established?  I know of no editor on my Linux
> or Windows machines that records "encoding info," except within the
> byte stream of the files they work on.

The metadata is established when the string is encoded. Duh!
Sigh... I've already said this, but sure, I can say it once more...
If the encoding isn't specified explicitly then it's implicitly the system's 
default encoding, as defined by your environment settings. At least this is 
how it's done in most systems today. E.g. when you write "echo foo >bar" 
then the file "bar" will be created in the local system's default encoding.
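
To illustrate the point, here is a minimal Python sketch. It assumes nothing 
about arch itself; the sample character and the variable names are chosen 
only for this example. It shows that the bytes written for the same text 
depend on which encoding is in effect, and that the bytes carry no label 
saying which one that was.

```python
import locale

# The encoding this system falls back to when none is given explicitly.
# It is derived from environment settings such as LANG / LC_ALL.
system_default = locale.getpreferredencoding(False)

# One character, three common encodings, three different byte sequences.
text = "\u20ac"  # EURO SIGN
as_utf8 = text.encode("utf-8")
as_cp1252 = text.encode("cp1252")        # windows-1252
as_latin9 = text.encode("iso-8859-15")

print("system default:", system_default)
print(as_utf8, as_cp1252, as_latin9)

# The same byte means different things depending on the assumed encoding:
# 0xA4 is the euro sign in ISO-8859-15 but the currency sign in ISO-8859-1.
same_byte = b"\xa4"
print(ascii(same_byte.decode("iso-8859-15")), ascii(same_byte.decode("iso-8859-1")))
```

Nothing in the byte sequences above records which encoding produced them, 
which is why the information has to travel separately.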

This usually works reasonably well until the file leaves the local system. 
Then you also have to send the encoding metadata along, lest the file 
become unusable.

> The kind of specifics I would like is a description like "I commit a file
> using ISO-8859-15 into arch, and someone who gets that file and opens it
> using an ISO-8859-1 editor gets the wrong non-ASCII characters."

OK.
"I commit a file using windows-1252 into arch, and someone who gets that 
file and opens it using a UTF-8 editor gets the wrong characters."

So, why didn't he simply open the file as windows-1252 instead of UTF-8? Because 
he didn't know what encoding the file was in, damn it! Why not? Because 
arch threw away that piece of info!
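
The scenario can be reproduced in a few lines of Python. This is only a 
sketch: the `recorded` dictionary is a hypothetical stand-in for "the bytes 
plus the encoding info" and is not anything arch provides.

```python
original = "caf\u00e9"  # "café"

# Bytes as committed by a windows-1252 (cp1252) user.
stored = original.encode("cp1252")

# The reader guesses UTF-8. The lone byte 0xE9 is not valid UTF-8, so it
# is replaced by U+FFFD (or raises an error in strict mode).
wrong_guess = stored.decode("utf-8", errors="replace")

# The reverse mistake is quieter: UTF-8 bytes read as windows-1252 decode
# without any error, just to the wrong characters.
mojibake = original.encode("utf-8").decode("cp1252")

# If the encoding travels with the bytes, decoding is unambiguous.
recorded = {"encoding": "cp1252", "data": stored}
right = recorded["data"].decode(recorded["encoding"])

print(ascii(wrong_guess), ascii(mojibake), ascii(right))
```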

> The obvious question about that case is: 
> Suppose arch records and can report the encoding.  How does that help
> a user who needs arch's assistance to discover the encoding?

Huh? You have answered your own question. If you need to know the encoding 
then obviously it helps if arch can tell it.

> > > 2) You insist that the best way to solve an uncommon problem (most
> > >    users have no confusion about encoding systems) is by arch
> > >    providing a special-purpose hook.
> >
> > I have insisted no such thing. Also, in my experience the problem is
> > way too common.
>
> If it is not a special-purpose hook, what generic mechanism exists
> that permits arch to record this metadata?

There are several alternatives. E.g., you could provide the info as command 
line args, and you could have per-user, per-project, per-module and/or 
per-filetype defaults, so that you don't have to use the command line 
switch. The arch client could also detect the local system's default 
encoding and default to that if nothing else is specified. There are 
probably a lot more ways, too, but something tells me you're not in the 
slightest interested in even thinking about it. No, since you haven't 
experienced the problem (or at least think you haven't) then the problem 
obviously doesn't exist, so you bitch and moan to your heart's content when 
the issue is brought up. What a nice attitude.
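
The alternatives listed above could be combined into a simple lookup chain. 
The following Python sketch is purely illustrative: the function name, its 
parameters and the precedence order (command line first, then file type, 
module, project, user, and finally the system default) are assumptions made 
for this example, not an existing arch/tla interface.

```python
import locale


def resolve_encoding(path, cli_encoding=None, filetype_defaults=None,
                     module_default=None, project_default=None,
                     user_default=None):
    """Pick the encoding to record for `path`, most specific source first.

    Hypothetical helper; the precedence order is an assumption.
    """
    # 1. An explicit command line argument always wins.
    if cli_encoding:
        return cli_encoding
    # 2. Per-filetype defaults, matched on the file name suffix.
    for suffix, encoding in (filetype_defaults or {}).items():
        if path.endswith(suffix):
            return encoding
    # 3. Per-module, per-project, then per-user defaults.
    for candidate in (module_default, project_default, user_default):
        if candidate:
            return candidate
    # 4. Nothing specified: fall back to the local system's default.
    return locale.getpreferredencoding(False)


print(resolve_encoding("README", cli_encoding="utf-8"))
print(resolve_encoding("msgs.properties",
                       filetype_defaults={".properties": "iso-8859-1"},
                       project_default="utf-8"))
print(resolve_encoding("main.c", project_default="iso-8859-15"))
```

With defaults like these in place, the command line switch would only be 
needed for the odd file that differs from the rest of the tree.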

> I do not discard the value of your experience, but "way too common" is
> both subjective and vague.  My experience is to the contrary -- mostly
> because people tend to know what coding system is used by files they
> open or edit -- and I do not know of any reason to accept your
> experience as more accurate than mine.

Huh? First of all, my experience is very "accurate". There's nothing 
inaccurate about having trouble with different encodings in mixed systems 
environments.
E.g., in my company we currently have two teams, one that uses UTF-8 and one 
that uses a mix of ISO-8859-15 and windows-1252. We also have a library 
"module" that is imported into both teams' source code trees. It's obvious 
that this causes trouble, and there is nothing inaccurate about the fact.

Secondly, I disagree that people tend to know what encoding is used. Mostly 
people seem to simply ignore the issue and hope for the best. Many have 
decided to use only English, just because they've noticed those characters 
look the same for all team members.

Still, even if the majority weren't experiencing problems, that doesn't 
mean that you should just screw over the minority. Of course you have to 
draw the line somewhere, but this particular minority isn't very small, and 
it'll only get larger as a result of further internationalization.

> > > If you want us to take you seriously, it would be helpful to be very
> > > specific about how and where you believe your problem occurs and why
> > > arch is a good place to solve this problem.
> >
> > The problem occurs when one link in the chain behaves badly. Arch is
> > one link in the chain. Exactly what is it that you don't understand?
>
> As I explained above, I still do not understand what specific problem
> you want to solve.

Now there we have the comprehension problem again. Sigh...
Sorry, I just don't know how to say it more clearly.

> There is a chicken-and-egg problem with standards to record this:
> until some standard storage mechanism exists, tools will randomly
> destroy the metadata.  But until tools exist, many implementors will
> reject a proposed storage mechanism as not truly standard.

How can anyone have such an amazingly narrow field of view?
Read my lips: you don't have to use EAs or similar. I have already mentioned 
several alternatives. Get a clue already!

In general, you don't have to make something perfect from day one. That 
doesn't mean that there is no way of making it good, or even perfect, in 
steps.

> The main problem I see with common filesystems is that, in the general
> case, the metadata has to be stored in a separate file.  When multiple
> streams per file are supported by more operating systems, a meaningful
> mechanism can be used.  Until then, there can be only fragile kludges
> to address the problem.

One has to start somewhere. Otherwise it's impossible to get around 
chicken-and-egg problems.

> If your proposal describes how to use EAs, named streams, or whatever
> other OS/FS-specific mechanism implements per-file metadata, I would
> like to hear it

It doesn't. Those are implementation details, and as such need to be worked 
out by people more experienced with arch.

Oh, it just dawned on me that maybe we are miscommunicating because you 
think I'm talking at the implementation level when I'm actually talking at 
the conceptual level. I'm sorry if I've misled you.

> I apologize for offending you. 

Apology accepted. I also apologize for using somewhat harsh words 
occasionally.

- Marcus Sundman
