Re: [Gnu-arch-users] Re: How does arch/tla handle encodings?
From: Marcus Sundman
Date: Sat, 28 Aug 2004 01:56:20 +0300
User-agent: KMail/1.7
> >> > An editor cannot possibly know which encoding a file has.
> >>
> >> Looks like you began with an empirically false statement. Mine does.
> >
> > You are kidding, right?
>
> Go read the definition of "empirically".
I know the definition of the term quite well. However, it doesn't matter
whether or not some editor happened to guess the right encoding. It still
doesn't _know_ that it's the correct one. It can't know which encoding the
file has unless someone tells the editor.
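To make the guessing point concrete, here is a minimal Python sketch. The list of candidate encodings is just an illustrative choice on my part, nothing specific to any editor. It shows one and the same byte decoding without error under several 8-bit encodings, each time to a different character:

```python
# The single byte 0xE5 is how "å" is stored in ISO-8859-1 (Latin-1).
data = "å".encode("latin-1")

# Illustrative candidates only; any single-byte encoding would serve.
candidates = ["latin-1", "iso8859-5", "koi8-r"]

# Every decode succeeds, so nothing in the bytes says which one is right.
decoded = {enc: data.decode(enc) for enc in candidates}

for enc, text in decoded.items():
    print(f"{enc}: {text!r}")
```

All three results are equally "valid" as far as the bytes are concerned. Picking one of them is a guess, however educated.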
Now, you may argue that this isn't a big problem, since such guesses are
correct more often than not. Or that incorrect guesses aren't really that
important. However, if this can be fixed easily, then why on earth would
you not? I just can't believe the mentality of you people.
> Of course, in theory it's all a really serious problem.
> In practice, it's a very minor problem which bites rarely and when it does
> it's usually obvious and easy to fix.
No, it's not a minor problem. No, it's most certainly not obvious. And it
can be quite tedious to fix. Have you only worked with source code with
English comments? Or perhaps only with teams where all members are using
very similar setups? Well, many haven't.
If you haven't had these kinds of problems then good for you. I have. A lot.
The latest one was today, when I couldn't do a cvs update because someone had
committed a file in some 8-bits/char encoding into the repository which
otherwise contains only UTF-8 encoded files. Luckily the byte sequence
wasn't a valid UTF-8 one, and luckily my cvs client actually checked this
so it noticed the error (SmartCVS rules!). Thus I was able to fix it before
it went into production code where it would have taken *considerably* more
effort to fix. Usually errors like this aren't noticed. Still, fixing it
wasn't very easy. First I had to tell my cvs client to pretend that all
files were in some 8-bits/char encoding, so that I could get my hands on
the file. Then I had to find _all_ those garbled characters and find out
what they were supposed to be (God bless IM-systems, although I wish my IM
client's jabber-plugin wouldn't display "å" chars as "Í"... another
annoying encoding problem).
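The kind of check that saved me here can be sketched in a few lines of Python. This is not how SmartCVS does it (I don't know its internals); `is_valid_utf8` is a name made up for this example:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "å" stored as Latin-1 is the lone byte 0xE5, which is not valid UTF-8,
# so the mistake is caught.
caught = is_valid_utf8("å".encode("latin-1"))

# The Latin-1 text "Ã¥" is the bytes 0xC3 0xA5, which happen to be the
# valid UTF-8 encoding of "å". The check passes and the error slips through.
missed = is_valid_utf8("Ã¥".encode("latin-1"))

print(caught, missed)
```

The second case is why I say "luckily": a validity check only catches the files that happen to be malformed. It cannot tell you that a well-formed file was written in some other encoding.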
> this problem is so much larger than Arch that it just feels wrong for tla
> to try and fix it.
I don't believe this! That is the worst attitude anyone could have. "Why
should /we/ do anything about it? Let someone else fix it."
The problem is only larger than Arch in that Arch isn't the only badly
behaving program. However, for this problem to go away completely it needs
to be fixed in _all_ systems, including Arch. When a piece of text is sent
around as bytes _no_ link in the chain may throw away the encoding
metadata. (It's not like some global pollution problem where it'd be OK if
the majority fixed their systems, and then some minority could pollute all
they want since they are so few anyway. No, this needs to be fixed
everywhere. It's a typical "weakest link" scenario.)
Besides, a basic fix should be done anyway, namely adding support for
arbitrary file metadata.
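As a rough illustration of what I mean, and not a proposal for how tla should actually store it, here is a Python sketch in which a file's bytes travel together with arbitrary key/value metadata. The `StoredFile` class and the "encoding" key are hypothetical names invented for this example:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical record: raw bytes plus arbitrary metadata."""
    content: bytes
    metadata: dict = field(default_factory=dict)

    def text(self) -> str:
        # Refuse to guess: without recorded metadata the encoding is unknown.
        encoding = self.metadata.get("encoding")
        if encoding is None:
            raise ValueError("encoding unknown: metadata was not recorded")
        return self.content.decode(encoding)


# With the metadata kept, every later link in the chain can decode correctly.
tagged = StoredFile("å".encode("latin-1"), {"encoding": "latin-1"})
print(tagged.text())

# Once a link drops the metadata, the next one is back to guessing.
untagged = StoredFile(tagged.content)
```

The point of the generic metadata dictionary is that the storage layer doesn't need to understand encodings at all. It only has to not throw the information away.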
Anyway, it shouldn't even be necessary for me to tell you this. If you'd
just sit down and think for a few seconds you'd no doubt come to the
conclusion that it's a Really Bad Thing(tm) to throw away the encoding
metadata of the data.
Of course complete, 100% automatic solutions will only be possible when
everyone has file systems that store the encoding info, and all tools
actually use that info. However, this doesn't mean that we can't make
something that would work today.
- Marcus Sundman