[Gnu-arch-users] How does arch/tla handle encodings?
From: Marcus Sundman
Date: Fri, 27 Aug 2004 18:25:10 +0300
User-agent: KMail/1.7
Members of quite a few programming teams, especially OSS ones, use different
encodings. Some people aren't even aware of this, even though it causes
quite a lot of trouble every now and then. (For example, although
Windows-1252 and ISO-8859-1 are quite similar, they are not the same: the em
dash, byte 0x97 in Windows-1252, is a control character in ISO-8859-1. And
when the Windows users switch to command-line utilities, they are suddenly
using IBM850, which is completely different. You can of course run
"chcp 1252", but that leads to a plethora of other problems. Then there are
the many people working in different languages who want to use UTF-8 or a
similar multi-byte encoding, which is poorly supported, if at all, on some
platforms.)
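To make the Windows-1252 vs. ISO-8859-1 point concrete, here is a small
Python sketch (Python is just my choice for illustration; this is not tla
code). It stores an em dash the way a Windows-1252 editor would and then
reads that same byte back as ISO-8859-1 and as the IBM850 console code page.

```python
import unicodedata

em_dash = "\u2014"

# Encode an em dash the way a Windows-1252 editor would store it.
raw = em_dash.encode("cp1252")
print(raw)  # b'\x97'

# Read the very same byte back under two other encodings.
as_latin1 = raw.decode("latin-1")  # ISO-8859-1
as_cp850 = raw.decode("cp850")     # IBM850, the Windows console code page

print(unicodedata.category(as_latin1))  # Cc (a C1 control character)
print(as_cp850)                         # ù
```

One byte, three different meanings: without knowing which encoding was used,
nobody can tell which one was intended.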
The vast majority of programs assume that input text files are in the local
system's default encoding. Therefore the files on disk should preferably
use whatever happens to be the local system's default encoding. (In some
files, though, the encoding is part of the file's semantics, so such files
should be left as they are.)
Since we can't get people to agree on a single encoding, we obviously have
to transcode files. For this to be possible we always need to know which
encoding a particular text file is written in. After all, a text file is
basically just a binary blob combined with an encoding metadata attribute.
If we lose the encoding information, the file is no longer text but "raw
data". Thus arch needs to keep track of this very important piece of
metadata. (This applies to files other than text files, too.)
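A minimal sketch of the "blob plus encoding attribute" idea, assuming a
hypothetical TextBlob record (nothing like it is claimed to exist in
arch/tla). On checkout the text is transcoded into the local default
encoding, except for files flagged as having encoding-dependent semantics,
which are written out byte for byte.

```python
import locale
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextBlob:
    """Hypothetical model of a versioned text file: raw bytes plus their encoding."""
    data: bytes
    encoding: str
    # Files whose encoding is part of their meaning are never transcoded.
    preserve_bytes: bool = False

    def text(self) -> str:
        # Only with the encoding attribute can the bytes be read as text.
        return self.data.decode(self.encoding)

    def checkout(self, target_encoding: Optional[str] = None) -> bytes:
        """Bytes to write into a working tree, in the target (default: local) encoding."""
        if self.preserve_bytes:
            return self.data
        if target_encoding is None:
            target_encoding = locale.getpreferredencoding(False)
        # Strict encoding: raise instead of silently dropping characters.
        return self.text().encode(target_encoding)


blob = TextBlob("naïve".encode("utf-8"), "utf-8")
print(blob.checkout("cp1252"))  # b'na\xefve'
```

Using strict encoding is a deliberate choice here: a checkout that silently
replaced characters would corrupt the file on the next commit.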
We also have other textual metadata, such as file names and paths, commit
comments, etc. Some of these *must* be transcoded on some systems.
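File names show the same problem. A small Python sketch (the file name is
made up for illustration): a name written on a UTF-8 system turns into
mojibake when read as ISO-8859-1, transcodes cleanly when the source encoding
is known, and sometimes cannot be represented in the target encoding at all.

```python
name = "Möte.txt"

# A UTF-8 system stores the name as these bytes.
stored = name.encode("utf-8")

# An ISO-8859-1 system that assumes its own encoding shows mojibake.
misread = stored.decode("latin-1")
print(misread)  # MÃ¶te.txt

# Knowing the source encoding lets us transcode the name properly.
fixed = stored.decode("utf-8").encode("latin-1")
print(fixed)  # b'M\xf6te.txt'

# Some names simply cannot be represented in the target encoding.
try:
    "日本.txt".encode("latin-1")
except UnicodeEncodeError as exc:
    print("cannot transcode:", exc.reason)
```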
All this leaves us with two options: either everyone wraps their arch client
in transcoding wrappers, or the arch client does the transcoding itself.
(Obviously you can't implement transcoders for all current and future
encodings/formats, so there would have to be some kind of plug-in system for
this. In tla one would probably use xl for such plug-ins.)
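The plug-in idea, sketched in Python rather than xl. Everything here is
hypothetical: register_transcoder, transcode and the made-up target format
"ascii+xmlref" only illustrate the shape of such a system, with a
registered plug-in taking precedence and Python's built-in codecs as the
fallback.

```python
from typing import Callable, Dict, Tuple

# Hypothetical plug-in table: (source, target) -> function transforming bytes.
Transcoder = Callable[[bytes], bytes]
_PLUGINS: Dict[Tuple[str, str], Transcoder] = {}


def register_transcoder(source: str, target: str, fn: Transcoder) -> None:
    """Register a plug-in for a pair of encodings (names are case-insensitive)."""
    _PLUGINS[(source.lower(), target.lower())] = fn


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Use a registered plug-in if one exists, else Python's built-in codecs."""
    plugin = _PLUGINS.get((source.lower(), target.lower()))
    if plugin is not None:
        return plugin(data)
    return data.decode(source).encode(target)


# Example plug-in for the made-up target format "ascii+xmlref":
# non-ASCII characters become XML character references.
register_transcoder(
    "utf-8", "ascii+xmlref",
    lambda b: b.decode("utf-8").encode("ascii", "xmlcharrefreplace"),
)

print(transcode("é".encode("utf-8"), "utf-8", "ascii+xmlref"))  # b'&#233;'
print(transcode(b"\x97", "cp1252", "utf-8"))  # b'\xe2\x80\x94'
```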
So, my questions are these:
1) Does arch/tla keep track of the type/encoding of each file?
2) How does arch/tla handle file names and paths that are incompatible with
some system?
3) Diff, merge, annotate et al. have to understand the encoding of files to
be able to present something sensible to the user. Do they (or will they)?
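To illustrate question 3, a rough Python sketch (the file contents and the
prices.txt name are made up): the same line committed once from Windows-1252
and once from UTF-8 differs at the byte level even where the text is
identical, but decoding each side with its recorded encoding before diffing
yields a diff that shows only the real change.

```python
import difflib

old = "Price \u2014 10 EUR\n".encode("cp1252")  # committed from Windows
new = "Price \u2014 12 EUR\n".encode("utf-8")   # edited on a UTF-8 system

# The dash alone is b'\x97' on one side and b'\xe2\x80\x94' on the other,
# so a byte-level diff cannot tell encoding noise from real edits.
# Decoding each side with its recorded encoding gives comparable text.
old_lines = old.decode("cp1252").splitlines(keepends=True)
new_lines = new.decode("utf-8").splitlines(keepends=True)

diff = list(difflib.unified_diff(old_lines, new_lines, "a/prices.txt", "b/prices.txt"))
print("".join(diff))
```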
- Marcus Sundman