gnu-arch-users

Re: [Gnu-arch-users] How does arch/tla handle encodings?


From: Marcus Sundman
Subject: Re: [Gnu-arch-users] How does arch/tla handle encodings?
Date: Sat, 28 Aug 2004 13:46:40 +0300
User-agent: KMail/1.7

On Saturday 28 August 2004 12:53, Jan Hudec wrote:
> On Fri, Aug 27, 2004 at 21:38:06 +0300, Marcus Sundman wrote:
> > On Friday 27 August 2004 21:23, Andrew Suffield wrote:
> > > On Fri, Aug 27, 2004 at 08:20:00PM +0300, Marcus Sundman wrote:
> > > > On Friday 27 August 2004 19:52, Andrew Suffield wrote:
> > > > > On Fri, Aug 27, 2004 at 06:50:23PM +0200, Vaclav Haisman wrote:
> > > > > > File's encoding is IMHO metadata as much as permissions are.
> > > > >
> > > > > It's not. Encoding is data.
> > > >
> > > > Oh, get a clue. And a dictionary. The encoding info is data about
> > > > the data that is the content of the file. "Data about data" is
> > > > called "metadata". "Encoding" is an attribute of the file, just as
> > > > "filename" and "permissions" are.
> > >
> > > And I repeat: encoding is data.
> >
> > Yes, but it's also metadata. You said it isn't, but it is. Don't
> > pretend to be more stupid than you are.
>
> It is **NOT** metadata in the sense of filename, permissions, timestamp,
> i.e. file attributes. It is metadata in the general sense "data about
> data".
>
> So while *calling* it metadata is ok, *treating* it as file attributes
> is not. The encoding is needed to understand the file, so it had better be
> deduced from its contents. The attributes do not bind that tightly and
> they can be lost at any moment, especially since applications don't know
> how to handle them.

Are you seriously suggesting that metadata is not actually metadata if it is 
mandatory? Only optional metadata is actually metadata? Both a file's name 
and its encoding are properties of the file. The former can be changed 
without modifying the contents of the file; the latter can't necessarily. 
This is irrelevant. Both are equally metadata.

You just don't make sense. Is the "description" attribute metadata? Let's 
say you have a picture that displays a particular shade of red and 
has the attribute "description: the color of my car". You use this picture 
to find the correct shade when shopping for car paint. If you lose the 
description attribute the picture is meaningless. The description is an 
essential part of the picture and can't be deduced from it. Does this make 
the attribute not metadata? Or how is this different from the encoding of a 
text file? (And please don't say something stupid like "it's different 
because the color of characters is irrelevant".)

Also, the encoding can *not* be deduced from the file's contents. I have 
already explained why. E.g. if a file is in ISO-8859-2, there is no way 
that the editor could know that it's not ISO-8859-1 or ISO-8859-4 or 
ISO-8859-5 or ISO-8859-8 or ISO-8859-9 or ISO-8859-10 or ISO-8859-13 or 
ISO-8859-14 or ISO-8859-15 or some other of the 30+ encodings for which the 
given byte sequence is valid.
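To make the point concrete, here is a small Python sketch (Python is chosen only for illustration; the thread itself contains no code). It decodes one arbitrary three-byte sequence, picked here as a hypothetical example, under the ISO-8859 variants listed above. Every decode succeeds, so byte-level validity alone cannot tell an editor which one was intended; the helper name `interpretations` is likewise just for illustration.

```python
# Decode one fixed byte string under several ISO-8859 variants.
# Every decode succeeds, so "is this valid?" cannot identify the encoding.
SAMPLE = bytes([0xB1, 0xE6, 0xEA])  # hypothetical sample: "ąćę" if it really is ISO-8859-2

CANDIDATES = [
    "iso-8859-1", "iso-8859-2", "iso-8859-4", "iso-8859-5", "iso-8859-8",
    "iso-8859-9", "iso-8859-10", "iso-8859-13", "iso-8859-14", "iso-8859-15",
]


def interpretations(data: bytes, encodings: list) -> dict:
    """Map each encoding to the text it yields, skipping encodings where the bytes are invalid."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = data.decode(enc)
        except UnicodeDecodeError:
            pass  # this byte sequence is not valid in this encoding
    return result


if __name__ == "__main__":
    for enc, text in interpretations(SAMPLE, CANDIDATES).items():
        print(f"{enc:12} {text!r}")
```

All ten candidates accept the bytes, yet they yield Latin, Polish, Cyrillic and Hebrew text; some pairs (e.g. ISO-8859-1 and ISO-8859-15) even agree on these particular bytes, which makes guessing harder still.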

> After all, that's what the byte-order-mark is for.  In most editors, the 
> sequence 0xfe 0xff indicates utf-16be, 0xff 0xfe indicates utf-16le and
> 0xef 0xbb 0xbf indicates utf-8 encoding.
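The editor convention described in the quote can be sketched in a few lines of Python, assuming only the three signatures it names (UTF-32 and other forms are deliberately left out). The function name `sniff_bom` is hypothetical. Note what it returns when no BOM is present: nothing at all, which is exactly the case the reply below is about.

```python
import codecs
from typing import Optional

# Signatures from the quoted message; the 3-byte UTF-8 BOM is checked first.
BOM_TABLE = [
    (codecs.BOM_UTF8, "utf-8"),          # 0xEF 0xBB 0xBF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # 0xFE 0xFF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # 0xFF 0xFE
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            return name
    return None  # no BOM: this heuristic says nothing about the encoding
```

Even when it does return a name, it is a guess: it silently assumes the file is in some Unicode encoding to begin with.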

No, the BOM is for specifying the endianness of the encoding. (All Unicode 
encoding forms support a BOM; it's just not needed for byte-oriented ones 
such as UTF-8. That said, I fully support using BOMs in UTF-8 files too, to 
detect misbehaving programs more often.) If you don't know which encoding 
(or group of encodings) a file is in, then you can't possibly know how to 
interpret the first bytes of the file. There is no way of knowing whether a 
file beginning with the bytes 0xFE and 0xFF is a big-endian UTF-16 file, an 
ISO-8859-1 file starting with "þÿ" (thorn, y with diaeresis), or something 
completely different in some other encoding.
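A minimal Python illustration of that ambiguity, offered as a sketch rather than anything from the thread: the same two bytes decode cleanly both as UTF-16BE (giving U+FEFF, the BOM itself) and as ISO-8859-1 (giving thorn and y with diaeresis). It also checks that the BOM is the single code point U+FEFF in every Unicode form, with only its byte layout differing.

```python
import codecs
import unicodedata

# The first two bytes of a big-endian UTF-16 file that carries a BOM.
HEAD = b"\xfe\xff"

# Read as UTF-16BE, the bytes are U+FEFF, the byte-order mark itself.
as_utf16be = HEAD.decode("utf-16-be")
# Read as ISO-8859-1, the very same bytes are two ordinary letters.
as_latin1 = HEAD.decode("iso-8859-1")

# The BOM is U+FEFF in every Unicode encoding form; only the byte layout differs.
assert "\ufeff".encode("utf-16-be") == codecs.BOM_UTF16_BE
assert "\ufeff".encode("utf-16-le") == codecs.BOM_UTF16_LE
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8

for label, text in [("utf-16-be", as_utf16be), ("iso-8859-1", as_latin1)]:
    names = ", ".join(unicodedata.name(ch) for ch in text)
    print(f"{label:10} -> {names}")
```

Both decodes succeed without error, so nothing in the bytes themselves says which reading is the right one.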


- Marcus Sundman



