[Gnu-arch-users] Re: Encoding handling proposal


From: Stefan Monnier
Subject: [Gnu-arch-users] Re: Encoding handling proposal
Date: 30 Aug 2004 13:34:34 -0400
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

> B) "Content-Type" should be a mandatory metadata string attribute.

In keeping with the "enforce naming convention" policy of Arch, I guess that
we could just use a mime.types file to map extensions to content types.
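To make the idea concrete, here is a minimal sketch of an extension-to-content-type lookup driven by a mime.types-style table. The sample table, the function names, and the "application/octet-stream" fallback are all assumptions made for illustration; nothing here is existing tla behaviour.

```python
import io
import os

# Hypothetical mime.types content; a real file would be much longer.
SAMPLE_MIME_TYPES = """\
# content type      extensions
text/plain          txt text
text/x-csrc         c h
image/png           png
"""


def load_mime_types(fp):
    """Parse mime.types-style lines into an {extension: content_type} dict."""
    mapping = {}
    for line in fp:
        # Drop comments and surrounding whitespace; skip empty lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        content_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = content_type
    return mapping


def content_type_for(path, mapping, default="application/octet-stream"):
    """Look up a content type purely from the file name's extension."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    # Assumed fallback: unknown or missing extensions are treated as bytes.
    return mapping.get(ext, default)


MAPPING = load_mime_types(io.StringIO(SAMPLE_MIME_TYPES))
```

With such a table the content type follows from the name alone, so no per-file attribute needs to be stored.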

> C) "Auto-Filter" should be a mandatory metadata boolean attribute.

As mentioned, I think this is unnecessary: Arch should keep handling files
as "sequences of bytes", just like most/all other tools do.  Meta-data has
been a recurrent theme in Unix and still hasn't appeared, so I wouldn't hold
my breath.  The current way encoding problems are solved is via tags in the
data (i.e. the data is self-describing), which has the advantage of
blending better within the Unix world.

The various type-specific diff algorithms are only ways to optimize
changeset size and help merging, but they should all work correctly on
arbitrary binary files.
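As an illustration of the "works on arbitrary bytes" requirement, here is a rough sketch of a generic byte-level delta using Python's difflib. The copy/data delta layout and the function names are invented for this example and have nothing to do with the actual changeset format; a type-specific algorithm would only be a smaller or more merge-friendly alternative to something like this.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Describe `new` as copies from `old` plus literal byte runs."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged region: refer back to the old file.
            delta.append(("copy", i1, i2))
        else:
            # replace/insert/delete: store whatever bytes the new file has
            # here (an empty run for a pure deletion).
            delta.append(("data", new[j1:j2]))
    return delta


def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new file from the old file and a delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

The round trip holds for any byte sequence, text or not, which is the property every algorithm should preserve.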

> D) There should be a filter/plugin architecture to enable a transcoding of
> files on input and output based on their content-types and user settings
> and user-provided parameters.

How is a utf-8 file going to be transcoded into latin-1 without loss?
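A small sketch of the problem, using Python's standard codecs (the helper names are made up for the example): any character outside the Latin-1 repertoire simply has no byte to map to, so the filter must either fail or silently drop information.

```python
def transcode_utf8_to_latin1(data: bytes) -> bytes:
    """Strict transcoding: raises UnicodeEncodeError on unmappable text."""
    return data.decode("utf-8").encode("latin-1")


def is_lossless(data: bytes) -> bool:
    """True if the UTF-8 input can be represented exactly in Latin-1."""
    try:
        transcode_utf8_to_latin1(data)
        return True
    except UnicodeEncodeError:
        return False


ok = "café".encode("utf-8")     # every character exists in Latin-1
bad = "5 €".encode("utf-8")     # U+20AC EURO SIGN does not
```

Here is_lossless(ok) is true and is_lossless(bad) is false; passing errors="replace" to encode would avoid the exception, but only by turning the euro sign into "?".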

> E) Utilities such as "diff", "merge" and "annotate" (aka "blame") should be 
> provided by plugins mapped to content-types.

As mentioned by someone else, such type-specific algorithms (at least when
used for in-archive-changesets) should be "standard" within the
user community.

But I think it also makes sense to allow any wacko user-specific algorithm,
as long as it stays "for the user's eyes only", i.e. part of tla but not
part of Arch.  This gets us back to the "diff options" thread.

> G) Filenames and paths should use UTF-8 in the repository, and be transcoded
> to the proper encoding when a client accesses the local file system.

IIRC, that's basically what is planned.  For now, filenames are limited to
a subset of ASCII so the problem is currently moot.


        Stefan



