RE: Checkout text files with the Unix LF (Oxa) - from command line

From: Peter Ring
Subject: RE: Checkout text files with the Unix LF (Oxa) - from command line
Date: Tue, 9 Oct 2001 10:18:36 +0200

Files are not suitable for the host (of your sandbox). 
They are suitable for certain uses in certain contexts. 
The host may or may not be part of the use, and the 
end-of-record format may or may not be important for the 
host. For example, I might need to manage configuration 
files for a multi-OS product. Although I have files with CR (MacOS), 
LF (*nix), and CR/LF (CP/M, MS-DOS) end-of-record, 
my editor (emacs) deals with this transparently, and the 
OS that holds the sandbox has no use for two thirds of the files. 
Why should I need to check out files on a Mac just to do 
something that might as well be done on Windows or Linux?

The canonical end-of-record format in RCS/CVS history files 
is LF. This has the interesting consequence that you can manage 
text files with CR/LF end-of-record as if they had LF end-of-record; 
there is just a CR as the last character of each record. Whether 
this is acceptable or not depends on the uses of each file. 
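
A minimal sketch of that consequence, in Python. The sample bytes and the 
helper name split_lf_records are made up for illustration; this is not how 
RCS/CVS is implemented, only what an LF-only view of a CR/LF file looks like.

```python
# Hypothetical file contents with CR/LF end-of-record, as raw bytes.
crlf_text = b"alpha\r\nbeta\r\ngamma\r\n"


def split_lf_records(data: bytes) -> list[bytes]:
    """Split on LF only, the way a tool that treats LF as canonical sees it."""
    records = data.split(b"\n")
    # A trailing LF leaves one empty piece after the last record; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


records = split_lf_records(crlf_text)

# Every record still carries the CR as its last character.
trailing_cr = [record.endswith(b"\r") for record in records]

for record in records:
    print(repr(record))
```

Whether the leftover CR matters depends on what consumes the records: a 
diff of two such files is unaffected, while a tool that compares a record 
against an exact string would need to strip the CR first.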

IMHO, there's no safe way to infer the encoding (character set) 
etc. except by managing some metadata explicitly. This is why 
SGML, XML and HTML files (which are text files by any account) 
have explicit statements about their encoding, except that 
UTF-8 is the default encoding for XML files, i.e., UTF-8 is 
assumed if you do not specify something else. UTF-8 has some 
nice properties that make it suitable as a default encoding 
for exchange of files: there's no byte order that you must know,
because each character is encoded as a sequence of bytes, and 
most letters in Western alphabets get encoded in one or two 
bytes. For many purposes, you can deal with it as if it were ASCII.
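
These properties can be checked with a short Python sketch. The sample 
strings and the helper name utf8_lengths are arbitrary choices for 
illustration; UTF-16 appears only as a contrast, since it does come in two 
byte orders.

```python
def utf8_lengths(text: str) -> dict[str, int]:
    """Map each character to the number of bytes in its UTF-8 encoding."""
    return {ch: len(ch.encode("utf-8")) for ch in text}


ascii_text = "end-of-record"

# Pure ASCII text yields identical bytes in ASCII and in UTF-8.
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Unaccented and accented Western letters take one or two bytes each.
lengths = utf8_lengths("a\u00e9\u00f8")  # 'a', e-acute, o-slash

# UTF-16 has two byte orders that the reader must know or detect;
# UTF-8 has only one byte sequence per character.
utf16_differs = "a".encode("utf-16-le") != "a".encode("utf-16-be")

print(same_as_ascii, lengths, utf16_differs)
```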

I really don't know what should be the preferred end-of-record 
format. I tend to favour LF because *nix is in more widespread 
use than MacOS and because the CR/LF format introduces an extra 
and superfluous distinction between 'binary' and 'text'.

Kind regards

Peter Ring

-----Original Message-----
From: address@hidden [mailto:address@hidden] On Behalf Of
Thornley, David
Sent: 8. oktober 2001 17:07
To: Peter Ring; address@hidden
Subject: RE: Checkout text files with the Unix LF (Oxa) - from command

> -----Original Message-----
> From: Peter Ring [mailto:address@hidden
> What's flawed is the idea that the end-of-record format in any text file
> should be inherently determined by the operating system. Would you also like
> your OS to determine what character set you should be allowed to use?
What, then, is the OS-independent way of marking an end of record?
There are several that occur to me as possibilities, and which have
been used by various operating systems I am or have been familiar with.
All of them have advantages and disadvantages, and have been selected
for various reasons.

I've also worked on systems that mandated EBCDIC, ASCII with assorted
variations, several CDC display codes, and Unicode.  There are some
grounds for standardization here, but should it be on ASCII, Unicode,
ISO 8859-1, or what?

The CVS idea that the program, be it client or server, uses whatever
convention is suitable for its host, does quite well when people refrain
from mixing that which should not be mixed.

Info-cvs mailing list