From: | Hans Aberg |
Subject: | Re: BOM mark from Windows notepad |
Date: | Thu, 19 Nov 2009 13:28:36 +0100 |
On 19 Nov 2009, at 11:14, Francisco Vila wrote:
I think that it was changed. If the BOM is only allowed in the beginning of the file, it becomes a state-dependent character. For example, if one includes two files verbatim in another, then the BOMs will no longer be in the beginning of the combined stream. So therefore this state-less definition is to be preferred.

This problem is more frequent than you may think, at least in my environment. Last week I promised to bring a case of faulty LY from Windows notepad; now I realize that all cases which might have failed were previously edited by me, putting the BOM away from the start of the file. All my students work on Windows and every instance of their documents that I edited did fail.
On UNIX-like systems, one can chain commands that only handle byte sequences, though they are often used for text processing. UTF-8 was invented to make such usage possible. For example,
    cat file1 file2 > file3

will concatenate file1 and file2 into file3. It is not feasible to change 'cat': it is a part of the operating system, and one would then have to assess the impact on all tools that may use 'cat'.
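The effect of such a byte-level concatenation can be sketched in a few lines of Python. This is only an illustration, not anything taken from LilyPond: the helper names `concat` and `strip_boms_stateless` and the sample contents are made up for the example, and Python's "utf-8-sig" codec stands in for a reader that accepts a BOM only at the start.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

def concat(*chunks: bytes) -> bytes:
    """Byte-level concatenation, as 'cat' does it: no knowledge of encodings."""
    return b"".join(chunks)

def strip_boms_stateless(data: bytes) -> str:
    """Decode as UTF-8 and drop every U+FEFF, wherever it occurs (hypothetical helper)."""
    return data.decode("utf-8").replace("\ufeff", "")

# Two files as Windows notepad might save them: each starts with a BOM.
file1 = BOM + "first\n".encode("utf-8")
file2 = BOM + "second\n".encode("utf-8")
combined = concat(file1, file2)

# A start-only rule removes the leading BOM but leaves the second one
# in the middle of the text.
start_only = combined.decode("utf-8-sig")

# A state-less rule treats the BOM the same at any position.
stateless = strip_boms_stateless(combined)
```

Here `start_only` still contains a U+FEFF before "second", while `stateless` does not, which is the difference between the two definitions.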
One can also use pipes and RPC: files can be made to look like streams and vice versa. A state-dependent BOM (only accepted at the beginning of a file) does not really work on UNIX-like platforms. So I think that the state-less definition was triggered by the requirements of those platforms, though it has wider applicability.
Hans