Ulf Ochsenfahrt wrote:
> UTF-8 is a _multi-byte_ encoding.
> If you see an LF byte, you don't know whether this is a single-byte LF
> or part of a multi-byte sequence.

Yes you do, because every byte of a multi-byte sequence in UTF-8 has
the high bit set. If you see 0x0A in a UTF-8 stream you can be certain
it is an LF and not part of a multi-byte sequence.
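
For anyone who wants to check the high-bit property themselves, here is a
minimal Python sketch. The helper names and the sample text are made up
for illustration; nothing here is taken from monotone. The UTF-16 line at
the end is only a contrast case, showing an encoding where a 0x0A byte
can be part of a different character.

```python
def multibyte_bytes_have_high_bit(text: str) -> bool:
    """Return True if every byte of every multi-byte UTF-8 sequence
    in `text` is >= 0x80 (i.e. has the high bit set)."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


def count_line_feeds(data: bytes) -> int:
    """Count raw 0x0A bytes in an encoded byte stream."""
    return data.count(b"\n")


# Hypothetical sample: three lines containing 2-byte and 3-byte characters.
sample = "na\u00efve\n\u65e5\u672c\u8a9e\n\u20ac\n"
utf8_bytes = sample.encode("utf-8")

# In UTF-8 the raw 0x0A bytes are exactly the LF characters.
lf_bytes_in_utf8 = count_line_feeds(utf8_bytes)
lf_chars_in_text = sample.count("\n")

# Contrast: in UTF-16-LE, U+010A encodes as the bytes 0x0A 0x01,
# so a 0x0A byte is not necessarily an LF there.
utf16_contains_lf_byte = b"\n" in "\u010a".encode("utf-16-le")
```

So a byte-wise scan for 0x0A is safe on UTF-8 input, but not on
arbitrary multi-byte encodings.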
Brian May wrote:
"Daniel" == Daniel Lakeland
> Daniel> Consider languages like Python that have the ability to
> Daniel> create multiline strings, now the \r or \n characters are
> Daniel> part of the string. Converting them changes the behavior
> Daniel> and meaning of the program. This is very tricky.
>
> Any code that relies on this behaviour is very dodgy IMHO.

Well I'd certainly agree it isn't platform-independent code. But where
is it written that monotone should not support checking in "dodgy" code?
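
To make the line-ending concern concrete, here is a small Python sketch
of what a naive conversion does at the byte level. The function names
to_lf and to_crlf are hypothetical and are not monotone's; whether a
given language's parser keeps the raw line ending inside a multiline
literal varies, so this only shows the effect on the stored bytes.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF (lone CR bytes are left alone)."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert every line ending to CRLF, normalising to LF first so
    existing CRLF pairs are not doubled."""
    return to_lf(data).replace(b"\n", b"\r\n")


# Hypothetical file content: a multiline literal spanning two lines.
original = b'text = """first\nsecond"""\n'
converted = to_crlf(original)

# The conversion changes the stored bytes, including those inside the literal.
bytes_changed = converted != original

# A file with mixed line endings does not survive a round trip.
mixed = b"a\r\nb\nc"
round_tripped = to_crlf(to_lf(mixed))
round_trip_is_lossless = round_tripped == mixed
```

The mixed-ending case is the tricky part: once such a file has been
normalised, the original bytes cannot be recovered from the result alone.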