Re: [Groff] mom : unicode in .INCLUDE'd files
From: John Gardner
Subject: Re: [Groff] mom : unicode in .INCLUDE'd files
Date: Sun, 23 Jul 2017 22:29:03 +1000
>
> UTF-8 and UTF-16 Text Encoding Detection Library
That was posted in *2014??* Suddenly I've forgotten if time's flowing
backwards or forwards...
What's the rationale for choosing UTF-16 in the first place? It offers
nothing that UTF-8 can't already handle... (to my flimsy understanding)
On 23 July 2017 at 22:23, Mike Bianchi <address@hidden> wrote:
> This library purports to be a way to approach the problem ...
>
> https://www.autoitconsulting.com/site/development/utf-8-utf-16-text-encoding-detection-library/
>
> UTF-8 and UTF-16 Text Encoding Detection Library
> by Jonathan Bennett | Aug 23, 2014 | Development |
>
> This post shows how to detect UTF-8 and UTF-16 text and presents a fully
> functional C++ and C# library that can be used to help with the detection.
>
> I recently had to upgrade the text file handling feature of AutoIt to better
> handle text files where no byte order mark (BOM) was present. The older
> version of code I was using worked fine for UTF-8 files (with or without BOM)
> but it wasn't able to detect UTF-16 files without a BOM. I tried to use the
> IsTextUnicode Win32 API function but this seemed extremely unreliable and
> wouldn't detect UTF-16 Big-Endian text in my tests.
>
> Note, especially for UTF-16 detection, there is always an element of ambiguity.
> This post by Raymond shows that however you try and detect encoding there will
> always be some sequence of bytes that will make your guesses look stupid.
>
> Here are the detection methods I'm currently using for the various types of
> text file. The order of the checks I perform is:
>
> BOM
> UTF-8
> UTF-16 (newline)
> UTF-16 (null distribution)
> :
> :
>
> --
> Mike Bianchi
>
>
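For anyone who wants to poke at the check order quoted above (BOM, UTF-8, UTF-16 newline, UTF-16 null distribution), here is a rough Python sketch. It is not the quoted library's code: the function name `guess_encoding`, the `null_ratio` threshold, and the rule that text containing NUL bytes is not accepted as UTF-8 are all assumptions made for illustration. As the quoted post warns, some byte sequences will always fool heuristics like these.

```python
import codecs

# BOM signatures to test first. UTF-32 marks are deliberately ignored in
# this sketch, so a UTF-32-LE BOM would be misreported as UTF-16-LE.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def guess_encoding(data: bytes, null_ratio: float = 0.3):
    """Return a best-guess encoding name, or None if no check matches.

    null_ratio is a hypothetical threshold chosen for this example.
    """
    # 1. BOM: an explicit marker wins over every heuristic.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. UTF-8: a strict decode must succeed. NUL bytes are valid UTF-8,
    #    but ASCII-heavy UTF-16 is full of them, so (as an assumption)
    #    data containing NULs is passed on to the UTF-16 checks.
    try:
        data.decode("utf-8")
        if b"\x00" not in data:
            return "utf-8"
    except UnicodeDecodeError:
        pass

    # Split into 16-bit code units; a trailing odd byte is dropped.
    units = [data[i:i + 2] for i in range(0, len(data) - 1, 2)]

    # 3. UTF-16 newline: LF is 0A 00 in little-endian, 00 0A in big-endian.
    le_newlines = units.count(b"\n\x00")
    be_newlines = units.count(b"\x00\n")
    if le_newlines and not be_newlines:
        return "utf-16-le"
    if be_newlines and not le_newlines:
        return "utf-16-be"

    # 4. UTF-16 null distribution: for mostly-ASCII text the NULs sit in
    #    the second byte of each unit (LE) or the first byte (BE).
    if units:
        first_nulls = sum(1 for u in units if u[0] == 0)
        second_nulls = sum(1 for u in units if u[1] == 0)
        if second_nulls > first_nulls and second_nulls / len(units) >= null_ratio:
            return "utf-16-le"
        if first_nulls > second_nulls and first_nulls / len(units) >= null_ratio:
            return "utf-16-be"

    return None
```

Calling `guess_encoding("hello world".encode("utf-16-be"))` falls through to the null-distribution check, since there is neither a BOM nor a newline to go on.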