From: Renzo Been
Subject: Re: [O] ODT Charset/Encoding issues (was question about ODT export behavior)
Date: Sun, 17 Jul 2011 16:12:09 +0200
Hi Jambunathan,
See comments below.
Ciao,
Renzo
P.S. I'm on a camping-site right now, so I do not have good Internet access...
On 16 July 2011 22:13, Jambunathan K <address@hidden> wrote:
>
> Renzo
>
>> I just want to add one point that I did not find in the org-manual. I tested
>> some of my org-files and exported them to the OpenOffice format. When I
>> tried to
>> open these documents in OpenOffice, they were corrupt and could not be
>> opened.
>>
>> I soon found out why. If you want to export an org-mode file to .odt, you
>> need
>> to explicitly set the file encoding to UTF-8 (I usually use iso-8859-1
>> encoding
>> for my files), like:
>> #-*- mode: org; coding: utf-8; -*-
>> After that OpenOffice could open the files without any problems.
>
> I use English for communication and I have to admit that I have zero
> understanding of things like character sets, encodings etc.
As for communicating: I'm from the border region of the Netherlands, Belgium
and Germany, so I'm multilingual and often need to type words with accents.
> Thanks for the above note. I surely see it as a bug but my poor
> understanding prevents me from quantifying it further.
Well... I would not really see it as a bug, as long as the documentation
mentions that org-file encodings other than utf-8 can result in corrupt
output files.
> Could you please send me a minimal iso-8859-1 test.org file and the
> associated corrupted test.odt file? I will look into this issue.
See attachment. I can only send you the org file, because I do not have access
to a working Emacs at the moment...
> 1. Do you have any specific requirement on how the component xml files
> be encoded? A cursory look at the odt exporter suggests that it could
> actually be emitting xml files in iso-8859-1 format while wrongly
> claiming UTF-8 encoding as below
>
> --8<---------------cut here---------------start------------->8---
> <?xml version="1.0" encoding="UTF-8"?>
> --8<---------------cut here---------------end--------------->8---
>
> 2. Should the xml file be always ejected in UTF-8 irrespective of how
> the original Org file is encoded.
Yes, that would seem a good solution to me... If the odt-exporter checks the
file's encoding and then converts the content to utf-8 (maybe using a temporary
buffer?) before the actual export, there would be no further problems...
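To illustrate what I mean, here is a minimal Python sketch (not the exporter's
actual code; the function names are made up for this example, and iso-8859-1
as the source encoding is an assumption). It shows that iso-8859-1 bytes
labelled as UTF-8 are rejected by an XML parser, and that converting them to
UTF-8 first makes the same content parse:

```python
import xml.etree.ElementTree as ET

# The declaration quoted above, as the exporter writes it.
XML_DECL = b'<?xml version="1.0" encoding="UTF-8"?>'


def reencode_to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def parses_ok(xml_bytes: bytes) -> bool:
    """Return True if the bytes form well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# An accented word stored the way an iso-8859-1 org file would store it.
body_latin1 = "<p>café</p>".encode("iso-8859-1")

# Latin-1 bytes behind a UTF-8 declaration: the suspected cause of the corruption.
mislabelled = XML_DECL + body_latin1

# Same content, converted to UTF-8 before the declaration is attached.
fixed = XML_DECL + reencode_to_utf8(body_latin1)

print(parses_ok(mislabelled))
print(parses_ok(fixed))
```

The conversion needs to know the real source encoding; guessing it wrong would
silently produce wrong characters instead of an error.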
As for the idea that the OpenOffice xml can actually be in an encoding other
than utf-8: I do not know how much work that would be for you to implement in
the odt-exporter. It might be too much effort...
Also I don't know if such an OpenOffice document will open with no problems in
all OpenOffice applications.
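For checking a given .odt file, a small Python sketch like the following might
help (again only an illustration; the helper names are hypothetical, and it
assumes the document text lives in content.xml inside the zip container). It
reads the encoding that content.xml declares and tests whether the bytes
really decode in that encoding:

```python
import io
import re
import zipfile

# Matches the encoding pseudo-attribute of an XML declaration.
_DECL_RE = re.compile(rb"<\?xml[^>]*encoding=[\"']([^\"']+)[\"']")


def declared_encoding(odt_bytes: bytes, member: str = "content.xml"):
    """Return the encoding named in the member's XML declaration, or None."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        head = archive.read(member)[:200]
    match = _DECL_RE.match(head)
    return match.group(1).decode("ascii") if match else None


def decodes_as_declared(odt_bytes: bytes, member: str = "content.xml") -> bool:
    """Return True if the member's bytes decode in the encoding it declares."""
    # XML without an encoding declaration defaults to UTF-8 (absent a BOM).
    encoding = declared_encoding(odt_bytes, member) or "utf-8"
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        data = archive.read(member)
    try:
        data.decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        return False
```

Usage would be something like `decodes_as_declared(open("test.odt", "rb").read())`;
a result of False means the declaration and the actual bytes disagree.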
> [Notes to Self]
> [Notes from odbook]
>
> Para 3 of http://books.evc-cit.info/odbook/apa.html#appc-11-fm2xml
> says
>
> --8<---------------cut here---------------start------------->8---
> OpenDocument files are always encoded in UTF-8.
> --8<---------------cut here---------------end--------------->8---
>
> Para 2 of
> http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
> says
>
> --8<---------------cut here---------------start------------->8---
> XML 1.0 allows a document to be encoded in any character set registered
> with the Internet Assigned Numbers Authority (IANA). European documents
> are commonly encoded in one of the ISO Latin character sets, such as
> ISO-8859-1. Japanese documents commonly use Shift-JIS, and Chinese
> documents use GB2312 and Big 5.
> --8<---------------cut here---------------end--------------->8---
>
> Para 4 of
> http://books.evc-cit.info/odbook/apa.html#xml-other-char-encodings-section
> says
>
> --8<---------------cut here---------------start------------->8---
> XML processors are not required by the XML 1.0 specification to support
> any more than UTF-8 and UTF-16, but most commonly support other
> encodings, such as US-ASCII and ISO-8859-1.
> --8<---------------cut here---------------end--------------->8---
>
>
> [Notes from XMLmind XSL-FO Converter]
>
>
> XFC supports outputting of content.xml and styles.xml in UTF-8 as well
> as ISO-8859-1.
>
> http://xml.web.cern.ch/XML/www.xmlmind.com/xfc_perso_java-4_4_0/doc/user/command_line_java.html
>
> says
>
> ,---- [see outputEncoding section]
> | For OpenDocument output (.odt), this option specifies the encoding of
> | XML content (files styles.xml and content.xml) in the output
> | document. All encodings available in the current JVM are supported. The
> | option value may be either the encoding name (e.g. ISO8859_1) or the
> | charset name (e.g. ISO-8859-1). The default value is UTF8.
> `----
>
> --
test-encoding.zip
Description: Zip archive