Re: [O] Orgmode → ODT: Certain chars break export
From: Tory S. Anderson
Subject: Re: [O] Orgmode → ODT: Certain chars break export
Date: Fri, 13 Feb 2015 10:18:24 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)
From a user perspective, just stripping the characters seems best to me, but
finding out what the characters are seems obnoxious. Neither a quick search nor
skimming the ODT specification[1][2] seems to give any insight into a set
of illegal characters. Does Elisp have anything similar to Java's
"isWhitespace"[3] that could be used to check character properties?
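A minimal sketch of such a check, written in Python rather than Elisp and assuming the exporter emits XML 1.0 (ODT's content.xml is an XML document). XML 1.0's `Char` production allows only #x9, #xA, #xD, #x20 to #xD7FF, #xE000 to #xFFFD and #x10000 to #x10FFFF, so form feed (#xC) is illegal, and the same spec also forbids writing it as a character reference such as `&#xC;`. A whitespace test in the style of Java's `isWhitespace` would not separate the cases: tab and newline are whitespace and legal, while form feed is whitespace and illegal. The names `is_legal_xml_char` and `strip_illegal_xml_chars` are hypothetical helpers for illustration, not anything in ox-odt.

```python
import re

# Hypothetical helper: matches any character OUTSIDE the XML 1.0 "Char"
# production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_legal_xml_char(ch: str) -> bool:
    """Predicate in the spirit of Java's isWhitespace, but testing
    membership in the XML 1.0 Char production instead."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return _ILLEGAL_XML_CHARS.match(ch) is None


def strip_illegal_xml_chars(text: str, replacement: str = "") -> str:
    """Replace every character XML 1.0 forbids (e.g. form feed, U+000C)
    with `replacement` (deleted by default)."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)
```

For example, `strip_illegal_xml_chars("page one\fpage two", "\n")` returns `"page one\npage two"`. Replacing the form feed with a newline, rather than deleting it, may preserve more of the original page-break intent; that choice is an assumption of this sketch.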
Rasmus <address@hidden> writes:
> address@hidden (Tory S. Anderson) writes:
>
>> While we're on the topic of ODT export problems: I was in the process
>> of converting PDF to Text to Org to ODT/DocX and discovered that
>> certain characters seem to break exported odt documents, which fail
>> with a line and col number. So far the only one I know for sure is the
>> "^L" (Char: C-l (12, #o14, #xc)). Hopefully a single fix can handle
>> all such cases.
>>
>> You probably don't need it, but I verified with the following file:
>> http://toryanderson.com/files/breakorg.org
>
> The export is fine, but the produced XML is invalid since it contains an
> illegal character. But how to resolve this? Should ox strip illegal
> characters (if so, what are they)? If so, could they be used for entities?
>
> —Rasmus
Footnotes:
[1] https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
[2] http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1415196_253892949
[3] http://www.fileformat.info/info/unicode/char/000c/index.htm