guile-devel

[PATCH] Fix of upstream parsing of CDATA


From: Linus Björnstam
Date: Thu, 16 Jan 2020 13:00:25 +0100
User-agent: Cyrus-JMAP/3.1.7-754-g09d1619-fmstable-20200113v1

Hello Guilers!

RhodiumToad found an error in sxml where it would not parse CDATA properly: &gt;
would be converted to > inside CDATA blocks. This is probably due to a
misreading of the XML spec:

    "Within a CDATA section, only the CDEnd string is recognized as markup, so 
that left angle brackets and ampersands may occur in their literal form; they 
need not (and cannot) be escaped using '&lt;' and '&amp;'.".

Notice that it says only CDEnd is recognized as markup, but it omits '&gt;'
from the enumeration of escapes that need not (and cannot) be used.
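To make the rule concrete, here is a minimal Python sketch of reading a CDATA
section the way the spec describes it. It is not the SSAX code, and
`read_cdata` is a hypothetical helper: everything up to the first "]]>" is
returned verbatim, so "&gt;", "&amp;" and "<" are never expanded or
interpreted.

```python
from __future__ import annotations


def read_cdata(text: str, start: int) -> tuple[str, int]:
    """Hypothetical helper: return (literal CDATA contents, index after "]]>").

    `start` must point at the opening "<![CDATA[". Only the CDEnd string
    "]]>" is treated as markup; entity references such as "&gt;" are
    returned exactly as written.
    """
    opener = "<![CDATA["
    if not text.startswith(opener, start):
        raise ValueError("not at the start of a CDATA section")
    body_start = start + len(opener)
    end = text.find("]]>", body_start)  # the only markup recognized inside
    if end == -1:
        raise ValueError("unterminated CDATA section")
    return text[body_start:end], end + len("]]>")
```

For example, read_cdata("<e><![CDATA[&gt;]]></e>", 3) returns ("&gt;", 19),
i.e. the four literal characters and the index where "</e>" begins.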

Other XML libraries do not expand entity references inside CDATA. Take, for
example, Python's ElementTree:

Python 2.7.17 (default, Dec 23 2019, 21:25:33)
>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring("<e><![CDATA[&gt;]]></e>")
>>> root.text
'&gt;'

The same input gives a different result with the unpatched (sxml ssax), or
rather (sxml simple):

(xml->sxml "<e><![CDATA[&gt;]]></e>")
;; => (*TOP* (e ">"))
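For contrast, a small Python 3 sketch with the standard ElementTree module
shows the distinction the patch restores: an entity reference is expanded in
ordinary character data, but the same characters inside a CDATA section are
kept literally. This is only an illustration of expected behaviour, not part
of the patch.

```python
import xml.etree.ElementTree as ET

# In ordinary character data the parser expands the entity reference...
plain = ET.fromstring("<e>&gt;</e>").text
# ...but inside a CDATA section the same characters are kept verbatim.
cdata = ET.fromstring("<e><![CDATA[&gt;]]></e>").text

print(plain)  # >
print(cdata)  # &gt;
```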

The question is whether this patch should be sent upstream. Since there has 
been very little activity there, I suspect it is a lost cause.

The tests that failed after the change have been reviewed, verified and fixed.
No unexpected errors were encountered, and all SXML tests pass with this patch
applied.

Best regards
  Linus Björnstam

Attachment: 0001-module-sxml-upstream-SSAX.scm-Fix-improper-handling-.patch
Description: Binary data

