bug-gnupod

Re: [Bug-gnupod] input not verified as sane


From: H. Langos
Subject: Re: [Bug-gnupod] input not verified as sane
Date: Thu, 10 Apr 2008 13:39:55 +0200
User-agent: Mutt/1.5.13 (2006-08-11)

Hi Dylan,

On Tue, Apr 08, 2008 at 01:48:35PM -0700, Dylan Martin wrote:
> Hi, I use gnupod all the time.  Thanks for making it!
> 
> I just downloaded a podcast with non-ascii characters and possibly
> problematical '<' and '>' marks in the title.
> 

If you take a look at gnupod/src/ext/XMLhelper.pm, you'll see that the
XML file is generated mostly by text manipulation, so chances are quite
high that something will slip through the cracks...

Could you post the URL of that podcast and the downloaded XML file (it should
be something called /tmp/gnupodcast1_47f61cb0_87903dd.), or at least the
relevant part of your GNUtunesDB.xml? That would make it far easier to
reproduce and fix the bug.


As I read gnupod/src/ext/XMLhelper.pm, all attribute names (why the names
too?) and their values get filtered through this function:

sub xescaped {
        my ($ret) = @_;
        $ret =~ s/&/&amp;/g;
        $ret =~ s/"/&quot;/g;
        $ret =~ s/</&lt;/g;
        $ret =~ s/>/&gt;/g;
        #$ret =~ s/^\s*-+//g;
        my $xutf = Unicode::String::utf8($ret)->utf8;
        #Remove 0x00 - 0x1f chars (we don't need them)
        $xutf =~ tr/\000-\037//d;

        return $xutf;
}
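To double-check that without installing Unicode::String, here is a core-Perl copy of just the escaping steps (escape_attr is a made-up name for this example). It also shows why the & substitution has to run first: otherwise the & inside a freshly inserted &lt; would be escaped again into &amp;lt;.

```perl
use strict;
use warnings;

# The escaping steps of xescaped(), minus the Unicode::String call.
sub escape_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;      # first, so the entities added below are not re-escaped
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ tr/\000-\037//d;  # drop control chars (note: this includes tab and newline)
    return $s;
}

print escape_attr(q{Q&A <live> "special"}), "\n";
# prints: Q&amp;A &lt;live&gt; &quot;special&quot;
```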

So your < and > marks should be taken care of. Non-ASCII characters,
however, are a different story. There are people out there who seem
to write their RSS files in WordPad, and the line
  my $xutf = Unicode::String::utf8($ret)->utf8;
assumes that $ret is already UTF-8 and outputs it as UTF-8 again, so
there is no real conversion happening here.

The only effect of this line, as far as I can tell, is some filtering.
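If we wanted an actual conversion, one option is the core Encode module: try a strict UTF-8 decode first and only fall back to a legacy encoding if that fails. A minimal sketch, assuming Windows-1252 as the fallback (my guess at what WordPad on a Western Windows box writes; gnupod does nothing like this today). to_utf8_bytes is a hypothetical name:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return $bytes as UTF-8 bytes. Well-formed UTF-8 is
# passed through; anything else is ASSUMED to be Windows-1252 (cp1252).
sub to_utf8_bytes {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps it
    # from consuming $bytes, which we still need for the fallback.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC())
    };
    $chars = decode('cp1252', $bytes) unless defined $chars;
    return encode('UTF-8', $chars);
}

my $latin = "caf\xe9";        # e-acute as a single cp1252/Latin-1 byte
my $utf8  = "caf\xc3\xa9";    # the same word, already UTF-8
print to_utf8_bytes($latin) eq $utf8 ? "converted\n" : "mismatch\n";   # converted
print to_utf8_bytes($utf8)  eq $utf8 ? "unchanged\n" : "mismatch\n";   # unchanged
```

The result would then go through xescaped() as before; converting first and escaping second keeps the two jobs separate.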

Try this:

perl -e 'use Unicode::String; my $xutf = Unicode::String::utf8("foo")->hex; 
print $xutf."\n";'

Output should be:
U+0066 U+006f U+006f

Now replace "foo" with a string that contains non-ASCII characters. If
your terminal uses UTF-8, you should see the code points of those
characters in hex notation.
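If you don't have Unicode::String handy, the same experiment works with the core Encode module. This sketch (code_points is a made-up helper) decodes the UTF-8 bytes and prints each code point in the same U+xxxx format:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Core-module version of Unicode::String::utf8($s)->hex: decode the UTF-8
# bytes to characters, then format each code point as U+xxxx.
sub code_points {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    return join ' ', map { sprintf 'U+%04x', ord } split //, $chars;
}

print code_points("foo"), "\n";           # U+0066 U+006f U+006f
print code_points("caf\xc3\xa9"), "\n";   # U+0063 U+0061 U+0066 U+00e9
```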

If the input isn't UTF-8, in my case the invalid characters seem to be
ignored. But the behavior for invalid input is undefined (at least
my documentation of Unicode::String doesn't say anything about it), and
your Perl version might behave differently.
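For comparison, Encode can make the invalid-input case well defined: with the FB_CROAK check flag, decode() dies on the first malformed sequence instead of quietly dropping or replacing it, and LEAVE_SRC stops it from modifying its argument. A small sketch with a hypothetical is_valid_utf8 helper; the single byte \xe9 is what é looks like in Latin-1 or cp1252, and on its own it is not valid UTF-8:

```perl
use strict;
use warnings;
use Encode ();

# True only for well-formed UTF-8. FB_CROAK turns malformed input into a
# die (caught by eval) instead of silent dropping or replacement;
# LEAVE_SRC stops decode() from modifying the string it is given.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

for my $s ("foo", "caf\xc3\xa9", "caf\xe9") {
    print is_valid_utf8($s) ? "valid\n" : "invalid\n";
}
# prints: valid, valid, invalid (one per line)
```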


Anyway, there should be no invalid UTF-8 characters here, since the input
that comes from the podcast's XML file has already been filtered and
converted by XML/Parser.pm.


After thinking about it again, I assume that by "title" you mean the
title that is extracted from the ID3 tag of that podcast.

One more reason to follow up on this with more information about the
podcast.

> gnupod_addsong.pl added these items to GNUtunesDB.xml and every
> subsequent attempt to read the database produces an error.
> 
> not well-formed (invalid token) at line 654, column 398, byte 548847
> at /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm
> line 187
> 
> I was able to fix this by changing the questionable string to
> something reasonable in the GNUtunesDB.xml file.
> 
> Also, it would be really nice if the error message was more helpful,
> e.g. said which file contained the problem.

At the point where the parser dies, it only knows a file handle, not
the filename, so it can't print the filename there.

I don't really understand Perl, so somebody please correct me if I'm
talking bullshit here, but as I understand it Perl (at least 5.8) has no
try/catch style exception handling. A "die" unwinds the call stack until
it reaches an enclosing "eval" block; if there is none, it kills your
program right there and then. So the only way to survive it is to wrap
the call in "eval" and check $@ afterwards to see whether the block died.
This is what XML/Parser.pm does.
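A tiny demonstration of that propagation (the sub names are made up): the die happens two calls deep but is caught by the eval in the caller, and the program keeps running.

```perl
use strict;
use warnings;

# die does not stop the program on the spot: it unwinds the call stack
# until it reaches the nearest enclosing eval, and only ends the program
# if there is none.
sub inner  { die "parse error at line 654\n" }
sub middle { inner(); return "never reached" }

my $ok = eval { middle(); 1 };
print "caught: $@" unless $ok;      # caught: parse error at line 654
print "program still running\n";
```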

Before ranting on about the inherent evil of "eval", I decided to take
a look at the parser, and there is at least a way to make it output the
offending line.
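For reference, XML::Parser documents an ErrorContext option that makes it print the lines around the error; I can't say whether that is what the patch relies on. The same idea can be sketched by hand from the "at line N, column C" part of the message. show_error_line is hypothetical, and it assumes a 0-based column (expat reports the offset from the start of the line) and single-byte characters on that line:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Given a file name and an XML::Parser-style message ("... at line N,
# column C, ..."), return that line of the file plus a caret under column C.
# Returns undef if the message has no position or the line isn't found.
sub show_error_line {
    my ($file, $err) = @_;
    my ($line_no, $col) = $err =~ /at line (\d+), column (\d+)/
        or return;
    open my $fh, '<', $file or return;
    while (my $line = <$fh>) {
        next unless $. == $line_no;
        chomp $line;
        close $fh;
        return "$line\n" . (' ' x $col) . "^\n";
    }
    close $fh;
    return;
}

# Demo: a stray '<' inside an attribute value on line 2.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp qq{<gtkpod>\n<file title="a < b"/>\n</gtkpod>\n};
close $tmp;

print show_error_line($path, "not well-formed (invalid token) at line 2, column 15, byte 24");
```

This prints line 2 of the file with a caret under the stray <.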

I've also wrapped the call to the parser in an eval block to catch the
dying parser and make it output the file name.
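Roughly, the wrapping looks like this. It is a generic sketch, not the code from the attached patch: parse_with_filename is a hypothetical name, and the callback stands in for whatever actually drives the parser (with XML::Parser that could be something like sub { $parser->parse($_[0]) }, since parse() accepts an open IO handle).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Run $parse->($fh) on an opened $file; if the callback dies, re-die with
# the file name prepended so the user knows which file is broken.
sub parse_with_filename {
    my ($file, $parse) = @_;
    open my $fh, '<', $file or die "$file: cannot open: $!\n";
    my $result;
    my $ok  = eval { $result = $parse->($fh); 1 };
    my $err = $@;    # save it before doing anything else
    close $fh;
    die "$file: " . ($err || "unknown error\n") unless $ok;
    return $result;
}

# Demo with a stand-in callback that dies the way a parser would.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "<gtkpod>\n";
close $tmp;

eval {
    parse_with_filename($path, sub { die "not well-formed (invalid token) at line 1, column 8\n" });
};
print "error: $@";    # the temp file's path, then the parser's message
```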

(I have no idea whether Perl will really assign the right value to $p
in this case, but nobody who calls the "doxml" sub uses it anyway.)

I guess I will add the same safety mechanism to addsong for handling
badly formatted RSS feeds.

cheers
-henrik

Attachment: gnupod_ext_XMLhelper-improve-error-output.patch
Description: Text Data

