classpath-inetlib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Classpath-inetlib] [PATCH] inetlib bugs


From: Székelyi Szabolcs
Subject: Re: [Classpath-inetlib] [PATCH] inetlib bugs
Date: Thu, 26 Jan 2006 18:16:09 +0100
User-agent: Debian Thunderbird 1.0.7 (X11/20051017)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Chris,

Chris Burdess wrote:
> Székelyi Szabolcs wrote:
> 
>>I've discovered some bugs in inetlib.
>>
>>The first fixes the "mark" bug in CRLFInputStream. The readLine() method
>>of LineInputStream relies on the underlying input stream ("in") not
>>being mark()ed before reset()ting it so reset() rewinds the stream to
>>the last marked position. But CRLFInputStream marks the same stream so
>>LineInputStream rewinds to the position marked by CRLFInputStream, not
>>its own mark.
>>
>>This patch makes CRLFInputStream to use some sort of "faking" technique
>>instead of marking the stream so LineInputStream works correctly.
> 
> 
> This sounds as if the problem is really in LineInputStream. Shouldn't it
> be fixed there?
> 
> I can't really see what the problem is - the underlying stream always gets
> reset to the correct position, doesn't it? Is there a problem with the
> internal state of LineInputStream after a reset()?
The problem is that CRLFInputStream is not transparent. Let's take a
look at LineInputStream.readLine().

If I am right, theory of operation is the following:

1, mark the current position so we can rewind here
2, read up to 1024 bytes from the underlying stream -- let's assume that
the first line ending is within 1024 bytes so it is in the buffer
3, look for the first line ending, store the index in pos
4, rewind the stream to the mark
5, read pos + 1 bytes from the stream again so we read the first line
6, return the read data so the first line

Step 2 is problematic, because if in is a CRLFInputStream, it also uses
mark() on the (same) underlying stream to rewind it in case of it sees a
CR but cannot (yet) determine the next byte. So it marks the stream and
reads the next byte. If the byte after CR is not a LF, it rewinds to the
position right after the mark (so pushes back the byte after CR) and
returns the CR. This means it marks the stream right after every CR.
When CRLFInputStream.read() returns, the mark for the underlying stream
is after the CR, so in step 4 LineInputStream rewinds to the byte after
CR, not to the position marked in step 1.

We can avoid marking the stream if we read the next byte after the CR,
too, and if it's not a LF, we return the CR, store the second read
"non-LF" byte and (fake) return it on the next read().

> I am very wary of making changes to CRLFInputStream as it is used
> virtually everywhere. I have done some pretty heavy lifting with it in
> Classpath, so I feel it's probably correct.
I think CRLFInputStream should be transparent. If other code relies on
the current behaviour, fixes should be done for those, too. This is my
opinion, but your code. ;)

>>The second patch is against MessageInputStream. The bug is here that the
>>eof flag is not reset when the stream is reset so reading past EOF,
>>resetting the stream and reading again gives EOF instead of the stream
>>content right after the mark.
> 
> 
> The second part of this patch is correct, we should reset eof during reset().
> But in the first case eof will already have been set by the call to read(),
> won't it?
Yes, that's what we should clear on reset(), so if we mark(), then read
till EOF, then reset() and read again we can read from the mark before
getting EOF again.

Have a nice day,
- --
Sze'kelyi Szabolcs

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFD2QPZGJRwVVqzMkMRAuGMAKCLZxk5YtsDmEevk63JVyoi4OwTCwCbBhIk
0wzTxz0JKSmG+qXDw+s3RXo=
=hP6Z
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]