classpath
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: java.io.DataInputStream.readLine misbehaviour


From: Dalibor Topic
Subject: Re: java.io.DataInputStream.readLine misbehaviour
Date: Sun, 02 Nov 2003 17:33:30 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312

Mark Wielaard wrote:
Hi,

On Fri, 2003-10-31 at 19:12, Guilhem Lavaux wrote:

Continuing the Classpath-Kaffe merge, I noticed that DataInputStream is failing one of kaffe's regression test (InputStreamTest). It seems the failure comes from readLine(): readLine() is a little too conservative compared to JDK's implementation. The real behaviour seems to be the following:

readLine() is invoked => it reads bytes until it finds a '\r' => it reads the next byte
* if it is  a '\n' it eats it
* if it isn't it keeps it in an internal buffer until someone calls another read methods.

But in any case the byte isn't put back in the input buffer.
>

The Classpath implementation clearly took the lowest overhead for the
non-readLine case. For the benefit of the other mailinglist readers,
here is the actual comment from Classpaths (same for libgcj)
DataInputStream.readLine():

    // FIXME: The following code tries to adjust the stream back one
    // character if the next char read is '\n'.  As a last resort,
    // it tries to mark the position before reading but the bottom
    // line is that it is possible that this method will not properly
    // deal with a '\r' '\n' combination thus not fulfilling the
    // DataInput contract for readLine.  It's not a particularly
    // safe approach threadwise since it is unsynchronized and
    // since it might mark an input stream behind the users back.
    // Along the same vein it could try the same thing for
    // ByteArrayInputStream and PushbackInputStream, but that is
    // probably overkill since this is deprecated & BufferedInputStream
    // is the most likely type of input stream.
    //
    // The alternative is to somehow push back the next byte if it
    // isn't a '\n' or to have the reading methods of this class
    // keep track of whether the last byte read was '\r' by readLine
    // and then skip the very next byte if it is '\n'.  Either way,
    // this would increase the complexity of the non-deprecated methods
    // and since it is undesirable to make non-deprecated methods
    // less efficient, the following seems like the most reasonable
    // approach.

That's a bug in Classpath then. Trying to 'read ahead' after \r fails on for those systems whose 'end of line' is a plain \r. See this note from Apple: http://developer.apple.com/technotes/tn/tn1157.html . Short story: you're reading lines over network from a machine on mac os x: you read \r, try to read ahead but the next character never comes since the darwin machine already sent you its end of line, so you hang forever, i.e. util the connection is terminated.

In fact, while you're at it, check out the rationale behind kaffe's implementation as described in this thread:
http://www.kaffe.org/pipermail/kaffe/2002-October/040501.html
, where i go into some detail how to deal with the \r vs \r\n issue. ;) The FilterInputStream kludge was the only way to prevent fixing one bug on the cost of introducing the other I could come up with, and I think it's the only possible way to do it.

BTW I don't think we have to worry about the unsynchronized thing since
streams are already not thread safe.

They should be. My experience from working on kaffe's readers and writers and testing against JDK was that the JDK implementation was very thread safe, whereas kaffe's IO had thread safety kludged in afterwards by me for some classes. From what I remember, Classpath code wasn't thread safe at all, either.

For what it's worth, yes, there is code out there that expects threads to be able to read from readers/writers and not trip over their feet ;) AFAIK, the Java APIs explicitely mention when a method/class is not thread safe, so by default we should assume that APIs are to be thread safe. I mean, it's built in the laguage, why would you want to limit yourself to single threads when you're doing something as important as IO?

The real problem is what happens when someone calls one of the inherited method from FilterInputStream which are not overloaded (e.g. available).

Kaffe's answer was to create a mini-buffer with default access in FilterInputStream. It is obvious it is clumsy because it slightly changes the API, but the only other option would be to overload all read functions.


That is overhead for every FilterInputStream, that should be avoided.
But I agree that the Classpath way is not very clean.

Would a solution be to do the following in the constructor?

  public DataInputStream (InputStream in)
  {
    super (in.markSupported() ? in : new BufferedInputStream(in, 1));
  }

No. Even if the superstream supports marking, then a mark/rewind based solution would delete a previous mark set by user of DataInputStream class each time a \r occurs. That would be a quite ugly bug to find.

And then use the fact that in will always be a BufferedInputStream in
the readLine method? This does create a small overhead for
DataInputStream in case it isn't already wrapped around something that
buffers already, but it would be easy to make it always work correctly.

It doesn't necessarily mean it will always be a buffered input stream. There are input streams that support marking that are not BufferedInputStreams.

Another question is how often (ever?) does this (mixing readLine() with
other DataInputStream and/or FilterInputStream calls happen in real
programs. The regression test of kaffe looks contrived, it even uses
StringInputStream which is an ugly (and thankfully deprecated) class.

Happened in real life with very popular libraries like xerces. See the kaffe mailing list thread. User was not pleased at all, and wanted to revert back to broken 'read ahead, and unread' method. I had to figure out a way to fix both issues, that's the whole story. If someone can find a more elegant way to do it, that doesn't create any new bugs, I'm all ears. I've been there before and spent some time researching this, though. You may have more luck. ;)

cheers,
dalibor topic





reply via email to

[Prev in Thread] Current Thread [Next in Thread]