Re: [Dazuko-devel] Problems with RHEL 3 with hugemem kernel

From: Gerhard Sittig
Subject: Re: [Dazuko-devel] Problems with RHEL 3 with hugemem kernel
Date: Wed, 21 Sep 2005 18:48:16 +0200
User-agent: Mutt/1.4.1i-ja.1

* Juha Autero <address@hidden> [2005-09-19 19:07]:
> Our customer has a problem with Dazuko. It causes a kernel panic on
> Red Hat Enterprise Linux 3 with the hugemem kernel. We managed to
> reproduce the kernel panic on our end, which probably means that it
> is not a kernel configuration problem.
> The kernel panic happens when our program tries to connect to Dazuko.
> We get the following error messages in our log:
> Dazuko error: writing to device: Bad address
> [ ... ]
> What worries me is that when googling hugemem I found a blog comment
> that said: "Incidentally, the 4G/4G patch had the nice side-benefit of
> exposing numerous bugs in the use of user pointers in drivers, most of
> which were quickly resolved."
> <http://www.orablogs.com/mt-bin/mt-comments271276.cgi?entry_id=1363>

Well, Dazuko has not had any problems with "use of user pointers in
drivers" for at least two years.  We are quite aware of the fact that
not all the world is Linux. :)

After searching the web I understand that the hugemem 4G/4G split
patch does not change the size of pointers; they are still 32 bits
wide (on ia32, that is).  So there should not be a problem passing
them around in long variables.

Translation between user space and kernel space addresses is already
done by means of copyin and copyout (copy_from_user and copy_to_user
in Linux terms).  There is no direct use of user pointers in the
kernel module.

What you experience might be some kind of signedness problem.  The
hugemem approach increases the probability of applications using
"high" addresses above 2G.  I'm not sure how dazuko_strtoul() handles
these cases.  You may fetch a new version of dazuko_core.c from CVS
which changed dazuko_strtoul() to use and return unsigned long values.
If this does not remove the problem it would be interesting to learn
which addresses get mangled and what kind of damage they suffer from
(the code is in dazuko_core.c:dazuko_handle_user_request(), search for
"RA=").  Could you check the RA= or ra= text representation against
what the pointer looks like after dazuko_strtoul() conversion?
Resetting the ll_request and user_request pointers to NULL will even
avoid panics or faults and make the request fail immediately.
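
To illustrate the kind of damage I have in mind, here is a minimal
user space sketch.  It is not the Dazuko code: parse_signed32(),
parse_unsigned32() and roundtrip_ok() are hypothetical names, the
fixed 32 bit types stand in for long on ia32, and saturating on
overflow (as strtol() does) is only an assumption about how a signed
conversion might misbehave.  The point is merely that an address
above 2G does not survive a signed conversion, while it survives an
unsigned one.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical signed conversion: saturates at INT32_MAX on overflow,
 * much like strtol() does.  Stands in for "long" on ia32. */
int32_t parse_signed32(const char *text)
{
    int32_t value = 0;

    assert(text != NULL);
    for (; *text >= '0' && *text <= '9'; text++) {
        int32_t digit = *text - '0';
        if (value > (INT32_MAX - digit) / 10)
            return INT32_MAX;   /* an address above 2G does not fit */
        value = value * 10 + digit;
    }
    return value;
}

/* Hypothetical unsigned conversion: the full 32 bit range fits. */
uint32_t parse_unsigned32(const char *text)
{
    uint32_t value = 0;

    assert(text != NULL);
    for (; *text >= '0' && *text <= '9'; text++) {
        uint32_t digit = (uint32_t)(*text - '0');
        if (value > (UINT32_MAX - digit) / 10)
            return UINT32_MAX;  /* more than 32 bits of input */
        value = value * 10 + digit;
    }
    return value;
}

/* Diagnostic: print the address as decimal text, convert it back and
 * compare the result with the original.  Returns 1 if they match. */
int roundtrip_ok(uint32_t address, int use_signed)
{
    char text[16];
    uint32_t back;

    snprintf(text, sizeof text, "%" PRIu32, address);
    back = use_signed ? (uint32_t)parse_signed32(text)
                      : parse_unsigned32(text);
    return back == address;
}
```

With these helpers, roundtrip_ok(0x08048000u, 1) succeeds because the
address is below 2G, while roundtrip_ok(0xC0000000u, 1) fails and
roundtrip_ok(0xC0000000u, 0) succeeds.  A comparison of this kind
between the RA= text and the converted pointer is what I am asking
for above.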

A different approach is to hand pointers from user space to the kernel
in %llu text representation and to internally use unsigned long long
for conversion inside the kernel.  We will tackle this next.  The
assumption that unsigned long long will always be big enough to hold
an address should be safe.  It's a pity that there is no clean and
portable way to detect the presence and type of uintptr_t. :(  The
int64_t/int32_t detection in dazuko_transport.c is a mess and
actually is only done to silence compiler warnings when assigning
between integer types and pointers.
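
A rough sketch of the %llu idea, written for user space only and
under several assumptions: encode_address() and decode_address() are
made-up names, the sketch simply takes a C99 <stdint.h> with
uintptr_t for granted (which is exactly what we cannot rely on
everywhere), and it uses strtoull(), for which the kernel side would
need its own conversion routine.  Only the "RA=" tag is taken from
the existing request format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write "RA=<decimal address>" into buf.  Returns what snprintf()
 * returns, i.e. the length the text needs. */
int encode_address(char *buf, size_t size, const void *ptr)
{
    assert(buf != NULL);
    return snprintf(buf, size, "RA=%llu",
                    (unsigned long long)(uintptr_t)ptr);
}

/* Look for "RA=" and convert the decimal number that follows.
 * Returns NULL if the field is missing, malformed or out of range. */
void *decode_address(const char *text)
{
    const char *field;
    char *end;
    unsigned long long value;

    assert(text != NULL);
    field = strstr(text, "RA=");
    if (field == NULL)
        return NULL;
    field += 3;                         /* skip the "RA=" tag */
    if (*field < '0' || *field > '9')
        return NULL;                    /* no digits after the tag */

    errno = 0;
    value = strtoull(field, &end, 10);
    if (errno == ERANGE)
        return NULL;                    /* too big for unsigned long long */
    if ((unsigned long long)(uintptr_t)value != value)
        return NULL;                    /* too big for a pointer here */
    return (void *)(uintptr_t)value;
}
```

Encoding the address of a local variable and decoding the resulting
text should give back the same pointer, regardless of whether the
address is below or above 2G, because no signed type is involved at
any step.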

Are you aware of a live/rescue system CD with a hugemem kernel on it,
so that we can easily reproduce the problem here?  That would help a
lot in diagnosing the problem and confirming that it is fixed.

virtually yours                                     Gerhard Sittig
pgp fingerprint AF29 3CD2 A531 F5A8 5F42  CB9A 1B7F 59F8 BA7A 9EE5
Gerhard Sittig
Software Engineer

H+BEDV Datentechnik GmbH
Lindauer Strasse 21, 88069 Tettnang, Germany
tel +49 (0) 7542-500500, fax +49 (0) 7542-500576
