bug-ddrescue

Re: [Bug-ddrescue] Hanging ddrescue (infinite read operations)


From: Antonio Diaz Diaz
Subject: Re: [Bug-ddrescue] Hanging ddrescue (infinite read operations)
Date: Tue, 16 Jan 2018 17:03:59 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14

Hi Linus,

Linus Lüssing wrote:
First of all, I need to complain vehemently: GNU ddrescue works
too well :-). Now for the third time, it saved one of my neighbors'
data! How should people learn to make backups if such an
awesome tool like ddrescue exists? :P

How true! :-)


Just kidding ;) - you guys are awesome, GNU ddrescue is one of the
most valuable (and still too unknown) pieces of free software in my
opinion.

Thanks.


During these hangs, Ctrl-C would do nothing and even a SIGKILL
would not kill ddrescue. The SATA-to-USB adapter would continue
flashing its blue LED, seemingly still trying to read.

We have already had bad experiences with USB adapters on this list (the advice is to connect the drive directly to the motherboard). But in this case there also seems to be a bug in the kernel driver regarding SIGKILL. According to POSIX, SIGKILL cannot be handled or ignored. The GNU C library manual even states that: "In fact, if 'SIGKILL' fails to terminate a process, that by itself constitutes an operating system bug which you should report."
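To illustrate the POSIX rule quoted above, here is a minimal C sketch showing that a program cannot even ask to ignore SIGKILL: `sigaction()` rejects the request with EINVAL. The helper name `try_ignore_signal` is hypothetical and not part of ddrescue; the point is only that the failure to die cannot come from ddrescue itself.

```c
/* Sketch: SIGKILL cannot be caught or ignored. POSIX requires
 * sigaction() to fail with EINVAL for such a request. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: try to set 'sig' to SIG_IGN.
 * Returns 0 on success, or the errno value reported by sigaction(). */
int try_ignore_signal(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;      /* request: ignore this signal */
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) == -1)
        return errno;             /* EINVAL for SIGKILL (and SIGSTOP) */
    return 0;
}
```

On Linux, a process blocked inside a driver in uninterruptible sleep (state "D" in `ps`) typically does not act on a pending SIGKILL until the kernel call returns, which would be consistent with the hang described above.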


Question A): Would it be possible to reset the operation from
software somehow? A timeout in ddrescue? Or does this sound like a
hangup on an even lower level, the Linux kernel (I was using a
4.14.12 kernel on a 32bit ARM device, an Odroid U3) or maybe even
the disk and/or SATA-USB adapter so that power cycling the disk /
reconnecting the adapter is the only choice?

The kernel driver for a device should know and implement whatever timeout is required for that type of device. The problem, I think, is that USB is not a device but a communication bus, and maybe the driver just sits and waits forever. In any case, if SIGKILL fails to terminate ddrescue, there is nothing that ddrescue can do.


Another observation: During the trimming and scraping phases (so
with the chunk size of 1 / 512B instead of 128x 512B chunks?) I
did not experience those tedious hangs anymore. Could it be a
firmware bug happening when requesting larger chunks?

Maybe. Next time, you could try whether --cluster-size=1 prevents the hangs during the copying phase.


Also, after pulling the USB cable, ddrescue unfroze and exited
with an error, as expected.

This seems consistent with "the driver just sits and waits forever" (until the connection is interrupted).


Regarding the unplugging I also noticed: Pulling without a
previous Ctrl+C seemed like a bad idea. This led to ddrescue
adding many megabytes of false negatives to the mapfile.

Question B): Would it be possible to prevent this?

Yes, using --reopen-on-error, --max-error-rate or --max-bad-areas. With --reopen-on-error, ddrescue should return immediately, reporting "Can't reopen input file". (Maybe --reopen-on-error should be enabled by default.)
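A rough C sketch of the idea behind --reopen-on-error, assuming the input is closed and reopened by name after each read error. This is not ddrescue's actual implementation (ddrescue is written in C++), and `reopen_input` is a hypothetical helper. Once the adapter is unplugged the device node is gone, so `open()` fails and the run can stop at once instead of marking the remaining blocks as bad.

```c
/* Sketch (hypothetical, simplified): reopen the input after a read error
 * so that an unplugged device is detected immediately. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Close *fd (if open) and reopen 'path' read-only into *fd.
 * Returns 0 on success, -1 if the input can no longer be opened. */
int reopen_input(const char *path, int *fd)
{
    if (*fd >= 0)
        close(*fd);
    *fd = open(path, O_RDONLY);   /* fails once the device node is gone */
    if (*fd < 0) {
        fprintf(stderr, "Can't reopen input file\n");
        return -1;                /* caller should stop copying here */
    }
    return 0;
}
```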


For the Ctrl+C and then unplugging I noticed: Sometimes it exits
with an "interrupted by user", sometimes with an "input file/device
vanished". I couldn't figure out when one or the other might
happen, the result was seemingly random.

It depends on how fast the kernel removes the device name from /dev. After each read error, ddrescue calls stat on the device name. If the name still exists, it moves on to the next block (and then exits with "interrupted by user"); if it is already gone, it exits with "input file/device vanished".
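The check described above can be sketched in C as follows. The function and enum names are hypothetical and the logic is simplified (the real interrupt check happens before the next read, which `after_read_error` only stands in for). The sketch shows where the apparent randomness comes from: whether `stat()` runs before or after the kernel removes the node decides which message the user sees.

```c
/* Sketch (hypothetical names): decide what to do after a read error,
 * depending on whether the device name still exists in /dev. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

enum error_action {
    ACTION_CONTINUE,     /* name still present: go on with the next block */
    ACTION_VANISHED,     /* exit with "input file/device vanished" */
    ACTION_INTERRUPTED   /* exit with "interrupted by user" */
};

/* Returns 1 if 'path' no longer exists, 0 otherwise. */
int input_vanished(const char *path)
{
    struct stat st;
    return stat(path, &st) == -1 && errno == ENOENT;
}

/* 'interrupted' is nonzero if the user has pressed Ctrl-C. */
enum error_action after_read_error(const char *path, int interrupted)
{
    if (input_vanished(path))
        return ACTION_VANISHED;     /* kernel already removed the node */
    if (interrupted)
        return ACTION_INTERRUPTED;  /* noticed before the next read */
    return ACTION_CONTINUE;
}
```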


Also it seemed that only for the latter exit case a bad cluster
was added to the mapfile? Which was the desirable result for me as
this was indeed a cluster hanging forever. For the "interrupted by
user" case it seemed that (usually?) no error was added to the
mapfile. Does that make sense?

If ddrescue is blocked in the read call when the device is unplugged, it should always mark the block as "bad" (non-trimmed, etc.) in the mapfile. The user interrupt is checked before making the read call. Maybe the USB adapter is returning fake data, tricking ddrescue into marking the block as finished in the "interrupted by user" case?
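A minimal sketch of the ordering described above, assuming a SIGINT handler that only sets a flag. The names (`read_block`, `install_sigint_handler`, the `block_status` values) are hypothetical, and real ddrescue keeps a much richer mapfile state. Because the flag is tested before `pread()`, a Ctrl-C pressed while the call is blocked is only acted on for the next block; the blocked block itself ends up either bad (the call fails when the device disappears) or finished (the call returns a full buffer, whatever its contents).

```c
/* Sketch (hypothetical, simplified): check for a user interrupt before
 * each read, then classify the block by the result of the read. */
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t interrupted = 0;

static void on_sigint(int sig)
{
    (void)sig;
    interrupted = 1;              /* only set a flag; act on it later */
}

/* Install the handler; SA_RESTART keeps an interruptible read going. */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGINT, &sa, NULL);
}

enum block_status { BLOCK_INTERRUPTED, BLOCK_FINISHED, BLOCK_BAD };

enum block_status read_block(int fd, char *buf, size_t size, off_t pos)
{
    if (interrupted)              /* checked BEFORE the read call */
        return BLOCK_INTERRUPTED;
    ssize_t n = pread(fd, buf, size, pos);  /* may block in the driver */
    if (n == (ssize_t)size)
        return BLOCK_FINISHED;    /* full buffer: trusted, even if fake */
    return BLOCK_BAD;             /* error or short read: record as bad */
}
```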


Best regards,
Antonio.


