bug-ddrescue

Re: [Bug-ddrescue] ddrescue 10x slow under osx


From: Florian Sedivy
Subject: Re: [Bug-ddrescue] ddrescue 10x slow under osx
Date: Thu, 20 Dec 2012 19:11:50 +0100

Hola Antonio, 

On 19.12.2012 at 17:42, Antonio Diaz Diaz wrote:
Hola Florian,

Florian Sedivy wrote:
Given that - judging from the participants of the mailing list - approximately half of ddrescue's users are on OS X,
Or maybe OS X users need more help than other users. ;-)

Probably yes, if "other users" excludes those who have their first encounter with a shell just after inserting an Emergency-Boot-CD … 8-|

Something else: cat is quite a generic command, and still it manages to somehow select the optimal Copy Block Size for a raw character device on OS X. If the relevant code is not platform specific, it might contain some nice ideas.
If it is the cat from GNU coreutils, it probably includes platform specific code for every system under the sun through gnulib. But I'll look at the code just in case. :-)

In fact it is BSD's cat. I had a look, and the relevant code is:
		if (fstat(wfd, &sbuf))
			err(1, "stdout");
		bsize = MAX(sbuf.st_blksize, BUFSIZ);
		/* ... and later: */
		read(rfd, buf, bsize)
So cat takes st_blksize from fstat() and the value of the BUFSIZ macro and uses the larger of the two. (cat specifically stats the output and uses the result for both read and write. To find a suitable value for ddrescue's purposes, one would rather stat the input file.)
The author probably read (or wrote?) http://www.delorie.com/gnu/docs/glibc/libc_226.html :
Macro: int BUFSIZ 
The value of this macro is an integer constant expression that is good to use for the size argument to setvbuf. This value is guaranteed to be at least 256. 
The value of BUFSIZ is chosen on each system so as to make stream I/O efficient. So it is a good idea to use BUFSIZ as the size for the buffer when you call setvbuf. 
Actually, you can get an even better value to use for the buffer size by means of the fstat system call: it is found in the st_blksize field of the file attributes. See section 14.9.1 The meaning of the File Attributes. 
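
The rule described above, applied to the input file instead of the output, could be sketched roughly as follows. This is only an illustration and not code from cat or ddrescue; the function names block_size_from() and suggest_block_size() are made up for this example.

```c
#include <stdio.h>      /* BUFSIZ */
#include <sys/stat.h>   /* fstat(), struct stat */

/* Hypothetical helper: apply the "larger of st_blksize and BUFSIZ"
   rule to a block size value already obtained from fstat(). */
long block_size_from(long st_blksize)
{
    return st_blksize > (long)BUFSIZ ? st_blksize : (long)BUFSIZ;
}

/* Hypothetical helper: suggest a copy block size for an already
   opened input file. Returns 0 if fstat() fails, so the caller
   can fall back to its own default. */
long suggest_block_size(int in_fd)
{
    struct stat sb;

    if (fstat(in_fd, &sb) != 0)
        return 0;
    return block_size_from((long)sb.st_blksize);
}
```

A caller would open the input with open() and pass the descriptor to suggest_block_size(), treating a return value of 0 as "could not detect".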

Enhancement proposal:
How about ddrescue determining the default Copy Block Size this way? (Alternatively -c 0 could switch to this "Auto detection", but I doubt anybody would miss the hardcoded 64KiB default.) While fast transfer may not be ddrescue's main purpose, with dying hard drives it's always a race against time. 
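
To make the "-c 0" idea concrete, here is a minimal sketch of how the effective size might be resolved. It works in bytes for simplicity and is not taken from ddrescue's sources; effective_copy_block_size() and DEFAULT_COPY_BLOCK_SIZE are names invented for this example, and the detected value is assumed to come from something like st_blksize.

```c
/* Assumed fallback: the 64KiB default mentioned above, in bytes. */
#define DEFAULT_COPY_BLOCK_SIZE (64L * 1024L)

/* Hypothetical helper: decide which copy block size to use.
   requested_bytes: value given by the user, 0 meaning "auto detect".
   detected_bytes:  value reported by the system (for example
                    st_blksize), 0 or negative if detection failed. */
long effective_copy_block_size(long requested_bytes, long detected_bytes)
{
    if (requested_bytes > 0)
        return requested_bytes;          /* explicit user choice wins */
    if (detected_bytes > 0)
        return detected_bytes;           /* auto detection succeeded */
    return DEFAULT_COPY_BLOCK_SIZE;      /* fall back to the old default */
}
```

With this arrangement an explicit value always overrides detection, so existing command lines would keep their behaviour.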

File type, reported stat.st_blksize value, and speed effect observed on my system:

  File type                          st_blksize  Observed speed effect
  regular (HFS+) files               4096        always read at maximum speed, regardless of dd(rescue)'s Copy Block Size
  /dev/disk… block devices           2048        always read slowly, regardless of Copy Block Size, as if you were using a 4KiB CBS
  /dev/rdisk… raw character devices  131072      = 128KiB, which in my testing is actually the smallest CBS giving maximum speed

I don't know the value of BUFSIZ, but stat.st_blksize seems to work as well as or better (+15%) than the 64KiB default. 
The only time when setting a CBS bigger than stat.st_blksize improved speed even further was when I accessed a sparse image that was attached as a device. 

Why block devices on OS X behave like this, I don't know. As long as they do, however, they should be avoided when using tools like ddrescue. (I also suspect that with /dev/disk… read errors are always reported in the first sector requested, instead of where they really occurred.)

The big question is: where would be the right spot for this
information, if you want to keep even the documentation strictly
OS-agnostic?
I'll add some of this information to the ddrescue manual (for example that raw character devices may have a defined size). But, given that most of the information is not specific to ddrescue but to accessing discs on OS X, I guess the right spot for it would be some OS X site or forum.

Documentation proposal:
If the results of my testing hold for OS X in general, then ddrescue's -c --cluster-size option only has an effect on OS X when used with /dev/rdisk… raw character devices. Add the much better speed that can be achieved by using a sensible value for the Copy Block Size together with /dev/rdisk…, and I think this is specific enough to advise ddrescue users on OS X to always specify drives and partitions by their /dev/rdisk… descriptor instead of /dev/disk… 

  • A good place for that would be in the "Basic Concepts" section in items "Device" and "Partition". ("On OS X use /dev/rdisk0, /dev/rdisk1, etc.")
  • Making every example multi-flavored would maybe be too much. But giving at least the three examples in the "Algorithm" chapter in both Linux and OS X style (other Unixes aside) would make the documentation less OS-specific. 
  • The description of the -c --cluster-size option could add some information about the circumstances needed for it to have an effect. (What about other OSes: does -c have an effect on speed with regular files on Linux? The item about -d includes a note on regular files, -c does not.) 
  • Finally the chapter "Direct Disk Access" could give some technical background to the situation on OS X and warn users against using the /dev/disk… block devices.
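
For illustration only, the /dev/disk… to /dev/rdisk… substitution advised above could be expressed as a small helper like the one below. It is a sketch based purely on the naming pattern described in this mail; to_raw_device_path() is a made-up name, and the code does not check that the resulting device actually exists.

```c
#include <stdio.h>    /* snprintf() */
#include <string.h>   /* strlen(), strncmp() */

/* Hypothetical helper: rewrite an OS X block device path such as
   "/dev/disk1" into the matching raw character device path
   "/dev/rdisk1". Any other path is copied unchanged.
   Returns 0 on success, -1 if the output buffer is too small. */
int to_raw_device_path(const char *path, char *out, size_t out_len)
{
    static const char block_prefix[] = "/dev/disk";
    const size_t prefix_len = sizeof block_prefix - 1;
    int n;

    if (strncmp(path, block_prefix, prefix_len) == 0)
        n = snprintf(out, out_len, "/dev/rdisk%s", path + prefix_len);
    else
        n = snprintf(out, out_len, "%s", path);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Paths that already name a raw device, or that are regular files, pass through untouched, so the helper could be applied to any input argument.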

I can write some text proposals, if needed. It may be beneficial, however, to have a more complete picture (other OSes, other opinions) before putting it in words. 

---------------
I noticed speed differences long ago, found values that worked for me and never investigated any further. Maybe now is the time. 

Greetings, 
Florian
