

From: Franc Zabkar
Subject: Re: Suggested feature + implementation: randomize order of bad sector access in retry phase
Date: Sun, 19 Jun 2022 03:01:17 +1000
User-agent: Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Thunderbird/91.10.0

Your approach is only meaningful for drives that are 30 or more years old. Those used stepper motors to position the heads, and the steppers were affected by gear lash. (Were you perhaps misled by the technical documentation for SpinRite?)

Any drive manufactured since about 1990 uses a voice coil positioner. Such drives settle on the same position regardless of the direction from which they approach the track. In fact the drive's own error recovery mechanism deliberately injects both positive and negative track offsets to account for the possibility of off-track writes.

See steps 13 & 14 in the following sequence of error recovery operations for the Conner Peripherals drive. Operations 3 & 4 presumably adjust the read amplification, while 5 & 6 probably adjust the timing of the data clock.

The second drive is a recent HGST model, which happens to be a rebadged Western Digital design.

--------------------------------------------------------------------
Conner Filepro Performance Series CFP1060E/CFP1060S/CFP1060W Product Manual:

http://www.codemicro.com/support/disc/manuals/scsi/1060rev.pdf

Read Retry Operations

The drive retries data field read operations in the following sequence if it detects an error which cannot be corrected on-the-fly. The default retry algorithm repeats the sequence eight times, for a total of 128 retries, or until the data is recovered.

1. Initial read
2. First retry
3. Read retry with data threshold offset +1
4. Read retry with data threshold offset -1
5. Read retry with data window offset +1
6. Read retry with data window offset -1
7. Write Splash
8. Read retry with data threshold offset +2
9. Read retry with data threshold offset -2
10. Read retry with data window offset +2
11. Read retry with data window offset -2
12. Normal read retry
13. Read retry with servo offset +8%      <-----
14. Read retry with servo offset -8%      <-----
15. Normal read retry
16. Software (2-burst) EDAC correction attempt

If retries are disabled, the drive retries the operation one time before it reports an error.
-----------------------------------------------------------------------

HGST Travelstar Z5K1 (HTS541010B7E610, rebadged Western Digital Palmer family) Technical Reference Manual:

https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/hgst/travelstar-z5k-series/product-manual-travelstar-z5k1.pdf

3.7.9 Error Recovery Process

The drive has the following means of Error Recovery:

- On-the-Fly Recovery:

Using LDPC's internal recovery mechanism with Global and Local Iteration for Error Recovery and Correction. This recovery happens in real time and does not require any additional re-reads for the correction.

- Simple Firmware Assist Recovery:

Single-variable re-reads involving off-track recoveries and synch mark retries.

- Extended Firmware Assist Recovery:

This retry procedure will step through a combination of positive/negative track offsets and VGA DAC manipulations to recover the data. In addition, further retries will also involve Low Pass Filter (FIR) manipulations, Timing recovery, and Synch Error Recoveries deeper into the retry process. Furthermore, specific retry steps also involve hardware assist for Extended Retry, with further modification of the FIR filters and an Erasure Sweep.

- LDPC Erasure Sweep Correction:

This step sweeps the entire data stream with an Erasure Length and a Sliding Window, both programmable, to maximize the correction capability.

When an extended retry operation is successful, the controller continues with the command. The controller clears any changes made during F/W Assist Recovery before proceeding to the next operation.
-----------------------------------------------------------------------


On 18/06/2022 4:25 am, Stephen Stair wrote:
I've implemented a feature to optionally randomize the order of sector accesses in the retry phase.
It operates by allocating a list of 64k sector addresses: it first scans the bad sectors to add entries to the list, then randomizes the order of the list and fetches each sector in the list in sequence, repeating this until it reaches the end of the volume.
It scans in forward or backward order depending on the pass, although this is less relevant for a random sweep.
It's also not a completionist, so canceling and resuming may skip sectors, but again this is less relevant for a random sweep.
The decision was made that skipping sectors is preferable to retrying
sectors, as some HDDs may contain sectors that cause the control interface
to fail when read.

The theory behind this feature is that perhaps the subtle differences in head position caused by seeking to a sector from different offsets may improve the chances of getting a different read of the data, and so recover it faster than would otherwise be possible.
It was a feature that some of my friends and I were interested in having, so hopefully it finds some use elsewhere too.

I've been running this code on my system and it seems to work well, but it
hasn't been exhaustively tested so I make no guarantees.
You are welcome to use / modify / distribute / relicense the code/patch
without limits.

Thanks,
-Stephen

Unified diff follows:

Common subdirectories: ddrescue-1.26-original/doc and ddrescue-1.26-patch/doc
diff -u ddrescue-1.26-original/main.cc ddrescue-1.26-patch/main.cc
--- ddrescue-1.26-original/main.cc      2022-01-21 04:48:37.000000000 -0800
+++ ddrescue-1.26-patch/main.cc 2022-06-17 10:35:15.887395900 -0700
@@ -125,6 +125,7 @@
                 "  -t, --truncate                 truncate output file to zero size\n"
                 "  -T, --timeout=<interval>       maximum time since last successful read\n"
                 "  -u, --unidirectional           run all passes in the same direction\n"
+               "  -U, --random-retry-order       randomize the order sectors are tried in retry\n"
                 "  -v, --verbose                  be verbose (a 2nd -v gives more)\n"
                 "  -w, --ignore-write-errors      make fill mode ignore write errors\n"
                 "  -x, --extend-outfile=<bytes>   extend outfile size to be at least this long\n"
@@ -842,6 +843,7 @@
      { 't', "truncate",             Arg_parser::no  },
      { 'T', "timeout",              Arg_parser::yes },
      { 'u', "unidirectional",       Arg_parser::no  },
+    { 'U', "random-retry-order",   Arg_parser::no  },
      { 'v', "verbose",              Arg_parser::no  },
      { 'V', "version",              Arg_parser::no  },
      { 'w', "ignore-write-errors",  Arg_parser::no  },
@@ -920,6 +922,7 @@
        case 't': o_trunc = O_TRUNC; break;
        case 'T': rb_opts.timeout = parse_time_interval( arg, pn ); break;
        case 'u': rb_opts.unidirectional = true; break;
+      case 'U': rb_opts.randomize_retry_order = true; srand( time( NULL ) ); break;
        case 'v': if( verbosity < 4 ) ++verbosity; break;
        case 'V': show_version(); return 0;
        case 'w': fb_opts.ignore_write_errors = true; break;
diff -u ddrescue-1.26-original/rescuebook.cc ddrescue-1.26-patch/rescuebook.cc
--- ddrescue-1.26-original/rescuebook.cc        2022-01-21 04:48:37.000000000 -0800
+++ ddrescue-1.26-patch/rescuebook.cc   2022-06-17 10:35:15.893897100 -0700
@@ -579,8 +579,9 @@
      first_post = true;
      snprintf( msgbuf + msglen, ( sizeof msgbuf ) - msglen, "%d %s",
                pass - first_pass + 1, forward ? "(forwards)" : "(backwards)" );
-    int retval = forward ? fcopy_errors( msgbuf, pass, resume ) :
-                           rcopy_errors( msgbuf, pass, resume );
+    int retval = randomize_retry_order ? randcopy_errors( msgbuf, pass, resume, forward ) :
+                                         forward ? fcopy_errors( msgbuf, pass, resume ) :
+                                                   rcopy_errors( msgbuf, pass, resume );
      if( retval != -3 ) return retval;
      resume = false;
      if( !unidirectional ) forward = !forward;
@@ -663,6 +664,89 @@
    return -3;
    }

+// Return values: 1 I/O error, 0 OK, -1 interrupted, -2 mapfile error.
+// Read the damaged areas one sector at a time in a random pattern
+//
+int Rescuebook::randcopy_errors( const char * const msg, const int pass,
+                              const bool resume, const bool forward )
+  {
+  long long pos = forward ? 0 : LLONG_MAX - hardbs();
+  bool block_found = false;
+  int have_sectors = 0;
+
+  if( resume && domain().includes( current_pos() ) )
+    {
+    Block b( current_pos() + ( forward ? 0 : -1 ), 1 );
+    if( forward ) find_chunk( b, Sblock::bad_sector, domain(), hardbs() );
+    if( !forward ) rfind_chunk( b, Sblock::bad_sector, domain(), hardbs() );
+    if( b.size() > 0 ) pos = forward ? b.pos() : b.end();              // resume at this entire block
+    }
+
+  if( !randomize ) randomize = new Randomize_details();
+
+  while( pos >= 0 )
+    {
+    // Generate a list of sector numbers to test, up to the maximum size of the random sector list.
+    have_sectors = 0;
+    while( pos >= 0 && have_sectors < randomize->max_sectors )
+    {
+      Block b( pos, hardbs() );
+      if( forward ) find_chunk( b, Sblock::bad_sector, domain(), hardbs() );
+      if( !forward ) rfind_chunk( b, Sblock::bad_sector, domain(), hardbs() );
+      if( b.size() <= 0 ) { pos = -1; break; }                 // no more blocks
+      block_found = true;
+      // Translate all the sectors in this block into the sector location list
+      if( forward )
+        {
+        if( pos < b.pos() ) pos = b.pos();
+        while( pos < b.end() && have_sectors < randomize->max_sectors )
+          {
+          randomize->sector_list[ have_sectors++ ] = pos;
+          pos += hardbs();
+          }
+        }
+      else
+        {
+        if( pos >= b.end() ) pos = b.end() - hardbs();
+        while( pos >= b.pos() && have_sectors < randomize->max_sectors )
+          {
+          randomize->sector_list[ have_sectors++ ] = pos;
+          pos -= hardbs();
+          }
+        }
+    }
+
+    // Randomly reorder the sector list
+    // For portability just use the built-in libc rand.
+    for( int sector_index = 0; sector_index < (have_sectors-1); sector_index++ )
+      {
+      // Swap places between [sector_index] and a random index >= sector_index
+      int swap_with = rand() % (have_sectors - sector_index) + sector_index;
+      if( sector_index != swap_with )
+        {
+        long long temp = randomize->sector_list[sector_index];
+        randomize->sector_list[sector_index] = randomize->sector_list[swap_with];
+        randomize->sector_list[swap_with] = temp;
+        }
+      }
+
+    // Iterate over the randomized sector list and try to copy sector contents
+    for( int sector_index = 0; sector_index < have_sectors; sector_index++ )
+      {
+      int copied_size = 0, error_size = 0;
+      Block b( randomize->sector_list[sector_index], hardbs() );
+      const int retval = copy_and_update( b, copied_size, error_size, msg,
+                                          retrying, pass, true );
+      if( retval ) return retval;
+      update_rates();
+      if( error_size > 0 && pause_on_error > 0 ) do_pause_on_error();
+      if( !update_mapfile( odes_ ) ) return -2;
+      }
+    }
+  if( !block_found ) return 0;
+  return -3;
+  }
+

  // Return true if slow read.
  //
diff -u ddrescue-1.26-original/rescuebook.h ddrescue-1.26-patch/rescuebook.h
--- ddrescue-1.26-original/rescuebook.h 2022-01-21 04:48:37.000000000 -0800
+++ ddrescue-1.26-patch/rescuebook.h    2022-06-17 10:35:15.898397900 -0700
@@ -42,6 +42,22 @@
      }
    };

+class Randomize_details
+  {
+public:
+  const int max_sectors = 65536;
+  long long * sector_list;
+
+  Randomize_details()
+    {
+      sector_list = new long long[max_sectors];
+    }
+  ~Randomize_details()
+    {
+      delete[] sector_list;
+    }
+
+  };

  struct Rb_options
    {
@@ -78,6 +94,7 @@
    bool sparse;
    bool try_again;
    bool unidirectional;
+  bool randomize_retry_order;
    bool verify_on_error;

    Rb_options()
@@ -91,7 +108,8 @@
        noscrape( false ), notrim( false ), reopen_on_error( false ),
        reset_slow( false ), retrim( false ), reverse( false ),
        same_file( false ), simulated_poe( false ), sparse( false ),
-      try_again( false ), unidirectional( false ), verify_on_error( false )
+      try_again( false ), unidirectional( false ), randomize_retry_order( false ),
+      verify_on_error( false )
        {}

    bool operator==( const Rb_options & o ) const
@@ -120,6 +138,7 @@
                 simulated_poe == o.simulated_poe &&
                 sparse == o.sparse && try_again == o.try_again &&
                 unidirectional == o.unidirectional &&
+               randomize_retry_order == o.randomize_retry_order &&
                 verify_on_error == o.verify_on_error ); }
    bool operator!=( const Rb_options & o ) const
      { return !( *this == o ); }
@@ -155,6 +174,7 @@
    Sliding_average sliding_avg;         // variables for show_status
    bool first_post;                     // first read in current pass
    bool first_read;                     // first read overall
+  Randomize_details * randomize;

    void change_chunk_status( const Block & b, const Sblock::Status st );
    void do_pause_on_error();
@@ -182,6 +202,8 @@
    int copy_errors();
   int fcopy_errors( const char * const msg, const int pass, const bool resume );
   int rcopy_errors( const char * const msg, const int pass, const bool resume );
+  int randcopy_errors( const char * const msg, const int pass, const bool resume, const bool forward );
+
    bool update_rates( const bool force = false );
    void show_status( const long long ipos, const char * const msg = 0,
                      const bool force = false );
Common subdirectories: ddrescue-1.26-original/testsuite and ddrescue-1.26-patch/testsuite
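
A side note on the shuffle in randcopy_errors: the swap loop is a standard Fisher-Yates shuffle, and the patch's comment says rand() was chosen for portability. If C++11 is acceptable, <random> together with std::shuffle avoids both the modulo bias of rand() % n and the shared srand() state. Below is a minimal standalone sketch of the shuffle-then-read idea; the sector addresses are made up and the printout stands in for the actual read, so nothing here uses ddrescue's internals.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main()
  {
  // Hypothetical batch of bad-sector addresses gathered during a retry pass.
  std::vector<long long> sectors = { 2048, 4096, 4097, 131072, 131073, 262144 };

  // One seeded engine per run; std::shuffle yields an unbiased permutation,
  // unlike rand() % n, which is biased and depends on the global srand() seed.
  std::mt19937_64 gen( std::random_device{}() );
  std::shuffle( sectors.begin(), sectors.end(), gen );

  for( const long long pos : sectors )
    std::cout << pos << '\n';        // a real pass would issue the read here
  }

The engine could equally be seeded from a fixed value if reproducible sweeps were ever wanted.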

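Two smaller remarks on the patch itself. The new Rescuebook::randomize pointer is not initialised anywhere in the visible diff, so the if( !randomize ) test in randcopy_errors may read an indeterminate value unless the pointer is set to null elsewhere. And Randomize_details manages its 64 Ki-entry array with new[]/delete[] but declares no copy control, so an accidental copy (which the patch admittedly never makes) would double-delete; letting std::vector own the storage avoids that. A minimal sketch of the latter, keeping the 65536-sector limit from the patch but otherwise illustrative:

#include <vector>

// Sketch: the vector owns the buffer, so destruction, copying and moving are
// handled automatically (rule of zero); the capacity is reserved up front.
class Randomize_details
  {
public:
  static const int max_sectors = 65536;          // same limit as the patch
  std::vector<long long> sector_list;

  Randomize_details() { sector_list.reserve( max_sectors ); }
  };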



