bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1) and mv(1)


From: jeff.liu
Subject: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1) and mv(1) v2
Date: Thu, 22 Apr 2010 22:44:14 +0800
User-agent: Thunderbird 2.0.0.14 (X11/20080505)

jeff.liu wrote:
> jeff.liu wrote:
>> Hi Jim,
>>
>> Thanks for your comments for my last patch submition!
>>
>> Now the fiemap copy logical changed to:
>> 1. If fiemap copy succeeds, go ahead with the meta data preserve procedure 
>> if they are user specified.
>> 2. If fiemap_copy fails and the 'normal_copy_required' is true, which means 
>> the ioctl(2) fails.
>> In this case, fall back to normal copy.  'bool normal_copy_required' in 
>> fiemap_copy_ok() is a new
>> argument added in this version used to determine whether normal copy require 
>> or not.
>> 3. If fiemap_copy fails for other reason, go to 'close_src_and_dst_desc' 
>> directly.
>>
>> I also add a test script to 'tests/cp/sparse-fiemap-copy'(I can not find out 
>> a better name at the
>> moment), it create a 41M sparse file on btrfs/ext4/ocfs2 separately, and 
>> then try copy them in those
>> 2 different ways.
>> It supply two result verify steps, the first is to compare the sum of 'sys 
>> and user' time by run
>> each copy via '/usr/bin/time', the fiemap copy should run faster than normal 
>> copy.
>> 2nd is compare the file size in bytes, the fiemap copied file size should be 
>> equal to the orignally.
>>
>> Would you please review the following patches at your convenient time?
>>
>> From 561b7b3413de16595fc71f0cba5bfa918b42f9e5 Mon Sep 17 00:00:00 2001
>> From: Jie Liu <address@hidden>
>> Date: Sun, 18 Apr 2010 14:31:20 +0800
>> Subject: [PATCH 1/2] Introduct fiemap ioctl(2) for efficient sparse file 
>> copy.
>>
>> Signed-off-by: Jie Liu <address@hidden>
>> ---
>>  src/copy.c   |  153 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 255 insertions(+), 0 deletions(-)
>>  create mode 100644 src/fiemap.h
>>
>> diff --git a/src/copy.c b/src/copy.c
>> index 3c32fa3..e7adecc 100644
>> --- a/src/copy.c
>> +++ b/src/copy.c
>> @@ -65,6 +65,10 @@
>>  # include <sys/ioctl.h>
>>  #endif
>>
>> +#ifndef HAVE_FIEMAP
>> +# include "fiemap.h"
>> +#endif
>> +
>>  #ifndef HAVE_FCHOWN
>>  # define HAVE_FCHOWN false
>>  # define fchown(fd, uid, gid) (-1)
>> @@ -151,6 +155,133 @@ clone_file (int dest_fd, int src_fd)
>>  #endif
>>  }
>>
>> +#ifdef __linux__
>> +# ifndef FS_IOC_FIEMAP
>> +#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
>> +# endif
>> +/* Perform FIEMAP(available in mainline 2.6.27) copy if possible.
>> +   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map file allocation
>> +   excepts holes.  So the overhead to deal with holes with lseek(2) in
>> +   normal copy could be saved.  This would result in much faster backups
>> +   for any kind of sparse file.  */
>> +static bool
>> +fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
>> +                off_t src_total_size, char const *src_name,
>> +                char const *dst_name, bool *normal_copy_required)
>> +{
>> +  bool fail = false;
>> +  bool last = false;
>> +  char fiemap_buf[4096];
>> +  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
>> +  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>> +  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
>> +                    sizeof (struct fiemap_extent);
>> +  unsigned int i;
>> +  off_t last_ext_logical = 0;
>> +  uint64_t last_ext_len = 0;
>> +  uint64_t last_read_size = 0;
>> +
>> +  do
>> +    {
>> +      fiemap->fm_start = 0ULL;
>> +      fiemap->fm_length = FIEMAP_MAX_OFFSET;
>> +      fiemap->fm_extent_count = count;
>> +
>> +      /* When ioctl(2) fails for any reason, fall back to the normal copy.  
>> */
>> +      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
>> +        {
>> +          *normal_copy_required = true;
>> +          return false;
>> +        }
>> +
>> +      /* If 0 extents are returned, then more ioctls are not needed.  */
>> +      if (fiemap->fm_mapped_extents == 0)
>> +          return true;
> This is a bug I just tried out. 'return true' is not enough if '0' extents 
> returned.
> Below is the test case could reveal this issue, it just seek(2) but not write 
> any byte, its extent
> count is *zero* because there is no data blocks allocated from the extent 
> tree.  In this case,
> I should truncate it to the src_desc.st_size, this issue will be fixed in my 
> next post.
> dd bs=1 seek=10240 of=/ocfs2/sparse </dev/null 2>/dev/null
> 
> Besides, I have tried this case on ocfs2,ext4 and btrfs, the first 2 file 
> systems could handle this
> kind of sparse file properly, but btrfs actually allocaed an extent for it.  
> IMHO, maybe this is a
> btrfs issue, I will post it to the mail-list for further discussion.
FYI, Tao has submitted a patch against the issue which I mentioned above.
For detail, please check the following link:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/5416

I have tried this fix, it works fine.  Besides that, looks this fix also solved 
the performance
issue for fiemap copy vs normal copy on btrfs:
address@hidden:~/opensource_dev/coreutils$ ls -l /btrfs/sparse_file
-rw-r--r-- 1 jeff jeff 41947136 Apr 19 16:54 /btrfs/sparse_file
address@hidden:~/opensource_dev/coreutils$ time /bin/cp --sparse=always 
/btrfs/sparse_file
/btrfs/sparse_normal_copy

real    0m0.208s
user    0m0.016s
sys     0m0.164s
address@hidden:~/opensource_dev/coreutils$ time ./src/cp --sparse=always 
/btrfs/sparse_file
/btrfs/sparse_fiemap_copy

real    0m0.050s
user    0m0.000s
sys     0m0.040s

Below is the updated patch, it fixed 2 issues:
1. If the extent count is 0, we should truncated it to the "src_desc (struct 
stat)buf.st_size" if
possible instead of return.
2. I just think of another possible issue, if ioctl(2) fails, but it is not the 
first time in the
loop.  In this point, we also need to stop the copy rather than fall back to 
the normal one.
Otherwise, the destination file will be corrupted as well, Am I right?


>From 9d049ceef3cfefbf3a85254044458dd9a39eb944 Mon Sep 17 00:00:00 2001
From: Jie Liu <address@hidden>
Date: Thu, 22 Apr 2010 21:48:09 +0800
Subject: [PATCH 1/1] Add fiemap support for efficient sparse file copy.

Signed-off-by: Jie Liu <address@hidden>
---
 src/copy.c |  158 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 158 insertions(+), 0 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 3c32fa3..5f2da52 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -65,6 +65,10 @@
 # include <sys/ioctl.h>
 #endif

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -151,6 +155,138 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform FIEMAP(available in mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map file allocation
+   excepts holes.  So the overhead to deal with holes with lseek(2) in
+   normal copy could be saved.  This would result in much faster backups
+   for any kind of sparse file.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name, bool *normal_copy_required)
+{
+  bool fail = false;
+  bool last = false;
+  char fiemap_buf[4096];
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  off_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+  unsigned int n_ioctl = 0;
+  unsigned int i;
+
+  do
+    {
+      fiemap->fm_start = 0ULL;
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      /* When ioctl(2) fails, if this is the frist time we met, fall back
+         to the normal copy.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
+        {
+          if (n_ioctl < 1)
+            *normal_copy_required = true;
+          return false;
+        }
+
+      ++n_ioctl;
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+        break;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return fail;
+            }
+
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return fail;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (0 < ext_len)
+            {
+              char buf[buf_size];
+
+              /* Avoid reading into the holes if the left extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return fail;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Figure out how many bytes read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return fail;
+                }
+
+              ext_len -= n_read;
+            }
+
+          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
+        }
+    } while (! last);
+
+  /* If a file ends up with holes, the sum of the last extent logical offset
+     and the read-returned size will be shorter than the actual size of the
+     file.  Use ftruncate to extend the length of the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("extending %s"), quote (dst_name));
+          return fail;
+        }
+    }
+
+  return ! fail;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +815,25 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          bool require_normal_copy = false;
+          /* Perform efficient FIEMAP copy for sparse files, fall back to the
+             standard copy only if the ioctl(2) fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
+                              src_open_sb.st_size, src_name,
+                              dst_name, &require_normal_copy))
+            goto preserve_metadata;
+          else
+            {
+              if (! require_normal_copy)
+                {
+                  return_val = false;
+                  goto close_src_and_dst_desc;
+                }
+            }
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +962,8 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
+
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
@@ -897,6 +1054,7 @@ close_src_desc:

   free (buf_alloc);
   free (name_alloc);
+
   return return_val;
 }

-- 
1.5.4.3



Best Regards,
-Jeff






reply via email to

[Prev in Thread] Current Thread [Next in Thread]