|
From: | GNU bug Tracking System |
Subject: | bug#8061: closed (Introduce SEEK_DATA/SEEK_HOLE to extent_scan module) |
Date: | Fri, 26 Jun 2020 02:16:02 +0000 |
Your message dated Thu, 25 Jun 2020 19:15:21 -0700 with message-id <3bc2236c-4e41-7f55-5a63-f4221d38041b@cs.ucla.edu> and subject line Re: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module has caused the debbugs.gnu.org bug report #8061, regarding Introduce SEEK_DATA/SEEK_HOLE to extent_scan module to be marked as done. (If you believe you have received this mail in error, please contact help-debbugs@gnu.org.) -- 8061: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8061 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems
--- Begin Message ---Subject: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Date: Thu, 17 Feb 2011 21:57:14 +0800
Hello All,This is the first try to introduce the SEEK_DATA/SEEK_HOLE support to extent_scan module for efficient sparse file copy on ZFS, I have delayed it for a long time, sorry for that!Below is the code change lists:src/extent_scan.h: add a new structure item 'src_total_size' to "struct extent_info", since I have to make use of this value to determinea file is sparse of not for the initial scan. If the returns of lseek(fd, 0, SEEK_HOLE) is equal to the src_total_size or large than it, it means the source fileis definitely not a sparse file or maybe it is a sparse file but it does not make sense for proceeding scan read.another change in this file is the signature of extent_scan_init(), just as I mentioned above, it need to accept the src_total_size variable.src/extent_scan.c: implement the new exent_scan_read() through SEEK_DATA/SEEK_HOLE, it will be called if those two values are defined at <unistd.h>.src/copy.c: pass src_total_size to extent_scan_init().On my test environment, Solaris10, SunOS 5.10 Generic_142910-17, I have tried a few simple cases, they are works to me.For now, I am using diff(1) to verify the copy result, does anyone know some utilities can be used to write the test script?I have sent an email to ZFS DEV mail-list to ask this question yesterday, a nice guy suggest me to use ZDB(http://cuddletech.com/blog/?p=407) for that, I'mstill study this utility now, I also noticed there is patch to add SEEK_HOLE/SEEK_DATA support to os module in Python community, please refer to:but it require very latest python build I think, so could anyone give some other advices in this point?The patch is shown as following, any help testing and comments are appreciated!!Thanks,-JeffFrom: Jie Liu <jeff.liu@oracle.com>Date: Thu, 17 Feb 2011 21:14:23 +0800Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to extent_scan module* src/extent_scan.h: add src_total_size to struct extent_info, we needto check the SEEK_HOLE result against it for initial extent scan.modify the extent_scan_init() signature, to add size_t src_total_size.* src/extent_scan.c: implement a new extent_scan_read() through SEEK_DATAand SEEK_HOLE.* src/copy.c: pass src_total_size to extent_scan_init().Signed-off-by: Jie Liu <jeff.liu@oracle.com>---src/copy.c | 2 +-src/extent-scan.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++-src/extent-scan.h | 9 +++-3 files changed, 120 insertions(+), 4 deletions(-)diff --git a/src/copy.c b/src/copy.cindex 104652d..22b9911 100644--- a/src/copy.c+++ b/src/copy.c@@ -306,7 +306,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,We may need this at the end, for a final ftruncate. */off_t dest_pos = 0;- extent_scan_init (src_fd, &scan);+ extent_scan_init (src_fd, src_total_size, &scan);*require_normal_copy = false;bool wrote_hole_at_eof = true;diff --git a/src/extent-scan.c b/src/extent-scan.cindex 1ba59db..ffeab7a 100644--- a/src/extent-scan.c+++ b/src/extent-scan.c@@ -32,13 +32,17 @@/* Allocate space for struct extent_scan, initialize the entries ifnecessary and return it as the input argument of extent_scan_read(). */extern void-extent_scan_init (int src_fd, struct extent_scan *scan)+extent_scan_init (int src_fd, size_t src_total_size,+ struct extent_scan *scan){scan->fd = src_fd;scan->ei_count = 0;scan->scan_start = 0;scan->initial_scan_failed = false;scan->hit_final_extent = false;+#if defined(SEEK_HOLE) && defined(SEEK_DATA)+ scan->src_total_size = src_total_size;+#endif}#ifdef __linux__@@ -106,6 +110,113 @@ extent_scan_read (struct extent_scan *scan)return true;}+#elif defined(SEEK_HOLE) && defined(SEEK_DATA)+extern bool+extent_scan_read (struct extent_scan *scan)+{+ off_t data_pos, hole_pos;+ union { struct extent_info ei; char c[4096]; } extent_buf;+ struct extent_info *ext_info = &extent_buf.ei;+ enum { count = (sizeof extent_buf / sizeof *ext_info) };+ verify (count != 0);++ memset (&extent_buf, 0, sizeof extent_buf);++ if (scan->scan_start == 0)+ {+# ifdef _PC_MIN_HOLE_SIZE+ /* To determine if the underlaying file system support+ SEEK_HOLE, if not, fall back to the standard copy. */+ if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0)+ {+ scan->initial_scan_failed = true;+ return false;+ }+# endif++ /* If we have been compiled on an OS that supports SEEK_HOLE+ but run on an OS that does not support SEEK_HOLE, we get+ EINVAL. If the underlying filesystem does not support the+ SEEK_HOLE call, we get ENOTSUP, fall back to standard copy+ in either case. */+ hole_pos = lseek (scan->fd, (off_t) 0, SEEK_HOLE);+ if (hole_pos < 0)+ {+ if (errno == EINVAL || errno == ENOTSUP)+ scan->initial_scan_failed = true;+ return false;+ }++ /* Seek back to position 0 first if we detected a real hole. */+ if (hole_pos > 0)+ {+ off_t tmp_pos;+ tmp_pos = lseek (scan->fd, (off_t) 0, SEEK_SET);+ if (tmp_pos != (off_t) 0)+ return false;++ /* The source file is definitely not a sparse file, or it+ maybe a sparse file but SEEK_HOLE returns the source file's+ total size, fall back to the standard copy too. */+ if (hole_pos >= scan->src_total_size)+ {+ scan->initial_scan_failed = true;+ return false;+ }+ }+ }++ unsigned int i = 0;+ /* If lseek(2) failed and the errno is set to ENXIO, for+ SEEK_DATA there are no more data regions past the supplied+ offset. For SEEK_HOLE, there are no more holes past the+ supplied offset. Set scan->hit_final_extent to true for+ either case. */+ do {+ data_pos = lseek (scan->fd, scan->scan_start, SEEK_DATA);+ if (data_pos < 0)+ {+ if (errno != ENXIO)+ return false;+ else+ {+ scan->hit_final_extent = true;+ return true;+ }+ }++ hole_pos = lseek (scan->fd, data_pos, SEEK_HOLE);+ if (hole_pos < 0)+ {+ if (errno != ENXIO)+ return false;+ else+ {+ scan->hit_final_extent = true;+ return true;+ }+ }++ ext_info[i].ext_logical = data_pos;+ ext_info[i].ext_length = hole_pos - data_pos;+ scan->scan_start = hole_pos;+ ++i;+ } while (scan->scan_start < scan->src_total_size && i < count);++ i--;+ scan->ei_count = i;+ scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info));++ for (i = 0; i < scan->ei_count; i++)+ {+ assert (ext_info[i].ext_logical <= OFF_T_MAX);++ scan->ext_info[i].ext_logical = ext_info[i].ext_logical;+ scan->ext_info[i].ext_length = ext_info[i].ext_length;+ }++ return true;+}#elseextern boolextent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED)diff --git a/src/extent-scan.h b/src/extent-scan.hindex 4724b25..a271b95 100644--- a/src/extent-scan.h+++ b/src/extent-scan.h@@ -18,7 +18,6 @@#ifndef EXTENT_SCAN_H# define EXTENT_SCAN_H-/* Structure used to store information of each extent. */struct extent_info{@@ -38,6 +37,11 @@ struct extent_scan/* File descriptor of extent scan run against. */int fd;+#if defined(SEEK_DATA) && defined(SEEK_HOLE)+ /* Source file size, i.e, (struct stat) &statbuf.st_size. */+ size_t src_total_size;+#endif+/* Next scan start offset. */off_t scan_start;@@ -55,7 +59,8 @@ struct extent_scanstruct extent_info *ext_info;};-void extent_scan_init (int src_fd, struct extent_scan *scan);+void extent_scan_init (int src_fd, size_t src_total_size,+ struct extent_scan *scan);bool extent_scan_read (struct extent_scan *scan);--1.7.4
--- End Message ---
--- Begin Message ---Subject: Re: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Date: Thu, 25 Jun 2020 19:15:21 -0700 User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 This email is follow up to <https://bugs.gnu.org/8601> dated 2011-05-01. Jeff, thanks for reporting the problem. (There's a good chance this email will bounce but I'll send it to your 2011 email address anyway.) I recently ran into the same issue and derived the attached patches independently. I then found your bug report, made sure the attached patches fixed every problem that your proposal did, and installed the attached patches into Savannah. The attached patches 1-3 merely fix typos and refactor. Patch 4 corresponds to your proposal; however, it differs in that its basic idea is to use the FIEMAP code only as a fallback if SEEK_DATA doesn't work, rather than try to add to the already-too-complicated code that fiddles with FIEMAPs. (I don't observe any significant performance advantage to the FIEMAP stuff, but maybe that's just me.) Patch 5 adds opportunistic use of the copy_file_range syscall introduced in Linux kernel 4.5 (2016) and reworked in 5.3 (2019). This should improve 'cp' performance on kernels and file systems that support copy_file_range.0001-maint-typo-fix.patch
Description: Text Data0002-cp-refactor-extent_copy.patch
Description: Text Data0003-cp-avoid-copy_reg-goto.patch
Description: Text Data0004-cp-use-SEEK_DATA-SEEK_HOLE-if-available.patch
Description: Text Data0005-cp-use-copy_file_range-if-available.patch
Description: Text Data
--- End Message ---
[Prev in Thread] | Current Thread | [Next in Thread] |