monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] [PATCH] and RFC: binary files merging and hook


From: rghetta
Subject: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
Date: Wed, 25 May 2005 00:33:04 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050322

Monotone doesn't distinguish beetween binary and text files (it hasn't even the *notion* of binary and text). Therefore its merging algorithm is applied even to files containing binary data, like images or tarballs, leading to silent corruption if the files contain strategically placed newlines (see test t_merge_binary.at for an example).

The attached patch inhibits automatic merging on binary files and adds a new hook "binary_file(filepath)" to help monotone categorize them.
The hook receives the filespec and returns true,false or nil as follows:

        true means the file must be treated as binary
        false means the file must be treated as text
        nil means "guess from content".

If the hook returns nil, the file will be treated as binary if the monotone function guess_binary() returns true, i.e. if the files contains NUL bytes or a selection of other ASCII control chars (for example, STX and ETX).

Right now "binary" means only "don't apply monotone automatic merging". This means that a merge of two binary files will always call the merge2() or merge3() hooks, invoking the external merge tool.
You could, for example, invoke gimp when merging images, and so on.

This patch intentionally doesn't try to update .mt-attrs or modify the attr_functions/attr_init_functions lua tables. It's purpose is fixing a bug and hopefully spawn some discussion about binary file support.

The current hook definition is:

function binary_file(name)
   lowname=string.lower(name)
   -- some known binaries, return true
   if (string.find(lowname, "%.gif$")) then return true end
   if (string.find(lowname, "%.jpe?g$")) then return true end
   if (string.find(lowname, "%.png$")) then return true end
   if (string.find(lowname, "%.bz2$")) then return true end
   if (string.find(lowname, "%.gz$")) then return true end
   if (string.find(lowname, "%.zip$")) then return true end
   -- some known text, return false
   if (string.find(lowname, "%.cc?$")) then return false end
   if (string.find(lowname, "%.cxx$")) then return false end
   if (string.find(lowname, "%.hh?$")) then return false end
   if (string.find(lowname, "%.hxx$")) then return false end
   if (string.find(lowname, "%.lua$")) then return false end
   if (string.find(lowname, "%.texi$")) then return false end
   if (string.find(lowname, "%.sql$")) then return false end
   -- unknown - return nil
   return nil;
end

Comments needed,
        Riccardo


# 
# patch "diff_patch.cc"
#  from [209e60efcca30474071383769840ae18d3d1aeef]
#    to [b0cc050e76c94436ad35ebc9a9b3474b9adc58c9]
# 
# patch "diff_patch.hh"
#  from [8045a7d688a9494585eab247815ef9573687039f]
#    to [608d32049b8a153d6c1d1f8df6bc1648e3d96597]
# 
# patch "file_io.cc"
#  from [35b0d27bb9a0a1c72b651fd2081a0f9a6d6c58f1]
#    to [87e9aee7193f2da70e9fac926bb9b83f652f9892]
# 
# patch "file_io.hh"
#  from [d2865f7695f667b872b858b276edcec5d7cf1605]
#    to [ea8749455633457ef1843c2f63ef16980289efbf]
# 
# patch "lua.cc"
#  from [76cfc26a5a3d0c64a9f244d78ad51c8975aa8d52]
#    to [8c1adf0d74734aaa182ea0436abf630530b293ab]
# 
# patch "lua.hh"
#  from [a1b3877535049b77b678d864354d030f8433854c]
#    to [b13faf78c7f0704ec711d2f0945e86b49d5d1c10]
# 
# patch "monotone.texi"
#  from [ffbdfa684b8b1e995bfbfb51c1851408024e3647]
#    to [1a0beb99ad5db9285034e5ffd8f9d729da32c3c8]
# 
# patch "std_hooks.lua"
#  from [05e5e9b2b7f37107a7f38161d258e582eb13d697]
#    to [52f8fc63ce87bb92dc1050005c6557ab243b98b9]
# 
# patch "tests/t_merge_binary.at"
#  from [4cdc96fa85206b0b6d7488886045f1d541a55131]
#    to [91f0212ff5acbaba6a6d9529b134e9dc63ed9bf7]
# 
--- diff_patch.cc
+++ diff_patch.cc
@@ -21,18 +21,6 @@
 
 using namespace std;
 
-bool guess_binary(string const & s)
-{
-  // these do not occur in ASCII text files
-  // FIXME: this heuristic is (a) crap and (b) hardcoded. fix both these.
-  if (s.find_first_of('\x00') != string::npos ||
-      s.find_first_of("\x01\x02\x03\x04\x05\x06\x0e\x0f"
-                      "\x10\x11\x12\x13\x14\x15\x16\x17\x18"
-                      "\x19\x1a\x1c\x1d\x1e\x1f") != string::npos)
-    return true;
-  return false;
-}
-
 //
 // a 3-way merge works like this:
 //
@@ -537,45 +525,51 @@
 
   file_data left_data, right_data, ancestor_data;
   data left_unpacked, ancestor_unpacked, right_unpacked, merged_unpacked;
-  string left_encoding, anc_encoding, right_encoding;
-  vector<string> left_lines, ancestor_lines, right_lines, merged_lines;
 
   this->get_version(left_path, left_id, left_data);
   this->get_version(anc_path, ancestor_id, ancestor_data);
   this->get_version(right_path, right_id, right_data);
 
-  left_encoding = this->get_file_encoding(left_path, left_man);
-  anc_encoding = this->get_file_encoding(anc_path, anc_man);
-  right_encoding = this->get_file_encoding(right_path, right_man);
-    
   left_unpacked = left_data.inner();
   ancestor_unpacked = ancestor_data.inner();
   right_unpacked = right_data.inner();
 
-  split_into_lines(left_unpacked(), left_encoding, left_lines);
-  split_into_lines(ancestor_unpacked(), anc_encoding, ancestor_lines);
-  split_into_lines(right_unpacked(), right_encoding, right_lines);
-    
-  if (merge3(ancestor_lines, 
-             left_lines, 
-             right_lines, 
-             merged_lines))
+  if (file_mergeable(left_path, left_unpacked, app.lua) && 
+      file_mergeable(right_path, right_unpacked, app.lua) && 
+      file_mergeable(anc_path, ancestor_unpacked, app.lua))
     {
-      hexenc<id> tmp_id;
-      file_data merge_data;
-      string tmp;
-      
-      L(F("internal 3-way merged ok\n"));
-      join_lines(merged_lines, tmp);
-      calculate_ident(data(tmp), tmp_id);
-      file_id merged_fid(tmp_id);
-      merge_data = file_data(tmp);
-
-      merged_id = merged_fid;
-      record_merge(left_id, right_id, merged_fid, 
-                   left_data, merge_data);
-
-      return true;
+      // all files mergeable by monotone internal algorithm, try to merge
+      string left_encoding, anc_encoding, right_encoding;
+      left_encoding = this->get_file_encoding(left_path, left_man);
+      anc_encoding = this->get_file_encoding(anc_path, anc_man);
+      right_encoding = this->get_file_encoding(right_path, right_man);
+        
+      vector<string> left_lines, ancestor_lines, right_lines, merged_lines;
+      split_into_lines(left_unpacked(), left_encoding, left_lines);
+      split_into_lines(ancestor_unpacked(), anc_encoding, ancestor_lines);
+      split_into_lines(right_unpacked(), right_encoding, right_lines);
+        
+      if (merge3(ancestor_lines, 
+                 left_lines, 
+                 right_lines, 
+                 merged_lines))
+        {
+          hexenc<id> tmp_id;
+          file_data merge_data;
+          string tmp;
+          
+          L(F("internal 3-way merged ok\n"));
+          join_lines(merged_lines, tmp);
+          calculate_ident(data(tmp), tmp_id);
+          file_id merged_fid(tmp_id);
+          merge_data = file_data(tmp);
+    
+          merged_id = merged_fid;
+          record_merge(left_id, right_id, merged_fid, 
+                       left_data, merge_data);
+    
+          return true;
+        }
     }
 
   P(F("help required for 3-way merge\n"));
@@ -845,7 +839,7 @@
       if (b_len == 0)
         ost << " +0,0";
       else
-       {
+        {
           ost << " +" << b_begin+1;
           if (b_len > 1)
             ost << "," << b_len;
--- diff_patch.hh
+++ diff_patch.hh
@@ -19,7 +19,6 @@
 // this file is to contain some stripped down, in-process implementations
 // of GNU-diffutils-like things (diff, diff3, maybe patch..)
 
-bool guess_binary(std::string const & s);
 
 enum diff_type
 {
--- file_io.cc
+++ file_io.cc
@@ -254,6 +254,32 @@
   return fs::exists(localized(p)); 
 }
 
+bool guess_binary(string const & s)
+{
+  // these do not occur in ASCII text files
+  // FIXME: this heuristic is (a) crap and (b) hardcoded. fix both these.
+  if (s.find_first_of('\x00') != string::npos ||
+      s.find_first_of("\x01\x02\x03\x04\x05\x06\x0e\x0f"
+                      "\x10\x11\x12\x13\x14\x15\x16\x17\x18"
+                      "\x19\x1a\x1c\x1d\x1e\x1f") != string::npos)
+    return true;
+  return false;
+}
+
+// returns true if the file can be merged by monotone built in merger
+bool file_mergeable(file_path const & path, data const &dt, lua_hooks & lua)
+{ 
+  bool is_binary;
+  if (lua.hook_binary_file(path, is_binary))
+    return !is_binary; // the hook returned a valid response
+  else
+    {
+      // the hook returned "unknown".  If the guess_binary() returns true,
+      // the file is assumed to be binary, thus not mergeable
+      return !guess_binary(dt()); 
+    }
+}
+
 void 
 delete_file(local_path const & p) 
 { 
--- file_io.hh
+++ file_io.hh
@@ -59,6 +59,12 @@
 bool file_exists(local_path const & path);
 bool file_exists(file_path const & path);
 
+// returns true if the string content is binary according to monotone euristic
+bool guess_binary(std::string const & s);
+
+// returns true if the file can be merged by monotone built in merger
+bool file_mergeable(file_path const & path, data const &dt, lua_hooks & lua);
+
 void mkdir_p(local_path const & path);
 void mkdir_p(file_path const & path);
 void make_dir_for(file_path const & p);
--- lua.cc
+++ lua.cc
@@ -745,7 +745,22 @@
   return exec_ok && ignore_it;
 }
 
+// returns false if the hook returned nil or something not boolean
+// otherwise returns true and is_binary contains the return value from the hook
 bool 
+lua_hooks::hook_binary_file(file_path const & p, bool &is_binary)
+{
+  is_binary = false;
+  bool exec_ok = Lua(st)
+    .func("binary_file")
+    .push_str(p())
+    .call(1,1)
+    .extract_bool(is_binary)
+    .ok();
+  return exec_ok;
+}
+
+bool 
 lua_hooks::hook_non_blocking_rng_ok()
 {
   bool ok = false;
--- lua.hh
+++ lua.hh
@@ -71,6 +71,13 @@
   // local repo hooks
   bool hook_ignore_file(file_path const & p);
   bool hook_ignore_branch(std::string const & branch);
+
+  // if the file type is known, the return value is true and the is_binary 
+  // parameter will contain true if the file is binary, false if is text.
+  // if the hook value is false, then the file type is not known and must
+  // be resolved otherwise
+  bool hook_binary_file(file_path const & p, bool &is_binary);
+
   bool hook_merge2(file_path const & left_path,
                    file_path const & right_path,
                    file_path const & merged_path,
--- monotone.texi
+++ monotone.texi
@@ -5404,6 +5404,38 @@
 branches. Otherwise returns @code{false}. This hook has no default
 definition, therefore the default behavior is to list all branches.
 
address@hidden binary_file (@var{filename})
+
+Returns @code{true} if @var{filename} is binary, @code{false} is the
address@hidden is certainly text and @code{nil} if the file type is
+unknown and should be @emph{guessed} by monotone. 
+The default definition of this hook is:
+
address@hidden
address@hidden
+function binary_file(name)
+   lowname=string.lower(name)
+   -- some known binaries, return true
+   if (string.find(lowname, "%.gif$")) then return true end
+   if (string.find(lowname, "%.jpe?g$")) then return true end
+   if (string.find(lowname, "%.gif$")) then return true end
+   if (string.find(lowname, "%.bz2$")) then return true end
+   if (string.find(lowname, "%.gz$")) then return true end
+   if (string.find(lowname, "%.zip$")) then return true end
+   -- some known text, return false
+   if (string.find(lowname, "%.cc?$")) then return false end
+   if (string.find(lowname, "%.cxx$")) then return false end
+   if (string.find(lowname, "%.hh?$")) then return false end
+   if (string.find(lowname, "%.hxx$")) then return false end
+   if (string.find(lowname, "%.lua$")) then return false end
+   if (string.find(lowname, "%.texi$")) then return false end
+   if (string.find(lowname, "%.sql$")) then return false end
+   -- unknown - return nil
+   return nil;
+end
address@hidden group
address@hidden smallexample
+
 @item get_revision_cert_trust (@var{signers}, @var{id}, @var{name}, @var{val})
 
 Returns whether or not you @emph{trust} the assertion
--- std_hooks.lua
+++ std_hooks.lua
@@ -91,6 +91,28 @@
    return false;
 end
 
+-- return true means "binary", false means "text",
+-- nil means "unknown, try to guess"
+function binary_file(name)
+   lowname=string.lower(name)
+   -- some known binaries, return true
+   if (string.find(lowname, "%.gif$")) then return true end
+   if (string.find(lowname, "%.jpe?g$")) then return true end
+   if (string.find(lowname, "%.png$")) then return true end
+   if (string.find(lowname, "%.bz2$")) then return true end
+   if (string.find(lowname, "%.gz$")) then return true end
+   if (string.find(lowname, "%.zip$")) then return true end
+   -- some known text, return false
+   if (string.find(lowname, "%.cc?$")) then return false end
+   if (string.find(lowname, "%.cxx$")) then return false end
+   if (string.find(lowname, "%.hh?$")) then return false end
+   if (string.find(lowname, "%.hxx$")) then return false end
+   if (string.find(lowname, "%.lua$")) then return false end
+   if (string.find(lowname, "%.texi$")) then return false end
+   if (string.find(lowname, "%.sql$")) then return false end
+   -- unknown - return nil
+   return nil
+end
 
 function edit_comment(basetext, user_log_message)
    local exe = "vi"
--- tests/t_merge_binary.at
+++ tests/t_merge_binary.at
@@ -1,11 +1,11 @@
 AT_SETUP([merge binary file])
 MONOTONE_SETUP
 
 NEED_UNB64
 
-# This is a real merge error.  A binary file is happily merged by monotone
+# This was a real merge error.  A binary file happily merged by monotone
 # just because contains some strategically placed line feeds
-AT_XFAIL_IF(true)
+# now is a test for the new hook binary_file() and its effect on merging
 
 AT_DATA(parent.bmp.b64, 
[Qk1mdQAAAAAAADYAAAAoAAAAZAAAAGQAAAABABgAAAAAADB1AADrCgAA6woAAAAAAAAAAAAAQJ9C
 QJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9CQJ9C
@@ -221,6 +221,18 @@
 UNB64(left.bmp.b64, left.bmp)
 UNB64(right.bmp.b64, right.bmp)
 
+# hook forces all files binary
+AT_DATA(binary.lua, [function binary_file(name) return true end
+])
+
+# hook forces all files text
+AT_DATA(text.lua, [function binary_file(name) return false end
+])
+
+# hook make all files guess
+AT_DATA(guess.lua, [function binary_file(name) return nil end
+])
+
 AT_CHECK(cp -f parent.bmp testfile.bmp)
 AT_CHECK(MONOTONE add testfile.bmp, [], [ignore], [ignore])
 COMMIT(testbranch)
@@ -234,7 +246,13 @@
 AT_CHECK(cp -f right.bmp testfile.bmp)
 COMMIT(testbranch)
 
-# merge should fail!
-AT_CHECK(MONOTONE --branch=testbranch merge, [1], [ignore], [ignore])
+# file marked binary: merge should fail
+AT_CHECK(MONOTONE --rcfile=binary.lua --branch=testbranch merge, [1], 
[ignore], [ignore])
 
+# file marked "guess": merge should fail
+AT_CHECK(MONOTONE --rcfile=guess.lua --branch=testbranch merge, [1], [ignore], 
[ignore])
+
+# finally, file marked text: merge should work!
+AT_CHECK(MONOTONE --rcfile=text.lua --branch=testbranch merge, [0], [ignore], 
[ignore])
+
 AT_CLEANUP

reply via email to

[Prev in Thread] Current Thread [Next in Thread]