[Monotone-devel] rfc: small simplification to paths.cc/constants.cc


From: Zack Weinberg
Subject: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc
Date: Thu, 13 Jul 2006 13:17:39 -0700

Currently the knowledge of which characters are not allowed in a
pathname is split between paths.cc and constants.cc.
paths.cc:has_bad_chars is the sole user of
constants.cc:illegal_path_bytes, but adds backslash and NUL to the
set itself.  I note also that this code is all marked as "must be
super fast", but has_bad_chars builds a lookup table on first use and
tests an initialization flag on every call.
This patch deletes illegal_path_bytes and reduces has_bad_chars to a
simple loop with the forbidden bytes expressed in code, rather than
looked up in a table.  The LIKELY and UNLIKELY annotations coerce
gcc 4.1 into generating code which is, um, not actively stupid (bug
filed).

Thoughts?

zw

       * constants.cc (illegal_path_bytes_arr, illegal_path_bytes): Delete.
       * constants.hh (illegal_path_bytes): Delete.
       * paths.cc (has_bad_chars): Code set of forbidden characters
       explicitly here.

#
# old_revision [17ed988d5665a99c943bfcc810c1ec9accdcd8d5]
#
# patch "constants.cc"
#  from [942d3eebad05095d859d2641150968f01f37c95e]
#    to [b812f3fff900905f174e164024e18c52ff8ffdad]
#
# patch "constants.hh"
#  from [7812034aa4a4a35decd8018d849102c06623bcd4]
#    to [c5fe8274ac31f96c9cc610e6f6ee8cbed2079aa7]
#
# patch "paths.cc"
#  from [4c98560ebccf3c70cfa26b985403a0a3fd66fb90]
#    to [79d3e24da249f12334bbb673686ed71159d21fb5]
#
============================================================
--- constants.cc        942d3eebad05095d859d2641150968f01f37c95e
+++ constants.cc        b812f3fff900905f174e164024e18c52ff8ffdad
@@ -110,22 +110,6 @@

  string const regex_legal_key_name_bytes("(address@hidden)");

-  // all the ASCII characters (bytes) which are illegal in a (file|local)_path
-
-  char const illegal_path_bytes_arr[33] =
-    {
-      0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
-      0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
-      0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
-      0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
-      0x7f, 0x00
-    }
-  ;
-
-  char const * const illegal_path_bytes =
-  illegal_path_bytes_arr
-  ;
-
  // merkle tree / netcmd / netsync related stuff

  size_t const merkle_fanout_bits = 4;
============================================================
--- constants.hh        7812034aa4a4a35decd8018d849102c06623bcd4
+++ constants.hh        c5fe8274ac31f96c9cc610e6f6ee8cbed2079aa7
@@ -89,9 +89,6 @@
  // boost regex that matches the bytes in legal_key_name_bytes
  extern std::string const regex_legal_key_name_bytes;

-  // all the ASCII characters (bytes) which are illegal in a (file|local)_path
-  extern char const * const illegal_path_bytes;
-
  // remaining constants are related to netsync protocol

  // number of bytes in the hash used in netsync
============================================================
--- paths.cc    4c98560ebccf3c70cfa26b985403a0a3fd66fb90
+++ paths.cc    79d3e24da249f12334bbb673686ed71159d21fb5
@@ -121,6 +121,8 @@
//  -- no doubled /'s
//  -- no trailing /
//  -- no "." or ".." path components
+//
+// ??? Ensure use of UTF8 encoding internally, validate encoding here.
static inline bool
bad_component(string const & component)
{
@@ -138,25 +140,13 @@
static inline bool
has_bad_chars(string const & path)
{
-  static bool bad_chars_init(false);
-  static u8 bad_table[128] = {0};
-  if (UNLIKELY(!bad_chars_init))
+  for (string::const_iterator c = path.begin(); LIKELY(c != path.end()); c++)
    {
-      string bad_chars = string("\\") + constants::illegal_path_bytes + string(1, '\0');
-      for (string::const_iterator b = bad_chars.begin(); b != bad_chars.end(); b++)
-        {
-          u8 x = (u8)*b;
-          I((x) < sizeof(bad_table));
-          bad_table[x] = 1;
-        }
-      bad_chars_init = true;
-    }
-
-  for (string::const_iterator c = path.begin(); c != path.end(); c++)
-    {
      u8 x = (u8)*c;
-      if (x < sizeof(bad_table) && bad_table[x])
-          return true;
+      // 0x5c is '\\'; we use the hex constant to make the dependency on
+      // ASCII encoding explicit.
+      if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))
+        return true;
    }
  return false;
}
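
For anyone who wants to try the change outside the monotone tree, here is a
minimal standalone sketch of the two approaches side by side.  It is not the
code from the patch itself: the LIKELY/UNLIKELY definitions and the u8 typedef
are assumed stand-ins for monotone's own, and has_bad_chars_table and
check_versions_agree are hypothetical names introduced only for this
comparison.

```cpp
#include <cassert>
#include <string>

// Assumed stand-ins for monotone's branch-hint macros: wrap GCC's
// __builtin_expect where available, otherwise do nothing.
#if defined(__GNUC__)
#  define LIKELY(x)   (__builtin_expect(!!(x), 1))
#  define UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

typedef unsigned char u8;  // assumed equivalent of monotone's u8

// Range-check version, following the patch: control bytes 0x00-0x1f,
// backslash (0x5c) and DEL (0x7f) are forbidden.
static inline bool
has_bad_chars(std::string const & path)
{
  for (std::string::const_iterator c = path.begin();
       LIKELY(c != path.end()); ++c)
    {
      u8 x = static_cast<u8>(*c);
      if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))
        return true;
    }
  return false;
}

// Table version, modelled on the code being removed (hypothetical name).
// The table is filled lazily on the first call with the same byte set.
static inline bool
has_bad_chars_table(std::string const & path)
{
  static bool bad_chars_init = false;
  static u8 bad_table[128] = {0};
  if (UNLIKELY(!bad_chars_init))
    {
      for (int b = 0x00; b <= 0x1f; ++b)
        bad_table[b] = 1;
      bad_table[0x5c] = 1;
      bad_table[0x7f] = 1;
      bad_chars_init = true;
    }

  for (std::string::const_iterator c = path.begin(); c != path.end(); ++c)
    {
      u8 x = static_cast<u8>(*c);
      if (x < sizeof(bad_table) && bad_table[x])
        return true;
    }
  return false;
}

// Asserts that both versions give the same answer for every possible
// single-byte string, including bytes >= 0x80 (hypothetical helper).
static void
check_versions_agree()
{
  for (int b = 0; b < 256; ++b)
    {
      std::string s(1, static_cast<char>(b));
      assert(has_bad_chars(s) == has_bad_chars_table(s));
    }
}
```

Calling check_versions_agree() from a test driver exercises all 256 byte
values.  Bytes at 0x80 and above pass both checks, so multi-byte UTF-8
sequences are accepted without any validation, which is what the "???"
comment added to paths.cc is about.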



