[PATCH] updatedb: run in the C locale, don't do case-folding.
From: James Youngman
Subject: [PATCH] updatedb: run in the C locale, don't do case-folding.
Date: Sat, 9 Jan 2016 21:18:24 +0000
* locate/updatedb.sh: Set LC_ALL to C to avoid unexpected character
encodings in path names causing sort to fail (idea from Clarence
Risher). Don't do case-folding, since the character set is now C,
which is likely inconsistent with the user's expectations anyway.
Honour $TMPDIR. Correct the error message you get if you specify
both --old-format and --dbformat.
* NEWS: Explain these changes.
---
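Not part of the commit: a minimal sketch of what dropping "-f" changes, assuming a POSIX shell and a sort that honours LC_ALL. The helper names sort_bytes and sort_folded are hypothetical and do not appear in updatedb.sh.

```shell
#!/bin/sh
# Hypothetical helper: order names by raw byte value, which is what
# plain "sort" does once LC_ALL=C is in effect.
sort_bytes() {
    LC_ALL=C sort
}

# Hypothetical helper: the same, but with case folded ("sort -f"),
# as the pipelines did before this patch.
sort_folded() {
    LC_ALL=C sort -f
}

# Byte order puts upper-case letters before lower-case ones:
# Banana, apple, cherry
printf '%s\n' apple Banana cherry | sort_bytes

# Folded order ignores case: apple, Banana, cherry
printf '%s\n' apple Banana cherry | sort_folded
```

With case folded, the order depends on which characters the locale treats as letters; in the C locale that is ASCII only, which is one reason the folded order may not match what a user of a non-ASCII locale expects.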
NEWS | 7 +++++++
locate/updatedb.sh | 33 ++++++++++++++++++++++++---------
2 files changed, 31 insertions(+), 9 deletions(-)
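Not part of the commit: a minimal sketch of a TMPDIR fallback, assuming a POSIX shell. In POSIX sh, "${TMPDIR=/var/tmp}" assigns only when TMPDIR is unset, while "test -z" is also true for a variable that is set but empty. The hypothetical helper choose_tmpdir below treats both cases the same way; it is a variant for illustration, not the code in updatedb.sh.

```shell
#!/bin/sh
# Hypothetical helper: print the directory to use for intermediate
# files.  A non-empty TMPDIR from the environment wins; otherwise fall
# back to the first of /var/tmp and /usr/tmp that exists, then /tmp.
choose_tmpdir() {
    if test -n "${TMPDIR-}"; then
        printf '%s\n' "$TMPDIR"
    elif test -d /var/tmp; then
        printf '%s\n' /var/tmp
    elif test -d /usr/tmp; then
        printf '%s\n' /usr/tmp
    else
        printf '%s\n' /tmp
    fi
}

TMPDIR=$(choose_tmpdir)
export TMPDIR
```

Usage: running the script with TMPDIR=/scratch in the environment leaves TMPDIR at /scratch; running it with TMPDIR unset or empty picks one of the fallback directories.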
diff --git a/NEWS b/NEWS
index f72f021..8865b8e 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,13 @@ GNU findutils NEWS - User visible changes. -*- outline -*- (allout)
* Major changes in release 4.7.0-git, YYYY-MM-DD
+** Changes to locate / updatedb
+
+The updatedb script now operates in the C locale only. This means
+that character encoding issues are now unlikely to cause sort to
+fail. It also honours the TMPDIR environment variable if it is
+set, and no longer sorts file names case-insensitively.
+
** Translations
Updated translations: Hungarian, Slovak, Dutch, German.
diff --git a/locate/updatedb.sh b/locate/updatedb.sh
index 9cb2811..3861915 100644
--- a/locate/updatedb.sh
+++ b/locate/updatedb.sh
@@ -31,6 +31,19 @@ There is NO WARRANTY, to the extent permitted by law.
Written by Eric B. Decker, James Youngman, and Kevin Dalley.
'
+# File path names are not actually text (since there is no
+# mechanism to enforce any constraint that the basename of a
+# subdirectory has the same character encoding as the basename of its
+# parent). The practical effect is that, depending on the way a
+# particular system is configured and the content of its filesystem,
+# passing all the file names in the system through a text-based tool
+# like "sort" may generate character encoding errors. To avoid
+# this, we set LC_ALL=C. This will, presumably, not work perfectly on
+# systems where LC_ALL is not the way to do locale configuration or
+# some other setting can override this.
+LC_ALL=C
+export LC_ALL
+
usage="\
Usage: $0 [--findoptions='-option1 -option2...']
@@ -75,7 +88,7 @@ done
case "${dbformat:+yes}_${old}" in
yes_yes)
- echo "The --dbformat and --old cannot both be specified." >&2
+ echo "The --dbformat and --old-format cannot both be specified." >&2
exit 1
;;
*)
@@ -186,12 +199,14 @@ test -z "$PRUNEREGEX" &&
: address@hidden@}
# Directory to hold intermediate files.
-if test -d /var/tmp; then
- : ${TMPDIR=/var/tmp}
-elif test -d /usr/tmp; then
- : ${TMPDIR=/usr/tmp}
-else
- : ${TMPDIR=/tmp}
+if test -z "$TMPDIR"; then
+ if test -d /var/tmp; then
+ : ${TMPDIR=/var/tmp}
+ elif test -d /usr/tmp; then
+ : ${TMPDIR=/usr/tmp}
+ else
+ : ${TMPDIR=/tmp}
+ fi
fi
export TMPDIR
@@ -320,7 +335,7 @@ if [ "$myuid" = 0 ]; then
exit $?
fi
fi
-} | $sort -f | $frcode $frcode_options > $LOCATE_DB.n
+} | $sort | $frcode $frcode_options > $LOCATE_DB.n
then
: OK so far
true
@@ -387,7 +402,7 @@ if test -n "$NETPATHS"; then
exit $?
fi
fi
-} | tr / '\001' | $sort -f | tr '\001' / > "$filelist"
+} | tr / '\001' | $sort | tr '\001' / > "$filelist"
# Compute the (at most 128) most common bigrams in the file list.
$bigram $bigram_opts < $filelist | sort | uniq -c | sort -nr |
--
2.1.4
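
Not part of the patch: the second pipeline changed above maps "/" to the byte \001 before sorting and back afterwards. A minimal sketch of the effect, assuming the intent is for entries below a directory to sort directly after that directory; the helper name sort_paths is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sort path names so that "/" compares lower
# than any other byte that can appear in a file name.
sort_paths() {
    tr / '\001' | LC_ALL=C sort | tr '\001' /
}

# Plain byte order would place /usr/lib-old between /usr/lib and
# /usr/lib/x, because "-" (octal 055) sorts before "/" (octal 057).
# With the mapping, the output is:
#   /usr/lib
#   /usr/lib/x
#   /usr/lib-old
printf '%s\n' /usr/lib-old /usr/lib/x /usr/lib | sort_paths
```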