[platform-testers] new snapshot available: grep-2.18.143-b298
From: Jim Meyering
Subject: [platform-testers] new snapshot available: grep-2.18.143-b298
Date: Sat, 10 May 2014 22:43:09 -0700
Here's the latest, in preparation for a grep-2.19 release.
Please give it a good work-out and let us know of any problems.
This release includes an unusually large number of bug fixes and
impressive performance improvements, thanks to a lot of work
by Norihiro Tanaka and Paul Eggert.
grep snapshot:
http://meyering.net/grep/grep-ss.tar.xz 1.2 MB
http://meyering.net/grep/grep-ss.tar.xz.sig
http://meyering.net/grep/grep-2.18.143-b298.tar.xz
Here are the new parts of the NEWS file, followed by git shortlog entries:
=================================================
** Improvements
Performance has improved, typically by 10% and in some cases by a
factor of 200. However, performance of grep -P in UTF-8 locales has
gotten worse as part of the fix for the crashes mentioned below.
** Bug fixes
grep no longer mishandles patterns like [a-[.z.]], and no longer
mishandles patterns like [^a] in locales that have multicharacter
collating sequences so that [^a] can match a string of two characters.
grep no longer mishandles an empty pattern at the end of a pattern list.
[bug introduced in grep-2.5]
grep -C NUM now outputs separators consistently even when NUM is zero,
and similarly for grep -A NUM and grep -B NUM.
[bug present since "the beginning"]
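The separator rule can be modeled in a few lines. The following Python sketch is an illustration only, not grep's implementation: the function name grep_context is hypothetical, and it assumes the default "--" group separator. It emits a separator whenever two output groups are not adjacent in the input, and that includes the case where NUM is zero.

```python
import re

def grep_context(lines, pattern, num):
    """Simplified model of `grep -C num pattern` output, separators included."""
    regex = re.compile(pattern)
    out = []
    last_printed = None  # index of the last input line already emitted
    for i, line in enumerate(lines):
        if not regex.search(line):
            continue
        start = max(0, i - num)
        end = min(len(lines), i + num + 1)
        if last_printed is not None:
            # A gap between the previous group and this one gets a separator,
            # regardless of whether num is zero.
            if start > last_printed + 1:
                out.append("--")
            start = max(start, last_printed + 1)
        out.extend(lines[start:end])
        last_printed = end - 1
    return out
```

With num set to 0, two matching lines separated by a non-matching line come out with a "--" between them, while two consecutive matching lines do not.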
grep -f no longer mishandles patterns containing NUL bytes.
[bug introduced in grep-2.11]
Plain grep, grep -E, and grep -F now treat encoding errors in patterns
the same way the GNU regular expression matcher treats them, with respect
to whether the errors can match parts of multibyte characters in data.
[bug present since "the beginning"]
grep -w no longer mishandles a potential match adjacent to a letter that
takes up two or more bytes in a multibyte encoding.
Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
mishandle word-boundary matches in multibyte locales.
[bug present since "the beginning"]
grep -P now reports an error and exits when given invalid UTF-8 data.
Previously it was unreliable, and sometimes crashed or looped.
[bug introduced in grep-2.16]
grep -P now works with -w and -x and backreferences. Before,
    echo aa|grep -Pw '(.)\1'
would fail to match, yet
    echo aa|grep -Pw '(.)\2'
would match.
grep -Pw now works like grep -w in that the matched string has to be
preceded and followed by non-word components or the beginning and end
of the line (as opposed to word boundaries before). Before, this
    echo a@@a| grep -Pw @@
would match, yet this
    echo a@@a| grep -w @@
would not. Now, they both fail to match,
per the documentation on how grep's -w works.
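The difference between the two rules can be seen in a small Python sketch. This is an approximation under stated assumptions: Python's re module stands in for PCRE, the helper names are hypothetical, and the "old" behaviour is modeled as wrapping the pattern in \b...\b, as the note above describes.

```python
import re

def matches_with_boundaries(pattern, line):
    """Old grep -Pw rule as described above: word boundaries around the match."""
    return re.search(r"\b(?:" + pattern + r")\b", line) is not None

def matches_as_word(pattern, line):
    """grep -w rule: the match must be preceded and followed by a non-word
    character or by the beginning or end of the line."""
    return re.search(r"(?<!\w)(?:" + pattern + r")(?!\w)", line) is not None

# In "a@@a" there is a word boundary on each side of "@@" (between a word
# character and "@"), so the boundary rule accepts it, but "@@" is surrounded
# by word characters, so the -w rule rejects it.
old_result = matches_with_boundaries("@@", "a@@a")
new_result = matches_as_word("@@", "a@@a")
```

Under these assumptions old_result is true and new_result is false, which mirrors the a@@a example in the note.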
grep -i no longer mishandles patterns containing titlecase characters.
For example, in a locale containing the titlecase character
'ǈ' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
'grep -i ǈ' now matches both 'Ǉ' (U+01C7 LATIN CAPITAL LETTER LJ)
and 'ǉ' (U+01C9 LATIN SMALL LETTER LJ).
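The relationship among the three code points can be checked with Python's Unicode tables. This sketch does not reproduce grep's matcher; it only shows that U+01C8 is a titlecase letter whose uppercase and lowercase counterparts are U+01C7 and U+01C9. The helper same_ignoring_case is a hypothetical name, and using casefold() as the comparison is an assumption made for illustration.

```python
import unicodedata

UPPER_LJ = "\u01c7"  # LATIN CAPITAL LETTER LJ
TITLE_LJ = "\u01c8"  # LATIN CAPITAL LETTER L WITH SMALL LETTER J
LOWER_LJ = "\u01c9"  # LATIN SMALL LETTER LJ

def same_ignoring_case(a, b):
    """Compare two strings case-insensitively via Unicode case folding."""
    return a.casefold() == b.casefold()

# General category "Lt" marks a titlecase letter, distinct from "Lu" and "Ll".
title_category = unicodedata.category(TITLE_LJ)
```

A case-insensitive match that considers only uppercase and lowercase forms can miss the titlecase variant, which is the kind of mismatch the fix addresses.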
=================================================
Changes in grep since v2.18:
Jim Meyering (18):
maint: post-release administrivia
maint: dfa: pass NULL, not 0, as 2nd arg to setlocale
tests: make a performance-measuring test less system-sensitive
tests: avoid false-positive failure on some AMD CPUs
maint: fix "make dist"
tests: placate "make syntax-check" re compare arg ordering
build: avoid OS X 10.8.5 build failure due to lack of static_assert
maint: avoid sc_po_check syntax-check failure (kwset.c)
tests: detect an infloop-inducing bug in grep -P (pcre-8.35)
dfa: avoid new NULL dereference
maint: Revert "dfa: avoid new NULL dereference"
build: reenable some compiler warning options
tests: use consistent spelling for locale name, en_US.UTF-8
grep: fix new heap write buffer overrun
gnulib: update to latest
maint: make ChangeLog generation more robust
maint: mark some breakless cases with /* fallthrough */ comment
gnulib: update submodule to latest, and bootstrap
Norihiro Tanaka (33):
grep: don't match line-by-line for case-insensitive with grep and awk
grep: remove trivial_case_ignore
grep: optimization of bracket expression for non-UTF8 locales
grep: revert removal of trivial_case_ignore
grep: avoid to add same character to a bracket expression
grep: optimization for fgrep with changing the macher to grep macher.
grep: perform the kwset-helping DFA match in narrower range
grep: take mbrtowc_cache into new member of struct dfa
dfa: avoid re-building a state built previously
grep: reuse multibyte DFA buffers in non-UTF8 locales
grep: fix performance bug with regex in line-by-line mode
grep: optimization with the superset of DFA
grep: use the Galil rule for Boyer-Moore algorithm in KWSet
grep: prefer regex to DFA for ANYCHAR in multibyte locales
grep: no match for the empty string included in multiple patterns
grep: open CSET and transform into uppercase when MB_CUR_MAX == 1
dfa: speed up by checking multibyte characters on demand
grep: speed-up for exact matching with begline and endline constraints.
grep: may also use Boyer-Moore algorithm for case-insensitive matching
grep: speed-up by using memchr() in Boyer-Moore searching
grep: avoid wasting memory for large patterns in dfamust
grep: skip checking of multibyte character boundary, reaching at eolbyte
grep: speed up for a case to repeat failure in DFA after success in kwset
kwset: improve performance by inlining tr
dfa: optimize memory allocation
grep: simplify superset
grep: adjust timing back to kwset when dfaisfast is true
grep: fix the bug in previous patch.
grep: make KWset and DFA agree about invalid sequences in patterns
dfa: speed up 'dfaisfast'
grep: improve performance of -v when combined with -L, -l or -q
dfa: fix inconsistency in multibyte locales
grep: retry DFA superset after matching multiple lines
Paul Eggert (90):
grep: fix multiple bugs with bracket expressions
* src/dfa.c (parse_bracket_exp): Parenthesize.
* src/dfa.c (prednames): POSIX allows [[:xdigit:]] to match
multibyte chars.
grep: remove lint
grep: fix bugs with -i and titlecase
grep: avoid 'inline' when it doesn't matter
grep: minor tuning for mb_case_map_apply
doc: describe titlecase fix better
grep: fix some unlikely bugs in trivial_case_ignore
grep: fix comment
maint: remove differences from gnulib regex code
doc: do not overpromise --ignore-case's behavior
build: update gnulib submodule to latest
grep: fix case-fold mismatches between DFA and regex
fgrep: fix case-fold incompatibility with plain 'grep'
maint: pacify 'make dist'
dfa: port to freestanding DJGPP (Bug#17056)
egrep, fgrep: go back to shell scripts
grep: fix and simplify grep -iF optimization
dfa: avoid undefined behavior
egrep, fgrep: improve diagnostics from shell scripts
dfa: improve port to freestanding DJGPP
dfa: cache results of mbrtowc for speed
dfa: avoid an indirection and port wint_t usage
dfa: improve port to freestanding DJGPP
grep: simplify dfa.c by having it not include mbsupport.h directly
grep: minor improvements to previous patch
grep: cleanup DFA superset optimization
grep: minor cleanups for Galil speedups
grep: simplify memory allocation in kwset
grep: remove trival_case_ignore
grep: prefer bool in DFA internals
grep: port better to hosts with nonstandard nl_langinfo
grep: remove bool_bf
grep: cleanup for empty-string fix
grep: cleanup for HAS_DOS_FILE_CONTENTS issue
grep: improvements for the open-CSET patch
build: update gnulib submodule to latest
dfa: clarify memory allocation and port to IRIX
dfa: avoid unnecessary work and other initialization
dfa: better size-overflow check
dfa: simplify transition table allocation
dfa: simplify range char allocation
dfa: simplify multibyte_prop allocation
dfa: simplify position set and element count allocation
dfa: simplify memory allocation
dfa: avoid duplicate strlen when allocating memory
dfa: simplify freelist
dfa: simplify dfmust initialization
dfa: trans reallocation microoptimization
dfa: minor cleanup
dfa: fix pointer type conversion bug
dfa: fix bug that caused NUL to be mishandled in patterns
dfa: minor improvements to previous patch
grep: -P now rejects invalid input sequences in UTF-8 locales
kwset: simplify Boyer-Moore with unibyte -i
kwset: simplify and speed up Boyer-Moore unibyte -i in some cases
dfa: omit static variables that limited dfaexec to one struct dfa
dfa: fix memory leak reintroduced by previous patch
build: suppress unsafe-loop-optimizations warnings
dfa: minor tuneup of dfamust memory savings patch
dfa: fix incorrect comment that led to heap overrun
dfa: simplify and be more consistent about MB_CUR_MAX
dfa: minor simplification of dfaexec
misc: fix doc and test bugs re grep -z
dfa: fix recently-introduced memory leak
dfa: fix index bug in previous patch, and simplify
kwset: improve performance when large Boyer-Moore key doesn't match
kwset: speed up by using memchr2
kwset: improve performance by inlining more
grep: simplify EGexecute further
grep: clarify EGexecute slightly
tests: improve coverage for prefix-of-multibyte
grep: simplify and fix problems with KWset-DFA agreement patch
dfa: minor simplification
grep: fix encoding-error incompatibilities among regex, DFA, KWset
grep: improve internal API for multibyte boundary
grep: fix -w match next to a multibyte letter
dfa: minor performance improvement for previous change
dfa: clarify use of "if"
doc: mention performance changes
grep: simplify and clarify invert-related code
maint: fix indenting to pacify 'prohibit_tab_based_indentation'
dfa: don't assume unsigned int is exactly 32 bits wide
dfa: assume C89 for CHAR_BIT
grep: minor improvements to retry-DFA-superset patch
grep: -A 0, -B 0, -C 0 now output a separator
tests: add test case for -C 0 change
dfa: fix bug with \< etc in multibyte locales
dfa: omit double includes
Stephane Chazelas (2):
grep -P: fix it so backreferences now work with -w and -x
align grep -Pw with grep -w
Changes in gnulib since v2.18:
* gnulib 497f4cd...c2e80b7 (49):
> update from texinfo
> autoupdate
> autoupdate
> autoupdate
> gitlog-to-changelog: revert inclusion of git-log-fix file
> maint.mk: Relax the copyright check to cater for non FSF projects
> physmem: use sysinfo if _SC_PHYS_PAGES unavailable
> exclude: port to strict C99
> regex: do not depend on malloc-gnu
> autoupdate
> expl: avoid incorrect expl(small_value) on OpenBSD 5.4
> xalloc: allow x2nrealloc (P, PN, S) where P && !*PN
> fts: avoid unnecessary strlen calls
> fts: avoid unnecessary strlen calls
> fts: avoid unnecessary strlen calls
> autoupdate
> autoupdate
> obstack: Remove ancient NeXTSTEP gcc support conditional
> obstack: merge with glibc changes
> strftime: wrap macros in "do {...} while(0)"
> modechange: avoid memory leaks for invalid octal modes
> autoupdate
> gitlog-to-changelog: include a dummy git-log-fix file
> autoupdate
> update from texinfo
> gitlog-to-changelog: also include the file, git-log-fix
> autoupdate
> regex: port to OS X 10.8.5 en_US.UTF-8 locale
> maint: fix ChangeLog to match commit record
> stdint, read-file: fix missing SIZE_MAX on Android (tiny change)
> parse-datetime: fix crash or infloop in TZ="" parsing
> * NEWS: Recent changes are not that important.
> savedir: new symbol for fast-read version
> unistd: port readlink to Mac OS X 10.3.9
> * NEWS: Document recent change to diffseq.
> diffseq: remove TOO_EXPENSIVE heuristic
> savedir: simplify by using stpcpy
> spawn: fix link error on uclibc
> m4: fix gl_TIMER_TIME() detection of threads on uClibc
> maintainer-makefiles: provide AC_PROG_SED for older autoconf
> exclude: add support for posix regexps
> maintainer-makefiles: use $(SED) for syntax check
> update from texinfo
> savedir: add sorting arg to savedir, streamsavedir; remove fdsavedir
> autoupdate
> update from texinfo
> update from texinfo
> file-type: add support for doors and other less-common file types
> update from texinfo