From 5acb1dc0dffbf8a8e9db87bc6caf9fa7c3dc170e Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Mon, 8 Mar 2010 17:14:51 +0100
Subject: [PATCH] more work on TODO

* TODO: More work on the first section. Use clearer section headers.
---
 TODO | 99 +++++++++++++++++++++++++++++++---------------------------------
 1 files changed, 47 insertions(+), 52 deletions(-)

diff --git a/TODO b/TODO
index 62e302e..2cfd0ce 100644
--- a/TODO
+++ b/TODO
@@ -4,58 +4,52 @@
 are permitted in any medium without royalty provided the copyright
 notice and this notice are preserved.
 
-Get sane performance with UTF-8 locales.
+===============
+Short term work
+===============
 
-Improve the test infrastructure.
+See where we are with UTF-8 performance.
 
-Other small patches which wait for a test case.
+Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and
+70-man_apostrophe.patch. Go through patches in Savannah.
 
-Some _minimal_ cleanup of the grep(), grepdir(), recursion (the "main
-loop") and fix --directories=read
+Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts.
+Fix --directories=read.
 
 Write better Texinfo documentation for grep. The manual page would be a
 good place to start, but Info documents are also supposed to contain a
 tutorial and examples.
 
-Fix the DFA matcher to never use exponential space. (Fortunately, these
-cases are rare.)
-
-Improve the performance of the regex backtracking matcher. This matcher
-is agonizingly slow, and is responsible for grep sometimes being slower
-than Unix grep when backreferences are used.
+Some test in tests/spencer2.tests should have failed! Need to filter out
+some bugs in dfa.[ch]/regex.[ch].
 
-Some test in tests/spencer2.tests should have failed!
-Need to filter out some bugs in dfa.[ch]/regex.[ch].
+Multithreading?
 
-Threads for grep?
-
-GNU grep does 32-bit arithmetic, it needs to move to 64-bit.
+GNU grep does 32-bit arithmetic, it needs to move to 64-bit (i.e.
+size_t/ptrdiff_t).
 
 Clean up, too many #ifdefs!
 
-Check some new algorithms for matching; talk to Karl Berry and Nelson.
-Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
-claim that his algorithm is faster than Boyer-More. Worth checking.
-
-Lazy dynamic linking of libpcre, libz, and libbz2?
+Lazy dynamic linking of libpcre.
 
 Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one
 binary. Is there a possibility of doing even better by automatically
 checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
-0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?
+0x9D for compress, and 0x42 0x5A 0x68 for bzip2)? Once what to do with
+libpcre is decided, do the same for libz and libbz2.
 
-##
+
+==================
+Matching algorithms
+==================
 
-Check .
-Take a look at these and consider opportunities
-for merging or cloning:
+Check . Take a look at these
+and consider opportunities for merging or cloning:
 
    -- ja-grep's mlb2 patch (Japanese grep)
    -- lgrep (from lv, a Powerful Multilingual File Viewer / Grep)
       ;
-   -- pcregrep (from Perl-Compatible Regular Expressions library)
-      ;
    -- cgrep (Context grep)
       seems like nice work;
    -- sgrep (Struct grep) ;
@@ -65,25 +59,38 @@ for merging or cloning:
       ;
    -- ggrep (Grouse grep) ;
    -- grep.py (Python grep) ;
-   -- freegrep (a BSD-licensed grep for those who can't stand the GNU GPL)
-      ;
+   -- freegrep ;
 
-##
+Check some new algorithms for matching; talk to Karl Berry and Nelson.
+Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
+claim that his algorithm is faster than Boyer-More. Worth checking.
 
-POSIX Compliance: see p10003.x
+Fix the DFA matcher to never use exponential space. (Fortunately, these
+cases are rare.)
 
-In general, interesting things to check in POSIX/OpenGroup include:
+
+============================
+Standards: POSIX and Unicode
+============================
 
-Provide support for the POSIX [= =] and [. .] constructs. This is
-difficult because it requires locale-dependent details of the
-character set and collating sequence, but POSIX does not standardize
-any method for accessing this information!
+For POSIX compliance, see p10003.x. Current support for the POSIX [= =]
+and [. .] constructs is limited. This is difficult because it requires
+locale-dependent details of the character set and collating sequence,
+but POSIX does not standardize any method for accessing this information!
 
-Moving away from GNU regex API for POSIX regex API.
+For Unicode, interesting things to check include the Unicode Standard
+ and the Unicode Technical
+Standard #18 ( “Unicode Regular
+Expressions”). Talk to Bruno Haible who's mantaining GNU libunistring.
+See also Unicode Standard Annex #15 (
+“Unicode Normalization Forms”), already implemented by GNU libunistring.
 
-##
+In particular, --ignore-case needs to be evaluated against the standards.
+We may want to deviate from POSIX if Unicode provides better or clearer
+semantics.
 
 POSIX and --ignore-case
+-----------------------
 
 For this issue, interesting things to check in POSIX include the
 Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in
@@ -215,21 +222,9 @@ a composition of the two conversions.
 Any optimization in the implementation of each logic must not change its
 basic semantic.
 
-##
-
-In general, interesting things to check in Unicode include:
-
-The Unicode Standard.
-
-Unicode Technical Standard #18 (
-“Unicode Regular Expressions”).
-
-Unicode Standard Annex #15 (
-“Unicode Normalization Forms”).
-
-##
 
 Unicode and --ignore-case
+-------------------------
 
 For this issue, interesting things to check in Unicode include:
 
-- 
1.6.6