Project Name: GNU findutils - slocate compatibility and other enhancements
Summary
Enhance locate
1. Enhance locate to understand the database format used by
slocate.
2. Implement a replacement for the current updatedb shell script
which does pretty much the same thing but is less ugly.
Don't introduce a dependency on anything not in the base
system install (i.e. /bin/sh and C are OK, but Perl
probably isn't).
3. Add updatedb functionality to traverse the filesystem as
root, preserving enough permissions information to allow
us to provide the same functionality as slocate. Use the
same database format as slocate unless there is a reason
not to.
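To make the permissions requirement concrete, here is a minimal sketch of the kind of check a slocate-style locate could run at query time. It assumes the database stores the mode, owner and group of each directory; the dir_perm record and both function names are hypothetical and do not reflect slocate's actual on-disk format. The rule used (a path is shown only if the user may search every ancestor directory) is also an assumption, and the sketch ignores supplementary groups and ACLs.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical record: the permission data updatedb would need to
   keep for each directory. Not slocate's real database layout. */
struct dir_perm {
    mode_t mode;   /* permission bits as returned by stat() */
    uid_t  owner;
    gid_t  group;
};

/* Classic Unix check for search (execute) permission on a directory.
   Exactly one of the owner/group/other classes applies.
   Assumption: supplementary groups and ACLs are ignored. */
int may_search(mode_t mode, uid_t owner, gid_t group, uid_t uid, gid_t gid)
{
    if (uid == 0)
        return 1;                       /* root may search anything */
    if (uid == owner)
        return (mode & S_IXUSR) != 0;
    if (gid == group)
        return (mode & S_IXGRP) != 0;
    return (mode & S_IXOTH) != 0;
}

/* A path is reported only if every ancestor directory is searchable
   by the querying user. */
int path_visible(const struct dir_perm *ancestors, size_t n,
                 uid_t uid, gid_t gid)
{
    for (size_t i = 0; i < n; i++) {
        if (!may_search(ancestors[i].mode, ancestors[i].owner,
                        ancestors[i].group, uid, gid))
            return 0;
    }
    return 1;
}
```

Storing only the directory permissions, rather than those of every file, is one way to keep the database small; whether that is enough to match slocate's behaviour would need to be checked against slocate itself.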
Enhance find
1. Add tests which allow [acm]time to be compared against a specified
timestamp, as opposed to the timestamp of a file (-newer) or an
age (-mtime). Add relevant tests to the test suite and document
the changes.
2. Instrument find to allow us to improve the guesses that parser.c
makes for struct predicate.est_success_rate. Measure the (lack
of) performance increase in find 4.3.x with optimisation turned
on.
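The timestamp comparison described above could be sketched roughly as follows. The option names and syntax are still to be decided, so the "YYYY-MM-DD HH:MM:SS" input format, the which_time enum and both function names are assumptions made purely for illustration; this is not code from find.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

enum which_time { T_ATIME, T_CTIME, T_MTIME };

/* Parse "YYYY-MM-DD HH:MM:SS" as local time.
   Returns 0 on success, -1 on malformed input.
   Field ranges are not validated here; mktime() normalises them. */
int parse_timestamp(const char *text, time_t *out)
{
    struct tm tm = {0};
    int consumed = 0;

    if (sscanf(text, "%4d-%2d-%2d %2d:%2d:%2d%n",
               &tm.tm_year, &tm.tm_mon, &tm.tm_mday,
               &tm.tm_hour, &tm.tm_min, &tm.tm_sec, &consumed) != 6
        || text[consumed] != '\0')
        return -1;

    tm.tm_year -= 1900;   /* struct tm counts years from 1900 */
    tm.tm_mon  -= 1;      /* and months from 0 */
    tm.tm_isdst = -1;     /* let mktime() decide about DST */

    time_t t = mktime(&tm);
    if (t == (time_t)-1)
        return -1;
    *out = t;
    return 0;
}

/* True if the selected timestamp of the file is strictly later than
   the reference time, mirroring the strictness of -newer. */
int newer_than(const struct stat *st, enum which_time which, time_t ref)
{
    time_t t = which == T_ATIME ? st->st_atime
             : which == T_CTIME ? st->st_ctime
             : st->st_mtime;
    return t > ref;
}
```

A predicate built on this would parse the timestamp once at option-parsing time and then call newer_than() on the stat data find already has for each file.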
Enhance xargs
1. Implement an optional feature in which xargs figures out how
long a command line it can pass to exec() without necessarily
believing ARG_MAX (because for example with the Linux kernel
this can be an underestimate).
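One possible way to discover the real limit is to try exec() with argument lists of increasing size and watch for failure. The sketch below does this with a binary search; the function names, the 1024-byte chunk size and the use of the "true" utility as a harmless target are all assumptions for illustration, not a description of how xargs works today. It also ignores the space taken by the environment and the argument pointers, which a real implementation would have to account for.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { CHUNK = 1024 };  /* bytes per dummy argument, including the NUL */

/* Return 1 if exec() accepts roughly `total` bytes of arguments,
   0 if the exec fails (for example with E2BIG) or the command fails,
   -1 if we could not allocate memory or fork. */
static int exec_accepts(size_t total)
{
    size_t n = total / CHUNK;
    char **argv = calloc(n + 2, sizeof *argv);  /* argv[n + 1] stays NULL */
    char *arg = malloc(CHUNK);
    int ok = -1;

    if (argv == NULL || arg == NULL) {
        free(argv);
        free(arg);
        return -1;
    }
    memset(arg, 'x', CHUNK - 1);
    arg[CHUNK - 1] = '\0';

    argv[0] = "true";
    for (size_t i = 1; i <= n; i++)
        argv[i] = arg;          /* every argument shares one buffer */

    pid_t pid = fork();
    if (pid == 0) {
        execvp("true", argv);
        _exit(errno == E2BIG ? 126 : 127);  /* only reached if exec failed */
    }
    if (pid > 0) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid)
            ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    }
    free(arg);
    free(argv);
    return ok;
}

/* Binary search between a size assumed to work (lo) and a size
   assumed to fail (hi). The result is accurate to within CHUNK bytes. */
size_t probe_arg_limit(size_t lo, size_t hi)
{
    while (hi - lo > CHUNK) {
        size_t mid = lo + (hi - lo) / 2;
        if (exec_accepts(mid) == 1)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

A caller might seed lo with a value comfortably below sysconf(_SC_ARG_MAX) and hi with a generous upper bound, then cache the result, since each probe costs a fork and an exec.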
Benefits to the Community
Each of these enhancements has its own benefits:
slocate compatibility: slocate compatibility will reduce user
confusion and add important new capabilities to a tool that
is installed by default on a vast number of Linux distributions.
updatedb replacement: a new updatedb will be easier to maintain,
and may be faster as well. Adding new capabilities or locate
database enhancements will be less of a chore once the tool
that produces the database is better-designed.
xargs enhancement: this is primarily a performance improvement
for large-scale xargs usage. The benefit is smaller than that of
the other enhancements, but it is also the easiest of them to
implement, so the effort is justified.
Deliverables
- NOTE: all patches are to include updates to all relevant
documentation.
- Patch to xargs to add optional automated ARG_MAX recalculation.
- Patch for find to add new options for checks of [acm]time vs.
a particular time/date. Names and syntax to be discussed with
project mentor(s). To include test cases as needed.
- Patch to find to add est_success_rate
instrumentation/improvements.
- New updatedb, either a C program or a clean shell script. The new
version will be capable of generating both current locate and
slocate-style databases.
- Patch for locate to add slocate database compatibility and (in the
presence of a slocate-style database) slocate's functionality.
- Extra: if all of the above get done with time to spare, work on an
additional patch to locate/updatedb to add ACL and support to the
security-checking mechanism.
Implementation Plan:
Start with the xargs patch, which should be relatively quick. Discuss
and prototype the new find predicates next; if there are issues, work
on this in parallel with the next item. Continue working with find,
on the est_success_rate improvements. After that, compare the locate
and slocate updatedb implementations and decide exactly how to
approach writing the new updatedb. Finally, write the new version of
updatedb and the supporting changes to locate in parallel.
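For the est_success_rate step in the plan above, the instrumentation could be as simple as counting how often each predicate is evaluated and how often it succeeds, then comparing the measured rate with the parser's guess. The sketch below assumes a hypothetical pred_stats record kept alongside each predicate; only the est_success_rate name comes from find's struct predicate, and its type is modelled here as a float.

```c
#include <stdio.h>

/* Hypothetical per-predicate counters. Only est_success_rate is a
   name taken from find; everything else is illustrative. */
struct pred_stats {
    const char   *name;              /* e.g. "-name" */
    float         est_success_rate;  /* the parser's guess, 0.0 to 1.0 */
    unsigned long evaluations;       /* times the predicate was run */
    unsigned long successes;         /* times it returned true */
};

/* Call once after each evaluation of the predicate. */
void record_result(struct pred_stats *p, int matched)
{
    p->evaluations++;
    if (matched)
        p->successes++;
}

/* Observed success rate, or 0.0 if the predicate never ran. */
double measured_rate(const struct pred_stats *p)
{
    if (p->evaluations == 0)
        return 0.0;
    return (double)p->successes / (double)p->evaluations;
}

/* Print one line comparing the estimate with the measurement. */
void report(FILE *out, const struct pred_stats *p)
{
    fprintf(out, "%-12s estimated %.3f  measured %.3f  (%lu/%lu)\n",
            p->name, (double)p->est_success_rate, measured_rate(p),
            p->successes, p->evaluations);
}
```

Running find over a few representative directory trees with reporting enabled would show which estimates are furthest from reality and therefore worth adjusting first.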
Post-Completion Plans:
How much of a role I take with the findutils project afterwards will
depend largely on how much I enjoy working with the code over the
summer. However, as a minimum I will maintain the new updatedb and
database-reading code for at least 6-8 months after the SoC, or until
any issues seem to be ironed out, whichever is longer.
Qualifications:
I find this project appealing because, as a regular user of all of
these tools, I can really see myself making use of them. They're
also in a domain of which I have a very good understanding.