From: David A. Wheeler
Subject: Re: Idea: Add .COMMANDCHANGE and .CACHE
Date: Mon, 10 Jun 2019 22:10:44 -0400 (EDT)

On further reflection I noticed a bug in my .CACHE proposal.

Below is a discussion of the bug, and my proposed solution to it.
The proposed solution turns out to be a minor change,
but it might not be obvious why it needed changing.
So I'm explaining that in detail below.

TL;DR version: If .CACHE is enabled, the cache contents
are stored in a filesystem key constructed this way:

The result is that users can add a simple ".CACHE" entry
in their Makefile, and magically get caching for whatever they want
cached.  That's nice functionality for what I expect will be
relatively few lines of code.  Sure, users can write code to cache
things themselves, but I think getting that functionality with
a single additional line in the Makefile is valuable.

--- David A. Wheeler



I proposed both .COMMANDCHANGE and .CACHE simultaneously
because they both require a database of past activity
(in my proposal, the database is stored in the filesystem).
However, they have a difference my first proposal didn't fully
account for.

".COMMANDCHANGE" only needs to detect when a rule
needs to be *rerun* even though all the prerequisites are older than the targets.
Thus, I believe the SHA256 of the expanded command (as proposed)
is sufficient for it.

I proposed that .CACHE use the first target name and the
"$(SHA256 of expanded command)" as the key for data retrieval, but
in fact this does *not* work. The cached results depend on many things
*OTHER* than the contents of the expanded command, in particular,
they depend on the filenames & contents of all inputs of the command
(we'll assume that they are all identified as prerequisites)
and all the filenames of the targets.

So we need to replace "$(SHA256 of expanded command)"
with something else that depends not only on the expanded command,
but also on all the exact prerequisite names and their file contents.


First: In many cases we do *NOT* want to read the contents of all the inputs,
because the inputs may be large.  So we should find a way to
quickly detect important differences without always reading entire files.

So instead, let's do two calculations:
1. A quick hash calculation that's also based on target names (in order)
   and file lengths.
   We'll differentiate between a 0-length existing file and a non-existent file;
   .PHONY targets are always treated as non-existent (for this purpose).
2. A slower hash calculation that's based on the contents of all the
   prerequisites.
   We only do this calculation if the quick calculation says we might be
   able to use the cache.

The proposal is tweaked as follows when .CACHE is applied to a rule.
The cached results (using the target names)
of a rule execution will be stored and later retrieved
from a directory with the following name:

The hashes are calculated as follows:

quick_hash_data = sequence of "prerequisite info", blank line, and expanded
  command line.
  Each "prerequisite info" is the expanded prerequisite name,
  tab, its file length (empty if non-existent including .PHONY prerequisites),
  and a terminating newline.
  This is followed by a single newline-only blank line.
  This is followed by the expanded-by-make command line with a terminating
  newline
  (note that the *expanded* version is used, so if $(CFLAGS) is referenced in
  the command,
  and the user changes the CFLAGS value, then the quick_hash_data will be
  different).
quick_hash = sha256(quick_hash_data)
slow_hash_data = sequence of newline-terminated sha256(prerequisite contents)
slow_hash = sha256(slow_hash_data)
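
To make the two definitions above concrete, here is a minimal sketch in
Python. It is an illustration only, not code from make. The function names,
the "phony" parameter, the use of hex digests, and the newline that ends the
command line are my assumptions for the sake of the example.

```python
import hashlib
import os


def quick_hash(prereqs, expanded_command, phony=()):
    """Hash prerequisite names and lengths plus the expanded command.

    `prereqs` is the ordered list of expanded prerequisite names and
    `phony` the names declared .PHONY (both hypothetical parameters).
    """
    lines = []
    for name in prereqs:
        # The length is left empty for non-existent files and for .PHONY
        # prerequisites, so they differ from an existing 0-length file ("0").
        if name in phony or not os.path.isfile(name):
            length = ""
        else:
            length = str(os.path.getsize(name))
        lines.append(f"{name}\t{length}\n")
    # Prerequisite info, one blank line, then the command line.
    # Assumption: the command line is newline-terminated.
    data = "".join(lines) + "\n" + expanded_command + "\n"
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def file_sha256(name):
    """SHA256 of a file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(name, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def slow_hash(prereqs):
    """SHA256 over the newline-terminated per-file digests (hex, assumed)."""
    data = "".join(file_sha256(name) + "\n" for name in prereqs)
    return hashlib.sha256(data.encode("ascii")).hexdigest()
```

With this sketch, editing a prerequisite without changing its length leaves
quick_hash alone but changes slow_hash, while changing the expanded command
(say, a different CFLAGS value) changes quick_hash immediately.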

Note that slow_hash is a "hash of hash".  This means that, for example,
text moving from the end of one prerequisite to the beginning of the
next prerequisite will still cause completely different hashes.
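
A small demonstration of that point, as a sketch. It works on in-memory
byte strings rather than files, and the helper names are made up for the
example. It compares the "hash of hash" against a naive hash of the
concatenated contents, which is not what is proposed here.

```python
import hashlib


def slow_hash_of(contents):
    """Hash of the newline-terminated SHA256 (hex) of each content blob."""
    digests = "".join(hashlib.sha256(c).hexdigest() + "\n" for c in contents)
    return hashlib.sha256(digests.encode("ascii")).hexdigest()


def naive_concat_hash(contents):
    """For contrast only: hash of all contents simply concatenated."""
    return hashlib.sha256(b"".join(contents)).hexdigest()


# "beta" moves from the end of the first input to the start of the second.
before = [b"alpha beta", b"gamma"]
after = [b"alpha ", b"betagamma"]

# The concatenation is identical, so the naive hash cannot tell them apart.
assert naive_concat_hash(before) == naive_concat_hash(after)
# The hash of hashes differs, because each per-input digest changed.
assert slow_hash_of(before) != slow_hash_of(after)
```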

When trying to determine if a cached value is available, the
quick_hash is calculated first & the directory checked.
If any prerequisite name changes, or if any length changes, then
the quick_hash will change, so in many cases we can quickly determine
that no cached result applies without completely reading possibly-big files.
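
The two-stage check can be sketched as follows. This is a hedged
illustration: the layout used here (a directory per quick_hash, with one
subdirectory per slow_hash) is purely hypothetical and is NOT the directory
name from the proposal, and "find_cached" and "compute_slow" are names
invented for the example.

```python
import os


def find_cached(cache_root, quick, compute_slow):
    """Return the cache entry directory, or None on a miss.

    `quick` is the already-computed quick_hash; `compute_slow` is a
    callable returning the slow_hash, invoked only when needed.
    Hypothetical layout: <cache_root>/<quick_hash>/<slow_hash>/
    """
    quick_dir = os.path.join(cache_root, quick)
    if not os.path.isdir(quick_dir):
        # Cheap miss: no prerequisite contents were read at all.
        return None
    entry = os.path.join(quick_dir, compute_slow())
    return entry if os.path.isdir(entry) else None
```

The point of passing a callable is that the expensive content hashing only
happens once the quick check says a cached result might exist.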

To help cache debugging, I suggest that the directory
include a file named "quick_hash" which contains the quick_hash_data.
This can be written when the other cached values are written.

Note that if automatic variables are used in the command,
that continues to be a non-problem.  The automatic variables are expanded before
being included in the command hash, so their changes will be detected.
In particular, if "$?" is used in the command it will be expanded and then
have its SHA256 calculated, so differences in $? will cause the cache to
be ignored (that's the safest course, so that's a good thing).

I've been using SHA256, but another hash algorithm could be used.
We probably don't need every digit; we can cut off after a certain
reasonable number of digits.
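
For instance, truncation could look like the sketch below. The choice of
16 hex digits (64 bits) is an arbitrary value picked for illustration, not
a recommendation, and "short_key" is a made-up name.

```python
import hashlib


def short_key(data: bytes, digits: int = 16) -> str:
    """Leading `digits` hex digits of the SHA256 of `data`."""
    return hashlib.sha256(data).hexdigest()[:digits]
```

Fewer digits give shorter directory names at the cost of a higher chance
of two different inputs sharing a key.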
