On Wed, Jan 21, 2009 at 3:17 PM, Felipe Contreras <address@hidden> wrote:
However, I found some issues:
1) no author
Where there is no author it appears as "unknown<unknown>"; it's missing a
space and I think first-letter capitalization looks better for names.
Good point. I'll fix that.
2) no name
As discussed before I prefer "Unknown <address@hidden>" but your approach
("<address@hidden>") is not bad.
3) no email
When there's no email I get "Name<unknown>"; it's missing a space.
What I'm planning to do is this:
1. start with "unknown <unknown>" (all lowercase, with a space) and only use this when there are no author certs
2. grab the value from the first (see below for definition of "first") author cert if there is one
3. look up the value from above in the --author-file to see if there is a mapping to something else and use the result if there is one
4. use the result from above for both committer and author
There's a bit of extra complexity in this at the moment (adding < and > around unadorned email addresses, etc.) that's left over from before I added the --author-file option and it doesn't really make a lot of sense any more. If you want something other than "unknown <unknown>" then mapping that to whatever you would like in the --author-file should suffice. For other authors that either don't have a name or don't have an email address you'll need to add mappings (IIRC git will not accept committers that lack an email address).
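The four steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual exporter code: the function name resolve_identity is made up, and the --author-file is assumed to have already been parsed into a plain dict (its on-disk format is not shown here).

```python
# Placeholder used only when a revision has no author certs (step 1).
UNKNOWN = "unknown <unknown>"


def resolve_identity(author_certs, author_map):
    """Return the identity used for both git author and committer.

    author_certs: author cert values in whatever order they were received
                  (there is no inherent ordering among them).
    author_map:   hypothetical dict built from the --author-file, mapping
                  a raw value to its replacement.
    """
    # Steps 1 and 2: take the first author cert, else the placeholder.
    value = author_certs[0] if author_certs else UNKNOWN
    # Step 3: apply the mapping when one exists, otherwise keep the value.
    # Step 4: the caller uses this one result for committer and author.
    return author_map.get(value, value)


if __name__ == "__main__":
    mapping = {UNKNOWN: "Unknown <unknown@example.invalid>"}
    print(resolve_identity([], mapping))
    print(resolve_identity(["joe@example.invalid"], mapping))
```

With this shape, anyone who prefers something other than "unknown <unknown>" only needs one mapping entry for it, and no special-casing of names or email addresses is required in the exporter itself.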
I really don't like 1), there is *always* a committer in mtn. I
There ought to be, but there's no real requirement in the data model, and if a pull operation was interrupted at the wrong moment it is quite possible to end up missing some certs. Also, when pulling you always get revs, but you might not get certs (at least branch certs) if they don't match the pattern you're pulling with. I can't recall whether any certs are pulled for revs that don't match the branch pattern.
propose to use the first committer of the changelog cert as the git
There is not really any inherent order in the author certs, so by first I mean "whichever one I get first."
The changelog does not have a "first committer"; all it has is a signature from some key, whose name might match the value of the author cert, or it might not. If your database is old enough to have been through a rebuild (and an epoch change) then all of the certs from prior to the rebuild will be signed by the person who did the rebuild, not their original signer. Using these as committers wouldn't be very good.
committer, and then, if there's no mtn author, use the same committer
Anyway, I'll try to simulate that behaviour so I can make an exact
comparison of the repos.
Sounds good. I'm also thinking of adding --export-marks and --import-marks options as documented in git-fast-import and git-fast-export, which should allow for incremental exporting. It will probably be a few days before I get to any of this though.
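To give an idea of how marks would enable incremental exports, here is a rough Python sketch. It relies on the marks file layout documented for git-fast-import (one ":markid id" pair per line). Everything else is an assumption for illustration: the function names are made up, and storing monotone revision ids next to the marks is just one possible choice, not a decided design.

```python
def parse_marks(lines):
    """Parse ':<n> <id>' lines into a dict mapping id -> mark number."""
    marks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mark, rev_id = line.split(" ", 1)
        marks[rev_id] = int(mark.lstrip(":"))
    return marks


def pending_revisions(all_revs, marks):
    """Keep only revisions that a previous run has not exported yet."""
    return [rev for rev in all_revs if rev not in marks]


def format_marks(marks):
    """Serialize the marks back to ':<n> <id>' lines, ordered by mark."""
    ordered = sorted(marks.items(), key=lambda item: item[1])
    return [":%d %s" % (number, rev_id) for rev_id, number in ordered]


if __name__ == "__main__":
    previous = parse_marks([":1 aaa", ":2 bbb"])
    print(pending_revisions(["aaa", "bbb", "ccc"], previous))
```

A second run would load the marks written by the first, export only the pending revisions, and write the combined marks out again.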