
Re: [Monotone-devel] A Two-Fold Proposal: On Formats And Front-Ends


From: Hendrik Boom
Subject: Re: [Monotone-devel] A Two-Fold Proposal: On Formats And Front-Ends
Date: Tue, 4 Oct 2005 09:50:21 -0400
User-agent: Mutt/1.5.9i

On Tue, Oct 04, 2005 at 04:21:26AM -0700, Larry Hastings wrote:
> 
> 
> I have for you two separate but intimately related proposals, and I'm 
> going to describe them in reverse order of implementation because it 
> flows better.  Keep in mind, I'm still quite wet-behind-the-ears with 
> respect to Monotone; I typed my first *monotone* command less than two 
> weeks ago.  It wouldn't surprise me if this was all a bad idea for any 
> number of unforeseen reasons.  And yet I bravely post, having read
> somewhere that the *monotone* community is "friendly".  So... be kind.  :)
> 
> 
> Right now *monotone* makes life tough for front-end developers.  The 
> main problem I see is the N different output/file formats of *monotone*, 
> where N is /distressingly/ large:

I'm all for uniformity in configuration file formats.  Maybe this should
be on the list, too?  Every Linux program seems to have a different
configuration file format.

> [...]
> 
> To me the solution is clear, and it is here we arrive at my first actual 
> proposal: separate /presentation/ from /application logic/ by breaking 
> the current *monotone* executable into two pieces.  One piece would be 
> the "engine" that did all the actual work.  The second piece would be 
> the "front-end", or "driver", which drives the *monotone* engine (as gcc 
> drives the front-end, back-end, and linker).  The driver would provide 
> the command-line interface of the current *monotone* executable, convert 
> internal messages to their user-friendly localized equivalents, etc.  
> The communication between the two would be done over pipes in some 
> easy-to-cope-with data specification.

Would (multiple) front-ends and a shared engine library do?
You might get more efficient communication between the front and
back ends that way.  And if you really want a pipe, one of the
front ends could provide it.  Of course, you'd then have to worry
about the drastic differences in the way different OSes handle
their shared libraries.

> 
> This has many advantages:
> 
> [...]
>
>    * If you ship file data over the pipes, and restrict the engine
>      so the only file it talks to is the database, bingo!  Throw ssh
>      around it and now you have client-server *monotone*.  If you were
>      just a little bit more careful, you could even have multiple
>      computers all sharing the same database, each with just their own
>      working copies rather than their own full-blown *monotone*
>      databases.  (Though I gather this is strongly discouraged.)  The
>      back-end could be written to accept() on a socket directly, and
>      run as a daemon in the background, thus allowing people to run
>      *monotone* in a traditional client-server mode if they so chose. 
>      We might even be able to fold push/pull/sync into this same
>      socket, making database synchronization even more effortless.

I'd be quite happy to have only *one* monotone database on each machine.
At present I seem to have a monotone server that's permanently available
over my LAN, and another personal database that I use and regularly sync
with the server.  (My personal files are also on the same server.)
> 
> A moment ago I handwaved what "data representation" I had in mind.  I 
> propose there are three main candidates: XML, ASN.1, and JSON.  I will 
> immediately dispense with the first two, and show why the third is far 
> more likely.  :)
> 
>    * How can I automatically cull XML from the running?

XML is doomed to succeed.  That being said, it's not clear we have to use it.
Aside from its historically acquired complexity, it makes one fatal mistake:
it fails to distinguish a type from a field-selector.
> 
>    * What about ASN.1?

ASN.1 makes the same fatal mistake.  But there's one thing it does right:
you can encode it in such a way that every nested phrase is prefixed with
its length.  This makes it possible to tree-walk through an ASN.1 file
without parsing the parts one is not interested in.

>  Well, have you /looked/ at some ASN.1?  It has
>      none of XML's effervescent charm.  It's more parsimonious than
>      XML, but then what data encapsulation format is not?  Its biggest
>      failing: it looks to be wholly inflexible.  We couldn't add new
>      fields without creating a new grammar, which would break tools
>      that didn't upgrade their grammar.  I say screw it.

ASN.1 can be parsed without a grammar to derive its tree structure.
You won't know what any of the values at the leaves of the tree are,
though, and if you have an aggregate, you may not know the semantics
of the particular aggregator you have.  And if you are willing to
ignore type-identifiers that you happen not to understand, you can
still process the rest.
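A minimal sketch of that grammar-free tree walk (my own illustration, not
monotone code; it handles single-byte tags and basic BER length encoding
only, nothing more):

```python
# Walking a BER/DER-encoded ASN.1 buffer using only its tag-length-value
# framing.  Because every value carries its length, unknown or uninteresting
# subtrees can be skipped wholesale without a grammar.

def read_tlv(buf, pos):
    """Read one tag-length-value triple; return (tag, value_bytes, next_pos)."""
    tag = buf[pos]
    length = buf[pos + 1]
    if length & 0x80:  # long form: low bits give the number of length octets
        n = length & 0x7F
        length = int.from_bytes(buf[pos + 2:pos + 2 + n], "big")
        header = 2 + n
    else:
        header = 2
    start = pos + header
    return tag, buf[start:start + length], start + length

def walk(buf, pos=0, end=None, depth=0):
    end = len(buf) if end is None else end
    while pos < end:
        tag, value, pos = read_tlv(buf, pos)
        print("  " * depth + "tag 0x%02x, %d bytes" % (tag, len(value)))
        if tag & 0x20:  # constructed: recurse into the nested TLVs
            walk(value, 0, len(value), depth + 1)

# SEQUENCE { INTEGER 5, INTEGER 7 } hand-encoded in DER:
walk(bytes([0x30, 0x06, 0x02, 0x01, 0x05, 0x02, 0x01, 0x07]))
```

Note the walker never needs to know that the leaves are integers; it just
reports tags and lengths, which is exactly the point.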

> 
> On the other hand, I first heard of JSON by reading a *monotone* IRC log 
> somewhere, wherein Mr. Hoare hisself said "[his] money's on JSON for 
> this sort of task":
>    
> http://www.loglibrary.com/show_page/view/106?Multiplier=3600&Interval=6&StartTime=1124773859
> JSON is small, easy to parse, easy to generate, and covers all the 
> bases.  It has explicit and well-defined quoting rules.  It's flexible; 
> we could add new fields to a message and it wouldn't break a receiver 
> who wasn't expecting that field.  So I'll go ahead and assume that, if 
> something like this did come to pass, it'd use JSON.  Specifically, JSON 
> encoded in UTF-8.  You can read more about JSON here:
>    http://www.json.org/

JSON also fails to distinguish between types and field-selectors.
The difference here is that it fails to provide types for composite
objects.  The other two fail to provide field-selectors
and end up using types as stand-ins, but JSON does it the other
way around.

I *am* happy that it distinguishes between the types of primitive values,
though.  In ASN.1, if you have an object that contains up to three optional
integer components, and you have to be able to distinguish them, the
only way is to assign them different *types*, which you define to be
represented in the same way as integers.  There's no way for a parser
to know that they have to be parsed as integers if it doesn't have
the grammar.  In JSON you use different field-selectors.
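A two-line illustration of that difference (the field names are invented
for the example):

```python
import json

# The field-selectors identify which optional integer is which, with no
# external grammar needed; an ASN.1 decoder without the grammar could not
# even tell these were integers.
record = json.loads('{"width": 3, "depth": 7}')   # "height" simply omitted

for field in ("width", "height", "depth"):
    print(field, record.get(field))   # absent fields come back as None
```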

So I was thinking of a syntax something like

        object
                name{ members }
                {}

or even

        object
                optional-name{ members }
                {}

instead of

        object
                { members }
                {}

This would provide flexibility if you ever wanted to upgrade a file
format and use a different *kind* of (composite) value for some field
-- one which might otherwise be confused with the one in a previous
version.  You change, or provide, an explicit name for the type.

In principle, I suppose, this could be accomplished by adding a new
member like type:foo within existing JSON, but a name seems cleaner.
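For comparison, the type:foo workaround inside plain JSON would look
something like this (the "type" member and the "rect"/"circle" names are
hypothetical, purely for illustration):

```python
import json

# Dispatching on an explicit "type" member embedded in ordinary JSON,
# rather than on a named composite as in the syntax sketched above.
def area(obj):
    if obj["type"] == "rect":
        return obj["w"] * obj["h"]
    if obj["type"] == "circle":
        return 3.14159 * obj["r"] ** 2
    raise ValueError("unknown type: " + obj["type"])

shape = json.loads('{"type": "rect", "w": 4, "h": 5}')
print(area(shape))  # 20
```

It works, but the type tag is just another member, indistinguishable from
data unless every reader agrees on the convention -- which is why a name
in the syntax seems cleaner.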

> 
> 
> While proposing this I realized that, while this would fix all the 
> output of various commands, it wouldn't fix the commands that were 
> really just dumping *monotone* internal files--revisions, certs, 
> manifests, and the like.  Thus is my second proposal revealed: rework 
> *monotone*'s internal data structures using this data format.  This 
> would make the *monotone* engine itself easier to write and maintain, as 
> we wouldn't have N mini-libraries for reading/writing these N formats.  
> We'd just have one library we used for everything.  (For backwards 
> compatibility we could have the front-end massage the data it prints out 
> back into the old format upon request.)  Specifically, 
> revisions/certs/manifests would be stored as JSON, indented by one tab 
> (\t) per indention level, lines ended with \n (no \r), and children of 
> objects stored in sorted order.

If these files get large, it will become a burden that you can't browse
to the parts you're interested in without parsing the whole thing.
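(As an aside, the canonical layout Larry proposes -- one tab per level,
"\n" endings, children in sorted order -- is trivial to produce.  A sketch
with made-up revision fields:)

```python
import json

# json.dumps with a string indent and sort_keys gives exactly the proposed
# canonical form: tab-indented, "\n"-separated, keys sorted.
rev = {"old_revision": "def456", "new_manifest": "abc123"}  # invented fields
text = json.dumps(rev, indent="\t", sort_keys=True) + "\n"
print(text, end="")
```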
> 
> We're already breaking all the eggs when doing rosters; upgrading 
> existing databases will /already/ require rebuilding every relevant data 
> structure.   If we were ever going to consider a radical move like this, 
> it seems to me that now, /before/ rosters ship, is the best possible 
> time.  Breaking up the *monotone* monolithic executable would (will?) be 
> nice, but it can wait.
> 
> 
> 
> Undertaking any of the above would be a ton of work.  And I am sadly 
> /not/ volunteering to do it myself, or even contribute all that much 
> code to it.  (Though I am working up a reasonably-clever JSON library, 
> and I would love to help define the shape of the JSON data, particularly 
> the communications protocol between the front- and back-ends.)  I 
> realize that waltzing into an open-source project, vaguely sketching 
> castles in the air, then saying "now go build it guys!", easily strays 
> into boorishness.  So I apologize if I'm stepping on any toes.
> 
> 
> What do you think?

XML is also relatively easy to parse without knowing its data
type descriptor, provided you are happy leaving all the leaves of
the tree as strings, and are only concerned with syntax.

What makes XML so complicated (aside from its syntactic legacy as
an experimental text mark-up notation for the American
Association of Publishers way back in the '80s) is that it addresses
a number of issues:
        name clashes: What if you want to include foreign objects,
                which may use different meanings for the
                tags that you use?
        identity: You might want to indicate that several objects
                in a file are really the same object.
        ordering: You may want to specify that lists are ordered
                or unordered.
        repetition: When can you repeat a field-selector in an object?

All these are relatively unimportant, in the sense that an application
processing XML (or JSON) can make its own decisions and be coded
accordingly.  But where they become important is when you start
doing automated processing or validation of lots of different
XML-based file formats.  And if JSON becomes widespread, sooner
or later it's going to have to address issues like these.  But given
the greater simplicity of its syntax (no textual mark-up legacy), it
will probably be easier to address them.

-- hendrik




