[Monotone-devel] Text under revision control
From: hendrik
Subject: [Monotone-devel] Text under revision control
Date: Wed, 25 Feb 2009 19:01:34 -0500
User-agent: Mutt/1.5.13 (2006-08-11)

On Thu, Feb 26, 2009 at 11:16:35AM +1100, Daniel Carosone wrote:
> On Thu, Feb 26, 2009 at 12:09:45AM +0100, Philipp Gröschler wrote:
> > Philipp Gröschler wrote:
> > > In the course of the current Mini Summit I spent the afternoon hacking
> > > on a (yet still) small XSLT file whose purpose will be the conversion of
> > > Monotone's Texinfo Documentation to a set of multiple files which can be
> > > used for the Wiki.
> > > ....
> >
> > I just committed the first release of this thing, in a very *pre-alpha*
> > state.
>
> I saw the commits before this thread, and was curious what you were up to.
> Alas, I missed the mini-summit this time.
>
> But - excellent!
>
> As far as output format goes, mdwn or others can be dealt with by
> ikiwiki. The limitations there are around some of the more specific
> semantic markup: noting that this represents a command, or an option,
> or a literal vs a variable, and getting this information through to
> the point where CSS can render it with visual distinctions.
>
> Markdown offers some basic notations, and the opportunity to revert to
> html elements for more detailed cases, but this can be a little
> disruptive as a document author writing a wiki page (it's a sudden
> shift from minimal to more extensive internal markup). That is much
> less an issue if, at least in the first phases, we're talking about
> keeping the source in texinfo and rendering to something that ikiwiki
> can consume to produce a better-integrated output on the website
> (indexing, etc).
>
> These are good examples of the discontinuity, by the way, because many
> of these element types native to texinfo are focused on software
> documentation, where markdown is more focused on general writing.
>
> Longer term, we need to develop a strategy for more unified
> documentation. That may involve changing the markup source for some
> components, and potentially integrating your work into ikiwiki
> (allowing it to read essentially another markup input language). It
> almost certainly involves unifying the stylesheet, both in terms of
> the output rendering and the selection of styles available.
>
> It also would involve allowing the creation of narrative navigation
> paths through the page collection, both as a reading guide online and
> to structure the generation of offline formats (e.g. PDF output of
> documents similar to the current manual, in an organised sequence of
> chapters and sections).
>
> This means we'll have pages on the site intended for different
> purposes, generated from a number of mechanisms (including automatic
> aggregation via some of ikiwiki's tricks), and potentially from
> sources in different markup styles.
>
> The great thing about this work is it (begins to) breaks the coupling
> between purpose and style, which means content can be used for
> multiple purposes regardless of style, in turn meaning that
> "unification" doesn't get confused with "markup conversion".
>
> So, really, yay, and yay again.
>
> --
> Dan.

There's a real need for a document file format that

(1) behaves well with version-control software (VCS) (i.e., independent
    changes are likely to be treated as such during merging and other
    operations),
(2) follows international standard notations, or is easily converted
    to them, and
(3) is easily converted to popular file formats, and to the file
    formats publishers demand (such as LaTeX, PDF, and Word).

I'd like to open the discussion on whether this involves innovation in
the VCS or in the file formats, and what kind of innovation is needed.

Part of the problem is that a VCS usually treats change as being
insertion, deletion, and possibly movement of lines. Word-processors
usually treat the division of text into lines either as completely fluid
(resulting in lots of spurious changes in the eyes of the VCS) or as
absent from the file format (resulting in entire paragraphs being single
lines).

In either case, changes like single-character typo-fixes are promoted
into paragraph replacements, and independent typo-fixes become
conflicting changes.
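
The effect can be reproduced with a small Python sketch. It is only an illustration built on the standard `difflib` module, not how any particular VCS merges; the sample sentence and the helper name `touched` are made up for the example. It records which units of a base text each of two independent typo-fixes modifies, first counting lines as units and then words, and checks whether the two fixes overlap.

```python
import difflib

def touched(old_units, new_units):
    """Indices of old_units that a diff against new_units would modify."""
    matcher = difflib.SequenceMatcher(a=old_units, b=new_units, autojunk=False)
    hit = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            hit.update(range(i1, i2))
    return hit

# A whole paragraph stored as a single line, with two typos in it.
base = "The quick brown fox jumsp over the lazy dog. It then runs awya."
fix_a = base.replace("jumsp", "jumps")  # one author's typo-fix
fix_b = base.replace("awya", "away")    # another author's independent fix

# Units are lines: both fixes modify the same (only) line.
line_overlap = (touched(base.splitlines(), fix_a.splitlines())
                & touched(base.splitlines(), fix_b.splitlines()))

# Units are words: each fix modifies a different word.
word_overlap = (touched(base.split(), fix_a.split())
                & touched(base.split(), fix_b.split()))

print(line_overlap)  # {0}
print(word_overlap)  # set()
```

With lines as the unit, both fixes claim line 0 and a line-based merge would have to flag a conflict; with words as the unit the two fixes touch disjoint positions.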

Further, word-processors often compress their files, resulting in
complete loss of structure for the VCS. I had had hopes for .odt files
(at last there was a real standard), but zipping them (which is the
standard) turns them into binary gibberish. .fodt files (the same XML
stuff, but packaged into one text file instead of zipped into
gibberish) could be better, but here the entire text of the document
seems to end up being one single line.
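
As a rough sketch of what "unpacking" could look like, the Python below reads `content.xml` out of an .odt archive (an .odt file is a zip package, and `content.xml` is where the document body lives) and breaks the XML wherever two tags touch, so that a line-based tool has something to work with. The function name `odt_content_lines` is invented for this example, the archive built in the demo is a tiny stand-in and not a valid ODF document, and the added line breaks are meant for diffing only, not for writing back into the file.

```python
import io
import re
import zipfile

def odt_content_lines(odt_bytes):
    """Extract content.xml from an .odt archive, one tag boundary per line."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        xml = archive.read("content.xml").decode("utf-8")
    # Insert a newline only where two tags are directly adjacent.
    return re.sub(r"><", ">\n<", xml).splitlines()

# Stand-in archive for the demo (hypothetical content, not real ODF).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "content.xml",
        "<doc><p>First paragraph.</p><p>Second paragraph.</p></doc>",
    )

lines = odt_content_lines(buffer.getvalue())
for line in lines:
    print(line)
```

Each paragraph now sits on its own line, so an edit inside one paragraph no longer shows up as a change to the whole document. A single long paragraph is still a single line, though.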

Now the VCS could use a different difference algorithm when processing
them. Or it could unpack them into something easier to process (like a
sequence of words instead of lines). Or the word-processor could use a
better file-format, or be careful to preserve the locations of the
meaningless line breaks in the file, or insert many of them in standard
places (such as sentence breaks, or punctuation, or between every two
words).
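
One of these options, putting line breaks in standard places, is easy to sketch in Python. The version below breaks after every sentence. It rests on a deliberately naive assumption, namely that a sentence ends at '.', '!' or '?' followed by whitespace, so abbreviations such as "e.g." will be split wrongly; the function name is made up for the example.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def one_sentence_per_line(paragraph):
    """Reflow a paragraph so that each sentence sits on its own line."""
    flowed = " ".join(paragraph.split())  # undo any existing line wrapping
    return _SENTENCE_END.split(flowed)

wrapped_one_way = "It rains. We stay\nin! Why not?"
wrapped_another = "It rains. We\nstay in! Why not?"

print(one_sentence_per_line(wrapped_one_way))
# ['It rains.', 'We stay in!', 'Why not?']
```

Because the result depends only on the words and punctuation, not on where the editor happened to wrap, re-flowing a paragraph produces no spurious changes, and a typo-fix touches only the line holding its sentence.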

But until some such VCS-compatible file format becomes well-established,
and easy to convert to other standard forms (and nonstandard forms like
Word), it's the VCSs that will have to deal, if they are to be widely
used for managing frequently-edited text.

We're trying to cobble together disparate systems to get something that
works for now. A kind of "Documents in our time", perhaps. It's a
start. I've cobbled together some stuff for my own use, too, but I
wouldn't want to foist it on others. Its main virtue is that when I
need further features I can hack together further code. The C++ code
that translates it is almost part of the document, and no, I don't think
that's an appropriate style to propagate.

Now there are file formats that meet some of the technical
requirements. Almost anything with explicit markup that's edited by
emacs will do, as long as emacs isn't made to flow words from one line
to the next, and the human wielding the editor knows not to do this.
But few have the necessary social acceptance and easy convertibility to
and from the other formats.

-- hendrik