Re: [Chicken-hackers] Distributed egg repo proposal


From: Imran Rafique
Subject: Re: [Chicken-hackers] Distributed egg repo proposal
Date: Thu, 17 Mar 2011 12:16:19 +0000

It's been interesting reading through this thread. I'd like to suggest
some amendments to Peter's proposals, addressing some of the concerns
highlighted. Specifically:

Felix:
> WE DON'T WANT BROKEN LINKS. As you correctly point out, a non-existent deep
> dependency will be horrible indeed (catastrophic, actually).  How to avoid
> broken links or missing eggs without a centralized repository I can't say.

= Back to basics =
In essence, we're trying to provide a single "blessed" tree of eggs.
In itself, this doesn't have to be in a VCS (although that's a useful
tool to maintain the tree in). All that's relevant is that, at the end,
we have a central, blessed tree which allows us to install eggs and
their respective docs - without potentially breaking because some
upstream repos are offline.

(By upstream, I mean the egg authors' own repos, where the main
development of their eggs occurs)

Requirements:
1. if you just want to install an egg (any version), you only need the
official blessed tree.
2. if I want to track development, where do I go to see the full
version history?
3. if I want to contribute, where do I go?
4. if some emergency bugfixes are needed, where do they go?
5. how easily can I submit an egg for addition?
6. how easily can I update an existing egg?

2 and 3 are effectively the same, solved by:
  * a pointer to a repo somewhere
  * a convenient one-liner to check out a working-copy or clone from that repo
  * a link to someone who "owns" that repo ("hey, I've got a great
patch. Can I have commit access / email it to you / chat about it?")

3 and 4 are *NOT* the same. Consider what happens when we have a new
chicken release, which might (hypothetically) break a dozen eggs
(messy in real life too!). Felix (or other dev) knows what the
breakages are, and knows what simple fixes are required, and where
(salmonella to the rescue).

With our current setup, he can just commit the fixes directly in the
svn egg repo. That solves the immediate problem. But he has to make
sure that those fixes also travel upstream to the main dev repo for
each egg (or else a new egg release will potentially re-introduce the
same problem, if those devs were still working with the previous
chicken version).

This is sort of manageable now, where everyone knows everybody. But
we're talking about building a system which can handle many many more
eggs from many many more contributors. This just won't scale.

We have the worst of both worlds right now. We're sort-of forcing all
egg contributors to use the existing svn repo - with all the hassle
that entails (getting commit access, importing sources, etc), while
also having to deal with more than 1 place for a bugfix to be pushed
to (so having that central repository doesn't really help, in the long
run).


= An amendment to Peter's proposal =
I've taken the Gentoo Portage system (in its simplest form), and tried
to see how a similar concept might translate to chicken's eggs.
Explaining in a little detail, for those who may be unfamiliar with
Portage ...


== The tree ==
A "blessed" tree of 'recipes', subdivided into categories (as we have
now), dictates how every egg is installed.

Our install tool, chicken-install, operates on a LOCAL copy of this
tree, which is first fetched (from the official egg tree repo at
call-cc.org) by:

    chicken-install --update-tree

Every time chicken-install is run, it should check if the local tree
is up to date, and warn if not.

The tree looks like this:

eggs/
  + networking/
    + my-wonderful-egg/
      + recipe-my-wonderful-egg-1.0
      + recipe-my-wonderful-egg-2.0
      + patches/
        + fix-for-chicken-4.9.patch

We have 1 recipe FOR EACH version of the egg, containing setup & meta
s-exprs and additional info (see below). Once obtained, this tree
works 'as is' for all versions of all eggs. You don't need to rewind
your local working copy of the tree just to install an older version
of an egg; you always want the latest version of the official tree.

Why a local tree? Because this makes it easier to overlay a personal
tree on top. In ~/.chicken-install, I can define a local overlay to be
applied on top of the official tree. I can pop my new egg recipes in
this local tree, and try them out, before submitting them upstream.
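
To make the overlay idea concrete, here is a rough sketch (in Python, purely for illustration; it is not how chicken-install works today). It assumes the tree layout shown above, <tree>/<category>/<egg>/recipe-<egg>-<version>, and the function name find_recipe is made up for this example. Trees are searched in priority order, so a personal overlay listed first shadows the official tree:

```python
from pathlib import Path
from typing import Optional, Sequence


def find_recipe(egg: str, version: str, trees: Sequence[Path]) -> Optional[Path]:
    """Return the first matching recipe, searching trees in priority order.

    Assumed layout: <tree>/<category>/<egg>/recipe-<egg>-<version>
    """
    name = f"recipe-{egg}-{version}"
    for tree in trees:
        # The category directory is not known up front, so match any of them.
        matches = sorted(tree.glob(f"*/{egg}/{name}"))
        if matches:
            return matches[0]
    return None
```

Calling find_recipe("my-wonderful-egg", "2.0", [overlay, official]) would return the overlay's recipe if one exists there, and fall back to the official tree otherwise.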

Also, in certain circumstances (like re-installing an egg whose
dependencies are all available locally) you don't need network
access. So why enforce that need?

As you read on, please remember that at no time are we importing
sources (or tarballs) from upstream into this tree. That's not
necessary in order to avoid broken dependencies.


= (1) Installing an egg =
chicken-install takes the recipe, and from that knows where to get the
files for that version. Wait a minute. Outside dependency?

Well, no. We have 2 stores (let's say *pkgs* and *local-pkgs*). *pkgs*
points to somewhere on call-cc.org, and *local-pkgs* (default #f) can
be redefined to some locally-available dir (by ~/.chicken-install).

Example recipe-my-wonderful-egg-2.0 :

    (meta
      (upstream-pkgs http://www.rafique.org/my-wonderful-egg/pkgs)
      ...)

    (get-files
      ; the 2nd arg, 'tgz, is optional
      ; if absent, it will be inferred from the file extension of the 1st arg
      (unpack "my-wonderful-egg-2.0.tgz" 'tgz)
      (unpack "required-data-5.1.tgz"))

    (get-docs
      (unpack "my-wonderful-docs-1.0.tgz")
      (apply-patches
        (unpack "my-wonderful-docs-update-2.0.tgz")))

    (setup ...)

chicken-install first looks in *local-pkgs* and then *pkgs* to see if
my-wonderful-egg-2.0.tgz exists. If it doesn't, then chicken-install
looks upstream (from upstream-pkgs). If chicken-install had to fetch
something from upstream-pkgs, then this is automatically copied to
*local-pkgs*. This gives rise to a number of possibilities:

* easy for egg tree maintainers to keep *pkgs* up to date (ensuring
that we don't have broken dependencies for versioned eggs). A cron job
checks if new egg recipes are available, does a `chicken-install
--fetch-only`, and then copies *local-pkgs* over to *pkgs* (it's the
only process which has write access to the global *pkgs*).

* we can have "recipe-my-wonderful-egg-latest", where chicken-install
grabs the sources from the upstream repo's HEAD.

* we can have "recipe-my-wonderful-egg-felix-2.0", pointing to Felix's
forked version of my egg (different info for "upstream")
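
The lookup order described above (*local-pkgs*, then *pkgs*, then upstream, with anything fetched from upstream cached in *local-pkgs*) could look roughly like this. It's a Python sketch under some simplifying assumptions: both stores are treated as plain directories (in reality *pkgs* lives on call-cc.org), *local-pkgs* is assumed to be set (the #f default is not handled), and fetch_upstream is a hypothetical callback that downloads a file from upstream-pkgs to a given path.

```python
from pathlib import Path
from typing import Callable


def locate_package(
    filename: str,
    local_pkgs: Path,
    pkgs: Path,
    fetch_upstream: Callable[[str, Path], None],
) -> Path:
    """Find a package file: *local-pkgs* first, then *pkgs*, then upstream."""
    for store in (local_pkgs, pkgs):
        candidate = store / filename
        if candidate.is_file():
            return candidate
    # Not in either store: fetch from upstream straight into *local-pkgs*,
    # so the next install (or the maintainers' cron job) finds it there.
    local_pkgs.mkdir(parents=True, exist_ok=True)
    target = local_pkgs / filename
    fetch_upstream(filename, target)
    return target
```

Upstream is only contacted when neither store has the file, which is the whole point: once a versioned tarball has landed in *pkgs*, the upstream server being offline no longer matters.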


= (2 & 3) Access to upstream source code =
We can add more upstream tags to (meta). Just thinking quickly here
(this is just a small re-ordering of what Peter has in his proposal):

    (meta
      (upstream-contact "<email>")
      (upstream-browse "<github_url>")
      (upstream-checkout "git clone ..."))

Now it's much easier to track the real source history of any egg:

    chicken-install --upstream-checkout

So, without going through the hassle of forcing egg authors to push
their sources to an egg svn repo (and either abandoning their existing
VCS, or keeping their personal VCS in sync with the egg svn), we still
have easy immediate access to the entire history of their egg's source
code.


= (4) Patches =
Ok, now back to that scenario I had outlined earlier. Felix wants to
fix a few eggs quickly, so that new users can use the latest chicken
release with the full suite of eggs. And upstream devs are
unavailable/unresponsive/slow. What does Felix do?

Felix (who has unfettered write access to the entire tree) adds a
patch to the patches/ dir, creates a new 2.0-r1 (or 2.0.1, as long as
we're consistent) recipe copied from 2.0, and adds the following:

    (get-files
      ...
      (apply-patches "fix-for-chicken-4.9.patch"))

chicken-install will look under patches/ for
"fix-for-chicken-4.9.patch", and apply it to the sources it already
got.
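
As a rough illustration of that step (a Python sketch, not a proposal for the actual implementation): resolve the patch name against the egg's patches/ dir and build a call to the standard patch tool. The function name patch_command is made up, and the -p1 strip level is an assumption about how the patches were generated.

```python
from pathlib import Path
from typing import List


def patch_command(recipe_dir: Path, patch_name: str, source_dir: Path) -> List[str]:
    """Build the `patch` invocation for one (apply-patches "...") entry."""
    patch_file = recipe_dir / "patches" / patch_name
    if not patch_file.is_file():
        raise FileNotFoundError(f"no such patch: {patch_file}")
    # -d: change into the unpacked sources before doing anything else
    # -p1: strip the leading path component (assumed patch format)
    # -i: read the patch from this file; made absolute because -d changes
    #     the working directory first
    return ["patch", "-d", str(source_dir), "-p1", "-i", str(patch_file.resolve())]
```

The returned list could be handed to subprocess.run(..., check=True) so that a patch which fails to apply aborts the install.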

We can go far beyond simple patches here, and maybe add other rules, like:
* (apply-rename ...) which will do a search & replace from one deprecated
function name to a new one.
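
A rename rule like that has to be a bit careful: Scheme identifiers contain characters such as - and !, so a naive search & replace would also hit my-old-proc or old-proc!. Here's a minimal Python sketch of the idea. The identifier character set is a simplification, the names old-proc and new-proc are placeholders, and this version does not skip strings or comments.

```python
import re

# Characters allowed inside a Scheme identifier (simplified).
_IDENT_CHARS = r"A-Za-z0-9!$%&*/:<=>?^_~+\-.@"


def apply_rename(source: str, old: str, new: str) -> str:
    """Replace whole-identifier occurrences of `old` with `new`."""
    pattern = re.compile(
        rf"(?<![{_IDENT_CHARS}]){re.escape(old)}(?![{_IDENT_CHARS}])"
    )
    # A function replacement avoids backslash processing in `new`.
    return pattern.sub(lambda _match: new, source)
```

For example, apply_rename("(old-proc x) (my-old-proc y)", "old-proc", "new-proc") rewrites only the first call and leaves my-old-proc alone.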

Of course, the patch should still be pushed upstream (for the same
reasons as originally expressed above). But I think this makes it
easier (these patches stand out like a sore thumb - they won't be
easily overlooked). Also, depending upon the patch, it may still be
valid and apply cleanly to a newer version (if upstream didn't yet get
the patch) - so it can be reused (you'd be surprised how often in
gentoo's portage this is the case).


= (5 & 6) Making it easier for egg authors =
If we want to make egg submission as easy as possible for egg authors,
then instead of requiring them to provide tarballs for each release,
we could provide additional functionality to (get-files). Something
like this?

    (get-files
      (fetch-from-git 'tagXX))

(fetch-from-git) would use the already provided upstream info. It can
be used in (get-docs) as well (assuming the egg author has his docs
checked in, in chicken-ready format)

Again, we want to minimise potential dependency failures.
(fetch-from-git) should first look in *local-pkgs* and *pkgs* for
something like:

    <this-egg>.<from-git>.<tagXX>.tgz

Only if it doesn't find that tarball does it pull from git. Afterwards,
it creates the tarball and puts it into *local-pkgs*. Again, this makes
it easy for the maintainers to ensure that their egg tree is kept
dependency free (as much as possible).
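
Sketching this in Python, with several assumptions spelled out: "from-git" is taken to be a literal marker in the tarball name, both stores are plain directories, the export step is passed in as a hypothetical callback, and git_archive_command assumes an existing local clone of the upstream repo.

```python
from pathlib import Path
from typing import Callable, List


def cached_tarball_name(egg: str, tag: str) -> str:
    # Assumption: "from-git" is a literal marker in the file name.
    return f"{egg}.from-git.{tag}.tgz"


def git_archive_command(clone_dir: Path, tag: str, target: Path) -> List[str]:
    # Export the tree at `tag` from a local clone as a gzipped tarball.
    # The output path is made absolute because -C changes directory first.
    return ["git", "-C", str(clone_dir), "archive",
            "--format=tar.gz", "-o", str(target.resolve()), tag]


def fetch_from_git(
    egg: str,
    tag: str,
    local_pkgs: Path,
    pkgs: Path,
    export: Callable[[str, Path], None],
) -> Path:
    """Use a cached tarball if one exists; otherwise export from git and cache it."""
    name = cached_tarball_name(egg, tag)
    for store in (local_pkgs, pkgs):
        if (store / name).is_file():
            return store / name
    local_pkgs.mkdir(parents=True, exist_ok=True)
    target = local_pkgs / name
    export(tag, target)  # only reached when neither store has the tarball
    return target
```

The export callback could simply run the list from git_archive_command through subprocess.run(..., check=True).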

Now, releasing a new version of my egg is as simple as:
a) copying the last recipe file, changing the version number in the filename
b) changing the tag symbol
c) sending the new recipe to the egg tree maintainers (or adding it
myself, if I have write access)
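
Steps a) and b) are mechanical enough to script. A small Python sketch, assuming the recipe file naming used above (recipe-<egg>-<version>) and a recipe that contains a (fetch-from-git '<tag>) form; the helper name new_release is made up:

```python
from pathlib import Path


def new_release(last_recipe: Path, egg: str, new_version: str,
                old_tag: str, new_tag: str) -> Path:
    """Copy the last recipe under a new version and point it at a new tag."""
    text = last_recipe.read_text()
    old_form = f"(fetch-from-git '{old_tag})"
    if old_form not in text:
        raise ValueError(f"{last_recipe} does not fetch tag {old_tag}")
    new_recipe = last_recipe.with_name(f"recipe-{egg}-{new_version}")
    if new_recipe.exists():
        raise FileExistsError(new_recipe)
    # a) new file name carries the new version, b) the tag symbol is swapped
    new_recipe.write_text(text.replace(old_form, f"(fetch-from-git '{new_tag})"))
    return new_recipe
```

Step c), getting the new recipe to the egg tree maintainers, is left out on purpose.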


--
Regards,
       Imran Rafique


