
Re: Automated testing for users' LilyPond collections with new development versions


From: Jean Abou Samra
Subject: Re: Automated testing for users' LilyPond collections with new development versions
Date: Wed, 30 Nov 2022 23:44:02 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0

On 28/11/2022 at 23:49, Karlin High wrote:
This message intrigued me:

<https://lists.gnu.org/archive/html/lilypond-devel/2022-11/msg00222.html>

In it, Eric Benson reported a setup that allows testing new versions of LilyPond on a sizable body of work in a somewhat automated fashion.

Now, could automation like that also make use of the infrastructure for LilyPond's regression tests?

<http://lilypond.org/doc/v2.23/Documentation/contributor/regtest-comparison>

What effort/value would there be in making an enhanced convert-ly tool that tests a new version of LilyPond on a user's entire collection of work, reporting differences between old and new versions in performance and output?

Enabling something like this:

* New release of LilyPond comes out. Please test.

* Advanced users with large collections of LilyPond files do the equivalent of "make test-baseline," but for their collection instead of LilyPond's regtests. Elapsed time is recorded, along with CPU and RAM information as appropriate.

* The new LilyPond gets installed.

* An upgrade script runs convert-ly on the collection, first offering a backup via convert-ly options or a tarball.

* The equivalent of "make check" runs.

* A report is generated, optionally as an email to lilypond-devel, with a summary of regression-test differences and old-vs-new elapsed times.

Ideally, this could quickly produce lots of good testing information for development versions of LilyPond, in a way that encourages user participation; a rough sketch of such a script follows below.
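To make the idea concrete, here is a rough sketch of what such an upgrade-and-compare script could look like. Nothing in it is an existing tool: the collection path, work directory, baseline file, and report format are all hypothetical, and it assumes that convert-ly and the new lilypond binary are on PATH. It copies the collection aside (the tarball-style backup), runs convert-ly -e on the copies, rebuilds each file with the new version, and compares elapsed time against a previously recorded baseline.

#!/usr/bin/env python3
"""Hypothetical sketch: rebuild a personal .ly collection with a new
LilyPond version and compare build times against a recorded baseline."""
import json
import shutil
import subprocess
import time
from pathlib import Path

COLLECTION = Path("~/lilypond-scores").expanduser()   # hypothetical location
WORKDIR = Path("/tmp/lily-upgrade-test")              # copies are built here
BASELINE = COLLECTION / "baseline-times.json"         # recorded by an earlier run

def build(ly_file: Path) -> float:
    """Run lilypond on one file and return the elapsed wall-clock time."""
    start = time.monotonic()
    subprocess.run(["lilypond", "-o", str(ly_file.parent), str(ly_file)],
                   check=True, capture_output=True)
    return time.monotonic() - start

def main() -> None:
    # Tarball-style backup: work on a copy, never on the originals.
    if WORKDIR.exists():
        shutil.rmtree(WORKDIR)
    shutil.copytree(COLLECTION, WORKDIR)

    baseline = json.loads(BASELINE.read_text()) if BASELINE.exists() else {}
    report = []
    for ly in sorted(WORKDIR.rglob("*.ly")):
        rel = str(ly.relative_to(WORKDIR))
        # Update the copy in place to the new syntax.
        subprocess.run(["convert-ly", "-e", str(ly)],
                       check=True, capture_output=True)
        try:
            elapsed = build(ly)
        except subprocess.CalledProcessError:
            report.append(f"{rel}: FAILED to compile")
            continue
        old = baseline.get(rel)
        delta = f"{elapsed - old:+.1f}s vs baseline" if old else "no baseline"
        report.append(f"{rel}: {elapsed:.1f}s ({delta})")

    print("\n".join(report))   # could also be mailed to lilypond-devel

if __name__ == "__main__":
    main()

A first run under the old version could record the same timings into baseline-times.json; comparing the rendered output itself would still need something along the lines of the regtest comparison.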




How much work: I don't know. Nonzero, probably not big.

Keep in mind, however, that changes generating lots of small differences land on a regular basis, so you are likely to get mostly noise from a comparison like this. You can really only do it between consecutive unstable releases: if you compare the last stable release with the current unstable release (assuming a few unstable releases have passed since the stable one), the noise will likely be overwhelming. For this reason, the testers need to be really dedicated.
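For the output side, here is a very small sketch of the kind of comparison a tester could run between two renderings, independent of LilyPond's own regtest machinery. It assumes the Pillow library and two hypothetical directories of PNG pages produced by the old and new versions, and simply counts changed pixels per page so that a threshold can separate the small systematic differences mentioned above from real breakage.

#!/usr/bin/env python3
"""Hypothetical sketch: count changed pixels between PNG pages rendered
by two LilyPond versions, to separate small noise from real breakage."""
from pathlib import Path
from PIL import Image, ImageChops   # assumes Pillow is installed

OLD_DIR = Path("render-old")   # hypothetical output of the previous version
NEW_DIR = Path("render-new")   # hypothetical output of the new version
THRESHOLD = 500                # changed-pixel count worth a human look

def changed_pixels(old_png: Path, new_png: Path) -> int:
    """Return the number of pixels that differ between two images."""
    a = Image.open(old_png).convert("L")
    b = Image.open(new_png).convert("L")
    if a.size != b.size:
        return a.size[0] * a.size[1]   # treat a size change as fully different
    diff = ImageChops.difference(a, b)
    return sum(1 for px in diff.getdata() if px)

for old_png in sorted(OLD_DIR.glob("*.png")):
    new_png = NEW_DIR / old_png.name
    if not new_png.exists():
        print(f"{old_png.name}: missing in new output")
        continue
    n = changed_pixels(old_png, new_png)
    flag = "CHECK" if n > THRESHOLD else "noise"
    print(f"{old_png.name}: {n} pixels changed [{flag}]")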

Best,
Jean


