On Sat, Oct 18, 2008 at 9:10 PM, Paolo Missier <address@hidden> wrote:
the need for sanity checks of myExperiment workflows has been raised before, and not only in the context of the so-called "workflow decay monitor"; indeed, I have long been an advocate of "packing" all the necessary test data along with the workflow.
thanks for the comments.
But I was thinking about a more rigorous section for tests, as I described in a previous mail.
I believe that testing is very important in scientific programming; it is good practice, and something that should always be required when publishing the results of an experiment that made use of a bioinformatics tool.
For the moment, maybe, you could just add a temporary 'Tests and Controls' section to the detailed view, so people will be able to describe in words which tests they have made.
Well... can you please give me some other comments on the other mails I wrote in this thread? You seem to understand the problem very well.
Adding exception management to software components is also good software-engineering practice in general, but we can't expect users to do that systematically by adding new processors, and as you (Paul) point out, this may reduce the readability of the process. On the other hand, provider-supplied error messages may not be informative enough.
Most experimental scientists are more familiar with the concepts of testing and designing controls than with programming.
So I don't think it will be too difficult for them to understand it as a new feature for myExperiment. Actually, it could be a starting point for very interesting discussions.
You don't have to put a sanity check on every input if you think it will complicate the workflow unnecessarily. But I believe there should be at least some description of which of the user's inputs are checked and how, and which are not.
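To make the idea concrete, here is a minimal sketch of what such an input sanity check could look like as a small shim run before a workflow hands a user-supplied identifier to a downstream service. The function name and the accession pattern are illustrative assumptions (the regex below covers only one simplified form of UniProt-style protein accessions, e.g. "P68871"); a real workflow would use whatever pattern its service actually expects.

```python
import re

# Simplified, hypothetical pattern for one form of protein accession
# (e.g. "P68871"); real accession grammars are broader than this.
PROTEIN_ACCESSION = re.compile(r"^[OPQ][0-9][A-Z0-9]{3}[0-9]$")

def check_protein_id(value: str) -> str:
    """Fail fast with a clear message instead of a cryptic service error."""
    if not PROTEIN_ACCESSION.match(value):
        raise ValueError(
            f"{value!r} does not look like a protein accession; "
            "did you supply a gene identifier instead?"
        )
    return value

check_protein_id("P68871")     # passes silently
# check_protein_id("BRCA2")    # would raise ValueError (gene symbol, not a protein accession)
```

The point is that the check stays small and self-describing, so documenting "this input is validated against an accession pattern" costs one line, not a whole sub-workflow.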
Tests in a bioinformatics protocol are not only about validating inputs and outputs; there are many options, one of which, for example, is to use test data.
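The "packed test data" idea mentioned above can be sketched very simply: store a few known input/output pairs alongside the workflow, and let a small harness re-run the workflow on each input and compare the results. Everything here is a toy illustration; `run_workflow` merely stands in for whatever actually executes the workflow, and the GC-counting behaviour and test pairs are invented for the example.

```python
def run_workflow(sequence: str) -> int:
    # Toy stand-in "workflow": counts G/C bases in a DNA sequence.
    return sum(1 for base in sequence if base in "GC")

# Hypothetical test pairs that would ship with the workflow on myExperiment.
PACKED_TESTS = [
    ("GATTACA", 2),
    ("GGCC", 4),
]

def check_workflow() -> bool:
    """True while every packed test still produces its recorded output."""
    return all(run_workflow(inp) == expected for inp, expected in PACKED_TESTS)

print(check_workflow())  # True
```

A harness like this is also what a "decay monitor" could run periodically: the moment a service changes its behaviour, a packed test fails and the workflow's page can flag it.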
I would prefer to choose a workflow by looking at its inputs and outputs, and at the tests that are made, rather than at whether it is complicated or not.
it seems to me that there was a bug in the workflow (but I am not sure), which only came up because the output "didn't look right" to me (and I barely understand SNPs etc.).
Having test I/O available would have (a) spotted this and (b) avoided having me bother people to verify it.
So I believe this is indeed an important issue...
I agree with you... unfortunately, for historical reasons, many computational biologists are not used to these concepts. It would be good if myExperiment could play a role in educating people about this issue.
Paul Fisher wrote:
I see your point, but this will in most cases be handled directly by the service provider. An example is supplying a gene identifier when a protein identifier is needed. The service would recognise that you have put the wrong ID in, as it simply won't return any results, or will return an error stating that the input was incorrect. This would be a service-side means of error checking. On the other hand you COULD add in error checks for the entire workflow, but for an example of:
you can quickly see that the size of the workflow will become incredibly large if each input is to be checked before it is passed to the next service. Given that people choose to re-use not only on "if it works" but also on the size of the workflow, among many other things, this may result in workflows no longer being used because they are too big to understand.
Perhaps this discussion should move to the Taverna-Users list instead, as a feature for Taverna or a workflow best practice, though?!
I do know that a workflow decay monitor has been in production, and there may be plans for its integration into other projects, such as BioCatalogue. Not that I want to speculate here,