Subject: Re: New strsplit function
From: Ben Abbott
Date: Thu, 16 May 2013 14:19:38 +0800
On May 16, 2013, at 1:39 PM, John W. Eaton wrote:
> I received a report that the new strsplit function doesn't match
> Matlab behavior for the following input. I looked at fixing it, but
> I'm afraid I'll screw something else up because of the fairly complex
> interactions among all the different options (legacy,
> collapsedelimiters, etc.). Here's the simple test case:
>
> With Matlab 2013a:
>
> matlab> sgeQueryStr = '::'
>
> sgeQueryStr =
>
> ::
>
> matlab> splitStr = strsplit(deblank(sgeQueryStr), ':')
>
> splitStr =
>
> '' ''
>
> matlab> length(splitStr)
>
> ans =
>
> 2
>
>
> So, what's the proper fix?
>
> Also, I think that Matlab is saying that a delimiter at the beginning
> of a string generates an empty result, but one at the end does not.
> Before the recent changes to strsplit, Octave would return three empty
> strings for this case. So should we consider that a bug in Octave?
> If not, how do we preserve old behavior and also get Matlab
> compatibility right in this case?
>
> If we can't do both, maybe we should just abandon the "legacy"
> behavior in our current strsplit function? If we do that, I suppose
> we could distribute the old version as ostrsplit for a release or two.
>
> jwe
hmmm ... I took a look at Matlab 2013a. It's not clear to me that we'd want to
copy this.
matlab> strsplit('', 'a')
ans =
{''}
matlab> strsplit('a', 'a')
ans =
'' ''
matlab> strsplit('aa', 'a')
ans =
'' ''
matlab> strsplit('aaa', 'a')
ans =
'' ''
matlab> strsplit('aaaa', 'a')
ans =
'' ''
matlab> strsplit ('abc', {'a','b','c'})
ans =
'' ''
In case it isn't clear, the output is a cellstring containing two empty strings.
The Matlab docs (http://www.mathworks.com/help/matlab/ref/strsplit.html) say
that consecutive delimiters are collapsed by default, which means the
documented behavior is to return {''} in each case above. If I had to guess,
I'd say Matlab's first attempt at strsplit () has a bug. Either that, or the
documentation is wrong.
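To compare the cases above against a concrete rule, here is a minimal Python sketch of one possible reading of "consecutive delimiters are collapsed": a run of delimiters acts as a single delimiter, while leading and trailing runs still produce empty fields. The helper name split_sketch is hypothetical and built on Python's re.split; it is not Octave's or Matlab's strsplit, and it is only one interpretation of the documentation.

```python
import re


def split_sketch(s, delimiters, collapse=True):
    """Split s on any of the given literal delimiters.

    Hypothetical helper for illustration only; not Octave's or Matlab's
    strsplit. With collapse=True, a run of consecutive delimiters is
    treated as one delimiter; leading/trailing runs still yield ''.
    """
    if isinstance(delimiters, str):
        delimiters = [delimiters]
    # Escape each delimiter literally; try longer ones first in the alternation.
    ordered = sorted(delimiters, key=len, reverse=True)
    alternatives = "|".join(re.escape(d) for d in ordered)
    # One or more delimiters in a row when collapsing, exactly one otherwise.
    pattern = f"(?:{alternatives})+" if collapse else alternatives
    return re.split(pattern, s)


print(split_sketch("::", ":"))                  # collapsed
print(split_sketch("::", ":", collapse=False))  # not collapsed
```

Under this reading, collapsing gives two empty strings for '::', 'a', 'aa', 'aaaa', and 'abc' with {'a','b','c'}, and a single empty string for ''. Without collapsing, '::' gives three empty strings, as Octave returned before the recent changes.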
In either event, I'm OK with preserving the original strsplit as a separate
file. Do you prefer ostrsplit.m, or (for consistency with cstrcat.m) should we
go with cstrsplit.m?
What is the best way to re-introduce and rename the original version? Is there a
Mercurial trick that will do that?
Ben