octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new strsplit function


From: Ben Abbott
Subject: Re: new strsplit function
Date: Sat, 20 Apr 2013 14:38:17 -0400

On Apr 3, 2013, at 8:44 AM, Ben Abbott wrote:

> On Apr 2, 2013, at 11:42 PM, John W. Eaton wrote:
> 
>> On 04/02/2013 07:40 PM, Ben Abbott wrote:
>> 
>>> On Apr 2, 2013, at 6:35 PM, Carnë Draug wrote:
>>> 
>>>> On 2 April 2013 18:04, Ben Abbott<address@hidden>  wrote:
>>>> 
>>>>> On Apr 2, 2013, at 12:00 PM, Carnë Draug wrote:
>>>>> 
>>>>>> On 2 April 2013 13:02, Ben Abbott<address@hidden>  wrote:
>>>>>>> I've added a "conventional" to the possible delimitertypes, and 
>>>>>>> selected this as the default when all delimiters are scalar characters. 
>>>>>>> I've also enabled 2D character array inputs. The following replacements 
>>>>>>> may be made into OF, and Octave core, to use Jaroslav's implementation. 
>>>>>>>  A changeset to core to use these substitutions will also be needed.
>>>>>>> 
>>>>>>> - strsplit (str, del)
>>>>>>> + strsplit (str, del, "collapsedelimiters", false, "delimitertype", 
>>>>>>> "conventional")
>>>>>>> 
>>>>>>> - strsplit (str, del, false)
>>>>>>> + strsplit (str, del, "collapsedelimiters", false, "delimitertype", 
>>>>>>> "conventional")
>>>>>>> 
>>>>>>> - strsplit (str, del, true)
>>>>>>> + strsplit (str, del, "collapsedelimiters", true, "delimitertype", 
>>>>>>> "conventional")
>>>>>> 
>>>>>> That means that OF packages with this change, will only work after 3.8
>>>>>> being released, and the ones without the change will stop working once
>>>>>> 3.8 is released, so the releases would have to be timed.
>>>>>> 
>>>>>> Instead, I propose we add the old strsplit in the private directory of
>>>>>> each package. Then, once 3.8 gets released, we can remove them and
>>>>>> make the change as each package gets re-released.
>>>>> 
>>>>> That's a good idea!
>>>> 
>>>> I have just made the change on revision 11779. I have also created a
>>>> new task on the trackers https://savannah.gnu.org/task/index.php?12561
>>>> so we don't forget what packages got affected.
>>>> 
>>>> Finally, geometry and mechanics package need to be fixed some other
>>>> way since they make use of it on PKG_DEL and PKD_ADD scripts.
>>>> 
>>>> Carnë
>>> 
>>> I pushed a slightly modified changeset.
>>> 
>>>     http://hg.savannah.gnu.org/hgweb/octave/rev/5be43435bd5b
>> 
>> In a recent changeset, you made the following change:
>> 
>> -    list = strsplit (list, " \n\t", true);
>> +    list = strsplit (list, " \n\t", true, "delimitertype", "legacy");
>> 
>> If people will have to make a change in their code to recover previous 
>> behavior does is really help to introduce a new option like this?  Isn't 
>> this equivalent to
>> 
>> list = strsplit (list, {" ", "\n", "\t"})
>> 
>> ?  Why not just tell people to make this change instead of introducing the 
>> "legacy" delimiter type?
> 
> The "legacy" delimiter type was introduced because Jaroslav's implementation 
> was roughly 30x faster than the regexp approach.  In many instances the slow 
> down probably isn't a big deal, but for strread / textscan the slow down 
> would be a heavy burden.
> 
>> Also, the new strsplit does not seem to be working correctly for 
>> single-quoted strings that contain escape sequences.  I think that
>> 
>> strsplit ("foo\tbar", '\t')
>> 
>> should split on the TAB, but it is currently returning the original string 
>> for me.
> 
> Opps.  I hadn't considered single quoted strings.  Your example does work in 
> Matlab.  How is this sort of thing handled in other places?  It is sufficient 
> to just ...
> 
>       delimiter = sprintf (delimiter) ?
> 
>> This is starting to look a bit messy.  Maybe we should revert this change 
>> until we have a better plan for the transition.  Maybe it would be good to 
>> give people some advance warning before making this change all at once.
> 
> 
> Sorry, I hadn't considered that.  I say Jordi's request for someone to take 
> this on, and it looked like a short project I could find the time for.
> 
>       
> https://mailman.cae.wisc.edu/pipermail/octave-maintainers/2013-March/032734.html
> 
> To revert, the following three changesets need to be included.
> 
>       http://hg.savannah.gnu.org/hgweb/octave/rev/1de4ec2a856d
> 
>       http://hg.savannah.gnu.org/hgweb/octave/rev/5be43435bd5b
> 
>       http://hg.savannah.gnu.org/hgweb/octave/rev/61989cde13ae
> 
> Ben

All,

The attached changeset adds sq-string support for "simple" delimiters.  Unless, 
I've missed something else, that should make strsplit() fully Matlab compatible.

Regarding the "legacy" delimiter type, jwe made a good point.  I also 
sympathised with Rik/Philip suggestion to preserve the original algorithm for 
its speed advantage.   Personally, I'm inclined to leave "legacy" in place, but 
can be easily swayed to revert that.

Comments?

Ben

Attachment: changeset.patch
Description: Binary data





reply via email to

[Prev in Thread] Current Thread [Next in Thread]