[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: new strsplit function
From: |
Ben Abbott |
Subject: |
Re: new strsplit function |
Date: |
Sat, 20 Apr 2013 14:38:17 -0400 |
On Apr 3, 2013, at 8:44 AM, Ben Abbott wrote:
> On Apr 2, 2013, at 11:42 PM, John W. Eaton wrote:
>
>> On 04/02/2013 07:40 PM, Ben Abbott wrote:
>>
>>> On Apr 2, 2013, at 6:35 PM, Carnë Draug wrote:
>>>
>>>> On 2 April 2013 18:04, Ben Abbott<address@hidden> wrote:
>>>>
>>>>> On Apr 2, 2013, at 12:00 PM, Carnë Draug wrote:
>>>>>
>>>>>> On 2 April 2013 13:02, Ben Abbott<address@hidden> wrote:
>>>>>>> I've added a "conventional" to the possible delimitertypes, and
>>>>>>> selected this as the default when all delimiters are scalar characters.
>>>>>>> I've also enabled 2D character array inputs. The following replacements
>>>>>>> may be made into OF, and Octave core, to use Jaroslav's implementation.
>>>>>>> A changeset to core to use these substitutions will also be needed.
>>>>>>>
>>>>>>> - strsplit (str, del)
>>>>>>> + strsplit (str, del, "collapsedelimiters", false, "delimitertype",
>>>>>>> "conventional")
>>>>>>>
>>>>>>> - strsplit (str, del, false)
>>>>>>> + strsplit (str, del, "collapsedelimiters", false, "delimitertype",
>>>>>>> "conventional")
>>>>>>>
>>>>>>> - strsplit (str, del, true)
>>>>>>> + strsplit (str, del, "collapsedelimiters", true, "delimitertype",
>>>>>>> "conventional")
>>>>>>
>>>>>> That means that OF packages with this change, will only work after 3.8
>>>>>> being released, and the ones without the change will stop working once
>>>>>> 3.8 is released, so the releases would have to be timed.
>>>>>>
>>>>>> Instead, I propose we add the old strsplit in the private directory of
>>>>>> each package. Then, once 3.8 gets released, we can remove them and
>>>>>> make the change as each package gets re-released.
>>>>>
>>>>> That's a good idea!
>>>>
>>>> I have just made the change on revision 11779. I have also created a
>>>> new task on the trackers https://savannah.gnu.org/task/index.php?12561
>>>> so we don't forget what packages got affected.
>>>>
>>>> Finally, geometry and mechanics package need to be fixed some other
>>>> way since they make use of it on PKG_DEL and PKD_ADD scripts.
>>>>
>>>> Carnë
>>>
>>> I pushed a slightly modified changeset.
>>>
>>> http://hg.savannah.gnu.org/hgweb/octave/rev/5be43435bd5b
>>
>> In a recent changeset, you made the following change:
>>
>> - list = strsplit (list, " \n\t", true);
>> + list = strsplit (list, " \n\t", true, "delimitertype", "legacy");
>>
>> If people will have to make a change in their code to recover previous
>> behavior does is really help to introduce a new option like this? Isn't
>> this equivalent to
>>
>> list = strsplit (list, {" ", "\n", "\t"})
>>
>> ? Why not just tell people to make this change instead of introducing the
>> "legacy" delimiter type?
>
> The "legacy" delimiter type was introduced because Jaroslav's implementation
> was roughly 30x faster than the regexp approach. In many instances the slow
> down probably isn't a big deal, but for strread / textscan the slow down
> would be a heavy burden.
>
>> Also, the new strsplit does not seem to be working correctly for
>> single-quoted strings that contain escape sequences. I think that
>>
>> strsplit ("foo\tbar", '\t')
>>
>> should split on the TAB, but it is currently returning the original string
>> for me.
>
> Opps. I hadn't considered single quoted strings. Your example does work in
> Matlab. How is this sort of thing handled in other places? It is sufficient
> to just ...
>
> delimiter = sprintf (delimiter) ?
>
>> This is starting to look a bit messy. Maybe we should revert this change
>> until we have a better plan for the transition. Maybe it would be good to
>> give people some advance warning before making this change all at once.
>
>
> Sorry, I hadn't considered that. I say Jordi's request for someone to take
> this on, and it looked like a short project I could find the time for.
>
>
> https://mailman.cae.wisc.edu/pipermail/octave-maintainers/2013-March/032734.html
>
> To revert, the following three changesets need to be included.
>
> http://hg.savannah.gnu.org/hgweb/octave/rev/1de4ec2a856d
>
> http://hg.savannah.gnu.org/hgweb/octave/rev/5be43435bd5b
>
> http://hg.savannah.gnu.org/hgweb/octave/rev/61989cde13ae
>
> Ben
All,
The attached changeset adds sq-string support for "simple" delimiters. Unless,
I've missed something else, that should make strsplit() fully Matlab compatible.
Regarding the "legacy" delimiter type, jwe made a good point. I also
sympathised with Rik/Philip suggestion to preserve the original algorithm for
its speed advantage. Personally, I'm inclined to leave "legacy" in place, but
can be easily swayed to revert that.
Comments?
Ben
changeset.patch
Description: Binary data
- Re: new strsplit function, Rik, 2013/04/01
- Re: new strsplit function, Ben Abbott, 2013/04/01
- Re: new strsplit function, Philip Nienhuis, 2013/04/01
- Re: new strsplit function, Ben Abbott, 2013/04/02
- Re: new strsplit function, Carnë Draug, 2013/04/02
- Re: new strsplit function, Ben Abbott, 2013/04/02
- Re: new strsplit function, Carnë Draug, 2013/04/02
- Re: new strsplit function, Ben Abbott, 2013/04/02
- Re: new strsplit function, John W. Eaton, 2013/04/02
- Re: new strsplit function, Ben Abbott, 2013/04/03
- Re: new strsplit function,
Ben Abbott <=
- Re: new strsplit function, Philip Nienhuis, 2013/04/20
- Re: new strsplit function, Ben Abbott, 2013/04/20
- Re: new strsplit function, Philip Nienhuis, 2013/04/20
- Re: new strsplit function, Ben Abbott, 2013/04/20
- Re: new strsplit function, Ben Abbott, 2013/04/23
- Re: new strsplit function, Ben Abbott, 2013/04/23
- Re: new strsplit function, Ben Abbott, 2013/04/02