bug-apl
Re: [Bug-apl] Question about behavior of ⍋


From: David B. Lamkins
Subject: Re: [Bug-apl] Question about behavior of ⍋
Date: Mon, 07 Jul 2014 21:46:30 -0700

On Tue, 2014-07-08 at 12:38 +0800, Elias Mårtenson wrote:
> Right, but just having a "plain" collating order for Unicode would
> require me to pass a million-element array (⎕UCS¨⍳1114111) as left
> argument to grade.
> 

I guess you could do that if you needed to impose a complete collating
order upon every code point. Most applications would be content, I
think, with sorting alphanumerics (in all the languages of interest)
plus common punctuation.
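
As a rough sketch of such an application-specific order (the names cs
and m are made up for illustration, and this assumes GNU APL, where ⊃
pads a list of strings into a character matrix), the left argument can
be a short vector listing only the characters that matter, in the order
they should sort:

```apl
      ⍝ Hypothetical collating sequence: space, some punctuation,
      ⍝ digits, then upper- and lower-case letters.
      cs←' .,-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
      ⍝ ⊃ turns the list of strings into a space-padded matrix.
      m←⊃'beta' 'Alpha' 'gamma-2'
      ⍝ Dyadic grade returns the row permutation; index to reorder.
      m[cs⍋m;]
```

Characters that do not appear in cs sort after everything that does.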

> 
> That said, I can't even get dyadic grade to work at all, but that's a
> separate issue.
> 

Here's a working example.

∇z←suffix CF⍙ls path;dir
  ⍝ Return a character matrix of directory entries. Left argument,
  ⍝ if present, filters entries by suffix.
  z←0 0⍴''
  dir←CF¯FILEIO[28] path
  dir←(⍳↑⍴dir) ⎕io⌷dir
  ⍎(0≠⎕nc 'suffix')/'dir←((⊂,suffix)≡¨(-⍴,suffix)↑¨dir)/dir'
  →(0=⍴dir)/0
  dir←⊃dir
  z←dir[(⎕ucs ⎕io-⍨⍳256)⍋dir;]
∇

CF¯FILEIO is the bound name of the lib_file_io native function.

On the last line, dir is a character matrix.
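
The last line can be tried on its own with a small hand-built matrix.
A minimal sketch, again assuming GNU APL's ⊃ and a made-up variable
called names:

```apl
      names←⊃'pear' 'fig' 'apple'
      ⍝ Collating sequence: code points 0 through 255 in ordinal order.
      names[(⎕ucs ⎕io-⍨⍳256)⍋names;]
```

With that collating sequence the blank padding sorts ahead of letters,
so a shorter name comes before a longer one that starts with it.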

> 
> Regards,
> Elias
> 
> 
> On 8 July 2014 12:27, David B. Lamkins <address@hidden> wrote:
>         The problem with generating a permutation vector for an
>         "arbitrary" Unicode string is still a problem of collating
>         order. There is no inherent order in Unicode; someone has to
>         decide on what makes sense as a collating order for the subset
>         of code points used by the application.
>         
>         You should use ⎕ucs with a vector of code points to define
>         your own collating order for Unicode; any code points not
>         explicitly specified in the collating order will sort to the
>         end.
>         
>         For example (and this is an easy case) you can use this to
>         specify a default collating order (based upon ordinal value of
>         the code points themselves) for the first 256 code points (the
>         8-bit Latin-1 range):
>         
>         ⎕ucs ⎕io-⍨⍳256
>         
>         
>         
>         On Tue, 2014-07-08 at 12:09 +0800, Elias Mårtenson wrote:
>         > Dyadic grade doesn't make much sense in the context of Unicode
>         > though. How do you grade an arbitrary Unicode string?
>         >
>         >
>         > That issue is there even if we completely disregard all the
>         > other Unicode-related collating issues.
>         >
>         >
>         > Regards,
>         > Elias
>         >
>         >
>         > On 8 July 2014 12:00, David B. Lamkins <address@hidden> wrote:
>         >         Check my follow-up post.
>         >
>         >         I'm fairly certain that the issue is whether monadic
>         >         grade applied to a list of strings should do anything
>         >         but signal a domain error. The ISO spec says that
>         >         monadic grade is defined only on numeric arguments.
>         >
>         >         My test case appears to have monadic grade treating
>         >         strings as if they encode numbers in a sufficiently
>         >         large base.
>         >
>         >         If you want to sort strings, use dyadic grade. The left
>         >         argument specifies a collating sequence.
>         >
>         >         On Tue, 2014-07-08 at 11:43 +0800, Elias Mårtenson wrote:
>         >         > Ordering by size first makes very little sense to me.
>         >         > It makes it very hard to sort any list of strings.
>         >         >
>         >         >
>         >         > I was hoping that the following would have done so,
>         >         > but it also suffers from the "length first" issue:
>         >         >
>         >         >
>         >         >       z[⍋ ⎕UCS¨ z←'aa' 'xx' 'aaa' 'xxx']
>         >         >  aa xx aaa xxx
>         >         >
>         >         >
>         >         > What is the proper way to sort strings given the
>         >         > existing semantics of grade?
>         >         >
>         >         >
>         >         > Regards,
>         >         > Elias
>         >         >
>         >         >
>         >         > On 8 July 2014 02:34, David Lamkins <address@hidden> wrote:
>         >         >         Looking at the spec, it seems that monadic
>         >         >         grade is defined only for numeric data.
>         >         >
>         >         >
>         >         >         That leaves open the question of whether my
>         >         >         example should have signaled a domain error.
>         >         >
>         >         >
>         >         >
>         >         >         On Mon, Jul 7, 2014 at 11:25 AM, David Lamkins
>         >         >         <address@hidden> wrote:
>         >         >                 Given a list of character vectors (and
>         >         >                 scalars), grade appears to generate
>         >         >                 the permutation vector first by length
>         >         >                 then by content.
>         >         >
>         >         >                       ⍋'aaa' 'xx' 'y' 'bbb' 'cc'
>         >         >                 3 5 2 1 4
>         >         >
>         >         >
>         >         >                 This seems counterintuitive. It seems
>         >         >                 as if ⍋ treats character strings like
>         >         >                 numbers. Is this a bug?
>         >         >
>         >         >                 --
>         >         >                 "The secret to creativity is knowing
>         >         >                 how to hide your sources."
>         >         >                    Albert Einstein
>         >         >
>         >         >
>         >         >                 http://soundcloud.com/davidlamkins
>         >         >                 http://reverbnation.com/lamkins
>         >         >                 http://reverbnation.com/lcw
>         >         >                 http://lamkins-guitar.com/
>         >         >                 http://lamkins.net/
>         >         >                 http://successful-lisp.com/
>         >         >
>         >         >
>         >         >
>         >         >
>         >         >
>         >
>         >
>         >
>         >
>         >
>         
>         
>         
> 
> 




