From: Nicholas Jankowski
Subject: [Octave-bug-tracker] [bug #62495] [octave forge] (statistics) pdist 'cosine' metric - internal expansion causes out of memory error
Date: Fri, 20 May 2022 15:46:52 -0400 (EDT)

URL:
  <https://savannah.gnu.org/bugs/?62495>

                 Summary: [octave forge] (statistics) pdist 'cosine' metric -
internal expansion causes out of memory error
                 Project: GNU Octave
            Submitted by: nrjank
            Submitted on: Fri 20 May 2022 03:46:50 PM EDT
                Category: Octave Forge Package
                Severity: 3 - Normal
                Priority: 5 - Normal
              Item Group: Unexpected Error or Warning
                  Status: Confirmed
             Assigned to: None
         Originator Name: Nicholas Jankowski
        Originator Email: 
             Open/Closed: Open
                 Release: other
         Discussion Lock: Any
        Operating System: Any


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Fri 20 May 2022 03:46:50 PM EDT By: Nicholas Jankowski <nrjank>
Following a query over at Stack Overflow [1], an attempt to use the
'silhouette' function resulted in an unexpected "out of memory or
dimension too large for Octave's index type" error for inputs that are well
within expected memory/index length limits. Example code below.

It turns out that 'silhouette' calls 'pdist' with the 'cosine' metric, and the
vectorization used for that metric greatly expands an internal variable, which
triggers the error. The test case input is 864x25333 and the expected output is
864x1, but internally pdist attempts to create a 25333x372816 array.

test code:

pkg load statistics
data = rand(864,25333);
idx = kmeans(data,3,'Distance','cosine');
test1 = silhouette(data, idx, 'cosine');

error: out of memory or dimension too large for Octave's index type
error: called from
    pdist at line 164 column 14
    silhouette at line 125 column 16


pdist, lines 163-166 'cosine' block:

```
case "cosine"
        prod = X(:,Xi) .* X(:,Yi);
        weights = sumsq (X(:,Xi), 1) .* sumsq (X(:,Yi), 1);
        y = 1 - sum (prod, 1) ./ sqrt (weights);
```

Xi and Yi are calculated from nchoosek over the 864 observations, giving
372816 index pairs. Thus X(:,Xi) and X(:,Yi) are each 25333x372816, or ~75GB
(if type double).
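
The size estimate can be checked with a few lines of arithmetic. This is only
a sanity check on the numbers quoted in this report; the variable names are
chosen for illustration and do not come from pdist.

```octave
n = 864;                       % observations (rows of data)
d = 25333;                     % features (columns of data)
npairs = n * (n - 1) / 2;      % number of row pairs, equal to nchoosek (n, 2)
bytes = d * npairs * 8;        % one d-by-npairs array of doubles
printf ("pairs: %d, size: %.1f GB\n", npairs, bytes / 1e9);
% prints: pairs: 372816, size: 75.6 GB
```

Two such arrays are indexed and a third (prod) is formed from them, so the
peak requirement is a multiple of this figure.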

Testing against Matlab 2022a, the test code runs without issue in a few
seconds, and memory use never spikes more than 500MB over base usage. So the
expansion does not appear to be necessary for the algorithm. It would be
worth determining whether a more memory-efficient option is available.
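
One possible lower-memory formulation, offered as a sketch rather than a
proposed patch: normalize each observation to unit length, form the n-by-n
matrix of dot products with a single matrix multiply, and read off the strict
lower triangle. The function name cosine_pdist_sketch is hypothetical, and the
reference computation assumes the pair indices come from nchoosek (1:n, 2),
which is consistent with the sizes reported above but is not shown in the
quoted block. Nothing here is claimed to be what Matlab does internally.

```octave
1;  % script marker so the function below can be defined inline

function y = cosine_pdist_sketch (x)
  % x is n-by-d, one observation per row.
  xn = x ./ sqrt (sumsq (x, 2));    % scale every row to unit Euclidean norm
  g = xn * xn';                     % n-by-n cosine similarities
  mask = tril (true (rows (x)), -1);
  % Column-major order of the strict lower triangle is
  % (1,2), (1,3), ..., (1,n), (2,3), ..., the same as nchoosek (1:n, 2).
  y = (1 - g(mask))';
endfunction

% Compare against the pairwise expansion on a small input.
x = rand (6, 4);
y = cosine_pdist_sketch (x);

order = nchoosek (1:rows (x), 2);   % assumed construction of Xi and Yi
Xi = order(:,1);
Yi = order(:,2);
X = x';
ref = 1 - sum (X(:,Xi) .* X(:,Yi), 1) ...
          ./ sqrt (sumsq (X(:,Xi), 1) .* sumsq (X(:,Yi), 1));

printf ("max abs difference: %g\n", max (abs (y - ref)));
```

For the 864x25333 test case the largest temporaries would be the normalized
copy of the data (about 175MB) and the 864x864 similarity matrix (about 6MB).
As with the existing block, an all-zero observation yields NaN.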

[1]
https://stackoverflow.com/questions/72282190/octave-error-out-of-memory-or-dimension-too-large-for-octaves-index-type







    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62495>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



