bug-apl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Absolute limits of rank 2 bool matrix size in GNU APL?


From: Dr . Jürgen Sauermann
Subject: Re: Absolute limits of rank 2 bool matrix size in GNU APL?
Date: Tue, 28 Dec 2021 22:15:44 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

Hi,

I have spent many hours to do a proper memory handling. The current solution is not perfect but will work in many cases. If anybody has suggestions how to improve then I'll be happy to use them.

The current approach is roughly this (if I remember correctly)

1. if the user specifies a *ulimit -v* other than *unlimited* then use it and assume it will be available
at any time. Otherwise assume a limit of 2 GB memory.
2. decrease the limit a little so that C++ exception handlers can allocate their memory. 3. during apl execution, keep track of the amunt of memory used and released by *operator new
*(= essentially malloc), which is roughly the physical memory used by apl.
4. When the limit is reached, raise WS FULL.

Before that, the apl process was sometimes killed (un-catchable) before it could raise WS FULL, which is IMHO worse than raising WS FULL a little too early (i.e. before the memory is really exhausted. The current approach means in Blake's terms, that we try our best to stay at Level 1. You can actually go directly from Level 1 to Level 3 by creating a large apl value, so any solution
for only one level does not really help.

As far as I can see, the free utility is just a nicer front-end to */proc/meminfo*.

We cannot check */proc/meminfo* before every memory allocation, so we do it at start-up and trust our own memory bookkeeping. There is a little slack caused by malloc, but our safety-margin 2.
above handles that.

I tried earlier to depend on paging, but I was running into problems (crashes instead of paging
even though the paging space was huge enough).

Also, conceptually *⎕WA *becomes useless if it can change without anything happening in apl (e.g, when other processes use memory). In theory we could not only allocate but also use all memory beforehand, but that is not very practical if it is not used. In the old days,
a process would know how much memory it has, but nowadays it does not.

Best Regards,
Jürgen


On 12/28/21 8:25 PM, Blake McBride wrote:
I think what you're talking about and what I am talking about are not exactly the same thing.  I can see what you are saying, but that is a contrived example.  For example:

Level 1:  you are using the RAM that exists (not over-committed)

Level 2:  you are using more RAM than you have causing over-commit and paging.

Level 3:  you allocate more memory than exists in RAM + paging.

I am talking about Levels 1 and 2.  You are talking about Level 3.  At least in my world, I have never hit Level 3 because Level 2 becomes intolerable slow and I kill the process myself.  For this reason, in my world, Level 3 never occurs.

It doesn't make sense to handle Level 2 problems with Level 3 logic.

Blake


On Tue, Dec 28, 2021 at 1:16 PM Elias Mårtenson <lokedhs@gmail.com <mailto:lokedhs@gmail.com>> wrote:

    Actually, the default memory allocator in glibc uses mmap and not
    sbrk, if I'm not mistaken. However, they both do the same thing at
    the end of the day, and it's to request more pages from the kernel.

    Unfortunately, Linux malloc never returns NULL. Even if you try to
    allocate a petabyte in one allocation. You can write to this
    memory and new pages will be created as you write, and at some
    point your write will fail with a SEGV because there are no free
    page left.

    What's even worse is that once you start to fail, the kernel will
    randomly kill processes until it gets more free pages. If that
    sounds idiotic, that's because it is, but that's how it works.

    You can configure the kernel to use not do this, but unfortunately
    you're going to have other problems if you do, primarily because
    all Linux software is written with the default behaviour in mind.

    You can try it right now. Write a C program that allocates a few
    TB of RAM and see what happens. Actually, nothing will happen
    until you start writing to this memory.

    Den ons 29 dec. 2021 03:09Blake McBride <blake1024@gmail.com
    <mailto:blake1024@gmail.com>> skrev:

        Hi Jürgen,

        First, to be sure you know where I'm coming from.  I know
        malloc() and sbrk() pretty well.  I've written malloc's before.

        It is utter news to me that malloc would return a pointer that
        you can't use - even in over-commit situations. 
        Theoretically, in over-commit situations, if you access a
        pointer that's paged out, the system should page it in first. 
        If that doesn't happen, that's a bug in the OS.  I'd create a
        simple program that mimics this behavior and submit it as a
        bug report.  (What I almost always find is that during the
        process of creating the sample program, I find the actual bug
        in my own program.)

        What I'd try to do is write my own malloc that does the following:

        1. Checks for the amount of real memory available using
        something I took from the /free/ command.

        2. If there is enough memory, call the real malloc and return it.

        3. If there is not enough real memory, rather than calling
        malloc and causing paging you can just return a null.

        Of course, I understand that sbrk() adjusts the system
        breakpoint essentially giving you more heap space. Malloc
        manages that free space that's already been taken from the OS
        via sbrk().

        With regard to item #1, you'd have to have an intelligent way
        of determining when to calculate the true available space. 
        You can't do it for 16 bytes, and you don't want to do it for
        every allocation.  On the other hand for large allocations or
        if you haven't queried the system for some time, it makes
        sense to update your opinion of the available space.

        Thanks.

        Blake




        On Tue, Dec 28, 2021 at 12:50 PM Dr. Jürgen Sauermann
        <mail@jürgen-sauermann.de
        <mailto:mail@j%C3%BCrgen-sauermann.de>> wrote:

            Hi Blake,

            I know. The problem is that other processes may, just as
            GNU APL does, allocate memory
            successfully via malloc(). malloc, on "success" returns a
            non-zero virtual memory address.

            The fact that malloc returns a non-zero address (=
            indicating malloc() success) does unfortunately
            not guarantee that you would be able to access that
            memory. You may or may
            not get a page fault when accessing the memory. The point
            in time when you call 'free' or
            similar tools (which basically read /proc/memory tell you
            the situation at that time which can
            be very different from the situation when you fill your
            virtual memory with real data (which
            assigns a physical memory page to the successfully
            allocated virtual memory page) and
            that may fail. Even worse, when this happen then your
            machine has no more memory and
            the C++ exception handlers will also fails due to lack of
            memory and the process is terminated
            without any chance to raise a WS full or even remain in apl.

            Best Regards,
            Jürgen


            On 12/28/21 6:52 PM, Blake McBride wrote:
            Since the Linux "free" command knows the difference
            between how much real memory vs. virtual memory is
            available, it may be useful to use a similar method.

            Blake


            On Tue, Dec 28, 2021 at 11:34 AM Dr. Jürgen Sauermann
            <mail@jürgen-sauermann.de
            <mailto:mail@j%C3%BCrgen-sauermann.de>> wrote:

                Hi Russ,

                it has turned out to be very difficult to find a
                reliable way to figure the memory that is available for
                a process like the GNU APL interpreter. One (of
                several) reasons for that is that GNU linux tells you
                that it has more memory than it really has (so-called
                over-commitment). You can malloc() more memory
                than you have and malloc will return a virtual memory
                (and no error) when you ask for it. However, if you
                access the virtual memory at a later point in time
                (i.e. requesting real memory for it) then the process
                may crash badly if that real memory is not available.

                GNU APL deals with this by assuming (by default) that
                the total memory available for the GNU APL process is
                about 2 GB,

                You can increase that amount (outside apl) be setting
                a larger memory limit, probably with:

                *ulimit --virtual-memory-size*or *ulimit -v
                *(depending on platform).

                in the shell or script before starting apl. However,
                note that:

                * *ulimit -v* *ulimited* will not work because this
                is the default and GNU APL will apply its default of
                2 GB in this case,
                * the WS FULL behaviour of GNU APL (ans ⎕WA) will
                become unreliable. You must ensure that GNU APL will get
                  as much memory as you promised it with ulimit, and
                * all GNU APL cells have the same size (e.g. 24 byte
                on a 64-bit CPU, see *apl -l37*) even if the cells
                contain only
                 Booleans.

                The workaround for really large Boolean arrays is to
                pack them into the 64-bits of a GNU APL integer
                and maybe use the bitwise functions (⊤∧, ⊤∨, ...) of
                GNU APL to access them group-wise.

                Best Regards,
                Jürgen


                On 12/28/21 3:53 AM, Russtopia wrote:
                Hi, doing some experiments in learning APL I was
                writing a word frequency count program that takes in
                a document, identifies unique words and then outputs
                the top 'N' occurring words.

                The most straightforward solution, to me, seems to
                be ∘.≡ which works up to a certain dataset size. The
                main limiting statement in my program is

                wordcounts←+⌿ (wl ∘.≡ uniqwords)

                .. which generates a large boolean array which is
                then tallied up for each unique word.

                I seem to run into a limit in GNU APL. I do not see
                an obvious ⎕SYL parameter to increase the limit and
                could not find any obvious reference in the docs
                either. What are the absolute max rows/columns of a
                matrix, and can the limit be increased? Are they
                separate or a combined limit?

                      5 wcOuterProd 'corpus/135-0-5000.txt'    ⍝⍝
                5000-line document
                Time: 26419 ms
                  the   of   a and  to
                 2646 1348 978 879 858
                      ⍴wl
                36564
                      ⍴ uniqwords
                5695

                      5 wcOuterProd 'corpus/135-0-7500.txt'   ⍝⍝
                7500-line document
                WS FULL+
                wcOuterProd[8]  wordcounts←+⌿(wl∘.≡uniqwords)
                                              ^           ^
                      ⍴ wl
                58666
                      ⍴ uniqwords
                7711


                I have an iterative solution which doesn't use a
                boolean matrix to count the words, rather looping
                through using pick/take and so can handle much
                larger documents, but it takes roughy 2x the
                execution time.

                Relating to this, does GNU APL optimize boolean
                arrays to minimize storage (ie., using larger bit
                vectors rather than entire ints per bool) and is
                there any clever technique other experience APLers
                could suggest to maintain the elegant 'loop-free'
                style of computing but avoid generating such large
                bool matrices? I thought of perhaps a hybrid
                approach where I iterate through portions of the
                data and do partial ∘.≡ passes but of course that
                complicates the algorithm.

                [my 'outer product' and 'iterative' versions of the
                code are below]

                Thanks,
                -Russ

                ---
                #!/usr/local/bin/apl --script
                 
⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝
                ⍝  ⍝
                ⍝ wordcount.apl      2021-12-26  20:07:07 (GMT-8)  ⍝
                ⍝  ⍝
                 
⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝

                ⍝ function edif has ufun1 pointer 0!

                ∇r ← timeMs; t
                  t ← ⎕TS
                  r ←
                (((t[3]×86400)+(t[4]×3600)+(t[5]×60)+(t[6]))×1000)+t[7]
                ∇

                ∇r ← lowerAndStrip s;stripped;mixedCase
                 stripped ← '
                abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz*'
                 mixedCase ← ⎕av[11],'
                
,.?!;:"''()[]-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
                 r ← stripped[mixedCase ⍳ s]
                ∇

                ∇c ← h wcIterative fname
                  ⍝⍝;D;WL;idx;len;word;wc;wcl;idx
                  ⍝⍝ Return ⍒-sorted count of unique words in string
                vector D, ignoring case and punctuation
                  ⍝⍝ @param h(⍺) - how many top word counts to return
                  ⍝⍝ @param D(⍵) - vector of words
                  ⍝⍝⍝⍝
                  D ← lowerAndStrip (⎕fio['read_file'] fname)  ⍝ raw
                text with newlines
                  timeStart ← timeMs
                  D ← (~ D ∊ ' ') ⊂ D ⍝ make into a vector of words
                  WL ← ∪D
                  ⍝⍝#DEBUG# ⎕ ← 'unique words:',WL
                  wcl ← 0⍴0
                  idx ← 1
                  len ← ⍴WL
                count:
                  ⍝⍝#DEBUG# ⎕ ← idx
                  →(idx>len)/done
                  word ← ⊂idx⊃WL
                  ⍝⍝#DEBUG# ⎕ ← word
                  wc ← +/(word≡¨D)
                  wcl ← wcl,wc
                  ⍝⍝#DEBUG# ⎕ ← wcl
                  idx ← 1+idx
                  → count
                done:
                  c ← h↑[2] (WL)[⍒wcl],[0.5]wcl[⍒wcl]
                  timeEnd ← timeMs
                  ⎕ ← 'Time:',(timeEnd-timeStart),'ms'
                ∇

                ∇r ← n wcOuterProd fname
                  ⍝⍝ ;D;wl;uniqwords;wordcounts;sortOrder
                  D ← lowerAndStrip (⎕fio['read_file'] fname)  ⍝ raw
                text with newlines
                  timeStart ← timeMs
                  wl ← (~ D ∊ ' ') ⊂ D
                  ⍝⍝#DEBUG# ⎕ ← '⍴ wl:', ⍴ wl
                  uniqwords ← ∪wl
                  ⍝⍝#DEBUG# ⎕ ← '⍴ uniqwords:', ⍴ uniqwords
                  wordcounts ← +⌿(wl ∘.≡ uniqwords)
                  sortOrder ← ⍒wordcounts
                  r ← n↑[2]
                uniqwords[sortOrder],[0.5]wordcounts[sortOrder]
                  timeEnd ← timeMs
                  ⎕ ← 'Time:',(timeEnd-timeStart),'ms'
                ∇








reply via email to

[Prev in Thread] Current Thread [Next in Thread]