gm2

Re: wide set re-implementation progress


From: Michael Riedl
Subject: Re: wide set re-implementation progress
Date: Mon, 29 Jan 2024 19:16:48 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

Gaius,

Attached is some code from the self-consistent-field part of my QM program, condensed into one file.

It is probably better suited to benchmarking than the production code, which runs to several thousand lines.

The actual benchmark code comes after the line "MaxIJKL := (N*(N+1) DIV 8)*(N*(N+1)+2);"; the code before it only runs some tests to validate that packing/unpacking works correctly. If you increase "N" in the lines before it, you can see how quickly the number of calculations increases (it goes by O(8)).

The fourfold nested loop comes from simplified code in which the six-dimensional integrals are calculated one after another. The production code pre-screens integrals that vanish by parity or by the symmetry of the problem, and organizes the calculation in batches to re-use intermediate results, but that is not essential for checking execution times.
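
A minimal sketch of such a fourfold loop, written in Python for illustration rather than taken from the attached Modula-2 file. It assumes the usual ordering restrictions i >= j, k >= l and (i,j) >= (k,l) on the index quadruples; the function names are hypothetical. It counts the quadruples and compares the result with the closed form N(N+1)(N(N+1)+2)/8, whose leading term is N^4/8.

```python
def unique_quadruples(n):
    """Yield (i, j, k, l) with i >= j, k >= l and pair(i, j) >= pair(k, l)."""
    for i in range(n):
        for j in range(i + 1):
            ij = i * (i + 1) // 2 + j          # triangular index of the pair (i, j)
            for k in range(i + 1):
                for l in range(k + 1):
                    kl = k * (k + 1) // 2 + l  # triangular index of the pair (k, l)
                    if kl <= ij:
                        yield (i, j, k, l)


def max_ijkl(n):
    """Closed form N(N+1)(N(N+1)+2)/8, multiplying before dividing."""
    m = n * (n + 1)
    return m * (m + 2) // 8


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, sum(1 for _ in unique_quadruples(n)), max_ijkl(n))
```

Note that the sketch multiplies before dividing. With integer division applied first, as in the quoted expression, the result only matches this count when N*(N+1) is divisible by 8 (for N = 4 it gives 44 instead of 55); whether the attached code depends on that is not visible here.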

If you have any questions, do not hesitate to contact me.

Michael

PS1: The 64-bit packing/unpacking is quite similar; I just use a record of two 32-bit cardinals, one holding the first index pair (i,j) and the other the second pair (k,l).

PS2: I inlined some code to write out bit patterns ...
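
The record layout described in PS1 can be sketched as follows, in Python for illustration. The 16-bit width per index and all names are assumptions; the attached file may split the bits differently.

```python
MASK16 = 0xFFFF  # assumed width of one index: 16 bits


def pack_pair(a, b):
    """Pack two indices into one 32-bit cardinal: a in the high half, b in the low half."""
    if not (0 <= a <= MASK16 and 0 <= b <= MASK16):
        raise ValueError("index does not fit into 16 bits")
    return (a << 16) | b


def unpack_pair(word):
    """Inverse of pack_pair."""
    return (word >> 16) & MASK16, word & MASK16


def pack64(i, j, k, l):
    """Stand-in for the record of two 32-bit cardinals: (ij, kl)."""
    return pack_pair(i, j), pack_pair(k, l)


def unpack64(record):
    """Recover (i, j, k, l) from the two-word record."""
    ij, kl = record
    return unpack_pair(ij) + unpack_pair(kl)


if __name__ == "__main__":
    rec = pack64(300, 2, 70, 65535)
    print([hex(w) for w in rec], unpack64(rec))
```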


On 09.01.24 at 23:23, Gaius Mulley wrote:
Michael Riedl<udo-michael.riedl@t-online.de>  writes:

Hello Gaius,
Hi Michael,

one short question on the topic - at what point is a set considered to
be "wide"?
Wide means more than TBITSIZE (BITSET) bits (32 bits on most architectures).
No change for sets <= TBITSIZE (BITSET).

I use SYSTEM.SHIFT and SYSTEM.ROTATE and bit-wise AND/OR operations
for index-related packing/unpacking in a four-dimensional space, and
it would be painfully slow if this were done within a runtime
library for 32 or 64 bit arithmetic.
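
A small sketch of this kind of index packing, in Python for illustration; it is not the code from PackInt.mod. It assumes four indices of 8 bits each in one 32-bit word, and the helper names are hypothetical. `rotl32` mimics a 32-bit left rotation (the role SYSTEM.ROTATE plays in the Modula-2 code), and unpacking rotates each field into the low byte and masks it with AND.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & MASK32


def pack4(i, j, k, l):
    """Pack four 8-bit indices with shifts and ORs; i ends up in the top byte."""
    for v in (i, j, k, l):
        if not 0 <= v <= 0xFF:
            raise ValueError("index does not fit into 8 bits")
    return (i << 24) | (j << 16) | (k << 8) | l


def unpack4(word):
    """Rotate each field into the low byte in turn and mask it out."""
    fields = []
    for _ in range(4):
        word = rotl32(word, 8)
        fields.append(word & 0xFF)
    return tuple(fields)


if __name__ == "__main__":
    w = pack4(1, 2, 3, 4)
    print(hex(w), unpack4(w))
```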
I'm aiming to make it just as efficient - the M2WIDESET module will have
<* module: inline *> (or some appropriate attribute name) which will
result in inlined trees - the same as produced internally by cc1gm2.
I'd be interested in using your library code for bench-marking - to make
sure that the new wide set performs well.  The <* module: inline *> can
also be used by your library module for example - and the M2WIDESET can
be internally optimised at m2 source level rather than lispy C code in
cc1gm2.

The M2WIDESET module provides a superset of SYSTEM.ROTATE and friends.
Currently SYSTEM.ROTATE is internally implemented as a series of 32 bit
BITSET operations (with cc1gm2 detecting whether operands are constant
and calling the appropriate routine) - this can be done with the new
wide set implementation - in effect they should produce identical trees
(in an ideal world :-).  One major benefit is maintainability of
M2WIDESET (in m2 rather than lispy trees) and it might offer the
potential for detecting more optimisation cases - maybe the answer is to
heavily benchmark both during development.  For testing purposes we
could test against a baseline gm2-14 -fm2-whole-program as a suggestion
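
To illustrate what "a series of 32 bit operations" can look like, here is a hedged Python sketch that rotates a wide value stored as a list of 32-bit words (least significant word first). The layout and the name `rotl_wide` are assumptions; this is not the cc1gm2 or M2WIDESET implementation.

```python
MASK32 = 0xFFFFFFFF


def rotl_wide(words, r):
    """Rotate a value held in 32-bit words (least significant first) left by r bits."""
    n = len(words)
    r %= 32 * n
    word_shift, bit_shift = divmod(r, 32)
    out = []
    for idx in range(n):
        lo = words[(idx - word_shift) % n]          # supplies the upper bits of the result word
        carry = words[(idx - word_shift - 1) % n]   # supplies the bits that cross a word boundary
        if bit_shift == 0:
            out.append(lo)
        else:
            out.append(((lo << bit_shift) | (carry >> (32 - bit_shift))) & MASK32)
    return out


if __name__ == "__main__":
    print([hex(w) for w in rotl_wide([0x80000000, 0x00000001], 1)])
```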

regards,
Gaius


Regards

Michael

On 09.01.24 at 17:01, Gaius Mulley wrote:
Hi,

I thought I'd post an update on the re-implementation of wide sets in
gm2.  The wide set re-implementation moves all of the set arithmetic
operators into a runtime library (removing a substantial amount of
complexity from the compiler).  The development version of the compiler
will currently bootstrap using this technique (although it does not use
the runtime code).  All the libraries in /libgm2 build (and many use wide
sets - SET OF CHAR - for example).

Wide sets are re-implemented internally as arrays of bytes - thus
removing any endian portability issues.  Offloading these arithmetic
operators is inspired by m2r10 and when coupled with module inlining
should cause little if any performance penalty - and module inlining
would be available for general use.  I anticipate that the
re-implementation of wide sets would arrive in gcc master around end of
April (during the start of stage1 - when major changes are allowed
again).  At some point a development branch containing the wide set
implementation should appear in git for testing,
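
As an illustration of the byte-array representation mentioned above, here is a minimal Python sketch; it is not the M2WIDESET code. The class name, the method names and the bit order within a byte are assumptions. Element e always lives in byte e DIV 8, independent of the host's word order, which is why no endian issue arises.

```python
class ByteSet:
    """A set over 0 .. size-1 stored as an array of bytes."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray((size + 7) // 8)

    def _check(self, e):
        if not 0 <= e < self.size:
            raise ValueError("element out of range")

    def incl(self, e):
        self._check(e)
        self.data[e // 8] |= 1 << (e % 8)

    def excl(self, e):
        self._check(e)
        self.data[e // 8] &= ~(1 << (e % 8)) & 0xFF

    def __contains__(self, e):
        self._check(e)
        return bool(self.data[e // 8] & (1 << (e % 8)))

    def union(self, other):
        result = ByteSet(self.size)
        result.data = bytearray(a | b for a, b in zip(self.data, other.data))
        return result

    def intersection(self, other):
        result = ByteSet(self.size)
        result.data = bytearray(a & b for a, b in zip(self.data, other.data))
        return result


if __name__ == "__main__":
    vowels = ByteSet(256)            # plays the role of SET OF CHAR
    for ch in "aeiou":
        vowels.incl(ord(ch))
    print(len(vowels.data), ord("e") in vowels, ord("x") in vowels)
```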

regards,
Gaius

Attachment: PackInt.mod
Description: audio/mod

