Index: arrays.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/arrays.xml,v retrieving revision 1.2 diff -c -p -r1.2 arrays.xml *** arrays.xml 2002/01/22 15:48:49 1.2 --- arrays.xml 2002/01/24 04:56:31 *************** *** 1,3 **** --- 1,4 ---- + &array; Containers *************** *** 961,967 **** Since an &array; can be queried for its domain, we briefly describe some &domain; operations. A fuller description, including arithmetic operations, occur in . --- 962,973 ---- Since an &array; can be queried for its domain, we briefly describe some &domain; operations. A fuller description, including arithmetic operations, occur in . As we mentioned in , the Pooma/Domains.h header file ! declares &domain;s, but most container header files automatically ! include Pooma/Domains.h ! so no explicit inclusion of is usually necessary.
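A short, self-contained sketch may make these &domain; queries concrete. The header names (Pooma/Pooma.h, Pooma/Arrays.h) and the domain() accessor on &array; are assumptions suggested by the surrounding text, not confirmed by it.

  #include "Pooma/Pooma.h"     // assumed header for Pooma::initialize/finalize
  #include "Pooma/Arrays.h"    // assumed header declaring Array and Interval
  #include <iostream>

  int main(int argc, char *argv[])
  {
    Pooma::initialize(argc, argv);

    Interval<1> I(0, 9);                  // the ten indices 0,1,...,9
    Interval<2> D(I, I);                  // direct product: a 10x10 domain
    Array<2, double, Brick> a(D);
    a = 0.0;                              // data-parallel assignment to every element

    std::cout << a.domain().size() << std::endl;  // 100 indices in all
    std::cout << I.first() << " " << I.last()     // 0 9
              << " " << I.length() << std::endl;  // 10
    std::cout << D[1].first() << std::endl;       // second one-dimensional factor: 0

    Pooma::finalize();
    return 0;
  }

If a given &pooma; release names the accessor differently, only the a.domain() line changes; the &domain; member functions themselves are the ones tabulated in this section.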
*************** *** 976,983 **** ! Other &domain; accessors are described in . --- 982,990 ---- ! D abbreviates the particular &domain; ! type, e.g., &interval; or &grid;. Other &domain; accessors ! are described in . *************** *** 1037,1053 **** &domain; member functions are listed in . Functions applicable ! to one-dimensional and multidimensional &domain;s are listed before functions that only applicable to one-dimensional &domain;s. The size member function yields the total number of indices in a given &domain;. If and only if this number is zero, empty will yield &true;. A multidimensional domain<&dim;> is the direct product of &dim; ! one-dimensional &domain;s. ! ! HERE ! --- 1044,1087 ---- &domain; member functions are listed in . Functions applicable ! to both one-dimensional and multidimensional &domain;s are listed before functions that only applicable to one-dimensional &domain;s. The size member function yields the total number of indices in a given &domain;. If and only if this number is zero, empty will yield &true;. A multidimensional domain<&dim;> is the direct product of &dim; ! one-dimensional &domain;s. The operator[](int ! dimension) operator extracts the one-dimensional ! &domain; corresponding to its parameter. For example, the three ! Range<1> (one-dimensional) &domain;s can be ! extracted from a Range<3> ! object r using ! r[0], r[1], and ! r[2]. ! ! &domain; accessors applicable only to one-dimensional ! &domain;s are listed in the second half of . The ! length member function, analogous to the ! multidimensional size function, returns ! the number of indices in the &domain;. The ! first and last ! member functions return the domain's beginning and ending ! indices. The begin and ! end member functions return input ! iterators pointing to these respective locations. They have type ! D<1>::iterator, where D ! abbreviates the &domain;'s type, e.g., &interval; or &grid;. ! The min and ! max member functions return the minimum ! and maximum indices in the &domain; object, respectively. For ! &locone; and &intervalone;, these are the same as ! first and last, ! but &rangeone; and &gridone; can have their largest index at the ! beginning of their &domain;s. ! *************** std::cout &openopen; a.read(2,-2) &openo *** 1814,1820 ****
! &dynamicarray;s: Dynamically Changing Domain Sizes &array;s have fixed domains so the set of valid indices remains fixed after declaration. The &dynamicarray; class --- 1848,1854 ----
! &dynamicarray;s &array;s have fixed domains so the set of valid indices remains fixed after declaration. The &dynamicarray; class Index: concepts.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/concepts.xml,v retrieving revision 1.7 diff -c -p -r1.7 concepts.xml *** concepts.xml 2002/01/22 15:48:49 1.7 --- concepts.xml 2002/01/24 04:56:32 *************** *** 1,3 **** --- 1,4 ---- + Overview of &pooma; Concepts *************** *** 13,22 **** separate categories: ! container ! data structure holding one or more values and usually addressed ! by indices --- 14,23 ---- separate categories: ! containers ! data structures holding one or more values and usually accessed ! using indices *************** *** 34,41 **** ! See . Many &pooma; programs ! select one possibility from each column. For example, used &array; containers and stencils for sequential computation, while used &field; --- 35,43 ---- ! categorizes the &pooma; ! concepts. Many &pooma; programs select one possibility from each ! category. For example, used &array; containers and stencils for sequential computation, while used &field; *************** *** 103,115 **** Most &pooma; programs use containers to store groups of values. &pooma; containers are objects that store other objects such as numbers or vectors. They control allocation ! and deallocation of and access to these stored objects. They are a ! generalization of &c; arrays, but &pooma; containers are first-class ! objects so they can be used directly in expressions. They are ! similar to &cc; containers such as vector, ! list, and stack. See for a summary of the ! containers. This section describes many concepts, but one need not understand them all to begin programming with the &poomatoolkit;. --- 105,117 ---- Most &pooma; programs use containers to store groups of values. &pooma; containers are objects that store other objects such as numbers or vectors. They control allocation ! and deallocation of these stored objects and access to them. They ! are a generalization of &c; arrays, but &pooma; containers are ! first-class objects so they can be used directly in expressions. ! They are also similar to &cc; containers such as ! vector, list, and stack. See ! for a summary of ! the containers. This section describes many concepts, but one need not understand them all to begin programming with the &poomatoolkit;. *************** *** 126,131 **** --- 128,137 ---- multiple processors. The programs in the previous chapter illustrate many of these concepts. + briefly + describes the six &pooma; containers. They are more fully described + in the paragraphs below. +
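Because &pooma; containers are first-class objects, as noted above, a whole container can appear directly in an expression. The fragment below is a sketch (n and the usual Pooma::initialize/finalize scaffolding are assumed) contrasting that style with an element-by-element loop.

  Interval<1> I(0, n - 1);            // n assumed defined elsewhere
  Array<1, double, Brick> a(I), b(I), c(I);

  c = a + 2.0 * b;                    // whole-container expression, no explicit loop

  Pooma::blockAndEvaluate();          // finish pending statements before
  for (int i = 0; i < n; ++i)         // touching individual elements
    c(i) = a(i) + 2.0 * b(i);         // the equivalent element-wise form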
&pooma; Container Summary *************** *** 201,210 **** to &array;s, each cell may contain multiple values and multiple materials. A &field;'s mesh stores its spatial ! characteristics and can map yield, e.g., the cell at a particular ! point, the distance between two cells, or a cell's normals. A ! &field; should be used whenever geometric or spatial computations ! are needed, multiple values per index are desired, or a computation involves more than one material. --- 207,216 ---- to &array;s, each cell may contain multiple values and multiple materials. A &field;'s mesh stores its spatial ! characteristics and can yield, e.g., the cell at a particular point, ! the distance between two cells, or a cell's normals. A &field; ! should be used whenever geometric or spatial computations are ! needed, multiple values per index are desired, or a computation involves more than one material. *************** *** 230,254 **** multiplying a &matrix; and a &vector;.The data of an &array;, &dynamicarray;, or &field; can be ! viewed using more than one container by taking a view. A view of an existing container &container; is a container whose domain ! is a subset of &container;. The subset can equal the original ! domain. A view acts like a reference in that changing any of the ! view's values also changes the original container's and vice versa. ! While users sometimes explicitly create views, they are perhaps more ! frequently created as temporaries in expressions. For example, if ! A is an &array; and I is a ! domain, A(I) - A(I-1) uses two views to form ! the difference between adjacent values.
Choosing a Container The two most commonly used &pooma; containers are &array;s ! and &field;s, while &vector;, &matrix;, or &tensor; frequently ! represent mathematical objects. contains a decision tree describing how to choose an appropriate container. --- 236,262 ---- multiplying a &matrix; and a &vector;. The data of an &array;, &dynamicarray;, or &field; can be ! accessed using more than one container by taking a view. A ! view of an existing container &container; is a container whose domain ! is a subset of &container;'s domain. The subset can equal the ! original domain. A view acts like a reference in that changing any ! of the view's values also changes the original container's and vice ! versa. While users sometimes explicitly create views, they are ! perhaps more frequently created as temporaries in expressions. For ! example, if A is an &array; and ! I is a domain, A(I) - ! A(I-1) uses two views to form the difference between ! adjacent values.
Choosing a Container The two most commonly used &pooma; containers are &array;s ! and &field;s, while &vector;, &matrix;, and &tensor; represent ! mathematical objects. contains a decision tree describing how to choose an appropriate container. *************** *** 299,304 **** --- 307,324 ---- in declaring them. Concepts specific to distributed computation are described in the next section. + + illustrates the containers and the concepts involved in their + declarations. The containers are listed in the top row. Lines + connect these containers to the components necessary for their + declarations. For example, an &array; declaration requires an + engine and a layout. These, in turn, can depend on other &pooma; + concepts. Declarations necessary only for distributed, or + multiprocessor, computation are also indicated. Given a desired + container, one can use this figure to determine the concepts needed + to declare a particular container. +
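Returning to the view expression mentioned above, the sketch below shows A(I) - A(I-1) in context; the shifted domain I-1 is produced by the &domain; arithmetic covered in the &array; chapter, and n is assumed to be defined.

  Interval<1> D(0, n - 1);            // whole domain of A
  Interval<1> I(1, n - 1);            // interior indices 1,...,n-1
  Array<1, double, Brick> A(D), diff(D);
  // ... fill A ...
  diff(I) = A(I) - A(I - 1);          // views of A and diff; adjacent differences

Because each view shares its engine with the underlying container, assigning through diff(I) writes directly into diff's storage; no values are copied when the views are formed.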
Concepts For Declaring Containers *************** *** 311,328 ****
- - illustrates the containers and the concepts involved in their - declarations. The containers are listed in the top row. Lines - connect these containers to the components necessary for their - declarations. For example, an &array; declaration requires an - &engine; and a layout. These, in turn, can depend on other &pooma; - concepts. Declarations necessary only for distributed, or - multiprocessor, computation are surrounded by dashed lines. These - dependences to indicate the concepts needed for a particular - container. - An engine stores and, if necessary, computes a container's values. A --- 331,336 ---- *************** *** 332,338 **** for all indices can use a constant engine, which need only store one value for the entire domain. A &compressiblebrick; &engine; reduces its space requirements to a constant whenever all its ! values are the same. The separation also permits taking views of containers without copying storage. --- 340,347 ---- for all indices can use a constant engine, which need only store one value for the entire domain. A &compressiblebrick; &engine; reduces its space requirements to a constant whenever all its ! values are the same. The separation between a container and its ! engine also permits taking views of containers without copying storage. *************** *** 350,360 **** A layout ! maps domain indices to the ! processors and computer memory used by a container's engines. See ! . ! A program computes a container's values using a processor and memory. The layout specifies the processors and memory to use for each particular index. A container's layout for a uniprocessor implementation consists of its domain, the processor, and its --- 359,369 ---- A layout ! maps domain indices to the processors and ! computer memory used by a container's engines. See . ! A program computes a container's values using these processors and memory. The layout specifies the processors and memory to use for each particular index. A container's layout for a uniprocessor implementation consists of its domain, the processor, and its *************** *** 378,388 **** interval [0,n). A domain need not contain all integral points between its endpoints. A stride ! is a subset of an interval consisting of regularly-spaced points. ! A range is a subset of an interval of regularly-spaced points specified by ! strides. A &field;'s mesh --- 387,396 ---- interval [0,n). A domain need not contain all integral points between its endpoints. A stride ! indicates a regular spacing between points. A range is a subset of an interval of regularly-spaced points specified by ! a stride. A &field;'s mesh *************** *** 399,422 **** linkend="glossary-point">point in &space; corresponding to the cell in the lower, left corner of its domain. Combining this, the ! domain, and the cell size fully specifies the mesh's map from ! indices to &space;. A mesh's cell ! size specifies the spatial dimensions of ! a &field; cell, e.g., its ! width, height, and depth, in &space;. Combining this, the ! domain, and the corner position fully specifies the mesh's map ! from indices to &space;.
Declaring Distributed Containers ! In the previous section, we introduced the concepts important ! when declaring containers for use on uniprocessor computers. When using multiprocessor computers, we augment these concepts with those for distributed computation. Reading this section is important only for running a program on multiple processors. Many --- 407,430 ---- linkend="glossary-point">point in &space; corresponding to the cell in the lower, left corner of its domain. Combining this, the ! domain, and the cell size can specify the mesh's map from indices ! to &space;. A mesh's cell ! size specifies the spatial dimensions of a ! &field; cell, e.g., its width, ! height, and depth, in &space;. Combining this, the domain, ! and the corner position can specify the mesh's map from indices to ! &space;.
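A few concrete declarations may help distinguish the &domain; types named above; the three-argument &range; constructor (first, last, stride) is an assumption consistent with the glossary's description of [a,b] with stride 2.

  Interval<1> I(0, 8);       // every integer 0,1,...,8
  Range<1>    R(0, 8, 2);    // the regularly spaced points 0,2,4,6,8
  Loc<1>      p(4);          // the single point 4

Here R.length() == 5, R.min() == 0, and R.max() == 8, using the one-dimensional accessors listed earlier.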
Declaring Distributed Containers ! In the previous section, we introduced the important concepts ! for declaring containers for use on uniprocessor computers. When using multiprocessor computers, we augment these concepts with those for distributed computation. Reading this section is important only for running a program on multiple processors. Many *************** *** 457,463 **** linkend="glossary-external_guard_layer">external guard layer specifies values surrounding the entire domain. Its presence eases computation along the domain's ! edges by permitting the same computations as for more internal computations. An internal guard layer duplicates values from adjacent --- 465,471 ---- linkend="glossary-external_guard_layer">external guard layer specifies values surrounding the entire domain. Its presence eases computation along the domain's ! edges by permitting the same computations as for more-internal computations. An internal guard layer duplicates values from adjacent *************** *** 488,503 **** &pooma; computations can be expressed using a variety of modes. Many &pooma; computations involve &array; or &field; ! containers, but how their values are accessed and the associated ! algorithms using them varies. For example, element-wise computation involves explicitly accessing a container's values. A data-parallel ! computation uses expressions to represent larger subsets of a ! container's values. Stencil-based computations express a ! computation as repeatedly applying a local computation to each ! element of an array. A relation among containers establishes a ! dependency among them so the values of one container are updated ! whenever any other's values change. A program may use any or all of ! these styles, which are described below. Element-wise --- 496,511 ---- &pooma; computations can be expressed using a variety of modes. Many &pooma; computations involve &array; or &field; ! containers, but how their values are accessed and how the associated ! algorithms use them varies. For example, element-wise computation involves explicitly accessing a container's values. A data-parallel ! computation operates on larger subsets of a container's values. ! Stencil-based computations express a computation as repeatedly ! applying a local computation to each element of an array. A ! relation among containers establishes a dependency among them so the ! values of one container are updated whenever any other's values ! change. A program may use any or all of these styles, which are ! described below. Element-wise *************** *** 515,524 **** linkend="tutorial-array_parallel-doof2d">, a(I,J) represents the subset of &array; a's values having coordinates in the domain ! specified by the one-dimensional &interval;s I ! and J. Using data-parallel expressions ! frequently eliminates the need for writing explicit loops in ! code. A stencil --- 523,532 ---- linkend="tutorial-array_parallel-doof2d">, a(I,J) represents the subset of &array; a's values having coordinates in the domain ! specified by the direct product of one-dimensional &interval;s ! I and J. Using ! data-parallel expressions frequently eliminates the need for writing ! explicit loops. A stencil *************** *** 550,557 ****
Computation Environment ! A &pooma; program can execute on a wide variety of computers. ! The default sequential computing environment consists of one processor and its associated memory, as found on a personal computer. In --- 558,565 ----
Computation Environment ! The same &pooma; program can execute on a wide variety of ! computers. The default sequential computing environment consists of one processor and its associated memory, as found on a personal computer. In *************** *** 574,580 **** library. ! The &pooma; executable must be run using the library. All of these were illustrated in ! The &pooma; executable must be run using the ! communications library. All of these were illustrated in
--- 620,628 ---- contexts, all of which is hidden from both the programmer and the user. &pooma; works with the Message Passing Interface (&mpi;) Communications Library ! ! and the &mm; Shared Memory Library. See for ! details.
Index: data-parallel.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/data-parallel.xml,v retrieving revision 1.1 diff -c -p -r1.1 data-parallel.xml *** data-parallel.xml 2002/01/14 17:33:33 1.1 --- data-parallel.xml 2002/01/24 04:56:34 *************** *** 14,20 **** After introducing data-parallel expressions and statements, we present the corresponding &pooma; syntax. Then we present its ! implementation, which uses expression-template technology. A naive data-parallel implementation might generate temporary variables, cluttering a program's inner loops and slowing its execution. Instead, &pooma; uses &pete, the Portable Expression Template --- 14,20 ---- After introducing data-parallel expressions and statements, we present the corresponding &pooma; syntax. Then we present its ! implementation, which uses expression-template technology. A &naive; data-parallel implementation might generate temporary variables, cluttering a program's inner loops and slowing its execution. Instead, &pooma; uses &pete, the Portable Expression Template *************** *** 51,57 **** height h and to an entire field of particles with masses m and heights h. Our algorithm works with data-parallel syntax, and we would like to write the corresponding ! computer program using data-parallel syntax as well..
--- 51,57 ---- height h and to an entire field of particles with masses m and heights h. Our algorithm works with data-parallel syntax, and we would like to write the corresponding ! computer program using data-parallel syntax as well.
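For instance, if the computation is the potential energy m·g·h (an assumption; the text above does not name the formula), one data-parallel statement applies it to every particle at once.

  const double g = 9.81;                    // gravitational acceleration (assumed value)
  Interval<1> D(0, nParticles - 1);         // nParticles assumed defined
  Array<1, double, Brick> m(D), h(D), e(D);
  // ... fill m and h ...
  e = m * g * h;                            // one statement for the whole collection

Written for a single particle with scalar m and h, the statement has exactly the same form, which is the appeal of the data-parallel syntax.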
*************** std::cout << A-B << std::endl; *** 881,887 **** Data-parallel statements involving containers occur frequently in the inner loops of scientific programs so their ! efficient execution is important. A naive implementation for these statements may create and destroy containers holding intermediate values, slowing execution considerably. In 1995, Todd enumeration ! distinct &cc; integral type with named constants. These are frequently used in template programming because they can be used as template arguments. --- 314,327 ---- enumeration ! ! ! enumeration ! ! &cc; integral type with named constants. These are frequently used in template programming because they can be used as template arguments. *************** *** 324,330 **** external guard layer ! guard layer surrounding a container's domain used to ease computation along the domain's edges by permitting the same computations as for more internal computations. It is an optimization, not required --- 335,350 ---- external guard layer ! ! ! guard layer ! external ! ! ! external guard layer ! guard layer, external. ! ! guard layer surrounding a container's domain used to ease computation along the domain's edges by permitting the same computations as for more internal computations. It is an optimization, not required *************** *** 382,394 **** function template ! a definition of an unbounded set of related functions, all ! having the same name but whose parameter types can depend on ! template parameters. They are particularly useful when ! overloading operator ! functions to accept parameters that themselves depend ! on templates. --- 402,434 ---- function template ! ! ! function ! template ! ! a definition of an unbounded set of related functions, all having ! the same name but whose types can depend on template parameters. ! They are particularly useful when overloading ! ! overloaded function ! function, overloaded. ! ! ! function ! overloaded ! ! operator ! functions ! ! operator function ! function, operator. ! ! ! function ! operator ! ! to accept parameters that themselves depend on templates. *************** *** 399,417 **** guard layer ! domain surrounding each patch of a container's domain. It contains read-only values. External guard layers ease programming, while internal guard layers permit each patch's computation to be occur without copying values from adjacent patches. They are optimizations, not ! required for program correctness. external guard ! layer internal guard ! layer partition patch domain --- 439,460 ---- guard layer ! ! ! guard layer ! ! domain surrounding each patch of a container's domain. It contains read-only values. External guard layers ease programming, while internal guard layers permit each patch's computation to be occur without copying values from adjacent patches. They are optimizations, not ! required for program correctness. ! external guard layer ! internal ! guard layer ! partition patch domain *************** *** 448,454 **** internal guard layer ! guard layer containing copies of adjacent patches' values. These copies can permit an individual patch's computation to occur without asking adjacent patches for values. This can speed computation but are --- 491,506 ---- internal guard layer ! ! ! guard layer ! internal ! ! ! internal guard layer ! guard layer, internal. ! ! guard layer containing copies of adjacent patches' values. These copies can permit an individual patch's computation to occur without asking adjacent patches for values. 
This can speed computation but are *************** *** 498,503 **** --- 550,560 ---- M + + matrix + + + mesh *************** *** 516,524 **** operator function ! function defining an operator's code. For example, ! operator+ defines the result of using the ! +. --- 573,586 ---- operator function ! ! ! function ! operator ! ! function defining a function invoked using a &cc; operator. For ! example, the operator+ function defines the ! result of using the +. *************** *** 545,551 **** patch ! subset of a container's domain with values computed by a particular context. A partition splits a domain into patches. It may be surrounded by external and internal guard layers. partition --- 607,617 ---- patch ! ! ! patch ! ! subset of a container's domain with values computed by a particular context. A partition splits a domain into patches. It may be surrounded by external and internal guard layers. partition *************** *** 568,574 **** programming time ! time in the process from writing a program to executing it when the program is being written by a programmer. compile time run time --- 634,644 ---- programming time ! ! ! programming time ! ! in the process from writing a program to executing it, the time when the program is being written by a programmer. compile time run time *************** *** 613,619 **** run time ! time in the process from writing a program to executing it when the program is executed. This is also called execution time. compile time --- 683,697 ---- run time ! ! ! run time ! ! ! execution time ! run time. ! ! in the process from writing a program to executing it, the time when the program is executed. This is also called execution time. compile time *************** *** 654,665 **** stride ! a subset of regularly-spaced points in an integral ! interval. For example, the set of points a, a+2, a+4, …, ! b-2, b is specified by [a,b] with stride 2. It is a ! domain. range ! interval domain --- 732,743 ---- stride ! spacing between regularly-spaced points in a domain. For ! example, the set of points a, a+2, a+4, …, b-2, b is ! specified by [a,b] with stride 2. It is a domain. range ! interval domain *************** *** 681,690 **** T template instantiation ! applying a template class to template parameters to create a type. For example, foo<double,3> instantiates template <typename T, int n> class foo with the type &double; and the constant --- 759,788 ---- T + + template + + + + template + + class or function definition having template parameters. + These parameters' values are used at compile time, not run time, + so they may include types and other compile-time values. + + template instantiation + template specialization + + + template instantiation ! ! ! template instantiation ! ! applying a template class to template parameter arguments to create a type. For example, foo<double,3> instantiates template <typename T, int n> class foo with the type &double; and the constant *************** *** 696,702 **** template specialization ! class or function definition for a particular (special) subset of template arguments. --- 794,804 ---- template specialization ! ! ! template specialization ! ! class or function definition for a particular (special) subset of template arguments. *************** *** 724,730 **** trait ! a characteristic of a type. traits class --- 826,836 ---- trait ! ! ! trait ! ! a characteristic of a type. traits class *************** *** 732,739 **** traits class ! a class containing one or more traits all describing a ! 
particular type's chacteristics. trait --- 838,849 ---- traits class ! ! ! traits class ! ! a class containing one or more traits all describing a particular ! type's chacteristics. trait *************** *** 771,783 **** view of a container ! a container derived from another. The former's domain is a subset of the latter's, but, where the domains intersect, accessing a value through the view is the same as accessing it through the original container. In Fortran 90, these are called array sections. Only &array;s, &dynamicarray;s, and ! &field;s support views. container --- 881,893 ---- view of a container ! a container derived from another. The view's domain is a subset of the latter's, but, where the domains intersect, accessing a value through the view is the same as accessing it through the original container. In Fortran 90, these are called array sections. Only &array;s, &dynamicarray;s, and ! &field;s support views. ! container Index: introduction.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/introduction.xml,v retrieving revision 1.3 diff -c -p -r1.3 introduction.xml *** introduction.xml 2002/01/14 17:33:34 1.3 --- introduction.xml 2002/01/24 04:56:35 *************** *** 20,32 **** automatic creation of all interprocessor communication for ! parallel and distributed programs - several container storage classes to reduce a program's - storage requirements, and - - automatic out-of-order execution and loop rearrangement for fast program execution. --- 20,28 ---- automatic creation of all interprocessor communication for ! parallel and distributed programs, and automatic out-of-order execution and loop rearrangement for fast program execution. *************** *** 44,50 ****
&pooma; Goals ! The goals for the &poomatoolkit; have remained unchanged since its conception in 1994: --- 40,50 ----
&pooma; Goals ! ! &pooma; ! goals ! ! The goals for the &poomatoolkit; have remained unchanged since its conception in 1994: *************** *** 74,89 **** Code Portability for Sequential and Distributed Programs ! The same &pooma; programs run on sequential, distributed, and parallel computers. No change in source code is required. Two or ! three lines specifying how each container's domain should be distributed among available processors. Using these directives and run-time information about the computer's configuration, the &toolkit; automatically distributes pieces of the container domains, called patches, among the available processors. If a computation needs values from ! another patch, &pooma; automatically passes the value to the patch where it is needed. The same program, and even the same executable, works regardless of the number of the available processors and the size of the containers' domains. A programmer interested in only --- 74,92 ---- Code Portability for Sequential and Distributed Programs ! ! code portability ! ! The same &pooma; programs run on sequential, distributed, and parallel computers. No change in source code is required. Two or ! three lines specify how each container's domain should be distributed among available processors. Using these directives and run-time information about the computer's configuration, the &toolkit; automatically distributes pieces of the container domains, called patches, among the available processors. If a computation needs values from ! another patch, &pooma; automatically passes the values to the patch where it is needed. The same program, and even the same executable, works regardless of the number of the available processors and the size of the containers' domains. A programmer interested in only *************** *** 92,116 **** Rapid Application Development ! The &poomatoolkit; is designed to enable rapid development of scientific and distributed applications. For example, its vector, matrix, and tensor classes model the corresponding mathematical concepts. Its &array; and &field; classes model the discrete spaces ! and mathematical arrays frequently found in computational science and ! math. See . ! The left column indicates theoretical science and math concepts, the ! middle column computational science and math concepts, and the right ! column computer science implementations. For example, theoretical ! physics frequently uses continuous fields in three-dimension space, ! while algorithms for a corresponding computational physics problem ! usually uses discrete fields. &pooma; containers, classes, and ! functions ease engineering computer programs for these algorithms. ! For example, the &pooma; &field; container models discrete fields; ! both map locations in discrete space to values and permit ! computations of spatial distances and values. The &pooma; &array; ! container models the mathematical concept of an array, used in ! numerical analysis.
How &pooma; Fits Into the Scientific Process --- 95,125 ---- Rapid Application Development ! ! rapid development ! ! The &poomatoolkit; is designed to enable rapid development of scientific and distributed applications. For example, its vector, matrix, and tensor classes model the corresponding mathematical concepts. Its &array; and &field; classes model the discrete spaces ! and mathematical arrays frequently found in computational science ! and math. See . The left column ! indicates theoretical science and math concepts, the middle column ! computational science and math concepts, and the right column ! computer science implementations. For example, theoretical physics ! frequently uses continuous fields in three-dimensional space, while ! algorithms for a corresponding computational physics problem usually ! use discrete fields. &pooma; containers, classes, and functions ! ease engineering computer programs for these algorithms. For ! example, the &pooma; &field; container models discrete fields: both ! map locations in discrete space to values and permit computations of ! spatial distances and values. The &pooma; &array; container models ! the mathematical concept of an array, frequently used in numerical ! analysis. + +
How &pooma; Fits Into the Scientific Process *************** *** 121,129 **** &pooma; helps translate algorithms into programs.
--- 130,138 ---- &pooma; helps translate algorithms into programs. *************** *** 131,191 **** &pooma; containers support a variety of computation modes, easing translation of algorithms into code. For example, many algorithms for solving partial differential equations use ! stencil-based computations. &pooma; supports stencil-based ! computations on &array;s and &field;s. It also supports ! data-parallel computation similar to &fortran 90 syntax. For ! computations where one &field;'s values is a function of several ! other &field;'s values, the programmer can specify a relation. ! Relations are lazily evaluated: whenever the dependent &field;'s ! values are needed and it is dependent on a &field; whose values have ! changed, its values are computed. Lazy evaluation also assists ! correctness by eliminating the frequently forgotten need for a ! programmer to ensure a &field;'s values are up-to-date before being ! used.Efficient Code&pooma; incorporates a variety of techniques to ensure it ! produces code that executes as quickly as special-case, ! hand-written code. ! ! These techniques include extensive use of templates, out-of-order ! evaluation, use of guard layers, and production of fast inner loops. ! ! &pooma;'s uses of &cc; templates permits the expressiveness ! from using pointers and function arguments but ensures as much as ! work as possible occurs at compile time, not run time. This speeds ! programs' execution. Since more code is produced at compile time, ! more code is available to the compiler's optimizer, further speeding ! execution. The &pooma; &array; container benefits from the use of ! template parameters. Their use permits the use of specialized data ! storage classes called engines. An ! &array;'s &engine; template parameter specifies how data is stored and ! indexed. Some &array;s expect almost all values to be used, while ! others might be mostly vacant. In the latter case, using a specialized engine storing the few nonzero values greatly reduces ! space requirements. Using engines also permits fast creation of ! container views, known as array sections in ! Fortran 90. A view's engine is the same as the original ! container's engine, but the view object maps its restricted domain to ! the original domain. Space requirements and execution time to use ! views are minimal. Using templates also permits containers to ! support polymorphic indexing, e.g., indexing both by integers and by ! three-dimensional coordinates. A container defers indexing ! operations to its engine's templatized index operator. Since it uses ! templates, the &engine; can define indexing functions with different ! function arguments, without the need to add corresponding container ! functions. Some of these benefits of using templates can be ! expressed without them, but doing so increases execution time. For ! example, a container could have a pointer to an engine object, but ! this requires a pointer dereference for each operation. Implementing ! polymorphic indexing without templates would require adding virtual ! functions corresponding to each of the indexing functions. ! To ensure multiprocessor &pooma; programs execute quickly, it is important that interprocessor communication overlaps with intraprocessor computations as much as possible and that communication is minimized. Asynchronous communication, out-of-order --- 140,235 ---- &pooma; containers support a variety of computation modes, easing translation of algorithms into code. 
For example, many algorithms for solving partial differential equations use ! stencil-based computations so &pooma; supports stencil-based ! computations on &array;s and &field;s. &pooma; also supports ! data-parallel computation similar to &fortran 90 syntax. ! ! relation ! ! To ease implementing computations where one &field;'s values are a ! function of several other &field;'s values, the programmer can ! specify a relation. Relations are ! lazily evaluated: whenever the dependent &field;'s values are needed ! and they are dependent on a &field; whose values have changed, the ! values are computed. Relations also assists correctness by ! eliminating the frequently forgotten need for a programmer to ensure ! a &field;'s values are up-to-date before being used. Efficient Code &pooma; incorporates a variety of techniques to ensure it ! produces code that executes as quickly as special-case, hand-written ! code. These techniques include extensive use of templates, ! out-of-order evaluation, use of guard layers, and production of fast ! inner loops. ! ! ! templates ! use ! ! &pooma;'s uses of &cc; templates ensures as much as work as possible ! occurs at compile time, not run time. This speeds programs' ! execution. Since more code is produced at compile time, more code ! is available to the compiler's optimizer, further speeding ! execution. ! ! engines ! ! The &pooma; &array; container benefits from the use of template ! parameters. Their use permits the use of specialized data storage ! classes called engines. An ! &array;'s &engine; template parameter specifies how data is stored ! and indexed. Some &array;s expect almost all values to be used, ! while others might be mostly empty. In the latter case, using a specialized engine storing the few nonzero values greatly reduces ! storage requirements. Using engines also permits fast creation of ! container views, known as array ! sectionsarray sections ! in &fortran; 90. A view's engine is the same as the original ! container's engine, but the view object's restricted domain is a ! subset of the original domain. Space requirements and execution ! time to use views are minimal. ! + + + polymorphic indexing + + Using templates also permits containers to support polymorphic + indexing, e.g., indexing both by integers and by three-dimensional + coordinates. A container uses templatized indexing functions that + defer indexing operations to its engine's index operators. Since + the container uses templates, the &engine; can define indexing + functions with different function arguments, without the need to add + corresponding container functions. Some of these benefits of using + templates can be expressed without them, but doing so increases + execution time. For example, a container could have a pointer to an + engine object, but this requires a pointer dereference for each + operation. Implementing polymorphic indexing without templates + would require adding virtual functions corresponding to each of the + indexing functions. + + + ! ! ! asynchronous communication ! ! ! &cheetah; ! ! To ensure multiprocessor &pooma; programs execute quickly, it is important that interprocessor communication overlaps with intraprocessor computations as much as possible and that communication is minimized. Asynchronous communication, out-of-order *************** *** 199,241 **** sender to put and get data without synchronizing with the recipient processor, and it also permits invoking functions at remote sites to ensure desired data is up-to-date. 
Thus, out-of-order evaluation must be supported. Out-of-order evaluation also has another benefit: Only computations directly or indirectly related to values that are ! printed need occur. ! Surrounding a patch with guard layers can help reduce interprocessor communication. For distributed computation, each container's domain is split into pieces distributed among the available processors. Frequently, computing a container value is local, involving just the ! value itself and a few neighbors, but computing a value near the edge ! of a processor's domain may require knowing a few values from a neighboring domain. Guard layers permit these values to be copied locally so they need not be repeatedly communicated. ! &pooma; uses &pete; technology to ensure inner loops involving &pooma;'s object-oriented containers run as quickly as hand-coded ! ! loops. &pete; (the Portable Expression Template Engine) uses ! expression-template technology to convert data-parallel statements ! in the inner loops of programs into efficient loops ! without any intermediate computations. For example, consider ! evaluating the statement ! ! A += -B + 2 * C; ! where A and C are vector<double>s and B is a ! vector<int>. Naive evaluation might introduce intermediaries for -B, 2*C, and their sum. The presence of these ! intermediaries in inner loops can measurably slow evaluation. To produce a loop without intermediaries, &pete; stores each expression ! as a parse tree. The resulting parse trees can be combined into a ! larger parse tree. Using its templates, the parse tree is converted, ! at compile time, to a loop evaluating each component of the result. ! Thus, no intermediate values are computed or stored. For example, ! the code corresponding to the statement above is vector<double>::iterator iterA = A.begin(); vector<int>::const_iterator iterB = B.begin(); --- 243,308 ---- sender to put and get data without synchronizing with the recipient processor, and it also permits invoking functions at remote sites to ensure desired data is up-to-date. Thus, out-of-order evaluation + + out-of-order evaluation + + + evaluation + out-of-order + out-of-order evaluation. + + must be supported. Out-of-order evaluation also has another benefit: Only computations directly or indirectly related to values that are ! printed need occur. ! ! ! ! guard layer ! ! Surrounding a patch with guard layers can help reduce interprocessor communication. For distributed computation, each container's domain is split into pieces distributed among the available processors. Frequently, computing a container value is local, involving just the ! value itself and a few neighbors, but computing a value near the ! edge of a processor's domain may require knowing a few values from a neighboring domain. Guard layers permit these values to be copied locally so they need not be repeatedly communicated. ! ! ! &pete; ! ! ! Portable Expression Template Engine ! &pete;. ! ! ! inner-loop evaluation ! ! &pooma; uses &pete; technology to ensure inner loops involving &pooma;'s object-oriented containers run as quickly as hand-coded ! loops. &pete; (the Portable Expression Template ! Engine) uses expression-template technology to convert data-parallel ! statements into efficient loops without any intermediate ! computations. For example, consider evaluating the statement ! ! A += -B + 2 * C; ! where A and C are vector<double>s and B is a ! vector<int>. &naivecap; evaluation might introduce intermediaries for -B, 2*C, and their sum. The presence of these ! 
intermediaries in inner loops can measurably slow performance. To produce a loop without intermediaries, &pete; stores each expression ! as a parse tree. Using its templates, the parse tree is ! converted, at compile time, to a loop directly evaluating each component of ! the result without computing intermediate values. ! For example, the code corresponding to the statement above is vector<double>::iterator iterA = A.begin(); vector<int>::const_iterator iterB = B.begin(); *************** *** 244,267 **** *iterA += -*iterB + 2 * *iterC; ++iterA; ++iterB; ++iterC; } ! Furthermore, since the code is available at compile, not run, time, it can be further optimized, e.g., moving any loop-invariant code out of the loop. Used for Diverse Set of Scientific Problems &pooma; has been used to solve a wide variety of scientific problems. Most recently, physicists at Los Alamos National ! Laboratory implemented an entire library of hydrodynamics codes as part of the U.S. government's science-based Stockpile Stewardship ! Program to simulate nuclear weapons. Other applications include a matrix solver, an accelerator code simulating the dynamics of high-intensity charged particle beams in linear accelerators, and a ! Monte Carlo neutron transport code. Easy Implementation ! &pooma;'s tools greatly reduce the time to implement applications. As we noted above, &pooma;'s containers and expression syntax model the computational models and algorithms most frequently found in scientific programs. These high-level tools are known to be --- 311,357 ---- *iterA += -*iterB + 2 * *iterC; ++iterA; ++iterB; ++iterC; } ! Furthermore, since the code is available at compile time, not run time, it can be further optimized, e.g., moving any loop-invariant code out of the loop. + Used for Diverse Set of Scientific Problems &pooma; has been used to solve a wide variety of scientific problems. Most recently, physicists at Los Alamos National ! Laboratory ! ! Los Alamos National Laboratory ! ! implemented an entire library of hydrodynamics codes ! ! hydrodynamics ! ! as part of the U.S. government's science-based Stockpile Stewardship ! Program ! ! Stockpile Stewardship Program ! ! to simulate nuclear weapons. Other applications include a matrix solver, an accelerator code simulating the dynamics of high-intensity charged particle beams in linear accelerators, and a ! Monte Carlo ! ! Monte Carlo simulation ! ! neutron transport code. + Easy Implementation ! ! ! &pooma; ! ease of writing programs ! ! &pooma;'s tools greatly reduce the time to implement applications. As we noted above, &pooma;'s containers and expression syntax model the computational models and algorithms most frequently found in scientific programs. These high-level tools are known to be *************** *** 271,280 **** computers. With no additional work, the same program runs on computers with hundreds of processors; the code is exactly the same, and the &toolkit; automatically handles distribution of the data, all ! data communication, and all synchronization. The net results is a significant reduction in programming time. For example, a team of two physicists and two support people at Los Alamos National ! Laboratory implemented a suite of hydrodynamics kernels in six months. Their work replaced a previous suite of less-powerful kernels which had taken sixteen people several years to implement and debug. Despite not have previously implemented any of the kernels, --- 361,378 ---- computers. 
With no additional work, the same program runs on computers with hundreds of processors; the code is exactly the same, and the &toolkit; automatically handles distribution of the data, all ! data communication, and all synchronization. The net result is a significant reduction in programming time. For example, a team of two physicists and two support people at Los Alamos National ! Laboratory ! ! Los Alamos National Laboratory ! ! implemented a suite of hydrodynamics kernels ! ! hydrodynamics ! ! in six months. Their work replaced a previous suite of less-powerful kernels which had taken sixteen people several years to implement and debug. Despite not have previously implemented any of the kernels, *************** *** 283,352 ****
&pooma; Produces Fast Programs almost as fast as &c;. wide variety of configurations: one processor, many processors, give performance data for at least two ! different programs ! HERE ! describe &doof2d; here &doof2d; is a two-dimensional diffusion simulation program. Initially, all values in the square two-dimensional grid are zero ! except for the central value. ! ! HERE
-
&pooma; is Free, Open-Source Software The &poomatoolkit; is open-source software. Anyone may download, read, redistribute, and modify the &pooma; source code. ! If an application requires a specialized container, any programmer ! may add it. Any programmer can extend it to solve problems in ! previously unsupported domains. Companies using the &toolkit; can ! read the source code to ensure it has no hidden back doors or ! security holes. It may be downloaded for free and used for ! perpetuity. There are no annual licenses and no on-going costs. By ! keeping their own copies, companies are guaranteed the software will ! never disappear. In summary, the &poomatoolkit; is free, low-risk ! software.
History of &pooma; The &poomatoolkit; was developed at Los Alamos National ! Laboratory to assist nuclear fusion and fission research. In 1994, the &toolkit; grew out of the Object-Oriented Particle Simulation ! class library developed for particle-in-cell simulations. The goals of the Framework, as it was called at the time, were driven by the ! Numerical Tokamak's Parallel Platform Paradox:
The average time required to implement a moderate-sized application on a parallel computer architecture is equivalent to the half-life of the latest parallel supercomputer.
The framework's goal of being able to quickly write efficient scientific code that could be run on a wide variety of platforms remains unchanged today. Development, mainly at the ! Advanced Computing Laboratory at Los Alamos, proceeded rapidly. ! A matrix solver application was written using the framework. ! Support for hydrodynamics, Monte Carlo simulations, and molecular ! dynamics modeling soon followed.
! ! By 1998, &pooma; was part of the U.S. Department of ! Energy's Accelerated Strategic Computing Initiative ! (ASCI). The Comprehensive Test Ban Treaty forbid nuclear weapons testing so they were instead simulated using computers. ASCI's goal was to radically advance the state of the art in high-performance computing and numerical --- 381,516 ----
+ &pooma; Produces Fast Programs almost as fast as &c;. wide variety of configurations: one processor, many processors, give performance data for at least two ! different programs UNFINISHED ! describe &doof2d; at this location &doof2d; is a two-dimensional diffusion simulation program. Initially, all values in the square two-dimensional grid are zero ! except for the central value. UNFINISHED + ]]>
&pooma; is Free, Open-Source Software + + open-source software + + + &pooma; + open-source + + The &poomatoolkit; is open-source software. Anyone may download, read, redistribute, and modify the &pooma; source code. ! If an application requires a specialized container not already ! available, any programmer may add it. Any programmer can extend it ! to solve problems in previously unsupported domains. Companies ! using the &toolkit; can read the source code to ensure it has no ! security holes. It may be downloaded for free ! and used in perpetuity. There are no annual licenses and no ! on-going costs. By keeping their own copies, companies are ! guaranteed the software will never disappear. In summary, the ! &poomatoolkit; is free, low-risk software.
History of &pooma; + + &pooma; + history + + + Los Alamos National Laboratory + + The &poomatoolkit; was developed at Los Alamos National ! Laboratory to assist nuclear fusion ! ! fusion ! ! and fission ! ! fission ! ! research. In 1994, the &toolkit; grew out of the Object-Oriented Particle Simulation ! ! Object-Oriented Particle Simulation Library ! ! Class Library developed for particle-in-cell simulations. The goals of the Framework, as it was called at the time, were driven by the ! Numerical Tokamak's ! ! Tokamak ! ! Parallel Platform Paradox:
The average time required to implement a moderate-sized application on a parallel computer architecture is equivalent to the half-life of the latest parallel supercomputer.
+ + Parallel Platform Paradox + The framework's goal of being able to quickly write efficient scientific code that could be run on a wide variety of platforms remains unchanged today. Development, mainly at the ! Advanced Computing Laboratory ! ! Los Alamos National Laboratory ! Advanced Computing Laboratory ! ! at Los Alamos, proceeded rapidly. A matrix solver application was ! written using the framework. ! Support for hydrodynamics, ! ! hydrodynamics ! ! Monte Carlo simulations, ! ! Monte Carlo simulation ! ! and molecular dynamics ! ! molecular dynamics modeling ! ! modeling soon followed.
! ! ! By 1998, &pooma; was part of the U.S. Department of ! Energy's ! ! Department of Energy ! ! Accelerated Strategic Computing Initiative ! (ASCI). ! ! Department of Energy ! Accelerated Strategic Computing Initiative ! ! ! Accelerated Strategic Computing Initiative ! Department of Energy, Accelerated Strategic Computing Initiative. ! ! The Comprehensive Test Ban Treaty ! ! Comprehensive Test Ban Treaty ! ! forbade nuclear weapons testing so they were instead simulated using computers. ASCI's goal was to radically advance the state of the art in high-performance computing and numerical *************** HERE *** 361,388 ****
--- 525,609 ---- &pooma; 2 involved a new conceptual framework and a complete rewriting of the source code to improve performance. The ! &array; class ! ! &array; ! ! was introduced with its use of &engine;s, ! ! &engine; ! ! separating ! container use from container storage. A new asynchronous scheduler ! permitted out-of-order execution ! ! out-of-order evaluation ! ! to improve cache coherency. Incorporating the Portable Expression Template Engine (PETE) ! ! &pete; ! ! permitted faster loop execution. Soon, container views ! ! container ! view ! ! and ! ConstantFunction ! ! &engine; ! ConstantFunction ! ! and IndexFunction ! ! &engine; ! IndexFunction ! ! &engine;s were added. Release 2.1.0 included &field;s ! ! &field; ! ! with ! their spatial extent and &dynamicarray;s ! ! &dynamicarray; ! ! with the ability to ! dynamically change domain size. Support for particles and their interaction with &field;s were added. The &pooma; messaging implementation was revised in release 2.3.0. Use of the ! &cheetah; Library ! ! &cheetah; ! ! separated &pooma; from the actual messaging library used, and support for applications running on clusters of computers was added. CodeSourcery, LLC, ! ! CodeSourcery, LLC ! ! and ! Proximation, LLC, ! ! Proximation, LLC ! ! took over &pooma; development from Los Alamos National Laboratory. During the past two years, the &field; ! abstraction ! ! &field; ! ! and implementation was improved to increase its flexibility, add support for multiple values and materials in the ! same cell, and permit lazy evaluation. ! ! lazy evaluation ! ! Simultaneously, the execution speed of the inner loops was greatly increased. Index: manual.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/manual.xml,v retrieving revision 1.8 diff -c -p -r1.8 manual.xml *** manual.xml 2002/01/22 15:48:49 1.8 --- manual.xml 2002/01/24 04:56:38 *************** *** 1,4 **** --- 1,6 ---- + + + + + + + + *************** *** 158,163 **** --- 166,173 ---- + + d"> *************** *** 173,194 **** ! http://pooma.codesourcery.com/pooma/download'> ! http://www.pooma.com/'> ! ! ! --- 183,210 ---- ! http://pooma.codesourcery.com/pooma/download'> ! http://www.pooma.com/'> ! ! ! ! + + + + + *************** *** 260,277 **** CodeSourcery, LLC ! 2001CodeSourcery, LLC () ! Los Alamos National Laboratory All rights reserved. This document may not be redistributed in any form without the express permission of the author. ! 0.01 ! 2002 Jan 14 jdo ! first draft --- 276,293 ---- CodeSourcery, LLC ! 2002CodeSourcery, LLC () ! Los Alamos National Laboratory All rights reserved. This document may not be redistributed in any form without the express permission of the author. ! 1.00 ! 2002 Jan 23 jdo ! First publication. *************** *** 279,284 **** --- 295,301 ---- + Preface *************** *** 338,349 **** --- 355,369 ---- + ]]> + Programming with &pooma; + ]]> &introductory-chapter; *************** *** 420,429 **** components of each vector in an &array; to form its own &array;. Since each container has one or more &engine;s, we can also describe the latter category as containers that compute their ! values using other containers' values. A MultiPatch ! &engine; distributes its domain among various processors and ! memory spaces, each responsible for computing values associated ! with a portion, or patch, of the domain.
Just as multiple containers can use the same engine, multiple &engine;s can use the same underlying data. As we --- 440,449 ---- components of each vector in an &array; to form its own &array;. Since each container has one or more &engine;s, we can also describe the latter category as containers that compute their ! values using other containers' values. A &multipatch; &engine; ! distributes its domain among various processors and memory spaces, ! each responsible for computing values associated with a portion, ! or patch, of the domain. Just as multiple containers can use the same engine, multiple &engine;s can use the same underlying data. As we *************** *** 491,497 **** &dynamic; is a one-dimensional &brick; with dynamically ! resizable domain. HERE ever explicitly declare these? &engine;s That Compute --- 511,518 ---- &dynamic; is a one-dimensional &brick; with dynamically ! resizable domain. This should be used with &dynamicarray;, ! not &array;. &engine;s That Compute *************** *** 620,626 **** operator() take Loc<1> or one ∫ parameter. In addition, the one-dimensional domain can be dynamically resized using create ! and destroy; see . HERE Dynamic. How does one change the domain size? What is the model? --- 641,647 ---- operator() take Loc<1> or one ∫ parameter. In addition, the one-dimensional domain can be dynamically resized using create ! and destroy; see . HERE Dynamic. How does one change the domain size? What is the model?
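The note above asks how the domain size changes; as a hedged sketch only (the Pooma/DynamicArrays.h header name is assumed, and the create and destroy member functions mentioned above are taken at face value), resizing a &dynamicarray; might look like this:

#include "Pooma/DynamicArrays.h"

DynamicArray<double, Dynamic> d(Interval<1>(0, 9));   // ten elements initially
d.create(5);                    // append five elements to the end of the domain
d.destroy(Interval<1>(0, 2));   // remove the first three elements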
*************** HERE Dynamic. How does one change the do *** 696,708 **** --- 717,780 ---- Container Views + + container + view + + + view of a container + container, view. + + + A view of a + container &container; is a container + accessing a subset of &container;'s domain &containerdomain; + and values. The subset can include all of &containerdomain;. + A view is so named because it is a different way to + access, or view, another container's values. Both the container + and its view share the same underlying engine so changing values in + one also changes them in the other. + + A view is created by following a container's name by + parentheses containing a domain &containerdomain;. For + example, consider this code extracted from in . + + Interval<1> N(0, n-1); + Interval<2> vertDomain(N, N); + Interval<1> I(1,n-2); + Interval<1> J(1,n-2); + Array<2, double, Brick> a(vertDomain); + Array<2, double, Brick> b(vertDomain); + a(I,J) = (1.0/9.0) * + (b(I+1,J+1) + b(I+1,J ) + b(I+1,J-1) + + b(I ,J+1) + b(I ,J ) + b(I ,J-1) + + b(I-1,J+1) + b(I-1,J ) + b(I-1,J-1)); + The last statement creates ten views. For example, + + a(I,J) creates a view of + a using the smaller domain specified by + I and J. This omits the + outermost rows of columns of a. The views + of b illustrate the use of views in + data-parallel statements. b(I-1,J-1) has a + subset shifted up one row and left one column compared with + b(I,J). + ]]> + Be sure to list the various arithmetic operations on domains that can be used. This was deferred from the &array; and domain chapter. Explain &array;'s comp function. + ]]> + + Writing Sequential Programs *************** UNFINISHED *** 1086,1095 **** dependence computations, so the &author; recommends calling Pooma::blockAndEvaluate before each access to a particular value in an &array; or &field;. Omitting a necessary ! call may lead to a race condition. See for instructions how to diagnose and eliminate these race ! conditions.Where talk about various &pooma; streams? --- 1158,1171 ---- dependence computations, so the &author; recommends calling Pooma::blockAndEvaluate before each access to a particular value in an &array; or &field;. Omitting a necessary ! call may lead to a race condition. ! for instructions how to diagnose and eliminate these race ! conditions. ! ]]> ! Where talk about various &pooma; streams? *************** UNFINISHED *** 1193,1199 **** in the input domain: A(i1, i2, ..., iN).The &pooma; multidimensional Array concept is similar to ! the &fortran; 90 array facility, but extends it in several ways. Both &pooma; and &fortran; arrays can have up to seven dimensions, and can serve as containers for arbitrary types. Both support the notion of views of a portion of the --- 1269,1275 ---- in the input domain: A(i1, i2, ..., iN).The &pooma; multidimensional Array concept is similar to ! the &fortran; 90 array facility, but extends it in several ways. Both &pooma; and &fortran; arrays can have up to seven dimensions, and can serve as containers for arbitrary types. Both support the notion of views of a portion of the *************** UNFINISHED *** 1492,1498 **** &pooma; II's expression trees and expression engines. ! MultiPatch Engine From README: To actually use multiple contexts effectively, you need to use the MultiPatch engine with --- 1568,1574 ---- &pooma; II's expression trees and expression engines. ! 
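Returning to the view example above, a short sketch of the sharing behavior described there: because a and the view a(I,J) use the same underlying &engine;, an assignment through the view is visible through a itself. The Pooma::blockAndEvaluate call follows the recommendation above about accessing individual values; the value shown is only illustrative.

a(I, J) = 0.0;               // assign through the view
Pooma::blockAndEvaluate();   // ensure the data-parallel statement has finished
double x = a(1, 1);          // reads 0.0: the view and a share one engine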
&multipatch; Engine From README: To actually use multiple contexts effectively, you need to use the MultiPatch engine with *************** UNFINISHED *** 1508,1515 **** --- 1584,1593 ---- + ]]> + Writing Distributed Programs *************** UNFINISHED *** 1562,1569 **** --- 1640,1649 ---- + ]]> + Debugging and Profiling &pooma; Programs *************** UNFINISHED *** 1607,1615 **** --- 1687,1700 ---- region's size should reveal where calls are missing. + ]]> + + + ]]> + &pooma; Reference Manual *************** UNFINISHED *** 3489,3496 **** --- 3574,3583 ---- + ]]> + Future Development *************** UNFINISHED *** 3610,3615 **** --- 3697,3703 ---- + ]]> *************** UNFINISHED *** 3644,3650 **** Download the library from the &pooma; Download page ! available off the &pooma; home page (&poomaHomePage;). Extract the source code using tar xzvf --- 3732,3738 ---- Download the library from the &pooma; Download page ! available off the &pooma; home page (&poomahomepage;). Extract the source code using tar xzvf *************** UNFINISHED *** 3715,3721 **** Download the library from the &pooma; Download page ! available off the &pooma; home page (&poomaHomePage;). Extract the source code using tar xzvf --- 3803,3809 ---- Download the library from the &pooma; Download page ! available off the &pooma; home page (&poomahomepage;). Extract the source code using tar xzvf *************** UNFINISHED *** 3863,3868 **** --- 3951,3957 ---- + Dealing with Compilation Errors *************** UNFINISHED *** 4039,4044 **** --- 4128,4134 ---- + ]]> &bibliography-chapter; Index: template.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/template.xml,v retrieving revision 1.1 diff -c -p -r1.1 template.xml *** template.xml 2002/01/14 17:33:34 1.1 --- template.xml 2002/01/24 04:56:39 *************** *** 1,7 **** Programming with Templates ! &pooma; extensively uses &cc; templates to support type polymorphism without incurring any run-time cost. In this chapter, we briefly introduce using templates in &cc; programs by relating them to ordinary &cc; constructs such as values, --- 1,16 ---- Programming with Templates ! ! templates ! ! ! template programming ! templates ! ! ! &pooma; extensively uses &cc; templates to support type polymorphism without incurring any run-time cost. In this chapter, we briefly introduce using templates in &cc; programs by relating them to ordinary &cc; constructs such as values, *************** *** 9,69 **** templates will occur repeatedly: ! Template programming occurs at compile time, not run time. ! That is, template operations occur within the compiler, not when ! a program runs. ! Templates permit declaring families of classes using a ! single declaration. For example, the &array; template ! declaration permits using arrays with many different value types, e.g., arrays of integers, arrays of floating point numbers, and arrays of arrays. ! For those interested in the implementation of &pooma;, we close ! with a discussion of some template programming concepts used in the ! implementation but not likely to be used by &pooma; users.
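As a small illustration of the family-of-classes idea above (a sketch using a one-dimensional &interval; and the &brick; &engine; for concreteness), one &array; template declaration yields arrays of integers, arrays of floating-point numbers, and, as the text notes, even arrays of arrays:

Interval<1> N(0, n-1);
Array<1, int,    Brick> ai(N);                    // array of integers
Array<1, double, Brick> ad(N);                    // array of floating-point numbers
Array<1, Array<1, double, Brick>, Brick> aa(N);   // array of arrays, as described above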
! Templates Occur at Compile-Time &pooma; uses &cc; templates to support type polymorphism without incurring any run-time cost as a program executes. All template operations are performed at compile time by the compiler. ! Prior to the introduction of templates, almost all a program's interesting computation occurred when it was executed. When writing the program, the programmer, at programming ! time, would specify which statements and ! expressions would occur and which types to use. At compile time, the compiler would convert the program's source code into an executable program. Even though the compiler uses the types to produce the executable, no interesting computation would occur. At run ! time, the resulting executable program would actually perform the operations. The introduction of templates permits interesting computation to occur while the compiler produces the executable. ! Most interesting is template instantiation, which produces a type at compile time. For example, the &array; type definition requires template parameters Dim, T, and EngineTag, specifying ! its dimension, the type of its elements, and its &engine; type. To use this, a programmer specifies values for the template parameters: Array<2,double,Brick> ! specifies a dimension of 2, an element type of &double;, and the ! &brick; &engine; type. At compile time, the compiler creates a type ! definition by substituting the values for the template parameters ! in the template definition. The substitution is analogous to the ! run-time application of a function to specific values. All computation not involving run-time input or output can occur at program time, compile time, or run time, whichever is --- 18,107 ---- templates will occur repeatedly: ! Template programming constructs execute at compile time, ! not run time. That is, template operations occur within the ! compiler, not when a program runs. ! Templates permit declaring families of classes using a ! single declaration. For example, the &array; ! ! &array; ! ! ! type polymorphism ! ! template ! declaration permits using &array;s with many different value types, e.g., arrays of integers, arrays of floating point numbers, and arrays of arrays. ! For those interested in the implementation of &pooma;, we close the ! section with a discussion of some template programming concepts ! used in the implementation but not likely to be used by &pooma; ! users.
! Templates Execute at Compile-Time ! ! ! compile time ! ! ! compiler ! &pooma; uses &cc; templates to support type polymorphism without incurring any run-time cost as a program executes. All template operations are performed at compile time by the compiler. ! Prior to the introduction of templates, almost all of a program's interesting computation occurred when it was executed. When writing the program, the programmer, at programming ! time, ! ! programming time ! ! would specify which statements and expressions will occur and ! which types to use. At compile time, the compiler would convert the program's source code into an executable program. Even though the compiler uses the types to produce the executable, no interesting computation would occur. At run ! time, ! ! run time ! ! the resulting executable program would actually perform the operations. The introduction of templates permits interesting computation to occur while the compiler produces the executable. ! Most interesting is template instantiation, ! ! template ! instantiation ! ! which produces a type at compile time. For example, the &array; type definition requires template parameters Dim, T, and EngineTag, specifying ! its dimension, the type of its values, and its &engine; type. To use this, a programmer specifies values for the template parameters: Array<2,double,Brick> ! specifies a dimension of 2, a value type of &double;, and the ! &brick; &engine; type. At compile time, the compiler creates a ! type definition by substituting the values for the template ! parameters in the templatized type definition. The substitution ! is analogous to the run-time application of a function to specific ! values. All computation not involving run-time input or output can occur at program time, compile time, or run time, whichever is *************** *** 71,83 **** computations by hand rather than writing code to compute it. &cc; templates are Turing-complete so they can compute anything computable. Unfortunately, syntax for compile-time computation is ! more difficult than for run-time computation, and also current compilers are not as efficient as code executed by hardware. ! Run-time &cc; constructs are Turing-complete so using templates is unnecessary. Thus, we can shift computation to the time which best trades off the ease of expressing syntax with the speed of computation by programmer, compiler, or computer chip. For ! example, &pooma; uses expression template technology to speed run-time execution of data-parallel statements. The &pooma; developers decided to shift some of the computation from run-time to compile-time using template computations. The resulting --- 109,129 ---- computations by hand rather than writing code to compute it. &cc; templates are Turing-complete so they can compute anything computable. Unfortunately, syntax for compile-time computation is ! more difficult than for run-time computation. Also current compilers are not as efficient as code executed by hardware. ! Run-time &cc; constructs are Turing-complete ! ! Turing complete ! ! so using templates is unnecessary. Thus, we can shift computation to the time which best trades off the ease of expressing syntax with the speed of computation by programmer, compiler, or computer chip. For ! example, &pooma; uses expression template technology ! ! expression templates ! ! to speed run-time execution of data-parallel statements. The &pooma; developers decided to shift some of the computation from run-time to compile-time using template computations. 
The resulting *************** *** 100,111 **** parameters, both of which are used in this book. ! template instantiation, i.e., specifying a particular ! type by specifying values for template parameters. ! nested type names, which are types specified within a ! class definition. We discuss each of these below. --- 146,170 ---- parameters, both of which are used in this book. ! template instantiation, ! ! template ! instantiation ! ! i.e., specifying a particular type by specifying values for ! template parameters. ! nested type names, ! ! nested type ! type, nested. ! ! ! type ! nested ! ! which are types specified within a class definition. We discuss each of these below. *************** *** 174,179 **** --- 233,242 ---- brackets (<>). For example, pair<int> instantiates + + template + instantiation + the pair template class definition with T equal to ∫. That is, the compiler creates a definition for pair<int> by copying *************** *** 184,193 **** The result is a definition exactly the same as pairOfInts. !
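To make the instantiation concrete, here is a hedged reconstruction of the pair example (the member names follow the left_ and right_ members mentioned later in this chapter; the real listing may differ):

template <typename T>
struct pair {
  T left_;
  T right_;
};

struct pairOfInts {    // the hand-written equivalent
  int left_;
  int right_;
};

pair<int> p;           // the compiler generates a definition identical to pairOfInts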
! In the translation from theoretical science and math to ! computational science and math to computer programs, &pooma; eases ! the implementation of algorithms as computer programs. ! In the translation from theoretical science to ! computational science to computer programs, &pooma; eases the ! implementation of algorithms as computer programs.
! Correspondences Between Run-Time and Compile-Time Programming Constructs --- 247,286 ---- The result is a definition exactly the same as pairOfInts. ! As we mentioned above, template instantiation ! ! template ! instantiation ! ! is analogous to function application. ! ! function ! application ! ! A template class is analogous to a ! function. The analogy between compile-time and run-time ! programming constructs can be extended. ! lists these correspondences. For example, at run time, values ! consist of things such as integers, floating point numbers, ! pointers, functions, and objects. Programs compute by operating ! on these values. The compile-time values ! ! compile time ! value ! ! include types, and ! compile-time operations use these types. For both run-time and ! compile-time programming, &cc; defines default sets of values that ! all conforming compilers must support. For example, ! 3 and 6.022e+23 are ! run-time values that any &cc; compiler must accept. It must also ! accept the ∫, &bool;, and int* types. ! !
! Correspondences Between Run-Time and Compile-Time Programming Constructs *************** *** 198,204 **** compile time ! values integers, strings, objects, functions, … --- 291,297 ---- compile time ! values integers, strings, objects, functions, … *************** *** 222,236 **** packaging repeated operations ! A function generalizes a particular operation applied to ! different values. The function parameters are placeholders ! for particular values. ! A template class generalizes a particular class ! definition using different types. The template parameters are ! placeholders for particular values. ! application Use a function by appending function arguments surrounded by parentheses. Use a template class by appending template arguments --- 315,342 ---- packaging repeated operations ! A function ! ! function ! ! generalizes a particular operation applied to different ! values. The function parameters are placeholders for ! particular values. ! A template class generalizes a particular class ! definition using different types. The template parameters ! are placeholders for particular values. ! application ! ! function ! application ! ! ! application ! function, application. ! ! Use a function by appending function arguments surrounded by parentheses. Use a template class by appending template arguments *************** *** 239,262 ****
- - As we mentioned above, template instantiation is analogous - to function application. A template class is analogous to a - function. The analogy between compile-time and run-time - programming constructs can be extended. - lists these correspondences. For example, at run time, values - consist of things such as integers, floating point numbers, - pointers, functions, and objects. Programs compute by operating - on these values. The compile-time values include types, and - compile-time operations use these types. For both run-time and - compile-time programming, &cc; defines default sets of values that - all conforming compilers must support. For example, - 3 and 6.022e+23 are - run-time values that any &cc; compiler must accept. It must also - accept the ∫, &bool;, and int* types. ! The set of supported run-time and compile-time values can be extended. Run-time values can be extended by creating new objects. Although not part of the default set of values, these objects are treated and operated on as values. To extend the set --- 345,359 ---- ! ! ! object ! ! ! class definition ! ! The set of supported run-time and compile-time values can be extended. Run-time values can be extended by creating new objects. Although not part of the default set of values, these objects are treated and operated on as values. To extend the set *************** *** 268,282 **** built-in types, these types can be used in the same way that any other types can be used, e.g., declaring variables. ! Functions generalize similar run-time operations, while template class generalize similar class definitions. A function definition generalizes a repeated run-time operation. For example, consider repeatedly printing the largest of two numbers: ! std::cout << (3 > 4 ? 3 : 4) << std::endl; ! std::cout << (4 > -13 ? 4 : -13) << std::endl; ! std::cout << (23 > 4 ? 23 : 4) << std::endl; ! std::cout << (0 > 3 ? 0 : 3) << std::endl; Each statement is exactly the same except for the repeated two values. Thus, we can generalize these statements writing a function: --- 365,383 ---- built-in types, these types can be used in the same way that any other types can be used, e.g., declaring variables. ! ! ! function ! ! Functions generalize similar run-time operations, while template class generalize similar class definitions. A function definition generalizes a repeated run-time operation. For example, consider repeatedly printing the largest of two numbers: ! std::cout &openopen; (3 > 4 ? 3 : 4) &openopen; std::endl; ! std::cout &openopen; (4 > -13 ? 4 : -13) &openopen; std::endl; ! std::cout &openopen; (23 > 4 ? 23 : 4) &openopen; std::endl; ! std::cout &openopen; (0 > 3 ? 0 : 3) &openopen; std::endl; Each statement is exactly the same except for the repeated two values. Thus, we can generalize these statements writing a function: *************** void maxOut(int a, int b) *** 285,294 **** { std::cout &openopen; (a > b ? a : b) &openopen; std::endl; } The function's body consists of the statement with variables substituted for the two particular values. Each parameter ! is a placeholder that, when used, holds one particular value among the ! set of possible integral values. The function must be named to permit ! its use, and declarations for its two parameters follow. Using the ! function simplifies the code: maxOut(3, 4); maxOut(4, -13); --- 386,395 ---- { std::cout &openopen; (a > b ? a : b) &openopen; std::endl; } The function's body consists of the statement with variables substituted for the two particular values. 
Each parameter ! variable is a placeholder that, when used, holds one particular value ! among the set of possible integral values. The function must be named ! to permit its use, and declarations for its two parameters follow. ! Using the function simplifies the code: maxOut(3, 4); maxOut(4, -13); *************** maxOut(0, 3); *** 298,306 **** parentheses surrounding specific values for its parameters, but the function's return type is omitted. ! A template class definition generalizes repeated class definitions. If two class definitions differ only in a few types, ! template parameters can be substituted. Each parameter is a placeholder that, when used, holds one particular value, i.e., type, among the set of possible values. The class definition is named to permit its use, and declarations for its parameters --- 399,417 ---- parentheses surrounding specific values for its parameters, but the function's return type is omitted. ! ! ! template ! definition ! ! A template class definition generalizes repeated class definitions. If two class definitions differ only in a few types, ! template parameters ! ! template ! parameter ! ! can be substituted. Each parameter is a placeholder that, when used, holds one particular value, i.e., type, among the set of possible values. The class definition is named to permit its use, and declarations for its parameters *************** maxOut(0, 3); *** 313,323 **** Note the notation for the template class parameters. template <typename T> precedes the class definition. The keyword ! typename indicates the template parameter is a type. T is the template parameter's name. (We could have used any other identifier such as pairElementType or foo.) ! Note that using class is equivalent to using typename so template <class T> is equivalent to template <typename T>. While declaring a template class --- 424,442 ---- Note the notation for the template class parameters. template <typename T> precedes the class definition. The keyword ! typename ! ! typename ! ! indicates the template parameter is a type. T is the template parameter's name. (We could have used any other identifier such as pairElementType or foo.) ! Note that using class ! ! class ! ! is equivalent to using typename so template <class T> is equivalent to template <typename T>. While declaring a template class *************** maxOut(0, 3); *** 327,336 **** for its parameters. As we showed above, pair<int> instantiates the template class pair with ∫ for its type parameter T. ! In template programming, nested type names store compile-time data that can be used within template classes. Since compile-time class definitions are analogous to run-time objects and the latter stores named values, nested type names are values, --- 446,468 ---- for its parameters. As we showed above, pair<int> instantiates + + template + instantiation + the template class pair with ∫ for its type parameter T. ! ! ! type ! nested ! ! ! nested type ! type, nested. ! ! In template programming, nested type names store compile-time data that can be used within template classes. Since compile-time class definitions are analogous to run-time objects and the latter stores named values, nested type names are values, *************** maxOut(0, 3); *** 338,349 **** template class &array; has an nested type name for the type of its domain: ! typedef typename Engine_t::Domain_t Domain_t; ! This typedef, i.e., type definition, defines the type Domain_t as equivalent to Engine_t::Domain_t. The ! :: operator selects the ! 
Domain_t nested type from inside the Engine_t type. This illustrates how to access &array;'s Domain_t when not within &array;'s scope: Array<Dim, T, EngineTag>::Domain_t. The --- 470,493 ---- template class &array; has an nested type name for the type of its domain: ! typedef typename Engine_t::Domain_t Domain_t; ! This typedef, ! ! typedef ! type, definition. ! ! ! type ! definition ! ! i.e., type definition, defines the type Domain_t as equivalent to Engine_t::Domain_t. The ! :: operator ! ! :: operator ! ! selects the Domain_t nested type from inside the Engine_t type. This illustrates how to access &array;'s Domain_t when not within &array;'s scope: Array<Dim, T, EngineTag>::Domain_t. The *************** maxOut(0, 3); *** 363,371 **** &poomatoolkit;. In this section, we present template programming techniques used to implement &pooma;. We extend the correspondence between compile-time template programming ! constructs and run-time constructs. Reading this section is not ! necessary unless you wish to understand how &pooma; is ! implemented. In the previous section, we used a correspondence between run-time and compile-time programming constructs to introduce --- 507,515 ---- &poomatoolkit;. In this section, we present template programming techniques used to implement &pooma;. We extend the correspondence between compile-time template programming ! constructs and run-time constructs started in the previous ! section. Reading this section is not necessary unless you wish to ! understand how &pooma; is implemented. In the previous section, we used a correspondence between run-time and compile-time programming constructs to introduce *************** maxOut(0, 3); *** 390,396 **** compile time ! values integers, strings, objects, functions, … --- 534,540 ---- compile time ! values integers, strings, objects, functions, … *************** maxOut(0, 3); *** 414,430 **** values stored in a collection An object stores values. A traits ! class contains values describing a type. extracting values from collections An object's named values are extracted using the ! . operator A class's nested types and classes are extracted using ! the :: operator. ! control flow to choose among operations if, while, goto, … template class specializations with pattern matching --- 558,595 ---- values stored in a collection An object stores values. A traits ! class ! ! traits class ! ! ! class ! traits ! traits class ! ! contains values describing a type. extracting values from collections An object's named values are extracted using the ! . operator. ! ! . operator ! ! A class's nested types and classes are extracted using ! the :: operator. ! ! :: operator ! ! ! control flow ! ! control flow ! ! to choose among operations if, while, goto, … template class specializations with pattern matching *************** maxOut(0, 3); *** 432,444 **** ! The only compile-time value described in the previous ! section was types, but any compile-time constant can also be used. Integral literals, const variables, and other constructs can be used, but the main use is enumerations. An enumeration ! enumeration is a distinct integral type with named constants. For example, the &array; declaration declares two separate enumerations: --- 597,619 ---- ! ! ! enumeration ! ! ! compile time ! value ! ! The only compile-time values described in the previous ! section were types, but any compile-time constant can also be used. Integral literals, const variables, and other constructs can be used, but the main use is enumerations. 
An enumeration ! is a distinct integral type with named constants. For example, the &array; declaration declares two separate enumerations: *************** enum { dimensionPlusRank = dimensions + *** 480,502 **** The use of non-integral constant values such as floating-point ! numbers at compile time is restricted. ! ! Other compile-time values include pointers and references to ! objects and functions and executable code. For example, a pointer ! to a function sometimes is passed to a template function to ! perform a specific task. Even though executable code cannot be ! directly represented in a program, it is a compile-time value ! which the compiler uses. A simple example is a class that is ! created by template instantiation, e.g., ! pair<int>. Conceptually, the ∫ template argument is substituted throughout the pair template class to produce a class definition. Although neither the programmer nor the user sees this class definition, it is represented inside the compiler, which can use and manipulate the code. ! ! Through template programming, the compiler's optimizer can transform complicated code into much simpler code. In , we describe the complicated template code used to implement efficiently --- 655,723 ---- The use of non-integral constant values such as floating-point ! numbers at compile time is restricted. ! ! ! ! ! Other compile-time values include pointers ! ! pointer ! ! to objects and ! functions, references ! ! reference ! ! to objects and functions, and executable ! code. For example, a pointer to a function ! ! pointer ! function ! ! ! function pointer ! pointer, function. ! ! sometimes is passed to ! a template function to perform a specific task. Even though ! executable code ! ! executable code ! ! cannot be directly represented in a program, it is ! a compile-time value which the compiler uses. A simple example is ! a class that is created by template instantiation, ! ! template ! instantiation ! ! e.g., pair<int>. Conceptually, the ∫ template argument is substituted throughout the pair template class to produce a class definition. Although neither the programmer nor the user sees this class definition, it is represented inside the compiler, which can use and manipulate the code. ! ! ! ! ! Through template programming, the compiler's optimizer ! ! optimizer ! compiler, optimizer. ! ! ! optimization ! compiler, optimizer. ! ! ! compiler ! optimizer ! ! can transform complicated code into much simpler code. In , we describe the complicated template code used to implement efficiently *************** struct usuallySimpleClass<false> { *** 537,544 **** compilers that translate &cc; code into &c; code may permit inspecting the resulting code. For example, using the command-line option with the ! KAI &cc; compiler creates a file ! containing the result of intermediate code. Unfortunately, reading and understanding the code is frequently difficult. Perhaps future &cc; compilers will support easy inspection of optimized code. --- 758,775 ---- compilers that translate &cc; code into &c; code may permit inspecting the resulting code. For example, using the command-line option with the ! KAI &cc; compiler ! ! ! compiler ! KAI ! ! ! KAI &cc; compiler ! compiler, KAI. ! ! creates a file ! containing the intermediate code. Unfortunately, reading and understanding the code is frequently difficult. Perhaps future &cc; compilers will support easy inspection of optimized code. *************** struct usuallySimpleClass<false> { *** 550,556 **** > and ==. 
At run time, the category of strings can be compared using == and characters can be extracted using ! subscripts and the [] operator. Compile-time operations are more limited. Types may be declared and used. The sizeof operator yields the number of bytes to represent an object of the specified type. Enumerations, --- 781,787 ---- > and ==. At run time, the category of strings can be compared using == and characters can be extracted using ! subscripts with the [] operator. Compile-time operations are more limited. Types may be declared and used. The sizeof operator yields the number of bytes to represent an object of the specified type. Enumerations, *************** struct usuallySimpleClass<false> { *** 562,582 **** used as template arguments. At compile time, pointers and references to objects and functions can be used as template arguments, while the category of executable code supports no ! operations. (The compiler's optimizer may simplify it, ! though.) ! ! At run time, an object can store multiple values, each having its own name. For example, a pair<int> object p stores two ∫s named left_ and right_. The . ! operator extracts a named member from an object: p.left_. At compile time, a class can store multiple values, each having its own name. These are sometimes called traits classes. For example, implementing ! data-parallel operations requiring storing the a tree of types. The ExpressionTraits<BinaryNode<Op, Left, Right&closeclose; traits class stores the types of a binary node representing the operation of Op on left --- 793,829 ---- used as template arguments. At compile time, pointers and references to objects and functions can be used as template arguments, while the category of executable code supports no ! operations. (The compiler's optimizer ! ! compiler ! optimizer ! ! may simplify it, though.) ! ! ! ! traits class ! ! At run time, an object ! ! object ! ! can store multiple values, each having its own name. For example, a pair<int> object p stores two ∫s named left_ and right_. The . ! operator ! ! . operator ! ! extracts a named member from an object: p.left_. At compile time, a class can store multiple values, each having its own name. These are sometimes called traits classes. For example, implementing ! data-parallel operations requiring storing a tree of types. The ExpressionTraits<BinaryNode<Op, Left, Right&closeclose; traits class stores the types of a binary node representing the operation of Op on left *************** struct ExpressionTraits<BinaryNode< *** 590,629 **** typedef typename CombineExpressionTraits<Left_t, Right_t>::Type_t Type_t; }; consists of a class definition and internal type ! ! definitions. This traits class contains three values, all types, ! named Left_t, Right_t, and Type_t, representing the type of the left child, the ! right child, and the entire node, respectively. No enumerations ! or constant values occur. See for more details ! regarding the implementation of data-parallel operators. Many ! traits classes, such as this one, use internal type definitions to ! store values. ! ! The example also illustrates using the ! :: operator to extract a member of a traits ! class. The type ExpressionTraits<Left> ! contains an internal type definition of Type_t. ! Using the :: operator extracts it: ExpressionTraits<Left>::Type_t. Enumerations and other values can also be extracted. For example, Array<2, int, Brick>::dimensions yields the dimension of the array's domain. ! Control flow determines which code is used. 
At run time, control-flow statements such as if, while, and goto determine which statements to execute. Template programming uses two mechanisms: template class specializations and pattern matching. These are similar to ! control flow for functional programming languages. A template class specialization is a class definition specific to one or more template arguments. For example, the ! implementation for data-parallel operations uses the templated ! CreateLeaf. The default definition works for any ! template argument T: template<class T> struct CreateLeaf --- 837,891 ---- typedef typename CombineExpressionTraits<Left_t, Right_t>::Type_t Type_t; }; consists of a class definition and internal type ! definitions. This traits class contains three values, all types ! and named Left_t, Right_t, and Type_t, representing the type of the left child, the ! right child, and the entire node, respectively. Many traits ! classes, such as this one, use internal type definitions to store ! values. No enumerations or constant values occur in this traits ! class, but other such classes include them. See for more details ! regarding the implementation of data-parallel operators. ! ! ! ! :: operator ! ! The example also illustrates using the :: ! operator to extract a member of a traits class. The type ! ExpressionTraits<Left> contains an internal ! type definition of Type_t. Using the ! :: operator extracts it: ExpressionTraits<Left>::Type_t. Enumerations and other values can also be extracted. For example, Array<2, int, Brick>::dimensions yields the dimension of the array's domain. ! ! ! template ! specialization ! ! ! control flow ! ! Control flow determines which code is used. At run time, control-flow statements such as if, while, and goto determine which statements to execute. Template programming uses two mechanisms: template class specializations and pattern matching. These are similar to ! control flow in functional programming languages. A template class specialization is a class definition specific to one or more template arguments. For example, the ! implementation for data-parallel operations ! ! data-parallel operation ! ! uses the templated CreateLeaf. The default ! definition works for any template ! argument T: template<class T> struct CreateLeaf *************** struct CreateLeaf<Expression<T&clo *** 644,650 **** CreateLeaf's template argument is an Expression type. ! Pattern matching of template arguments to template parameters determines which template code is used. The code associated with the match that is most specific is the one that is used. For example, CreateLeaf<int> uses the --- 906,921 ---- CreateLeaf's template argument is an Expression type. ! ! ! template ! pattern matching ! ! ! pattern matching ! template, pattern matching. ! ! Pattern matching of template arguments to template parameters determines which template code is used. The code associated with the match that is most specific is the one that is used. For example, CreateLeaf<int> uses the *************** struct CreateLeaf<Expression<T&clo *** 663,672 **** Control flow using template specializations and pattern matching is similar to switch ! statements. A switch statement has a condition and one or more pairs of case labels and associated code. The code associated with the the case label whose value ! matched the condition is executed. If no case label matches the condition, the default code, if present, is used. 
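The sketch below (illustrative types only, not &pooma;'s actual CreateLeaf code) shows the default-plus-specialization pattern playing the role of the switch statement just described:

template <typename T> struct Expression { /* ... */ };   // illustrative wrapper type

template <typename T>
struct CreateLeaf {                       // default case, like a switch's default label
  typedef T Leaf_t;
};

template <typename T>
struct CreateLeaf<Expression<T> > {       // chosen when the argument matches Expression<T>
  typedef T Leaf_t;
};

// CreateLeaf<int>::Leaf_t uses the default definition;
// CreateLeaf<Expression<int> >::Leaf_t uses the specialization.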
In template programming, instantiating a template, e.g., CreateLeaf<Expression<int&closeclose; serves as --- 934,947 ---- Control flow using template specializations and pattern matching is similar to switch ! statements. ! ! switch ! ! A switch statement has a condition and one or more pairs of case labels and associated code. The code associated with the the case label whose value ! matches the condition is executed. If no case label matches the condition, the default code, if present, is used. In template programming, instantiating a template, e.g., CreateLeaf<Expression<int&closeclose; serves as *************** struct CreateLeaf<Expression<T&clo *** 681,689 **** default label since it matches any arguments. If no set of template parameters match (which is impossible for our example) or if more than one set are best matches, the code is ! incorrect. ! ! Functions as well as classes may be templated. All the concepts needed to understand function templates have already been introduced so we illustrate using an example. The templated function f takes one parameter of any type: --- 956,975 ---- default label since it matches any arguments. If no set of template parameters match (which is impossible for our example) or if more than one set are best matches, the code is ! incorrect. ! ! ! ! ! ! template ! function ! ! ! function template ! function, template. ! ! Functions as well as classes may be templated. All the concepts needed to understand function templates have already been introduced so we illustrate using an example. The templated function f takes one parameter of any type: *************** void f(const T& t) { … } *** 697,704 **** functions equivalent to f(const int&), f(const bool&), f(const int*&), …. Using a templated class ! definition with a static member function, we can define an ! equivalent function: template <typename T> class F { --- 983,1008 ---- functions equivalent to f(const int&), f(const bool&), f(const int*&), …. Using a templated class ! definition with a static member function, ! ! function ! static member ! ! ! static member function ! function, static member ! ! we can define an equivalent function: ! ! function ! static member ! equivalence with function template ! ! ! template ! function ! equivalence with static member function ! template <typename T> class F { *************** class F { *** 706,735 **** }; Both the templated class and the templated function take the same template arguments, but the class uses a static ! member function so the notation to invoke it is slightly more ! verbose: F<T>::f(t). The advantage ! of a function template is that it can be overloaded, particularly ! operator functions. For example, the + ! operator is overloaded to add two &array;s, which require template ! parameters to specify: template <int D1,class T1,class E1,int D2,class T2,class E2> // complicated return type omitted operator+(const Array<D1,T1,E1> & l,const Array<D2,T2,E2> & r); Without using function templates, it would not be - possible to write expressions such as a1 + a2. Member functions can also be templated. This permits, for example, overloading of assignment operators defined ! within templated classes. ! ! Function objects are frequently useful in run-time code. They consist of a function plus some additional storage and are usually implemented as structures with data members and a function ! call operator. Analogous classes can be used at compile time. ! 
Using the transformation introduced in the previous paragraph, we see that any function can be transformed into a class containing a ! static member function. Internal type definitions, enumerations, and static constant values can be added to the class. The static member function can use these values during its computation. The CreateLeaf structure, introduced above, illustrates this. --- 1010,1077 ---- }; Both the templated class and the templated function take the same template arguments, but the class uses a static ! member function. Thus, the notation to invoke it is slightly more ! verbose: F<T>::f(t). ! ! ! ! function ! overloaded ! ! ! function ! operator ! ! The advantage of a function template is that it can be overloaded, ! particularly operator functions. For example, the ! + operator is overloaded to add two &array;s, ! which require template parameters to specify: template <int D1,class T1,class E1,int D2,class T2,class E2> // complicated return type omitted operator+(const Array<D1,T1,E1> & l,const Array<D2,T2,E2> & r); Without using function templates, it would not be possible to write expressions such as a1 + a2. Member functions can also be templated. This permits, for example, overloading of assignment operators defined ! within templated classes. ! ! ! ! ! Function objects ! ! function ! object ! ! are frequently useful in run-time code. They consist of a function plus some additional storage and are usually implemented as structures with data members and a function ! call operator. ! ! function ! call operator ! ! Analogous classes can be used at compile time. ! Using the transformation ! ! function ! static member ! equivalence with function template ! ! ! template ! function ! equivalence with static member function ! ! introduced in the previous paragraph, we see that any function can be transformed into a class containing a ! static member function. ! ! function ! static member ! ! Internal type definitions, enumerations, and static constant values can be added to the class. The static member function can use these values during its computation. The CreateLeaf structure, introduced above, illustrates this. Index: tutorial.xml =================================================================== RCS file: /home/pooma/Repository/r2/docs/manual/tutorial.xml,v retrieving revision 1.6 diff -c -p -r1.6 tutorial.xml *** tutorial.xml 2002/01/22 15:48:49 1.6 --- tutorial.xml 2002/01/24 04:56:40 *************** *** 1,17 **** A Tutorial Introduction - UPDATE: In the following paragraph, fix the cross-reference - to the actual section. - &pooma; provides different containers and processor configurations and supports different implementation styles, as ! described in . In this ! chapter, we present several different implementations of the ! &doof2d; two-dimensional diffusion simulation program: ! a C-style implementation omitting any use of &pooma; computing each array element individually, --- 1,15 ---- + A Tutorial Introduction &pooma; provides different containers and processor configurations and supports different implementation styles, as ! described in . In this ! chapter, we present several different implementations of the &doof2d; ! two-dimensional diffusion simulation program: ! a C-style implementation omitting any use of &pooma; and computing each array element individually, *************** *** 40,52 **** ! These illustrate the &array;, &field;, &engine;, layout, ! mesh, and domain data types. They also illustrate various ! 
immediate computation styles (element-wise accesses, data-parallel ! expressions, and stencil computation) and various processor ! configurations (one sequential processor and multiple ! processors).
&doof2d; Averagings
--- 38,68 ----
! These illustrate the &array;, &field;, &engine;, layout, mesh,
! and &domain; data types.  They also illustrate various immediate
! computation styles (element-wise accesses, data-parallel expressions,
! and stencil computation) and various processor configurations (one
! processor and multiple processors).
! 
!   The &doof2d; diffusion program starts with a two-dimensional
! grid of values.  To model an initial density, all grid values are
! zero except for one nonzero value in the center.  During each
! averaging, each grid element, except the outermost ones, updates its
! value by averaging its own value and those of its eight neighbors.
! To avoid overwriting grid values before all their uses occur, we use
! two arrays, reading the first and writing the second and then
! reversing their roles within each iteration.
! 
!   We illustrate the averagings in .  Initially, only the
! center element has a nonzero value.  In the first averaging, each
! element's new value equals the average of its own and its neighbors'
! previous values.  Thus, the initial nonzero value spreads to a
! three-by-three grid.  The averaging continues, spreading to a
! five-by-five grid of nonzero values.  Values in the outermost grid
! cells are always zero.
+
&doof2d; Averagings *************** *** 75,177 ****
! The &doof2d; diffusion program starts with a two-dimensional ! grid of values. To model an initial density, all grid values are ! zero except for one nonzero value in the center. Each averaging, ! each grid element, except the outermost ones, updates its value by ! averaging its value and its eight neighbors. To avoid overwriting ! grid values before all their uses occur, we use two arrays, reading ! the first and writing the second and then reversing their roles ! within each iteration. ! ! Figure ! illustrates the averagings. Initially, only the center element has ! nonzero value. To form the first averaging, each element's new ! value equals the average of its and its neighbors' previous values. ! Thus, the initial nonzero value spreads to a three-by-three grid. ! The averaging continues, spreading to a five-by-five grid of ! nonzero values. Values in outermost grid cells are always ! zero. ! ! Before presenting various implementations of %doof2d;, we explain how to install the &poomatoolkit;. REMOVE: &doof2d; algorithm and code is illustrated in Section 4.1 of pooma-publications/pooma.ps. It includes a figure illustrating parallel communication of data.
Installing &pooma; ADD: How does one install &pooma; using Windows or Mac? UPDATE: Make a more recent &pooma; source code file ! available on &poomaDownloadPage;. For example, LINUXgcc.conf is not available. In this section, we describe how to obtain, build, and install the &poomatoolkit;. We focus on installing under the ! Unix operating system. Instructions for installing on computers running Microsoft Windows or MacOS, as well as more extensive instructions for Unix, appear in . Obtain the &pooma; source code &poomaSourceFile; ! from the &pooma; download page (&poomaDownloadPage;) available off ! the &pooma; home page (&poomaHomePage;). The tgz indicates this is a compressed tar archive file. To extract the ! source files, use tar xzvf &poomaSourceFile;. Move into the source code directory &poomaSource; directory; e.g., ! cd &poomaSource;. ! Configuring the source code prepares the necessary paths for ! compilation. First, determine a configuration file in ! corresponding to your operating system and compiler in the ! config/arch/ directory. ! For example, LINUXgcc.conf supports compiling ! under a &linux; operating system with &gcc; and SGI64KCC.conf supports compiling ! under a 64-bit SGI Unix operating ! system with &kcc;. Then, configure the source code: ! ./configure &dashdash;arch LINUXgcc &dashdash;opt &dashdash;suite LINUXgcc-opt. The architecture argument to the ! &dashdash;arch option is the name of the corresponding ! configuration file, omitting its .conf suffix. The &dashdash;opt indicates the &poomatoolkit; will ! contain optimized source code, which makes the code run more ! quickly but may impede debugging. Alternatively, the ! &dashdash;debug option supports debugging. The ! suite name can be any arbitrary string. We chose ! LINUXgcc-opt to remind us of the architecture ! and optimization choice. configure creates subdirectories ! named by the suite name LINUXgcc-opt for use when ! compiling the source files. Comments at the beginning of ! lib/suiteName/PoomaConfiguration.h record the configuration arguments. ! To compile the source code, set the ! POOMASUITE environment variable to the suite name ! and then type make. To set the environment ! variable for the bash shell use ! export POOMASUITE=suiteName, ! substituting the suite name's ! suiteName. For the ! csh shell, use setenv POOMASUITE LINUXgcc-opt. Issuing the make command compiles the &pooma; source code files to create the &pooma; library. The &pooma; makefiles assume ! the GNU &make; so substitute the proper ! command if necessary. The &pooma; library can be found in, e.g., ! lib/LINUXgcc-opt/libpooma-gcc.a.
--- 91,183 ----
! Before presenting the various implementations of &doof2d;, we explain how to install the &poomatoolkit;. + REMOVE: &doof2d; algorithm and code is illustrated in Section 4.1 of pooma-publications/pooma.ps. It includes a figure illustrating parallel communication of data.
+ ]]> +
Installing &pooma; + ADD: How does one install &pooma; using Windows or Mac? UPDATE: Make a more recent &pooma; source code file ! available on &poomadownloadpage;. For example, LINUXgcc.conf is not available. + ]]> In this section, we describe how to obtain, build, and install the &poomatoolkit;. We focus on installing under the ! Unix operating system. ! . ! ]]> ! Obtain the &pooma; source code &poomasourcefile; ! from the &pooma; download page (&poomadownloadpage;) available off ! the &pooma; home page (&poomahomepage;). The tgz indicates this is a compressed tar archive file. To extract the ! source files, use tar xzvf &poomasourcefile;. Move into the source code directory &poomasource; directory; e.g., ! cd &poomasource;. ! Configuring the source code determines file names needed for ! compilation. First, determine a configuration file in the config/arch/ directory corresponding to ! your operating system and compiler. For example, LINUXgcc.conf supports compiling ! under a &linux; operating system with &gcc;, while SGI64KCC.conf supports compiling ! under a 64-bit SGI Unix operating system ! ! with &kcc;. Next, configure the source code: ./configure ! &dashdash;arch LINUXgcc &dashdash;opt &dashdash;suite LINUXgcc-opt. The architecture argument to the ! &dashdash;arch option is the name of the ! corresponding configuration file, omitting its .conf suffix. The &dashdash;opt indicates the &poomatoolkit; will ! contain optimized source code, which makes the code run more quickly ! but may impede debugging. Alternatively, use the ! &dashdash;debug option which supports debugging. ! The suite name can be any arbitrary string. We chose ! LINUXgcc-opt to remind us of the architecture and ! optimization choice. configure creates subdirectories ! named LINUXgcc-opt for use when compiling the source ! files. Comments at the beginning of lib/suiteName/PoomaConfiguration.h record the configuration arguments. ! To compile the source code, set the POOMASUITE ! environment variable to the suite name and then type ! make. To set the environment variable for the ! ! bash shell use export POOMASUITE=suiteName, ! substituting the suite name's suiteName. ! ! For the csh shell, use setenv POOMASUITE LINUXgcc-opt. Issuing the make command compiles the &pooma; source code files to create the &pooma; library. The &pooma; makefiles assume ! the GNU &make; is available so substitute the ! proper command to run GNU &make; if ! necessary. The &pooma; library can be found in, e.g., lib/LINUXgcc-opt/libpooma-gcc.a.
*************** *** 181,209 **** Before implementing &doof2d; using the &poomatoolkit;, we present a hand-coded implementation of &doof2d;. See . After querying the ! user for the number of averagings, the arrays' memory is ! allocated. Since the arrays' size is not known at compile time, ! the arrays are accesses via pointers to allocated dynamic memory. ! This memory is deallocated at the program's end to avoid memory ! leaks. The arrays are initialized with initial conditions. For ! the b array, all values except the central ones ! have nonzero values. Only the outermost values of the a array need be initialized to zero, but we ! instead initialize them all using the loop used by ! b. ! The simulation's kernel consists of triply nested loops. ! The outermost loop controls the number of iterations. The inner nested loops iterate through the arrays' elements, excepting the ! outermost elements; note the loop indices range from 1 to n-2 ! while the array indices range from 0 to n-1. Each ! a value is assigned the average of its ! corresponding value in b and the latter's ! neighbors. Values in the two-dimensional grids are accessed using ! two sets of brackets, e.g., a[i][j]. After ! assigning values to a, a second averaging reads ! values in a, writing values in ! b. After the kernel finishes, the final central value is printed. If the desired number of averagings is even, the value --- 187,214 ---- Before implementing &doof2d; using the &poomatoolkit;, we present a hand-coded implementation of &doof2d;. See . After querying the ! user for the number of averagings, the arrays' memory is allocated. ! Since the arrays' size is not known at compile time, the arrays are ! accessed via pointers to allocated dynamic memory. This memory is ! deallocated at the program's end to avoid memory leaks. The arrays ! are initialized with initial conditions. For the ! b array, all values except the central ones have ! nonzero values. Only the outermost values of the a array need be initialized to zero, but we ! instead initialize them all using the same loop ! initializing b. ! The simulation's kernel consists of triply nested loops. The ! outermost loop controls the number of iterations. The two inner nested loops iterate through the arrays' elements, excepting the ! outermost elements; note the loop indices range from 1 to n-2 while ! the array indices range from 0 to n-1. Each a ! value is assigned the average of its corresponding value in ! b and the latter's neighbors. Values in the ! two-dimensional grids are accessed using two sets of brackets, e.g., ! a[i][j]. After assigning values to ! a, a second averaging reads values in ! a, writing values in b. After the kernel finishes, the final central value is printed. If the desired number of averagings is even, the value *************** *** 241,248 **** a array. ! These constants indicate the number of iterations, and ! the average weighting. Each a value, except an outermost one, --- 246,252 ---- a array. ! This constants indicates the average's weighting. Each a value, except an outermost one, *************** *** 268,289 **** To compile the executable, change directories to the &pooma; &poomaExampleDirectory;/Doof2d directory. Ensure the POOMASUITE environment variable specifies the desired suite name suiteName, as we did when compiling ! &pooma; in the previous section . Issuing the ! make Doof2d-C-element command creates the executable suiteName/Doof2d-C-element. ! When running the executable, specify the desired a ! 
nonnegative number of averagings and the nonnegative number of ! grid cells along any dimension. The resulting grid has the same ! number of cells along each dimension. After the executable ! finishes, the resulting value of the central element is ! printed. --- 272,291 ---- To compile the executable, change directories to the &pooma; &poomaexampledirectory;/Doof2d directory. Ensure the POOMASUITE environment variable specifies the desired suite name suiteName, as we did when compiling ! &pooma; in . Issuing ! the make Doof2d-C-element command creates the executable suiteName/Doof2d-C-element. ! When running the executable, specify the desired nonnegative ! number of averagings and the nonnegative number of grid cells along ! any dimension. The resulting grid has the same number of cells ! along each dimension. After the executable finishes, the resulting ! value of the central element is printed. *************** *** 314,323 **** Before creating an &array;, its domain must be specified. ! The N interval represents the ! one-dimensional integral set {0, 1, 2, …, n-1}. An ! Interval<2> object represents the entire ! two-dimensional index domain. An &array;'s template parameters indicate its dimension, --- 316,325 ---- Before creating an &array;, its domain must be specified. ! The N &interval; represents the ! one-dimensional integral set {0, 1, 2, …, n-1}. The ! Interval<2> vertDomain ! object represents the entire two-dimensional index domain. An &array;'s template parameters indicate its dimension, *************** *** 330,349 **** domain. ! The first statement initializes all &array; values to the ! same scalar value. This is possible because each &array; ! knows its domain. The second statement ! illustrates &array; element access. Indices, separated by commas, are surrounded by parentheses rather than surrounded by square brackets ([]). &array; element access uses parentheses, rather than ! square brackets ! Since &array;s are first-class objects, they ! automatically deallocate any memory they require, eliminating memory leaks. --- 332,349 ---- domain. ! The first loop initializes all &array; values to the ! same scalar value. The second statement ! illustrates assigning one &array; value. Indices, separated by commas, are surrounded by parentheses rather than surrounded by square brackets ([]). &array; element access uses parentheses, rather than ! square brackets. ! The &array;s deallocate any memory they require, eliminating memory leaks. *************** *** 364,370 **** The creation of the a and b &array;s requires an object specifying their index domains. Since these are two-dimensional arrays, their ! index domains are also two dimensional. The two-dimensional Interval<2> object is the Cartesian product of two one-dimensional Interval<1> objects, each specifying the integral set {0, 1, 2, …, n-1}. --- 364,370 ---- The creation of the a and b &array;s requires an object specifying their index domains. Since these are two-dimensional arrays, their ! index domains are also two-dimensional. The two-dimensional Interval<2> object is the Cartesian product of two one-dimensional Interval<1> objects, each specifying the integral set {0, 1, 2, …, n-1}. *************** *** 373,387 **** type of its values, and how the values are stored. Both a and b are two-dimension arrays storing &double;s so their dimension ! is 2 and its element type is &double;. An &engine; stores an ! &array;'s values. For example, a &brick; &engine; explicitly ! stores all values. 
A &compressiblebrick; &engine; also explicitly ! stores values if more than value is present, but, if all values ! are the same, storage for just that value is required. Since an ! engine can store its values any way it desires, it might instead ! compute its values using a function or compute the values stored ! in separate engines. In practice, most explicitly specified ! &engine;s are either &brick; or &compressiblebrick;.
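As a concrete sketch of these declarations, the following program assembles the pieces described above; the Pooma/Arrays.h header name, the illustrative initial values, and the exact constructor forms are assumptions based on this discussion rather than quotations from the example program.

#include "Pooma/Arrays.h"     // assumed umbrella header for Array support
#include <iostream>

int main(int argc, char *argv[])
{
  Pooma::initialize(argc, argv);       // set up the toolkit's data structures

  const int n = 10;
  Interval<1> N(n);                    // the index set {0, 1, ..., n-1}
  Interval<2> vertDomain(N, N);        // its two-dimensional Cartesian product

  // template parameters: dimension, value type, and engine.  Brick stores
  // every value; CompressibleBrick collapses storage when all values agree.
  Array<2, double, Brick>             a(vertDomain);
  Array<2, double, CompressibleBrick> b(vertDomain);

  a = 0.0;                             // scalar assignment to the whole domain
  b = 0.0;
  b(n/2, n/2) = 1000.0;                // element access uses parentheses

  Pooma::blockAndEvaluate();           // finish pending work before one-element reads
  std::cout << b(n/2, n/2) << std::endl;

  Pooma::finalize();                   // wait for all computation, then clean up
  return 0;
}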
&array;s support both element-wise access and scalar assignment. Element-wise access uses parentheses, not square --- 373,387 ---- type of its values, and how the values are stored. Both a and b are two-dimension arrays storing &double;s so their dimension ! is 2 and their value type is &double;. An &engine; stores an ! &array;'s values. For example, a &brick; &engine; explicitly stores ! all values. A &compressiblebrick; &engine; also explicitly stores ! values if more than one value is present, but, if all values are the ! same, storage for just that value is required. Since an engine can ! store its values any way it desires, it might instead compute its ! values using a function or compute using values stored in separate ! engines. In practice, most explicitly specified &engine;s are ! either &brick; or &compressiblebrick;. &array;s support both element-wise access and scalar assignment. Element-wise access uses parentheses, not square *************** *** 389,405 **** specifies the central element. The scalar assignment b = 0.0 assigns the same 0.0 value to all array elements. This is possible because the array knows the extent of ! its domain. Any program using the &poomatoolkit; must initialize the &toolkit;'s data structures using ! Pooma::initialize(argc,argv). This ! extracts &pooma;-specific command-line options from the ! command-line arguments in argv and initializes ! the inter-processor communication and other data structures. When ! finished, Pooma::finalize() ensures all ! computation has finished and the communication and other data ! structures are destructed. --- 389,406 ---- specifies the central element. The scalar assignment b = 0.0 assigns the same 0.0 value to all array elements. This is possible because the array knows the extent of ! its domain. We illustrate these data-parallel statements in the ! next section.
Any program using the &poomatoolkit; must initialize the &toolkit;'s data structures using ! Pooma::initialize(argc,argv). This extracts ! &pooma;-specific command-line options from the program's ! command-line arguments and initializes the interprocessor ! communication and other data structures. When finished, ! Pooma::finalize() ensures all computation and ! communication has finished and the data structures are ! destructed. *************** *** 408,437 **** &pooma; supports data-parallel &array; accesses. Many algorithms are more easily expressed using data-parallel ! expressions. Also, the &poomatoolkit; might be able to reorder ! the data-parallel computations to be more efficient or distribute ! them among various processors. In this section, we concentrate ! the differences between the data-parallel implementation of ! &doof2d; listed in and the ! element-wise implementation listed in the previous section . Data-Parallel &array; Implementation of &doof2d; &doof2d-array-parallel; &pooma; may reorder computation of statements. Calling Pooma::blockAndEvaluate ensures all computation finishes before accessing a particular array element. - - These variables specify one-dimensional domains {1, 2, - …, n-2}. Their Cartesian product specifies the domain - of the array values that are modified. - Data-parallel expressions replace nested loops and array element accesses. For example, a(I,J) --- 409,437 ---- &pooma; supports data-parallel &array; accesses. Many algorithms are more easily expressed using data-parallel ! expressions. Also, the &poomatoolkit; can sometimes reorder the ! data-parallel computations to be more efficient or distribute them ! among various processors. In this section, we concentrate on the ! differences between the data-parallel implementation of &doof2d; ! listed in and ! the element-wise implementation listed in the previous ! section. Data-Parallel &array; Implementation of &doof2d; &doof2d-array-parallel; + + These variables specify one-dimensional domains {1, 2, + …, n-2}. Their Cartesian product specifies the domain + of the array values that are modified. + &pooma; may reorder computation of statements. Calling Pooma::blockAndEvaluate ensures all computation finishes before accessing a particular array element. Data-parallel expressions replace nested loops and array element accesses. For example, a(I,J) *************** *** 443,462 **** ! Data-parallel expressions apply domain objects to containers ! to indicate a set of parallel expressions. For example, in the ! program listed above, a(I,J) specifies all ! of a array excepting the outermost elements. ! The array's vertDomain domain consists of the ! Cartesian product of {0, 1, 2, …, n-1} and itself, while I and J each specify {1, 2, …, n-2}. Thus, a(I,J) is the subset ! with a domain of the Cartesian product of {1, 2, …, n-2} ! and itself. It is called a view of an ! array. It is itself an array, with a domain and supporting ! element access, but its storage is the same as ! a's. Changing a value in ! a(I,J) also changes the same value in a. Changing a value in the latter also changes the former if the value is not one of a's outermost elements. The expression --- 443,461 ---- ! Data-parallel expressions use containers and domain objects to ! indicate a set of parallel expressions. For example, in the program ! listed above, a(I,J) specifies the subset of ! a array omitting the outermost elements. The ! array's vertDomain domain consists of the ! 
Cartesian product of {0, 1, 2, …, n-1} with itself, while I and J each specify {1, 2, …, n-2}. Thus, a(I,J) is the subset ! with a domain of the Cartesian product of {1, 2, …, n-2} with ! itself. It is called a view of an array. It ! is itself an &array;, with a domain and supporting element access, but ! its storage is the same as a's. Changing a value ! in a(I,J) also changes the same value in a. Changing a value in the latter also changes the former if the value is not one of a's outermost elements. The expression *************** *** 465,474 **** product of {2, 3, …, n-1}, i.e., the same domain as a(I,J) but shifted up one unit and to the right one unit. Only an &interval;'s value, not its name, is ! important. Thus, all uses of J in this program could be replaced by I without changing the semantics.
Adding &array;s --- 464,483 ---- product of {2, 3, …, n-1}, i.e., the same domain as a(I,J) but shifted up one unit and to the right one unit. Only an &interval;'s value, not its name, is ! important so all uses of J in this program could be replaced by I without changing the semantics. + The statement assigning to a(I,J) + illustrates that &array;s may participate in expressions. Each + addend is a view of an array, which is itself an array. The views' + indices are zero-based so their sum can be formed by adding + identically indexed elements of each array. For example, the lower, + left element of the result equals the sum of the lower, left + elements of the addend arrays. + illustrates adding two arrays. +
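A sketch of the data-parallel kernel being discussed may make this concrete; the 1.0/9.0 weighting and the nuAveragings loop variable are assumptions based on the &doof2d; description.

Interval<1> I(1, n-2), J(1, n-2);      // the interior indices {1, ..., n-2}

for (int k = 0; k < nuAveragings; ++k) {
  // each interior value of a becomes the average of the corresponding
  // value of b and b's eight neighbors; then the roles are swapped
  a(I,J) = (1.0/9.0) * (b(I+1,J+1) + b(I+1,J) + b(I+1,J-1)
                      + b(I,  J+1) + b(I,  J) + b(I,  J-1)
                      + b(I-1,J+1) + b(I-1,J) + b(I-1,J-1));
  b(I,J) = (1.0/9.0) * (a(I+1,J+1) + a(I+1,J) + a(I+1,J-1)
                      + a(I,  J+1) + a(I,  J) + a(I,  J-1)
                      + a(I-1,J+1) + a(I-1,J) + a(I-1,J-1));
}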
Adding &array;s *************** *** 476,528 **** ! Adding two arrays with different domains is supported. ! When adding arrays, values in corresponding positions are ! added even if they have different indices, indicated by the ! small numbers adjacent to the arrays.
- The statement assigning to a(I,J) - illustrates that &array;s may participate in expressions. Each - addend is a view of an array, which is itself an array. Each view - has the same domain size so their sum can be formed by - corresponding elements of each array. For example, the lower, - left element of the result equals the sum of the lower, left - elements of the addend arrays. For the computation, indices are - ignored; only the relative positions within each domain are used. - - illustrates adding two arrays with different domain indices. The - indices are indicated by the small numbers to the left and the - bottom of the arrays. Even though 9 and 3 have different indices - (1,1) and (2,0), they are added to each other because they have - the same relative positions within the addends. - Just before accessing individual &array; values, the code contains calls to Pooma::blockAndEvaluate. &pooma; may reorder computation or distribute them among various processors. Before reading an individual &array; value, calling ! the function ensures all computations affecting its value have ! finished, i.e., it has the correct value. Calling this function ! is necessary only when accessing individual array elements because ! &pooma; cannot determine when to call the function itself. For ! example, before printing an array, &pooma; will call ! blockAndEvaluate itself.
Stencil &array; Implementation ! Many computations are local, computing an &array;'s value by ! using close-by &array; values. Encapsulating this computation in ! a stencil can yield faster code because the compiler can determine ! all accesses come from the same array. Each stencil consists of a ! function object and an indication of the stencil's extent. Stencil &array; Implementation of &doof2d; --- 485,523 ---- ! Adding two arrays is supported. ! When adding arrays, values with the same indices, indicated ! by the small numbers adjacent to the arrays, are added.
Just before accessing individual &array; values, the code contains calls to Pooma::blockAndEvaluate. &pooma; may reorder computation or distribute them among various processors. Before reading an individual &array; value, calling ! this function ensures all computations affecting its value have ! finished, i.e., it has the correct value. Calling this function is ! necessary only when accessing individual array elements. For ! example, before the data-parallel operation of printing an array, ! &pooma; will call blockAndEvaluate ! itself.
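For example, a sketch consistent with this description, assuming the stream output that the text alludes to:

Pooma::blockAndEvaluate();                 // required before touching one element
std::cout << a(n/2, n/2) << std::endl;     // single-element read is now safe

std::cout << a << std::endl;               // printing a whole Array is data-parallel,
                                           // so POOMA blocks on our behalf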
Stencil &array; Implementation ! Many scientific computations are localized, computing an ! array's value by using neighboring values. Encapsulating this local ! computation in a stencil ! can yield faster code because the compiler can determine that all ! array accesses use the same array. Each stencil consists of a ! function object and an indication of which neighbors participate in ! the function's computation. Stencil &array; Implementation of &doof2d; *************** *** 546,552 **** These two functions indicate the stencil's size. For each dimension, the stencil extends one cell to the left of (or ! below) its center and also one call to the right (or above) its center. These two functions indicate the stencil's size. For each dimension, the stencil extends one cell to the left of (or ! below) its center and also one cell to the right (or above) its center. ! Before we describe how to create a stencil, we describe how ! to apply a stencil to an array, yielding values. To compute the ! value associated with index position (1,3), the stencil's center ! is placed at (1,3). The stencil's ! upperExtent and ! lowerExtent functions indicate which &array; ! values the stencil's function will use. See . ! Applying the stencil's function call ! operator() yields the computed value. To ! compute multiple &array; values, apply a stencil to the array and ! a domain object: stencil(b, ! interiorDomain). This applies the stencil to each ! position in the domain. The user must ensure that applying the ! stencil does not access nonexistent &array; values.
Applying a Stencil to an &array; --- 559,578 ---- ! Before we describe how to create a stencil, we describe how to ! apply a stencil to an array, yielding computed values. To compute ! the value associated with index position (1,3), the stencil's center ! is placed at (1,3). The stencil's upperExtent ! and lowerExtent functions indicate which ! &array; values the stencil's function will use. See . ! Applying the stencil's function call operator() ! yields the computed value. To compute multiple &array; values, ! apply a stencil to the array and a domain object: ! stencil(b, interiorDomain). This applies the ! stencil to each position in the domain. The user must ensure that ! applying the stencil does not access nonexistent &array; ! values.
Applying a Stencil to an &array; *************** *** 592,598 **** To compute the value associated with index position (1,3) of an array, place the stencil's center, indicated with dashed ! lines, at the position. The computation involves the array values covered by the array and delineated by upperExtent and lowerExtent. --- 586,592 ---- To compute the value associated with index position (1,3) of an array, place the stencil's center, indicated with dashed ! lines, at the position (1,3). The computation involves the array values covered by the array and delineated by upperExtent and lowerExtent. *************** *** 607,625 **** must define a function call operator() with a container parameter and index parameters. The number of index parameters, indicating the stencil's center, must equal the ! container's dimension. For example, DoofNinePt ! defines operator()(const C& c, int i, int ! j). We templated the container type ! C although this is not strictly necessary. The ! two index parameters i and j ! ensure the stencil works with two-dimensional containers. The ! lowerExtent indicates how far to the left ! (or below) the stencil extends beyond its center. Its parameter ! indicates a particular dimension. Index parameters i and j are in dimension 0 and 1. upperExtent serves an analogous purpose. The &poomatoolkit; uses these functions when ! distribution computation among various processors, but it does not use these functions to ensure nonexistent &array; values are not accessed. Caveat stencil user!
--- 601,619 ---- must define a function call operator() with a container parameter and index parameters. The number of index parameters, indicating the stencil's center, must equal the ! container's dimension. For example, DoofNinePt defines ! operator()(const C& c, int i, int j). We ! templated the container type C although this is ! not strictly necessary. The two index parameters ! i and j ensure the stencil ! works with two-dimensional containers. The ! lowerExtent function indicates how far to ! the left (or below) the stencil extends beyond its center. Its ! parameter indicates a particular dimension. Index parameters i and j are in dimension 0 and 1. upperExtent serves an analogous purpose. The &poomatoolkit; uses these functions when ! distributing computation among various processors, but it does not use these functions to ensure nonexistent &array; values are not accessed. Caveat stencil user!
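Putting these pieces together, the stencil function object being described might look like the following sketch; the double return type and the Stencil<DoofNinePt> wrapper in the final comment are assumptions consistent with this discussion.

class DoofNinePt
{
public:
  DoofNinePt() {}

  // function-call operator: one templated container parameter plus two
  // index parameters, matching the two-dimensional containers used here
  template <class C>
  double operator()(const C &c, int i, int j) const
  {
    return (1.0/9.0) * (c(i+1,j+1) + c(i+1,j) + c(i+1,j-1)
                      + c(i,  j+1) + c(i,  j) + c(i,  j-1)
                      + c(i-1,j+1) + c(i-1,j) + c(i-1,j-1));
  }

  // the stencil reaches one cell beyond its center in every direction
  int lowerExtent(int) const { return 1; }
  int upperExtent(int) const { return 1; }
};

// applying it to every interior position, for example:
//   Stencil<DoofNinePt> stencil;
//   a(interiorDomain) = stencil(b, interiorDomain);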
*************** *** 634,640 **** only specify how each container's domain should be split into patches. The &poomatoolkit; automatically distributes the data among the available processors and handles ! any required communication among processors.
Distributed Stencil &array; Implementation of &doof2d; --- 628,637 ---- only specify how each container's domain should be split into patches. The &poomatoolkit; automatically distributes the data among the available processors and handles ! any required communication among processors. illustrates how ! to write a distributed version of the stencil program (). Distributed Stencil &array; Implementation of &doof2d; *************** *** 644,655 **** Multiple copies of a distributed program may simultaneously run, perhaps each having its own input and output. Thus, we use command-line arguments to pass input to ! the program. Using an &inform; object ensures only one program produces output. The UniformGridPartition declaration ! specifies how an array's domain will be partition, of split, into patches. Guard layers are an optimization that can reduce data communication between patches. The UniformGridLayout declaration applies the --- 641,652 ---- Multiple copies of a distributed program may simultaneously run, perhaps each having its own input and output. Thus, we use command-line arguments to pass input to ! the program. Using an &inform; object ensures only one copy produces output. The UniformGridPartition declaration ! specifies how an array's domain will be partitioned, or split, into patches. Guard layers are an optimization that can reduce data communication between patches. The UniformGridLayout declaration applies the *************** *** 657,664 **** patches among various processors. ! The MultiPatch &engine; distributes requests ! for &array; values to the associated patch. Since a patch may associated with a different processor, its remote &engine; has type Remote<Brick>. &pooma; automatically --- 654,661 ---- patches among various processors. ! The &multipatch; &engine; distributes requests ! for &array; values to the associated patches. Since a patch may associated with a different processor, its remote &engine; has type Remote<Brick>. &pooma; automatically *************** *** 675,690 **** Supporting distributed computation requires only minor code changes. These changes specify how each container's domain is ! distributed among the available processors and how input and ! output occurs. The rest of the program, including all the ! computations, remains the same. When running, the &pooma; ! executable interacts with the run-time library to determine which ! processors are available, distributes the containers' domains, and ! automatically handles all necessary interprocessor communication. ! The same executable runs on one or many processors. Thus, the ! programmer can write one program, debugging it on a uniprocessor ! computer and running it on a supercomputer.
The &pooma; Distributed Computation Model --- 672,713 ---- Supporting distributed computation requires only minor code changes. These changes specify how each container's domain is ! distributed among the available processors and how input and output ! occurs. The rest of the program, including all the computations, ! remains the same. When running, the &pooma; executable interacts ! with the run-time library to determine which processors are ! available, distributes the containers' domains, and automatically ! handles all necessary interprocessor communication. The same ! executable runs on one or many processors. Thus, the programmer can ! write one program, debugging it on a uniprocessor computer and run ! it on a supercomputer. ! ! &pooma;'s distributed computing model separates container ! domain concepts from computer configuration concepts. See . ! The statements in the program indicate how each container's domain ! will be partitioned. This process is represented in the upper left ! corner of the figure. A user-specified ! partition specifies how to split the domain ! into pieces. For example, the illustrated partition splits the ! domain into three equal-sized pieces along the x-dimension and two ! equal-sized pieces along the y-dimension. Applying the partition to ! the domain creates patches. The partition ! also specifies external and internal guard layers. A ! guard layer is a domain surrounding a patch. ! A patch's computation only reads but does not write these guarded ! values. An external guard layer conceptually ! surrounds the entire container domain with boundary values whose ! presence permits all domain computations to be performed the same ! way even for computed values along the domain's edge. An ! internal guard layer duplicates values from ! adjacent patches so communication need not occur during a patch's ! computation. The use of guard layers is an optimization; using ! external guard layers eases programming and using internal guard ! layers reduces communication among processors. Their use is not ! required. +
The &pooma; Distributed Computation Model *************** *** 695,732 **** the &pooma; distributed computation model ! The &pooma; distributed computation model combines ! partitioning containers' domains and the computer configuration ! to create a layout.
- &pooma;'s distributed computing model separates container - domain concepts from computer configuration concepts. See . - The program indicates how each container's domain will be - partitioned. This process is represented in the upper left corner - of the figure. A user-specified partition - specifies how to split the domain into pieces. For example, the - illustrated partition splits the domain into three equal-sized - pieces along the x-dimension and two equal-sized pieces along the - y-dimension. Thus, the domain is split into - patches. The partition also specifies - external and internal guard layers. A guard - layer is a domain surrounding a patch. A patch's - computation only reads but does not write these guarded values. - An external guard layer conceptually - surrounds the entire container domain with boundary values whose - presence permits all domain computations to be performed the same - way even for values along the domain's edge. An - internal guard layer duplicates values from - adjacent patches so communication need not occur during a patch's - computation. The use of guard layers is an optimization; using - external guard layers eases programming and using internal guard - layers reduces communication among processors. Their use is not - required. - The computer configuration of shared memory and processors is determined by the run-time system. See the upper right portion of the &pooma; distributed computation model ! The &pooma; distributed computation model creates a layout ! by combining a partitioning of the containers' domains and the ! computer configuration.
The computer configuration of shared memory and processors is determined by the run-time system. See the upper right portion of ) or the &mm; Shared Memory Library (), communicates the available contexts to the executable. &pooma; must be ! configured for the particular run-time system. See . A layout combines patches with contexts so the program can be executed. If &distributedtag; is specified, the patches are distributed among the available contexts. If ! &replicatedtag; is specified, each set of patches is replicated ! among each context. Regardless, the containers' domains are now distributed among the contexts so the program can run. When a patch needs data from another patch, the &poomatoolkit; sends messages to ! the desired patch uses a message-passing library. All such ! communication is automatically performed by the &toolkit; with no need ! for programmer or user input. ! ! FIXME: The two previous paragraphs demonstrate confusion ! between run-time system and message-passing ! library. Incorporating &pooma;'s distributed computation model into a program requires writing very few lines of code. --> or the &mm; Shared Memory Library (), communicates the available contexts to the executable. &pooma; must be ! configured for the particular run-time system in use. See . A layout combines patches with contexts so the program can be executed. If &distributedtag; is specified, the patches are distributed among the available contexts. If ! &replicatedtag; is specified, each set of patches is replicated on ! each context. Regardless, the containers' domains are now distributed among the contexts so the program can run. When a patch needs data from another patch, the &poomatoolkit; sends messages to ! the desired patch uses the message-passing library. All such ! communication is automatically performed by the &toolkit; with no ! need for programmer or user input. Incorporating &pooma;'s distributed computation model into a program requires writing very few lines of code. GuardLayers argument specifies no external guard layer. External guard layers ! simplify computing values along the edges of domains. Since the ! program already uses only the interior domain for computation, we ! do not use this feature. The layout declaration creates a UniformGridLayout layout. As GuardLayers argument specifies no external guard layer. External guard layers ! simplify computing values along the edges of domains. Since our ! program already uses only the interior domain for computation, we do ! not use this feature. The layout declaration creates a UniformGridLayout layout. As layout's three parameters; the contexts are implicitly supplied by the run-time system. ! To create a distributed &array;, it should be created using ! a &layout; object and have a &multipatch; &engine;. Prior ! implementations designed for uniprocessors constructed the ! container using a &domain; object. A distributed implementation ! uses a &layout; object, which conceptually specifies a &domain; ! object and its distribution throughout the computer. A ! &multipatch; &engine; supports computations using multiple patches. ! The UniformTag indicates the patches all have the ! same size. Since patches may reside on different contexts, the ! second template parameter is Remote. Its ! Brick template parameter specifies the &engine; for a ! particular patch on a particular context. Most distributed ! programs use MultiPatch<UniformTag, Remote<Brick> ! > or MultiPatch<UniformTag, ! Remote<CompressibleBrick> > &engine;s. 
The computations for a distributed implementation are exactly the same as for a sequential implementation. The &poomatoolkit; and ! a message-passing library automatically perform all computation. Input and output for distributed programs is different than ! for sequential programs. Although the same instructions run on ! each context, each context may have its own input and output ! streams. To avoid dealing with multiple input streams, we pass ! the input via command-line arguments, which are replicated for ! each context. Using &inform; streams avoids having multiple ! output streams print. Any context can print to an &inform; stream ! but only text sent to context 0 is sent. At the beginning of ! the program, we create an &inform; object. Throughout the rest of ! the program, we use it instead of std::cout and std::cerr. The command to run the program is dependent on the run-time system. To use &mpi; with the Irix 6.5 operating system, one can use the mpirun command. For example, ! mpirun -np 4 Doof2d-Array-distributed -mpi 2 10 ! 1000 invokes the &mpi; run-time system with four ! processors. The -mpi option tells the ! &pooma; executable Doof2d-Array-distributed to ! use the &mpi; Library. The remaining arguments specify the number ! of processors, the number of averagings, and the array size. The ! first and last values are used for each dimension. For example, ! if three processors are specified, then the x-dimension will have ! three processors and the y-dimension will have three processors, ! totalling nine processors. The command ! Doof2d-Array-distributed -shmem -np 4 2 10 ! 1000 uses the &mm; Shared Memory Library ! (-shmem) and four processors. As for ! &mpi;, the remaining command-line arguments are specified on a ! per-dimension basis for the two-dimensional program. --- 781,837 ---- comprise layout's three parameters; the contexts are implicitly supplied by the run-time system.
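Concretely, the declarations described here and explained in the next paragraph might look like the following sketch; the constructor argument orders and the npatch and n names are assumptions based on this discussion.

// npatch patches per dimension; guard-layer arguments as described above
// (which argument is internal and which is external is assumed here)
UniformGridPartition<2> partition(Loc<2>(npatch, npatch),
                                  GuardLayers<2>(1),
                                  GuardLayers<2>(0));
UniformGridLayout<2> layout(vertDomain, partition, DistributedTag());

// the layout, rather than a bare domain, now determines each Array's patches
Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > a(layout);
Array<2, double, MultiPatch<UniformTag, Remote<Brick> > > b(layout);

Inform output;                     // only text sent from context 0 is displayed
Pooma::blockAndEvaluate();
output << a(n/2, n/2) << std::endl;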
! To create a distributed &array;, it should be created using a ! &layout; object and have a &multipatch; &engine; rather than using a ! &domain; object and a &brick; &engine; as we did for the ! uniprocessor implementations. A distributed implementation uses a ! &layout; object, which conceptually specifies a &domain; object and ! its distribution throughout the computer. A &multipatch; &engine; ! supports computations using multiple patches. The ! UniformTag indicates the patches all have the same ! size. Since patches may reside on different contexts, the second ! template parameter is Remote. Its Brick ! template parameter specifies the &engine; for a particular patch on ! a particular context. Most distributed programs use ! MultiPatch<UniformTag, Remote<Brick> > or ! MultiPatch<UniformTag, Remote<CompressibleBrick> ! > &engine;s. The computations for a distributed implementation are exactly the same as for a sequential implementation. The &poomatoolkit; and ! a message-passing library automatically perform all the computation. Input and output for distributed programs is different than ! for sequential programs. Although the same instructions run on each ! context, each context may have its own input and output streams. To ! avoid dealing with multiple input streams, we pass the input via ! command-line arguments, which are replicated for each context. ! Using &inform; streams avoids having multiple output streams print. ! Any context can print to an &inform; stream but only text sent to ! context 0 is displayed. At the beginning of the program, we ! create an &inform; object named output. ! Throughout the rest of the program, we use it instead of ! std::cout and std::cerr. The command to run the program is dependent on the run-time system. To use &mpi; with the Irix 6.5 operating system, one can use the mpirun command. For example, ! mpirun -np 4 Doof2d-Array-distributed -mpi 2 10 ! 1000 invokes the &mpi; run-time system with four ! processors. The option tells the &pooma; ! executable Doof2d-Array-distributed to use the ! &mpi; Library. The remaining arguments specify the number of ! processors, the number of averagings, and the array size. The first ! and last values are the same for each dimension. For example, if three ! processors are specified, then the x-dimension will have three ! processors and the y-dimension will have three processors, totaling ! nine processors. The command Doof2d-Array-distributed ! -shmem -np 4 2 10 1000 uses the &mm; Shared Memory Library ! () and four processors. As for &mpi;, the ! remaining command-line arguments are specified on a per-dimension ! basis for the two-dimensional program. *************** *** 845,851 **** Data-Parallel &field; Implementation &pooma; &array;s support many scientific computations, but ! many scientific computations require values distributed throughout space, and &array;s have no spatial extent. &pooma; &field;s, supporting a superset of &array; functionality, model values distributed throughout space. --- 839,845 ---- Data-Parallel &field; Implementation &pooma; &array;s support many scientific computations, but ! other scientific computations require values distributed throughout space, and &array;s have no spatial extent. &pooma; &field;s, supporting a superset of &array; functionality, model values distributed throughout space. *************** *** 861,869 **** In this section, we implement the &doof2d; two-dimensional diffusion simulation program using &field;s. This simulation does ! 
not require any &field;-specific features, but we chose to present this program rather than one using &field;-specific features to ! permit comparisons with the &array; versions, especially . --- 855,863 ---- In this section, we implement the &doof2d; two-dimensional diffusion simulation program using &field;s. This simulation does ! not require any &field;-specific features, but we present this program rather than one using &field;-specific features to ! facilitate comparison with the &array; versions, especially . *************** *** 876,886 **** included.
! These statements specify the spacing and number of ! &field; values. First, a layout is explicitly. Then, a mesh, ! which specifies the spacing between cells, is created. The ! &field;'s centering specifies one cell-centered value per ! cell. &field;'s first template parameter specifies the type of --- 870,879 ---- included. ! These statements specify the spacing and number of &field; ! values. First, a layout is specified. Then, a mesh, which ! specifies the spacing between cells, is created. The &field;'s ! centering specifies one cell-centered value per cell. &field;'s first template parameter specifies the type of *************** *** 907,931 **** Since the above program is designed for uniprocessor computation, specifying the domain specifies the layout. A &field;'s mesh specifies its spatial extent. For ! example, one can ask the mesh for the distance between two cells ! or for the normals to a particular cell. Cells in a UniformRectilinearMesh all have the same size and are ! parallelepipeds. To create the mesh, one specifies the layout, ! the location of the spatial point corresponding to the lower, left domain location, and the size of a particular cell. Since this ! program does not use mesh computations, our choices do not much ! matter. We specify the domain's lower, left corner is at spatial ! location (0.0, 0.0) and each cell's width and height is 1. ! Thus, the middle of the cell at domain position (3,4) is (3.5, ! 4.5). A &field; cell can contain one or more values although each ! cell must have the same arrangement. For this simulation, we ! desire one value per cell so we place that position at the cell's center, i.e., a cell centering. The canonicalCentering function returns such a ! centering. We defer discussion of the latter two arguments to ! . A &field; declaration is analogous to an &array; declaration but must also specify a centering and a mesh. In mesh specifies its spatial extent. For ! example, one can ask the mesh for the distance between two cells or ! for the normals to a particular cell. Cells in a UniformRectilinearMesh all have the same size and are ! parallelepipeds. To create the mesh, one specifies the layout, the ! location of the spatial point corresponding to the lower, left domain location, and the size of a particular cell. Since this ! program does not use mesh computations, our choices do not matter. ! We specify the domain's lower, left corner as spatial location (0.0, ! 0.0) and each cell's width and height as 1. Thus, the middle ! of the cell at domain position (3,4) is (3.5, 4.5). A &field; cell can contain one or more values although each ! cell must have the same arrangement of values. For this simulation, ! we desire one value per cell so we place that position at the cell's center, i.e., a cell centering. The canonicalCentering function returns such a ! centering. ]]> . A &field; declaration is analogous to an &array; declaration but must also specify a centering and a mesh. In ! &field; operations are a superset of &array; operations so ! the &doof2d; computations are the same as for . &field; ! accesses require parentheses, not square brackets, and accesses to ! particular values should be preceded by calls to Pooma::blockAndEvaluate. To summarize, &field;s support multiple values per cell and have spatial extent. Thus, their declarations must specify a centering and a mesh. Otherwise, a &field; program is similar to ! one with &array;s. --- 931,947 ---- the &engine; type. 
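Collected into one hedged sketch, the declarations being described might look roughly as follows; the DomainLayout and Vector helpers, the canonicalCentering arguments, and the Field constructor's argument order are all assumptions reconstructed from this discussion rather than verbatim code.

Interval<1> N(n);
Interval<2> vertDomain(N, N);
DomainLayout<2> layout(vertDomain);    // uniprocessor: the domain fixes the layout

// mesh: lower-left corner at (0.0, 0.0) and 1-by-1 cells, so the cell at
// domain position (3,4) is centered at (3.5, 4.5)
UniformRectilinearMesh<2> mesh(layout, Vector<2>(0.0, 0.0), Vector<2>(1.0, 1.0));

// one value per cell, placed at the cell's center (argument names assumed)
Centering<2> cell = canonicalCentering<2>(CellType, Continuous, AllDim);

// like an Array declaration, but a centering and a mesh are also supplied
Field<UniformRectilinearMesh<2>, double, Brick> a(cell, layout, mesh);
Field<UniformRectilinearMesh<2>, double, Brick> b(cell, layout, mesh);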
Since a &field; has a centering and a mesh in addition to a layout, those arguments are also necessary. ! &field; operations are a superset of &array; operations so the ! &doof2d; computations are the same as in . &field; accesses ! require parentheses, not square brackets, and accesses to individual ! values should be preceded by calls to Pooma::blockAndEvaluate. To summarize, &field;s support multiple values per cell and have spatial extent. Thus, their declarations must specify a centering and a mesh. Otherwise, a &field; program is similar to ! one using &array;s. *************** *** 956,970 **** Distributed &field; Implementation A &pooma; program using &field;s can execute on one or more ! processors. In , we demonstrated how ! to modify a uniprocessor stencil &array; implementation to run on ! multiple processors. In this section, we demonstrate that the ! uniprocessor data-parallel &field; implementation of the previous ! section can be converted. Only the container declarations change; ! the computations do not. Since the changes are exactly analogous ! to those in , ! our exposition here will be shorter. Distributed Data-Parallel &field; Implementation of &doof2d; --- 949,963 ---- Distributed &field; Implementation A &pooma; program using &field;s can execute on one or more ! processors. In , ! we demonstrated how to modify a uniprocessor stencil &array; ! implementation to run on multiple processors. In this section, we ! demonstrate that the uniprocessor data-parallel &field; ! implementation of the previous section can be similarly converted. ! Only the container declarations change; the computations do not. ! Since the changes are exactly analogous to those in , our exposition here ! will be shorter. Distributed Data-Parallel &field; Implementation of &doof2d; *************** *** 974,985 **** Multiple copies of a distributed program may simultaneously run, perhaps each having its own input and output. Thus, we use command-line arguments to pass input to ! the program. Using an &inform; stream ensures only one program produces output. The UniformGridPartition declaration ! specifies how an array's domain will be partition, of split, into patches. Guard layers are an optimization that can reduce data communication between patches. The UniformGridLayout declaration applies the --- 967,978 ---- Multiple copies of a distributed program may simultaneously run, perhaps each having its own input and output. Thus, we use command-line arguments to pass input to ! the program. Using an &inform; stream ensures only one copy produces output. The UniformGridPartition declaration ! specifies how an array's domain will be partitioned, or split, into patches. Guard layers are an optimization that can reduce data communication between patches. The UniformGridLayout declaration applies the *************** *** 991,998 **** uniprocessor and multiprocessor implementations. ! The MultiPatch &engine; distributes requests ! for &array; values to the associated patch. Since a patch may associated with a different processor, its remote engine has type Remote<Brick>. &pooma; automatically --- 984,991 ---- uniprocessor and multiprocessor implementations. ! The &multipatch; &engine; distributes requests ! for &field; values to the associated patch. Since a patch may associated with a different processor, its remote engine has type Remote<Brick>. &pooma; automatically *************** *** 1038,1051 **** The command to invoke a distributed program is ! system-dependent. For example, the mpirun -np 4 ! 
Doof2d-Field-distributed -mpi 2 10 1000 command might use &mpi; communication. ! Doof2d-Field-distributed -shmem -np 4 2 10 ! 1000 might use the &mm; Shared Memory Library. !
--- 1031,1044 ---- The command to invoke a distributed program is ! system-dependent. For example, the mpirun -np 4 ! Doof2d-Field-distributed -mpi 2 10 1000 command might use &mpi; communication. ! Doof2d-Field-distributed -shmem -np 4 2 10 ! 1000 might use the &mm; Shared Memory Library. !