==================================================================
===                                                            ===
===           GENESIS Distributed Memory Benchmarks            ===
===                                                            ===
===                           PDE2                             ===
===                                                            ===
===          2-Dimensional Multi-grid Poisson Solver           ===
===                                                            ===
===                 Versions: Std F77, PARMACS, Subset HPF,    ===
===                           PVM 3.1                          ===
===                                                            ===
===         Original authors: R. Hempel, A. Schueller,         ===
===                           M. Lemke                         ===
===              Modified by: J. Klose                         ===
===           PARMACS macros: Clemens-August Thole             ===
===               Subset HPF: Bryan Carpenter                  ===
===                      PVM: Ian Glendinning                  ===
===                                                            ===
===                Inquiries: HPC Centre                       ===
===                           Computing Services               ===
===                           University of Southampton        ===
===                           Southampton SO17 4BJ, U.K.       ===
===                                                            ===
===   Fax: +44 703 593939   E-mail: support@par.soton.ac.uk    ===
===                                                            ===
===          Last update: Jun 1994; Release: 3.0               ===
===                                                            ===
==================================================================

1. Description
--------------

This benchmark solves a 2D Poisson equation using a multigrid method.
A mixture of fine and coarse grids is used to accelerate the solution
process. Multigrid methods are important because they are among the
fastest methods for solving the systems of equations that arise from
the discretization of partial differential equations (PDEs). The
multigrid kernel is therefore a key component of the solution phase in
the numerical treatment of PDEs.

Many problems in the area of scientific computing are formulated in
terms of partial differential equations. Typical application areas are
Computational Fluid Dynamics, Meteorology, Climate Research, Oil
Reservoir simulation and others. The resulting PDEs are discretized
on some grid structure. The PDE is then represented by a large set of
(non)linear equations, each of which couples the values at neighbouring
grid points. For time-dependent problems, this set of
equations has to be determined (integration phase) and solved (solution
phase) at each time step.

The parallelization is performed by grid splitting. A part of the
computational grid is assigned to each processor. After each
computational step, values at the boundary of the subgrids are exchanged
with nearest neighbours. For simplicity, in this benchmark coarse grids
are only used up to the level where each node contains only one interior
grid point. At this level, 10 relaxation steps are used to solve the
two-dimensional problem on the coarsest grid.
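
As an illustration of the method described above, the following is a
minimal sequential sketch of a multigrid V-cycle for -Laplace(u) = f on
the unit square with zero boundary values, written in Python. It is not
taken from the benchmark source: the Gauss-Seidel smoother, the
full-weighting restriction, the bilinear interpolation, the two pre- and
post-smoothing sweeps and all function names are assumptions made for
this example. Only the 10 relaxation steps on the coarsest grid follow
the text.

```python
def relax(u, f, h, sweeps):
    """Gauss-Seidel sweeps for -Laplace(u) = f on the interior points."""
    n = len(u) - 1
    for _ in range(sweeps):
        for i in range(1, n):
            for j in range(1, n):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1]
                                  + h * h * f[i][j])


def residual(u, f, h):
    """Return r = f - A*u for the 5-point stencil (zero on the boundary)."""
    n = len(u) - 1
    r = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n):
        for j in range(1, n):
            r[i][j] = f[i][j] - (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                                 - u[i][j - 1] - u[i][j + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting restriction onto the grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [[0.0] * (nc + 1) for _ in range(nc + 1)]
    for i in range(1, nc):
        for j in range(1, nc):
            a, b = 2 * i, 2 * j
            rc[i][j] = (4.0 * r[a][b]
                        + 2.0 * (r[a - 1][b] + r[a + 1][b]
                                 + r[a][b - 1] + r[a][b + 1])
                        + r[a - 1][b - 1] + r[a - 1][b + 1]
                        + r[a + 1][b - 1] + r[a + 1][b + 1]) / 16.0
    return rc


def prolong_add(u, e):
    """Add the bilinear interpolation of the coarse correction e to u."""
    n = 2 * (len(e) - 1)
    for i in range(1, n):
        for j in range(1, n):
            i0, j0 = i // 2, j // 2
            i1, j1 = i0 + i % 2, j0 + j % 2
            u[i][j] += 0.25 * (e[i0][j0] + e[i1][j0] + e[i0][j1] + e[i1][j1])


def v_cycle(u, f, h):
    """One V-cycle; u is updated in place."""
    n = len(u) - 1
    if n == 2:
        # Coarsest grid: a single interior point, 10 relaxation steps.
        relax(u, f, h, 10)
        return
    relax(u, f, h, 2)                       # pre-smoothing (assumed: 2 sweeps)
    rc = restrict(residual(u, f, h))
    ec = [[0.0] * (n // 2 + 1) for _ in range(n // 2 + 1)]
    v_cycle(ec, rc, 2.0 * h)                # coarse-grid correction
    prolong_add(u, ec)
    relax(u, f, h, 2)                       # post-smoothing (assumed: 2 sweeps)


def max_norm(r):
    return max(abs(x) for row in r for x in row)


def demo(n_exp=4, cycles=6):
    """Run a few V-cycles with f = 1; return residual norms before and after."""
    n = 2 ** n_exp                          # 2**n_exp + 1 points per direction
    h = 1.0 / n
    f = [[1.0] * (n + 1) for _ in range(n + 1)]
    u = [[0.0] * (n + 1) for _ in range(n + 1)]
    before = max_norm(residual(u, f, h))
    for _ in range(cycles):
        v_cycle(u, f, h)
    return before, max_norm(residual(u, f, h))
```

Calling demo() returns the maximum residual before and after the cycles,
which gives a quick check that the coarse-grid correction is reducing the
error as expected.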

2. Operating Instructions
-------------------------

A) Sequential Version

The sequential version automatically produces results for a range of
problem sizes. The problem size is determined by the grid size, which
is related to the parameter N. The number of grid points in each
direction is 2**N + 1, giving (2**N + 1)**2 points in 2 dimensions.

The parameter N is varied from 3 to MMAX within the benchmark and
the benchmark performance calculated for each resulting problem size.
The value of MMAX can be changed by editing the PARAMETER statement
in the file pde2.inc. The maximum value of MMAX which is consistent with
the available processor memory should be chosen.

The memory required for array storage is approximately
{ 3 * (2**MMAX + 1)**2 } * 8 bytes, which gives the following table:

MMAX            Approx Memory required (Mbyte)
8                       1.6
9                       6.3
10                      25.2
11                     100.8
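
The memory formula above can be evaluated for any MMAX with a few lines
of Python. This is only a sketch: the function names are hypothetical,
and 1 Mbyte is assumed to mean 10**6 bytes.

```python
def grid_points(n):
    """Total number of grid points for parameter n: (2**n + 1)**2."""
    return (2 ** n + 1) ** 2


def seq_memory_mbyte(mmax):
    """Approximate array storage: 3 arrays of 8-byte values, in Mbyte."""
    return 3 * grid_points(mmax) * 8 / 1.0e6


def largest_mmax(available_mbyte, lowest=3, highest=20):
    """Largest MMAX in [lowest, highest] whose arrays fit, or None."""
    best = None
    for mmax in range(lowest, highest + 1):
        if seq_memory_mbyte(mmax) <= available_mbyte:
            best = mmax
    return best
```

For example, largest_mmax(32.0) gives the largest MMAX whose array
storage fits in 32 Mbyte under these assumptions.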

For the largest problem size, the relative sizes of the grid in each
dimension are varied whilst keeping the overall problem size constant.
This variation in the shape of the grid for the same problem size
can increase performance by allowing more efficient vectorization.

To achieve a given accuracy in the timing measurements, the number of
multigrid cycles timed by the benchmark is specified by the input
parameter NITER. This should be chosen so that the benchmarked time
for NITER cycles on the smallest problem size is at least 100 times the
clock resolution. (If the clock resolution is unknown, it can be
determined using the TICK1 benchmark.) For larger problem sizes, the
value of NITER is automatically reduced (subject to a minimum value of 10)
to keep the overall benchmarked time constant for each problem size.
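
The exact rule used to reduce NITER is not given here. The Python sketch
below shows one plausible rule, under the assumption that the time per
multigrid cycle is proportional to the number of grid points; the
function names and this scaling are hypothetical and may differ from
what the benchmark does.

```python
def scaled_niter(niter, n, n_smallest=3, minimum=10):
    """Scale NITER down as the grid grows, never below `minimum`.

    Assumes the cost of one cycle is proportional to (2**n + 1)**2.
    """
    points = (2 ** n + 1) ** 2
    points_smallest = (2 ** n_smallest + 1) ** 2
    return max(minimum, niter * points_smallest // points)


def niter_is_large_enough(time_smallest, clock_resolution):
    """True if the timed run is at least 100 times the clock resolution."""
    return time_smallest >= 100.0 * clock_resolution
```

With NITER = 160 on the smallest problem (N = 3), this rule reaches the
minimum of 10 cycles after only a few doublings of the grid.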

Compiling and running the sequential benchmark:

1) Change the value of MMAX in file pde2.inc, if appropriate, to give the
maximum problem size compatible with the available memory (see above).

2) To compile and link the benchmark, type:   make slave

3) To run the benchmark, type:     pde2

4) Input NITER, the number of multigrid cycles (suggested value 160).

Output from the benchmark is written to the file "result".

B) Distributed Version

In the distributed version of the program the problem size and the
number of processors are input from the standard input on channel 5.

The problem size is proportional to the total grid size, which is
determined by the input parameter NN.
The number of grid points in each direction is 2**NN + 1,
giving (2**NN + 1)**2 points in 2 dimensions.

The number of processors over which the lattice is distributed
is determined by the input parameter LOGP, which is the log to base 2 of
the required number of processors, i.e. number of processors = 2**LOGP.

The specified number of processors is configured as a 2D grid
internally within the program.
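
How the program arranges the 2**LOGP processors is not specified in this
file. As a hedged illustration, one natural choice is the most nearly
square grid whose sides are powers of two; the function below is a
hypothetical Python sketch of that choice, not the benchmark's own code.

```python
def processor_grid(logp):
    """Split 2**logp processors into a near-square px x py grid.

    Assumption: both sides are powers of two and px <= py.
    """
    px = 2 ** (logp // 2)
    py = 2 ** (logp - logp // 2)
    return px, py
```

For LOGP = 3 this gives a 2 x 4 grid, and for LOGP = 4 a 4 x 4 grid.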

The size of the local lattice determines the size of the workspace
required in the node program. The size of this workspace is determined
by a PARAMETER statement in the file node.u of the form:

PARAMETER (NWORKD = 300000)

The size of NWORKD should be changed if necessary to ensure that it is
greater than or equal to 4 * (2**NN + 4)**2/(2**LOGP).
The maximum size of NWORKD, and hence of the local lattice size, is
constrained by the available node memory. The node memory required
is approximately NWORKD * 8 bytes.
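
The lower bound on NWORKD and the resulting node memory can be checked
before editing node.u. The following Python sketch simply evaluates the
two formulas above; the function names are hypothetical and 1 Mbyte is
assumed to mean 10**6 bytes.

```python
def min_nworkd(nn, logp):
    """Smallest integer NWORKD >= 4 * (2**nn + 4)**2 / 2**logp."""
    total = 4 * (2 ** nn + 4) ** 2
    nprocs = 2 ** logp
    return -(-total // nprocs)      # integer ceiling division


def node_memory_mbyte(nworkd):
    """Approximate node memory for the workspace: NWORKD * 8 bytes."""
    return nworkd * 8 / 1.0e6
```

For example, min_nworkd(8, 0) is 270400, so the default value of 300000
is sufficient for NN = 8 on a single processor, and min_nworkd(10, 4) is
264196, so it is also sufficient for NN = 10 on 16 processors.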

Suggested Problem Sizes :

It is recommended that the benchmark is run with four standard problem
sizes, given by NN = 8, 10, 12 and 13.

Note that it may not be possible to run the largest problem size on all
machines because of restrictions on the available memory.
The approximate total memory required for array storage is given by the
following table:

NN      Approx value of 4*(4+2**NN)**2      Approx Memory required (Mbyte)
8               0.3 * 10**6                              2.4
10              4.3 * 10**6                             35
12               68 * 10**6                            544
13              270 * 10**6                           2160

To find the minimum node memory required to run each problem size, the total
memory required should be divided by the number of processors on which the
benchmark is run.

The number of processors to be used will depend on the system
available. The most important measurement is likely to be for
the largest power of two that will fit into the machine. If time
permits, the variation of performance with number of processors should be
investigated by reducing the number of processors by successive
factors of two or four.

As for the sequential version, the accuracy of timing may be adjusted
through the parameter NITER.

Compiling and running the distributed benchmark:

1) Change the value of NWORKD in file node.u, if appropriate, to give the
maximum work space compatible with the available memory (see above).

2) To compile and link the benchmark, type:   make

3) To run the benchmark, type:     pde2

4) Input parameters NN, LOGP, NITER on standard input.

Output from the benchmark is written to the file "pde2.res".

$Id: ReadMe,v 1.4 1994/06/28 11:35:26 igl Exp igl $

High Performance Computing Centre

Submitted by Mark Papiani,
last updated on 10 Jan 1995.