Distributed Numerical Integration Algorithms and Applications

Elise H. de Doncker, Rodger R. Zanny, Karlis J. Kaugars
Department of Computer Science,
Western Michigan University,
{elise,rrzanny,kkaugars}@cs.wmich.edu


Date: 2000-05-19

We describe the PARINT package for distributed numerical integration, its basic methods and architectural design as well as some applications, ongoing projects and future extensions. The evaluation of multivariate integrals over hyper-rectangular regions is supported using adaptive subdivision or Quasi-Monte Carlo approximations. Both incorporate load balancing strategies for an efficient implementation on (possibly heterogeneous) distributed systems. The code is written in C and layered over MPI.

A web browser based Java applet provides a graphical user interface. The PARVIS package allows for postmortem visualization and performance analysis of the parallel adaptive computation.

 
Keywords: distributed multivariate integration, adaptive task partitioning, Quasi-Monte Carlo, graphical user interface, visualization.

 

1. INTRODUCTION

The focus of the PARINT project is the design, analysis and development of a set of coarse grain parallel and distributed algorithms for multivariate numerical integration [5,4]. This work culminated in the release of a package for parallel integration, freely available for download, installation, and use by end-users. PARINT1.0 is available via the web at [7] (with a beta version of PARINT1.1 also available). In this paper we will describe the current status of PARINT, its applications, and ongoing projects.

Integration problems with a variety of characteristics (singularities, high dimensions, etc.) require a range of fundamentally different numerical techniques to get adequate results. Techniques are being added to the PARINT software to expand the range of problems that can be solved. These include Quasi-Monte Carlo methods for solving problems of moderately high to high dimensions (e.g., as arise in computational finance), the development of a hierarchical process structure for the computation of large collections of integrals (as for finite element problems), methods for integration over a simplex or a set of simplex regions, and extrapolation techniques for singular problems.

Corresponding developments of the package's graphical user interface allow for easy use across research disciplines. We are also adding visualization tools which give the user feedback on the difficulty of the problem and the mesh refinement needed, and may suggest alternative or better methods for posing the problem. We also intend to set up a server allowing users to submit integration problems remotely, and which may later be incorporated as a resource into existing web computing systems. The interface is a web browser based Java applet [20], allowing for platform independence for the end-user.

There currently is no other integration package available for users that attempts to combine this ease of use with a wide range of problem solving abilities. In addition, many of the techniques have not been implemented elsewhere in a distributed, asynchronous environment. The benefits of the research are therefore two-fold: the development of novel distributed integration methods, and the implementation of these algorithms in a form suitable for easy use by end-users across multiple disciplines. It may be noted that the parallelization of adaptive integration methods also relates to other areas which employ task partitioning methods, including mesh refinement for PDEs, progressive radiosity, and branch and bound applications.

 

2. PROBLEM DESCRIPTION AND METHODS

The objective is to compute an approximation Q to the multivariate integral $I = \int_{\cal{D}}f({\bf x})\,d{\bf x}$ and an error bound E such that $\vert I-Q\vert \leq E \leq \epsilon = \max\{\epsilon_a, \epsilon_r\vert I\vert\}$, for given absolute and relative error tolerances $\epsilon_a$ and $\epsilon_r$, respectively. The integration domain $\cal{D}$ is a hyper-rectangular region.

In PARINT1.0 we have focused on the adaptive integration of real valued functions over hyper-rectangular regions in N dimensions. The algorithms also allow for vector function integrands as long as the component functions behave similarly over the integration region.
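
As an illustration of a vector integrand, a single call can fill several component values at one evaluation point, so that related integrals share the same rule evaluations. The sketch below is illustrative only; the signature is an assumption for this example, not necessarily the actual PARINT integrand interface.

\begin{verbatim}
/* Illustrative vector integrand in C: one call evaluates all
 * components at the same point x.  The signature is hypothetical. */
#include <math.h>

#define NDIM  3   /* dimension of the integration region */
#define NFCNS 2   /* number of integrand components       */

void vector_integrand(const double x[NDIM], double fval[NFCNS])
{
    double r2 = x[0]*x[0] + x[1]*x[1] + x[2]*x[2];
    fval[0] = exp(-r2);          /* component 1                        */
    fval[1] = x[0] * exp(-r2);   /* component 2: similar behavior, so  */
                                 /* one subdivision pattern serves both */
}
\end{verbatim}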

Our adaptive integration approach is mapped to p asynchronous processes, one of which functions as a controller and keeps track of the global quantities (such as Q and E), so that it can test the global acceptance criterion $E \leq \epsilon = \max\{\epsilon_a, \epsilon_r\vert Q\vert\}$, where the total error estimate E is compared to $\epsilon$, the computed tolerated error.

Each step of the worker process algorithm begins with the selection, from a local priority queue of regions, of the next region to subdivide. The selection criterion uses the size of the estimated absolute error in the approximation of the integral over the region. The region is subdivided and its subregions evaluated and added to the local priority queue. Dynamic load balancing is used to supply idle processes with new work.
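
The following sketch outlines one such worker step in C. It is a minimal illustration, not the PARINT source; the heap_*, widest_dimension and apply_rule helpers, the heap_t type and the region_t layout are hypothetical names introduced for this example.

\begin{verbatim}
/* One worker step: pop the region with the largest estimated error,
 * bisect it, evaluate the two halves, update local totals, push back. */
#define MAX_DIM 32

typedef double (*integrand_t)(const double *x, int ndim);

typedef struct {
    double lo[MAX_DIM], hi[MAX_DIM];  /* bounds of the subregion    */
    double result, error;             /* rule-based estimates       */
} region_t;

void worker_step(heap_t *queue, integrand_t f, int ndim,
                 double *local_result, double *local_error)
{
    region_t parent = heap_pop_max_error(queue); /* selection criterion */
    region_t left = parent, right = parent;

    /* bisect along the coordinate chosen by the subdivision heuristic
       (here simply the widest one) */
    int d = widest_dimension(&parent, ndim);
    double mid = 0.5 * (parent.lo[d] + parent.hi[d]);
    left.hi[d]  = mid;
    right.lo[d] = mid;

    apply_rule(f, ndim, &left);   /* integration rule fills result/error */
    apply_rule(f, ndim, &right);

    /* replace the parent's contribution in the local totals */
    *local_result += left.result + right.result - parent.result;
    *local_error  += left.error  + right.error  - parent.error;

    heap_push(queue, left);
    heap_push(queue, right);
}
\end{verbatim}

The controller periodically collects the local totals Q and E and applies the global acceptance test $E \leq \max\{\epsilon_a, \epsilon_r\vert Q\vert\}$ described above.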

Evaluation of a subregion consists of applying an integration rule; the rule determines an estimated result and estimated error for that subregion. Currently we use multivariate rules from Genz and Malik [15,16], with error estimation techniques from [1], and univariate rules from Quadpack [24]. The adaptive process allows for the concentration of integration rule applications in areas of the original integration region where the integrand is the least well-behaved.

The pattern of subdividing regions can be modelled as a tree referred to as a subregion evaluation tree. The root of the tree corresponds to the initial integration region. Each internal node of the tree has two children, and corresponds to a subdivided region (the two children correspond to the two new subregions). Leaf nodes, having no children, correspond to non-subdivided regions.
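
In code, a node of this tree might be represented as follows (an illustrative C struct with hypothetical field names, not PARINT's internal data structure):

\begin{verbatim}
/* A node of the subregion evaluation tree: internal nodes have exactly
 * two children (the halves produced by bisection); leaves were never
 * subdivided. */
typedef struct tree_node {
    double result, error;        /* rule estimates for this region     */
    int    worker;               /* id of the worker that evaluated it */
    struct tree_node *child[2];  /* both NULL for a leaf               */
} tree_node_t;
\end{verbatim}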

The use of adaptive methods is generally recommended for low dimensional problems (say, $\le 15$ dimensions), while Monte Carlo or number-theoretic type methods are used in higher dimensions. Indeed, the dimensional effect on the required number of subdivisions, and thus on the number of evaluation points, grows to an unacceptable level when N is large [3].

We have implemented a non-adaptive method for the asynchronous computation of a sequence of Quasi-Monte Carlo approximations [11]. Based on a sequential implementation by Genz [14], the algorithm focuses on the asynchronous computation of randomized lattice (Korobov) rules. Randomization provides a tool for estimating the error. The algorithm generates a sequence of stochastic families, using an increasing number of points, for the purpose of automatic termination.

In [12] we outline the incorporation of a load balancing scheme to make the method suitable to be run in a heterogeneous environment. Each randomized rule constitutes a single unit of work; a work assignment consists of a set of work units. Static and dynamic load balancing strategies are explored to keep the processors busy performing useful work while gradually calculating higher-level families needed to reach the desired accuracy.
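
A minimal sketch of one such work unit, a randomly shifted rank-1 (Korobov-type) lattice rule, is given below. The generating vector z, the number of points m, and the use of rand() for the shifts are assumptions for illustration; the actual rule families and randomization used in the package may differ.

\begin{verbatim}
#include <math.h>
#include <stdlib.h>

typedef double (*integrand_t)(const double *x, int ndim);

/* One randomly shifted lattice rule with m points and generating
 * vector z (one "work unit"); ndim is assumed to be at most 64. */
double shifted_lattice_rule(integrand_t f, int ndim,
                            const long *z, long m, const double *shift)
{
    double sum = 0.0, x[64];
    for (long k = 0; k < m; k++) {
        for (int j = 0; j < ndim; j++) {
            /* k*z_j mod m, done in 64-bit to avoid overflow */
            double t = (double)(((long long)k * z[j]) % m) / (double)m
                       + shift[j];
            x[j] = t - floor(t);               /* fractional part */
        }
        sum += f(x, ndim);
    }
    return sum / (double)m;
}

/* q independent random shifts give q copies of the rule; their mean is
 * the approximation and their sample standard error the error estimate. */
void randomized_estimate(integrand_t f, int ndim, const long *z, long m,
                         int q, double *result, double *error)
{
    double mean = 0.0, var = 0.0, Q[32];       /* assume q <= 32 */
    for (int i = 0; i < q; i++) {
        double shift[64];
        for (int j = 0; j < ndim; j++)
            shift[j] = (double)rand() / RAND_MAX;
        Q[i] = shifted_lattice_rule(f, ndim, z, m, shift);
        mean += Q[i];
    }
    mean /= q;
    for (int i = 0; i < q; i++)
        var += (Q[i] - mean) * (Q[i] - mean);
    *result = mean;
    *error  = sqrt(var / ((double)q * (q - 1))); /* std. error of the mean */
}
\end{verbatim}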

 

3. SOME APPLICATIONS

During our research we came across some very difficult integrals arising in real-world situations. For example, the integral $\int_0^1 \int_0^1 \frac{2\alpha y}{(x+y-1)^2+\alpha^2}\,dx\,dy$ mimics the behavior of a class of particle physics problems dealt with at CERN [23], where the parameter $\alpha$ ranges between $10^{-1}$ and $10^{-6}$. Note that the integrand has a peaked (ridged) behavior along y = 1-x. With $\alpha = 10^{-6}$, the height of the ridge is $2\times 10^6 y$. In benchmarks, on the order of $10^8$ function evaluations were required to achieve 5-digit accuracy. Yet the actual integrals are more complicated and lengthy to compute, e.g., for dimensions ranging between 8 and 20.
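
Written as a C function of the kind an end user would supply (the signature here is illustrative; the actual PARINT integrand interface may differ), this test integrand reads:

\begin{verbatim}
/* 2-D ridged test integrand 2*alpha*y / ((x+y-1)^2 + alpha^2). */
static double alpha = 1.0e-6;      /* ridge parameter, 1e-1 .. 1e-6 */

double ridge_integrand(const double *x, int ndim)
{
    (void)ndim;                    /* always 2 for this problem      */
    double d = x[0] + x[1] - 1.0;  /* distance from the ridge y=1-x  */
    return 2.0 * alpha * x[1] / (d * d + alpha * alpha);
}
\end{verbatim}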

Other examples are found in statistics, including the computation of multivariate normal distributions [10] and financial derivatives.

In the area of computational finance, multivariate integrals in dimensions as large as several hundred are common (see [2,19,25,22] for examples). These integrals take the form $I = \frac{1}{(\sqrt{2\pi})^N} \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty e^{-{\bf x}^t {\bf x}/2} f({\bf x})\, d{\bf x}$. Here I is typically the expected value of some financial quantity of interest, for example, the present value of a mortgage-backed security. The serial computation takes an unacceptably long time.
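
Such integrals are commonly brought to the unit cube by the textbook change of variables $x_j = \Phi^{-1}(u_j)$, with $\Phi$ the standard normal distribution function; this standard transformation (not specific to PARINT) shows how these problems fit the Quasi-Monte Carlo setting of Section 2:

\begin{displaymath}
I \;=\; \frac{1}{(\sqrt{2\pi})^N}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}
e^{-{\bf x}^t{\bf x}/2}\, f({\bf x})\, d{\bf x}
\;=\; \int_{[0,1]^N} f\bigl(\Phi^{-1}(u_1),\ldots,\Phi^{-1}(u_N)\bigr)\, d{\bf u}.
\end{displaymath}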

Applications in Bayesian statistics include problems resulting from the use of logistic regression, e.g., to estimate the parameters of a birth weight model [6] discussed in the book by Venables and Ripley [26]. A sample speedup graph, obtained on a network of Ultra SPARC workstations, is given in Figure 1 for this problem as well as for a linear data model problem.


  
Figure 1: Sample PARINT Results

As another example, from psychological modeling, models are derived to make testable predictions about choice behavior and judgment of customers in sales [21]. For example, assume two sets of objects, $\underline{S}_x$ and $\underline{S}_y$. A subject is presented with two stimulus objects from $\underline{S}_x$ and one from $\underline{S}_y$ in random order, and required to identify the ``odd'' one (i.e., a triangular model). The correct answer identifies the object from $\underline{S}_y$.

Let $\{ {\bf x}_1, {\bf x}_2\}$ and ${\bf y}$ denote the sensory parameters representing the objects from $\underline{S}_x$ and $\underline{S}_y$, respectively, which follow independent n-variate normal distributions. The probability of a correct response can be given by $P_{tr} = P(\Vert{\bf u}\Vert < \Vert{\bf v}\Vert~\mbox{and}~\Vert{\bf u}\Vert < \Vert{\bf w}\Vert)$, where ${\bf u}$, ${\bf v}$ and ${\bf w}$ represent the differences ${\bf u} = {\bf x}_1 - {\bf x}_2$, ${\bf v} = {\bf x}_1 - {\bf y}$ and ${\bf w} = {\bf x}_2 - {\bf y}$. Then $P_{tr}$ is an integral in ${\bf u}$ over a region $\cal{D}$ inside the hyper-sphere of radius $\Vert{\bf v}\Vert$ centered at ${\bf u} = {\bf 0}$, and outside a hyper-sphere of the same radius centered at ${\bf u} = {\bf v}$,

\begin{displaymath}
P_{tr} = \int_{{\cal R}^n} \int_{\cal D}
\frac{\exp\left(-\frac{1}{2}({\bf z}-{\boldsymbol \mu})^t V^{-1}({\bf z}-{\boldsymbol \mu})\right)}
{(2\pi)^n \vert V\vert^{\frac{1}{2}}}\; d{\bf u}~d{\bf v}, \quad \mbox{where}
\end{displaymath}

${\bf z}$ is the combined vector of ${\bf u}$ and ${\bf v}$, and ${\boldsymbol \mu}$ and V are the mean vector and variance-covariance matrix, respectively, of the joint distribution. After transforming the inner integral to spherical coordinates, the computation of these integrals still takes a large amount of time in view of the N = 2n dimensionality, the outer integral over ${\cal R}^n$, and the complexity of each integrand evaluation.

Our hierarchical methods provide a tool for solving multiple problems simultaneously. The hierarchy is layered over existing PARINT methods, and has applications to finite element methods and, more generally, to problems where many integrals must be calculated [8]. This technique divides the processors into groups, with each group working on an integration problem concurrently. A global controller feeds new integration problems to the groups as they finish their work. We have observed that as the number of processors solving a single integration problem increases beyond a certain point, the scalability of the algorithm suffers. Dividing the processors into groups maintains scalability as the total number of processors increases.
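
A minimal sketch of how such process groups could be formed with standard MPI calls is shown below. MPI_Comm_split is a standard MPI routine; the fixed group size and the controller comment are illustrative assumptions, not the actual PARINT implementation.

\begin{verbatim}
#include <mpi.h>

int main(int argc, char **argv)
{
    int world_rank, world_size;
    int group_size = 8;                 /* assumed processes per group */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* processes with the same color form one integration group */
    int color = world_rank / group_size;
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &group_comm);

    /* Rank 0 of MPI_COMM_WORLD can act as the global controller,
     * handing each group a new integral as it finishes; each group
     * runs the usual controller/worker algorithm on group_comm. */

    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}
\end{verbatim}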

 

4. ARCHITECTURAL DESIGN

Our guiding principle in the design of PARINT is a layered approach. Figure 2 shows the architecture of the system.


  
Figure 2: PARINT Architecture

At the lowest level is the integration engine, relying upon a communication library. PARINT uses MPI [13], a standard for a message passing interface. There are various implementations of MPI, some for specific parallel machines, others designed to work on any generic network of workstations. PARINT primarily uses MPICH [17], an open source implementation of the MPI standard that can run on heterogeneous networks of workstations (among many other platforms). Using MPI provides for a degree of platform independence.

Above the integration engine is the set of functions that make up the PARINT Application Programming Interface (API). These functions allow for the initialization and solving of an integration problem.

The PARINT executable interfaces with the API, and can be executed on the Unix/Linux command line; command line parameters allow for the specification of the problem to solve and various other options. The Java-based GUI provides a platform independent point-and-click interface for specifying the problem and related parameters, and offers real-time visualization of some parallel execution performance characteristics [20].

In addition, users can run the GUI client at a site remote from where the problem is actually solved. Also, users can write their own applications (in C), calling functions in the API as needed, and possibly using the integration results in later steps of a larger scientific application.

Users can specify the integrand function in most common programming languages; we have so far implemented functions in C and FORTRAN.

 

5. PARVIS: A VISUALIZATION TOOL

PARVIS [9,18] provides an extension of the current PARINT GUI, allowing a data-driven examination of the adaptive algorithm's behavior, even for large and complex problems. PARINT can be instructed to log information about various events during the execution of the algorithm to a log file. The log file is then read by PARVIS, post mortem, to analyze the execution. Figure 3 shows the main screen of PARVIS.


  
Figure 3: PARVIS Main Screen

The central portion of the screen shows the subregion evaluation tree for the 4-dimensional integral


\begin{displaymath}\int_{\cal{D}}{\frac{1}{(x_0 + x_1 + x_2 + x_3)^2}}d{\bf x},
\end{displaymath}

where $\cal{D}$ is the four-dimensional unit hypercube. This function has a singularity at the origin of the hypercube.

As this tree can contain tens of thousands of nodes (the tree in Figure 3 contains 10566 nodes), the individual nodes are often scaled down to some minimum size when drawn. To still allow users to analyze the details of the tree, the view can be zoomed in on any portion of it. At higher zoom levels, each displayed region becomes large enough to contain textual or graphical information about the region.

The left portion of the screen displays information about the entire log file. In the bottom left is a zoom indicator window, showing the position of the currently zoomed portion within the entire tree.

Regions can be colored according to different criteria. In Figure 3, the coloring shows which worker (out of the 4 workers) evaluated a region. The legend on the right shows the assigned colors.

For the algorithm developers, PARVIS supports the analysis of load balancing techniques, subregion error patterns, the rate of algorithm convergence for specific function classes, and parallel work anomalies [27]. For end users, it allows for the visualization of the subdivision trees representing the adaptive mesh refinement, thereby supplying feedback on the difficulty of the problem and often revealing alternative, more suitable ways of posing the problem.

Improvements to PARVIS will allow it to act as a visualization tool in related problem domains, such as adaptive mesh refinement, parallel ray tracing, and even branch and bound optimization methods.

 

6. CONCLUSIONS

There are many classes of integration problems (e.g., those needing high accuracy, with many dimensions, over simplex or hyper-rectangular regions, with singularities, etc.). These different classes of problems may require radically different techniques to solve them. While we and others are developing such techniques, the goal of the PARINT project has been to incorporate them into a usable software package, particularly for distributed heterogeneous environments. Such an environment is often necessary to provide enough computing resources to solve computationally intensive problems. Furthermore, the current trend toward grid computing emphasizes that the methods should be viable in heterogeneous environments.

Our current research efforts involve the inclusion of integration over simplex regions, improved load balancing techniques, and tighter integration of our hierarchical methods with the release code. We are in the process of acquiring a Beowulf cluster. We have identified areas that require significant experimentation to further understand the behavior of our algorithms; the cluster will allow us to perform these experiments. In addition, we anticipate experimenting with multiple, geographically distributed clusters of workstations, in an effort to determine the viability and scalability of adaptive task partitioning on such a network.



 