Date Awarded

2011

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.)

Department

Computer Science

Advisor

Nikos Chrisochoides

Abstract

Scientists commonly turn to supercomputers or Clusters of Workstations with hundreds (even thousands) of nodes to generate meshes for large-scale simulations. Parallel mesh generation software is then used to decompose the original mesh generation problem into smaller sub-problems that can be solved (meshed) in parallel. The size of the final mesh is limited by the amount of aggregate memory of the parallel machine. Also, requesting many compute nodes on a shared computing resource may result in a long waiting, far surpassing the time it takes to solve the problem.;These two problems (i.e., insufficient memory when computing on a small number of nodes, and long waiting times when using many nodes from a shared computing resource) can be addressed by using out-of-core algorithms. These are algorithms that keep most of the dataset out-of-core (i.e., outside of memory, on disk) and load only a portion in-core (i.e., into memory) at a time.;We explored two approaches to out-of-core computing. First, we presented a traditional approach, which is to modify the existing in-core algorithms to enable out-of-core computing. While we achieved good performance with this approach the task is complex and labor intensive. An alternative approach, we presented a runtime system designed to support out-of-core applications. It requires little modification of the existing in-core application code and still produces acceptable results. Evaluation of the runtime system showed little performance degradation while simplifying and shortening the development cycle of out-of-core applications. The overhead from using the runtime system for small problem sizes is between 12% and 41% while the overlap of computation, communication and disk I/O is above 50% and as high as 61% for large problems.;The main contribution of our work is the ability to utilize computing resources more effectively. The user has a choice of either solving larger problems, that otherwise would not be possible, or solving problems of the same size but using fewer computing nodes, thus reducing the waiting time on shared clusters and supercomputers. We demonstrated that the latter could potentially lead to substantially shorter wall-clock time.

DOI

https://dx.doi.org/doi:10.21220/s2-hsxy-8829

Rights

© The Author

Share

COinS